The Not-Quite-So Legal Future for Web Scraping

Jonathan Bailey, April 20, 2022

Web scraping has been one of the longest-running themes on this site, with the first articles about it going live as far back as 2006.

The truth is that web scraping has always been controversial and equally fraught legally. A myriad of laws affect, or at least can affect, scraping activities online.

However, a recent ruling from the Ninth Circuit Court of Appeals aims to clarify a key aspect of web scraping. Namely, it says that scraping content from a publicly accessible website is not a violation of the Computer Fraud and Abuse Act (CFAA), even if such activity is barred by the site’s terms of service.

The move follows a recent Supreme Court decision, which dealt with a police officer who used his access to official systems to obtain information for an unauthorized purpose. That decision found that, even though the purpose was unauthorized, the individual had permission to use the system and, as such, it was not a violation of the CFAA.

However, despite the broad headlines proclaiming that web scraping is now legal, the reality is far from that. Not only was the decision itself extremely narrow, but it also looks at just one of the myriad of laws that can impact web scraping activities.

In short, not much has really changed with this ruling, but a potential future of web scraping may be coming into sharper focus nonetheless.

The Divisive History of Web Scraping

Web scraping has been a divisive topic both ethically and legally, in large part because of the variety of activities it covers.

For much of the web’s recent history, web scraping commonly referred to a technique through which spammers would copy content from a website and republish it, either rewritten or verbatim. This was done largely using RSS feeds, which was known as RSS scraping, but could also be done from the site itself.

This form of web scraping began to fall out of favor in 2011 following a series of Google search updates that de-prioritized scraped websites. However, in more recent years, the approach has been on the rise again as some spammers have started finding increased success with it.

But, ultimately, that is just one form of web scraping. There are a myriad of reasons why people and organizations choose to scrape content. The Internet Archive can be classified as a web scraper for the purpose of caching downed or altered pages. Likewise, shopping apps scrape prices and product descriptions to help users find the cheapest alternatives.

To be clear, all of these are controversial as they spend website resources (often without permission) and can be used for a competitive advantage. However, they aren’t as broadly hated as the spammers and scrapers that popularized the term some 15-20 years ago.

This is something that’s pointed out in the LinkedIn case, which is what the Ninth Circuit ruled on.

The LinkedIn Case

This case pits the social networking site LinkedIn against the data company HiQ. HiQ participated in a large effort to scrape content from LinkedIn, not to republish it, but to analyze company attrition.

LinkedIn took several steps to try to block the scraping, but HiQ circumvented them. This prompted LinkedIn to file a lawsuit against HiQ, and HiQ in turn sought a preliminary injunction barring LinkedIn from interfering with its efforts.

That preliminary injunction is what is before the Ninth Circuit, and this is actually the second time the court has looked at this exact issue. In 2019, the same court found largely the same thing, but it chose to revisit the issue following the Supreme Court ruling last year.

In short, the ruling only deals with a preliminary injunction. It’s not a decision on the case itself, and it only looks at the CFAA. In fact, the ruling leaves open a variety of approaches LinkedIn could take, including copyright infringement claims and, in particular, violations of the Digital Millennium Copyright Act (DMCA).

LinkedIn, for whatever reason, has opted not to push those particular arguments up to this point.

This means that the ruling is not the final say on whether HiQ’s actions were legal, but it sets the stage for an already long-running case to continue much further.

Where Does This Leave Us?

The CFAA is far from the only law that governs web scraping. Copyright and breach of contract are just two other areas to consider.

That said, there’s not much doubt that last year’s Supreme Court ruling on the CFAA is going to severely limit how it is used in web scraping cases. However, the Ninth Circuit’s ruling doesn’t really change that. In fact, it doesn’t really change much from its 2019 ruling either.

Last year, the 11th Circuit took a look at the idea that there was an implied license for RSS scraping and found that there was none. As such, scraped content, even publicly available scraped content, still enjoys its full copyright protection.

This means that the legality of web scraping depends heavily on what content is scraped, how it is scraped and how that content is used after the fact. This case is not a broad license for others to scrape publicly available content for whatever purpose they deem fit.

That is true no matter what the headlines say.

Bottom Line

To be clear, the ruling is important within a certain context. It does indicate that the Supreme Court ruling represents a severe limitation of the CFAA when applied to public content that is scraped. However, this is not the only way to scrape content, not the only type of content to be scraped and certainly not the only use of such content.

It’s still very possible that the courts may rule HiQ’s actions were on the wrong side of the law. It just likely won’t be the CFAA that is used to find that. With so many laws governing the scraping of content and the use of scraped content, LinkedIn still has plenty of options.

So while the ruling is definitely worth noting, it’s also not worth making too big a deal of. It follows an earlier Supreme Court ruling and is nearly identical to a 2019 ruling from the same court. Little has changed in the last three years.

While this does hand scrapers at least a small victory, it is far from the green light that some are portraying it as. Web scraping is still very much a legal minefield, the same as it was 16 years ago.