Here's What You Should Know About Data Scraping

Chelsea Caltuna
February 17, 2020

Data scraping has become an integral part of doing business on the internet. However, it often falls into a legal and ethical gray area. If you’re thinking about scraping financial data for personal or business use, here’s what you need to know about what’s accepted, what’s discouraged, and what’s illegal.

What is data scraping? 

Data scraping is the act of automatically extracting information from websites. This is typically done with specialized software that funnels the data into a database, where it can be analyzed or repurposed for the scraper’s website or business operations.
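To make that definition concrete, here is a minimal sketch of the "extract, then funnel into a database" step using only Python’s standard library. It assumes a made-up HTML price table (the hypothetical tickers AAA and BBB, the `close` class name, and the `prices` table are all invented for illustration) and skips the download step entirely; real pages, and real scrapers, are much messier.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this HTML first.
SAMPLE_HTML = """
<table id="prices">
  <tr><td class="ticker">AAA</td><td class="close">10.50</td></tr>
  <tr><td class="ticker">BBB</td><td class="close">20.25</td></tr>
</table>
"""


class PriceRowParser(HTMLParser):
    """Collects (ticker, close) pairs from <td class="ticker"> / <td class="close"> cells."""

    def __init__(self):
        super().__init__()
        self._cell_class = None  # class of the <td> we are currently inside, if any
        self._row = {}
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._cell_class = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "td":
            self._cell_class = None
        elif tag == "tr" and self._row:
            # End of a table row: store the values collected from its cells.
            self.rows.append((self._row["ticker"], float(self._row["close"])))
            self._row = {}

    def handle_data(self, data):
        if self._cell_class in ("ticker", "close"):
            self._row[self._cell_class] = data.strip()


def scrape_to_db(html, conn):
    """Extracts price rows from html and inserts them into a SQLite table."""
    parser = PriceRowParser()
    parser.feed(html)
    parser.close()
    conn.execute("CREATE TABLE IF NOT EXISTS prices (ticker TEXT, close REAL)")
    conn.executemany("INSERT INTO prices VALUES (?, ?)", parser.rows)
    conn.commit()
    return len(parser.rows)


conn = sqlite3.connect(":memory:")
scrape_to_db(SAMPLE_HTML, conn)
print(conn.execute("SELECT ticker, close FROM prices ORDER BY ticker").fetchall())
# [('AAA', 10.5), ('BBB', 20.25)]
```

Note how tightly the parser is coupled to one specific page layout; that coupling comes up again when we look at long-term maintenance.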

There are plenty of widely accepted uses for data scraping. Search engines like Google or Bing scrape data to index and rank web content, which brings traffic and visibility to the sites they crawl. Businesses can also legitimately scrape data for things like price comparisons, market research, weather data, real estate listings, and more.

Some uses, however, are discouraged or even downright illegal. These include passing the data off as your own, deliberately undercutting a competitor’s prices to weaken it, or stealing copyrighted content. This kind of scraping can do serious damage to the targeted site, from tanking its SEO rankings to inflicting severe financial losses.

Even if you’re not acting with malicious intent, you can run into legal and ethical issues. If you hit any of the following traps, think twice about scraping data from that source:

Trap #1: The data you want to scrape isn’t publicly accessible. 

Let’s start here, because this is a big one. It can be illegal to scrape nonpublic data – that is, data that isn’t freely available to everyone on the internet. This covers data that requires a login or payment to view, as well as data meant only for a company’s internal use. “Stealing” data or publishing content that wasn’t meant to be published can open you up to serious legal action from the affected business.

Trap #2: The site you’re scraping has anti-scraping protections. 

Is the data you’re looking at fair game for scraping? The website’s security measures can give you a clue. While website owners can’t prevent every possible scraping attack, they can put safeguards such as CAPTCHAs in place to fight scrapers. When in doubt, contact the webmaster and ask for permission to crawl the site. Just because you can bypass a website’s security measures doesn’t mean there won’t be consequences.
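Many sites also publish crawl rules in a `robots.txt` file. It is a voluntary convention rather than a security measure or a grant of legal permission, so it does not replace reading the terms or asking the webmaster, but it is a cheap first check. The sketch below uses Python’s standard `urllib.robotparser` module; the rules, the `my-research-bot` user agent, and the example.com URLs are hypothetical, and the file is parsed from a string so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. Against a live site you would instead call
# parser.set_url("https://<site>/robots.txt") followed by parser.read().
SAMPLE_ROBOTS_TXT = """
User-agent: *
Disallow: /members/
Crawl-delay: 10
"""


def check_permission(robots_txt, user_agent, url):
    """Returns (allowed, crawl_delay) for url according to the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(user_agent, url)
    delay = parser.crawl_delay(user_agent)  # None if no Crawl-delay rule applies
    return allowed, delay


for url in ("https://example.com/quotes", "https://example.com/members/portfolio"):
    allowed, delay = check_permission(SAMPLE_ROBOTS_TXT, "my-research-bot", url)
    print(url, "allowed" if allowed else "disallowed", "| crawl delay:", delay)
```

A disallowed path (here, anything under `/members/`) is a strong signal that the owner does not want it crawled, which often overlaps with the nonpublic data discussed above.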

Trap #3: You’re running afoul of a site’s terms & conditions or copyrights. 

Many websites include clauses in their terms and conditions that prohibit automated data scraping. Those terms can be legally binding, so if the business whose data you’re targeting considers you a threat, there’s little stopping them from pursuing legal action (sensing a pattern here?). Plus, while the data itself may not be copyrighted, its “creative arrangement” can be – for instance, the way it’s laid out on a web page. Copying that with a web scraper can constitute copyright infringement.

Trap #4: Your scraping methods harm a website. 

Web scrapers typically want to pull data as fast as possible, but this can cause major problems for the website being scraped. Because scraping software can send far more requests per second than a human could, it can overload the website’s servers and degrade the site’s performance. Slowing down or even taking down a server can expose you to legal claims such as trespass to chattels. You might also unintentionally compromise a company’s website, servers, or databases, leaving them open to more dangerous cyberattacks.
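The simplest way to avoid hammering a server is to space out your requests. Here is a minimal sketch of a rate-limited wrapper, assuming a fixed minimum interval between requests; the `fetch` callable, the 2-second interval, and the example.com URLs are placeholders rather than a recommended or "safe" setting (if the site publishes a crawl delay, use at least that). The clock and sleep functions are injectable so the behavior can be checked without real waiting or network access.

```python
import time


class PoliteFetcher:
    """Wraps a fetch function so successive calls are at least min_interval seconds apart."""

    def __init__(self, fetch, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request = None  # start time of the previous request, if any

    def get(self, url):
        if self._last_request is not None:
            # Wait out whatever remains of the minimum interval.
            wait = self._min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        self._last_request = self._clock()
        return self._fetch(url)


class FakeClock:
    """Test double that records sleeps and advances time instantly."""

    def __init__(self):
        self.now = 0.0
        self.sleeps = []

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.sleeps.append(seconds)
        self.now += seconds


# Placeholder fetch function; swap in a real HTTP client call in practice.
clock = FakeClock()
fetcher = PoliteFetcher(fetch=lambda url: f"<html for {url}>", min_interval=2.0,
                        clock=clock.time, sleep=clock.sleep)
pages = [fetcher.get(f"https://example.com/page/{i}") for i in range(3)]
print(clock.sleeps)  # [2.0, 2.0]
```

Three requests produce two waits: the first request goes out immediately, and each later one is held back until the interval has passed.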

Trap #5: You’re relying on scraping for long-term use. 

Scraping is not a reliable long-term solution for your data needs. The websites you rely on might choose to block you, or your scraper can break – an inevitable side effect of the constantly changing internet. Pulling reliable, up-to-date data through a scraper requires constant monitoring and maintenance, plus the expertise to fix it when it stops working.
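Part of that monitoring is making breakage visible. A scraper that silently returns an empty list when a site changes its markup is worse than one that fails loudly. The sketch below, assuming a hypothetical `<td class="close">` price cell and a hypothetical `LayoutChangedError`, checks its own output and raises when the expected structure disappears. The regular expression is used only for brevity; it is exactly the kind of layout-dependent logic that breaks.

```python
import re


class LayoutChangedError(RuntimeError):
    """Raised when a page no longer matches the structure the scraper expects."""


# Hypothetical pattern for a price cell; real pages will differ.
PRICE_PATTERN = re.compile(r'<td class="close">\s*([0-9]+(?:\.[0-9]+)?)\s*</td>')


def extract_closes(html, expected_min=1):
    """Returns the closing prices found in html, or raises if too few are found."""
    values = [float(match) for match in PRICE_PATTERN.findall(html)]
    if len(values) < expected_min:
        raise LayoutChangedError(
            f"expected at least {expected_min} price cells, found {len(values)}; "
            "the page layout may have changed"
        )
    return values


old_page = '<tr><td class="close">10.50</td></tr>'
new_page = '<tr><td data-field="close">10.50</td></tr>'  # the site renamed the attribute

print(extract_closes(old_page))  # [10.5]
try:
    extract_closes(new_page)
except LayoutChangedError as err:
    print("alert:", err)
```

In a real pipeline, the `except` branch is where you would page someone; the check only tells you the scraper broke, not how to fix it.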

What are the alternatives to scraping financial data? 

If you’re wary of the legal and ethical consequences of scraping data (and you most definitely should be, depending on your intended use), there are alternatives. Were you looking into scraping because financial data is too expensive? We get it. That’s why we offer market data and fundamental data in flexible formats at more competitive prices than traditional data vendors.

Plus, we offer first-class support for our products and access methods (including an API), so you don’t have to spend hours on maintenance. Many of our feeds offer redistribution rights, allowing you to use the data on your own website or in your business operations without the threat of legal repercussions.

Access quality market data today

Ready to find your perfect alternative to scraped data? Visit intrinio.com to explore our data packages.
