Are Louisiana Tech Startups Vulnerable to Scraping?

Image by flickr user Noah Sussman.

New tech startups in Louisiana often overlook one of the most basic steps in keeping a website safe from content scrapers, and many end up losing their hard-earned industry databases. Content duplication has become a big business, estimated by some to be worth around $1 billion. If proper steps are not taken to safeguard information and databases from scrapers, that kind of complacency could hamper the growth of tech startups in Louisiana.

The World of Scraping

Gathering a large sample of information from the internet can be a long process. Web scraping software, however, dramatically reduces the time it takes to collect data. It has been said that ‘information is money,’ and that saying still holds: websites and internet companies face an ongoing battle to protect their online information. Even sites guarded by firewalls and other protective measures are sometimes still susceptible to having their data scraped.

Web Scraping Defined

Over the past decade, numerous new programs have arrived on the market that aid in collecting data from websites. This used to be the job of a team of individuals who spent day after day manually extracting information from websites; now, software can gather the same information in a matter of hours or even minutes. The process is called web scraping, and it is a hotly debated topic in today’s cyber community.

Consider, for example, a company whose online shopping site becomes the target of a web scraping program. The software can crawl the shopping site and copy all of the item names, descriptions, prices, and shipping details. A competitor can then use this information to build an online catalogue offering identical products at slightly reduced prices. That puts the first company at a distinct disadvantage: it now has to review all of its web content to adjust for the new competition. In essence, it needs to scrape the new competitor’s site to determine how much of a price reduction is required to stay ahead.
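To make the mechanics concrete, here is a minimal Python sketch of the kind of catalogue scraper described above. The shop URL and the CSS selectors are hypothetical placeholders, not taken from any real site; a working scraper would be tuned to the target site’s actual markup.

```python
# A minimal sketch of a catalogue scraper. The URL and CSS classes below
# are illustrative assumptions, not real selectors from any actual shop.
import requests
from bs4 import BeautifulSoup

CATALOGUE_URL = "https://example-shop.com/products"  # hypothetical target page

def scrape_catalogue(url):
    """Download a product listing page and pull out name/price pairs."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    products = []
    # Assumes each product sits in a <div class="product"> with child
    # elements carrying the name and price; adjust selectors per site.
    for card in soup.select("div.product"):
        name = card.select_one(".product-name")
        price = card.select_one(".product-price")
        if name and price:
            products.append({"name": name.get_text(strip=True),
                             "price": price.get_text(strip=True)})
    return products

if __name__ == "__main__":
    for item in scrape_catalogue(CATALOGUE_URL):
        print(item)
```

Run against a page with matching markup, this prints one name/price pair per product card, which is essentially the raw material a competitor would feed into a rival catalogue.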

Legal or Illegal?

At the heart of the controversy surrounding information retrieval through web scraping is whether or not the activity is legal. The courts have gone back and forth on the issue. Some argue that the information is the private property of the company that runs the website, while others say that if it is available on the net, it is free for the taking. Still others say that taking the information is fine, and that it is what you do with it afterward that makes it legal or illegal. At some point, the courts will need to come up with a definitive answer on web scraping. Until then, many companies feel a need to check for web scraping activity on their sites so they can determine whether their information has been compromised by a web scraping program.

Web Site Protection

Companies that want to protect their data from web collection ‘bots’ do have some options. Web scrapers may be sophisticated, but their ability to solve certain problems usually lies outside their programming, and web designers can add code to their pages that stops or confounds scraping programs. The most reliable route is to use an anti-scraping service such as ScrapeSentry; beyond that, here are some possible ways to stop web scraping software.

  • User registration. By requiring all users of your site to become registered members, you limit how easily bots can reach your information. A scraper’s operator can still register an account before launching the software, but registration at least slows the process.
  • Captcha codes. If the information on your website sits behind a captcha challenge, bots and most web scraping software cannot get past it. Captcha codes can be placed at several points throughout the website, stalling automated efforts to collect data.
  • JavaScript. Including JavaScript in your pages can effectively block most web scrapers. Writing scraper software that can execute JavaScript across an entire site is very difficult, so even a few JavaScript-dependent elements at the start of your site can cripple basic scraping tools.
  • Monitor download rates. When individuals visit websites, they request only a modest amount of content at a time. Web scrapers, by contrast, pull large amounts of data very quickly. By monitoring download rates, you can identify visitors running web scrapers and block them from your site; a minimal sketch of this kind of rate check follows this list.
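To illustrate that last point, here is a minimal Python sketch of a per-IP download-rate check written as a Flask before-request hook. The Flask framework, the 60-second window, and the 100-request limit are all assumptions made for the example; real deployments more often push this job to a reverse proxy, CDN, or dedicated anti-scraping service.

```python
# A minimal sketch of per-IP rate monitoring, assuming a Flask application.
# The 60-second window and 100-request limit are illustrative values only.
import time
from collections import defaultdict, deque

from flask import Flask, abort, request

app = Flask(__name__)

WINDOW_SECONDS = 60          # length of the sliding window
MAX_REQUESTS = 100           # requests allowed per IP within the window
_hits = defaultdict(deque)   # ip -> timestamps of recent requests

@app.before_request
def throttle_fast_downloaders():
    """Reject clients that request pages far faster than a human visitor."""
    ip = request.remote_addr
    now = time.time()
    window = _hits[ip]

    # Drop timestamps that have fallen out of the sliding window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()

    window.append(now)
    if len(window) > MAX_REQUESTS:
        abort(429)  # Too Many Requests: likely a scraper, not a browser

@app.route("/products")
def products():
    return "catalogue page"

if __name__ == "__main__":
    app.run()
```

In practice the counters would live in a shared store such as Redis so they survive restarts and work across multiple servers, but the idea is the same: measure how fast each visitor pulls pages and cut off anything moving at machine speed.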

Web scraping is one of those activities that will probably face tighter legal restrictions, if not an outright ban, in the near future. That won’t stop some people from creating and running web scraping programs. Organizations in off-shore locations operate outside the law and can be hired to run web scraping services anywhere in the world. Website owners will still need to install blocking software and monitor visitors, and as new web scraping software is developed, new anti-scraping software will need to follow right behind. It is important to stay abreast of these issues to protect your business and the information on your website.