Perplexity, a company that uses Amazon Web Services (AWS) to train its AI models, is under investigation for potential violations of the Robots Exclusion Protocol. Under this protocol, a site owner places a robots.txt file on a domain to specify which pages robots and crawlers should not access.
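To make the protocol concrete, here is a minimal sketch of how a well-behaved crawler might check a robots.txt file before requesting a page. It uses Python's standard `urllib.robotparser` module. The robots.txt content, the bot names (`ExampleBot`, `OtherBot`) and the example.com URLs are made up for illustration; they are not taken from any site or crawler named in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# ExampleBot is barred from the whole site.
print(parser.can_fetch("ExampleBot", "https://example.com/articles/1"))  # False
# Other bots may read articles, but not anything under /private/.
print(parser.can_fetch("OtherBot", "https://example.com/articles/1"))    # True
print(parser.can_fetch("OtherBot", "https://example.com/private/data"))  # False
```

Note that nothing in this check is enforced by the server. Compliance is voluntary, and a crawler that skips the `can_fetch` step can still request the page, which is why the protocol depends on crawlers choosing to honor it.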
Developer Robb Knight and Wired found that Perplexity was scraping content from web pages in violation of the protocol. Web scraping, or data scraping, uses software that retrieves a page’s HTML code, then filters and stores the information automatically.
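The extract-and-filter step of scraping can be sketched roughly as follows, using Python's standard `html.parser` module. The sample HTML and the names `ParagraphExtractor` and `extract_paragraphs` are hypothetical, and this is only a generic illustration of the technique, not a description of how Perplexity's crawler works.

```python
from html.parser import HTMLParser

# Hypothetical page content, for illustration only.
SAMPLE_HTML = (
    "<html><body><h1>Title</h1>"
    "<p>First <b>paragraph</b>.</p>"
    "<script>var x = 1;</script>"
    "<p>Second paragraph.</p>"
    "</body></html>"
)


class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> elements and ignore everything else."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        # Keep only text that appears inside a paragraph.
        if self._in_paragraph:
            self.paragraphs[-1] += data


def extract_paragraphs(html: str) -> list:
    """Return the non-empty paragraph texts found in an HTML string."""
    extractor = ParagraphExtractor()
    extractor.feed(html)
    extractor.close()
    return [p.strip() for p in extractor.paragraphs if p.strip()]


print(extract_paragraphs(SAMPLE_HTML))
# ['First paragraph.', 'Second paragraph.']
```

A real scraper would first download the page over HTTP and then save the extracted text. Both steps are left out here so the sketch runs without network access.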
AWS has strict terms that prohibit customers from engaging in illegal activities, and customers are responsible for complying with all applicable laws. Perplexity says that it adheres to robots.txt and that its services do not violate the AWS terms of service, although it acknowledges that in rare cases its bot ignores robots.txt to retrieve specific URLs.
However, investigations by Wired suggest that the company’s chatbot may ignore robots.txt in some cases and collect content without authorization, raising concerns about potential violations of the AWS terms of service and about the legality of Perplexity’s data collection methods.