Amazon Web Services is looking into whether Perplexity employs ‘web scraping’ to train its AI

Perplexity, a company that uses Amazon Web Services (AWS) to train its AI models, is under investigation for potential violations of the Robots Exclusion Protocol. The protocol lets site owners place a robots.txt file at the root of a domain to specify which pages robots and crawlers should not access.
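To illustrate how the protocol works, the sketch below uses Python's standard-library robots.txt parser against a hypothetical robots.txt file (the bot name, domain, and rules are invented for this example and are not Perplexity's or any real site's):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; real files live at https://<domain>/robots.txt
robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler checks each URL before fetching it
allowed_home = parser.can_fetch("ExampleBot", "https://example.com/")
allowed_private = parser.can_fetch("ExampleBot", "https://example.com/private/data")
```

Here `allowed_home` is `True` and `allowed_private` is `False`; the protocol is purely advisory, which is why compliance depends on the crawler choosing to honor it.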

Developer Robb Knight and Wired discovered that Perplexity was using web scraping techniques to collect content from web pages in violation of the protocol. Web scraping, or data scraping, involves collecting content from web pages using software that extracts HTML code to filter and store information automatically.
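As a minimal sketch of the extraction step, the following uses Python's standard-library HTML parser to pull link targets out of a page; the page content here is hardcoded for illustration rather than fetched, and this is not a reconstruction of Perplexity's actual scraper:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href attributes from anchor tags in an HTML document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record the href of every <a> tag encountered
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# Stand-in for the HTML body a scraper would download over HTTP
html_doc = '<html><body><a href="/about">About</a> <a href="/news">News</a></body></html>'
extractor = LinkExtractor()
extractor.feed(html_doc)
# extractor.links now holds ["/about", "/news"]
```

A real scraper would combine a fetch step with a parser like this, then filter and store whatever content it targets; the protocol dispute is about whether the fetch step respects robots.txt at all.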

AWS's terms of service prohibit customers from engaging in illegal activities and make them responsible for complying with all applicable laws. Perplexity claims to adhere to robots.txt and states that its services do not violate AWS terms of service, except in rare cases where its bot ignores robots.txt to retrieve a specific URL a user has requested.

However, Wired's investigation suggests that the company's chatbot may ignore robots.txt in some cases to collect content it is not authorized to access, raising concerns about potential violations of AWS's terms of service and the legality of Perplexity's data collection methods.

By Samantha Johnson

