Reddit Toughens Rules to Block AI Data Scraping

On Tuesday, social media platform Reddit said it would update the web standard it uses to block automated data scraping of its site. The decision follows reports that artificial intelligence startups have been bypassing these restrictions to harvest content.

The move comes as AI companies face accusations of using publishers' content without permission to create AI-generated summaries that do not credit the original sources.

Reddit said it would update its robots.txt file, which implements the Robots Exclusion Protocol, a widely used standard that tells crawlers which sections of a website they may crawl.
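As a rough illustration of how a well-behaved crawler consults robots.txt, the sketch below uses Python's standard urllib.robotparser module. The file contents, the bot names (ExampleArchiveBot, UnknownScraper) and the example.com URL are hypothetical and do not reflect Reddit's actual rules. Compliance with the protocol is voluntary: nothing in the file technically stops a crawler that ignores it, which is the gap the reports above describe.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for illustration only (not Reddit's file).
# One named crawler is allowed everywhere; every other crawler is disallowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/r/some_community/"
    for agent in ("ExampleArchiveBot", "UnknownScraper"):
        print(agent, is_allowed(SAMPLE_ROBOTS_TXT, agent, page))
```

A crawler that respects the standard would run a check like this before each request and skip any URL for which it returns False.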

Alongside this update, the company will enforce rate limiting, a technique that caps the number of requests a single entity can make, and will block unknown bots and crawlers from scraping data on its site.
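Reddit has not described how its rate limiting is implemented. As a minimal sketch of the general idea, the example below uses a token bucket, one common approach chosen here purely for illustration. The class name TokenBucketLimiter, its parameters and the client identifiers are all assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class TokenBucketLimiter:
    """Per-client token bucket (illustrative; not Reddit's implementation)."""

    capacity: float            # maximum burst of requests per client
    refill_per_second: float   # sustained request rate per client
    clock: Callable[[], float] = time.monotonic
    # Maps client id -> (tokens remaining, time of last update).
    _buckets: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def allow(self, client_id: str) -> bool:
        """Return True and spend one token if the client is under its limit."""
        now = self.clock()
        tokens, last = self._buckets.get(client_id, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_second)
        if tokens >= 1.0:
            self._buckets[client_id] = (tokens - 1.0, now)
            return True
        self._buckets[client_id] = (tokens, now)
        return False


if __name__ == "__main__":
    limiter = TokenBucketLimiter(capacity=2, refill_per_second=1.0)
    for i in range(4):
        print("request", i, "allowed:", limiter.allow("client-a"))
```

Each client gets its own bucket, so one entity exhausting its allowance does not affect others. A server would typically key the bucket on an API token, IP address or user agent and reject requests once allow returns False.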

In recent months, the robots.txt file has become a key tool for publishers trying to stop tech firms from using their content free of charge to train AI models and to generate summaries in response to search queries.

A report from last week highlighted findings from content licensing startup TollBit, which indicated that several AI companies were bypassing the web standard to scrape content from publishing sites.

This follows a Wired investigation that found AI search startup Perplexity had likely circumvented efforts to block its web crawler via robots.txt.

Earlier in June, Forbes, a business media publisher, accused Perplexity of using its investigative work in generative AI systems without proper attribution.

Despite these measures, Reddit said that researchers and organizations such as the Internet Archive would continue to have access to its content for non-commercial use.

© Thomson Reuters 2024

