LEAKED: A New List Reveals Top Websites Meta Is Scraping of Copyrighted Content to Train Its AI (Including Many Fediverse Instances!!!)

fedipact@cyberpunk.lol

LEAKED: A New List Reveals Top Websites Meta Is Scraping of Copyrighted Content to Train Its AI (Including Many Fediverse Instances!!!)

"The tech giant is sidestepping guardrails that websites use to prevent being scraped, data show, in a move whistleblowers say is unethical and potentially illegal."

ARTICLE: https://www.dropsitenews.com/p/meta-facebook-tech-copyright-privacy-whistleblower

FULL PDF: https://www.dropsitenews.com/api/v1/file/b3555944-e204-4f5e-9a64-e44281b19a82.pdf

#FediPact #meta #threads #AI

spla@mastodont.cat

@FediPact I applied this nginx config to fight it and many other AI bots and scrapers:

https://github.com/kurren/ai-bots-crawlers

Returning 444 to them (nginx's non-standard code that closes the connection without sending a response) seems like a good way to confuse them and reduce server load.
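For anyone unfamiliar with the approach, a minimal sketch of user-agent blocking in nginx might look like the following. This is not the contents of the linked repository: the short user-agent list, the variable name `$block_ai_bot`, and the `example.org` server name are assumptions for illustration only.

```nginx
# Goes in the http { } context: flag requests whose User-Agent matches a known AI crawler.
# The list below is a small illustrative sample, not a complete or authoritative one.
map $http_user_agent $block_ai_bot {
    default 0;
    # ~* means case-insensitive regular expression match
    "~*(meta-externalagent|GPTBot|CCBot|ClaudeBot)" 1;
}

server {
    listen 80;
    server_name example.org;  # hypothetical host name

    # 444 is nginx-specific: close the connection without sending any response.
    if ($block_ai_bot) {
        return 444;
    }

    location / {
        root /var/www/html;  # hypothetical document root
    }
}
```

A rule like this only catches crawlers that identify themselves honestly in the User-Agent header. If a scraper sidesteps such guardrails, as the quoted article alleges, it would need to be blocked some other way, for example by IP range.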

FòrumCAT
