LEAKED: A New List Reveals Top Websites Meta Is Scraping of Copyrighted Content to Train Its AI (Including Many Fediverse Instances!!!)
"The tech giant is sidestepping guardrails that websites use to prevent being scraped, data show, in a move whistleblowers say is unethical and potentially illegal."
ARTICLE: https://www.dropsitenews.com/p/meta-facebook-tech-copyright-privacy-whistleblower
FULL PDF: https://www.dropsitenews.com/api/v1/file/b3555944-e204-4f5e-9a64-e44281b19a82.pdf
-
@FediPact I applied this nginx config to block it and many other AI bots and scrapers:
https://github.com/kurren/ai-bots-crawlers
Returning 444 to them (nginx's non-standard code that closes the connection without sending any response) seems like a good way to confuse them and reduce server load.
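For anyone who wants to see the general idea, here is a minimal sketch of that approach. It is not copied from the linked repository: the user-agent patterns are only a short illustrative sample, and the server name and document root are placeholders you would swap for your own.

```nginx
# Illustrative sketch only; the pattern list is an assumption, not the repo's list.
# Place the map block inside the http {} context.
map $http_user_agent $block_ai_bot {
    default                 0;
    ~*meta-externalagent    1;  # Meta's crawler user agent
    ~*GPTBot                1;
    ~*ClaudeBot             1;
    ~*CCBot                 1;
}

server {
    listen 80;
    server_name example.org;    # placeholder

    # "0" and the empty string evaluate as false in an nginx if condition.
    if ($block_ai_bot) {
        return 444;             # close the connection without sending a response
    }

    location / {
        root /var/www/html;     # placeholder
    }
}
```

This only matches on the User-Agent header, so it will not catch a crawler that hides or spoofs its user agent. You can check it with something like `curl -A "meta-externalagent/1.1" http://example.org/`, which should fail with an empty reply instead of returning a page.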