Tumblr and WordPress plan to sell user data to OpenAI and Midjourney to train AI models: report


Tumblr and WordPress users may soon find their data being used to train artificial intelligence (AI) models, according to a report. The blogging sites’ parent company, Automattic, has reportedly entered into agreements with OpenAI and Midjourney to sell user-generated content that would be used to train AI models. While the details of the agreements and data-sharing practices remain unclear, the report raises questions about data privacy and the ethics of companies sharing their users’ data with third parties.

Internal communications from Automattic employees, seen by 404 Media, both confirmed the deal with the AI companies and revealed details of these practices. In its report, the publication said that Automattic’s deal with OpenAI and Midjourney could be announced soon. It also appears that the compilation of data for the AI companies has already begun. An internal message written by a product manager, Cyle Gage, suggested that all content from Tumblr’s public posts between 2014 and 2023 had been compiled.

The report also highlights a specific message suggesting that content from private and deleted users was automatically compiled alongside public data. It is unclear whether this dataset has already been shared with the AI companies. Because such a mistake puts the private information of the company’s entire user base at risk, it also raises questions about the company’s ethics policy and data security infrastructure.

Automattic published a statement saying: “AI is rapidly transforming almost every aspect of our world, including how we create and consume content. At Automattic, we have always believed in a free and open web and individual choice. Like other technology companies, we are closely monitoring these advancements, including how to work with AI companies in a way that respects our users’ preferences.”

The post details several things the company does for its users, including blocking AI platform crawlers, providing a setting that discourages search engines from indexing sites on WordPress and Tumblr, and offering an opt-out setting for users who do not wish to share data with third parties. “Currently, there is no law requiring web crawlers to follow these preferences,” the post states.

The mechanism for opting out of data sharing is also somewhat unclear. While the company said in its post that AI companies will honor opt-out settings and even remove past content from users who recently opted out, the report claims the reality is more complicated.

The report found an internal document from February 23 in which an employee asked whether the company had assurance that the data partners would respect users’ opt-out decisions. Andrew Spittle, Automattic’s head of AI, reportedly responded: “We will request that the content be removed and removed from all future training sessions. I believe partners will honor this based on our conversations with them so far. I don’t think they gain much overall from keeping it.”

According to the report, the response was vague and did not confirm whether Automattic had an agreement on the matter. Furthermore, the reasoning appears to rest entirely on the assumption that AI companies won’t gain much from retaining user data. The practice of sharing data with third parties is not new, and most social media platforms own the rights to public user-generated content on their platforms. However, entering into such agreements without disclosing them to users could expose private information to companies that use the same data to train AI systems.
