OpenAI is looking to improve the real-world usefulness of its next-generation AI models by training them on data drawn from everyday tasks. The ChatGPT maker has partnered with training data company Handshake AI to collect data from third-party contractors based on the real work they did in their previous and current job roles, according to a report by Wired.

The data collection is part of OpenAI’s efforts to compare the performance of its AI models against an established human baseline for various tasks. It comes at a time when several AI companies, including Anthropic and Google, are enlisting large teams of contractors to generate high-quality training data that can be used to develop AI models and AI agents capable of automating enterprise work. Several tech industry leaders have warned of a white-collar ‘bloodbath’ due to the impact of AI on low-level tasks and entry-level roles, even as tech companies such as OpenAI continue to pursue artificial general intelligence (AGI), a hypothetical AI system that outperforms humans at most economically valuable tasks.

What are OpenAI’s contractors tasked with?

OpenAI has directed contractors to upload data on real-world tasks with two components: the request from a person’s manager or colleague asking them to do a task (task request) and the work produced in response to that request (task deliverable).

In an internal presentation, OpenAI reportedly asked contractors to upload examples of real, on-the-job work that they have completed in the past or present, such as “a concrete output (not a summary of the file, but the actual file), e.g., Word doc, PDF, Powerpoint, Excel, image, repo.” The Microsoft-backed AI startup has also instructed contractors to use a specialised ‘ChatGPT Superstar Scrubbing’ tool to delete proprietary and personally identifiable information before uploading the training data.

“We’ve hired folks across occupations to help collect real-world tasks modeled off those you’ve done in your full-time jobs, so we can measure how well AI models perform on those tasks. Take existing pieces of long-term or complex work (hours or days+) that you’ve done in your occupation and turn each into a task,” OpenAI was quoted as saying in an internal document seen by Wired. “Remove or anonymise any: personal information, proprietary or confidential data, material nonpublic information (e.g., internal strategy, unreleased product details),” it added.

The generative AI boom has created a lucrative sub-industry comprising third-party contracting firms such as Handshake AI, Surge, Mercor, and Scale AI that hire and manage networks of data contractors to generate higher-quality training data in order to improve AI models.