Research shows AI agents fail most remote tasks, with the top performer automating just 2.5% of freelance work.
The study, called the Remote Labor Index (RLI), represents one of the most detailed attempts so far to measure AI’s performance on practical digital work.
It focuses on tasks that mirror real online freelancing jobs rather than theoretical tests or benchmark problems.
Six advanced AI agents were tested on the same projects: Manus, Grok 4, Sonnet 4.5, GPT-5, ChatGPT agent, and Gemini 2.5 Pro.
Author summary: AI fails most remote tasks, with the top performer automating just 2.5%.