Why AI sucks at freelance work and real-life tasks: AI Eye

November 18, 2025


Mass unemployment from AI temporarily suspended

AI agents can’t complete 97% of tasks on Upwork to even a basic standard.

Researchers at Scale AI and the Center for AI Safety got six different AI models to attempt 240 Upwork projects across categories including writing, design and data analysis, and then compared the results with the work of the real freelancers.

The overwhelming majority of the time, the AI models were unable to complete the tasks successfully, with the best AI model, Manus, completing just 2.5% of tasks and earning $1,810 out of $143,991 on offer. Claude Sonnet and Grok 4 managed to finish 2.1% of the tasks.

While AI agents are good at simple and defined tasks like “generate a logo,” the research found they are bad at multi-step workflows, taking any initiative or using judgment.

So they won’t be causing mass unemployment for a while yet.

This backs up research from August at MIT, which found that 95% of organizations had zero return on the collective $30 billion they’d invested in AI.

*Ironically, the model in this photo portraying a freelancer will probably be replaced by an AI-generated one in future (Upwork)*

Why humans still have the edge over AI

AIs are good at pattern matching and predicting words. But they’re currently pretty bad at building internal models of the world, according to WorldTest from MIT and Basis Research.

For example, humans have an internal model of their own kitchen in their minds, which allows them to determine where the knives are, how long it will take for the pot to boil, and to plan a sequence of actions resulting in a meal. But the testing showed that three frontier reasoning AI models suck at it.

*Man vs machine (Benchmarking World Model Learning)*

The researchers created 129 tasks across 43 interactive worlds (spot the difference, physics puzzles, etc). The tasks required the AIs to predict hidden aspects of the world, plan sequences of actions to achieve a goal, and determine when the rules of the environment changed. Then they tested 517 humans on the same problems.

The researchers concluded:

“Our analysis reveals that humans achieve near-optimal scores while existing models frequently fail.”

Humans perform better on these sorts of tasks because we intuitively understand environments, revise beliefs in the face of new evidence, run experiments, start from scratch and explore strategically.

And adding more compute doesn’t always work, helping in only 25 out of 43 environments.

AI gets the news wrong 45% of the time

Research from the BBC and European Broadcasting Union found that ChatGPT, Copilot, Gemini and Perplexity also suck at reporting the news, failing against key criteria, including accuracy, sourcing, distinguishing opinion from fact, and providing context.

— 45% of AI answers had at least one significant issue

— 31% were sourced incorrectly

— 20% were just wrong with hallucinated details and outdated info

— Gemini was by far the worst, with significant issues in 76% of its responses.

AI cover letters get the wrong people hired

The primary purpose of a cover letter is to distinguish serious applications from low-effort ones. Anyone who spends a day writing a good cover letter that shows knowledge of the company is likely to be diligent and motivated.

Unfortunately, new research on Freelancer.com suggests that AI-generated cover letters have completely compromised this signal, resulting in employers hiring fewer people, and often the wrong ones.

Compared to the days before AI, skilled workers in the top quintile for abilities are being hired 19% less often, and dumb bums in the lowest quintile are being hired 14% more often.

*AI cover letters are bad (Paul Novosad)*

New robot looks like a lady

Chinese EV manufacturer XPeng has unveiled the XPeng Iron female robot, which bears a striking resemblance to a human. It has a similar spine movement to humans, and skin stretched over soft 3D lattice structures mimics the human body.

It’s due to go into production early next year, but the company says it requires too much compute for use in the home, so it’ll likely be used for commercial applications first, like introducing cars to customers at XPeng stores.

The first catwalk from @XPengMotors XPENG IRON female robot appears to cross the uncanny valley in its walking!
Mass production is slated to begin end of 2026, with initial deployments in commercial locations. China seems to be in the lead in robotics advances & manufacturing. pic.twitter.com/CiSsFMkW7Y

— Derya Unutmaz, MD (@DeryaTR_) November 5, 2025

80% of ransomware attacks are made up

A new paper from MIT Sloan researchers and Safe Security makes the terrifying claim that 80% of ransomware attacks are AI-driven.

Rethinking the Cybersecurity Arms Race examined 2,800 ransomware attacks and concluded that “adversarial AI is now automating entire attack sequences,”…

cointelegraph.com
