Opinion by: Rowan Stone, CEO at Sapien
AI is a paper tiger without human expertise in data management and training practices. Despite massive growth projections, AI innovations won’t be relevant if they continue training models based on poor-quality data.
Besides improving data standards, AI models need human intervention for contextual understanding and critical thinking to ensure ethical AI development and correct output generation.
AI has a “bad data” problem
Humans have nuanced awareness. They draw on their experiences to make inferences and logical decisions. AI models are, however, only as good as their training data.
An AI model’s accuracy doesn’t entirely depend on the underlying algorithms’ technical sophistication or the amount of data processed. Instead, accurate AI performance depends on trustworthy, high-quality data during training and analytical performance tests.
Bad data has manifold ramifications for training AI models: It generates biased output and hallucinations rooted in faulty logic, forcing teams to lose time retraining models to unlearn bad habits and driving up company costs.
Biased and statistically underrepresented data disproportionately amplifies flaws and skewed outcomes in AI systems, especially in healthcare and security surveillance.
For example, an Innocence Project report lists multiple cases of misidentification, with a former Detroit police chief admitting that relying solely on AI-based facial recognition would produce misidentifications about 96% of the time. Moreover, according to a Harvard Medical School report, an AI model used across US health systems prioritized healthier white patients over sicker Black patients.
AI models follow the “Garbage In, Garbage Out” (GIGO) concept, as flawed and biased data inputs, or “garbage,” generate poor-quality outputs. Bad input data creates operational inefficiencies as project teams face delays and higher costs in cleaning data sets before resuming model training.
Beyond these operational effects, AI models trained on low-quality data erode companies' trust and confidence in deploying them, causing lasting reputational damage. One research paper put GPT-3.5's hallucination rate at 39.6%, underscoring the need for additional validation by researchers.
Such reputational damage has far-reaching consequences: it becomes harder to attract investment, and the model's market positioning suffers. At a CIO Network Summit, 21% of America's top IT leaders cited a lack of reliability as their most pressing reason for not using AI.
Poor data for training AI models devalues projects and causes enormous economic losses to companies. On average, incomplete and low-quality AI training data results in misinformed decision-making that costs companies 6% of their annual revenue.
Because poor-quality training data undermines AI innovation and model training alike, the search for alternative solutions is essential.
The bad data problem has forced AI companies to redirect scientists toward preparing data: data scientists now spend almost 67% of their time curating correct data sets to keep AI models from delivering misinformation.
AI/ML models may struggle to keep up with relevant output unless specialists — real humans with proper credentials — work to refine them. This demonstrates the need for human experts to guide AI’s development by ensuring high-quality curated data for training AI models.
Human frontier data is key
Elon Musk recently said, “The cumulative sum of human knowledge has been exhausted in AI training.” Nothing could be farther from the truth since human frontier data is the key to driving stronger, more reliable and unbiased AI models.
Musk's dismissal of human knowledge amounts to a call to fine-tune AI models on artificially produced synthetic data. Unlike human-generated data, however, synthetic data lacks grounding in real-world experience and has historically failed to support sound ethical judgments.
Human expertise ensures meticulous data review and validation to maintain an AI model’s consistency, accuracy and reliability. Humans evaluate, assess and interpret a model’s output to identify biases or mistakes and ensure they align with societal values and ethical standards.
Moreover, human intelligence offers unique perspectives during data preparation by bringing contextual reference, common sense and logical reasoning to data interpretation. This helps to resolve ambiguous results, understand nuances, and solve problems for high-complexity AI model training.
The symbiotic relationship between artificial and human intelligence is crucial to harnessing AI’s potential as a transformative technology without causing societal harm. A collaborative approach between man and machine helps unlock human intuition and creativity to build new AI algorithms and architectures for the public good.