The training of AI models relies on real-world data, but this may no longer be possible, according to xAI owner Elon Musk.

During a livestreamed conversation on X with Stagwell Chairman Mark Penn, the Tesla CEO explained that “we’ve now exhausted basically the cumulative sum of human knowledge”.

“You take the entire internet, all books ever written and all the interesting videos and you distill that down into essentially bits of information and we’ve now exhausted all of [it] in AI training,” Musk said.

As a result, Musk suggested that synthetic data – which is itself generated by AI models – will now need to be used for AI training.

“The new sort of thing is synthetic data,” he said. “The only supplement [real-life data] is with synthetic data where the AI writes an essay or comes up with a thesis and then it will grade itself and go through a process of self-learning.”

Other companies, such as Microsoft, Meta and OpenAI, are already using synthetic data to train their flagship models. Microsoft’s Phi-4 and Google’s Gemma models were both trained on synthetic data alongside real-world data.

A recent report suggests that demand for synthetic data will continue to grow, predicting that synthetic datasets will register the fastest market growth rate over the next five years.

The report also expects the overall value of the market for AI training datasets to grow from $2.82bn in 2024 to $9.58bn by 2029.

Although synthetic data offers advantages such as cost savings, it also has drawbacks.

Research has suggested that synthetic data can lead to model collapse, in which a model becomes “less creative” and more biased in its outputs, eventually compromising its functionality.

Musk shared this concern, admitting that it is challenging to tell whether AI has “hallucinated an answer or if it’s a real answer”.

Elon Musk: AI has ‘exhausted the cumulative sum of human knowledge’