Topic: Synthetic Data

2 chapters across the catalog

Death Buses
Episode 1797 1:26:13 - 1:29:18

1797: Death Buses

AI Data Scarcity, Synthetic Data and Snap AI

Major AI companies such as OpenAI and Anthropic are reportedly struggling to find new high-quality training data, pushing them toward "synthetic data" and partnerships for non-public datasets. Meanwhile, the FTC is investigating Snap's "My AI" chatbot for potential risks to young users, highlighting the growing legal and regulatory pressure on the sector.

Seismic Sundae
Episode 1680 1:55:43 - 1:59:39

1680: Seismic Sundae

AI Model Collapse and Synthetic Data

A study published in the journal Nature finds that AI models trained on AI-generated data quickly "collapse" into nonsense. The researchers found that by the ninth generation of training on synthetic text, the output had become incoherent. This "model collapse" finding suggests that, without access to original human-generated content, models will cost more to train even as their quality declines.