Researchers at the University of Edinburgh found that prominent multimodal large language models struggle to read the time from images of analog clocks. The models also failed to accurately interpret calendar images or to identify the time on clocks with Roman numerals. The failure has been attributed to the models being trained on digital-first data sources such as Reddit.


