Google has expanded its WAXAL speech dataset to include Luo, Kikuyu and Luganda, a move aimed at improving artificial intelligence systems’ ability to understand African languages.
The addition is expected to accelerate the development of voice-enabled AI tools for millions of speakers who have historically been excluded from speech-based technologies due to limited language support.
According to Google, WAXAL is designed to help developers and researchers build AI systems that can accurately recognise and process African languages, addressing a major digital inclusion gap across the continent.
“With these additions, we want to improve access to AI-powered tools in East Africa, including voice assistants, speech-to-text services, educational platforms and digital public services,” Google said.
The company revealed that the dataset contains 1,250 hours of transcribed natural speech and more than 20 hours of high-quality studio recordings, collected over a three-year period. This data is expected to provide a strong foundation for building reliable and scalable language technologies.
Aisha Walcott-Bryant, Head of Google Research Africa, said the initiative is about empowering African communities through technology.
“The ultimate impact of WAXAL is the empowerment of people in Africa,” she said.
“This dataset provides the critical foundation for students, researchers and entrepreneurs to build technology on their own terms, in their own languages, finally reaching over 100 million people.”
Google noted that in communities with limited English proficiency, access to technology in local languages can significantly improve outcomes in education, agriculture and healthcare by making information more accessible and relevant.
The WAXAL dataset already includes Swahili, which is widely spoken in Kenya and across the rest of East Africa, and the addition of Luo, Kikuyu and Luganda further strengthens the region’s representation in global AI development.
