Howard University and Google Launch Dataset for Inclusive Speech Tech

In a pioneering endeavor to enhance speech technology, Howard University has collaborated with Google Research to create a dataset that aims to improve automatic speech recognition (ASR) for Black users. This initiative, known as Project Elevate Black Voices, involves capturing the diverse speech patterns and dialects prevalent in Black communities, which are often overlooked by existing AI systems.

This project puts a spotlight on African American English (AAE), a linguistically rich form of communication. Due to biases in AI tool development, Black users are frequently misinterpreted by voice technologies, leading many to modify their speech to be understood, a practice known as code-switching.

Researchers are committed to addressing this gap. “African American English has been at the forefront of United States culture since almost the beginning of the country,” stated Gloria Washington, Ph.D., a Howard University researcher and co-principal investigator of Project Elevate Black Voices. “Voice assistant technology should understand different dialects of all African American English to truly serve not just African Americans, but other persons who speak these unique dialects.”

The dataset was built by collecting 600 hours of speech from speakers of various AAE dialects across 32 states. The effort seeks to dismantle the barriers that limit ASR effectiveness for Black users. The research found that AAE is underrepresented in current datasets, not because the dialect is absent but because users often alter their speech to be understood by technology.

Despite hurdles, such as privacy policies that restrict data collection, progress is underway. Dialect classifiers are being used to identify AAE within broader datasets, a critical step toward more inclusive technology. Howard University retains the ownership and licensing rights to this dataset to ensure its ethical use, while Google aims to leverage it to enhance its ASR products, promoting equity across languages and accents.
