This German nonprofit is developing an open voice assistant that anyone can use


There have been many attempts at open source AI-powered voice assistants (see Rhasspy, Mycroft, and Jasper, to name a few), all established with the goal of creating privacy-preserving, offline experiences that don’t compromise on functionality. But development has proven extraordinarily slow. That’s because, in addition to all the usual challenges associated with open source projects, programming an assistant is hard. Technologies like Google Assistant, Siri and Alexa have years, even decades, of R&D behind them, and a huge infrastructure to boot.

But that doesn’t deter the folks at the Large-scale Artificial Intelligence Open Network (LAION), the German nonprofit responsible for maintaining some of the world’s most popular AI training datasets. This month, LAION announced a new initiative, BUD-E, which aims to create a “fully open” voice assistant capable of running on consumer hardware.

Why launch a brand new voice assistant project when there are countless others in various states of abandonment? Wieland Brendel, a researcher at the ELLIS Institute and a contributor to BUD-E, believes that no open assistant has an architecture extensible enough to take full advantage of emerging GenAI technologies, particularly large language models (LLMs) along the lines of OpenAI’s ChatGPT.

“Most interactions with [assistants] rely on chat interfaces that are rather cumbersome to use, [and] dialogues with these systems feel stilted and unnatural,” Brendel told TechCrunch in an email interview. “These systems let you issue commands to control your music or turn on the lights, but they don’t provide a basis for long, engaging conversations. The goal of BUD-E is to provide the basis for a voice assistant that feels much more natural to humans, mimics the natural speech patterns of human dialogue and remembers past conversations.”

Brendel added that LAION also wants to ensure that every component of BUD-E can eventually be integrated into apps and services license-free, even commercially, which isn’t necessarily the case for other open assistant efforts.

BUD-E, shorthand for “Buddy for Understanding and Digital Empathy,” has an ambitious roadmap. In a blog post, the LAION team lays out what they hope to accomplish over the coming months, chiefly imbuing BUD-E with “emotional intelligence” and ensuring it can handle conversations involving multiple speakers at once.

“There is a great need for a natural voice assistant that works well,” Brendel said. “LAION has shown in the past that it is good at building communities, and the ELLIS Institute in Tübingen and the Tübingen AI Center are committed to providing the resources necessary to develop the assistant.”

BUD-E is operational – you can download and install it today from GitHub on an Ubuntu or Windows PC (macOS is coming) – but it’s still in its infancy.

LAION stitched together several open models to build an MVP, including Microsoft’s Phi-2 LLM, Columbia’s StyleTTS2 for speech synthesis, and Nvidia’s FastConformer for speech recognition. As a result, the experience is far from optimized. Getting BUD-E to respond to commands in around 500 milliseconds, in the range of commercial voice assistants such as Google Assistant and Alexa, requires a powerful GPU like Nvidia’s RTX 4090.

Collabora is working pro bono to adapt its open source speech recognition and text-to-speech models, WhisperLive and WhisperSpeech, for BUD-E.

“Building the speech synthesis and speech recognition solutions ourselves means we can customize them to a degree that isn’t possible with closed models exposed through APIs,” Jakub Piotr Cłapa, an AI researcher at Collabora and a member of the BUD-E team, said in an email. “Collabora first started working on [open assistants] partly because we struggled to find a good text-to-speech solution for an LLM-based voice agent for one of our clients. We decided to join forces with the wider open source community to make our models more widely accessible and useful.”

In the short term, LAION says this work will help make BUD-E’s hardware requirements less onerous and reduce the assistant’s latency. Longer-term undertakings include building a dataset of dialogues to fine-tune BUD-E, a memory mechanism that lets BUD-E store information from previous conversations, and a speech processing pipeline that can keep track of several people talking at once.

I asked the team whether accessibility was a priority, given that voice recognition systems historically haven’t worked well with languages other than English or with accents that aren’t transatlantic. One Stanford study found that voice recognition systems from Amazon, IBM, Google, Microsoft and Apple were almost twice as likely to mishear Black speakers as white speakers of the same age and gender.

Brendel said that LAION does not ignore accessibility – but that it is not an “immediate priority” for BUD-E.

“The first priority is to really redefine the experience of how we interact with voice assistants before generalizing that experience to more diverse accents and languages,” Brendel said.

To this end, LAION has some pretty original ideas for BUD-E, ranging from an animated avatar to personify the assistant to supporting the analysis of users’ faces via webcams to take into account their emotional state.

The ethics of that last bit, facial analysis, are dicey, to say the least. But Robert Kaczmarczyk, a co-founder of LAION, stressed that LAION would remain committed to safety.

“[We] strictly adhere to the safety and ethical guidelines formulated by the EU AI Act,” he told TechCrunch via email, referring to the legal framework governing the sale and use of AI in the EU. The EU AI Act allows European Union member countries to adopt more restrictive rules and safeguards for “high-risk” AI, including emotion classifiers.

“This commitment to transparency not only facilitates the early identification and correction of potential biases, but also serves the cause of scientific integrity,” Kaczmarczyk added. “By making our datasets accessible, we enable the broader scientific community to engage in research that meets the highest standards of reproducibility.”

LAION’s previous work hasn’t been ethically impeccable, and it is currently pursuing a separate, somewhat controversial project on emotion detection. But maybe BUD-E will be different; we’ll have to wait and see.

