JetStyle: Our opinion on the possibilities and limitations of Alice by Yandex
In October 2017, Russian search giant Yandex (often referred to as the Google of Russia) introduced Alice – an intelligent personal voice assistant, similar to Amazon's Alexa, for Android, iOS, and Windows that speaks Russian. Half a year later, the company launched Yandex.Station – Russia's first smart speaker powered by Alice. The device uses Yandex's in-house voice assistant to handle over 4,000 skills, such as ordering pizza, finding plane tickets, or checking traffic.
We spoke with our team lead Vitaly Semyachkin about the possibilities and limitations of Alice that developers currently face, and about the challenges our team encountered when we began exploring the topic.
Here is what we found out. From a developer's perspective, Alice and Yandex.Station are essentially a chatbot that communicates by voice rather than text. But even compared to Telegram chatbots, it is still a more limited platform.
Why is that?
- Some chatbot platforms let us accept payments; in Alice we can't do that yet. This is largely a property of the environment the bot lives in, of course, but the limitation is real nonetheless.
- In a chatbot, we can identify the user; in Alice, we only have a device ID. As developers, we don't know that it is that exact John Smith with that exact Yandex account currently talking to Alice. Or maybe it's John Smith's son? Or one of his guests?
- We can't initiate events on our own: unless the user explicitly invokes a skill, we have no way to listen, to say something, or to react to an event in the background. This significantly limits smart home scenarios, because everything there works on triggers, and when a trigger fires, it would be great to be able to respond. Unfortunately, that's impossible with Alice, while a Telegram chatbot, for example, does it easily.
At the same time, there are undoubted advantages:
- Alice has a voice;
- You can process natural-language requests out of the box.
In fact, skills for Alice today are a conversational experience. We are building a dialogue: the user tells us something, that information reaches the developer as text, the developer draws conclusions while processing it and responds, primarily by voice.
To respond to the user, the developer has three tools: voice, an image or a group of images, and a link button. But we have no access to the speaker itself: from within a skill, we can't change the volume or turn the speaker on or off by voice. We can't even set an alarm. Alice itself can do that, but developers simply have no access to it. For example, we can't make a skill like Alexa's: "Wake me up with something nice from Black Sabbath" or "Wake me up at 5 am with BBC Radio 1".
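Those three tools map onto the skill's webhook reply roughly as follows. This is a minimal sketch in Python; the field names (`text`, `tts`, `card`, `buttons`, `end_session`) follow the public Yandex.Dialogues webhook protocol as we understand it, so treat them as assumptions rather than a definitive reference:

```python
# Sketch of a Yandex.Dialogues webhook response using the three available
# tools: spoken text, an image card, and a link button.
# Field names are assumed from the public protocol, not guaranteed.

def build_response(text, image_id=None, button=None):
    """Assemble the JSON body a skill webhook returns to Alice."""
    response = {
        "text": text,          # what Alice shows on screens
        "tts": text,           # what Alice speaks aloud
        "end_session": False,  # keep the dialogue open
    }
    if image_id:
        # a single image becomes a "BigImage" card (assumed card type)
        response["card"] = {"type": "BigImage", "image_id": image_id, "title": text}
    if button:
        # a link button shown under the reply
        response["buttons"] = [{"title": button[0], "url": button[1], "hide": True}]
    return {"response": response, "version": "1.0"}

body = build_response("Here is the menu",
                      button=("Open menu", "https://example.com/menu"))
```

There is no field for volume, power, or alarms in this structure – which is exactly the limitation described above.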
You should also understand that Alice in Yandex.Browser, in the mobile app, and in Yandex.Station exposes exactly the same developer API and the same voice interface; only the entry points differ.
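Because the API is identical across entry points, a skill can only tell the surfaces apart through request metadata. The sketch below checks the `meta.interfaces` block of the incoming request to decide whether sending an image card even makes sense; the exact field layout is our assumption based on the public Yandex.Dialogues protocol:

```python
# Distinguish a screen-equipped client (browser, mobile app) from a
# voice-only Yandex.Station by inspecting request metadata.
# The "meta.interfaces" layout is an assumption, not a verified spec.

def supports_screen(request_body: dict) -> bool:
    """True if the client reports a screen; False for a voice-only speaker."""
    interfaces = request_body.get("meta", {}).get("interfaces", {})
    return "screen" in interfaces

station_request = {"meta": {"interfaces": {}}}             # smart speaker
browser_request = {"meta": {"interfaces": {"screen": {}}}} # has a display
```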
So, what is available for developers now:
- The API documentation is compact, clear, and simple (even compared with Alexa's);
- We can only interact with the user in a question-answer format;
- Creating a skill for Alice today means designing and programming a dialogue script, much like in RPG games: there are lines and branching dialogue trees, which is why there are plenty of game skills like tic-tac-toe, poker, etc.;
- In effect, it is a voice interface on top of a bot.
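The question-answer model described above can be sketched as a small dialogue tree. The names and structure here are our own illustration, not an official SDK: the webhook receives the user's utterance as text, and since the platform keeps no conversation state for us, the skill tracks each session's position in the tree itself:

```python
# A minimal question-answer skill as a dialogue tree, in the spirit of the
# RPG-like scripts described above. Structure and names are illustrative.

DIALOG_TREE = {
    "start": {"say": "Heads or tails?", "next": {"heads": "win", "tails": "lose"}},
    "win":   {"say": "Heads it is. You win!", "next": {}},
    "lose":  {"say": "Tails. Better luck next time!", "next": {}},
}

SESSIONS = {}  # session_id -> current node; state lives on our side

def handle(session_id: str, utterance: str) -> str:
    """Advance one step through the tree and return Alice's next line."""
    node = SESSIONS.get(session_id, "start")
    branches = DIALOG_TREE[node]["next"]
    # follow the matching branch, or stay put (and repeat) on no match
    node = branches.get(utterance.strip().lower(), node)
    SESSIONS[session_id] = node
    return DIALOG_TREE[node]["say"]
```

A first, unmatched utterance yields the opening question; "heads" or "tails" then moves the dialogue to a terminal node. A real skill would wrap this reply in the webhook response format and set `end_session` on terminal nodes.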
As for our own experience: as soon as Yandex started accepting applications from web studios and agencies to become certified Yandex.Dialogues partners, we applied immediately.
We had started looking into voice assistants even earlier. For instance, using Amazon's Alexa, we set up a smart meeting room in our office. By voice request, it can:
- turn on / off the camera,
- take screenshots from the screen, save them to Google Drive,
- display images on the screen through the projector,
- make a copy of the screen: for example, you can write something on the screen, capture it, and display the capture; the original text can then be erased while the information stays visible and you can keep adding to it,
- display a mockup of an empty iPhone or an empty browser, plus various grids and graphics (super useful for client and working-group meetings).
All of this can now be done with Alice too, no problem at all. And it will even answer in Russian. Awesome!
Apart from that, even before Yandex started accepting partner-certification applications for Yandex.Dialogues, we brainstormed potentially useful skills for Alice. Now we are picking the ones that can be built without losing value within the capabilities of the current open API. For example, Alice could assist people with disabilities, or help provide first aid by connecting a medical chatbot through the voice assistant.
In any case, we can say with confidence that Alice by Yandex will keep developing and improving.