Speech recognition, nodeJS

I’m currently working on a tool allowing me to read all my notifications thanks to the connection to different APIs.

It’s working great, but now I would like to put some vocal commands to do some actions.

Like when the software is saying “One mail from Bob”, I would like to say “Read it”, or “Archive it”.

My software runs on a Node server. I don't have any browser implementation at the moment, although adding one could be an option later.

What is the best way to enable speech to text in Node.js?

I’ve seen a lot of threads on this, but most of them rely on the browser, and I would like to avoid that at the beginning if possible. Is it possible?

Another issue is some software requires the input of a wav file. I don’t have any file, I just want my software to be always listening to what I say to react when I say a command.

Do you have any information on how I could do that?




Method 1

Both of the answers here already are good, but what I think you’re looking for is Sonus. It takes care of audio encoding and streaming for you. It’s always listening offline for a customizable hotword (like Siri or Alexa). You can also trigger listening programmatically. In combination with a module like say, you could enable your example by doing something like:

say.speak('One mail from Bob', function(err) {
  Sonus.trigger(sonus, 1) //start listening
})

You can also use different hotwords to handle the subsequent recognized speech in a different way. For instance:
“Notifications. Most recent.” and “Send message. How are you today”

Throw that onto a Pi or a CHIP with a microphone on your desk and you have a personal assistant that reads your notifications and reacts to commands.
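Once a recognizer hands back a transcript, the remaining step is mapping it to an action such as "read" or "archive". Here is a minimal sketch of that step, independent of Sonus. The command table and the function names (COMMANDS, normalize, matchCommand) are hypothetical and only for illustration:

```javascript
// Hypothetical command table: spoken phrase -> action name.
const COMMANDS = {
  'read it': 'read',
  'archive it': 'archive',
};

// Lowercase the transcript, replace punctuation with spaces and collapse whitespace.
function normalize(transcript) {
  return transcript
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Return the action for the first known phrase found as whole words,
// or null when the transcript contains no known command.
function matchCommand(transcript) {
  const padded = ' ' + normalize(transcript) + ' ';
  for (const phrase of Object.keys(COMMANDS)) {
    if (padded.includes(' ' + phrase + ' ')) return COMMANDS[phrase];
  }
  return null;
}
```

Matching on whole words keeps "thread it" from being mistaken for "read it". You would call matchCommand with whatever text your recognizer reports and dispatch on the returned action name.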

Simple Example:

Something a bit more complex:

Full documentation:

Disclaimer: This is my project 🙂

Method 2

To get audio data into your application, you could try a module like microphone, which I haven’t used but it looks promising. This could be a way to avoid having to use the browser for audio input.

To do actual speech recognition, you could use the Speech to Text service of IBM Watson Developer Cloud. This service supports a websocket interface, so that you can have a full duplex service, piping audio data to the cloud and getting back the resulting transcription. You may want to consider implementing a form of onset detection in order to avoid transmitting a lot of (relative) silence to the service – that way, you can stay within the free tier.
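One simple way to approximate the onset detection mentioned above is an energy gate: only forward audio chunks whose loudness exceeds a threshold, plus a short tail after speech ends. The sketch below assumes 16-bit little-endian mono PCM chunks; the function names and the default threshold and hangover values are illustrative assumptions, not tuned settings or part of any Watson API:

```javascript
// Root-mean-square level of a chunk of 16-bit little-endian mono PCM,
// normalized to the range 0..1.
function rmsLevel(chunk) {
  const sampleCount = Math.floor(chunk.length / 2);
  if (sampleCount === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < sampleCount; i++) {
    const sample = chunk.readInt16LE(i * 2) / 32768;
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / sampleCount);
}

// Returns a function that decides, chunk by chunk, whether to forward audio.
// threshold and hangoverChunks are assumed example values.
function createSpeechGate(threshold = 0.02, hangoverChunks = 10) {
  let quietChunks = Infinity; // chunks seen since the last loud one
  return function shouldSend(chunk) {
    if (rmsLevel(chunk) >= threshold) {
      quietChunks = 0;
      return true;
    }
    quietChunks += 1;
    // Keep sending briefly after speech so word endings are not clipped.
    return quietChunks <= hangoverChunks;
  };
}
```

In use, you would create one gate per audio stream and write a chunk to the speech service only when shouldSend(chunk) returns true. A real setup would likely need the threshold adjusted for the microphone and room noise.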

There is also a text-to-speech service, but it sounds like you have a solution already for that part of your tool.

Disclosure: I am an evangelist for IBM Watson.

Method 3

To recognize a few commands without streaming them to a server, you can use the node-pocketsphinx module, which is available on NPM.

The code to recognize a few commands in a continuous stream should look like this:

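var fs = require('fs');
var ps = require('pocketsphinx').ps;

// Directory holding the acoustic model and dictionary shipped with pocketsphinx.
var modeldir = "../../pocketsphinx/model/en-us/";

var config = new ps.Decoder.defaultConfig();
config.setString("-hmm", modeldir + "en-us");
config.setString("-dict", modeldir + "cmudict-en-us.dict");
// "keyword list" stands for the path to the keyword list file shown below.
config.setString("-kws", "keyword list");
var decoder = new ps.Decoder(config);

fs.readFile("../../pocketsphinx/test/data/goforward.raw", function(err, data) {
    if (err) throw err;
    // Wrap the raw audio in an utterance, then print the detection result.
    decoder.startUtt();
    decoder.processRaw(data, false, false);
    decoder.endUtt();
    console.log(decoder.hyp());
});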

Instead of using readFile, you read the data from the microphone and pass it to the recognizer. The list of keywords to detect should look like this:

read it /1e-20/
archive it /1e-20/
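If the set of commands changes often, the keyword list above can be generated rather than written by hand. This is a small sketch, assuming the one-phrase-per-line format with the threshold between slashes shown above; buildKeywordList is a hypothetical helper, not part of node-pocketsphinx:

```javascript
// Build the text of a keyword list file from an array of
// { phrase, threshold } entries. Thresholds are passed as strings
// so they are written out exactly as given.
function buildKeywordList(keywords) {
  return keywords
    .map(({ phrase, threshold }) => `${phrase.trim().toLowerCase()} /${threshold}/`)
    .join('\n') + '\n';
}

const keywordFileText = buildKeywordList([
  { phrase: 'read it', threshold: '1e-20' },
  { phrase: 'archive it', threshold: '1e-20' },
]);
```

The resulting text could then be saved to a file and that file's path passed as the "-kws" setting.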

For more details on keyword spotting with pocketsphinx, see "Keyword Spotting in Speech" and "Recognizing multiple keywords using PocketSphinx".

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
