Speech Recognition with C# – Dictation and Custom Grammar
I’ve been working on a project lately (I will post more about it in a couple of weeks) where I needed to add speech recognition. Thanks to the managed wrapper for the Speech API (System.Speech, included in the .NET Framework since version 3.0), it only takes a few lines of code to add the functionality you want. Of course, the framework also provides the classes you need to write more complex speech recognition software. If you are interested in the reverse operation, text to speech, you can check a previous post: Text to Speech using C#.
First of all, you need to add a reference to the System.Speech assembly (System.Speech.dll) and import the System.Speech.Recognition namespace.
The simplest thing we can do is a single, synchronous dictation recognition.
First, we create a new instance of SpeechRecognitionEngine and tell the engine to take its input from the default audio device. We could also set the input to a .wav file or an audio stream. Then we have to load a grammar. For dictating text, we use an instance of the DictationGrammar class. If we only need to recognize specific words, we need to define a custom grammar; more on that later. SpeechRecognitionEngine.Recognize() is a synchronous method, meaning that it blocks until you speak or the initial silence timeout elapses (you can pass a timeout explicitly with the Recognize(TimeSpan) overload; it is not set in the constructor). It performs a single recognition and then returns a RecognitionResult, or null if nothing was recognized. We can then iterate through the recognized words and process them as we want. A very helpful property of RecognizedWordUnit is Confidence, which returns a float between 0 and 1 measuring how certain the engine is that the word is correct.
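Putting these steps together, here is a minimal console sketch of the synchronous version. It assumes a speech recognizer is installed for the current culture and that a microphone is the default audio device; the 10-second timeout and the Describe helper are illustrative choices, not part of the original post.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

class SyncDictationDemo
{
    // Illustrative helper: formats one recognized word with its confidence (0 to 1).
    public static string Describe(string word, float confidence)
    {
        return word + " (" + confidence.ToString("0.00", CultureInfo.InvariantCulture) + ")";
    }

    static void Main()
    {
        // Parameterless constructor: uses the default recognizer for the current culture.
        using (var engine = new SpeechRecognitionEngine())
        {
            // Microphone input; SetInputToWaveFile(path) would read a .wav file instead.
            engine.SetInputToDefaultAudioDevice();
            engine.LoadGrammar(new DictationGrammar());

            // Blocks until a phrase is recognized or 10 s of initial silence pass.
            RecognitionResult result = engine.Recognize(TimeSpan.FromSeconds(10));
            if (result == null)
            {
                Console.WriteLine("Nothing was recognized.");
                return;
            }

            // Each RecognizedWordUnit carries its own text and confidence score.
            foreach (RecognizedWordUnit word in result.Words)
            {
                Console.WriteLine(Describe(word.Text, word.Confidence));
            }
        }
    }
}
```

Checking Confidence per word lets you ignore or re-prompt for words the engine was unsure about.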
Since the blocking call makes the above example unsuitable for (almost) any real application, let us look at an asynchronous version.
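A minimal sketch of the asynchronous version, assuming a console application, a speech recognizer installed for the current culture and a microphone as the default audio device. Waiting for Enter before stopping is just one illustrative way to keep the process alive while the engine listens in the background.

```csharp
using System;
using System.Speech.Recognition;

class AsyncDictationDemo
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            engine.SetInputToDefaultAudioDevice();
            engine.LoadGrammar(new DictationGrammar());

            // Lambda handler: raised once for every phrase the engine recognizes.
            engine.SpeechRecognized += (sender, e) =>
            {
                foreach (RecognizedWordUnit word in e.Result.Words)
                {
                    Console.WriteLine("{0} ({1:0.00})", word.Text, word.Confidence);
                }
            };

            // Returns immediately; Multiple keeps recognizing until RecognizeAsyncStop().
            engine.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Speak now, press Enter to stop.");
            Console.ReadLine();

            // Stops after the current recognition operation completes.
            engine.RecognizeAsyncStop();
        }
    }
}
```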
This time, the recognition will continue until we explicitly call SpeechRecognitionEngine.RecognizeAsyncStop(), instead of ending right after the first recognition, because RecognizeAsync is called with RecognizeMode.Multiple. If you only need a single recognition, change the parameter to RecognizeMode.Single. I have used a lambda expression to handle the SpeechRecognized event; if you'd rather use a named method, the equivalent code simply subscribes that method to the same event.
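For comparison, a sketch of the same asynchronous program with a named handler instead of the lambda, under the same assumptions (installed recognizer, default microphone). The method name Engine_SpeechRecognized is hypothetical; any method matching EventHandler&lt;SpeechRecognizedEventArgs&gt; works.

```csharp
using System;
using System.Speech.Recognition;

class TypedHandlerDemo
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            engine.SetInputToDefaultAudioDevice();
            engine.LoadGrammar(new DictationGrammar());

            // Same event as the lambda version, but wired to a named method.
            engine.SpeechRecognized += Engine_SpeechRecognized;

            // RecognizeMode.Single would stop after the first recognized phrase.
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.ReadLine(); // keep running until Enter is pressed
            engine.RecognizeAsyncStop();
        }
    }

    // Signature matches EventHandler<SpeechRecognizedEventArgs>.
    static void Engine_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        foreach (RecognizedWordUnit word in e.Result.Words)
        {
            Console.WriteLine("{0} ({1:0.00})", word.Text, word.Confidence);
        }
    }
}
```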
Now consider a scenario where we need to control a program with our voice. We don’t need the full dictation grammar for that, only the available commands, like “Open”, “Close”, “Start”, “Delete” and so on. In this case we can create a custom grammar. Grammars can get very complex, but I’ll just show the basics here.
The basic idea is to create a list of alternative items (a Choices object) for one element of the grammar and then build the grammar around it. First we create the list of options (“Calculator”, “Notepad”, …), and then we append it to a GrammarBuilder after the fixed word “Start”. A GrammarBuilder is used to create grammar objects programmatically. The resulting grammar will understand phrases of the following form:
“Start [choice]”
where [choice] is one of the four strings.
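A minimal sketch of building that grammar. Only “Calculator” and “Notepad” are named in the text, so “Paint” and “WordPad” below are hypothetical placeholders for the other two options, and the grammar name "StartCommands" is likewise illustrative. Building the Grammar object does not need a microphone; it only describes what the engine should listen for.

```csharp
using System;
using System.Speech.Recognition;

class CommandGrammarDemo
{
    // Builds a grammar that accepts exactly "Start <program>".
    public static Grammar CreateStartGrammar()
    {
        // Alternatives for the [choice] slot ("Paint" and "WordPad" are hypothetical).
        var programs = new Choices("Calculator", "Notepad", "Paint", "WordPad");

        var builder = new GrammarBuilder();
        builder.Append("Start");   // fixed leading word
        builder.Append(programs);  // followed by exactly one of the choices

        // Name is optional; it helps tell grammars apart when several are loaded.
        return new Grammar(builder) { Name = "StartCommands" };
    }

    static void Main()
    {
        Grammar grammar = CreateStartGrammar();
        Console.WriteLine("Built grammar: " + grammar.Name);
    }
}
```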
You can give your grammar a friendly name (the Grammar.Name property) if you want, but you don’t have to. Now we can load this grammar into the recognition engine and start recognition as in the previous examples.
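A sketch of loading a “Start [choice]” grammar and acting on the commands, assuming an installed recognizer and a default microphone. To keep it short, only the two options named in the text are included, and the ExecutableFor mapping from spoken names to calc.exe and notepad.exe is a hypothetical illustration of "controlling a program with our voice", not code from the original post.

```csharp
using System;
using System.Diagnostics;
using System.Speech.Recognition;

class VoiceCommandDemo
{
    // Hypothetical mapping: the last word of "Start <program>" selects an executable.
    public static string ExecutableFor(string phrase)
    {
        string[] words = phrase.Split(' ');
        switch (words[words.Length - 1].ToLowerInvariant())
        {
            case "calculator": return "calc.exe";
            case "notepad": return "notepad.exe";
            default: return null;
        }
    }

    static void Main()
    {
        // "Start" followed by one program name.
        var builder = new GrammarBuilder("Start");
        builder.Append(new Choices("Calculator", "Notepad"));
        var grammar = new Grammar(builder) { Name = "StartCommands" };

        using (var engine = new SpeechRecognitionEngine())
        {
            engine.SetInputToDefaultAudioDevice();
            engine.LoadGrammar(grammar);

            engine.SpeechRecognized += (sender, e) =>
            {
                // e.Result.Grammar identifies which loaded grammar produced the match.
                Console.WriteLine("[{0}] {1}", e.Result.Grammar.Name, e.Result.Text);
                string exe = ExecutableFor(e.Result.Text);
                if (exe != null)
                {
                    Process.Start(exe);
                }
            };

            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Say \"Start Calculator\" or \"Start Notepad\"; press Enter to quit.");
            Console.ReadLine();
            engine.RecognizeAsyncStop();
        }
    }
}
```

Because the engine only listens for phrases in the loaded grammar, anything else you say is either ignored or reported through the SpeechRecognitionRejected event rather than matched as a command.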
That’s all for now. Here are a couple of links on MSDN with some useful info:
- How to use GrammarBuilder class – Also has a nice example
- SpeechRecognitionEngine class
You can download the sample project I made with the code above:
Download