Microsoft Speech

Updated September 2020.

Microsoft Windows has had a number of speech systems over the years. Windows 10 has made more changes, and the MSDN documentation is fragmented and confusing. Here, then, is a description of Microsoft speech systems on desktop Windows as of November 2016, mainly for developers and enthusiasts.

The three systems are SAPI5, Microsoft Speech Platform (the server system), and the new system for Windows 10, the variously-named Windows Runtime or Bing or Mobile system.

SAPI5, Microsoft Speech API version 5 (System.Speech.Synthesis)

From Windows XP onwards, this is the main Windows speech system for desktop applications, like screenreaders. It’s got a huge range of voices from third parties like Acapela, Nuance, Cereproc and Ivona.

  • The default Windows voices have been SAPI5 since Windows XP: Microsoft Sam in Windows XP, Microsoft Anna in Windows Vista and 7, and Microsoft Hazel (on UK English machines) in Windows 8 and 10.
  • You can find the SAPI5 voices installed on your machine in the Text to Speech window, which is hidden away in the Speech Recognition settings in Control Panel. Some SAPI5 voices may be hidden (e.g. Acapela voices), but the ones you can see there will generally work in any program that uses SAPI5.

  • You can get lots and lots of SAPI5 voices from third parties, including the free eSpeak.
  • You can also get new SAPI5 voices from Microsoft by installing a new Language Pack from Control Panel. Language Packs are all free for Windows 8 and later (and mostly free for Windows Vista and 7 too) and some of them come with SAPI5 voices. For example, if you install the French language pack, you get a French SAPI5 voice that appears in Control Panel and can be used in software that supports SAPI5.
  • List of free SAPI5 voices on Windows 8 and Windows 10

  • Voices can be either 32-bit or 64-bit, just like Windows. If you’re on a 64-bit Windows machine, 32-bit voices won’t show up in the Speech window in Control Panel, because it is a 64-bit version of the Speech window. You have to find and run C:\Windows\SysWOW64\Speech\SpeechUX\sapi.cpl to see 32-bit voices. Also, 64-bit programs won’t see or be able to use 32-bit-only SAPI5 voices.
  • You can find the installed voices in the registry, under HKEY_LOCAL_MACHINE > Software > Microsoft > Speech. On a 64-bit machine, look in both that key and HKEY_LOCAL_MACHINE > Software > WOW6432Node > Microsoft > Speech, which holds the 32-bit voices. More rarely, you can find per-user installations of SAPI5 voices in HKEY_CURRENT_USER > Software > Microsoft > Speech.
  • There was a SAPI4, which was the predecessor to SAPI5, and shipped in Windows 2000 and as part of Microsoft Agent. It was similar to SAPI5. No-one uses it nowadays (this is almost certainly untrue, but as a general rule, it’s all SAPI5 now.)
  • Developers: The desktop Windows API you use in .NET is System.Speech.Synthesis and its SpeechSynthesizer class, which is a wrapper round the SAPI5 COM object. You can use SAPI5 on Windows Server or desktop versions. SAPI5 is also available to C++ and other programming languages through a COM interface, the Microsoft Speech Object Library.
  • Microsoft Speech API (SAPI) 5.3 on MSDN
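
Developers who want to check the registry locations above from a script could start from something like this. It is a minimal Python sketch, not an official tool. It assumes the voices sit in a Voices > Tokens subkey under each Speech key (which is where SAPI5 normally registers them), and the helper names sapi5_token_paths and list_subkeys are made up for illustration. It only reads the registry. It uses the standard-library winreg module, so on anything other than Windows it simply returns empty lists.

```python
try:
    import winreg  # standard library, but only present on Windows
except ImportError:
    winreg = None


def sapi5_token_paths(is_64bit_windows):
    """Paths under HKEY_LOCAL_MACHINE that are assumed to hold SAPI5 voice tokens."""
    paths = [r"SOFTWARE\Microsoft\Speech\Voices\Tokens"]
    if is_64bit_windows:
        # 32-bit voices on 64-bit Windows live under the WOW6432Node copy.
        paths.append(r"SOFTWARE\WOW6432Node\Microsoft\Speech\Voices\Tokens")
    return paths


def list_subkeys(path):
    """Return the subkey names of an HKEY_LOCAL_MACHINE key (one per voice token)."""
    if winreg is None:
        return []  # not on Windows, nothing to read
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
            subkey_count = winreg.QueryInfoKey(key)[0]
            return [winreg.EnumKey(key, i) for i in range(subkey_count)]
    except OSError:
        return []  # key missing or not readable


if __name__ == "__main__":
    for path in sapi5_token_paths(is_64bit_windows=True):
        print(path)
        for token in list_subkeys(path):
            print("   ", token)
```

This assumes a 64-bit Python. A 32-bit Python on 64-bit Windows is redirected to the 32-bit view of the registry, so it would show the 32-bit voices for the first path too.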

Windows Mobile, Windows Runtime, or Bing Speech Services

Windows Mobile, which became Windows Phone, and is at the time of writing becoming Windows 10 Mobile, has a text-to-speech system. It’s “the text-to-speech you can use when you are writing Windows Phone Apps.” In some places on the Microsoft website it’s called the Bing Speech Service, but I don’t know how long that will last.

The interesting thing is that this system, and the voices that come with it, have landed on desktop Windows with the arrival of Windows 10. If you open up the Settings App in Windows 10, as opposed to the old Control Panel, you’ll find a Speech setting. This lists different voices from the SAPI5 list: on my Windows 10 UK English machine I have Microsoft Susan Mobile and Microsoft Heera Mobile. These are NOT SAPI5 voices. It’s ANOTHER speech API on your Windows 10 machine. It was originally used by Windows Store Apps only (these are now called Universal Windows Platform apps).

  • Like SAPI5 voices, you get more by adding Languages to Windows. However, you can’t use the Languages and Language Packs in Control Panel: that gets you SAPI5 voices. You have to use the Language settings in the new Settings App, which gets you more Mobile voices:

    1. Open Settings in Windows 10 and go to Time & Language, then Region & Language, and Add a language.

    2. Select your new language and click the Options button.

    3. Click Download for the Speech option.

  • These Mobile voices can now be used by normal Windows applications, but the developer will have to write support for them into their programs – they don’t show up in the SAPI5 system.
  • Conversely, SAPI5 voices cannot be used in Windows Store Apps, which can only use this Windows Mobile system.
  • You can find the installed voices in the registry, under HKEY_CURRENT_USER > Software > Microsoft > Speech Virtual and in HKEY_LOCAL_MACHINE > Software > Microsoft > Speech_OneCore (duplicated in HKEY_LOCAL_MACHINE > Software > WOW6432Node > Microsoft > Speech_OneCore for 32-bit programs on 64-bit machines).
  • There’s ALSO an online Bing Speech service, also called Microsoft Translator, now called the Bing Speech API in Microsoft Cognitive Services. This is a web service you call with text to get audio files back with the speech you requested. So it’s not something you use on your Windows machine (though a website or App might).
  • Developers: the API you use in .NET is Windows.Media.SpeechSynthesis and its SpeechSynthesizer class. This works just like the .NET SAPI5 API, System.Speech.Synthesis, but it’s targeted at Windows Store Apps / Windows Runtime Apps / Universal Windows Platform Apps. So System.Speech.Synthesis is Desktop, and Windows.Media.SpeechSynthesis is Windows Runtime, and they are completely different systems. As of the Windows 10 Anniversary Update in 2016 you can access the Windows.Media.SpeechSynthesis API from desktop apps, so you should be able to access these voices, but you’ll have to write specific code for them: you can’t use the SAPI5 API.
  • You CAN actually get the Mobile voices to work with SAPI5, but you have to hack the registry, and since I might end up trying to fix your computer when you’ve done this, I’m not going to give you detailed instructions. Essentially, copy the Tokens registry key for the Mobile voice into the SAPI5 Voices key in HKLM > Software, do the same for 32-bit if necessary, and there you go!
  • Finally, also lurking in the same speech system are the Cortana voices, used by Microsoft’s Cortana digital personal assistant. They are also described in the same registry key but don’t show up in Windows.Media.SpeechSynthesis. Hacking the registry will probably enable them.
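
If you just want to see which Mobile voices are registered, without hacking anything, a read-only script is enough. The following is a rough Python sketch under a few assumptions: that the voice tokens live in a Voices > Tokens subkey of the Speech_OneCore key mentioned above, and that you are happy with a made-up helper name, list_onecore_voices. Rather than spelling out the WOW6432Node path, it asks the standard-library winreg module for the 32-bit or 64-bit registry view. It changes nothing in the registry and returns an empty list on non-Windows systems.

```python
try:
    import winreg  # standard library, but only present on Windows
except ImportError:
    winreg = None

# Assumed location of the Mobile (OneCore) voice tokens under HKEY_LOCAL_MACHINE.
ONECORE_TOKENS = r"SOFTWARE\Microsoft\Speech_OneCore\Voices\Tokens"


def list_onecore_voices(view="64"):
    """Return the voice token names seen in the 64-bit or 32-bit registry view."""
    if view not in ("32", "64"):
        raise ValueError("view must be '32' or '64'")
    if winreg is None:
        return []  # not on Windows, nothing to read
    # KEY_WOW64_32KEY selects the 32-bit view, i.e. the WOW6432Node copy.
    view_flag = winreg.KEY_WOW64_64KEY if view == "64" else winreg.KEY_WOW64_32KEY
    try:
        with winreg.OpenKey(
            winreg.HKEY_LOCAL_MACHINE, ONECORE_TOKENS, 0, winreg.KEY_READ | view_flag
        ) as key:
            subkey_count = winreg.QueryInfoKey(key)[0]
            return [winreg.EnumKey(key, i) for i in range(subkey_count)]
    except OSError:
        return []  # key missing, for example on Windows versions before 10


if __name__ == "__main__":
    for view in ("64", "32"):
        print(view + "-bit view:", list_onecore_voices(view))
```

The names printed are registry token names, not the friendly names shown in the Settings App, so expect them to look a little cryptic.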

Microsoft Speech Platform

This is a set of voices and yet another speech system, but for use on Microsoft Windows Server. In my experience SAPI5 doesn’t work out of the box on Windows Server. These voices are intended to provide speech for servers, like automated voice menus when you call somewhere, that kind of thing. They aren’t intended for desktop programs.

  • This has Microsoft-only voices, and they won’t work with SAPI5 programs (or Windows Store Apps). A program could use them if the developer writes the code for it: the NVDA screenreader has done so, so you can install Microsoft Speech Platform and use the voices in NVDA.
  • You get the engine and the voices as installers from the Microsoft website, and install them on your server.
  • Developers: this uses the Microsoft.Speech API in .NET.
  • Microsoft Speech Platform on MSDN

12 thoughts on “Microsoft Speech”

  1. Hi,

    We’re using the Speech Platform. Your blog is about the only current one that talks about it. If you know of any support resources, could you send me an email? Thanks.

    Murray

  2. I would like to know if Microsoft makes Natural voices for Windows 7 SAPI5!

  3. It seems as though some things were changed in the anniversary update to Windows 10, as I now can’t find the mobile voices under the given registry key. Nothing (other than the default value) exists there. I use some screen reader (and related) programs which would benefit greatly from being able to use the less laggy and more understandable mobile voices, but many of these apps don’t have direct support yet. Any advice is appreciated. This article is about a year old but remains the only easily-located source of information out there. I understand that you’re hesitant to give specifics because you don’t want people to break their computers, but determined people are still going to find their own ways to get these things done.

  4. Hello! I downloaded a spelling program called Speak N Spell. No matter what speech voice I set in 64-bit Windows 10, the program uses David, which is hard to understand and robotic-sounding. Here are the voices I currently have installed (all mobile): David, George, Susan, Hazel, Zira, Mark. I think from what I am reading that these are all mobile voices and what the program needs is a desktop voice, right?

  5. I figured it out from what you wrote:

    No matter what the Settings> Speech> Control Panel>Text-to-Speech voice is set to, you also have to look at what the C:\Windows\SysWOW64\Speech\SpeechUX\sapi.cpl voice is set to. In my case, I changed what the sapi.cpl voice was, and the program was able to use that. Thanks for all your help and info!

  6. Hi. I’m trying to use the self-voicing Accessibility plugin for NetBeans 8.2, from http://www.quorumlanguage.com, running under Windows 10 November update. I also have JAWS V18 installed.
    In NetBeans, under Tools > Options > Accessibility, there’s a list view for speech engines, and another for voices.
    Initially, both lists were empty, and the plugin didn’t work.
    The Quorum guys advised me to install .NET 3.5, and now the speech engines list contains 3 items: MICROSOFT_API, JAWS, and NVDA, but the voices list is empty in all 3 cases.
    Whichever engine I select, the plugin seems to hijack JAWS speech, sometimes causing JAWS to crash and reload.
    How can I make the standard Microsoft SAPI 5 voices appear in the voices list, so I can select one of these instead of JAWS?
    In Text to Speech under Control Panel, 3 voices are shown: Microsoft David Desktop, Microsoft Hazel Desktop, and Microsoft Zira Desktop.

    many thanks for any help.

    John (Burling)
    Email: j.burling@btconnect.com

  7. The blog author states that “Mobile voices cannot be used by programs targeting SAPI5 (i.e. Desktop programs like screenreaders).” This was a correct statement when the blog was last updated in 2016. However as a screen reader user myself, I want to point out that as of 2017 the JAWS and NVDA screen readers both began supporting the Microsoft Mobile voices. Consequently these screen readers can now talk using the mobile voices. NVDA calls them “Windows One Core” voices whereas JAWS refers to them as “Microsoft Mobile” voices.

    Regarding John Burling’s comment: I am not familiar with NetBeans, as I run JAWS on two computers which have the Windows 10 operating system. However, I suggest going into the JAWS menu, voices, voice profiles. From there you can switch from the default TTS Eloquence to either SAPI5X or SAPI5X64. SAPI5X in JAWS refers to 32-bit voices, x64 is 64-bit voices.

  8. >SAPI5 doesn’t work on Windows Server

    Not true.
    1st of all both Desktop and Speech Platform are Sapi5.
    By default Desktop sapi5 is installed in Win2012 and Win2016 from the box.
    Speech Platform can be installed and live together with Desktop. Their COM GUIDs are different. You can create one or another. They use different registry keys to store voices tokens: Speech and SpeechServer.

  9. Mikhail, I’m not a server expert, but when I spin up a Windows Server in Azure I find that SAPI5 doesn’t work. The components are all there: might just need regsvr32 run on sapi.dll, maybe? Or I’m wrong, smile.
