Updated September 2020.
Microsoft Windows has had a number of speech systems over the years. Windows 10 has made more changes, and the MSDN documentation is fragmented and confusing. Here, then, is a description of Microsoft speech systems on desktop Windows as of November 2016, mainly for developers and enthusiasts.
The three systems are SAPI5, the Microsoft Speech Platform, and the new system for Windows 10, the variously-named Windows Runtime or Bing or Mobile system.
SAPI5, Microsoft Speech API version 5 (System.Speech.Synthesis)
From Windows XP onwards, this has been the main Windows speech system for desktop applications, like screenreaders. It’s got a huge range of voices from third parties like Acapela, Nuance, CereProc and Ivona.
- The default Windows voices have been SAPI5 since Windows XP: Microsoft Sam in Windows XP, Microsoft Anna in Windows Vista and 7, and Microsoft Hazel in Windows 8 and 10.
- You can find the SAPI5 voices that are installed on your machine in the Control Panel. You are looking for the Text to Speech window, which is hidden away in the Speech Recognition settings in Control Panel. Some SAPI5 voices may be hidden (e.g. Acapela voices), but the ones you can see there will generally work in any program that uses SAPI5.
- You can get lots and lots of SAPI5 voices from third parties, including the free eSpeak.
- You can also get new SAPI5 voices from Microsoft by installing a new Language Pack from Control Panel. Language Packs are all free for Windows 8 and later (and mostly free for Windows Vista and 7 too) and some of them come with SAPI5 voices. For example, if you install the French language pack, you get a French SAPI5 voice that appears in Control Panel and can be used in software that supports SAPI5. See: List of free SAPI5 voices on Windows 8 and Windows 10.
- Voices can be either 32-bit or 64-bit, just like Windows. If you’re on a 64-bit Windows machine, 32-bit voices won’t show up in the Speech window in Control Panel, because it is a 64-bit version of the Speech window. You have to find and run
C:\Windows\SysWOW64\Speech\SpeechUX\sapi.cpl
to see 32-bit voices. Also, 64-bit programs won’t see or be able to use 32-bit-only SAPI5 voices.
- You can find the installed voices in the registry, under HKEY_LOCAL_MACHINE > Software > Microsoft > Speech, or, if you are on a 64-bit machine, both in that key and in HKEY_LOCAL_MACHINE > Software > WOW6432Node > Microsoft > Speech for the 32-bit voices. More rarely, you can find per-user installations of SAPI5 voices in HKEY_CURRENT_USER > Software > Microsoft > Speech.
- There was a SAPI4, which was the predecessor to SAPI5, and shipped in Windows 2000 and as part of Microsoft Agent. It was similar to SAPI5. No-one uses it nowadays (this is almost certainly untrue, but as a general rule, it’s all SAPI5 now.)
- Developers: The desktop Windows API you use in .Net is System.Speech.Synthesis and the SpeechSynthesizer class, which is a wrapper round the SAPI5 COM object. You can use SAPI5 on Windows Server or desktop versions. SAPI5 is also available to C++ and other programming languages through a COM interface, the Microsoft Speech Object Library.
- Microsoft Speech API (SAPI) 5.3 on MSDN
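As a rough illustration of the registry locations listed above, here is a minimal Python sketch that lists whatever subkeys it finds under each one. It assumes the voices sit in a Voices\Tokens subkey of each Speech key, and the function names are made up for this example. It only reads the registry, and on anything other than Windows it simply returns empty lists.

```python
import sys

# Registry locations named in the text, as (hive, key path) pairs.
SAPI5_SPEECH_KEYS = [
    ("HKEY_LOCAL_MACHINE", r"Software\Microsoft\Speech"),
    ("HKEY_LOCAL_MACHINE", r"Software\WOW6432Node\Microsoft\Speech"),
    ("HKEY_CURRENT_USER", r"Software\Microsoft\Speech"),
]


def voice_token_keys(speech_keys=SAPI5_SPEECH_KEYS):
    """Append the assumed Voices\\Tokens subkey to each Speech key."""
    return [(hive, path + r"\Voices\Tokens") for hive, path in speech_keys]


def list_subkeys(hive_name, path):
    """Return the subkey names under hive\\path, or [] if the key is unavailable."""
    if sys.platform != "win32":
        return []  # no Windows registry on this platform
    import winreg

    try:
        with winreg.OpenKey(getattr(winreg, hive_name), path) as key:
            subkey_count = winreg.QueryInfoKey(key)[0]
            return [winreg.EnumKey(key, i) for i in range(subkey_count)]
    except OSError:
        return []  # key missing or not readable


if __name__ == "__main__":
    for hive, path in voice_token_keys():
        print(hive, path, list_subkeys(hive, path))
```

Bear in mind that a 32-bit Python on 64-bit Windows is itself redirected to the WOW6432Node view, so the first two locations may report the same voices.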
Windows Mobile, Windows Runtime, or Bing Speech Services
Windows Mobile, which became Windows Phone, and is at the time of writing becoming Windows 10 Mobile, has a text-to-speech system. It’s “the text-to-speech you can use when you are writing Windows Phone Apps.” In some places on the Microsoft website it’s called the Bing Speech Service, but I don’t know how long that will last.
The interesting thing is that this system, and the voices that come with it, have landed on desktop Windows with the arrival of Windows 10. If you open up the Settings App in Windows 10, as opposed to the old Control Panel, you’ll find a Speech setting. This lists different voices from the SAPI5 list: on my Windows 10 UK English machine I have Microsoft Susan Mobile and Microsoft Heera Mobile. These are NOT SAPI5 voices. It’s ANOTHER speech API on your Windows 10 machine. It was built for Windows Store Apps (now called Universal Windows Platform apps).
- Like SAPI5 voices, you get more by adding Languages to Windows. However, you can’t use the Languages and Language Packs in Control Panel: this gets you SAPI5 voices. You have to use the new Settings App and the Language settings. This gets you more Mobile voices.
- These Mobile voices can now be used by normal Windows applications, but the developer will have to write support for them into their programs – they don’t show up in the SAPI5 system.
- Conversely, SAPI5 voices cannot be used in Windows Store Apps, which can only use this Windows Mobile system.
- You can find the installed voices in the registry, under HKEY_CURRENT_USER > Software > Microsoft > Speech Virtual and in HKEY_LOCAL_MACHINE > Software > Microsoft > Speech_OneCore (duplicated in HKEY_LOCAL_MACHINE > Software > WOW6432Node > Microsoft > Speech_OneCore for 32-bit programs on 64-bit machines).
- There’s ALSO an online Bing Speech service, also called Microsoft Translator, now called the Bing Speech API in Microsoft Cognitive Services. This is a web service you call with text to get audio files back with the speech you requested. So it’s not something you use on your Windows machine (though a website or App might).
- Developers: the API you use in .Net is Windows.Media.SpeechSynthesis and the SpeechSynthesizer class. This works just like the .Net SAPI5 API, System.Speech.Synthesis, but it’s targeted at Windows Store Apps / Windows Runtime Apps / Universal Windows Platform Apps. So System.Speech.Synthesis is Desktop, and Windows.Media.SpeechSynthesis is Windows Runtime, and they are completely different systems. As of the Windows 10 Anniversary Update in 2016 you can access the Windows.Media.SpeechSynthesis API from desktop apps, so you should be able to access these voices, but you’ll have to write specific code for them: you can’t use the SAPI5 API.
- You CAN actually get the Mobile voices to work with SAPI5, but you have to hack the registry, and since I might end up trying to fix your computer when you’ve done this I’m not going to give you detailed instructions. Essentially, copy the Tokens registry key for the Mobile voice into the SAPI5 Voices key in HKLM > Software, do the same for 32-bit if necessary, and there you go!
- Finally, also lurking in the same speech system are the Cortana voices, used by Microsoft’s Cortana digital personal assistant. They are also described in the same registry key but don’t show up in Windows.Media.SpeechSynthesis. Hacking the registry will probably enable them.
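To see which Mobile voices are not visible to SAPI5 programs, you could compare the two sets of registry entries. The Python sketch below is read-only and makes a few assumptions: that both systems keep their voices in a Voices\Tokens subkey under HKEY_LOCAL_MACHINE, that a voice counts as "visible to SAPI5" when a key of the same name exists there, and that the helper names (which are invented here) suit your needs. It changes nothing in the registry.

```python
import sys

# Assumed token locations under HKEY_LOCAL_MACHINE for the two systems.
SAPI5_TOKENS = r"Software\Microsoft\Speech\Voices\Tokens"
ONECORE_TOKENS = r"Software\Microsoft\Speech_OneCore\Voices\Tokens"


def read_token_names(path):
    """Subkey names under HKEY_LOCAL_MACHINE\\path; [] when unavailable."""
    if sys.platform != "win32":
        return []  # no Windows registry on this platform
    import winreg

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
            subkey_count = winreg.QueryInfoKey(key)[0]
            return [winreg.EnumKey(key, i) for i in range(subkey_count)]
    except OSError:
        return []  # key missing or not readable


def missing_from_sapi5(onecore_names, sapi5_names):
    """Mobile voice keys with no same-named SAPI5 key (names compared case-insensitively)."""
    known = {name.lower() for name in sapi5_names}
    return sorted(name for name in onecore_names if name.lower() not in known)


if __name__ == "__main__":
    mobile_only = missing_from_sapi5(
        read_token_names(ONECORE_TOKENS), read_token_names(SAPI5_TOKENS)
    )
    print("Mobile voices not registered with SAPI5:", mobile_only)
```

Anything it prints is a voice that a SAPI5-only program would not be able to offer.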
Microsoft Speech Platform
This is a set of voices and yet another speech system, but for use on Microsoft Windows Server. They are intended to provide speech for servers, like automated voice menus when you call somewhere, that kind of thing. They aren’t intended for desktop programs.
- This has Microsoft-only voices, and they won’t work with SAPI5 programs (or Windows Store Apps). A program could use them if the developer writes the code for it: the NVDA screenreader has done so, so you can install Microsoft Speech Platform and use the voices in NVDA.
- You get the engine and the voices as installers from the Microsoft website, and install them on your server.
- Developers: this uses the Microsoft.Speech API in .Net.
- Microsoft Speech Platform on MSDN
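To pull the three systems together, here is a small Python lookup table built only from the descriptions above. The class and function names are invented for this summary and are not part of any Microsoft API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechSystem:
    name: str
    dotnet_namespace: str
    intended_for: str


# One entry per system described in this article.
SYSTEMS = [
    SpeechSystem("SAPI5", "System.Speech.Synthesis", "desktop applications"),
    SpeechSystem(
        "Windows Runtime (Mobile)",
        "Windows.Media.SpeechSynthesis",
        "Universal Windows Platform apps",
    ),
    SpeechSystem("Microsoft Speech Platform", "Microsoft.Speech", "server applications"),
]


def system_for_namespace(namespace):
    """Return the speech system a .Net namespace belongs to, or None."""
    for system in SYSTEMS:
        root = system.dotnet_namespace
        if namespace == root or namespace.startswith(root + "."):
            return system
    return None


if __name__ == "__main__":
    for system in SYSTEMS:
        print(f"{system.name}: {system.dotnet_namespace} ({system.intended_for})")
```

So if you see a program or library referring to Microsoft.Speech.Synthesis, for instance, the table tells you it is talking to the Speech Platform voices and not to SAPI5.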