View Full Version : Voice controllers for the Pocket PC


Menneisyys
09-21-2005, 01:20 PM
Voice controllers for the Pocket PC

Now that VITO has released version 1.1 of their Voice2Go software (VITO for short) and asked me to post an unbiased review of it, I've decided to compare it to its only alternative, Microsoft Voice Command (VC). I also introduce genuine, previously unpublished Voice Command hacks that fix the startup annoyance mentioned, for example, here (http://www.ipaqhq.com/forums/showthread.php?t=22361). This article is therefore worth reading even for VC users who aren't interested in the VITO app at all but suffer from VC's autostart behavior, especially when VC is installed on a memory card (which is not accessible at boot time; for some users, this has even made a hard reset necessary (http://www.aximsite.com/boards/showthread.php?t=93536)).

Isn't this an apples-to-oranges comparison, you may ask, if you already know the two apps. To a certain degree, it is, because these two programs have quite different feature sets/usage areas. However, as they have common functions (access contacts/start programs, for example), I still discuss them together.

Microsoft Voice Command (http://www.microsoft.com/windowsmobile/downloads/voicecommand/default.mspx) (tested version: 1.5) is especially useful for native (or at least very good) US/UK English, German and French speakers that don't want to fuss with training and also want direct access to their contacts/media files/programs, by just telling the Pocket PC their name. It, however, doesn't let the user run macros / create custom passwords.

Pros:
- very good recognition rate – it was able to understand my non-native (!) English about 90% of the time, with all the tested applications
- clearly better than VITO in high-noise environments, particularly with abbreviations. For example, it was almost always able to recognize 'TCPMP', with all letters spelled out, even in a very noisy environment. On the other hand, you need to spell out either all the letters of an abbreviated word or none of them. For example, it won't recognize SKTools if you say es-key-tools, only if you say sktools. This may be a minor annoyance.
- generally very fast – the VITO app is noticeably slower to start capturing input after you press its hotkey
- as with the VITO app, start of recognition is assignable to a button
- very good at recognizing long family names not pronounced in the English way. For example, it could find most of my (long) Finnish test family names without problems, and I didn't even try to pronounce them in a non-Finnish way. Great! Note that this only applies to long Finnish family names (I haven't tested other languages). With some short(er) Finnish first names, it fared worse: it has never been able to recognize, for example, 'Tiina' (pronounced 'tee-nah', for non-Finnish speakers) and always offered 'Steve' instead. The German version may be better suited than the English one to 'clearly and evenly pronounced, WYWIWYS (what you write is what you say)', non-Indo-European languages like Finnish or Hungarian; I haven't tested this myself, as I've found the English version sufficient for searching my Finnish contacts database.
- You won't see any kind of speed degradation if you install it on an alternative medium (for example, a memory card), unlike with the VITO app.
- You don't need to train the app to be able to recognize newly-copied media files/ installed apps/added contacts on the Pocket PC.

Cons:
- always occupies at least about 4 Mbytes of RAM after the first start, and this increases if you have many contacts/multimedia files. According to the System Requirements page (http://www.microsoft.com/windowsmobile/downloads/voicecommand/sysreq.mspx), its dynamic memory consumption can easily be as high as 7 Mbytes with 500 contacts and 100 multimedia files on the PDA. The VITO app is much better at this: its dynamic memory consumption doesn't increase with the number of contacts/multimedia files.
- always creates a Voice Command.lnk file in \Windows\Startup; you need to manually hexedit voicecmd.exe to get rid of this 'feature' – see the next subsection.

How to get rid of the autostart annoyance?

1, get a hexeditor. Please read this thread (http://www.firstloox.org//forums/showthread.php?p=35348) on obtaining / using them when editing files on the PDA.
2, navigate to the install directory of VC (for example, \Program Files\Microsoft Voice Command US PPC trial 1.50 or SD-MMCard\Microsoft Voice Command US PPC trial 1.50 with the trial 1.50) and copy voicecmd.exe from there to the desktop PC. If the file is not readable, delete \Windows\Startup\Voice Command.lnk, reset the device and try again.
3, change the hex value 56 (the letter V) to 00 at hex offset 30e0, as in the following two screenshots:

before (http://www.winmobiletech.com/kuvat/VoiceCommandHack-1.png)

after (http://www.winmobiletech.com/kuvat/VoiceCommandHack-2.bmp.png)
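
For those who would rather script the edit than use a hex editor, the one-byte change from step 3 can be sketched in Python on the desktop PC. This is only a minimal sketch: the offset and byte values are the ones given in step 3 for the tested build and may well differ in any other build, and the names patch_image and patch_file are made up for illustration. The sketch refuses to touch a file in which the byte at that offset is not the expected one.

```python
from pathlib import Path

# Values taken from step 3 above. They were reported for the tested VC build
# only; treating them as valid for any other build is an assumption.
PATCH_OFFSET = 0x30E0
ORIGINAL_BYTE = 0x56  # ASCII 'V'
PATCHED_BYTE = 0x00


def patch_image(data: bytes) -> bytes:
    """Return a copy of data with the byte at PATCH_OFFSET set to PATCHED_BYTE."""
    if len(data) <= PATCH_OFFSET:
        raise ValueError("file is too short to contain the patch offset")
    current = data[PATCH_OFFSET]
    if current == PATCHED_BYTE:
        return bytes(data)  # already patched, nothing to do
    if current != ORIGINAL_BYTE:
        raise ValueError(
            f"unexpected byte 0x{current:02x} at offset 0x{PATCH_OFFSET:x}; "
            "refusing to patch"
        )
    patched = bytearray(data)
    patched[PATCH_OFFSET] = PATCHED_BYTE
    return bytes(patched)


def patch_file(source: Path, destination: Path) -> None:
    """Hypothetical helper: write a patched copy and leave the original untouched."""
    destination.write_bytes(patch_image(source.read_bytes()))
```

Calling patch_file(Path("voicecmd.exe"), Path("voicecmd_patched.exe")) would leave the original file as a backup; the patched copy would then still have to be renamed and copied back to the device by hand.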

VITO Voice2Go 1.1 (http://vitotechnology.com/en/products/voice2go.html) is an entirely different animal. It can't directly invoke/choose media files/contacts; it's only for actually controlling your device.

It doesn't offer 100% recognition. For example, I've almost never managed to get it to recognize 'ahvenanmaa' (a Finnish county, before you ask). It was one of the (Finnish-only) test words I used; I haven't used other languages in this test.

Pros:
- especially good for non-native US/UK English, German and French speakers (albeit, as has already been pointed out, even a non-native English speaker can use even the US version of VC without major problems).
- although continuous 'magic word' listening takes 20-30% of the CPU time, it's still a clear advantage over Voice Command, which has to rely completely on button-based activation
- macro defining and contact calling capabilities, known from other VITO products
- can be used to answer/dial a BT/serial/IrDA-connected phone. (I haven't managed to make it work with my t610 though – I used BT. I haven't spent much time on this, however – with some serious hacking, I most probably would have been able to do this.)

Cons:
- may be slow at runtime, even if you install it to RAM, in part because you can't decrease the listening time (see next bullet)
- you can't set/decrease the 'listening' time frame if you, for example, prefer very short commands. This certainly degrades responsiveness and speed of access.

You can find a comparison table here (http://www.winmobiletech.com/sekalaiset/VCvsVITOVoice2GoTable.html).

Bottom line:
- if you have tons of contacts/media files, get VC. It is surprisingly good even for non-native speakers.
- if you want to invoke complex macro functions (of which only VITO's macro apps and, incidentally, the Mort apps (http://www.pocketpcthoughts.com/forums/viewtopic.php?p=356534) are capable), get the VITO app.

(Even negative!) feedback/questions are welcome.

Tye
09-21-2005, 07:46 PM
1. By auto start annoyance, do you mean the fact that it always auto starts and takes up a lot of memory? Or is it something else? I've only used the demo.

2. Could Voice Command be made to run nscriptm scripts?

3. What about Fonix Voice Central: http://pocketgear.com/software_detail.asp?id=4659

Phillip Dyson
09-21-2005, 08:19 PM
Thanks for the article.

Actually both the Toshiba e830 and Eten m500 come with voice control apps. Though I admit since they don't seem to be sold commercially they don't really apply.

Menneisyys
09-21-2005, 08:34 PM
I've received (see for example http://www.pocketpcthoughts.com/forums/viewtopic.php?p=366177 and, most importantly, http://www.aximsite.com/boards/showthread.php?p=834172 ) several questions about the VC annoyance. I'll elaborate a bit more on it.

There're several problems with VC starting automatically (from \Windows\Startup):

1, you won't be able to install the application on a storage card because, at boot time, the card may still be unavailable, and it's at boot time that the contents of \Windows\Startup are executed. If you install VC (or any other app loaded at boot time) on a storage card, you will most probably get an error message during the boot. This is not only annoying but can, in some cases, even make booting impossible.

This means that unless you stop VC from re-registering itself for auto-loading (or uncheck the Enabled checkbox in Settings / Voice Command each time you finish using it or before rebooting), you will only be able to install it in main RAM or the built-in File Store, which isn't a particularly good idea on devices with little RAM/File Store memory, like most previous-generation (WM2003) devices.

2, not only will you be forced to install the application using scarce static (storage) memory resources (RAM or the built-in File Store ROM), but also VC will take up at least 4 Mbytes of precious (dynamic) RAM even when you don't plan to use it at all. Why waste 4-7 Mbytes of dynamic RAM all the time on an app that you don't even want to use?

(Again: 4 Mbytes is the minimal memory requirement with very few contacts defined; with hundreds of contacts, this can be considerably higher – even 6-7 Mbytes! Think of running it on the space-constrained rz1715 or a 64M RAM WM5 device like the Dell Axim x51v!)

This is why it's advantageous to get rid of the automatic recreation of \Windows\Startup\Voice Command.lnk entirely.

Again, note that even when you disable the application in Settings / Voice Command, it will recreate the file in Startup when you start to use it again.

Menneisyys
09-21-2005, 08:52 PM
1. By auto start annoyance, do you mean the fact that it always auto starts and takes up a lot of memory? Or is it something else? I've only used the demo.

As this is a frequently asked question, I've elaborated on it a bit in a separate post - please read my previous one.

2. Could Voice Command be made to run nscriptm scripts?

Yes - just put the appropriately named .lnk files in \Windows\Start Menu\Programs.

3. What about Fonix Voice Central: http://pocketgear.com/software_detail.asp?id=4659

Thanks for the tip; I'll post an updated roundup, also including Fonix Voice Central (or an add-on review).

Menneisyys
09-22-2005, 07:52 AM
OK, here's an additional review of Fonix VoiceCentral 3.0 (http://www.fonixspeech.com/pages/voicecentral.php), as promised, with some genuine DLL relocation information (which frees up about 1 Mbyte of storage RAM).

This application is pretty similar to VC in that it doesn't need any kind of training to access Start Menu programs or contacts, unlike the VITO app. However, there're a lot of differences too.

First, it has extensively configurable speech synthesis capabilities: you can speed up reading to get rid of the default, slow explanations/lists, which can be really annoying at times (unless you press the hardware button assigned to the application again to stop it). It remains intelligible up to a speed of 70-72 at most. You can also choose other voices/modify the pitch as shown below:

http://www.winmobiletech.com/kuvat/FonixVC-1.gif.png

I've also tested its speech synthesis capabilities with specially pronounced words like 'adobe' (aedoubi', as opposed to aedoub) and vehicle (vi:'ikl, as opposed to vi:'haikl). It was able to pronounce both words correctly. It seems the application also has a dictionary of specially pronounced words (see \Fonix\fre\dt01a\usenglish\dtalk_us.dic; it even contains the word 'adobe' in ASCII if you take a look at it with a file viewer).

Incidentally, as can also be seen in the screenshot above (see the Menn speaker!), you can easily 'hack' "new" voices (you can only change the pitch and whether the voice is male or female in the configuration file) into new subdirectories of \fre\dt01a\usenglish\: just create a new directory there and copy a snddat.fmt file into it from some other, pre-defined subdirectory of \fre\dt01a\usenglish\.

My main problem with VITO, the inability to set the maximal recognition time, is also a non-issue here: the Fonix application supports fine-tuning these values.

http://www.winmobiletech.com/kuvat/FonixVC-2.gif.png

It, unlike MS VC (and like VITO), also allows for building your own speech commands:

http://www.winmobiletech.com/kuvat/FonixVC-3.gif.png

This is far inferior to VITO's built-in capabilities, though (unless you use a third-party macro-capable application like VITO ButtonMapper (http://vitotechnology.com/en/products/buttonmapper.html)): all you can do here is define aliases for Start Menu programs that are accessible anyway. The only addition is the ability to also pass pre-defined parameters to them, for example, URLs to Pocket Internet Explorer, a given song to Windows Media Player etc. You can achieve the same with MS VC too, however, if you just create the appropriate .lnk file in \Windows\Start Menu\Programs, passing any parameter to the chosen application. That is, this isn't a 'killer', unmatched feature of the Fonix app either.
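
To illustrate the .lnk trick, here is a rough Python sketch that builds the text of such a shortcut on the desktop. It rests on an assumption not spelled out in this thread: that Pocket PC shortcuts are plain-text files of the form "character count, then #, then the command line". The function name build_ce_shortcut and the program path and URL in the example are made up for illustration, so please check the result on your own device before relying on it.

```python
def build_ce_shortcut(target: str, arguments: str = "") -> str:
    """Build the text of a Pocket PC .lnk file (assumed format: <count>#<command line>).

    The target is wrapped in double quotes so that paths containing spaces
    survive; the count is the number of characters after the '#'.
    """
    command = f'"{target}"'
    if arguments:
        command = f"{command} {arguments}"
    return f"{len(command)}#{command}"


if __name__ == "__main__":
    # Hypothetical example: a shortcut that opens a URL in Pocket Internet Explorer.
    # Both the executable path and the URL are placeholders.
    print(build_ce_shortcut(r"\Windows\iexplore.exe", "http://www.example.com/"))
```

The resulting one-line text would be saved as a .lnk file, named the way you want to say it, in \Windows\Start Menu\Programs.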

Like the MS VC app, it can't continuously monitor the sound input; you need to start it either from a menu or with a hardware button assigned to it. Recognition then starts a bit more slowly than with the MS VC app, because, unlike VC, it isn't resident in memory. This has both good (no memory consumption unless it's absolutely necessary) and bad (slower to load) aspects.

However, even when run from a storage card, it delivered tolerable results in the Messaging/Inbox start time test, far better than those of VITO. With the device downclocked to 104 MHz, it took MS VC 3 (three) seconds (after pressing its hotkey) to launch Messaging. With the Fonix app, this took 10 seconds when installed on a storage card (with 2 sec of max. record time and using the "most accurate" speech recognition). Finally, with the VITO app, it took at least 15 seconds, even when installed to RAM, and not counting the obligatory Correct/Wrong choice after the recognition, which also takes some time (again, VITO sometimes, in 10-20% of cases, displays the initial dialog far faster than the usual 8 seconds). (Incidentally, all three applications are able to process sound at 104 MHz, so you won't run into problems with them even on low-end devices like the iPAQ rz1715. It's another question that you may end up not having any free RAM after starting MS VC on that Pocket PC, particularly if you have many contacts.)

Finally, the most important aspect: recognition accuracy of the Fonix application. I've found it definitely worse than that of MS VC, under exactly the same circumstances, even when I set the Speech Recognition slider of the Fonix app in the Main settings tab to the most accurate position. It consistently delivered bad results with my spoken English (which, again, is not native!): always kept launching the wrong application/quitting when, with exactly the same input, MS VC was correct in most cases.

Unfortunately for non-English speakers, there still aren't any non-English versions of the application (unlike MS VC, which has German and French versions too).

Another annoyance with the application is that it will read the headers of all your mail even when there is no mail in your Inbox/Messaging at all. Unfortunately, unlike the (lengthy) reading of available speech commands, this can't be stopped by a hardware button press; you must tap the screen to stop it. This should be fixed by the Fonix people as soon as possible!

Some technical data: it doesn't take any dynamic memory between invocations, unlike the other two applications (especially MS VC). Its static memory consumption is 1.5M for the application itself (which can be installed anywhere) plus about 1 Mbyte in five Fonix*/Fnx* DLL files in \Windows (and a 40k help HTML file). You can safely relocate the DLLs, even onto a storage card, into the home directory of the app (the directory where VoiceCentral.exe is located); they don't need to be in the System Path. These DLLs are as follows:


06/18/2004 02:16 PM 193,536 FnxAsrCoreCE41.dll
01/07/2005 06:16 PM 39,936 FonixCommon40CE.dll
01/07/2005 06:16 PM 27,648 FonixTts40CE.dll
01/14/2005 11:17 AM 13,824 FonixTtsDt40CE.dll
01/07/2005 12:58 PM 764,416 FonixTtsDtUs40CE.dll
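
As a quick sanity check of the "about 1 Mbyte" figure, the sizes from the listing above can simply be added up. The snippet below is only a convenience sketch; the byte counts are copied from the listing and nothing else is assumed.

```python
# File sizes in bytes, copied from the directory listing above.
dll_sizes = {
    "FnxAsrCoreCE41.dll": 193_536,
    "FonixCommon40CE.dll": 39_936,
    "FonixTts40CE.dll": 27_648,
    "FonixTtsDt40CE.dll": 13_824,
    "FonixTtsDtUs40CE.dll": 764_416,
}

total_bytes = sum(dll_sizes.values())
total_mib = total_bytes / (1024 * 1024)

print(f"{total_bytes:,} bytes = {total_mib:.2f} MiB")
# prints: 1,039,360 bytes = 0.99 MiB
```

So relocating all five DLLs frees almost exactly 1 Mbyte in \Windows, most of it (about three quarters) from FonixTtsDtUs40CE.dll alone.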


Bottom line: I'd prefer MS VC to this application, mostly because it had problems with my spoken English. For native English speakers, though, it may (no guarantees!) work better. For the same price ($40), the MS VC application offers much better performance (much faster at spoken input, much better recognition etc).

Menneisyys
01-28-2006, 08:27 PM
In the meantime, Voice2Go version 1.2 has been released with reduced delay. See http://www.pocketpcmag.com/blogs/index.php?blog=3&title=voice_controller_app_vito_voice2go_versi&more=1 for more info.