View Full Version : TUTORIAL: PPC spellchecker/word completion dictionaries


Menneisyys
09-17-2005, 09:27 AM
Everything you wanted to know about the built-in Pocket PC spellchecker/word completion dictionaries

In this article, I explain how/where the Word spellchecker and the generic word completion dictionaries are stored and what you need to know about changing/extending them.

First, the Word spellchecker dictionaries and the system-wide, generic word completion dictionaries (system-wide meaning they are accessed from all PPC applications, not just the spellchecker module of Pocket Word) are completely independent of each other. This means that if you, for example, change the language of one of them, the other will still stay English (or, to put it more precisely, the language of your Pocket PC). (This, incidentally, also applies to the additional, custom dictionary files, the subject of section 4. For example, your built-in dictionary may contain English words, while your custom one contains Finnish only.)

Therefore, I completely separate the two cases.

First, I elaborate on common stuff like getting and changing these dictionaries (which means acquiring/copying files), which applies to both of these dictionaries. Only after this do I turn to, first, the Word dictionaries and, finally, the generic word completion dictionaries.

1. Generic overview of replacing the system dictionaries

No matter which of the two dictionaries you want to replace (in order to change their language), there are some common things that apply to both of them.

By default, the Windows Mobile operating system only has dictionaries in one language – the language the entire operating system is in.

This means you need to install additional/other language dictionaries by hand.

While doing this, you generally (except in the case described in section 2.3.2 Editing the Registry) make use of 'shadowing'. This word denotes that you can disable a file in ROM (the pre-installed dictionaries are all installed in ROM) by copying a file of the same name into the same directory, in RAM. Then, the RAM version will become the default one, disabling the one in ROM. This is what the dictionary language change is based on.

For example, as far as Word is concerned, depending on the language of your device, there will be a mssp2_< language code >.lex file on your Pocket PC, in the \Windows directory, in the ROM. Here, < language code > is, for example, ge, en, fr, es, it – you may have already guessed that the first stands for German, the second English and so on. If you, for example, have a Pocket PC with German Windows Mobile, this file will be a German spellchecker dictionary and, therefore, will be called mssp2_ge.lex. If you have a French Pocket PC, you'll have a mssp2_fr.lex dictionary file and so on. If you just copy over a dictionary file of a different language under that same name, the latter will become the default and all subsequent spellchecking will use it, instead of the one in ROM.

System-wide word completion dictionaries, on the other hand, are kept in the files dictprob.dat and statdict.dat, also in ROM and also in the \Windows directory. You can also copy dictprob.dat and statdict.dat files of another language over the ones in ROM; after that, all word completion suggestions will be offered in the new language.

These files are all compatible with any other Windows Mobile 2003 and Windows Mobile 2003 SE device. This means you can use a Windows Mobile 2003 SE dictionary with any other Windows Mobile 2003 SE or Windows Mobile 2003 device, and vice versa. This certainly helps in acquiring dictionaries in different languages.

The fact that the spellchecker dictionary files can be freely distributed (ChDict.NET, to be introduced later, also contains them!) also means getting and using spellchecker dictionaries of a different language is not an infringement of the law (otherwise, ChDict.NET couldn't exist) – therefore, you can freely get/install any dictionaries. To my knowledge, anyway (again, I point at ChDict.NET).

1.1 How do I get dictionary files in different languages?

If you plan to replace the Word spellcheck dictionary, you are lucky – go to the Word-specific (2nd) section (2. The Pocket Word Spellcheck Dictionary) below. If the ways of getting the dictionary I've explained there don't work (which is highly improbable), you may want to come back here and do everything the 'hard way' – that is, the default way you must do everything if you want to replace the default word completion dictionary.

To do everything "the hard way", you'll need to get a ROM upgrade image. You may download one for your particular device type and play around with it – DumpRom (to be explained later) will most probably be able to read its .nbf ROM files too. To be absolutely safe, if you want to get the German, English or the French dictionary (there's no Italian/Spanish upgrade), however, use the Pocket Loox 720 ROM upgrades (http://www.fujitsu-siemens.com/support/linkapplication.html?LNG=EN&ProductID=3098). I've tested the (standard!) dictionary files in them, and they work great on other WM2003(SE) PDAs too – this is why I'm recommending these upgrades. Click the Select All checkbox in the column on the left and click Go. On the next screen, choose the right download under Operating Systems Tools.

Start the setup utility after the download. It'll decompress the .nbf file (that's the file you'll need to run DumpRom on) into c:\fsc.tmp\PocketLOOX7xx\FlashUpdateV713\OSV_10009F.nbf (the French version), to c:\fsc.tmp\PocketLOOX7xx\FlashUpdateV712\OSV_10009U.nbf (the English version) or to c:\fsc.tmp\PocketLOOX7xx\FlashUpdateV711\OSV_10009G.nbf (the German version).

After this, get DumpRom (http://wiki.xda-developers.com/wiki/DumpRom) and extract dumprom.exe from the ZIP archive. This utility extracts individual files from .nbf ROM images. Copy it to the above-listed target directory and issue the

dumprom.exe -4 -d . OSV_10009G.nbf

command (for the German ROM), or

dumprom.exe -4 -d . OSV_10009U.nbf

command (for the English ROM), or

dumprom.exe -4 -d . OSV_10009F.nbf

(for the French ROM).
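
If you don't want to retype the three command lines, they can be generated from a small table. This is only a sketch: the image names and the -4 -d . flags are copied from the commands above, while the function name dumprom_command and the table ROM_IMAGES are made up for illustration. It only builds the argument list; it does not run DumpRom.

```python
# Language code -> ROM image name, exactly as listed for the
# Pocket Loox 720 upgrades above. (Table name is hypothetical.)
ROM_IMAGES = {
    "ge": "OSV_10009G.nbf",  # German
    "en": "OSV_10009U.nbf",  # English
    "fr": "OSV_10009F.nbf",  # French
}


def dumprom_command(lang):
    """Return the dumprom argument list for the given language code."""
    if lang not in ROM_IMAGES:
        raise ValueError("no ROM upgrade listed for language %r" % (lang,))
    # Same flags as the hand-typed commands: -4 -d . <image>
    return ["dumprom.exe", "-4", "-d", ".", ROM_IMAGES[lang]]


if __name__ == "__main__":
    for code in sorted(ROM_IMAGES):
        print(" ".join(dumprom_command(code)))
```

The returned list can be handed to subprocess.run() from the directory holding the .nbf file, or simply printed and pasted into a command prompt.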

All files from the ROM will be extracted in the current directory. You'll only need to find mssp2_< language code >.lex from there (if you need the Word spellcheck dictionary) or the dictprob.dat and statdict.dat files (for the completion dictionary).
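
A ROM dump contains a lot of files, so picking out the three interesting ones by hand is tedious. The following is a minimal sketch for doing it with a script; the file name patterns come from the text above, while the function name find_dictionaries and the case-insensitive matching are my own assumptions (I haven't checked how DumpRom cases the extracted names).

```python
import fnmatch
import os

# Patterns taken from the text above, written in lower case because
# names are lower-cased before matching.
WANTED = ["mssp2_*.lex", "dictprob.dat", "statdict.dat"]


def find_dictionaries(dump_dir):
    """Return the sorted paths under dump_dir whose names match WANTED."""
    hits = []
    for root, _dirs, files in os.walk(dump_dir):
        for name in files:
            lowered = name.lower()
            if any(fnmatch.fnmatchcase(lowered, pat) for pat in WANTED):
                hits.append(os.path.join(root, name))
    return sorted(hits)


if __name__ == "__main__":
    # "." is the directory dumprom.exe extracted into.
    for path in find_dictionaries("."):
        print(path)
```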

(Incidentally, you may also want to read this article (http://www.firstloox.org//forums/showthread.php?t=4774) on other, practical uses of DumpRom.)

The extracted files must then be copied to your PDA. To do this, navigate to \Windows (preferably with a desktop-based tool like Total Commander (TC) + its WinCE FS plug-in (http://pocketpcmag.com/forum/topic.asp?TOPIC_ID=15577)) and just copy (F5 in TC) the needed files there.

2. The Pocket Word Spellcheck Dictionary

This file is located in the \Windows directory and is named mssp2_< language code >.lex, as has already been pointed out in Section 1.

2.1 Acquiring/using English and Spanish dictionaries - ChDict.NET

First, if you need to use either English or Spanish dictionaries (no German/French/Italian!) on your PDA, probably the easiest way to acquire them is getting ctitanic's (author of Tweaks2k2) ChDict.NET (http://www.freewareppc.com/educational/chdictnet.shtml), installing and using it. It contains the Spanish and English dictionaries. They can even be dynamically switched.

If you don't want to use its switching capabilities and only need one dictionary of the two (to avoid the other taking up precious RAM), just set the given dictionary to be the default and you're all set. You can even remove the other dictionary (for example, \Windows\mssp2_es.lex, if you only need the English dictionary because, say, you have a German or a French device and you need to spellcheck English documents) from \Windows (and \Program Files\ChDict .NET\ChDict.exe). Do NOT, however, "officially" uninstall the entire application because it'll also remove both dictionaries – even the one you're currently using!

If you plan to use ChDict.NET, you may stop reading (unless you also want to change the language of your word completion dictionary or want to know more about custom, user-defined dictionaries) – this is all you need to know.

If you don't want to install ChDict.NET but want to use the dictionaries in it, you should do the following. After decompressing the ZIP file, there will be a chdict.Arm 1100 (4K) v3.00.CAB file inside. Most probably, you'll be able to step into it right away if you use, say, Total Commander, on the desktop. Just extract the dictionary file of your choice – if you need the English dict., extract mssp2_en.002; if you need the Spanish one, extract mssp2_es.003. After doing this, rename its/their extension (002/003) to .lex. These files must then be manually transferred to the PDA, however – see below.
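
The extension renaming can also be scripted on the desktop. A small sketch, assuming the two files have already been extracted from the CAB into one folder; the file names are the ones given above, and the function name rename_extracted is hypothetical.

```python
import os

# Names as they appear inside the ChDict.NET CAB (per the text above),
# mapped to the names Pocket Word expects.
CAB_NAMES = {
    "mssp2_en.002": "mssp2_en.lex",  # English
    "mssp2_es.003": "mssp2_es.lex",  # Spanish
}


def rename_extracted(folder):
    """Rename any extracted CAB dictionary in folder to its .lex name.

    Returns the sorted list of new names; files that are not present
    are silently skipped, so extracting only one dictionary is fine.
    """
    renamed = []
    for old, new in CAB_NAMES.items():
        src = os.path.join(folder, old)
        if os.path.isfile(src):
            os.replace(src, os.path.join(folder, new))
            renamed.append(new)
    return sorted(renamed)
```

Calling rename_extracted on the extraction folder leaves the .lex file(s) in the same place, ready for the manual transfer to the PDA.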

2.2 Acquiring non-English/Spanish dictionaries

If you need a dictionary of a different language (French/German/Italian etc), you must do some manual work.

First, you must acquire a dictionary file. You can do this by extracting it from a ROM image (see section 1.1) or by transferring it from another PDA that has an operating system in the needed language.

If you want to go the second way, all you have to do is the following: on the PDA that has the needed language, navigate into \Windows (preferably with a desktop-based tool like the above (section 1.1)-mentioned Total Commander (TC) + its WinCE FS plug-in (http://pocketpcmag.com/forum/topic.asp?TOPIC_ID=15577)), look for the file named mssp2_< language code >.lex and copy (F5 in TC) it to the desktop. Please note that you may need to exit Word on the Pocket PC for this file to be readable.

If you don't have any PDA with the needed language (and English/Spanish – see ChDict.NET – aren't your choice either), go back to section 1.1 and extract the needed file from a ROM image, using DumpRom.

2.3 Making your PDA use the new Word spellcheck dictionary

There are two ways of doing this: the first is shadowing the default one, which involves renaming the new dictionary file; the second is editing the Registry to point to the new one.

2.3.1 Shadowing the default dictionary

By default, depending on the language of your device, there will be a mssp2_< language code >.lex file in \Windows. If you, for example, have a Pocket PC with German Windows Mobile, this file will be a German spellchecker dictionary and, therefore, will be called mssp2_ge.lex. If you have a French Pocket PC, you'll have a mssp2_fr.lex dictionary file and so on.

If you just 'shadow' this dict. file (that is, just copy a dict. file with the same name in \Windows), the latter file will become the default spellchecker dictionary. This means if you have a French Pocket PC and want to use, say, the German dictionary, after acquiring the latter (mssp2_ge.lex), you need to rename it to mssp2_fr.lex and copy it to \Windows. This is needed because, by default (unless you instruct it to do otherwise by modifying the registry – see section 2.3.2), the system will always access the dictionary of the name mssp2_fr.lex, totally independent of its contents (which may be of any other language).
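
The rename-then-copy step is easy to get backwards, so here is a minimal sketch of it. It only prepares the renamed file in a staging folder on the desktop; copying it into \Windows on the PDA is still done by hand. The function names shadow_name and stage_shadow_copy and the staging folder are hypothetical; the mssp2_< language code >.lex naming is the one described above.

```python
import os
import shutil


def shadow_name(device_lang):
    """Name the device looks for, e.g. 'fr' -> 'mssp2_fr.lex'."""
    return "mssp2_%s.lex" % device_lang


def stage_shadow_copy(new_dict_path, device_lang, staging_dir):
    """Copy new_dict_path into staging_dir under the DEVICE's dictionary name.

    The source file keeps its own name and contents; only the copy is
    renamed, because the system opens the file by name regardless of
    the language inside it.
    """
    os.makedirs(staging_dir, exist_ok=True)
    target = os.path.join(staging_dir, shadow_name(device_lang))
    shutil.copyfile(new_dict_path, target)
    return target
```

For the example above (French device, German dictionary), stage_shadow_copy("mssp2_ge.lex", "fr", "to_pda") produces to_pda/mssp2_fr.lex containing the German data.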

2.3.2 Editing the Registry

You can avoid renaming (and shadowing) files if you just edit the registry value HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Spell Check\Main_Dict to point to the spellchecker file of your choice. The file can then be located anywhere and won't need to be renamed. (Please read my roundup of registry editors (http://pocketpcmag.com/forum/topic.asp?TOPIC_ID=16508) if you want to go this way but don't know which editor to choose.)
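
If your registry editor can import desktop-style .reg files, the change can be prepared as a text fragment instead of being typed on the device. This is a sketch under several assumptions that you should verify before importing anything: that Main_Dict is a plain string value under the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Spell Check, that your editor accepts the REGEDIT4 text format, and that the example path exists on your device. The function name reg_fragment is hypothetical.

```python
# Key that holds the Main_Dict value, as given in the text above.
# ASSUMPTION: Main_Dict is a string value under this key.
KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Spell Check"


def reg_fragment(dict_path):
    """Return REGEDIT4-style text that sets Main_Dict to dict_path.

    In .reg files, backslashes and double quotes inside string data
    are escaped with a backslash.
    """
    escaped = dict_path.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "REGEDIT4",
        "",
        "[%s]" % KEY,
        '"Main_Dict"="%s"' % escaped,
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example path only; point it at wherever you copied the .lex file.
    print(reg_fragment(r"\Storage Card\mssp2_ge.lex"))
```

Back up the current value before changing it, so you can return to the built-in dictionary.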

3. The system-wide word completion dictionaries

Unfortunately, unlike the Word spellcheck dictionaries, these files are far harder to acquire. For example, they are always locked (unlike with Word) and, therefore, can't be copied from other PDAs. Therefore, you'll end up extracting the files needed, dictprob.dat and statdict.dat, from ROM upgrade .nbf images. Please read section 1.1 on this.

After extracting these two files, they must be copied to \Windows. (Please note that, after this, they become locked by compime.dll and you'll have a hard time deleting them, unlike the Word spelling dictionaries, which can be deleted/swapped almost any time. If you need more information on removing locked, user-installed dictprob.dat and statdict.dat files, let me know and I'll elaborate on this subject later.)

Unfortunately, I don't know the internal format of these two files. statdict.dat has a directory at the beginning; perhaps that'd be a good starting point in reverse engineering. Indeed, it'd be great to be able to use custom word completion dictionaries of small languages ignored by Microsoft. Please see some (for example, Finnish) of my custom dictionaries, along with the optimizing converters (sources, as always, included!) here (http://www.firstloox.org//forums/showthread.php?t=4200).

4. Custom Dictionaries

Both the Word spellcheck dictionary and the system-wide, generic word completion dictionary engines support user dictionaries. In this section, I scrutinize them.

4.1 The Word Spellcheck Custom Dictionary

When you spellcheck a document in Pocket Word and choose Add in the spell checker menu, the new word will be added to the file \Windows\custom.dic.

Please note that this file is in ASCII and has CRLF-separated (that is, simple Windows Enter) words. This means you can just copy (without any need for conversion) c:\Documents and Settings\< username >\Application Data\Microsoft\Proof\CUSTOM.DIC from your desktop to your \Windows on your PDA; after this, Pocket Word will be able to use the new dictionary.
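
If you'd rather build a custom.dic from a word list than copy the desktop one, the format is simple enough to write by hand. A minimal sketch, assuming one word per line, pure ASCII, and a CRLF after every word including the last (the trailing CRLF is my assumption; the text above only says the words are CRLF-separated). Dropping duplicates is my own addition, and the function name build_custom_dic is hypothetical.

```python
def build_custom_dic(words):
    """Return custom.dic content as ASCII bytes, one word per CRLF line.

    Blank entries and duplicates are skipped. A word containing
    non-ASCII characters raises UnicodeEncodeError, since the file
    is described as plain ASCII.
    """
    seen = set()
    kept = []
    for word in words:
        word = word.strip()
        if not word or word in seen:
            continue
        word.encode("ascii")  # validation only; raises on non-ASCII
        seen.add(word)
        kept.append(word)
    return "".join(w + "\r\n" for w in kept).encode("ascii")


if __name__ == "__main__":
    with open("custom.dic", "wb") as out:
        out.write(build_custom_dic(["Loox", "DumpRom", "ChDict"]))
```

The file is opened in binary mode so that Python doesn't translate the line endings a second time on Windows.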

4.2 The Generic Word Completion Custom Dictionary

Unfortunately, unlike the Word spellcheck custom dictionary, the two files used with the generic (and, therefore, much more useful) dictionary, dyncompdict.tmp and dyncompdict.dat, aren't as easy to transfer.

First, they have their own, binary format, unlike Word's \Windows\custom.dic; consequently, you must use a custom converter to convert your, say, custom dictionaries into the new format.

Second, its size is limited to 9000 bytes. If you, for example, convert a file that is bigger than 9000 bytes, it'll simply be ignored by the system. This is an annoyance with the Pocket PC/Windows Mobile operating system that should be fixed by Microsoft as soon as possible.
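
Since an oversized file is ignored without any warning, it's worth checking the size on the desktop before copying the converted file over. A tiny sketch; the 9000-byte figure is the one quoted above, while the names MAX_BYTES and fits_limit are made up for illustration.

```python
import os

MAX_BYTES = 9000  # limit quoted in the text above


def fits_limit(path, limit=MAX_BYTES):
    """Return (ok, size); ok is False when the file is bigger than limit."""
    size = os.path.getsize(path)
    return size <= limit, size


if __name__ == "__main__":
    # Assumes a converted dyncompdict.dat sits in the current directory.
    ok, size = fits_limit("dyncompdict.dat")
    print("%d bytes: %s" % (size, "OK" if ok else "too big, will be ignored"))
```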

Please read this thread (http://www.pocketpcthoughts.com/forums/viewtopic.php?t=41049) for more information on the format of these custom dictionaries and for my conversion/optimization (so that you can stuff in as many really different words as possible into the 9000 bytes) utilities. (Sorry for the constant linking – I don't want to repeat tens of thousands of already-written characters here.)

Comments (even negative comments!) are, as usual, welcome.

marisa8184
07-28-2007, 09:04 AM
Re:

4.2 The Generic Word Completion Custom Dictionary

dyncompdict.tmp and dyncompdict.dat

It appears that the system has changed under WM6 (Professional). Do you know where this word completion is now kept?

vicott
08-11-2007, 07:06 AM
From what I understand, these are the relevant filenames:
1) statdict.dat -> statdict.0409.dat
2) dictprob.dat -> dictprob.0409.dat
3) autocorrect.txt -> autocorrect.0409.dat
4) custom.dic -> custom.dic
5) mssp2.lex -> mssp2.lex
6) dyncompdict.dat (or .tmp) -> ???

From what is seen, "0409" probably refers to the locale.
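
If that guess is right, the four hex digits would be a Windows locale identifier (LCID); 0409 is the LCID for English (United States). A small sketch for reading it out of such a file name follows. The LCID values in the table are standard Windows ones, but that WM6 names its files this way for other languages is only an assumption, and the function name locale_of is hypothetical.

```python
import re

# A few standard Windows locale identifiers (LCIDs), in hex.
KNOWN_LCIDS = {
    "0407": "German (Germany)",
    "0409": "English (United States)",
    "040B": "Finnish (Finland)",
    "040C": "French (France)",
    "0410": "Italian (Italy)",
}


def locale_of(filename):
    """Return (lcid_hex, description) for names like statdict.0409.dat.

    Returns None when the name has no four-hex-digit middle part.
    """
    m = re.match(r"^[^.]+\.([0-9A-Fa-f]{4})\.[^.]+$", filename)
    if m is None:
        return None
    lcid = m.group(1).upper()
    return lcid, KNOWN_LCIDS.get(lcid, "unknown")


if __name__ == "__main__":
    for name in ["statdict.0409.dat", "dictprob.0409.dat", "custom.dic"]:
        print(name, "->", locale_of(name))
```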

marisa8184
08-16-2007, 07:29 PM
Yes, it's that last one (point 6) which nobody can find.

It's very irritating!