Windows Phone Thoughts - Daily News, Views, Rants and Raves

  #1  
Old 06-24-2005, 08:05 PM
Menneisyys
5000+ Posts? I Should OWN This Site!
Join Date: Jun 2007
Posts: 5,067
Default Create your own custom word dictionary - a working, free solution

I've continued playing with dyncompdict.dat (the user dictionary in which the operating system collects the non-English words entered by the user) and found out the following, testing on both WM2003 (iPAQ 2210) and WM2003SE (PL 720). (Please see this thread for a discussion of the dyncompdict.dat file format and links to other tools.)

- the value of the byte leading each word is its weight. The bigger this value, the more likely the word is to be offered first to the user. Its starting value is 3 (it is therefore advisable to choose 3 for all words when importing custom textual dictionaries under Windows CE); often-entered words may even have a value of 30-40. The operating system updates this value dynamically, depending on the frequency of usage.

- the worst news: the system can't make use of dyncompdict.dat files bigger than 7-8 kbytes. (You may have more success with your files, though.) To circumvent this restriction, I've played around with accents and with eliminating words that may also be included in the main dictionary - but in vain. Unfortunately, this seems to be an operating system restriction. It seems I'll have to re-engineer and then shadow the dictionary shipped with Pocket PC some day to fix this problem...

- do not include words that are shorter than 3-4 characters. They will cause the entire file to be ignored. (I've implemented filtering out words like these in my converter.)

- forget the Kaisoft program. It is very slow and absolutely useless - a very bad bargain for 20 bucks. I've tested it - it clearly shows how a program should NOT be written, optimization-wise.

- also, the Alex Feinman word editor can be ignored if you enter your words in an Enter-separated file (one word per line) and run it through my converter.
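The rules in the list above (starting weight 3, no very short words, a size ceiling) can be sketched as a pre-filter on the word list. This is only an illustration, not the converter's actual source: the class and method names are made up, the per-entry cost (one weight byte, the word's bytes, one separator byte) and the UTF-8 encoding are assumptions about a file format this post does not spell out, and the 7 kbyte ceiling is simply the conservative end of the range mentioned above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-filter; it does NOT write a real dyncompdict.dat.
class DictBudgetSketch {
    static final int START_WEIGHT = 3;          // starting weight named in the post
    static final int MIN_WORD_LENGTH = 4;       // post says "3-4"; 4 is the cautious pick
    static final int MAX_FILE_BYTES = 7 * 1024; // post says "7-8 kbytes"; 7 is the cautious pick

    // ASSUMED cost of one entry: 1 weight byte + word bytes + 1 separator byte.
    // The real layout and encoding may differ.
    static int entryCost(String word) {
        return 1 + word.getBytes(StandardCharsets.UTF_8).length + 1;
    }

    // Keeps words in input order, skipping short ones, until the assumed budget is used up.
    static List<String> selectWords(List<String> candidates) {
        List<String> kept = new ArrayList<>();
        int used = 0;
        for (String raw : candidates) {
            String word = raw.trim();
            if (word.length() < MIN_WORD_LENGTH) {
                continue; // too short: per the post, such words get the whole file ignored
            }
            int cost = entryCost(word);
            if (used + cost > MAX_FILE_BYTES) {
                break; // adding this word would exceed the assumed size ceiling
            }
            kept.add(word);
            used += cost;
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> words = List.of("cat", "Helsinki", "ok", "dyncompdict");
        for (String w : selectWords(words)) {
            System.out.println(START_WEIGHT + " " + w); // weight and word, for inspection only
        }
    }
}
```

Run as is, the sample list loses "cat" and "ok" to the length rule and keeps the other two words, each paired with weight 3.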

Now, for the most important part: if you need to convert your user dictionaries, for example, the standard Office user.dic (in non-English versions of Office, it has a different name; for example, in the Finnish Office, it's called oma.dic), use my tool. Here's the source for my dyncompdict.dat generator. Just pass it the name of the import file and it will create a dyncompdict.dat for you. I've also compiled it and even tested it under the major Pocket PC JVMs. It works flawlessly.

On your desktop computer, you only need to issue the

java -jar GenerateCustomDictWordList.jar <input dictionary name>

command (assuming you've installed a JDK or a JRE).

The batch files to run the utility right on your PPC are as follows. The generated dyncompdict.dat file will be created in the root (\) directory. I've created batch files (and tested them) for NetFront 3.1+, CrEme, Jeode and IBM J9 PJava. Please read this thread for more information on getting them if you don't have any of them. The Java-enabled NetFront is perhaps the best solution: you may already have it on your Pocket PC, which makes it unnecessary to install another JVM.

To run the converter on the PDA, make a directory on it with the name, say, \conv. Copy the JAR file and one of the following batch files into it (I've also listed their contents here).

NetFront jvlite (tested with NF 3.1):

"\Program Files\NetFront3\jvlite.exe" -classpath \conv\GenerateCustomDictWordList.jar GenerateCustomDictWordList %1

CrEme 3.x/4.x (tested with 3.26):

"\SD-MMCard\creme\bin\CrEme.exe" -Ob -classpath \conv\GenerateCustomDictWordList.jar GenerateCustomDictWordList %1

Jeode (tested with 1.7.3):

\SD-MMCard\jeode\evm.exe -Djeode.evm.console.local.keep=true -cp \conv\GenerateCustomDictWordList.jar GenerateCustomDictWordList %1

IBM J9 PJ (tested with 5.7.2):

"\SD-MMCard\IBM PPRO10\bin\J9.exe" "-jclpro10" -cp \conv\GenerateCustomDictWordList.jar GenerateCustomDictWordList %1

Please modify the path to the executable (at the beginning of each batch file) to match the location where your JVM/NetFront is installed!

The next step is installing PPC Command Shell - please see my post on doing this here (second page, third post). It provides a console to run batch files from, which avoids the need for editing and running link files with hard-wired filenames (yes, icon-only operating systems have their own letdowns).

After installing it, fire up the console and go to \conv. Assuming you've already copied there the .BAT batch file(s) you want to invoke your JVM(s) with, GenerateCustomDictWordList.jar and the source dictionary, you can start working right away by issuing the <batchname> \conv\<user dictionary name> command. (Note that you need to supply the full directory path, here \conv, to the JVM so that it finds the input file.)

An example screenshot:



I may rewrite/repackage this app using the Compact Framework (CF) some day (but not in the very near future!) so that it can be run without any JVM and can have a GUI.
 
  #2  
Old 06-25-2005, 03:04 AM
nategesner
Intellectual
Join Date: Jul 2003
Posts: 123

Man, I sure appreciate the effort you've put into this. Unfortunately, the amount of time it would take me to understand your instructions and execute them (assuming there are no glitches) would take more time than I would save by adding words to my dictionary.

I think I'll just wait until WM05 comes out and see if there's something better. Thanks again for the effort!!
 
  #3  
Old 06-25-2005, 10:03 AM
Menneisyys
5000+ Posts? I Should OWN This Site!
Join Date: Jun 2007
Posts: 5,067

Quote:
Originally Posted by nategesner
Man, I sure appreciate the effort you've put into this. Unfortunately, the amount of time it would take me to understand your instructions and execute them (assuming there are no glitches) would take more time than I would save by adding words to my dictionary.
Well, actually, it's not that complicated. Just a few steps are needed:

1, first, check if you have Java on your desktop computer. Go to Start Menu/Run and enter Command. In the console window that comes up, enter java. If you see something like "Usage: java [-options] class [args...]" (and not the usual "not found" message), then you have a Java system on your PC and can skip step 2 - go directly to step 3.

2, get the 1.5 JRE if you don't have any Java system on your Windows PC. Go to http://java.sun.com/j2se/1.5.0/download.jsp and choose Download JRE 5.0 Update X. (You can also download the JDK if you want not only to run, but also to compile and develop Java programs on your PC, but it's considerably bigger.) On the next screen, accept the agreement, click the "Windows Offline Installation, Multi-language" distribution, download the file and run it.

3, using a file manager, copy the GenerateCustomDictWordList.jar file I've provided to the directory where your custom dictionary file is located.

4, go to this directory in the Command window too and issue the

java -jar GenerateCustomDictWordList.jar <filename>

command, where <filename> is the name of the custom dictionary file.

A dyncompdict.dat file will be created in this directory; just copy it over to your PDA's \Windows directory. For this, I recommend Total Commander with the WinCE FS plug-in because it's an excellent tool for accessing your Pocket PC. Please read http://pocketpcmag.com/forum/topic.asp?TOPIC_ID=15577 on installing it.
 
  #4  
Old 06-29-2005, 02:50 PM
Menneisyys
5000+ Posts? I Should OWN This Site!
Join Date: Jun 2007
Posts: 5,067

I've updated the algorithm of my generator:

- it contains in-code sorting - the source dictionary no longer needs to be alphabetically sorted.

- it contains a lot of string manipulation and comparison to filter out (largely) similar words and keep only the longest of them (to a certain degree, that is). This way, you can squeeze about twice as many words into the custom dictionary as before.

Both the Java source and the JAR file have been updated.
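For readers curious what "sort, then keep only the longest of similar words" might look like, here is a minimal sketch. It is not taken from the updated source: it assumes that "similar" simply means "one word is a prefix of another", which is stricter than the "to a certain degree" behaviour described above, and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of in-code sorting plus prefix-based filtering.
class PrefixFilterSketch {

    // Sorts and de-duplicates the words, then drops every word that is a
    // prefix of the word immediately following it in sorted order.
    static List<String> keepLongest(List<String> words) {
        // TreeSet sorts the words and removes exact duplicates in one step.
        List<String> sorted = new ArrayList<>(new TreeSet<>(words));
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            String current = sorted.get(i);
            // In sorted order, every word starting with 'current' follows it
            // directly, so checking the next word is enough.
            boolean coveredByNext = i + 1 < sorted.size()
                    && sorted.get(i + 1).startsWith(current);
            if (!coveredByNext) {
                kept.add(current);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = List.of("talo", "talossa", "auto", "autolla", "kissa");
        System.out.println(keepLongest(input)); // prints [autolla, kissa, talossa]
    }
}
```

The trade-off of this simplified rule is that a short base form disappears whenever any longer form exists, so a real converter would probably want a softer criterion.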

Please note that the threshold seems to be around 9000 bytes. Above that, the system won't read the file.
 
  #5  
Old 07-01-2005, 04:49 PM
Ekkie Tepsupornchai
Magi
Join Date: Feb 2002
Posts: 2,386

Menneisyys... You are an animal! This is fantastic stuff man!
 
  #6  
Old 07-03-2005, 07:41 PM
Menneisyys
5000+ Posts? I Should OWN This Site!
Join Date: Jun 2007
Posts: 5,067

Quote:
Originally Posted by Ekkie Tepsupornchai
Menneisyys... You are an animal! This is fantastic stuff man!
Thanks
 
  #7  
Old 10-25-2005, 07:38 PM
Menneisyys
5000+ Posts? I Should OWN This Site!
Join Date: Jun 2007
Posts: 5,067

In the meantime, a freeware tool, MikkoPPC, has been released, which is also able to compile a dyncompdict.dat file. (It, however, doesn't contain any kind of optimization, unlike my Java sources.)

Its documentation cites the size information I provided in the first post of this thread, which is now slightly outdated. My later experiments (see, for example, this article) have shown that the dictionary file must not be larger than exactly 9000 bytes.
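Whichever tool produced the file, a quick size check before copying it to the device can save a round trip. The following is a small sketch under the 9000-byte figure reported above; the class name is invented, and the default file name is only assumed to sit in the current directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: checks a generated file against the 9000-byte limit.
class DictSizeCheck {
    static final long MAX_BYTES = 9000; // limit reported in this thread

    // True when a file of the given size is at or under the limit.
    static boolean fitsLimit(long sizeInBytes) {
        return sizeInBytes <= MAX_BYTES;
    }

    public static void main(String[] args) throws IOException {
        // Uses the first argument if given, else dyncompdict.dat in the current directory.
        Path file = Paths.get(args.length > 0 ? args[0] : "dyncompdict.dat");
        if (!Files.exists(file)) {
            System.out.println("File not found: " + file);
            return;
        }
        long size = Files.size(file);
        System.out.println(file + ": " + size + " bytes, "
                + (fitsLimit(size) ? "within" : "over") + " the " + MAX_BYTES + "-byte limit");
    }
}
```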
 