Project Gutenberg and MS Word


tregnier
02-17-2004, 02:54 PM
I have downloaded several books from the plain text Project Gutenberg files. I put the text into MS Word and then use the MS Word/Microsoft Reader conversion tool to convert the Word doc into an MS Reader file. The problem is that the Word file keeps the hard line returns from the plain text, and they make the Reader file choppy.

Question: Have some of you found a way to address this? What are the solutions?

Thanks in advance!

binstpa
02-17-2004, 03:14 PM
I just use uBook. You can copy the .txt file straight to your device and avoid the whole conversion process in the first place.

tregnier
02-17-2004, 03:42 PM
Yes, this works great. uBook is the way to go. I appreciate your help, "binstpa"! :D

Jorgen
02-17-2004, 04:14 PM
>Question: Have some of you found a way to address this? What are the solutions?

Yes, I wrote a program that converts each paragraph to a long line. It can also deal with various other formats.

However, it is DOS based and therefore apparently beyond what most people can work with. Experience from other DOS programs I have published shows that I would end up with lots of emails asking for support. :(

You are welcome to send me a private message with an email address if you want the .EXE file and can work in a DOS box.

Jorgen

AndrewBurke
03-22-2004, 05:13 AM
Find the macro menu item in your version of Word and choose to record a new macro. Give it a descriptive name and, for now, assign it an easy-to-remember shortcut key (this can, of course, be changed later when you decide to keep the macro). When text is pasted from an HTML screen, each proper paragraph break comes through as two paragraph marks, so you want to keep these; the trick is to replace them temporarily with some text that won't be wiped out by mistake.

With the macro recorder running, choose Edit/Replace and choose to search the whole document. In the search field type "^p^p^p^p^p^p" (without the quotes) and in the replace field type "^p^p" and let the replace engine do its thing. Repeat this process while still recording with 5, 4 and 3 paragraph marks in the search field and 2 in the replace. What this does is take any gap larger than two lines between text and reduce it to a proper paragraph gap. Now that we have the proper gaps established, we stash them by replacing "^p^p" with "~~~" and move on to the next step, which is replacing "^p" (the erroneous line breaks we want to get rid of) with a space.

This macro isn't over yet, as the final phase is to replace "~~~" with "^p^p" and put back the "proper" formatting. As you can see, if we didn't use the "~~~" trick, the line-break replacement would have wiped out all the paragraph marks, including the ones we want to keep.

Good luck,
Andrew
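
For anyone who would rather do this outside Word, the same steps can be sketched in Python. This is only a rough illustration of the replace sequence described above, not the macro itself: the function name unwrap_paragraphs is made up for the example, a single regular expression stands in for the repeated 6/5/4/3 paragraph-mark replacements, and it assumes the text never contains "~~~" on its own.

```python
import re

def unwrap_paragraphs(text: str, placeholder: str = "~~~") -> str:
    """Join hard-wrapped lines into one long line per paragraph.

    Assumes paragraphs are separated by at least one blank line and
    that `placeholder` does not occur anywhere in the text.
    """
    # Normalize Windows/old Mac line endings to plain "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Reduce any gap of three or more line breaks to a proper paragraph gap.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Stash the real paragraph gaps so the next step cannot touch them.
    text = text.replace("\n\n", placeholder)
    # Replace the remaining (unwanted) line breaks with a space.
    text = text.replace("\n", " ")
    # Put the paragraph gaps back.
    return text.replace(placeholder, "\n\n")
```

To use it, read the Gutenberg .txt file into a string, pass it through unwrap_paragraphs, and write the result out before opening it in Word. Lines that already end in a space will come out with a double space, which this sketch does not try to clean up.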

portnoy
04-04-2004, 05:00 PM
Ok, this may seem a little strange, but I use a utility called "email stripper" to do exactly that. It is actually for removing the leading carats in forwarded messages, but it also reformats the text really well.

http://www.papercut.biz/emailStripper.htm

xendula
04-04-2004, 08:23 PM
Wow, this emailstripper is really idiot-proof. But it does not seem to manage long texts, at least the one I tried seemed to be too long. Still, nice little program, portnoy!

Jorgen
04-06-2004, 08:47 AM
>But it does not seem to manage long texts, at least the one I tried seemed to be too long.

That is the problem with many of the text tools you find on the Internet: they either have problems with long paragraphs or problems with very long texts. And you won't know until it is too late. :(

That's why I wrote my own. I don't think it has any limitations.

Jorgen

Cleisthenes
04-20-2004, 09:29 PM
does anybody make their own mp3 audiobooks? i've made lots from either .txt or .doc files using a tts program. i've searched the archives to see if anyone discusses this, but perhaps i missed it. anyway, i use Word to strip the files of the carats etc (so they don't mess up the tts voices) and make all my ebooks audiobooks. i find i listen to a lot more than i used to read. does anyone have any other techniques?

xendula
04-22-2004, 08:32 PM
>does anybody make their own mp3 audiobooks? i've made lots from either .txt or .doc files using a tts program. i've searched the archives to see if anyone discusses this, but perhaps i missed it. anyway, i use Word to strip the files of the carats etc (so they don't mess up the tts voices) and make all my ebooks audiobooks. i find i listen to a lot more than i used to read. does anyone have any other techniques?

Sounds intriguing, but:
-what is tts?!? (text to speech?)
-what does a tts program do?!?
-which programs are out there and which one is best?/easiest to use?
-what are carats (the only ones I know are the ones that go on your finger :wink: )?!?
-how do carats mess up tts voices?!?

You see, a lot of interest and a lot of questions mean you'd have to post a little guide for absolute beginners - please!!!

tourdewolf
04-25-2004, 05:12 AM
I use Nextup's Text Aloud, works great!
http://www.nextup.com

Cleisthenes
04-25-2004, 12:58 PM
xendula wrote: "Sounds intriguing, but:
-what is tts?!? (text to speech?)
-what does a tts program do?!?
-which programs are out there and which one is best?/easiest to use?
-what are carats (the only ones I know are the ones that go on your finger )?!?
-how do carats mess up tts voices?!?

You see, a lot of interest and a lot of questions mean you'd have to post a little guide for absolute beginners - please!!!"

xendula: yes, tts = text to speech. as the last post prior to this one mentioned, textaloud is, imho, the best around. i've tried a lot of others, both for pc & mac, but textaloud's software proved to be the best. i didn't get great performance out of textaloud's bundled voices, however, which are the standard microsoft speech voices. it is well worth the $ (i think, like, $49) to buy ATT naturalvoices. the voices of crystal and mike come with the basic att package. crystal's voice is best because mike's bass tones seem to distort at higher volumes and he runs the syllables together more than crystal does.

carats, i.e. the > symbol that appears in some email text, and other symbols like / should, to my ear, be eliminated before creating a wav or mp3 file from the text, which is why i 'wash' the text in ms word prior to loading it in textaloud for recording. during my making of hundreds of hours of text, i've developed certain 'rules' for how washing in ms word should be done (e.g. substituting semicolons for various words and other symbols) that result in the most comprehensible mp3. if you do go so far as to use textaloud and att naturalvoices, i'd be happy to detail my 'washing rules' for you.

i subscribe to audible.com, but, although i continue to subscribe, my own custom-made audiobooks are on way better and more varied subject matter. they are especially valuable if you are a writer and want to hear how your prose sounds rather than reads.
happy listening,
cleisthenes

xendula
04-26-2004, 08:35 PM
>carats, i.e. the > symbol, that appears in some email text and other symbols like / should, to my listening comprehensibilities, be eliminated before creating a wav or mp3 file from the text - which is why i 'wash' the text in ms word prior to loading it in textaloud for recording. during my making of hundreds of hours of text, i've developed certain 'rules' for how washing in ms word should be done (e.g. substituting semicolons for various words and other symbols) that result in the most comprehensible mp3. if you do go so far as to use textaloud and att naturalvoices, i'd be happy to detail my 'washing rules' for you.

OK, I'm downloading the demo right now. Would you share your 'laundry technique' :wink: with us? This is interesting even for people who don't care about text to speech, so it's not too far off the topic.

Cleisthenes
04-28-2004, 12:20 PM
xendula:
-eliminate apostrophes (this stops ATT Crystal from saying the word "apostrophe" without diminishing syntactic comprehension)
-replace right brackets, left brackets, right parentheses, left parentheses, quotation marks, dashes, ellipses and slashes with a semicolon (this inserts appropriate pauses in the tts speech)
-change all II, i.e. roman numeral 2, to I;I. Crystal elides the two numerals, so King Charles I and King Charles II both come out as "king charles eye"; substituting I;I for II makes her pronounce King Charles II as "king charles eye eye." similarly, change all III's to I;I;I. no other roman numerals need to be altered.

also, remember the default voices in TextAloud are of poor quality. to appreciate what it can do, it needs the ATT NaturalVoices.
cheers,
cleisthenes
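
The washing rules above could also be scripted instead of done by hand in Word. The Python below is only a sketch of those three rules, with some guesses filled in: the name wash_for_tts is made up, "dashes" is assumed to mean double hyphens and en/em dashes (single hyphens inside words are left alone), "ellipses" is assumed to mean three or more dots or the single ellipsis character, and curly quotes are treated the same as straight ones.

```python
import re

def wash_for_tts(text: str) -> str:
    """Apply the three 'washing' rules before feeding text to a TTS voice."""
    # Rule 1: drop apostrophes (straight and curly) so the voice
    # never reads the punctuation mark aloud.
    text = re.sub(r"['\u2019]", "", text)
    # Rule 2: turn ellipses, dashes, brackets, parentheses, quotation
    # marks and slashes into semicolons, which the voice reads as a pause.
    text = re.sub(
        r'\.{3,}|\u2026|--+|[\u2013\u2014]|[\[\]()"\u201c\u201d/]',
        ";",
        text,
    )
    # Rule 3: split roman numerals III and II so each "I" is spoken.
    # III is handled first; the word boundaries keep VIII, XII etc. untouched.
    text = re.sub(r"\bIII\b", "I;I;I", text)
    text = re.sub(r"\bII\b", "I;I", text)
    return text
```

For example, wash_for_tts("King Charles II") gives "King Charles I;I". Runs of punctuation can leave several semicolons in a row; whether that needs tidying up probably depends on the voice.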

xendula
04-28-2004, 08:52 PM
Thanks for all the info, Cleisthenes, will play a little with it and see how I like it.

Xendula