Windows Phone Thoughts - Daily News, Views, Rants and Raves

  #1  
07-15-2003, 02:40 PM
Andy Sjostrom
Pontificator
Join Date: Aug 2006
Posts: 1,177
Speech Recognition On Track

http://www.computerworld.com.au/ind...08&fp=16&fpid=0

We're slowly seeing more pieces of the speech recognition puzzle fall into place. Last year I wrote twice about Microsoft's SALT (Speech Application Language Tags) specification, in the posts "Say Hi! to your Pocket PC" and "Speech enabled Pocket Internet Explorer next year?". I recommend revisiting them before reading on!

The Computerworld article "Microsoft releases Speech Server beta" says "Microsoft on Wednesday moved toward the integration of call centers and the Web with the release of the first public beta of its Microsoft Speech Server and a new beta version of its Speech Application Software Development Kit (SDK). ... Companies that need call centers can cut costs by automating them on the server, said Xuedong Huang, general manager for Microsoft speech technologies. Among other things, the server can interpret callers' requests and provide recorded or synthesized responses. Developers also can integrate the voice-based services with Web-based applications that can continue to run on a Web server as they do now. For example, a caller could ask for a stock quote verbally and have it displayed on a handheld device, he said."

Speech recognition is a processor-, memory- and storage-intensive task. It will take quite some time before mobile devices can handle it with any precision. Meanwhile, as connectivity increases, we will see more mobile devices connect to speech recognition servers that process the audio and return information in the form of text or application commands.

With regard to the alternative specification, VoiceXML, I wrote in one of my previous posts: "I generally feel Microsoft should play with the others in the W3C. Hopefully, they will merge over time, or at least define integration mechanisms". I am glad that the Computerworld article concludes: "Both Plakias and Microsoft's Huang look to the two specifications eventually merging under the W3C. Plakias said that could happen as soon as the end of 2004. Huang was less specific. 'We want to find a way to converge with Voice XML, but how we're going to do that, I don't know,' Huang said."

Very cool!
 
  #2  
07-15-2003, 04:01 PM
egads
Theorist
Join Date: Aug 2006
Posts: 276

Great, it's bad enough that we have to listen to people gab on their cell phones in restaurants and public places; now we have to listen to them talk to their PPCs :grumble:
 
  #3  
07-15-2003, 06:13 PM
axe
Intellectual
Join Date: Aug 2006
Posts: 118
PC speech

I was cleaning out my basement this weekend and came across some old floppies (yes, 5¼-inch floppies!). Those go back to my XT machine. I could probably find floppies that go back to my C64 if I dug hard enough... Anyway, this weekend I found a "text to speech" disk. It struck me then that even text-to-speech has not really progressed very far, and speech recognition certainly hasn't.
Every time I read about this, the common complaint is "this requires too much processing/RAM etc. to be viable yet".
But how many tens of times faster is my current Athlon than my old 8088? And how much more RAM do I have now? Yet even the current MS Reader text-to-speech add-in and IM clients STILL can't produce better synthesis than the SAM voice that dates back to my C64. I should be able to install my favourite action hero as the voice by now, and it should be crystal clear, not the old "Do you want to play a game?" :robot: voice. Even my PPC is 50x faster and has 60x the RAM my XT had. It should be able to read me text-based books, files or email when I'm on the road at the touch of a button, or even by the coveted voice commands. I am truly surprised there has not been more innovation here, considering the thousands of road warriors out there. There should be all kinds of killer apps.

I'm not a programmer, so there isn't much I can do to help fill this obvious niche market, but there HAS to be money to be made here. I'm just surprised there aren't a hundred BIG companies trying to take mine...

I think the processor/RAM problem is just a crutch, since my C64 could do the things I have seen done in the mainstream 15-20 years later.

My thoughts... :soapbox:
AXE
 
  #4  
07-15-2003, 06:57 PM
egads
Theorist
Join Date: Aug 2006
Posts: 276
Re: PC speech

Quote:
Originally Posted by axe
I think that the Processor/RAM problem is just a crutch, since my C64 could do the things I have seen done in the main-stream 15-20 years later.

I agree. When OS/2 Warp came out it had speech recognition built in. I was running a 66 MHz 486 with 64 MB of memory. While the speech recognition was not perfect, it worked. It has been 7 years or so since Warp came out, and I have yet to see anything that works better.

Like I've said in other posts, Microsoft always gets its speed boosts by throwing faster hardware with more memory at a problem, not by writing faster, more clever software. :evil:
 
  #5  
07-15-2003, 08:47 PM
michael
Ponderer
Join Date: Aug 2006
Posts: 103

SR and TTS really have come a long way. A good TTS engine needs a good amount of data: I'm working with one that takes up around half a gigabyte of disk space, and a really good one takes up much more.

The alternative for good-sounding prompts is an engine that takes fragments (i.e. one or more words) of recorded speech and combines them into the phrase that needs to be spoken. The trick is getting it to sound natural, which is best achieved by recording the prompts carefully to account for whether the word is at the start, middle or end of the phrase and what the previous and next words are.

As for SR, that has improved a lot. Go back a few years and you needed dedicated hardware and hours of training to use a system that required a pause between each word. Now all you need is a decent microphone and a little training to be able to speak naturally for dictation. For command-and-control applications with a finite number of possible inputs, you don't even need training. Performance has gone up and requirements have gone down; it's now possible to get a great SR engine running on a Pocket PC.
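The fragment-concatenation idea in the post above can be sketched as a selection problem: keep several recordings of the same word, each tagged with the context it was recorded in (start/middle/end of the phrase, previous and next word), and for each word of the target phrase pick the recording whose context matches best. This is a minimal illustration under stated assumptions, not any real engine's method: the `Recording` fields, the scoring weights, the function names and the clip file names are all hypothetical, and the sketch does nothing about smoothing the joins between clips.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Recording:
    """A recorded speech fragment plus the context it was recorded in (hypothetical)."""
    word: str
    position: str              # "start", "middle" or "end" of the recorded phrase
    prev_word: Optional[str]   # word spoken just before it, or None at the start
    next_word: Optional[str]   # word spoken just after it, or None at the end
    clip: str                  # illustrative audio file name


def position_of(index: int, length: int) -> str:
    """Where a word sits in the phrase; a one-word phrase counts as "end"."""
    if index == length - 1:
        return "end"
    if index == 0:
        return "start"
    return "middle"


def context_score(rec: Recording, position: str,
                  prev_word: Optional[str], next_word: Optional[str]) -> int:
    """Higher is better; position is weighted above neighbours (an arbitrary choice)."""
    score = 0
    if rec.position == position:
        score += 2
    if rec.prev_word == prev_word:
        score += 1
    if rec.next_word == next_word:
        score += 1
    return score


def select_clips(phrase: str, library: List[Recording]) -> List[str]:
    """For each word, pick the recording whose context best matches the target phrase."""
    words = phrase.lower().split()
    clips = []
    for i, word in enumerate(words):
        candidates = [r for r in library if r.word == word]
        if not candidates:
            raise KeyError(f"no recording for {word!r}")
        position = position_of(i, len(words))
        prev_word = words[i - 1] if i > 0 else None
        next_word = words[i + 1] if i + 1 < len(words) else None
        # max() keeps the first candidate on ties, so results are deterministic
        best = max(candidates,
                   key=lambda r: context_score(r, position, prev_word, next_word))
        clips.append(best.clip)
    return clips


# A tiny hypothetical prompt library: "ten" was recorded twice, in two contexts.
LIBRARY = [
    Recording("your", "start", None, "balance", "your_start.wav"),
    Recording("balance", "middle", "your", "is", "balance_mid.wav"),
    Recording("is", "middle", "balance", "ten", "is_mid.wav"),
    Recording("ten", "end", "is", None, "ten_end.wav"),
    Recording("ten", "middle", "is", "dollars", "ten_mid.wav"),
    Recording("dollars", "end", "ten", None, "dollars_end.wav"),
]

if __name__ == "__main__":
    print(select_clips("your balance is ten", LIBRARY))
    print(select_clips("your balance is ten dollars", LIBRARY))
```

With this library, "your balance is ten" ends on `ten_end.wav` (the phrase-final recording), while "your balance is ten dollars" picks `ten_mid.wav` because that take was recorded between "is" and "dollars". Recording more contexts per word is exactly the "record the prompts well" trade-off the post describes: more disk space in exchange for more natural joins.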
 
  #6  
07-18-2003, 10:36 AM
Alicatt
Pupil
Join Date: Jun 2003
Posts: 24

Back then I had a Kurzweil/Xerox text-to-speech system: a SCSI flatbed scanner with a full-length ISA card carrying a 68020 CPU and 4 MB of RAM. It would scan in a document and then read it aloud. It was big, clunky and expensive (approx. £10,000), but it worked.
 


All times are GMT +1.