CN1101577C

CN1101577C - Speech input memorandum

Info

Publication number: CN1101577C
Application number: CN 98117693
Authority: CN
Inventors: 刘迎建; 马梁
Original assignee: ZHONGZI HANGWANG SCIENCE AND TECHNOLOGY Co BEIJING
Current assignee: Hanwang Science and Technology Co., Ltd., Beijing
Priority date: 1998-09-09
Filing date: 1998-09-09
Publication date: 2003-02-12
Anticipated expiration: 2018-09-09
Also published as: CN1247347A

Abstract

The present invention relates to a speech input memorandum, a type of personal digital assistant (PDA for short), commonly called an electronic notebook. It is a speech-based PDA: names in the business card function of an ordinary PDA are recorded by voice. To look up an entry, the user only needs to speak the person's name and the corresponding phone numbers are displayed; the name and the company/address information are read out by voice, and phone numbers can also be entered by voice. The invention can also store 30 minutes of digital recordings in segments, and each recording can be played at a preset time. The invention thus provides, by voice, the same functions as the schedule and memo of a traditional PDA.

Description

Speech input memorandum

The invention belongs to the field of personal digital assistants (PDAs), commonly called electronic notebooks.

At present, PDA products can be divided into two broad classes:

The first class is low-end PDAs, led by the domestic "Wenquxing" and the Hong Kong "electricity is translated logical". They use keyboard input with an encoding scheme such as Pinyin or Wubi, convert the input to GB-encoded Chinese characters, and then store or query them. They have a simple structure, use little memory, and place low demands on the CPU, so their price is very low. However, the keyboard is small and inconvenient to operate. In addition, the display area is small and Pinyin produces many candidates, so Pinyin input is difficult. Many people know neither Wubi nor Pinyin input and cannot use these devices at all.

The second class is high-end PDAs, represented by the Taiwan "instant translator" and "translate logically well" and the Hong Kong "name". They use handwriting input and offer many special functions, such as an infrared interface, digital recording, and pager reception. Because they support handwriting input, these products are all fitted with a larger LCD, a pressure-sensitive touch screen, more than 128 KB of RAM, and 1 to 4 MB of ROM. They also place high demands on the CPU: processing power matches or exceeds that of a 286-16. Their prices are therefore all very high. Handwriting recognition is a more user-friendly input method, but it demands a great deal of CPU power and storage; the "intelligent pen" system, for example, requires a Pentium 75 or better and 8 MB of memory. Because PDA processing power is limited, writing is still not very convenient: characters must be written carefully and neatly, and the recognition rate, adaptability, and speed of handwriting input all fall far short of the PC versions of the recognition software. Many people find it inconvenient.

At present, no PDA offers Chinese speech input. Speech input of English and digits exists abroad, but it can only be used on large computers.

The object of the invention is to design a PDA with the following features:

1. When a name is entered in the business card function, the user reads the name aloud; the speech recognition module analyzes the speech and extracts its features, which are stored in the recognition area of memory. At the same time, the compression module compresses the speech, which is stored in the recording area of memory.

2. During retrieval, the speech recognition module extracts the features of the spoken name and compares them with the features in memory to find the corresponding entry.

3. Digits and English letters are entered using speaker-dependent, small-vocabulary speech recognition: the speech recognition module extracts the feature values of the spoken digit or letter and compares them with feature values extracted in advance to identify the corresponding digit or letter.

4. When an entry is consulted, the voice recordings of the name and address in the recording area of memory are decompressed and played back, instead of displaying Chinese characters as is usual.

5. When an entry is consulted, the telephone number is displayed as characters and can also be read out by voice.

The object of the invention is achieved as follows:

As shown in Fig. 1, the microphone is connected to the input amplifier/filter; the input amplifier/filter is connected to a high-precision A/D converter; the A/D converter is connected to both the speech recognizer and the speech compressor/decompressor. The speech recognizer is connected to the main control microprocessor through the data bus and control signal lines, as is the speech compressor. The microprocessor is connected to the memory, which is internally divided into a feature storage area and a recording storage area.

This circuit converts the analog speech input signal into a digital stream. The feature extraction module of the speech recognizer extracts speech features from the speech data and passes them to the microprocessor over the data bus. The speech compressor compresses the speech data and likewise passes it to the microprocessor over the data bus. Once the microprocessor has the speech data and the speech features, it stores them in separate areas of memory for recognition and playback.
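
The two storage paths described above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the text names neither a feature extraction method nor a compression scheme, so `extract_features` uses a placeholder (mean amplitude per frame), `compress_speech` uses zlib as a stand-in for the speech compressor, and `Memory`, `store_utterance`, and the unsigned 8-bit sample format are hypothetical.

```python
import zlib
from dataclasses import dataclass, field


def extract_features(samples, frame=4):
    """Placeholder feature extractor (assumption): mean amplitude per frame."""
    features = []
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        features.append(sum(abs(s) for s in chunk) / len(chunk))
    return features


def compress_speech(samples):
    """Stand-in for the speech compressor: pack 8-bit samples, deflate with zlib."""
    return zlib.compress(bytes(s & 0xFF for s in samples))


def decompress_speech(blob):
    """Stand-in for the voice decompressor: inverse of compress_speech."""
    return list(zlib.decompress(blob))


@dataclass
class Memory:
    feature_area: dict = field(default_factory=dict)    # used for recognition
    recording_area: dict = field(default_factory=dict)  # used for playback


def store_utterance(memory, entry_id, samples):
    """Feed one digitized utterance to both paths and store each result."""
    memory.feature_area[entry_id] = extract_features(samples)
    memory.recording_area[entry_id] = compress_speech(samples)
```

Keeping the two areas separate mirrors the text: the feature area is only ever compared against, while the recording area is only ever decompressed and played.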

The microprocessor is also connected to the decompressor, the decompressor to the D/A converter, the D/A converter to the output amplifier/filter, and the output amplifier/filter to the loudspeaker.

This circuit restores the compressed speech data and plays it.

The microprocessor is also connected to the feature comparison module of the speech recognizer through the data bus and control signal lines. The speech recognizer is in turn connected directly to the memory. The microprocessor is also connected to the display.

In this circuit, the microprocessor passes the speech features to the feature comparison module of the speech recognizer, which compares them with the feature values in memory and returns the result to the microprocessor. The microprocessor then looks up the corresponding information, such as the telephone number; the number is shown on the display and the related information is played through the loudspeaker.

The appearance of the invention is shown in Fig. 2: the notepad can be held in the palm and operated with one hand. It is fitted with a line-out jack 1, confirm key 2, cancel key 3, loudspeaker 4, display screen 5, knob 6, and microphone 7.

Compared with existing PDAs, the invention has the following advantages:

The speech input memorandum does not need to convert speech into text, so it places low demands on the system, has no misrecognition problem, and is easy to use for input. The ten digits are entered by speech recognition; because the vocabulary is small, the recognition rate is very high and recognition is fast. Because the device learns from a specific speaker, it is not restricted by accent or language. Speech input is therefore the most convenient input method at present. All input, query, and read-out processes are carried out by voice, and every operation has a voice prompt, so blind or visually impaired people can also use the device easily. The technology is mature, and the price is close to that of a low-end PDA.

Description of drawings:

Fig. 1 is a system diagram of the speech input memorandum

Fig. 2 is an external view of the speech input memorandum:

1-line-out jack; 2-confirm key; 3-cancel key; 4-loudspeaker; 5-display screen; 6-knob; 7-microphone.

Fig. 3 is the voice output flowchart

Fig. 4 is the speech retrieval flowchart

Fig. 5 is the flowchart of speech input of digits

Fig. 6 is the speech input flowchart

Embodiment:

The invention can run on two AAA batteries for more than 30 days. The appearance of the product is shown in Fig. 2: the shell is streamlined, attractive, compact, and easy to operate. The front display is a 16*80 dot-matrix LCD with 16 eye-catching icons.

The knob and the confirm and cancel keys are mounted on the left and right sides of the product respectively; the line-out jack and microphone are at the top, and the loudspeaker is at the bottom.

Knob: turning it up increments and turning it down decrements, making it easy to select the digits 0-9, the letters A-Z, and the menu options.
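
The knob behaviour can be sketched as a cursor stepping through a fixed list of options. This is a hypothetical illustration: the `Knob` class and its method names are invented here, and wrapping from the last option back to the first is an assumption, since the text does not say what happens at the ends of the list.

```python
class Knob:
    """Steps a cursor through a fixed list of options (hypothetical model)."""

    def __init__(self, options):
        self.options = list(options)
        self.index = 0

    @property
    def current(self):
        return self.options[self.index]

    def turn(self, steps):
        """Positive steps turn the knob up (increment), negative turn it down."""
        # Wrap-around at the ends is an assumption, not stated in the text.
        self.index = (self.index + steps) % len(self.options)
        return self.current


DIGITS = [str(d) for d in range(10)]
LETTERS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]
```

The same class can serve the digit list, the letter list, or a list of menu option names, which matches the single-knob design described above.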

Confirm/cancel key: a single elongated key with a switch inside at each end. Pressing the upper end confirms; pressing the lower end cancels.

Recorded content of the notepad:

It can store 200 business cards or 30 minutes of voice; as the number of business cards increases, the available recording time decreases.

Each business card has a name recording of at most 2 seconds and a personal information recording of at most 5 seconds, plus up to 5 telephone numbers: phone 1, phone 2, fax, home telephone, and pager (BP).

Digital recordings can be divided into multiple segments totalling at most 30 minutes. A playback time can be set for each segment, at which it plays automatically, serving as an appointment/schedule reminder.
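
The segmented recording with timed playback can be sketched as follows. The `Segment` record, `can_add`, and `due_segments` are hypothetical names, and the sketch deliberately ignores the way stored business cards reduce the available recording time, because the text gives no formula for that trade-off.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

MAX_TOTAL_SECONDS = 30 * 60  # total recording capacity stated in the text


@dataclass
class Segment:
    label: str
    seconds: int
    play_at: Optional[datetime] = None  # None means no timed playback is set
    played: bool = False


def can_add(segments, seconds):
    """True if a new segment of this length still fits within 30 minutes."""
    return sum(s.seconds for s in segments) + seconds <= MAX_TOTAL_SECONDS


def due_segments(segments, now):
    """Return the segments whose preset time has arrived and mark them played."""
    due = [
        s for s in segments
        if s.play_at is not None and not s.played and s.play_at <= now
    ]
    for s in due:
        s.played = True
    return due
```

A device loop would call `due_segments` periodically with the current clock time and hand each returned segment to the playback path.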

Line output and microphone interface:

Through this interface the user can connect their own earphone and microphone. Recordings can also be transferred to a computer through the line output for speech recognition.

The working process of the notepad is as follows:

I. Speech input: the speech signal picked up by the microphone passes through the input amplifier/filter as an analog electrical signal, is converted into a digital stream by the A/D converter, and is passed simultaneously to the speech recognizer and the speech compressor. The feature extraction module of the speech recognizer extracts speech features from the speech data and passes them to the microprocessor over the data bus. The speech compressor compresses the speech data and likewise passes it to the microprocessor over the data bus. Once the microprocessor has the speech data and the speech features, it stores them in separate areas of memory.

During retrieval, the microprocessor passes the speech features to the comparison module of the speech recognizer, which compares them with the feature values in memory and returns the result to the microprocessor.

II. Voice output: as shown in Fig. 3, when playback is started, the microprocessor starts the voice decompressor and the D/A converter. At the same time it fetches the compressed speech data from memory and passes it to the voice decompressor. The decompressor outputs the decompressed data stream to the D/A converter, which converts it to an analog signal; after amplification by the output amplifier/filter, the signal is output through the loudspeaker.

III. Speech retrieval: as shown in Fig. 4, the microprocessor makes the display show a prompt and starts the A/D converter, speech compressor, and speech recognizer. The speech signal picked up by the microphone is amplified and filtered, then converted by the A/D converter into a data stream. The feature extraction module of the speech recognizer extracts the speech features and passes them to the microprocessor; the microprocessor directs the feature comparison module to compare them with the speech features of the names stored in memory, and the result is returned to the microprocessor. The microprocessor displays the corresponding record according to the result and enters the editing state.

Operation is as follows:

1. Main menu:

After power-on, the device first enters the main menu. Turning the knob moves the cursor among options such as "retrieve business card", "browse business cards", "review recordings", "input business card", "recording", "default", "memory status", "time setting", "voice training", and "game", and the dot-matrix LCD shows the icon of each function. The cursor moves as the knob is turned; after it rests for 1 second, the name of the function is spoken. Press the confirm key to enter the function; press "cancel" to return to the previous menu.

2. Business card input

After entering the "business card input" function, the cursor moves among "name input", "personal information", "name Pinyin input", and "phone input".

1) The name input process is shown in Fig. 4.

Operation is as follows: first move the cursor to "name"; after one second without action, the voice prompt "please read the name" is played. Press and hold "confirm"; the microprocessor initializes the A/D converter, speech compressor, and speech recognizer, and recording begins. Recording ends when the confirm key is released; if it exceeds 2 seconds, a timeout warning is given. Recording again overwrites the previous content. The speech signal picked up by the microphone is amplified and filtered, then converted by the A/D converter. The speech recognizer extracts the speech features, which are stored in memory via the control chip. At the same time, the speech compressor compresses the data, which is stored in memory via the control chip.

2) Personal information input: first move the cursor to "personal information"; after one second without action, the voice prompt "personal information" is played. Press and hold "confirm" to begin recording and release to finish; if recording exceeds 5 seconds, a timeout warning is given. Recording again overwrites the previous content. During recording, the digitized speech is only compressed and stored; the recognizer is not used and no speech features are extracted.
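
The press-and-hold recording with a time limit, used for both the name and the personal information, can be sketched as below. The function names, the frame-based input, and the 0.5 second frame length are hypothetical. Keeping the audio captured before the limit when a timeout occurs is also an assumption; the text only says that a timeout warning is given.

```python
NAME_LIMIT_S = 2.0  # maximum name recording length stated in the text
INFO_LIMIT_S = 5.0  # maximum personal information length stated in the text


def record_while_held(frames, frame_seconds, limit_seconds):
    """Collect frames until the key is released or the limit is reached.

    `frames` yields (key_held, frame) pairs.
    Returns (recorded_frames, timed_out).
    """
    recorded = []
    for key_held, frame in frames:
        if not key_held:
            return recorded, False
        if (len(recorded) + 1) * frame_seconds > limit_seconds:
            return recorded, True  # caller should give the timeout warning
        recorded.append(frame)
    return recorded, False


def record_field(card, field_name, frames, limit_seconds, frame_seconds=0.5):
    """Record one field of a business card; returns True on timeout."""
    recorded, timed_out = record_while_held(frames, frame_seconds, limit_seconds)
    card[field_name] = recorded  # recording again overwrites the previous content
    return timed_out
```

For a name, the recorded frames would additionally go through feature extraction; for personal information they would only be compressed, as the text describes.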

3) Telephone number input:

Numbers can be entered with the knob or by voice; the digit speech input process is shown in Fig. 5.

Operation is as follows: use the knob to move the cursor to any of phone 1, phone 2, home telephone, fax, or pager, and press "confirm" to enter. Turning the knob then selects among speech input, the digits 0-9, space, backspace, and finish. While a digit 0-9 is displayed, pressing "confirm" enters it and moves to the next digit; choosing "backspace" deletes one digit, and holding backspace for 2 seconds clears the whole number. Speech input and backspace are shown as icons. With speech input selected, press and hold "confirm", read one digit, and release to start recognition: the microprocessor first initializes the A/D converter and the speech recognizer; the speech signal picked up by the microphone is amplified and filtered, then converted by the A/D converter. The recognition module extracts the speech features, compares them with the speech features of 0-9 in memory, and passes the result to the microprocessor. The microprocessor makes the display show the corresponding digit 0-9 and then moves automatically to the next position. Pressing the "cancel" key stores the number and returns to the upper-level option.
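
The editing rules for a telephone number (enter a digit or space, delete one digit, clear everything after a 2 second hold) can be sketched as a small editor object. `NumberEditor` and its methods are hypothetical; a digit recognized by voice would be passed to `enter` exactly like one chosen with the knob.

```python
class NumberEditor:
    """Hypothetical model of the telephone number entry field."""

    ALLOWED = set("0123456789 ")
    CLEAR_HOLD_SECONDS = 2.0  # hold time for a full clear, from the text

    def __init__(self):
        self.symbols = []

    def enter(self, symbol):
        """Append one digit or space, whether chosen by knob or by voice."""
        if symbol in self.ALLOWED:
            self.symbols.append(symbol)

    def backspace(self, held_seconds=0.0):
        """Delete one symbol, or clear everything after a long hold."""
        if held_seconds >= self.CLEAR_HOLD_SECONDS:
            self.symbols.clear()
        elif self.symbols:
            self.symbols.pop()

    def finish(self):
        """Return the number to be stored when the user leaves the field."""
        return "".join(self.symbols)
```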

4) Press the "cancel" key; the device asks whether to store the entry and returns to the main menu after the user chooses.

3. Business card retrieval:

There are two modes: browsing and speech retrieval.

1) Speech retrieval:

The retrieval process is shown in Fig. 6.

Operation is as follows: select business card retrieval in the main menu; the LCD and a voice prompt ask the user to start recording. The user presses and holds the "confirm" key while reading the name, then releases it. The microprocessor starts the A/D converter and the speech recognizer. The speech signal picked up by the microphone is amplified and filtered, then converted by the A/D converter. The recognition module extracts the speech features, compares them with the speech features of the names stored in memory, and passes the result to the microprocessor. The microprocessor displays the corresponding record according to the result, and the name, address, and phone are read out in sequence. Pressing "confirm" stops the read-out and enters the editing state. When there are several candidates with similar pronunciation, turning the knob switches among the candidate business cards, and the display shows the sequence number and the first telephone number. After the knob rests for 1 second, the name, personal information, and phone are read out in sequence. Pressing the "confirm" key stops the read-out and enters the editing state. Pressing the "cancel" key returns to the main menu.
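
One way to obtain the list of similar-sounding candidates described above is to rank the stored names by distance to the spoken features and keep those close to the best match. This is a sketch under stated assumptions: the text does not say how candidates are chosen, so the Euclidean distance, the fixed-length feature vectors, the `margin` parameter, and the function name are all hypothetical.

```python
import math


def rank_candidates(spoken, feature_area, margin=1.0):
    """Return entry ids ordered by closeness to the spoken features.

    Only entries whose distance is within `margin` of the best match are
    kept; these are the candidates the knob would switch between.
    """
    if not feature_area:
        return []
    scored = sorted(
        (math.dist(spoken, stored), entry_id)
        for entry_id, stored in feature_area.items()
    )
    best = scored[0][0]
    return [entry_id for d, entry_id in scored if d - best <= margin]
```

With a single clear match the list has one element and the record can be shown directly; with several, the first element is the closest and the rest are the alternatives offered through the knob.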

2) Browse:

After entering, use the knob to move the cursor among the name, remarks, phone, and "delete business card" icons; when the cursor rests on the name or address for one second, it is read out.

Press "confirm" to enter the selected item and modify it; press "cancel" to return to the upper level.

4. Digit and English letter recording:

Speaker-dependent, small-vocabulary speech recognition is used. Speaker-dependent means that the recognizer is trained on the pronunciation of a specific user; for an untrained user, the recognition rate is low or recognition fails. Small-vocabulary means that the recognition range is confined to a limited vocabulary (generally fewer than 1000 words) rather than arbitrary commands or sentences. Recognition consists of comparing and matching the input speech phrase against the speech phrases set in advance and finding the closest result.
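
The matching step can be illustrated with dynamic time warping (DTW), a classic technique for speaker-dependent template matching. The text does not name the matching algorithm, so DTW is swapped in here purely as an example; the one-dimensional feature sequences and the function names are likewise assumptions.

```python
def dtw_distance(a, b):
    """Dynamic-time-warping distance between two sequences of numbers."""
    inf = float("inf")
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[len(a)][len(b)]


def recognize(utterance, templates):
    """Return the vocabulary word whose stored template is closest."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))
```

DTW tolerates the same word being spoken faster or slower than the stored template, which is why it suits a small vocabulary trained by one user.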

1) Recording

Operation is as follows: select the recording function, then press and hold the confirm key to begin recording.

After recording, the device asks whether to play the recording at a set time; choosing "yes" leads to a prompt to set the time. The recording can then be played.

2) Retrieval

Recordings can only be reviewed in order. During review, the recording length and the set playback time are shown. Playback begins after the cursor rests for 1 second. After pressing "confirm" to enter, the timed playback time can be modified or the recording deleted.

5. System settings

The recognition parameters, the volume, and whether voice prompts are on or off can be set.

Every operation has a Chinese-character or graphic prompt and a voice prompt; the voice prompt can be turned off.

6. Learning the digits 0-9

Each digit is read twice. After the speech is captured and digitized, the recognizer extracts the features and saves them in memory. Once learning succeeds, digits can be entered by voice.
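
The two-reading training step can be sketched as follows. The text does not say how the two readings are combined or what makes learning succeed, so averaging the readings and requiring them to agree within a tolerance are assumptions, as are the function names and the equal-length feature vectors.

```python
def train_digit(templates, digit, first, second, tolerance=1.0):
    """Store the mean of two readings if they are consistent with each other.

    Returns True on success, False if the user should repeat the digit.
    """
    if not first or len(first) != len(second):
        return False
    spread = max(abs(x - y) for x, y in zip(first, second))
    if spread > tolerance:
        return False  # the two readings disagree too much
    templates[digit] = [(x + y) / 2 for x, y in zip(first, second)]
    return True


def training_complete(templates):
    """True once every digit 0-9 has a stored template."""
    return all(str(d) in templates for d in range(10))
```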

Claims

1. A speech input memorandum consisting of a microphone, an input amplifier/filter, an A/D converter, a D/A converter, a speech recognizer, a speech compressor, a voice decompressor, a data bus, control signal lines, a microprocessor, an output amplifier/filter, a loudspeaker, a memory, keys, a knob, and a display screen, characterized in that: the microphone is connected to the input amplifier/filter; the input amplifier/filter is connected to a high-precision A/D converter; the A/D converter is connected to both the speech recognizer and the speech compressor/decompressor; the speech recognizer is connected to the microprocessor through the data bus and control signal lines; the speech compressor is connected to the microprocessor through the data bus and control signal lines; the microprocessor is connected to the memory, which is internally divided into a feature storage area and a recording storage area; the microprocessor is also connected to the voice decompressor, the voice decompressor to the D/A converter, the D/A converter to the output amplifier/filter, and the output amplifier/filter to the loudspeaker; the microprocessor is also connected to the voice comparison module of the speech recognizer through the data bus and control signal lines; the speech recognizer is in turn connected directly to the memory; and the microprocessor is also connected to the display.

2. The speech input memorandum of claim 1, characterized in that the shell of the notepad is fitted with a microphone, a confirm key, a cancel key, a knob, a display screen, and a loudspeaker.