CN1235320A - Voice control instruction generating device under noise environment - Google Patents
Voice control instruction generating device under noise environment
- Publication number
- CN1235320A
- Authority
- CN
- China
- Prior art keywords
- digital
- voice
- signal
- signal processor
- flash memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Electrically Operated Instructional Devices (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to a voice control instruction generating device for use in noisy environments. It comprises an analog-to-digital and digital-to-analog converter, a liquid crystal display, a power supply, a loudspeaker, a speech recognizer, and related parts. It is characterized in that a throat microphone is connected to the analog-to-digital and digital-to-analog converter through a filter, and in that the speech recognizer is built around a digital signal processor to which flash memories (I) and (II), a combinational logic device, an encoder, a driver, and a watchdog circuit are connected. The device is suitable for strongly noisy environments.
Description
The present invention relates to a voice control instruction generating device, and in particular to one for use in a noisy environment.
In the existing art, speech recognition technology has made some progress toward dialogue between human natural language and machines, that is, man-machine dialogue, in which a machine understands a person's spoken instructions and carries them out. Examples include U.S. patent US 050950 filed by International Business Machines Corporation (IBM), U.S. patent US 08/254,844 filed by Motorola, Inc., and U.S. patent US 352251 filed by AT&T. Their main feature is that an air-conduction receiver picks up the speaker's voice for the speech recognition device, and the received speech is recognized in the speech recognition device or in a computer. The key performance measure of speech recognition is the correct recognition rate. In many environments, ambient noise is present in addition to the speaker's voice. This noise mixes into the spoken instruction, greatly lowers the correct recognition rate, and can even cause misrecognition. This limits the use of speech recognition to generate control instructions correctly and to achieve man-machine dialogue in many practical settings.
The purpose of the invention is to provide a voice control instruction generating device, based on speech recognition, that achieves a high correct recognition rate in a noisy environment.
The invention consists of an analog-to-digital and digital-to-analog converter, a liquid crystal display, a power supply, a loudspeaker, a speech recognizer, and related parts. It is characterized in that a throat microphone (laryngophone) is connected to the analog-to-digital and digital-to-analog converter through a filter. The speech recognizer has a digital signal processor at its core, to which flash memories (I) and (II), a combinational logic device, an encoder, a driver, and a watchdog circuit are connected. The analog-to-digital and digital-to-analog converter is connected to the serial port of the digital signal processor. The driver connects flash memory (II) to the digital signal processor. The combinational logic device is connected to flash memory (I), flash memory (II), and the digital signal processor, and it outputs control signals through the execution circuit.
In the invention, the voice signal is picked up by the throat microphone and fed in through the filter, and the speech recognizer built around the digital signal processor recognizes the voice command signal and then issues the corresponding control signal. A throat microphone differs from an ordinary air-conduction sound sensor (such as a microphone): it must be held against the throat of the person giving the command. When the person speaks, the vocal cords vibrate and deform the carbon film in the throat microphone. The deformation changes the film's resistance and therefore the voltage across it, so the vibration is converted into an electrical signal, that is, the voice signal. Sound waves travelling through the air cannot deform the carbon film, so the throat microphone does not respond to airborne sound. It therefore has strong immunity to interference and can capture the speaker's voice signal in a strongly noisy environment. The filter amplifies and low-pass filters the voice signal from the throat microphone to prevent frequency aliasing, and the digital signal processor ensures that the speech recognition algorithm and the speech compression and decompression algorithms run smoothly.
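The role of the low-pass filter ahead of the converter can be illustrated with the folding arithmetic of sampling. The sketch below is only an illustration: the 8 kHz sampling rate and the function name `alias_frequency` are assumptions, since the description does not state a sampling rate.

```python
def alias_frequency(f_hz, fs_hz):
    """Return the frequency at which a tone of f_hz appears after sampling at fs_hz."""
    # Sampling cannot distinguish f from f - k*fs, so fold f into [0, fs/2].
    k = round(f_hz / fs_hz)
    return abs(f_hz - k * fs_hz)


FS = 8000  # assumed sampling rate in Hz (not given in the description)

for tone in (1000, 3000, 5000, 7000):
    print(tone, "Hz is seen as", alias_frequency(tone, FS), "Hz")
```

Components above half the sampling rate land on top of lower frequencies once sampled and cannot be separated afterwards, which is why they have to be attenuated before the A/D conversion.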
The invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a schematic diagram of the voice control instruction generating device of the invention;
Fig. 2 is a circuit diagram of the voice control instruction generating device of the invention;
Fig. 3 is the master control flowchart of the device software;
Fig. 4 is the flowchart of the recognition module (rec_mode) of the device software;
Fig. 5 is the flowchart of the management module (manage_mode) of the device software;
Fig. 6 is the flowchart of the training module (train_mode) of the device software.
The hardware implementation of the voice control instruction generating device is shown in Fig. 1 and Fig. 2. The functional blocks in Fig. 1 are as follows:
- Digital signal processor (DSP) 1: an ADSP2181 with a 16.67 MHz clock, 33 MIPS, and a 30 ns instruction cycle. It contains 16K words of on-chip data memory and 16K words of on-chip program memory and is used to run the algorithms and control the interfaces. An ADSP2186 can be used in place of the ADSP2181.
- Flash memory (I) 2: an AT29C020, used to store program code and initialization data.
- Flash memory (II) 3: an AT29C020, used to store voice command samples.
- Analog-to-digital and digital-to-analog converter 4: an AD73311, with 16-bit D/A and A/D and built-in gain control. It performs A/D conversion on the analog voice obtained by the throat microphone and sends the digitized voice signal to the DSP for processing. It also performs D/A conversion on digital voice signals, restoring them to analog voice signals that are reproduced as sound through the voice power amplifier and the loudspeaker. In addition, by sampling at a rate higher than the required sampling rate and having the DSP apply oversampling techniques, the signal-to-noise ratio of the input voice signal and the recognition rate of the system can be improved.
- Watchdog circuit 5: a MAX705, which monitors the operation of the ADSP2181 and issues the WDG signal when a fault occurs.
- Combinational logic 6: implemented with a programmable logic device (PLD). It generates control signals by decoding the data, address, and other output signals of the ADSP2181.
- Encoder 7: a 16-to-4 encoder implemented with an MC14419, which encodes 16 keys into a 4-bit code.
- LCD display 8: a 16 × 2 dot matrix display module, used to display prompts and related information.
- Filter 9: preprocesses the signal from the throat microphone 11. It is implemented with operational amplifiers, amplifies and filters the weak voice signal, matches the impedance between the throat microphone and the AD73311, and prevents drift of the voice baseline.
- Execution circuit 10: controls the external object according to the decoded voice command sent by the DSP.
- Throat microphone 11: the speech sensor, which converts the vocal cord vibration of the person giving the command into an electrical signal, that is, an analog voice signal.
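The description does not say which oversampling method the DSP applies, so the following is only one common possibility, sketched under that assumption: sample faster than needed, then average each block of samples down to the target rate. The function name `decimate_by_averaging` is hypothetical.

```python
def decimate_by_averaging(samples, factor):
    """Reduce the sample rate by `factor`, averaging each block of `factor` samples."""
    usable = len(samples) - len(samples) % factor  # drop an incomplete final block
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]


oversampled = [1, 3, 5, 7, 9, 11, 4]
print(decimate_by_averaging(oversampled, 2))  # [2.0, 6.0, 10.0]
```

Averaging a block of samples leaves the slowly varying speech component largely intact while partly cancelling noise that is uncorrelated from sample to sample, which is the usual reason oversampling can raise the signal-to-noise ratio. A production design would normally use a proper decimation filter rather than a plain block average.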
Fig. 2 is the detailed circuit diagram of the voice control instruction generating device. The execution circuit depends on the object being controlled and is designed separately by the user for each case. In Fig. 2, U01 is the ADSP2181, the digital signal processing chip. U02 is the AT29C020 serving as flash memory (I) 2, used to store program code and initialization data. Flash memory (II) 3 is not shown in the figure; it can be made as a separate sample card and connected to the system through J04. U03 is a GAL16V8 programmable logic device, which controls the two flash memories by decoding some of the ADSP2181 signals. U04, a MAX705, is the watchdog chip; it generates the system reset signal RESET at power-up and can also generate the supervisory signal WDG when the system has a fault. U05 is the AD73311, the A/D and D/A converter 4. It digitizes the analog voice signal obtained by the throat microphone 11 and fed in through J052 or J053, then sends it to the serial port of the ADSP2181 over the DR signal line. It can also receive serial data output by the ADSP2181 over the DT signal line and perform D/A conversion; the result is amplified by U12, an MC34119, and passed through J051 to the loudspeaker, where it is reproduced as sound. U06 to U09 are driver chips that drive the address and data lines between the ADSP2181 and flash memory (II) 3. U10 is the MC14419, the encoder 7, which encodes the 4 × 4 keyboard; the code is input to the ADSP2181 through PF4 to PF7. U11, an MC7805, is the voltage regulator chip. In addition, J03 is the connector between the keyboard and the system, J02 is the interface between the system and the LCD 8, and J01 is the interface between the system and the emulator.
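The keyboard path (16 keys, a 4-bit code, four flag pins PF4 to PF7) can be sketched as bit packing. The row-major code assignment below is an assumption made for illustration and is not taken from the encoder's data sheet; the function names are likewise hypothetical.

```python
PF_SHIFT = 4     # the 4-bit key code arrives on PF4..PF7, i.e. bits 4 to 7
PF_MASK = 0xF0


def encode_key(row, col):
    """Map a key position on the 4 x 4 keyboard to a 4-bit code (assumed row-major)."""
    if not (0 <= row < 4 and 0 <= col < 4):
        raise ValueError("key position outside the 4 x 4 matrix")
    return row * 4 + col


def to_pf_bits(code):
    """Place a 4-bit key code on the PF4..PF7 bit positions."""
    return (code << PF_SHIFT) & PF_MASK


def read_key_code(pf_register):
    """Recover the key code from a flag register value, ignoring PF0..PF3."""
    return (pf_register & PF_MASK) >> PF_SHIFT


code = encode_key(2, 3)
print(code, hex(to_pf_bits(code)), read_key_code(0xB5))
```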
Figs. 3 to 6 illustrate the software implementation of the voice control instruction generating device. The device can use different speech recognition algorithms and speech compression algorithms, chosen by whoever applies the invention. The operation of the device is now described with reference to Figs. 3 to 6.
Fig. 3 is the master control flowchart of the device software. As Fig. 3 shows, the software is divided into three modules: (1) the recognition module rec_mode, (2) the management module manage_mode, and (3) the training module train_mode. After power-up, the device is in the mode selection state; it waits for the user's keyboard input and then enters the selected mode.
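The mode selection step can be sketched as a small dispatch table. The module names come from the description; the key assignments ("1", "2", "3") and the function name `select_mode` are assumptions made for illustration.

```python
# Key-to-module mapping; the key assignment is assumed, the module names are from the text.
MODES = {"1": "rec_mode", "2": "manage_mode", "3": "train_mode"}


def select_mode(keys):
    """Consume key presses until one selects a mode; return the mode name or None."""
    for key in keys:
        mode = MODES.get(key)
        if mode is not None:
            return mode
    return None  # input ended without a valid selection


print(select_mode("x93"))  # unknown keys are ignored until "3" selects train_mode
```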
Fig. 4 is the flowchart of the speech recognition module (rec_mode). This module recognizes the voice signal input from the throat microphone and outputs the recognition result (the code corresponding to the voice command) to the combinational logic circuit, which then controls the external object. As the figure shows, recognition begins with speech detection, which determines whether there is voice input. If there is, features are extracted from the speech, that is, the MFCC parameters of the input speech are computed. The parameters are then compared: the feature parameters of the input speech are compared with the feature parameters (the templates) of the voice commands stored in flash memory to determine whether the input matches one of the templates. There are two cases. In the first case the match is exact: the matched template is the input voice command, and the code associated with that template is the code of the input command. It is sent over the data lines to the combinational logic, which then controls the external object. In the second case the match is not exact: the three closest voice command templates are found and their recordings are played back one by one for the user to judge. If one of them is the input voice command, then after the user confirms it, its code is sent over the data lines to the combinational logic, which then controls the external object. If none of the three is the input voice command, the user is prompted to speak the command again, and the recognition process is repeated until a result is obtained.
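The two-case matching logic can be sketched as follows. This is a minimal sketch under stated assumptions: the description does not specify a distance measure or what counts as an exact match, so Euclidean distance and a fixed `threshold` are stand-ins, and short lists of numbers stand in for MFCC feature vectors.

```python
import math


def recognize(features, templates, threshold=1.0):
    """Compare input features with stored templates.

    Returns ("match", code) when the closest template is within `threshold`,
    otherwise ("confirm", codes) with the three closest candidates for the
    user to confirm after playback.
    """
    ranked = sorted(templates, key=lambda code: math.dist(features, templates[code]))
    best = ranked[0]
    if math.dist(features, templates[best]) <= threshold:
        return ("match", best)
    return ("confirm", ranked[:3])


# Command code -> template feature vector (toy values, not real MFCCs).
templates = {1: [0, 0], 2: [5, 5], 3: [10, 0], 4: [0, 10]}

print(recognize([0.2, 0.1], templates))  # close to template 1
print(recognize([4, 3], templates))      # nothing close enough: three candidates
```

If the user rejects all three candidates, the caller would prompt for the command again and call `recognize` on the new input.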
Fig. 5 is the flowchart of the management module (manage_mode). This module provides the management functions, including entering, searching, and deleting command templates; entering, modifying, and playing back system prompts; and managing the keyboard.
Fig. 6 is the flowchart of the training module (train_mode). This module builds the templates of the voice commands and stores them. Training a voice command begins with speech detection, that is, determining whether there is voice input. Once voice input is confirmed, the speech is processed in two ways: first, its features are extracted, that is, its MFCC parameters are computed; second, the speech data is compressed and encoded. The recorded speech is then played back for the user to judge. If the user keys in that the quality of the voice command is unsatisfactory, the above steps are repeated. If the user keys in that the quality is satisfactory, the user is prompted to key in the code of the voice command. The feature parameters (the template) of the input voice command, the compressed voice command, and its code are then stored in flash memory, which completes one training operation.
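The training loop can be sketched with the hardware steps passed in as functions. All parameter names are hypothetical, a dictionary stands in for flash memory, and the feature extraction and compression in the demonstration are trivial placeholders rather than MFCC computation or a real codec.

```python
def train_command(record, extract_features, compress, play_back,
                  user_satisfied, ask_code, store):
    """Record a command until the user accepts it, then store it under its code."""
    while True:
        speech = record()                    # speech detection and capture
        template = extract_features(speech)  # stand-in for MFCC extraction
        compressed = compress(speech)        # stand-in for compression coding
        play_back(compressed)                # let the user hear the recording
        if user_satisfied():
            break
    code = ask_code()
    store[code] = (template, compressed)
    return code


# Demonstration with placeholder steps: the first take is rejected, the second accepted.
takes = iter([[1, 9], [2, 8]])
answers = iter([False, True])
flash = {}
code = train_command(
    record=lambda: next(takes),
    extract_features=lambda s: [sum(s)],
    compress=lambda s: bytes(s),
    play_back=lambda data: None,
    user_satisfied=lambda: next(answers),
    ask_code=lambda: 7,
    store=flash,
)
print(code, flash)
```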
In use, the throat microphone 11 is fixed or attached near the throat of the person giving instructions and picks up the instructions that person speaks. Each instruction is generally one phrase, and several instructions are several phrases. The filter 9 receives the analog voice command signal output by the throat microphone 11, preprocesses it, and passes the processed signal to the analog-to-digital converter, which produces the digital voice command signal. The digital voice command signal is input to the digital signal processor 1. The speech recognizer built around the digital signal processor 1 recognizes the voice command signal and forms a control instruction, which is output to the intended controlled device. Flash memory (I) 2 stores the program code and initialization data. Flash memory (II) 3 stores the voice instruction samples obtained through training. The driver provides the connection between the digital signal processor 1 and flash memory (II) 3. The combinational logic device 6 decodes the address and output signals of the digital signal processor 1 and generates control signals. The LCD 8 displays the prompts needed during use. The power supply powers the device. In typical settings the number of voice instructions is within 50, but the device can hold a maximum of 200. When the device is used by several people, the voice samples stored in flash memory (II) 3 can be replaced, or each person's flash memory (II) 3 can be made into a pluggable voice sample card.
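The per-user sample card and the 200-command limit can be modelled in a few lines. The class and method names are hypothetical, and the model covers only the bookkeeping, not the flash hardware.

```python
class SampleCard:
    """Model of one user's pluggable voice sample card (flash memory (II))."""

    MAX_COMMANDS = 200  # maximum number of voice instructions stated in the description

    def __init__(self, owner):
        self.owner = owner
        self.templates = {}  # command code -> template

    def add(self, code, template):
        if code not in self.templates and len(self.templates) >= self.MAX_COMMANDS:
            raise OverflowError("sample card is full")
        self.templates[code] = template


class Device:
    """Holds whichever sample card is currently plugged in."""

    def __init__(self):
        self.card = None

    def insert_card(self, card):
        self.card = card

    def known_codes(self):
        return sorted(self.card.templates) if self.card else []


alice, bob = SampleCard("alice"), SampleCard("bob")
alice.add(1, [0.1, 0.2])
bob.add(5, [0.7, 0.3])

device = Device()
device.insert_card(alice)
print(device.known_codes())  # [1]
device.insert_card(bob)
print(device.known_codes())  # [5]
```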
Because the invention uses a throat microphone to pick up the speaker's instructions, it directly receives the vocal cord vibration signal produced when the instruction is spoken. This prevents ambient noise from mixing into the voice as it travels through the air, and so avoids the drop in correct recognition rate or the misrecognition of instructions that ambient noise would cause, which relaxes the requirements on the operating environment. The invention is suitable for public places, workshops, construction sites, and land, sea, and air vehicles (cars, ships, aircraft, etc.), where a person issues instructions in natural speech and the machine operates accordingly, so that voice instructions replace manual operation. Disabled persons (other than those who have lost the ability to speak) can use the device when manual operation is inconvenient. It can be used in intelligent toys so that the toy moves according to a person's voice instructions, and in dialogue between people and robots so that the robot acts on a person's voice instructions. The device can also be used where no noise is present.
Claims (1)
1. A voice control instruction generating device for noisy environments, comprising an analog-to-digital and digital-to-analog converter 4, a liquid crystal display 8, a power supply, a loudspeaker, a speech recognizer, and related parts, characterized in that a throat microphone 11 is connected to the analog-to-digital and digital-to-analog converter 4 through a filter 9; the speech recognizer has a digital signal processor 1 at its core, to which flash memory (I) 2, flash memory (II) 3, a combinational logic device 6, an encoder 7, a driver, and a watchdog circuit 5 are connected; the analog-to-digital and digital-to-analog converter 4 is connected to the serial port of the digital signal processor 1; the driver connects flash memory (II) 3 to the digital signal processor 1; the combinational logic device 6 is connected to flash memory (I) 2, flash memory (II) 3, and the digital signal processor 1; and the combinational logic device 6 is connected to the controlled object through an execution circuit 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN99116104A CN1100305C (en) | 1999-03-31 | 1999-03-31 | Voice control instruction generating device under noise environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN99116104A CN1100305C (en) | 1999-03-31 | 1999-03-31 | Voice control instruction generating device under noise environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1235320A (en) | 1999-11-17 |
CN1100305C (en) | 2003-01-29 |
Family
ID=5278949
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN99116104A (CN1100305C, Expired - Fee Related) | 1999-03-31 | 1999-03-31 | Voice control instruction generating device under noise environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1100305C (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107068145B (en) * | 2016-12-30 | 2019-02-15 | 中南大学 | Speech evaluating method and system |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DK0796489T3 (en) * | 1994-11-25 | 1999-11-01 | Fleming K Fink | Method of transforming a speech signal using a pitch manipulator |
CN2262291Y (en) * | 1996-01-25 | 1997-09-10 | 蔡辉阳 | Automatic voice controller |
US5794187A (en) * | 1996-07-16 | 1998-08-11 | Audiological Engineering Corporation | Method and apparatus for improving effective signal to noise ratios in hearing aids and other communication systems used in noisy environments without loss of spectral information |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101976186A (en) * | 2010-09-14 | 2011-02-16 | 方正科技集团苏州制造有限公司 | Voice recognition method of computer and computer |
CN101976186B (en) * | 2010-09-14 | 2013-04-03 | 方正科技集团苏州制造有限公司 | Voice recognition method of computer and computer |
CN104123930A (en) * | 2013-04-27 | 2014-10-29 | 华为技术有限公司 | Guttural identification method and device |
CN106535045A (en) * | 2016-11-30 | 2017-03-22 | 中航华东光电(上海)有限公司 | Audio enhancement processing module for laryngophone |
CN108182941A (en) * | 2017-12-28 | 2018-06-19 | 重庆柚瓣家科技有限公司 | For the human-computer interaction module under noisy environment |
CN118430541A (en) * | 2024-07-04 | 2024-08-02 | 青岛海尔乐信云科技有限公司 | Intelligent voice robot system |
Also Published As
Publication number | Publication date |
---|---|
CN1100305C (en) | 2003-01-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C06 | Publication | ||
PB01 | Publication | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C19 | Lapse of patent right due to non-payment of the annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |