CN1235320A - Voice control instruction generating device under noise environment

Info

Publication number: CN1235320A
Application number: CN99116104A
Authority: CN
Inventors: 张有为; 张歆奕; 何强
Original assignee: Wuyi University
Current assignee: Wuyi University
Priority date: 1999-03-31
Filing date: 1999-03-31
Publication date: 1999-11-17
Anticipated expiration: 2019-03-31
Also published as: CN1100305C

Abstract

The invention relates to a voice control instruction generating device for use in noisy environments. It comprises an analog-to-digital and digital-to-analog converter, a liquid crystal display, a power supply, a loudspeaker, a speech recognizer and other parts. It is characterized in that a throat microphone is connected to the analog-to-digital and digital-to-analog converter through a filter, and in that the speech recognizer is built around a digital signal processor to which flash memories (I) and (II), a combinational logic device, an encoder, a driver and a watchdog circuit are connected. The device is suitable for strong-noise environments.

Description

Voice control instruction generating device for noisy environments

The present invention relates to a voice control instruction generating device, and in particular to one for use in a noisy environment.

In the prior art, some progress has been made in using speech recognition to let people converse with machines in natural language, that is, man-machine dialogue, so that a machine can understand a person's voice instructions and carry them out. Examples include U.S. Pat 050950 filed by International Business Machines Corporation (IBM), U.S. Pat 08/254,844 filed by Motorola Inc., and U.S. Pat 352251 filed by AT&T. Their main feature is that an air-conduction sound receiver picks up the speaker's voice for the speech recognition device, and the received voice is recognized in the speech recognition device or in a computer. The key performance index of speech recognition is the correct recognition rate of the speaker's voice. In many operating environments there is ambient noise in addition to the speaker's voice. This noise mixes into the spoken instruction, greatly lowers the correct recognition rate and can even cause misrecognition. This limits the use of speech recognition to generate control instructions correctly, and to realize man-machine dialogue, in many practical settings.

The object of the invention is to provide a voice control instruction generating device, for use in a noisy environment, whose speech recognition achieves a high correct recognition rate.

The invention consists of an analog-to-digital and digital-to-analog converter, a liquid crystal display, a power supply, a loudspeaker, a speech recognizer and other parts. It is characterized in that a throat microphone is connected to the analog-to-digital and digital-to-analog converter through a filter. The speech recognizer has a digital signal processor as its core, to which flash memories (I) and (II), a combinational logic device, an encoder, a driver and a watchdog circuit are connected. The analog-to-digital and digital-to-analog converter is connected to the serial port of the digital signal processor; the driver connects flash memory (II) to the digital signal processor; the combinational logic device is connected to flash memory (I), flash memory (II) and the digital signal processor respectively; and the combinational logic device outputs control signals through an executive circuit.

In the invention, the voice signal is picked up by the throat microphone and input through the filter, and the speech recognizer built around the digital signal processor recognizes the voice command signal and then issues the corresponding control signal. Unlike ordinary sound sensors that rely on air conduction (such as microphones), the throat microphone must be held against the throat of the person giving the command. When that person speaks, the vocal cords vibrate and deform the carbon film in the throat microphone. Its resistance changes, so the voltage across it changes, and the vibration is thereby converted into an electrical signal, that is, the voice signal. Sound waves travelling through the air cannot deform the carbon film, so the throat microphone does not pick up airborne sound. It therefore has strong immunity to interference and can capture the speaker's voice signal in a strong-noise environment. The filter amplifies and low-pass filters the signal from the throat microphone to prevent frequency aliasing, and the digital signal processor ensures that the speech recognition algorithm and the speech compression and decompression algorithms run smoothly.
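The role of the low-pass filter in preventing frequency aliasing can be illustrated with a short calculation. The sketch below is illustrative only: the 8 kHz sampling rate is an assumed value that the description does not state, and `alias_frequency` is a hypothetical helper name.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a pure tone f_hz after sampling at fs_hz."""
    folded = f_hz % fs_hz
    # Frequencies above fs/2 fold back ("alias") into the range 0..fs/2.
    return min(folded, fs_hz - folded)


FS_HZ = 8000  # assumed sampling rate, not given in the description

for tone_hz in (1000, 3000, 5000, 9000):
    print(tone_hz, "Hz appears as", alias_frequency(tone_hz, FS_HZ), "Hz")
```

Tones at or below half the sampling rate keep their frequency, while tones above it fold back into the voice band. Attenuating those components before the A/D conversion, as the filter does, keeps them from corrupting the sampled voice signal.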

The invention is described in further detail below with reference to the drawings and a specific embodiment.

Fig. 1 is a block diagram of the voice control instruction generating device of the invention;

Fig. 2 is a circuit diagram of the voice control instruction generating device of the invention;

Fig. 3 is the main control flowchart of the software of the voice control instruction generating device;

Fig. 4 is the flowchart of the recognition module (rec_mode) in the software of the voice control instruction generating device;

Fig. 5 is the flowchart of the management module (manage_mode) in the software of the voice control instruction generating device;

Fig. 6 is the flowchart of the training module (train_mode) in the software of the voice control instruction generating device.

The hardware implementation of the voice control instruction generating device of the invention is shown in Fig. 1 and Fig. 2. The functional blocks of the system in Fig. 1 are as follows:

Digital signal processor (DSP) 1: an ADSP2181 with a 16.67 MHz clock, 33 MIPS and an instruction cycle of 30 ns. It contains 16K words of data memory and 16K words of program memory on chip and is used to implement the algorithms and control the interfaces. An ADSP2186 may be used in place of the ADSP2181.

Flash memory (I) 2: an AT29C020, used to store program code and initialization data.

Flash memory (II) 3: an AT29C020, used to store voice command samples.

A/D and D/A converter 4: an AD73311 with 16-bit D/A and A/D conversion and built-in gain control. It performs A/D conversion on the analog voice obtained by the throat microphone and sends the digitized voice signal to the DSP chip for processing. It also performs D/A conversion on digital voice signals, restoring them to an analog voice signal that is turned back into sound by the audio power amplifier and the loudspeaker. In addition, sampling at a frequency higher than the required sampling frequency, combined with up- and down-sampling techniques on the DSP, can improve the signal-to-noise ratio of the input voice signal and the recognition rate of the system.

Watchdog circuit 5: a MAX705, which monitors the operation of the ADSP2181 and issues the WDG signal when a fault occurs.

Combinational logic 6: implemented with a programmable logic device (PLD). It generates control signals by decoding the data, address and other output signals of the ADSP2181.

Encoder 7: a 16-to-4 encoder implemented with an MC14419, which encodes the 16 keys into a 4-bit code.

LCD display 8: a 16 × 2 dot-matrix display module, used to display prompts and related information.

Filter 9: pre-processes the signal from throat microphone 11. It is implemented with operational amplifiers, amplifies and filters the weak voice signal, provides impedance matching between the throat microphone and the AD73311, and prevents drift of the voice baseline.

Executive circuit 10: controls the external object according to the decoded voice command sent by the DSP.

Throat microphone 11: the voice sensor, which converts the vocal cord vibration of the person giving the command into an electrical signal, that is, the analog voice signal.
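The remark that sampling above the required rate can improve the signal-to-noise ratio can be sketched numerically. The example below is a minimal illustration under stated assumptions, not the method of the device: the 4x oversampling factor, the use of plain block averaging as the decimation filter, and the white-noise model are all assumptions, and the function names are hypothetical.

```python
import random
import statistics


def decimate_by_averaging(samples, factor):
    """Average each consecutive group of `factor` samples into one sample."""
    usable = len(samples) - len(samples) % factor
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]


def noise_power(samples):
    """Mean squared value of a zero-mean noise record."""
    return statistics.fmean(s * s for s in samples)


random.seed(0)
FACTOR = 4  # assumed oversampling factor, not given in the description

# White noise standing in for converter noise on the oversampled signal.
noise = [random.gauss(0.0, 1.0) for _ in range(40_000)]
decimated = decimate_by_averaging(noise, FACTOR)

ratio = noise_power(noise) / noise_power(decimated)
print(f"noise power reduced by a factor of about {ratio:.2f}")
```

For uncorrelated noise, averaging N samples lowers the noise power by roughly a factor of N while a slowly varying voice signal passes almost unchanged, which is the gain that oversampling followed by down-sampling exploits.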

Fig. 2 is the detailed circuit diagram of the voice control instruction generating device of the invention. The executive circuit differs with the object to be controlled and is designed separately by the user as the case requires. In Fig. 2:

U01 is the ADSP2181, the digital signal processing chip.

U02 is an AT29C020 serving as flash memory (I) 2, which stores program code and initialization data. Flash memory (II) 3 is not shown in the figure; it can be made as a separate sample card connected to the system through J04.

U03 is a GAL16V8 programmable logic device, which controls the two flash memories by decoding some of the ADSP2181 signals.

U04 is a MAX705 watchdog chip, which generates the system reset signal RESET at power-on and can also generate the monitoring signal WDG when the system has a fault.

U05 is the AD73311, the A/D and D/A converter 4. It digitizes the analog voice signal obtained by throat microphone 11 and fed in through J052 or J053, then sends it to the serial port of the ADSP2181 over the DR signal line. It also receives serial data output by the ADSP2181 over the DT signal line and performs D/A conversion; the result is amplified by U12, an MC34119, and fed through J051 to the loudspeaker, where it is restored to sound.

U06-U09 are driver chips, which drive the address and data lines between the ADSP2181 and flash memory (II) 3.

U10 is an MC14419 serving as encoder 7. It encodes the 4 × 4 keyboard, and the code is input to the ADSP2181 through PF4-PF7.

U11 is an MC7805 voltage regulator chip.

In addition, J03 is the connector between the keyboard and the system, J02 is the interface between the system and LCD 8, and J01 is the interface between the system and the emulator.
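The 16-to-4 keyboard encoding and its input on PF4-PF7 can be sketched as follows. This is an illustration under assumptions: the row-major numbering of the keys and the placement of the code in bits 4 to 7 of a flag word are chosen for the example, the actual code table of the encoder chip is not reproduced, and all function names are hypothetical.

```python
def encode_key(row, col):
    """Map a key of a 4 x 4 keyboard to a 4-bit code (assumed row-major)."""
    if not (0 <= row < 4 and 0 <= col < 4):
        raise ValueError("row and col must be in 0..3")
    return row * 4 + col


def pack_into_pf(code, other_bits=0):
    """Place the 4-bit code on bits 4..7, keeping bits 0..3 for other uses."""
    return (other_bits & 0x0F) | ((code & 0x0F) << 4)


def read_key_code(pf_value):
    """Recover the 4-bit key code from bits 4..7 of the flag word."""
    return (pf_value >> 4) & 0x0F


pf_word = pack_into_pf(encode_key(2, 3), other_bits=0b1010)
print(read_key_code(pf_word))
```

Four bits are exactly enough to distinguish 16 keys, which is why a 16-key keyboard needs only four input lines on the processor.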

Fig. 3 to Fig. 6 illustrate the software implementation of the voice control instruction generating device of the invention. The device can use different speech recognition algorithms and speech compression algorithms, which may be chosen by whoever applies the invention. The operation of the device is now described with reference to Fig. 3 to Fig. 6.

Fig. 3 is the main control flowchart of the software of the voice control instruction generating device. As Fig. 3 shows, the software is divided into three modules: (1) the recognition module rec_mode, (2) the management module manage_mode and (3) the training module train_mode. After the device is powered on and started, it is in the mode selection state; it waits for keyboard input from the user and then enters the selected mode.
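The mode selection of Fig. 3 can be sketched as a small dispatcher. The module names follow the description; the key assignments (1, 2, 3), the handling of unassigned keys, and the return to mode selection after each module are assumptions made for the example.

```python
def rec_mode():
    return "recognition"


def manage_mode():
    return "management"


def train_mode():
    return "training"


# Assumed key assignments; the description does not say which key selects which mode.
MODE_KEYS = {1: rec_mode, 2: manage_mode, 3: train_mode}


def main_control(key_presses):
    """Wait in mode selection, run the selected module, then wait again."""
    log = []
    for key in key_presses:
        handler = MODE_KEYS.get(key)
        if handler is None:
            log.append("ignored")  # key not assigned to any mode
            continue
        log.append(handler())
    return log


print(main_control([3, 1, 9, 2]))
```

A table from key code to handler keeps the selection logic in one place, so adding or reassigning a mode does not change the control loop.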

Fig. 4 is the flowchart of the speech recognition module (rec_mode). This module recognizes the voice signal input from the throat microphone, outputs the recognition result (the code corresponding to the voice command) to the combinational logic circuit, and thereby controls the external object. As the figure shows, recognition begins with speech detection, which determines whether there is voice input. If there is, feature extraction is performed on the voice, that is, the MFCC parameters of the input voice are extracted. The parameters are then compared: the characteristic parameters of the input voice are compared with the characteristic parameters (templates) of the voice commands stored in flash memory to determine whether the input matches one of the templates. There are two cases. In the first, the match is complete: the matched template is the input voice command, its code is the code of the input voice command, and this code is sent to the combinational logic over the data lines to control the external object. In the second, the match is incomplete: the three closest voice command templates are found and their voices are played back in turn for the user to judge. If one of them is the input voice command, then after the user confirms it, its code is sent to the combinational logic over the data lines to control the external object. If none of the three is the input voice command, the user is prompted to input the voice command again, and the recognition process is repeated until a result is obtained.
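The two matching cases of Fig. 4 can be sketched as follows. This is a simplified illustration and not the recognition algorithm of the device, which the description leaves open: it assumes each command has already been reduced to a fixed-length feature vector, uses Euclidean distance as the comparison, and treats a distance at or below an assumed threshold as a complete match. The function name and the threshold value are hypothetical.

```python
import math


def recognize(features, templates, match_threshold):
    """Compare input features with stored templates.

    `templates` is a list of (code, feature_vector) pairs. Returns
    ("match", code) for a complete match, otherwise ("candidates", codes)
    with the codes of the three closest templates for the user to confirm.
    """
    if not templates:
        raise ValueError("no templates have been trained")
    ranked = sorted(templates, key=lambda t: math.dist(features, t[1]))
    best_code, best_features = ranked[0]
    if math.dist(features, best_features) <= match_threshold:
        return ("match", best_code)
    return ("candidates", [code for code, _ in ranked[:3]])


templates = [(1, [0.0, 0.0]), (2, [1.0, 0.0]), (3, [0.0, 2.0]), (4, [5.0, 5.0])]
print(recognize([0.05, 0.0], templates, match_threshold=0.1))
print(recognize([0.5, 0.5], templates, match_threshold=0.1))
```

In the second case the caller would play back the three candidates, send the confirmed code onward, or prompt for a new input if the user rejects all three.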

Fig. 5 is the flowchart of the management module (manage_mode). This module implements the management functions, including entry, lookup and deletion of command templates; entry, modification and playback of the system voice; and keyboard management.

Fig. 6 is the flowchart of the training module (train_mode). This module builds the templates of the voice commands and stores them. Training a voice command begins with speech detection, which determines whether there is voice input. Once voice input is confirmed, the voice is processed in two ways: first, its features are extracted, that is, its MFCC parameters are calculated; second, the voice data are compression-coded. The recorded voice is then played back for the user to judge. If the user keys in that the quality of the voice command is unsatisfactory, the above operations are repeated. If the user keys in that the quality is satisfactory, the user is prompted to key in the code of the voice command, and the characteristic parameters (template) of the input voice command, the compressed voice command and its code are stored in flash memory. This completes one training operation.
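The training loop of Fig. 6 can be sketched with the hardware-dependent steps passed in as functions. This is an illustration under assumptions: the recording, feature extraction, compression and keyboard steps are stand-in callables, a dictionary stands in for the flash memory, and all names are hypothetical.

```python
def train_command(record, extract_features, compress, user_accepts, ask_code, store):
    """Record a command until the user accepts it, then store its template."""
    while True:
        audio = record()                    # detected voice input
        features = extract_features(audio)  # template (MFCC in the description)
        packed = compress(audio)            # compressed voice for later playback
        if user_accepts(packed):            # user judges the played-back voice
            break
    code = ask_code()                       # user keys in the command code
    store[code] = (features, packed)        # stand-in for writing to flash
    return code


# Stand-in callables for a dry run: the first take is rejected, the second accepted.
takes = iter([b"bad take", b"good take"])
answers = iter([False, True])
store = {}
code = train_command(
    record=lambda: next(takes),
    extract_features=lambda audio: [len(audio)],
    compress=lambda audio: audio[:4],
    user_accepts=lambda packed: next(answers),
    ask_code=lambda: 7,
    store=store,
)
print(code, store)
```

Passing the steps in as functions keeps the control flow of the figure visible without committing to a particular feature extraction or compression algorithm.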

In use, throat microphone 11 is fixed or attached near the throat of the person giving instructions and picks up the instructions that person issues. Each instruction is generally one phrase, and several instructions are several phrases. Filter 9 receives the analog voice command signal output by throat microphone 11, pre-processes it and passes the processed signal to the analog-to-digital converter, which forms the digital voice command signal. The digital voice command signal is input to digital signal processor 1; the speech recognizer built around digital signal processor 1 recognizes the voice command signal and forms a control instruction, which is output to the predetermined controlled device. Flash memory (I) 2 stores program code and initialization data; flash memory (II) 3 stores the voice instruction samples obtained through training; the driver provides the connection between digital signal processor 1 and flash memory (II) 3; and combinational logic device 6 decodes the address and output signals of digital signal processor 1 and generates control signals. LCD 8 displays the prompts needed in use. The power supply powers the device. In typical applications the number of voice instructions is within 50, but the maximum capacity of the invention is 200 voice instructions. When the device is used by several people, the voice samples stored in flash memory (II) 3 can be replaced, or a flash memory (II) 3 for each person can be made into a plug-in voice sample card.

Because the invention uses a throat microphone to pick up the voice instructions, it directly receives the vocal cord vibration produced when the person issues an instruction. This prevents the noise present in the operating environment from mixing into the voice during air conduction. It thus avoids the drop in correct recognition rate, or the misrecognition of instructions, that ambient noise would cause, and relaxes the requirements on the operating environment. The invention is suitable for public places, workshops, construction sites and land, sea and air vehicles (cars, ships, aircraft and the like), where a person issues instructions in natural speech and the machine operates according to them, replacing manual operation with voice instructions. It can be used by disabled persons (other than those who have lost the ability to speak) for whom manual operation is inconvenient; in intelligent toys, so that the toy moves according to a person's voice instructions; and in dialogue between people and robots, so that the robot acts on a person's voice instructions. The device can also be used in environments where no noise is present.

Claims

1. A voice control instruction generating device for a noisy environment, comprising an analog-to-digital and digital-to-analog converter 4, a liquid crystal display 8, a power supply, a loudspeaker, a speech recognizer and other parts, characterized in that a throat microphone 11 is connected to the analog-to-digital and digital-to-analog converter 4 through a filter 9; the speech recognizer has a digital signal processor 1 as its core, to which flash memory (I) 2, flash memory (II) 3, a combinational logic device 6, an encoder 7, a driver and a watchdog circuit 5 are connected; the analog-to-digital and digital-to-analog converter 4 is connected to the serial port of the digital signal processor 1; the driver connects flash memory (II) 3 to the digital signal processor 1; the combinational logic device 6 is connected to flash memory (I) 2, flash memory (II) 3 and the digital signal processor 1 respectively; and the combinational logic device 6 is connected to the controlled object through an executive circuit 10.