CN1241746A

CN1241746A - Universal phonetic control command generator

Info

Publication number: CN1241746A
Application number: CN99116106A
Authority: CN
Inventors: 江太辉; 张歆奕; 宋国栋; 张有为
Original assignee: Wuyi University
Current assignee: Wuyi University
Priority date: 1999-03-31
Filing date: 1999-03-31
Publication date: 2000-01-19

Abstract

The universal speech control command generator of the present invention enables a machine to intelligently execute commands given in natural human speech. It consists of a digital signal processor, a microprocessor unit, flash memory, an A/D converter, a D/A converter, an LCD, a speech receiver, a loudspeaker or earphone, a keyboard, a power source controller, etc. It operates in dual-CPU mode and supports a maximum of 256 commands. It is universal, small, low in cost, and suitable for various speech recognition algorithms, and is applicable wherever a machine is to be controlled through natural human language.

Description

Universal phonetic control command generator

The present invention relates to a universal phonetic control command generator that enables a machine to execute a person's natural-language instructions, thereby making the machine intelligent.

In the prior art, speech recognition technology is used to realize a dialogue between a person's natural language and a machine, i.e., human-machine dialogue, so that the machine can understand a person's voice commands and correctly execute the instructions given. Considerable progress has been made in recent years, the degree of machine intelligence has risen rapidly, and many speech recognition algorithms are entering the practical stage. For example, US patent applications 08/254,844 and 08/413,146 filed by Motorola Inc. of the United States and European patent application EP 95021139.3 filed by Philips Electronics of the Netherlands all provide speech recognition algorithms based on, for example, neural networks and hidden Markov models. However, none of the above provides a hardware design for a phonetic control command generator that realizes machine intelligence.

The purpose of this invention is to provide a phonetic control command generator that is highly versatile, compact, and low in cost, and that can adopt different speech recognition algorithms.

The present invention comprises flash memory (I), flash memory (II), analog-to-digital and digital-to-analog converters (A/D and D/A), a liquid crystal display (LCD), a receiver, a loudspeaker (or earphone), a keyboard, a power supply, and other parts. It is characterized in that it is also provided with a digital signal processor (DSP) and a microprocessor (MPU): the digital signal processor is connected to the A/D and D/A converters through a serial port; the microprocessor is connected to the digital signal processor through a serial interface; the keyboard, the liquid crystal display, and the interface circuit are connected directly to the microprocessor; and the receiver and the loudspeaker are connected to the A/D and D/A converters.

The universal phonetic control command generator of the present invention uses a microprocessor (MPU) and a digital signal processor (DSP) working together as two CPUs. It solves the communication interface between the MPU and the DSP and provides dedicated commands for MPU-DSP communication. The MPU also handles the keyboard interface, LCD interface, external interface, power management, and watchdog functions, which minimizes the system. Because the program code and initialization data of the speech recognition algorithm are stored in flash memory (I), a different algorithm can be selected without changing the hardware. Apart from the LCD, keyboard, receiver, and loudspeaker (or earphone), which are mounted on the machine panel, all remaining hardware can be integrated on a single 4 × 7 cm printed circuit board. The maximum number of output control commands is 2⁸ = 256. Because the universal phonetic control command generator of the present invention is highly versatile, compact, low in cost, and has a high recognition rate, it can be widely used wherever a machine is to be controlled by natural human speech and made intelligent, for example in production machinery, household appliances, communication equipment, transportation vehicles, and instruments.

The present invention will be further described below in conjunction with the drawings and specific embodiments.

Fig. 1 is the universal phonetic control command generator composition diagram;

Fig. 2 is the universal phonetic control command generator circuit diagram;

Fig. 3 is a flow chart of the communication between the microprocessor SMC88308 and the digital signal processing chip ADSP2186;

Fig. 4 is the software master control flow chart of the universal phonetic control command generator;

Fig. 5 is the identification module flow chart;

Fig. 6 is the administration module flow chart;

Fig. 7 is the training module flow chart.

As shown in the drawings, the universal phonetic control command generator of the present invention is composed of a digital signal processor (DSP) 1, a microprocessor (MPU) 2, flash memory (I) 3, flash memory (II) 4, A/D and D/A converters 5, a liquid crystal display (LCD) 6, a receiver 7, a loudspeaker 8 (or earphone), a keyboard, a power controller, etc. The receiver 7 picks up the voice commands of the person issuing them; each command is a phrase, and multiple commands are multiple phrases. The analog voice command is converted to digital information by the A/D converter and input to the DSP 1 for processing. Voice feedback is converted to an analog signal by the D/A converter and sent to the loudspeaker 8 (or earphone), so as to give the person issuing the command a prompt or to confirm the command issued. The digital signal processor (DSP) 1 is the core component for speech recognition and carries out algorithms such as speech recognition and speech compression. It is connected directly to flash memory (I) 3 and flash memory (II) 4 through the data bus and address bus, and is connected to the A/D and D/A converters through a data bus. Flash memory (I) 3 stores the program code and initialization data of the selected speech recognition algorithm; flash memory (II) 4 stores the trained voice control command samples. The microprocessor (MPU) 2 and the digital signal processor (DSP) 1 together form a dual-CPU arrangement; the MPU is connected to the DSP through a serial interface and communicates and operates through the dedicated commands designed in the present invention. The MPU can be connected directly to the keyboard, the liquid crystal glass panel, and the interface circuit, and it contains a built-in watchdog circuit. The liquid crystal display (LCD) 6 displays prompts. The power controller manages power so as to reduce DSP power consumption. A 4 × 4 keyboard is used for command input during training and management. Control commands are output to the external controlled object; with 8-bit codes, the maximum number of control commands is 2⁸ = 256.
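The 8-bit limit stated above can be illustrated with a short sketch. It assumes, purely for illustration, that the command code is presented on eight parallel output lines with the most significant bit first; the patent only states that the code has 8 bits, and the name `to_output_bits` is hypothetical.

```python
from typing import List

CODE_WIDTH = 8  # bits per control command code, as stated in the text


def to_output_bits(code: int, width: int = CODE_WIDTH) -> List[int]:
    """Return the bits of a command code, most significant bit first.

    Assumption: the code is driven onto `width` parallel lines. The
    bit order is an illustrative choice, not taken from the patent.
    """
    if not 0 <= code < 2 ** width:
        raise ValueError(f"code {code} does not fit in {width} bits")
    return [(code >> shift) & 1 for shift in range(width - 1, -1, -1)]


# Number of distinct codes an 8-bit output can carry: 00H through FFH.
MAX_COMMANDS = 2 ** CODE_WIDTH
```

With a wider output port the same arithmetic would give a larger command set; the 256-command ceiling follows only from the 8-bit code width.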

Fig. 1 clearly shows the composition of the universal phonetic control command generator and the connections between its parts. The LCD 6 shown there is actually a bare liquid crystal glass panel without a driver chip. As can be seen from Fig. 2, the universal phonetic control command generator of the present invention consists mainly of five chips, so the system is very simple. The five chips are:

(1) U1, ADSP2186, the digital signal processor (DSP) 1: clock 16.67 MHz, 33 MIPS, instruction cycle 30 ns, with 8K words of on-chip program memory and 8K words of on-chip data memory. It implements the speech recognition and speech compression algorithms.

(2) U3, AT29C020, flash memory (I) 3: stores the program code and initialization data.

(3) U2, AT29C020, flash memory (II) 4: stores the voice command templates.

(4) U5, AD73311, the A/D and D/A conversion chip 5: a 16-bit A/D and D/A converter with built-in gain control. It digitizes the analog voice signal from the microphone, which enters through J052, and sends it to the serial port of the ADSP2186 over the DR signal line. It can also receive serial data output by the ADSP2186 over the DT signal line and perform D/A conversion; the result is connected to loudspeaker 8 through CON2 and restored to sound.

(5) U7, SMC88308, an 8-bit single-chip microcomputer from EPSON. Its features are: it contains 8K bytes of ROM and 256K bytes of RAM for holding the user program; it contains an LCD driver circuit and can drive the LCD panel directly, eliminating an external LCD driver circuit; it contains a watchdog timer, eliminating the corresponding external circuit; it has abundant input/output ports, so it can connect directly to the keyboard matrix without an extra keyboard encoding circuit and can directly output the code corresponding to a command to control an external circuit; it contains a serial interface and can communicate directly with the DSP chip over signal lines such as SIN and SOUT; and it also has a supply voltage monitoring circuit, which facilitates power management.

The combined use of the MPU and the DSP is therefore a main feature of the present invention. It simplifies the whole system as far as possible, which reduces board area and cost and improves reliability. In addition, the MPU and the DSP divide the work between them: the DSP mainly performs speech recognition and compressed-speech playback, and all other functions are handled by the MPU. This keeps the DSP's active time to a minimum and thus lowers the power consumption of the whole system, because the DSP draws a lot of power while the MPU draws very little. The present invention can therefore also be used in battery-powered portable products.

(6) U6, MC7805, a voltage regulator chip that provides the stable supply VCC for the system.

(7) U8, MAX705, used here to generate the power-on reset signal RESET.

In addition, J5 is the connector between the keyboard and the MPU, J4 is the connector between the system and the liquid crystal glass panel, J105 is the interface between the system and the emulator, and J6 is the command code output port.

Serial communication takes place between the MPU and the DSP; the data transfer process is shown in Fig. 3. The MPU controls the operation of the DSP and retrieves the required data by sending specially designed commands. The commands fall into three main groups:

1. Training command:

Command	Parameter	Returned data	Description
01H	None	None	Training function: records a voice command template

2. Recognition command:

Command	Parameter	Returned data	Description
02H	None	Code of the recognized command	Command recognition function: recognizes the voice command currently being input

3. Management commands:

Command	Parameter	Returned data	Description
03H	Key=1	None	New template
04H	Key=2	Code	Previous template
05H	Key=3	Code	Next template
06H	Key=4	None	Delete template
07H	Key=5	Code	Play back command word
08H	Key=6	Code	Play back system word
09H	Key=7	None	Record system word
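The three command groups can be collected into one lookup structure, sketched below. The command bytes, key numbers, and whether a code is returned are taken from the tables; the identifier names (`Command`, `CommandSpec`, `command_for_key`) are hypothetical and chosen only for illustration.

```python
from enum import IntEnum
from typing import Dict, NamedTuple, Optional


class Command(IntEnum):
    """Command bytes sent from the MPU to the DSP (values from the tables)."""
    TRAIN = 0x01
    RECOGNIZE = 0x02
    NEW_TEMPLATE = 0x03
    PREVIOUS_TEMPLATE = 0x04
    NEXT_TEMPLATE = 0x05
    DELETE_TEMPLATE = 0x06
    PLAY_COMMAND_WORD = 0x07
    PLAY_SYSTEM_WORD = 0x08
    RECORD_SYSTEM_WORD = 0x09


class CommandSpec(NamedTuple):
    key: Optional[int]   # keypad key for management commands, else None
    returns_code: bool   # True when the DSP returns a command code


COMMAND_TABLE: Dict[Command, CommandSpec] = {
    Command.TRAIN: CommandSpec(None, False),
    Command.RECOGNIZE: CommandSpec(None, True),
    Command.NEW_TEMPLATE: CommandSpec(1, False),
    Command.PREVIOUS_TEMPLATE: CommandSpec(2, True),
    Command.NEXT_TEMPLATE: CommandSpec(3, True),
    Command.DELETE_TEMPLATE: CommandSpec(4, False),
    Command.PLAY_COMMAND_WORD: CommandSpec(5, True),
    Command.PLAY_SYSTEM_WORD: CommandSpec(6, True),
    Command.RECORD_SYSTEM_WORD: CommandSpec(7, False),
}


def command_for_key(key: int) -> Command:
    """Map a management key (1 to 7) to its command byte."""
    for command, spec in COMMAND_TABLE.items():
        if spec.key == key:
            return command
    raise ValueError(f"no management command for key {key}")
```

Keeping the table as data rather than as branching logic makes it easy to check each entry against the patent's tables row by row.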

The software master control flow chart of the universal phonetic control command generator is shown in Fig. 4. The operation of the generator is explained with reference to this flow chart. After the system starts, it waits for a keyboard command and can then enter one of three modes: recognition mode, training mode, or management mode. In recognition mode, a command is issued through the serial port so that the ADSP2186 starts the speech recognition program and performs recognition; the result, i.e., information such as the code of the recognized command, is returned to the SMC88308 and sent to the display. The detailed process is shown in Fig. 5. In training mode, a command is issued through the serial port so that the ADSP2186 starts the training program and trains a voice command; during this process the code of the command must be entered, and the data are transferred through the serial port. The detailed process is shown in Fig. 7. In management mode, a command is issued through the serial port so that the ADSP2186 starts the management program, performs the corresponding management operation, and returns the relevant data; see Fig. 6.
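The mode dispatch described above might look roughly like the following on the MPU side. This is a minimal sketch under stated assumptions: `LoopbackLink` is a hypothetical stand-in for the serial link, `run_mode` is an invented name, and the exact byte sequence exchanged during training is not specified in the patent beyond the fact that the command code is entered and sent over the serial port.

```python
from typing import List, Optional

TRAIN = 0x01
RECOGNIZE = 0x02
MANAGEMENT_BASE = 0x02  # management keys 1..7 map to commands 03H..09H
RETURNS_CODE = {0x02, 0x04, 0x05, 0x07, 0x08}  # commands that return a code


class LoopbackLink:
    """Hypothetical stand-in for the MPU-to-DSP serial link."""

    def __init__(self, reply: int = 0x00) -> None:
        self.sent: List[int] = []   # bytes the MPU has sent so far
        self.reply = reply          # byte the fake DSP answers with

    def send(self, byte: int) -> None:
        self.sent.append(byte & 0xFF)

    def receive(self) -> int:
        return self.reply


def run_mode(link: LoopbackLink, mode: str, key: Optional[int] = None,
             new_code: Optional[int] = None) -> Optional[int]:
    """Send the command for one mode and return the code the DSP reports."""
    if mode == "recognize":
        command = RECOGNIZE
    elif mode == "train":
        if new_code is None or not 0 <= new_code <= 0xFF:
            raise ValueError("training needs an 8-bit command code")
        command = TRAIN
    elif mode == "manage":
        if key is None or not 1 <= key <= 7:
            raise ValueError("management mode needs a key from 1 to 7")
        command = MANAGEMENT_BASE + key
    else:
        raise ValueError(f"unknown mode: {mode}")
    link.send(command)
    if mode == "train":
        # Assumption: the code keyed in by the user follows the command byte.
        link.send(new_code)
    return link.receive() if command in RETURNS_CODE else None
```

In a real system the returned code would then be shown on the LCD and driven onto the command output port.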

Fig. 5 is the flow chart of speech recognition. As the figure shows, recognition begins with speech detection, which determines whether there is voice input. If there is, feature extraction is performed on the speech, i.e., the MFCC parameters of the input speech are extracted. After parameter extraction comes parameter comparison: the feature parameters of the input speech are compared with the feature parameters (i.e., templates) of the voice commands stored in flash memory to determine whether the input matches one of the templates. There are two cases. In the first case the match is complete: the matched template is the input voice command, and the code corresponding to that template is the code of the input voice command, which is sent back to the MPU through the serial port. In the second case the match is incomplete: the three closest voice command templates are found and their recordings are played back one after another for the user to judge. If one of them is the input voice command, its code is returned to the MPU after the user confirms it. If none of the three is the input voice command, the user is prompted to input the voice command again and the recognition process above is repeated until a result is identified.
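The two-case decision described above can be sketched as follows. The patent does not say how the comparison is measured, so this sketch assumes fixed-length feature vectors, Euclidean distance, and an arbitrary acceptance threshold; real MFCC sequences vary in length and would need a sequence-matching method instead. The names `recognize`, `distance`, and `threshold` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features: Sequence[float],
              templates: Dict[int, Sequence[float]],
              threshold: float = 1.0) -> Tuple[str, List[int]]:
    """Classify input features against stored templates.

    Returns ("match", [code]) when the closest template is within the
    threshold (the complete-match case), otherwise ("confirm", codes)
    with up to three closest codes for the user to confirm.
    """
    if not templates:
        raise ValueError("no templates have been trained")
    # Rank command codes from closest template to farthest.
    ranked = sorted(templates, key=lambda code: distance(features, templates[code]))
    best = ranked[0]
    if distance(features, templates[best]) <= threshold:
        return ("match", [best])
    return ("confirm", ranked[:3])
```

In the "confirm" case the caller would play back the three candidates and, if the user rejects all of them, prompt for a new utterance and call `recognize` again.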

Fig. 6 is the flow chart of the management program. According to the keyboard commands entered by the user, it performs template search, template deletion, command word playback, system word playback, and system word recording.
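The template search and deletion operations can be pictured as a cursor moving over the stored command codes. The sketch below is an assumption about how such browsing could behave (for example, clamping at the ends of the list rather than wrapping); the patent does not describe these details, and `TemplateStore` is a hypothetical name.

```python
from typing import List, Optional


class TemplateStore:
    """Hypothetical in-memory view of the trained command codes."""

    def __init__(self, codes: List[int]) -> None:
        self.codes = list(codes)
        self.pos = 0  # index of the currently selected template

    def current(self) -> Optional[int]:
        return self.codes[self.pos] if self.codes else None

    def previous(self) -> Optional[int]:
        """Step to the previous template (assumed to clamp at the first)."""
        self.pos = max(0, self.pos - 1)
        return self.current()

    def next(self) -> Optional[int]:
        """Step to the next template (assumed to clamp at the last)."""
        self.pos = min(max(0, len(self.codes) - 1), self.pos + 1)
        return self.current()

    def delete(self) -> None:
        """Delete the selected template and keep the cursor in range."""
        if self.codes:
            del self.codes[self.pos]
            self.pos = min(self.pos, max(0, len(self.codes) - 1))
```

The codes returned by `previous` and `next` correspond to the "Code" entries that the management commands send back to the MPU.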

Fig. 7 is the flow chart of the voice command training program. Training begins with speech detection, i.e., determining whether there is voice input. Once voice input has been detected, the speech is processed in two ways: first, its features are extracted, i.e., its MFCC parameters are calculated; second, the speech data are compression-encoded. The recorded speech is then played back for the user to judge. If the user's key input indicates dissatisfaction with the quality of the voice command, the above operations are repeated. If the user's key input indicates satisfaction, the user is prompted to key in the code of the voice command; the feature parameters (i.e., template) of the input voice command, the compressed voice command, and its code are then stored in flash memory, which completes one training operation.
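The training loop described above can be sketched with the individual steps passed in as functions. Everything here is illustrative: the callables stand in for recording, MFCC extraction, compression, and keypad input, a dictionary stands in for flash memory, and `max_attempts` is an added safeguard that the patent does not mention (it simply repeats until the user is satisfied).

```python
from typing import Any, Callable, Dict, Optional, Tuple


def train_command(record: Callable[[], Any],
                  extract_features: Callable[[Any], Any],
                  compress: Callable[[Any], Any],
                  user_accepts: Callable[[Any], bool],
                  ask_code: Callable[[], int],
                  store: Dict[int, Tuple[Any, Any]],
                  max_attempts: int = 5) -> Optional[int]:
    """Run one training operation; return the stored code or None."""
    for _ in range(max_attempts):
        samples = record()                     # detected voice input
        features = extract_features(samples)   # template, e.g. MFCC parameters
        compressed = compress(samples)         # compressed audio for playback
        if not user_accepts(compressed):       # user listens to the playback
            continue                           # dissatisfied: record again
        code = ask_code()                      # user keys in the command code
        if not 0 <= code <= 0xFF:
            raise ValueError("command code must fit in 8 bits")
        store[code] = (features, compressed)   # stands in for the flash write
        return code
    return None
```

A caller would wire the callables to the actual audio path and keypad; the stored pair is what recognition later compares against and plays back.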

Claims

1. A universal phonetic control command generator, comprising flash memory (I) 3, flash memory (II) 4, analog-to-digital and digital-to-analog converters (A/D and D/A) 5, a liquid crystal display (LCD) 6, a receiver 7, a loudspeaker (or earphone) 8, a keyboard, a power supply, and other parts, characterized in that it is also provided with a digital signal processor (DSP) 1 and a microprocessor (MPU) 2, wherein the digital signal processor 1 is connected to the A/D and D/A converters 5 through a serial port, the microprocessor 2 and the digital signal processor 1 are connected through a serial interface, the keyboard, the liquid crystal display 6, and the interface circuit are connected directly to the microprocessor 2, and the receiver 7 and the loudspeaker 8 are connected to the A/D and D/A converters 5.