CN107680596A

CN107680596A - Speech synthesis and recognition system based on virtual instrument

Info

Publication number: CN107680596A
Application number: CN201710879571.0A
Authority: CN
Inventors: 隋美丽; 朱青松; 吕江毅; 龙建
Original assignee: Beijing Polytechnic
Current assignee: Beijing Polytechnic
Priority date: 2017-09-26
Filing date: 2017-09-26
Publication date: 2018-02-09

Abstract

The present invention relates to a speech synthesis and recognition system, and in particular to a speech synthesis and recognition system based on a virtual instrument, comprising a voice acquisition module, a speech recognition module, a speech synthesis module and a simulated system operation module. The voice acquisition module obtains a voice signal through a microphone. The speech recognition module recognizes the voice signal collected by the microphone and converts it into text format. The speech synthesis module converts text content into a voice signal and plays the voice signal through a loudspeaker. The simulated system operation module recognizes and executes the operation commands in the voice signal. The invention thus provides a system with human-computer interaction capability that performs voice acquisition, speech synthesis, speech recognition and simulated system operation.

Description

Speech synthesis and recognition system based on virtual instrument

Technical field

The present invention relates to a speech synthesis and recognition system, and in particular to a speech synthesis and recognition system based on a virtual instrument.

Background art

Language is the most important means of human communication and the set of symbols through which people exchange ideas. Language has also become an important channel of communication between people and computers. For spoken communication between a person and a computer to work well, two key technologies are required: speech synthesis and speech recognition. With speech synthesis, a computer can convert text into a voice signal and deliver it to the listener; with speech recognition, it can understand what a person says. Together they achieve human-computer interaction, so the development of speech synthesis and speech recognition technology has become a very important topic in this field. A system with human-computer interaction capability is therefore needed that can perform voice acquisition, speech synthesis, speech recognition and simulated system operation.

Summary of the invention

An object of the invention is to provide a speech synthesis and recognition system based on a virtual instrument that uses LabWindows/CVI as the development platform and the Microsoft Speech SDK as the speech development tool to realize voice acquisition, speech synthesis, speech recognition and simulated operation.

The invention provides a speech synthesis and recognition system based on a virtual instrument, comprising: a voice acquisition module, a speech recognition module, a speech synthesis module and a simulated system operation module;

The voice acquisition module is used to obtain a voice signal through a microphone;

The speech recognition module is used to recognize the voice signal collected by the microphone and to convert the voice signal into text format;

The speech synthesis module is used to convert text content into a voice signal and to play the voice signal through a loudspeaker;

The simulated system operation module is used to recognize and execute the operation commands in the voice signal.

Further, before obtaining the voice signal, the voice acquisition module is also used to:

Detect the microphone parameters;

Initialize the microphone and configure the microphone parameters;

Obtain the microphone attributes and determine the acquisition mode;

Open the microphone and acquire the voice signal.

Further, the speech synthesis module is specifically used for reading text aloud, pausing reading, adjusting the speech rate and adjusting the volume.

Further, before voice signal recognition, the speech recognition module is trained by a pattern matching method, which specifically includes:

Acquiring the feature vectors of voice signals multiple times and recording them in a template library;

Obtaining the feature vector of the voice signal to be recognized, comparing it for similarity with each template in the template library, and outputting the template with the highest similarity as the recognition result.

Further, the simulated system operation module is specifically used for:

Simulating keyboard and mouse input, including moving the mouse, clicking, double-clicking and selecting the left or right button by voice command, and simulating key presses;

Executing the corresponding system operation commands, including shutting down and restarting the computer, minimizing a window, maximizing a window, opening the task manager and switching the input method by voice command;

Opening the corresponding application, including commonly used software, Notepad, a browser and the command window; and simple voice conversation.

Compared with the prior art, the beneficial effect of the invention is that it provides a system with human-computer interaction capability that can perform voice acquisition, speech synthesis, speech recognition and simulated system operation.

Brief description of the drawings

Fig. 1 is a structural block diagram of the speech synthesis and recognition system based on a virtual instrument according to the invention.

Detailed description of the embodiments

The invention is described in detail below with reference to the embodiments shown in the drawing. It should be noted that these embodiments do not limit the invention; equivalent transformations or substitutions of function, method or structure made by those of ordinary skill in the art on the basis of these embodiments all fall within the scope of protection of the invention.

Referring to Fig. 1, this embodiment provides a speech synthesis and recognition system based on a virtual instrument. The system consists mainly of four modules: a voice acquisition module 10, a speech recognition module 20, a speech synthesis module 30 and a simulated system operation module 40. In terms of hardware, it mainly comprises a PC, a microphone, a loudspeaker and the sound card on the PC motherboard.

The software is implemented mainly on the LabWindows/CVI virtual instrument platform. LabWindows/CVI provides convenient panel design functions for designing the software interface of the voice interaction system, and because it is a development platform based entirely on ANSI C, the Microsoft Speech SDK speech development tools are easy to use with it.

Each module of the system is described in detail below.

Voice acquisition module：

The voice acquisition module is implemented mainly by configuring the microphone parameters; in the software design this can be done with the WAVE API provided by Windows. To obtain a voice signal, the following steps are used:

(1) Detect the microphone parameters and obtain the number of sound cards present in the computer, that is, the number of available audio input devices; the default input device is generally selected.

(2) Initialize the microphone and configure the microphone parameters.

(3) Obtain the microphone attributes and determine the acquisition mode. This mainly means finding out how many channels the microphone has, single channel and dual channel being the most common, and determining the sampling frequency, for which there are three main options: 11.025 kHz, 22.05 kHz and 44.1 kHz.

(4) Open the microphone and acquire the voice signal.
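The capture parameters chosen in steps (1) to (4) can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the ANSI C used with LabWindows/CVI, and the `CaptureFormat` class and its field names are hypothetical; only the three sampling frequencies and the choice of one or two channels come from the description. The derived values follow the usual PCM relations: block alignment = channels × bits per sample / 8, and byte rate = sampling frequency × block alignment.

```python
from dataclasses import dataclass

# Sampling frequencies named in the description, in Hz.
SUPPORTED_RATES_HZ = (11025, 22050, 44100)


@dataclass(frozen=True)
class CaptureFormat:
    """Hypothetical container for the PCM capture settings of step (3)."""
    sample_rate_hz: int
    channels: int         # 1 = single channel, 2 = dual channel
    bits_per_sample: int  # 8 or 16

    def __post_init__(self):
        # Reject settings outside the options discussed in the text.
        if self.sample_rate_hz not in SUPPORTED_RATES_HZ:
            raise ValueError(f"unsupported sampling frequency: {self.sample_rate_hz}")
        if self.channels not in (1, 2):
            raise ValueError("channels must be 1 or 2")
        if self.bits_per_sample not in (8, 16):
            raise ValueError("bits_per_sample must be 8 or 16")

    @property
    def block_align(self) -> int:
        """Bytes occupied by one sample frame (all channels)."""
        return self.channels * self.bits_per_sample // 8

    @property
    def bytes_per_second(self) -> int:
        """Average data rate of the uncompressed PCM stream."""
        return self.sample_rate_hz * self.block_align

    def buffer_size(self, milliseconds: int) -> int:
        """Bytes needed to hold the given duration, in whole frames."""
        frames = self.sample_rate_hz * milliseconds // 1000
        return frames * self.block_align
```

For example, `CaptureFormat(44100, 2, 16).buffer_size(100)` gives the size in bytes of a 100 ms capture buffer for 16-bit dual-channel audio.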

Voice signal acquisition is realized by setting the microphone parameters, and the module is programmed against the WAVE API of the Windows API. The module is designed as the following sequence: detect the microphone, configure the microphone, obtain the microphone attributes, determine the data acquisition mode, open the microphone, and display the waveform. Specifically:

The microphone detection function detects the microphone devices connected to the computer and counts the sound card devices in the computer that can be used for audio input. It can also obtain the relevant attributes of a microphone device, such as the device ID, usage state, number of supported channels, data format and sampling frequency.

The microphone configuration function mainly sets the device ID of the microphone, the user callback function, the waveform format, the buffer format and the buffer size, then opens the microphone device and tests whether the data receiving and sending functions work properly.

(1) Obtain the microphone attributes and determine the bit depth of the sound card used for voice signal acquisition, thereby determining the sampling bit depth.

(2) Open the microphone, collect the voice signal and store the data.

(3) Waveform display. The module must distinguish between 8-bit and 16-bit sampling. With 8-bit acquisition only the time-domain waveform is displayed, whereas with 16-bit acquisition both the time-domain waveform and the spectrum can be displayed. This is because a 16-bit sound card can resolve the voice signal into 65,536 levels, while an 8-bit sound card resolves only 256 levels, which causes a considerable loss of signal.
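The difference between 8-bit and 16-bit sampling can be made concrete with a short sketch. It is an illustration in Python, not the patent's implementation: the function names are hypothetical, and the sample encodings assumed are those of ordinary uncompressed PCM in WAV data (8-bit samples unsigned with a midpoint of 128, 16-bit samples signed little-endian). An n-bit sample distinguishes 2^n levels, which gives 256 levels for 8 bits and 65,536 for 16 bits.

```python
import struct
from typing import List


def quantization_levels(bits_per_sample: int) -> int:
    """Number of distinct amplitude levels an n-bit sample can encode."""
    return 2 ** bits_per_sample


def decode_pcm(data: bytes, bits_per_sample: int) -> List[float]:
    """Convert raw mono PCM bytes to floats in the range [-1.0, 1.0)."""
    if bits_per_sample == 8:
        # 8-bit PCM is unsigned; 128 represents silence.
        return [(b - 128) / 128.0 for b in data]
    if bits_per_sample == 16:
        # 16-bit PCM is signed little-endian; ignore a trailing odd byte.
        count = len(data) // 2
        samples = struct.unpack(f"<{count}h", data[:count * 2])
        return [s / 32768.0 for s in samples]
    raise ValueError("only 8-bit and 16-bit samples are handled in this sketch")
```

The normalized samples returned by `decode_pcm` are the kind of values a time-domain waveform display would plot.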

Speech synthesis and recognition modules

The speech synthesis module and the speech recognition module are developed with the Microsoft Speech SDK. The Microsoft Speech SDK is developed according to the COM standard and is provided in the form of COM components. Its underlying protocols are completely independent of the application layer, which spares developers the complex speech technology when programming applications and allows speech development to be based entirely on COM. In this design, the speech recognition module is managed by the recognition engine (Recognition Engine) and the speech synthesis module is handled by the speech synthesis engine (Synthesis Engine).

Since speech development is carried out through COM interfaces, a specific sequence of operations must be observed in the design. In short, speech development must follow both the working principle of COM components and that of ordinary Windows applications (the message-driven mechanism). The implementation flow is as follows:

(1) Initialize COM, ensure that COM remains available throughout execution of the program, and release the resources before the program ends.

(2) Define a speech interface object for each speech interface and follow the specific working order. In the speech recognition module, the speech recognition grammar rules and the recognition of voice signals must be set up so that the recognition engine is in the working state; in the speech synthesis module, the object to be read aloud and the reading mode must be set so that the synthesis engine is in the working state.

(3) In the speech recognition module, once a grammar rule has been recognized, a speech recognition message is sent to the application so that the handler for the recognition message is called; this step is completed mainly through the ISpRecoContext interface. After the recognition message is received, the ISpPhrase interface is used to obtain the recognition result. These steps repeat until the grammar rules are deactivated.

(4) In the speech synthesis module, synthesis is completed mainly by defining the object to be read aloud and the working mode for reading (synchronous or asynchronous), and then calling the ISpVoice speech synthesis interface.

(5) When the voice interaction system exits, COM must be uninitialized to prevent system errors.
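The ordering constraint in steps (1) to (5) can be sketched as follows. This is a Python stand-in that only records the order of events; it makes no COM or Speech SDK calls, and `com_session` and `run_voice_system` are hypothetical names. What it illustrates is that initialization brackets all engine work and that the release step runs even if an error occurs in between.

```python
from contextlib import contextmanager


@contextmanager
def com_session(log):
    """Stand-in for steps (1) and (5): initialize first, always release last."""
    log.append("com initialized")
    try:
        yield
    finally:
        # Runs on normal exit and on error, so resources are not leaked.
        log.append("com uninitialized")


def run_voice_system(recognized_phrases, log):
    """Walk through steps (2) to (4) for a list of already-recognized phrases."""
    with com_session(log):
        # Step (2): create the interface objects and put the engines to work.
        log.append("engines ready")
        for phrase in recognized_phrases:
            # Step (3): a recognition message arrives and its result is read.
            log.append(f"recognized: {phrase}")
            # Step (4): the synthesis side reads a response aloud.
            log.append(f"spoken: {phrase}")
    return log
```

Calling `run_voice_system(["hello"], [])` returns the event log in the order the steps require.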

The speech synthesis module has four main functions:

(1) Reading aloud, which implements text-to-speech (TTS) conversion, the core of the module. The user can perform TTS conversion in two ways. In the first, the user types the desired text on the keyboard and then clicks the read-aloud button to perform speech synthesis. In the second, the user opens a text file; the software imports the text content and displays it in a text box on the software interface, and the user then clicks the read-aloud button to perform speech synthesis;

(2) Pausing, whereby the user can pause reading when it is not needed;

(3) Speech rate adjustment, whereby the user can adjust the reading speed to personal preference;

(4) Volume adjustment, which adjusts the loudness of the reading.

These functions are implemented mainly through the speech synthesis (ISpVoice) interface, whose main purpose is text-to-speech conversion. Its effect is achieved through its rich set of functions. For example, the SpeechLib_ISpeechVoiceSpeak function converts text data into a speech waveform so that the computer can speak. When working asynchronously, SpeechLib_ISpeechVoiceGetStatus can be used to obtain the speaking state, the text position and so on. The interface has many other member functions; SpeechLib_ISpeechVoiceSetVolume, SpeechLib_ISpeechVoiceSetRate and similar member functions adjust synthesis attributes such as speaking rate and volume. The interface is therefore the core of the speech synthesis engine (Synthesis Engine).
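The pause, rate and volume controls can be sketched as a small state object that a user interface might keep before passing values to the synthesis engine. This Python sketch does not call the Speech SDK, and `ReaderState` and its methods are hypothetical. The clamping ranges are the ones documented for SAPI 5 voices (rate from -10 to 10, volume from 0 to 100), assumed here to apply to the engine used.

```python
from dataclasses import dataclass

# Ranges documented for SAPI 5 voices; assumed to apply to this system.
RATE_MIN, RATE_MAX = -10, 10
VOLUME_MIN, VOLUME_MAX = 0, 100


def clamp(value: int, low: int, high: int) -> int:
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass
class ReaderState:
    """Hypothetical UI-side state for the read-aloud controls."""
    rate: int = 0        # 0 is the default speaking rate
    volume: int = 100    # full volume
    paused: bool = False

    def set_rate(self, rate: int) -> None:
        self.rate = clamp(int(rate), RATE_MIN, RATE_MAX)

    def set_volume(self, volume: int) -> None:
        self.volume = clamp(int(volume), VOLUME_MIN, VOLUME_MAX)

    def toggle_pause(self) -> bool:
        """Flip between paused and reading; return the new paused state."""
        self.paused = not self.paused
        return self.paused
```

Clamping in the interface layer keeps slider or text-box input within the range the engine accepts before any engine call is made.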

The speech recognition module uses the pattern matching method, and using it involves two stages:

(1) Training stage. The user speaks each word in the vocabulary (which the user can define as needed) once or several times in turn, and the feature vectors of the resulting voice signals are recorded in the template library.

(2) Recognition stage. The feature vector of the voice signal uttered by the user is compared for similarity with each template in the template library, and the template with the highest similarity is output as the recognition result.

These two stages must be carried out by the user in order to improve the recognition rate of the voice interaction system; only then can the system perform as well as possible.
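The two stages can be sketched as a small template library. The description does not say which features or which similarity measure are used, so this Python sketch makes two labeled assumptions: feature vectors are plain lists of numbers of equal length, and similarity is cosine similarity. The `TemplateLibrary` class is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TemplateLibrary:
    """Hypothetical store of feature-vector templates, keyed by word."""

    def __init__(self) -> None:
        self._templates: Dict[str, List[List[float]]] = {}

    def train(self, word: str, features: Sequence[float]) -> None:
        # Training stage: each utterance of a word adds one template.
        self._templates.setdefault(word, []).append(list(features))

    def recognize(self, features: Sequence[float]) -> Tuple[str, float]:
        # Recognition stage: compare against every template, keep the best.
        if not self._templates:
            raise ValueError("template library is empty; train it first")
        best_word, best_score = "", -math.inf
        for word, templates in self._templates.items():
            for template in templates:
                score = cosine_similarity(features, template)
                if score > best_score:
                    best_word, best_score = word, score
        return best_word, best_score
```

Storing several templates per word, rather than one, corresponds to the user repeating a word several times during training.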

The main functions of the speech recognition module fall into two categories:

The first category is an audio player based on MCI (Media Control Interface) functions:

(1) Opening an audio file. This function is similar to opening a text file in the speech synthesis module. Three audio formats are supported, mp3, wav and wma, which is sufficient for the user's daily needs.

(2) Audio playback, used mainly to play an audio file in stereo mix mode for speech recognition.

(3) Playback progress bar display.

(4) Pausing and stopping playback, used mainly to stop audio playback; to resume playback, the user only needs to click the audio file again.

The second category is speech recognition based on the Microsoft Speech SDK, which is the key function of this module:

(1) Speech recognition. The user can deliver a voice signal to the computer in two ways. One is through an external microphone connected to the computer. The other is to open a pre-recorded audio file and switch the computer's recording device to stereo mix mode; when the audio file is played, the voice signal is then obtained directly from the sound card rather than from the computer's loudspeaker, which preserves the clarity of the voice signal.

(2) Stopping speech recognition. Recognition can be paused at any time; to resume, the user only needs to click the speech recognition button again.

Simulated system operation module

The simulated system operation module is built on speech recognition and performs the operation corresponding to each recognized voice command. The simulated system operations designed in this project are divided into the following 4 types:

(1) Simulating keyboard and mouse input. The user can move the mouse, click, double-click and select the left or right button by voice command, and can also simulate key presses by voice command.

(2) Executing the corresponding system operation commands. The user can shut down or restart the computer, minimize a window, maximize a window, open the task manager, switch the input method and so on by voice command.

(3) Opening the corresponding application. The user can open commonly used software such as Word, Excel, PowerPoint, Notepad, a browser and the command window.

(4) Simple voice conversation. If the user says "hello" to the computer, it replies "hello".

Building on speech recognition, the simulated system operation module judges whether the recognition result matches the voice command format. If it does, the command is executed; if not, nothing is executed and the system continues to listen for voice input.
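The match-then-execute rule described above can be sketched as a lookup table from command phrases to actions. This Python sketch is illustrative only: the phrases are hypothetical examples, and the actions merely return a description string instead of moving the mouse or shutting down the computer.

```python
from typing import Callable, Dict, Optional


def build_command_table() -> Dict[str, Callable[[], str]]:
    """Hypothetical command phrases mapped to side-effect-free stand-ins."""
    return {
        "open notepad": lambda: "launch notepad",
        "shut down": lambda: "shutdown requested",
        "double click": lambda: "mouse double-click",
        "hello": lambda: "say hello",
    }


def handle_recognition(text: str,
                       table: Dict[str, Callable[[], str]]) -> Optional[str]:
    """Execute the action for a recognized phrase, or return None to keep listening."""
    # Normalize case and whitespace so the lookup is tolerant of formatting.
    key = " ".join(text.lower().split())
    action = table.get(key)
    if action is None:
        # Not a known command: do nothing and continue monitoring input.
        return None
    return action()
```

A result of `None` corresponds to the "do not execute, keep listening" branch; any other result means a command was matched and run.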

The speech synthesis and recognition system based on a virtual instrument provided by this embodiment integrates voice acquisition, speech synthesis, speech recognition and simulated system operation modules and can, to a certain extent, serve as a medium for human-computer interaction. The LabWindows/CVI-based speech synthesis and recognition system is low in cost, high in efficiency and easy to learn. The simulated hardware resources in LabWindows/CVI are used to build a virtual instrument that performs real-time acquisition, analysis and processing, feature extraction, and intelligent synthesis and recognition of voice signals. Applying virtual instrument technology to a speech recognition and synthesis system turns the instrument into software, truly embodies the idea that "the software is the instrument", and facilitates human-computer interaction.

The detailed descriptions listed above are only specific illustrations of feasible embodiments of the invention and are not intended to limit its scope of protection; all equivalent embodiments or modifications that do not depart from the technical spirit of the invention shall fall within its scope of protection.

It is obvious to a person skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the invention can be realized in other specific forms without departing from its spirit or essential attributes. The embodiments should therefore be regarded in all respects as exemplary and non-restrictive; the scope of the invention is defined by the appended claims rather than by the above description, and all changes falling within the meaning and range of equivalency of the claims are intended to be embraced in the invention.

Claims

1. A speech synthesis and recognition system based on a virtual instrument, characterized by comprising: a voice acquisition module, a speech recognition module, a speech synthesis module and a simulated system operation module;

The voice acquisition module is used to obtain a voice signal through a microphone;

The speech recognition module is used to recognize the voice signal collected by the microphone and to convert the voice signal into text format;

The speech synthesis module is used to convert text content into a voice signal and to play the voice signal through a loudspeaker;

The simulated system operation module is used to recognize and execute the operation commands in the voice signal.

2. The speech synthesis and recognition system based on a virtual instrument according to claim 1, characterized in that, before obtaining the voice signal, the voice acquisition module is further used to:

Detect the microphone parameters;

Initialize the microphone and configure the microphone parameters;

Open the microphone and acquire the voice signal.

3. The speech synthesis and recognition system based on a virtual instrument according to claim 2, characterized in that the speech synthesis module is specifically used for reading text aloud, pausing reading, adjusting the speech rate and adjusting the volume.

4. The speech synthesis and recognition system based on a virtual instrument according to claim 3, characterized in that, before voice signal recognition, the speech recognition module is trained by a pattern matching method, which specifically includes:

Obtaining the feature vector of the voice signal to be recognized, comparing it for similarity with each template in the template library, and outputting the template with the highest similarity as the recognition result.

5. The speech synthesis and recognition system based on a virtual instrument according to claim 4, characterized in that the simulated system operation module is specifically used for:

Simulating keyboard and mouse input, including moving the mouse, clicking, double-clicking and selecting the left or right button by voice command, and simulating key presses;

Executing the corresponding system operation commands, including shutting down and restarting the computer, minimizing a window, maximizing a window, opening the task manager and switching the input method by voice command;

Opening the corresponding application, including commonly used software, Notepad, a browser and the command window; and simple voice conversation.