CN109949808A

CN109949808A - Speech recognition appliance control system and method compatible with Mandarin and dialects

Info

Publication number: CN109949808A
Application number: CN201910198788.4A
Authority: CN
Inventors: 朱建强
Original assignee: Shanghai Hua Zhen Electronic Technology Co Ltd
Current assignee: Shanghai Hua Zhen Electronic Technology Co Ltd
Priority date: 2019-03-15
Filing date: 2019-03-15
Publication date: 2019-06-28

Abstract

The present invention provides a speech recognition appliance control system and method compatible with Mandarin and dialects. The appliance control system includes: a voice receiving module, which uses a local audio input device to receive the set audio as a first signal; a speech recognition module, which performs recognition processing on the first signal and takes the recognition processing result as a second signal; a household appliance control module, which performs a calculation on the second signal according to set logic and judges whether the calculation result belongs to a set result set, and which issues a control instruction as a third signal according to the set operation set mapped from the set result set if the result belongs to that set, or issues a failure instruction as a fourth signal if it does not; and a household appliance operation module, which receives the third signal and performs the specified operation according to it. The invention offers high reliability and low maintenance cost, and can greatly improve the instruction recognition rate for a specific user.

Description

Speech recognition appliance control system and method compatible with Mandarin and dialects

Technical field

The present invention relates to the technical field of speech recognition and, in particular, to a speech recognition appliance control system and method compatible with Mandarin and dialects.

Background technique

China is a major manufacturer of household appliances and home equipment, with annual production capacity for large and small appliances and home equipment reaching 2 billion units. Reportedly, many domestic appliance manufacturers treat voice interaction and voice control as their most important strategy.

Speech recognition is currently the most successfully deployed field of artificial intelligence, and existing Mandarin Chinese speech recognition already delivers a good user experience. For example, patent document CN108932947A discloses a voice control method and household appliance, in which the method comprises: receiving multiple pieces of voice information, classifying them, and selecting one piece of voice information in each class to execute the corresponding control operation. With this technical solution, control operations for multiple classes of voice information are handled as a whole, and one piece of voice information is accurately selected in each class for execution. When the current environment is noisy, the household appliance is controlled according to only a small amount of voice information. This solves the problem in the related art that voice control is inaccurate when speech in the environment is noisy, avoids the error-prone situation in which the appliance cannot identify the control operation because several people are controlling it at the same time, and ensures the accuracy of voice control of the household appliance.

However, outside the big cities, from second- and third-tier cities down to small towns, many households still speak dialect. A method is needed that lets actual users who speak dialect, such as elderly people and women in second- and third-tier cities and towns, control household appliances by voice while remaining compatible with existing Mandarin recognition, so that voice interaction between people and household appliances becomes more natural and user-friendly.

Summary of the invention

In view of the defects in the prior art, the object of the present invention is to provide a speech recognition appliance control system and method compatible with Mandarin and dialects.

A speech recognition appliance control system compatible with Mandarin and dialects provided according to the present invention includes a voice receiving module, a speech recognition module, a household appliance control module and a household appliance operation module;

Voice receiving module: uses a local audio input device to receive the set audio as an original signal;

Front-end processing module: receives the original signal and performs front-end processing on it to obtain a first signal;

Speech recognition module: performs pattern recognition on the first signal and takes the pattern recognition result as a second signal;

Household appliance control module: performs a calculation on the second signal according to set logic and judges whether the calculation result belongs to a set result set; if it does, issues a control instruction as a third signal according to the set operation set mapped from the set result set; if it does not, issues a failure instruction as a fourth signal;

Household appliance operation module: receives the third signal and performs the specified operation according to the third signal;

The front-end processing includes speech feature value extraction.
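The decision made by the household appliance control module can be illustrated with a minimal sketch. The instruction strings, operation codes and the `Signal` type below are hypothetical names chosen for illustration; the patent only specifies that a result inside the set result set maps to an operation (third signal) and any other result produces a failure instruction (fourth signal).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from the set result set to the set operation set.
OPERATIONS = {
    "turn on the air conditioner": "AC_POWER_ON",
    "turn off the air conditioner": "AC_POWER_OFF",
}


@dataclass
class Signal:
    kind: str                        # "third" = control instruction, "fourth" = failure instruction
    operation: Optional[str] = None  # operation code carried by a third signal


def control(calculation_result: Optional[str]) -> Signal:
    """Issue a third signal if the result is in the set result set, else a fourth signal."""
    if calculation_result in OPERATIONS:
        return Signal("third", OPERATIONS[calculation_result])
    return Signal("fourth")
```

Under these assumptions, `control("turn on the air conditioner")` yields a third signal carrying `AC_POWER_ON`, while an unrecognized or missing result yields a fourth signal.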

Preferably, the speech recognition appliance control system compatible with Mandarin and dialects further includes:

Voice playing module: receives the third signal and the fourth signal and announces the set recognition result information by voice.

Preferably, the calculation in the household appliance control module includes a dialect acoustic model calculation and a Mandarin acoustic model calculation. That is, the second signal is passed through the dialect acoustic model calculation and the Mandarin acoustic model calculation separately, according to set logic, to obtain a dialect calculation result and a Mandarin calculation result. If the dialect calculation result belongs to the set result set, the dialect calculation result is used; otherwise, the Mandarin calculation result is used.

Preferably, the household appliance control module includes a dialect training submodule;

Dialect training submodule: after the user selects an instruction to be trained, repeatedly trains on and learns the voice wake-up word specified by the user, so as to establish a mapping between the voice wake-up word and the instruction to be trained, and updates the dialect acoustic model.

Preferably, the audio input device includes a microphone or a microphone array; the speech feature value extraction includes extracting speech feature values by mel-frequency cepstrum, and the speech feature values include a speech feature vector encoding; the front-end processing further includes any one or any combination of noise reduction, voice endpoint detection and voice framing;

Voice framing refers to dividing the original signal into multiple speech frames of a set time length;

Endpoint detection refers to detecting voice endpoints according to time-domain parameters of the original signal, thereby dividing the original signal into speech periods and non-speech periods;

Noise reduction includes filtering of steady-state noise and/or suppression of dynamic noise;

The time-domain parameters include short-time amplitude and/or short-time zero-crossing rate; the filtering of steady-state noise includes filtering by the webrtc algorithm; the suppression of dynamic noise includes suppression by beamforming of the microphone array.

A speech recognition appliance control method compatible with Mandarin and dialects provided according to the present invention includes a voice receiving step, a speech recognition step, an appliance control step and an appliance operation step;

Voice receiving step: uses a local audio input device to receive the set audio as a first signal;

Speech recognition step: performs recognition processing on the first signal and takes the recognition processing result as a second signal;

Appliance control step: performs a calculation on the second signal according to set logic and judges whether the calculation result belongs to a set result set; if it does, issues a control instruction as a third signal according to the set operation set mapped from the set result set; if it does not, issues a failure instruction as a fourth signal;

Appliance operation step: receives the third signal and performs the specified operation according to the third signal;

The recognition processing includes speech feature value extraction.

Preferably, the speech recognition appliance control method compatible with Mandarin and dialects further includes:

Voice playing step: receives the third signal and the fourth signal and announces the set recognition result information by voice.

Preferably, the calculation in the appliance control step includes a dialect acoustic model calculation and a Mandarin acoustic model calculation. That is, the second signal is passed through the dialect acoustic model calculation and the Mandarin acoustic model calculation separately, according to set logic, to obtain a dialect calculation result and a Mandarin calculation result. If the dialect calculation result belongs to the set result set, the dialect calculation result is used; otherwise, the Mandarin calculation result is used.

Preferably, the appliance control step includes a dialect training sub-step;

Dialect training sub-step: after the user selects an instruction to be trained, the specified voice wake-up word is trained repeatedly, so as to establish a mapping between the voice wake-up word and the instruction to be trained.

Preferably, the audio input device includes a microphone or a microphone array; the speech feature value extraction includes extracting speech feature values by mel-frequency cepstrum, and the speech feature values include a speech feature vector encoding; the recognition processing further includes any one or any combination of noise reduction, voice endpoint detection and voice framing;

Compared with the prior art, the present invention has the following beneficial effects:

1. The speech recognition appliance control system compatible with Mandarin and dialects provided by the invention has the advantages of a simple structure, high reliability and low maintenance cost;

2. The speech recognition appliance control system and method compatible with Mandarin and dialects provided by the invention can, in addition to recognizing dialects, be further actively trained, which greatly improves the instruction recognition rate for a specific user and can even achieve recognition that makes no distinction between languages;

3. The speech recognition appliance control system and method compatible with Mandarin and dialects provided by the invention effectively screen the input speech for validity through noise reduction, voice endpoint detection and voice framing, thereby reducing the amount of computation required for speech feature value extraction and improving the efficiency of speech recognition.

Brief description of the drawings

Other features, objects and advantages of the invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:

Fig. 1 is a flow chart of dialect learning and training.

Fig. 2 is a flow chart of speech recognition.

Fig. 3 is a schematic diagram of the speech recognition appliance control system.

Specific embodiment

The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the invention, but do not limit it in any way. It should be noted that those of ordinary skill in the art can make several changes and improvements without departing from the inventive concept, and these all fall within the protection scope of the invention.

The front-end processing includes speech feature value extraction.

Specifically, the speech recognition appliance control system compatible with Mandarin and dialects further includes:

The calculation in the household appliance control module includes a dialect acoustic model calculation and a Mandarin acoustic model calculation. That is, the second signal is passed through the dialect acoustic model calculation and the Mandarin acoustic model calculation separately, according to set logic, to obtain a dialect calculation result and a Mandarin calculation result. If the dialect calculation result belongs to the set result set, the dialect calculation result is used; otherwise, the Mandarin calculation result is used.

The household appliance control module includes a dialect training submodule;

The audio input device includes a microphone or a microphone array; the speech feature value extraction includes extracting speech feature values by mel-frequency cepstrum, and the speech feature values include a speech feature vector encoding; the front-end processing further includes any one or any combination of noise reduction, voice endpoint detection and voice framing; voice framing refers to dividing the original signal into multiple speech frames of a set time length; endpoint detection refers to detecting voice endpoints according to time-domain parameters of the original signal, thereby dividing the original signal into speech periods and non-speech periods; noise reduction includes filtering of steady-state noise and/or suppression of dynamic noise;

The recognition processing includes speech feature value extraction.

More specifically, the speech recognition appliance control method compatible with Mandarin and dialects further includes:

Voice playing step: receives the fourth signal and announces the set recognition-failure prompt by voice.

The calculation in the appliance control step includes a dialect acoustic model calculation and a Mandarin acoustic model calculation. That is, the second signal is passed through the dialect acoustic model calculation and the Mandarin acoustic model calculation separately, according to set logic, to obtain a dialect calculation result and a Mandarin calculation result. If the dialect calculation result belongs to the set result set, the dialect calculation result is used; otherwise, the Mandarin calculation result is used.

The appliance control step includes a dialect training sub-step;

The audio input device includes a microphone or a microphone array; the speech feature value extraction includes extracting speech feature values by mel-frequency cepstrum, and the speech feature values include a speech feature vector encoding; the recognition processing further includes any one or any combination of noise reduction, voice endpoint detection and voice framing;

Further, a preferred example of the invention relates to the technical field of speech recognition and discloses a speech recognition appliance control system compatible with Mandarin and dialect recognition. The solution includes: a voice receiving device, a speech recognition device, a voice playing device, an appliance controller, keys, a touch display screen and an appliance system.

1. The voice receiving device uses a local recording device (the terminal is an embedded system), such as a single microphone or a microphone array, to receive recordings continuously and output them to the speech recognition device.

2. After receiving a recording, the speech recognition device performs noise reduction, endpoint detection, voice framing and speech feature value extraction (voice coding). The voice coding is sent to the dialect acoustic model and the Mandarin acoustic model simultaneously for recognition calculation. The dialect recognition result is used preferentially; if dialect recognition yields no result, the Mandarin recognition result is used.

3. After receiving the recognition result, the appliance controller makes a logical judgment and then controls the appliance system to execute the operation.

The invention is especially suited to recognition in China, where Mandarin is not yet universal or where households speak dialect. Users can be recognized in either Mandarin or dialect and can control household appliances by voice in both cases. It is currently the most suitable solution for the practical application of speech recognition in household appliances.

A preferred example of the present invention includes the following process:

1. The user trains the dialect acoustic model

The wake-up word and each voice instruction need to be learned and trained several times, and are trained into the dialect acoustic model.

Establishment of the dialect acoustic model: the dialect wake-up word and voice instructions are learned and trained into the dialect acoustic model, which contains the speech frames of each voice instruction and the speech feature vector (voice coding) of every frame. During training of the dialect acoustic model, the wake-up word or any control instruction can be learned or deleted, individually or all together, by operating the keys or touch display screen on the appliance.
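The learn and delete operations described above can be sketched as a small template store. This is only an illustration under stated assumptions: the class name `DialectModel`, the method names and the minimum of three repetitions are hypothetical, since the patent says only that each item is trained several times and that items can be learned or deleted individually or all together.

```python
from typing import Dict, List, Optional

Frame = List[float]       # feature vector (voice coding) of one speech frame
Utterance = List[Frame]   # all frames of one recorded repetition


class DialectModel:
    """Stores, per instruction, the per-frame feature vectors of every training repetition."""

    def __init__(self, min_repetitions: int = 3) -> None:
        # Assumed value: the source only requires training "several times".
        self.min_repetitions = min_repetitions
        self.templates: Dict[str, List[Utterance]] = {}

    def learn(self, instruction: str, recordings: List[Utterance]) -> None:
        """Train (or retrain) one wake-up word or control instruction."""
        if len(recordings) < self.min_repetitions:
            raise ValueError("each item must be trained several times")
        self.templates[instruction] = list(recordings)

    def delete(self, instruction: Optional[str] = None) -> None:
        """Delete one item, or every item when no instruction is given."""
        if instruction is None:
            self.templates.clear()
        else:
            self.templates.pop(instruction, None)
```

In this sketch a key press or touch-screen action on the appliance would call `learn` or `delete` for the selected item.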

2. Continuously receiving recording input:

A local recording device (the terminal can be an embedded system, a PC or another SoC system), such as a single microphone or a microphone array, receives recordings continuously. After a recording is received, noise reduction, endpoint detection, voice framing and speech feature value extraction (voice coding) are performed.

Voice framing divides the incoming recorded voice data into frames of equal length, generally a few tens of milliseconds per frame.
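A minimal sketch of equal-length framing follows. The 25 ms default and the choice to drop a trailing partial frame are assumptions made for illustration; the patent states only that frames have the same length, typically a few tens of milliseconds, and does not mention overlapping frames.

```python
from typing import List, Sequence


def split_frames(samples: Sequence[float], sample_rate: int, frame_ms: int = 25) -> List[Sequence[float]]:
    """Split a recording into consecutive, non-overlapping frames of equal length."""
    frame_len = sample_rate * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    # Drop the incomplete tail so that every frame has the same length.
    usable = len(samples) - len(samples) % frame_len
    return [samples[i:i + frame_len] for i in range(0, usable, frame_len)]
```

At a 16 kHz sample rate, for example, each 25 ms frame holds 400 samples.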

Endpoint detection analyzes the input recording and separates the speech periods from the non-speech periods, determining the starting point of the speech signal. It can be performed using time-domain parameters of the speech: short-time amplitude and short-time zero-crossing rate. The first option to consider is using the signal amplitude as the feature for distinguishing silent segments from speech segments. It is enough to set a threshold: when the signal amplitude exceeds the threshold, speech is considered to have started, and when the amplitude falls below the threshold, speech is considered to have ended. Performing endpoint detection on the speech signal and accurately determining the start and end points of each input utterance helps reduce the system's computation load and improve system performance.
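The single-threshold scheme described above can be sketched as follows. The function names are illustrative, the short-time amplitude is taken here as the mean absolute sample value per frame (an assumption), and the threshold value is left to the caller. The zero-crossing rate is computed as well, although this sketch bases the decision on amplitude alone.

```python
from typing import List, Sequence, Tuple


def short_time_amplitude(frame: Sequence[float]) -> float:
    """Mean absolute amplitude of one frame."""
    return sum(abs(x) for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_endpoints(frames: List[Sequence[float]], threshold: float) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs; end_frame is the first frame after the speech."""
    segments = []
    start = None
    for i, frame in enumerate(frames):
        voiced = short_time_amplitude(frame) > threshold
        if voiced and start is None:
            start = i                       # amplitude rose above the threshold: speech starts
        elif not voiced and start is not None:
            segments.append((start, i))     # amplitude fell below the threshold: speech ends
            start = None
    if start is not None:
        segments.append((start, len(frames)))
    return segments
```

Only the frames inside the returned segments need to be passed on to feature extraction, which is where the reduction in computation comes from.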

Voice noise reduction includes filtering of steady-state noise and suppression of dynamic noise. Dynamic noise is suppressed by beamforming of the microphone array, and steady-state noise is filtered by the webrtc algorithm.
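The patent does not say which beamforming method is used. As an illustration only, the sketch below shows delay-and-sum beamforming, the simplest form of the technique, with integer sample delays that are assumed to be known in advance for the target direction. The function name and parameters are hypothetical, and the webrtc steady-state filter is not reproduced here.

```python
from typing import List, Sequence


def delay_and_sum(channels: List[Sequence[float]], delays: List[int]) -> List[float]:
    """Align each microphone channel by its arrival delay (in samples) and average them.

    delays[k] is how many samples later the target sound reaches microphone k.
    Sound from the target direction adds coherently; sound from other directions does not.
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    length = len(channels[0])
    output = []
    for n in range(length):
        total = 0.0
        for channel, delay in zip(channels, delays):
            index = n + delay
            if 0 <= index < length:   # samples shifted past the buffer edge are skipped
                total += channel[index]
        output.append(total / len(channels))
    return output
```

With two microphones where the second hears the target one sample later, `delays=[0, 1]` realigns the two channels before averaging.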

Speech feature values are extracted using MFCC (Mel-scale Frequency Cepstral Coefficient) features. In this module the speech signal undergoes processing such as frequency-domain transformation, cepstral transformation and differencing, finally yielding a feature vector of about 40 dimensions.
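Two of the building blocks named above can be sketched briefly: the commonly used mel-scale conversion formula and the differencing step that appends delta coefficients to each frame. The filter bank and cepstral transform are omitted, and the function names are illustrative. The patent does not state how its roughly 40 dimensions are composed; a common configuration of 13 static coefficients plus their first- and second-order differences gives 39, which is mentioned here only as a plausible assumption.

```python
import math
from typing import List


def hz_to_mel(frequency_hz: float) -> float:
    """Commonly used mel-scale formula."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def add_deltas(cepstra: List[List[float]]) -> List[List[float]]:
    """Append a first-order difference (delta) to the coefficients of every frame."""
    output = []
    for t, frame in enumerate(cepstra):
        previous = cepstra[t - 1] if t > 0 else frame            # repeat the edge frame
        following = cepstra[t + 1] if t + 1 < len(cepstra) else frame
        delta = [(b - a) / 2.0 for a, b in zip(previous, following)]
        output.append(list(frame) + delta)
    return output
```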

3. Speech recognition process:

The recording is sent to the dialect acoustic model and the Mandarin acoustic model simultaneously for recognition calculation. The dialect acoustic model is stored locally, and dialect recognition is computed locally. The Mandarin acoustic model can be local or on a cloud server; that is, Mandarin recognition is completed locally or on a cloud server.
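Sending the same input to both recognizers at once could be sketched with two worker threads. The two recognizer callables are placeholders supplied by the caller (the Mandarin one might wrap a cloud request), and the function name is hypothetical; the patent does not describe how the simultaneous dispatch is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Tuple


def recognize_both(
    features: Any,
    dialect_recognizer: Callable[[Any], Any],
    mandarin_recognizer: Callable[[Any], Any],
) -> Tuple[Any, Any]:
    """Run dialect and Mandarin recognition concurrently and return both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        dialect_future = pool.submit(dialect_recognizer, features)
        mandarin_future = pool.submit(mandarin_recognizer, features)
        # result() blocks until the corresponding recognizer has finished.
        return dialect_future.result(), mandarin_future.result()
```

Running the two calls in parallel means a slow cloud response does not delay the start of local dialect recognition.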

4. Recognition result judgment:

The dialect recognition result is used preferentially; if dialect recognition yields no result, the Mandarin recognition result is used. If neither produces a recognition result, this recognition is considered to have failed.

5. Appliance control.

The speech recognition result is transmitted to the control board of the appliance, and the appliance control board controls the appliance system to execute the operation.

In a preferred example of the present invention:

Speech recognition module: the main chip is an X1800/X1830 chip, which contains 64M/128M bytes of DDR RAM, has 128M bytes of external flash, and runs an embedded Linux 3.1.0 system.

Voice receiving module: a digital microphone array of 2 or 4 st DT05 microphones.

Voice playing module: includes a 3-watt analog amplifier and a 4-ohm loudspeaker.

Household appliance control module: includes a single-chip microcontroller and peripheral drive circuits. Household appliances currently mostly use 8-bit or 32-bit microcontrollers, such as the STM8/STM32 chips from ST.

The voice part (voice receiving module, speech recognition module and front-end processing module) and the appliance control part (household appliance control module, household appliance operation module and voice playing module) can communicate through interfaces such as a serial port or IIC. The voice part sends the recognition result to the appliance control part, and the household appliance control module feeds back the result of the appliance operation, key information and touch-display touch information to the voice part.

Further, dialect acoustic model training, the Mandarin acoustic model, dialect recognition and Mandarin recognition are briefly described below:

1. Dialect acoustic model training.

Establishment of the dialect acoustic model: the dialect wake-up word and voice instructions are learned and trained into the dialect acoustic model, which contains the speech frames of each voice instruction and the speech feature vector (voice coding) of every frame.

During training, the user can define the voice wake-up word and voice control instructions freely, with no restriction on language or content. To improve recognition, each item needs to be trained several times, and the user can train commonly used voice control instructions.

2. Mandarin acoustic model. To establish the Mandarin acoustic model so that it adapts to application environments with different ages, regions, groups of people, channels, terminals and noise conditions, a large amount of speech corpus and text corpus is collected to train the model. Modeling methods for the acoustic model include, but are not limited to, DNN (deep neural network), HMM (hidden Markov model) and GMM (Gaussian mixture model).

3. Dialect recognition. After front-end processing and voice coding, the user's recording is sent to the dialect acoustic model for recognition calculation, which returns a recognition result: the serial number assigned at training time and a recognition score.
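The patent does not name the matching algorithm used inside the dialect acoustic model. Purely as an assumption, the sketch below uses dynamic time warping (DTW) against stored templates, a classic approach for small user-trained vocabularies, and returns the serial number of the best template together with a score. The score formula `1 / (1 + distance)` is also an illustrative choice.

```python
import math
from typing import List, Tuple

Frame = List[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: List[Frame], seq_b: List[Frame]) -> float:
    """Length-normalized DTW distance between two non-empty frame sequences."""
    if not seq_a or not seq_b:
        raise ValueError("sequences must be non-empty")
    rows, cols = len(seq_a), len(seq_b)
    cost = [[math.inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            step = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[rows][cols] / (rows + cols)


def recognize_dialect(features: List[Frame], templates: List[List[Frame]]) -> Tuple[int, float]:
    """Return (serial number of the closest template, score in (0, 1]); (-1, 0.0) if none."""
    best_serial, best_distance = -1, math.inf
    for serial, template in enumerate(templates):
        distance = dtw_distance(features, template)
        if distance < best_distance:
            best_serial, best_distance = serial, distance
    return best_serial, 1.0 / (1.0 + best_distance)
```

Here the list index plays the role of the serial number assigned at training time.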

4. Mandarin recognition. Whether local or on a cloud server, the Mandarin acoustic model is generated in advance. The voice coding is sent to the acoustic model for recognition calculation, which returns a recognition result: the text corresponding to the speech and a recognition score.

5. Recognition result judgment. The dialect recognition result is used preferentially. If the dialect recognition score is below the threshold, the Mandarin recognition result is used; if the Mandarin recognition result is also below the recognition threshold, this recognition fails.
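The judgment rule in item 5 amounts to a two-stage threshold check, sketched below. The threshold values and the function name are hypothetical; the patent gives no numeric thresholds.

```python
from typing import Optional, Tuple

# Hypothetical thresholds; the source does not give numeric values.
DIALECT_THRESHOLD = 0.6
MANDARIN_THRESHOLD = 0.6

Result = Tuple[object, float]   # (serial number or recognized text, recognition score)


def choose_result(dialect: Optional[Result], mandarin: Optional[Result]) -> Optional[Tuple[str, object]]:
    """Prefer the dialect result, fall back to Mandarin, return None on recognition failure."""
    if dialect is not None and dialect[1] >= DIALECT_THRESHOLD:
        return ("dialect", dialect[0])
    if mandarin is not None and mandarin[1] >= MANDARIN_THRESHOLD:
        return ("mandarin", mandarin[0])
    return None   # both scores are below threshold: this recognition fails
```

A `None` return corresponds to the case in which the control module issues the failure instruction.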

Practical application scenarios are as follows:

Take an air conditioner as an example. The factory-default voice wake-up word is the Mandarin "Xiao Fang Xiao Fang". The user can train a wake-up word in any wording, such as "Xiao Bai Xiao Bai" in Shanghainese. In use, the user simply says "Xiao Bai Xiao Bai" in Shanghainese, and the air conditioner is woken up in the same way and responds with the voice prompt "I'm here, go ahead", which makes the experience highly personalized.

Voice control instructions can be defined freely. For example, one factory-default voice control instruction is "turn on the air conditioner". The user can train an instruction in any wording, such as "please turn on the air conditioner" in Shanghainese. In use, the user simply says "please turn on the air conditioner" in Shanghainese, and the air conditioner turns on automatically with the voice prompt "Turning on the air conditioner for you". If the user says "turn on the air conditioner" in Mandarin, the air conditioner also recognizes it.

A system that recognizes both Mandarin and dialect can satisfy users' personalization needs and bring voice interaction and voice control closer to what users really require. After all, household appliances such as washing machines, range hoods and air conditioners are used mostly by elderly people and women, and many people in second- and third-tier cities speak dialect directly. The invention meets the basic requirement of Mandarin recognition and also the essential need for personalization, and is currently the most suitable speech recognition solution.

In the description of the present application, it should be understood that orientation or positional terms such as "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner" and "outer" are based on the orientations or positions shown in the drawings. They are used only to facilitate and simplify the description of the application, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore should not be understood as limiting the application.

Specific embodiments of the invention have been described above. It should be understood that the invention is not limited to the particular embodiments described; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substance of the invention. Where no conflict arises, the embodiments of the application and the features in the embodiments can be combined with one another arbitrarily.

Claims

1. A speech recognition appliance control system compatible with Mandarin and dialects, characterized in that it includes a voice receiving module, a speech recognition module, a household appliance control module and a household appliance operation module;

Household appliance control module: performs a calculation on the second signal according to set logic and judges whether the calculation result belongs to a set result set; if it does, issues a control instruction as a third signal according to the set operation set mapped from the set result set; if it does not, issues a failure instruction as a fourth signal;

The front-end processing includes speech feature value extraction.

2. The speech recognition appliance control system compatible with Mandarin and dialects according to claim 1, characterized in that the system further includes:

3. The speech recognition appliance control system compatible with Mandarin and dialects according to claim 1, characterized in that the calculation in the household appliance control module includes a dialect acoustic model calculation and a Mandarin acoustic model calculation; that is, the second signal is passed through the dialect acoustic model calculation and the Mandarin acoustic model calculation separately, according to set logic, to obtain a dialect calculation result and a Mandarin calculation result; if the dialect calculation result belongs to the set result set, the dialect calculation result is used; otherwise, the Mandarin calculation result is used.

4. The speech recognition appliance control system compatible with Mandarin and dialects according to claim 3, characterized in that the household appliance control module includes a dialect training submodule;

Dialect training submodule: after the user selects an instruction to be trained, repeatedly trains on and learns the voice wake-up word specified by the user, so as to establish a mapping between the voice wake-up word and the instruction to be trained, and updates the dialect acoustic model.

5. The speech recognition appliance control system compatible with Mandarin and dialects according to any one of claims 1 to 4, characterized in that the audio input device includes a microphone or a microphone array; the feature values in the speech feature value extraction include voice coding; the recognition processing further includes any one or any combination of noise reduction, voice endpoint detection and voice framing.

6. A speech recognition appliance control method compatible with Mandarin and dialects, characterized in that it includes a voice receiving step, a speech recognition step, an appliance control step and an appliance operation step;

Appliance control step: performs a calculation on the second signal according to set logic and judges whether the calculation result belongs to a set result set; if it does, issues a control instruction as a third signal according to the set operation set mapped from the set result set; if it does not, issues a failure instruction as a fourth signal;

The recognition processing includes speech feature value extraction.

7. The speech recognition appliance control method compatible with Mandarin and dialects according to claim 6, characterized in that the method further includes:

8. The speech recognition appliance control method compatible with Mandarin and dialects according to claim 6, characterized in that the calculation in the appliance control step includes a dialect acoustic model calculation and a Mandarin acoustic model calculation; that is, the second signal is passed through the dialect acoustic model calculation and the Mandarin acoustic model calculation separately, according to set logic, to obtain a dialect calculation result and a Mandarin calculation result; if the dialect calculation result belongs to the set result set, the dialect calculation result is used; otherwise, the Mandarin calculation result is used.

9. The speech recognition appliance control method compatible with Mandarin and dialects according to claim 8, characterized in that the appliance control step includes a dialect training sub-step;

Dialect training sub-step: after the user selects an instruction to be trained, the specified voice wake-up word is trained repeatedly, so as to establish a mapping between the voice wake-up word and the instruction to be trained.

10. The speech recognition appliance control method compatible with Mandarin and dialects according to any one of claims 6 to 9, characterized in that the audio input device includes a microphone or a microphone array; the feature values in the speech feature value extraction include voice coding; the recognition processing further includes any one or any combination of noise reduction, voice endpoint detection and voice framing.