CN109119073A - Audio recognition method, system, speaker and storage medium based on multi-source identification - Google Patents
- Publication number
- CN109119073A CN109119073A CN201810673599.3A CN201810673599A CN109119073A CN 109119073 A CN109119073 A CN 109119073A CN 201810673599 A CN201810673599 A CN 201810673599A CN 109119073 A CN109119073 A CN 109119073A
- Authority
- CN
- China
- Prior art keywords
- recognition
- speech
- sound box
- intelligent sound
- recognition result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/34—Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
Abstract
The invention discloses a speech recognition method, system, speaker, and storage medium based on multi-source recognition. The method comprises: acquiring user speech through a smart speaker; the smart speaker recognizing the acquired user speech through at least two speech recognition platforms to obtain at least two recognition results; the smart speaker obtaining and comparing the at least two recognition results produced by the at least two speech recognition platforms; the smart speaker outputting the at least two recognition results when they are identical; and, when they differ, the smart speaker unifying the at least two recognition results before output. By providing at least two speech recognition platforms in the smart speaker to recognize user speech, outputting directly when the recognition results are identical, and unifying the results to obtain a final recognition result when they differ, the invention greatly improves the speech recognition accuracy of the smart speaker.
Description
Technical field
The present invention relates to the field of speech recognition, and more particularly to a speech recognition method, system, speaker, and storage medium based on multi-source recognition.
Background art
Speech recognition is a key technology for human-computer interaction: a machine recognizes voice commands from a user, which markedly improves interaction by letting the user accomplish more tasks simply by speaking. Speech recognition is realized by a speech recognition engine obtained through online or offline training. The process can generally be divided into a training stage and a recognition stage. In the training stage, an acoustic model (AM) and a lexicon are derived statistically from training data according to the mathematical model on which the speech recognition engine is based. In the recognition stage, the engine processes the input speech using the acoustic model and the lexicon to obtain a recognition result. For example, feature extraction is performed on the spectrogram of the input sound to obtain feature vectors, a phoneme sequence (e.g. [i], [o]) is then derived according to the acoustic model, and finally the words, or even sentences, that best match the phoneme sequence are located in the lexicon.
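The recognition-stage pipeline described above (feature vectors, then a phoneme sequence, then lexicon lookup) can be sketched minimally as follows. The toy lexicon, the stand-in acoustic model, and the overlap score are all illustrative assumptions, not details from the patent:

```python
# Minimal sketch of the recognition stage: feature vectors -> phoneme
# sequence -> best-matching word from a lexicon (all names illustrative).

LEXICON = {                      # word -> phoneme sequence
    "hi":    ["h", "i"],
    "hello": ["h", "e", "l", "o"],
}

def acoustic_model(feature_vectors):
    """Stand-in acoustic model: here each 'feature vector' already
    carries its most likely phoneme label."""
    return [fv["phoneme"] for fv in feature_vectors]

def decode(feature_vectors):
    phonemes = acoustic_model(feature_vectors)
    # Pick the lexicon entry whose phoneme sequence overlaps most,
    # penalizing length mismatch.
    def score(word):
        ref = LEXICON[word]
        return sum(p == q for p, q in zip(ref, phonemes)) - abs(len(ref) - len(phonemes))
    return max(LEXICON, key=score)

features = [{"phoneme": p} for p in ["h", "e", "l", "o"]]
print(decode(features))  # -> hello
```

A real engine would score phoneme hypotheses probabilistically (e.g. with an HMM or neural acoustic model) rather than by exact label overlap; this sketch only shows the data flow between the three stages.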
In a speech recognition system, more than one speech recognition engine may be loaded to recognize the same speech simultaneously. For example, a first engine may be a speaker-dependent automatic speech recognition (SD-ASR) engine, trained to recognize speech from a specific speaker and to output a recognition result with a corresponding score. A second engine may be a speaker-independent automatic speech recognition (SI-ASR) engine, which can recognize speech from any user and likewise outputs a recognition result with a corresponding score.
Beyond human-computer interaction, speech recognition is also applied in social software, where user speech is converted into text for output. Whether for human-computer interaction or social applications, improving recognition accuracy remains a problem.
Summary of the invention
The purpose of the present invention is to address the above drawbacks of the prior art by providing a speech recognition method, system, speaker, and storage medium based on multi-source recognition.

The technical solution adopted by the present invention is a speech recognition method based on multi-source recognition, the method comprising:
acquiring user speech through a smart speaker;

the smart speaker recognizing the acquired user speech through at least two speech recognition platforms to obtain at least two recognition results;

the smart speaker obtaining and comparing the at least two recognition results produced by the at least two speech recognition platforms;

the smart speaker outputting the at least two recognition results when they are identical;

the smart speaker unifying the at least two recognition results when they differ, and then outputting the unified result.
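The steps above can be sketched as a single control flow. The function names and the placeholder unification policy are assumptions for illustration; the patent's concrete unification strategies are described later:

```python
# Hedged sketch of the compare-then-unify flow: run speech through every
# platform, output directly when all results agree, unify otherwise.

def recognize_multi(speech, platforms, unify):
    results = [p(speech) for p in platforms]
    if all(r == results[0] for r in results):
        return results[0]          # identical results: output as-is
    return unify(results)          # differing results: unify, then output

# Toy platforms disagreeing on one word:
p1 = lambda s: "turn on the light"
p2 = lambda s: "turn on the night"
unify = lambda results: min(results)   # placeholder unification policy
print(recognize_multi("...", [p1, p2], unify))  # -> turn on the light
```

The `unify` hook corresponds to the three unification embodiments (cloud semantic model, second-engine re-recognition, fuzzy search) discussed below.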
Preferably, before the smart speaker recognizes the acquired user speech through the at least two speech recognition platforms to obtain the at least two recognition results, the method further comprises:

providing in the smart speaker at least two speech recognition platforms with different recognition strategies as the at least two speech recognition platforms;

acquiring and storing the voiceprint of the user through the smart speaker;

denoising the acquired user speech.
Recognizing the user speech with at least two speech recognition platforms improves recognition accuracy, and selecting platforms with different recognition strategies as the at least two platforms makes the accuracy of the results obtained under those different strategies more reliable. The collected and stored voiceprint of the user serves as a recognition sample during speech recognition, yielding higher recognition accuracy. Denoising the user speech makes the sound source easier to recognize, which also improves accuracy.
Preferably, the smart speaker unifying the at least two differing recognition results before output comprises:

identifying the differing segment in the smart speaker, and applying contextual semantic analysis to the differing segment;

invoking a cloud-computed convolutional neural training model to compute the semantics of the at least two recognition results, and determining one of them to be output as the recognition result.
The at least two recognition results obtained by recognizing the user speech through the at least two speech recognition platforms are not necessarily the same, and when they differ, it cannot be determined which result to output. The convolutional neural training model in the cloud is invoked to compute the semantics of the at least two recognition results, so that the result conforming to semantic habit in the semantic base is output. Because the model calculation ensures the output conforms to semantic habit, the accuracy of the recognition result is improved.
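One way to picture the semantic selection just described is a small scoring sketch, in which a bigram table stands in for both the semantic base and the cloud convolutional model (both stand-ins are assumptions; the patent does not fix a concrete scoring rule):

```python
# Sketch: score differing results against a tiny "semantic habit" table
# and output the best-scoring candidate. The bigram set is illustrative.

SEMANTIC_BASE = {("turn", "on"), ("on", "the"), ("the", "light")}

def semantic_score(sentence):
    words = sentence.split()
    # Count adjacent word pairs that conform to semantic habit.
    return sum((a, b) in SEMANTIC_BASE for a, b in zip(words, words[1:]))

def unify_by_semantics(results):
    return max(results, key=semantic_score)

candidates = ["turn on the light", "turn on the night"]
print(unify_by_semantics(candidates))  # -> turn on the light
```

In the patent's design the scoring would come from a trained convolutional model in the cloud rather than a lookup table; the selection-by-highest-semantic-fit logic is the part this sketch illustrates.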
Preferably, the smart speaker unifying the at least two differing recognition results before output comprises:

selecting at least one second speech recognition engine corresponding to the at least two speech recognition platforms to recognize the user speech again, obtaining a plurality of second recognition results;

comparing the plurality of recognition results with the plurality of second recognition results;

selecting the recognition result with the highest agreement rate for output.
For differing recognition results, recognizing again with the second speech engine increases the number of recognitions and improves recognition accuracy.
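The second-pass vote above can be sketched as follows; the engine stubs and the `Counter`-based tally are assumptions chosen to make "highest agreement rate" concrete:

```python
# Hedged sketch of second-pass re-recognition: re-run the speech through
# second engines and output the result with the highest agreement rate.
from collections import Counter

def unify_by_revote(first_results, second_engines, speech):
    second_results = [e(speech) for e in second_engines]
    tally = Counter(first_results + second_results)
    return tally.most_common(1)[0][0]   # highest agreement rate wins

e1 = lambda s: "play music"
e2 = lambda s: "play music"
print(unify_by_revote(["play music", "play mosaic"], [e1, e2], "..."))
# -> play music
```

The sketch pools first- and second-pass results into one tally, matching the patent's idea that more recognitions make the majority result more trustworthy.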
Preferably, the smart speaker unifying the at least two differing recognition results before output comprises:

identifying the differing segment and performing a fuzzy search on the differing segment;

selecting the recognition result with the highest fuzzy-search match for output.
By performing a fuzzy search on the differing segment and replacing it with the content of highest match found by the search, the searched content conforms to semantic habit, which likewise improves speech recognition accuracy.
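The fuzzy-search replacement can be sketched with the standard library's `difflib`; the retrieval corpus and phrases below are illustrative assumptions standing in for the patent's retrieval semantic base:

```python
# Minimal sketch of fuzzy-search unification: replace the differing
# segment with the closest match from a retrieval corpus.
import difflib

CORPUS = ["set an alarm", "set a timer", "send a message"]

def unify_by_fuzzy_search(differing_segment):
    # cutoff=0.0 guarantees some match is returned for this sketch.
    matches = difflib.get_close_matches(differing_segment, CORPUS, n=1, cutoff=0.0)
    return matches[0]

print(unify_by_fuzzy_search("set an alrm"))  # -> set an alarm
```

A production system would search a much larger semantic base and weigh result frequency, as the detailed description notes; `difflib` similarity merely demonstrates the highest-match selection.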
There is also provided a speech recognition system based on multi-source recognition, the system comprising:

an input module, arranged in the smart speaker, for acquiring user speech;

at least two speech recognition modules, arranged in the smart speaker, for recognizing the user speech to obtain at least two recognition results;

a comparison module, arranged in the smart speaker, for comparing the at least two recognition results obtained by the at least two speech recognition modules;

a unification module, arranged in the smart speaker, for unifying the at least two recognition results when they differ;

an output module, arranged in the smart speaker, for outputting the unified at least two recognition results.
Preferably, the at least two speech recognition modules are speech recognition modules with at least two different recognition strategies, each speech recognition module comprising:

a storage submodule for storing the collected voiceprint of the user;

a denoising submodule for denoising the acquired user speech.
Preferably, the unification module comprises:

a cloud computing submodule for analyzing the context semantics of the differing segment and invoking the cloud convolutional neural training model to compute the semantics of the at least two recognition results;

a search submodule for performing a fuzzy search on the differing segment;

and at least one second speech recognition submodule, arranged on the speech recognition module, for recognizing the user speech again to obtain a plurality of second recognition results.
There is also provided a smart speaker comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the aforementioned speech recognition method based on multi-source recognition.
There is also provided a computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the aforementioned speech recognition method based on multi-source recognition.
Compared with the prior art, the present invention has at least the following advantage: by providing at least two speech recognition platforms in the smart speaker to recognize user speech, outputting directly when the recognition results are identical, and unifying the results to obtain a final recognition result when they differ, the speech recognition accuracy of the smart speaker is greatly improved.
Brief description of the drawings
Fig. 1 is a flowchart of the speech recognition method based on multi-source recognition according to an embodiment of the present invention;

Fig. 2 is a flowchart of one unification process according to an embodiment of the present invention;

Fig. 3 is a flowchart of another unification process according to an embodiment of the present invention;

Fig. 4 is a flowchart of yet another unification process according to an embodiment of the present invention;

Fig. 5 is a module diagram of the speech recognition system based on multi-source recognition according to an embodiment of the present invention.
Detailed description of the embodiments
The present invention will be further described with reference to the accompanying drawings and examples.
As shown in Fig. 1, the invention proposes a speech recognition method based on multi-source recognition, implemented in a speech recognition environment comprising a terminal. The terminal may be a smart speaker, smartphone, tablet computer, laptop, desktop computer, or the like; the present invention places no specific restriction on the product type of the terminal. The terminal is equipped with a social or human-computer interaction application that can call the terminal's built-in microphone and display device.

In an embodiment of the present invention, the environment is preferably a smart speaker provided with a microphone capable of capturing speech and a display screen, and further provided with a speech recognition function module. Of course, as a possible implementation environment, the smart speaker may also connect over the network to a speech platform providing speech recognition services.
The described method includes:
11. Acquiring user speech. Specifically, user speech is acquired through a sound pickup device provided in the smart speaker; the sound pickup device may be a microphone or other equipment with sound collection capability.

Further, to obtain a better sound source, a denoising device may be provided in the sound pickup device so that denoising is performed at the source, improving source quality and reducing factors that interfere with speech recognition. The user speech may be converted into an audio signal by a speech converter.

Further, the audio signal may be converted into a digital signal and output to the speech recognition platforms.
12. Multi-platform speech recognition. The smart speaker recognizes the acquired user speech through at least two speech recognition platforms to obtain at least two recognition results. This step sends the acquired user speech to different speech recognition platforms for recognition, obtains the recognition results of the multiple platforms simultaneously, and compares and judges them.
Specifically, multi-platform speech recognition can be realized inside the smart speaker by building multiple groups of speech recognition engines, each with its own configured semantic base. It should be noted that among the multiple engine groups built in the smart speaker, each group uses a different recognition strategy, and the semantic combination strategies configured in the respective semantic bases also differ. For example, the engines may differ in their feature extraction modules: speech feature vectors may be extracted using Mel cepstral coefficients or using perceptual linear prediction coefficients. They may also differ in the construction of the acoustic model, for example using a hidden Markov model with Gaussian mixture models, a convolutional neural network, or a deep neural network. As for the semantic combination strategies of the semantic bases, the differences may lie in grammatical emphasis: some semantic bases emphasize the tense of verbs, some emphasize distinguishing near-synonyms or homophones, and some emphasize the completeness of syntactic structure.
As one possible embodiment, among the multiple engine groups built in the smart speaker and their correspondingly configured semantic bases, a speech recognition engine may use multiple semantic bases to obtain recognition results; that is, one engine group can obtain multiple recognition results through multiple semantic bases.

Of course, as another possible embodiment, multiple engine groups may also obtain multiple recognition results through a single shared semantic base.

In addition, as yet another possible embodiment, each time speech is recognized, one or more semantic bases may be randomly paired with each engine group to obtain the recognition results.
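The random engine/semantic-base pairing in the last embodiment can be sketched briefly; the engine and semantic-base labels are illustrative assumptions:

```python
# Sketch of randomly pairing each engine group with one or more semantic
# bases per recognition pass (names are illustrative, not from the patent).
import random

engines = ["mfcc+hmm-gmm", "plp+cnn"]
semantic_bases = ["tense-focused", "homophone-focused", "syntax-focused"]

def random_pairings(seed=0):
    rng = random.Random(seed)
    # Each engine group gets a random non-empty subset of semantic bases.
    return {e: rng.sample(semantic_bases, k=rng.randint(1, len(semantic_bases)))
            for e in engines}

pairs = random_pairings()
assert all(1 <= len(v) <= len(semantic_bases) for v in pairs.values())
```

Each pairing yields one recognition result per (engine, semantic base) combination, which is what feeds the comparison step.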
The recognition results may be output as digital signals for ease of comparison.
13. Determining whether the speech recognition results are identical. The smart speaker obtains the at least two recognition results and compares the at least two recognition results produced by the at least two speech recognition platforms. If the recognition results of the multiple platforms are identical, the recognition is accurate, and the flow proceeds to step 15 for output. Each speech recognition platform comprises a speech recognition engine and a semantic base; the speech recognition engine consists of a speech recognition chip and its circuitry, and matches the speech signal against the semantics in the semantic base to obtain the semantics matching that speech signal.
14. Unifying the differing at least two recognition results before output. If the recognition results of the multiple platforms differ, the cause may be interference with the sound source that prevents the platforms from recognizing accurately, or the platforms' different recognition strategies. In this case, the differing recognition results need to be unified to determine a final recognition result before output.
15. Outputting the identical at least two recognition results. A digital-to-analog signal converter is further provided in the terminal; the digital signal recognized by the speech recognition platforms is converted into an analog signal for output, making the recognition result better conform to people's reading habits.
In an embodiment of the present invention, before the user speech is recognized through the at least two speech recognition platforms to obtain the at least two recognition results, the method further comprises:

selecting speech recognition platforms with at least two different recognition strategies as the at least two speech recognition platforms. Because the recognition results are obtained under different recognition strategies, with different speech recognition engines and different semantic base configurations, the accuracy of the recognition results is assured, effectively improving the accuracy of the recognition results.
Acquiring and storing the voiceprint of the user. A voiceprint is the sound wave spectrum, displayed by an electro-acoustic instrument, that carries speech information. A voiceprint is not only specific but also relatively stable: after adulthood, a person's voice remains relatively constant for a long time. Whether the speaker deliberately imitates another person's voice and tone or speaks in a soft whisper, the voiceprint remains distinct even when the imitation is remarkably lifelike. Through the voiceprint characteristics, the speech recognition platforms can more easily capture the user's voice band, improving the accuracy of speech recognition.
Denoising the acquired user speech. The user speech collected from the terminal's built-in microphone, as the speech to be recognized, may pick up noise from the external environment during acquisition, which interferes with the speech to be recognized, degrades recognition accuracy, and lowers the accuracy of speech recognition.

Specifically, the speech to be recognized can be denoised to obtain a better sound source: the ambient sound can be collected in advance and converted into a digital audio signal by an analog-to-digital converter. This signal serves as a reference signal, and when the speech to be recognized is denoised, this component is eliminated from it.
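The reference-signal elimination step can be sketched as a sample-wise subtraction. Plain lists stand in for digitized audio, and direct subtraction is an assumption for illustration; a real system would align the signals and use adaptive filtering:

```python
# Sketch: subtract a pre-recorded ambient-noise reference from the
# captured signal, sample by sample, to recover the speech component.

def denoise(captured, ambient_reference):
    return [c - a for c, a in zip(captured, ambient_reference)]

ambient  = [1, 1, 2, 1]            # reference signal captured in advance
speech   = [5, 7, 6, 4]            # clean speech (unknown in practice)
captured = [s + a for s, a in zip(speech, ambient)]

print(denoise(captured, ambient))  # -> [5, 7, 6, 4]
```

The sketch shows why the reference must be captured under the same conditions as the speech: the eliminated component is exactly the stored ambient signal.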
Recognizing the user speech with at least two speech recognition platforms improves recognition accuracy, and selecting platforms with different recognition strategies as the at least two platforms makes the accuracy of the results obtained under those different strategies more reliable. The collected and stored voiceprint of the user serves as a recognition sample during speech recognition, yielding higher recognition accuracy. Denoising the user speech makes the sound source easier to recognize, which also improves accuracy.
As shown in Fig. 2, in an embodiment of the present invention, unifying the at least two differing recognition results before output comprises the steps of:

21. Identifying the differing segment and applying contextual semantic analysis to it. The differing segment is set aside, and the identical portions before and after the differing segment of the recognition results are matched in the semantic base to obtain similar semantics.
22. Model training. The cloud convolutional neural training model is invoked to compute the semantics of the at least two recognition results.

Before the cloud convolutional neural training model is invoked for the calculation, the model needs to be trained so that the convolutional neural network can rapidly compute a predicted value for the semantics of the recognition results. The predicted value replaces the differing segment in the recognition results and, together with the identical portions of the recognition results, composes the determined recognition result.
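The replace-and-compose step of 22 can be sketched as follows: split two candidates into the parts they share and the slot where they disagree, then fill the slot with a predicted segment. The model here is a stub, an assumption standing in for the trained convolutional network:

```python
# Hedged sketch: compose the determined result from the shared prefix and
# suffix plus a model-predicted segment for the differing slot.

def split_results(a, b):
    """Split two word-level results into common prefix, differing
    alternatives, and common suffix."""
    aw, bw = a.split(), b.split()
    i = 0
    while i < min(len(aw), len(bw)) and aw[i] == bw[i]:
        i += 1
    j = 0
    while j < min(len(aw), len(bw)) - i and aw[-1 - j] == bw[-1 - j]:
        j += 1
    return aw[:i], (aw[i:len(aw) - j], bw[i:len(bw) - j]), aw[len(aw) - j:]

def compose(a, b, predict):
    prefix, (alt_a, alt_b), suffix = split_results(a, b)
    predicted = predict(alt_a, alt_b)   # model's predicted segment
    return " ".join(prefix + predicted + suffix)

model = lambda x, y: x                   # stub: prefer the first candidate
print(compose("turn on the light", "turn on the night", model))
# -> turn on the light
```

In the patent the prediction would come from the cloud model's semantic calculation; the sketch shows only the composition of shared parts with the predicted segment.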
23. Determining the recognition result: one of the results is determined and output as the recognition result. After conversion by a digital-to-analog converter, the determined recognition result can be output to the display device of the terminal; of course, it can also be output directly, without conversion to an analog signal, as an instruction to the terminal.
The at least two recognition results obtained by recognizing the user speech through the at least two speech recognition platforms are not necessarily the same, and when they differ, it cannot be determined which result to output. The convolutional neural training model in the cloud is invoked to compute the semantics of the at least two recognition results, so that the result conforming to semantic habit in the semantic base is output. Because the model calculation ensures the output conforms to semantic habit, the accuracy of the recognition result is improved.
As shown in Fig. 3, as one embodiment of the invention, unifying the at least two differing recognition results before output comprises the following steps:

31. Second speech recognition engine. At least one second speech recognition engine corresponding to the at least two speech recognition platforms is selected to recognize the user speech again, obtaining a plurality of second recognition results. Without changing the semantic base, providing the second speech recognition engine on the same speech recognition platform improves speech recognition accuracy and reduces the physical influence introduced by the speech recognition engine.
32. Comparing recognition results. The plurality of recognition results is compared with the plurality of second recognition results. Among the multiple recognition results, the comparison yields the result with the fewest differing segments, i.e. the result with the highest agreement rate; semantic analysis is then performed on that result to obtain the determined recognition result.
33. Determining the recognition result: the result with the highest agreement rate is selected for output. After conversion by a digital-to-analog converter, the determined recognition result can be output to the display device of the terminal; of course, it can also be output directly, without conversion to an analog signal, as an instruction to the terminal.
For differing recognition results, recognizing again with the second speech engine increases the number of recognitions and improves recognition accuracy.
As shown in Fig. 4, as another embodiment of the invention, unifying the at least two differing recognition results before output comprises:

41. Identifying the differing segment, and taking the identical semantics before and after the differing segment as keywords.

42. Fuzzy search. The differing segment is fuzzily searched and replaced: through the keyword search, taking the most numerous search results as the standard, a fuzzy replacement is performed on the differing segment, and the recognition result that best conforms to semantic habit is obtained as the determined recognition result.
43. Determining the recognition result: the result with the highest fuzzy-search match is selected for output. After conversion by a digital-to-analog converter, the determined recognition result can be output to the display device of the terminal; of course, it can also be output directly, without conversion to an analog signal, as an instruction to the terminal.
By performing a fuzzy search on the differing segment and replacing it with the content of highest match found by the search, the searched content conforms to semantic habit, which likewise improves speech recognition accuracy.
As a possible embodiment, in order to obtain fuzzy search results faster, a retrieval semantic base may be configured for the smart speaker, storing the semantic fields used for speech recognition.
As shown in Fig. 5, there is also provided a speech recognition system based on multi-source recognition. The system is applied to a terminal, which may be a smartphone, tablet computer, laptop, desktop computer, or the like; the present invention places no specific restriction on the product type of the terminal. The terminal is equipped with a social or human-computer interaction application that can call the terminal's built-in microphone and display device.
The system comprises:
An input module 51 for acquiring user speech; the input module 51 is the microphone built into the terminal.
At least two speech recognition modules 52 for recognizing the user speech to obtain at least two recognition results. The speech recognition module 52 is a speech recognition chip arranged in the application cloud, and the terminal is provided with an analog-to-digital converter that converts the user speech into an audio signal.

The audio signal can also be converted into a communication signal, which is uploaded to the cloud for recognition.

Of course, the speech recognition module 52 can also be a speech recognition chip arranged in the terminal.
Contrast module 54 identifies that obtain described at least two know for comparing at least two speech recognition module 52
Other result;The contrast module 54 is one for handling the processing chip of data.
A unification module 55 for unifying the at least two recognition results when they differ; the unification module 55 unifies the recognition results at the digital level.
An output module 56 for outputting the unified at least two recognition results. The output module 56 is arranged in the terminal, which is further provided with a digital-to-analog signal converter; the digital signal recognized by the speech recognition platforms is converted into an analog signal for output, making the recognition result better conform to people's reading habits.
In an embodiment of the present invention, the at least two speech recognition modules 52 are speech recognition modules with at least two different recognition strategies, each speech recognition module comprising:

a storage submodule for storing the collected voiceprint of the user; the storage submodule is arranged in the application cloud and stores the user's voiceprint in the cloud;

a denoising submodule for denoising the acquired user speech; the denoising submodule can be arranged in the application cloud to perform digital denoising on the user speech.
In an embodiment of the present invention, the unification module 55 comprises:

a cloud computing submodule for analyzing the context semantics of the differing segment and invoking the cloud convolutional neural training model to compute the semantics of the at least two recognition results. Specifically, before the cloud convolutional neural training model is invoked for the calculation, the model needs to be trained so that the convolutional neural network can rapidly compute a predicted value for the semantics of the recognition results. The predicted value replaces the differing segment in the recognition results and, together with the identical portions of the recognition results, composes the determined recognition result.
As one embodiment of the unification module 55 in the present invention, a search submodule performs a fuzzy search on the differing section: the text that is identical before and after the differing section is extracted as keywords. The keywords are then searched, and, taking the search result with the greatest number of hits as the standard, a fuzzy replacement is applied to the differing section, so that the recognition result best matching common semantic usage is obtained as the determined recognition result.
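Purely as an illustration (not the patented implementation), the keyword-based fuzzy replacement can be sketched as follows; the `corpus` list is a hypothetical stand-in for the search engine's result set, and `fuzzy_resolve` is an assumed helper name:

```python
from collections import Counter

def fuzzy_resolve(prefix, candidates, suffix, corpus):
    """Resolve a differing section: use the identical text before and
    after it as keywords, count how often each candidate transcription
    appears between those keywords in the corpus, and keep the most
    frequent candidate as the fuzzy replacement."""
    counts = Counter()
    for doc in corpus:
        for cand in candidates:
            counts[cand] += doc.count(prefix + cand + suffix)
    best, _ = counts.most_common(1)[0]
    return prefix + best + suffix

# Two platforms disagreed on "ome" vs "um" between "play s" and " music".
corpus = [
    "please play some music now",
    "play some music in the kitchen",
    "play sum music",  # a rarer mis-transcription
]
print(fuzzy_resolve("play s", ["ome", "um"], " music", corpus))
```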
As another embodiment of the unification module 55 in the present invention, at least one second speech recognition submodule is provided on the speech recognition module 53 to recognize the user speech again, obtaining multiple second recognition results. Setting the second speech recognition submodule on the same speech recognition platform, without changing the semantic base, can improve speech recognition precision and reduce the physical influence introduced by the speech recognition module. The multiple recognition results are compared with the multiple second recognition results; among the multiple recognition results, the one with the fewest differing sections is obtained by comparison, which is the recognition result with the highest agreement rate. Semantic analysis is then performed on that highest-agreement recognition result to obtain the determined recognition result.
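For illustration only, the comparison against the second-pass results can be sketched as a similarity vote; this is an assumed realization using `difflib` ratios as a proxy for "fewest differing sections", not the patented method:

```python
import difflib

def agreement_rate(result, others):
    """Average character-level similarity of one recognition result to
    all the second-pass results; fewer differing sections means a
    higher agreement rate."""
    ratios = [difflib.SequenceMatcher(None, result, o).ratio() for o in others]
    return sum(ratios) / len(ratios)

def pick_highest_agreement(first_results, second_results):
    """Compare the first-pass results against the second-pass results
    and select the first-pass candidate with the highest agreement rate."""
    return max(first_results, key=lambda r: agreement_rate(r, second_results))

first = ["set a timer for ten minutes", "set a time for ten minutes"]
second = ["set a timer for ten minutes", "set a timer for tin minutes"]
print(pick_highest_agreement(first, second))
```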
Of course, the three embodiments of the unification module may coexist.
An intelligent sound box is also provided. The intelligent sound box includes a processor and a memory; at least one instruction, at least one program segment, code set, or instruction set is stored in the memory, and is loaded and executed by the processor to implement the aforementioned speech recognition method based on multi-source recognition.
A computer-readable storage medium is also provided. At least one instruction, at least one program segment, code set, or instruction set is stored in the storage medium, and is loaded and executed by a processor to implement the aforementioned speech recognition method based on multi-source recognition.
The above embodiments merely illustrate specific implementations of the present invention. It should be noted that those of ordinary skill in the art may make various modifications and variations without departing from the inventive concept, and all such modifications and variations shall fall within the protection scope of the present invention.
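For illustration only, the overall method flow described above can be sketched as follows; the platform callables are hypothetical stand-ins for real recognition services, and the `unify` parameter abstracts whichever unification embodiment is used:

```python
def recognize_multi_source(audio, platforms, unify):
    """Sketch of the overall flow: recognize the user speech on at
    least two platforms, output directly when all results agree,
    otherwise unify the differing results before output."""
    results = [platform(audio) for platform in platforms]
    if all(r == results[0] for r in results):
        return results[0]   # identical results: output directly
    return unify(results)   # differing results: unify, then output

# Hypothetical platform clients with different recognition strategies.
platform_a = lambda audio: "turn on the light"
platform_b = lambda audio: "turn on the light"
platform_c = lambda audio: "turn of the light"  # a mis-recognition

print(recognize_multi_source(b"", [platform_a, platform_b],
                             unify=lambda rs: rs[0]))
```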
Claims (10)
1. A speech recognition method based on multi-source recognition, for use in an intelligent sound box, characterized in that the method comprises:
acquiring user speech through the intelligent sound box;
recognizing, by the intelligent sound box, the acquired user speech through at least two speech recognition platforms to obtain at least two recognition results;
obtaining, by the intelligent sound box, the at least two recognition results, and comparing the at least two recognition results recognized by the at least two speech recognition platforms;
outputting, by the intelligent sound box, the at least two recognition results when they are identical;
unifying, by the intelligent sound box, the at least two recognition results when they differ, and then outputting the unified result.
2. The speech recognition method based on multi-source recognition according to claim 1, characterized in that before the intelligent sound box recognizes the acquired user speech through the at least two speech recognition platforms to obtain the at least two recognition results, the method further comprises:
providing the intelligent sound box with speech recognition platforms of at least two different recognition strategies as the at least two speech recognition platforms;
collecting and storing the voiceprint of the user through the intelligent sound box;
denoising the acquired user speech.
3. The speech recognition method based on multi-source recognition according to claim 1, characterized in that the unifying and then outputting, by the intelligent sound box, of the at least two recognition results that differ comprises:
distinguishing, in the intelligent sound box, the differing sections, and applying contextual semantic analysis to the differing sections;
calling a convolutional neural network training model in the cloud to compute the semantics of the at least two recognition results, and determining one of them to be output as the recognition result.
4. The speech recognition method based on multi-source recognition according to claim 1, characterized in that the unifying and then outputting, by the intelligent sound box, of the at least two recognition results that differ comprises:
selecting at least one second speech recognition engine corresponding to the at least two speech recognition platforms to recognize the user speech again, obtaining multiple second recognition results;
comparing the multiple recognition results with the multiple second recognition results;
selecting the recognition result with the highest agreement rate for output.
5. The speech recognition method based on multi-source recognition according to claim 1, characterized in that the unifying and then outputting, by the intelligent sound box, of the at least two recognition results that differ comprises:
distinguishing the differing sections and performing a fuzzy search on the differing sections;
selecting the recognition result with the highest fuzzy-search matching degree for output.
6. A speech recognition system based on multi-source recognition, characterized in that the system comprises:
an input module, arranged in an intelligent sound box, for acquiring user speech;
at least two speech recognition modules, arranged in the intelligent sound box, for recognizing the user speech to obtain at least two recognition results;
a comparison module, arranged in the intelligent sound box, for comparing the at least two recognition results recognized by the at least two speech recognition modules;
a unification module, arranged in the intelligent sound box, for unifying the at least two recognition results when they differ;
an output module, arranged in the intelligent sound box, for outputting the unified at least two recognition results.
7. The speech recognition system based on multi-source recognition according to claim 6, characterized in that the at least two speech recognition modules are speech recognition modules with at least two different recognition strategies, and each speech recognition module comprises:
a storage submodule for storing the collected voiceprint of the user;
a denoising submodule for denoising the acquired user speech.
8. The speech recognition system based on multi-source recognition according to claim 6 or 7, characterized in that the unification module comprises:
a cloud computing submodule for analyzing the contextual semantics of the differing sections, calling a convolutional neural network training model in the cloud to compute the semantics of the at least two recognition results;
a search submodule for performing a fuzzy search on the differing sections;
at least one second speech recognition submodule, arranged on the speech recognition module, for recognizing the user speech again to obtain multiple second recognition results.
9. An intelligent sound box, characterized in that the intelligent sound box comprises a processor and a memory; at least one instruction, at least one program segment, code set, or instruction set is stored in the memory, and is loaded and executed by the processor to implement the speech recognition method based on multi-source recognition according to any one of claims 1 to 5.
10. A computer-readable storage medium, characterized in that at least one instruction, at least one program segment, code set, or instruction set is stored in the storage medium, and is loaded and executed by a processor to implement the speech recognition method based on multi-source recognition according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810673599.3A CN109119073A (en) | 2018-06-25 | 2018-06-25 | Audio recognition method, system, speaker and storage medium based on multi-source identification |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109119073A true CN109119073A (en) | 2019-01-01 |
Family
ID=64822455
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810673599.3A Pending CN109119073A (en) | 2018-06-25 | 2018-06-25 | Audio recognition method, system, speaker and storage medium based on multi-source identification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109119073A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1811915A (en) * | 2005-01-28 | 2006-08-02 | 中国科学院计算技术研究所 | Estimating and detecting method and system for telephone continuous speech recognition system performance |
CN101807399A (en) * | 2010-02-02 | 2010-08-18 | 华为终端有限公司 | Voice recognition method and device |
CN105810188A (en) * | 2014-12-30 | 2016-07-27 | 联想(北京)有限公司 | Information processing method and electronic equipment |
CN107045496A (en) * | 2017-04-19 | 2017-08-15 | 畅捷通信息技术股份有限公司 | The error correction method and error correction device of text after speech recognition |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109767758A (en) * | 2019-01-11 | 2019-05-17 | 中山大学 | Vehicle-mounted voice analysis method, system, storage medium and equipment |
CN109767758B (en) * | 2019-01-11 | 2021-06-08 | 中山大学 | Vehicle-mounted voice analysis method, system, storage medium and device |
CN110634481A (en) * | 2019-08-06 | 2019-12-31 | 惠州市德赛西威汽车电子股份有限公司 | Voice integration method for outputting optimal recognition result |
CN110634481B (en) * | 2019-08-06 | 2021-11-16 | 惠州市德赛西威汽车电子股份有限公司 | Voice integration method for outputting optimal recognition result |
CN110853635A (en) * | 2019-10-14 | 2020-02-28 | 广东美的白色家电技术创新中心有限公司 | Speech recognition method, audio annotation method, computer equipment and storage device |
CN110853635B (en) * | 2019-10-14 | 2022-04-01 | 广东美的白色家电技术创新中心有限公司 | Speech recognition method, audio annotation method, computer equipment and storage device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10347244B2 (en) | Dialogue system incorporating unique speech to text conversion method for meaningful dialogue response | |
US10726830B1 (en) | Deep multi-channel acoustic modeling | |
US11043205B1 (en) | Scoring of natural language processing hypotheses | |
CN110265040B (en) | Voiceprint model training method and device, storage medium and electronic equipment | |
Ferrer et al. | Study of senone-based deep neural network approaches for spoken language recognition | |
WO2020228173A1 (en) | Illegal speech detection method, apparatus and device and computer-readable storage medium | |
US7966171B2 (en) | System and method for increasing accuracy of searches based on communities of interest | |
CN111916111B (en) | Intelligent voice outbound method and device with emotion, server and storage medium | |
CN110675859B (en) | Multi-emotion recognition method, system, medium, and apparatus combining speech and text | |
US11081104B1 (en) | Contextual natural language processing | |
CN110570853A (en) | Intention recognition method and device based on voice data | |
CN111694940A (en) | User report generation method and terminal equipment | |
CN106875936A (en) | Audio recognition method and device | |
CN111433847A (en) | Speech conversion method and training method, intelligent device and storage medium | |
CN110136721A (en) | A kind of scoring generation method, device, storage medium and electronic equipment | |
CN110164416B (en) | Voice recognition method and device, equipment and storage medium thereof | |
CN109119073A (en) | Audio recognition method, system, speaker and storage medium based on multi-source identification | |
CN114818649A (en) | Service consultation processing method and device based on intelligent voice interaction technology | |
CN110852075B (en) | Voice transcription method and device capable of automatically adding punctuation marks and readable storage medium | |
Mirishkar et al. | CSTD-Telugu corpus: Crowd-sourced approach for large-scale speech data collection | |
CN110809796B (en) | Speech recognition system and method with decoupled wake phrases | |
CN115641850A (en) | Method and device for recognizing ending of conversation turns, storage medium and computer equipment | |
CN110781329A (en) | Image searching method and device, terminal equipment and storage medium | |
CN115985320A (en) | Intelligent device control method and device, electronic device and storage medium | |
CN112259077B (en) | Speech recognition method, device, terminal and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20190101 |