CN109686365A - Speech recognition method and speech recognition system - Google Patents
Speech recognition method and speech recognition system
- Publication number
- CN109686365A (application number CN201811599441.2A / CN201811599441A)
- Authority
- CN
- China
- Prior art keywords
- information
- voice
- error correction
- recognition result
- coding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 24
- 238000012937 correction Methods 0.000 claims abstract description 104
- 230000007613 environmental effect Effects 0.000 claims abstract description 22
- 238000012545 processing Methods 0.000 claims description 6
- 238000005070 sampling Methods 0.000 claims description 4
- 238000000605 extraction Methods 0.000 claims description 2
- 238000001914 filtration Methods 0.000 claims description 2
- 238000006243 chemical reaction Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000012216 screening Methods 0.000 description 2
- 238000001228 spectrum Methods 0.000 description 2
- 238000012549 training Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000002790 cross-validation Methods 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 230000001737 promoting effect Effects 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/005—Correction of errors induced by the transmission channel, if related to the coding algorithm
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/221—Announcement of recognition results
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Probability & Statistics with Applications (AREA)
- Machine Translation (AREA)
- Telephonic Communication Services (AREA)
Abstract
The present invention provides a speech recognition method and system. The method comprises the following steps: S1, obtaining voice information input by a user that contains an error, together with voice error-correction information that corrects the voice information, and storing the voice information and the voice error-correction information separately; S2, performing preliminary processing on the voice information and the voice error-correction information, and encoding the processed voice information and voice error-correction information; S3, deriving the corresponding text from the voice information code and from the voice error-correction information code respectively, and comparing the text derived from the voice information code with the text derived from the voice error-correction information code, to obtain a first recognition result; S4, obtaining information about the environment in which the user input the voice information, and obtaining a second recognition result according to the environmental information; S5, comparing the second recognition result with dictionary information to obtain a final recognition result. The present invention can recognize voice information quickly and improve working efficiency.
Description
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a speech recognition method and a speech recognition system.
Background technique
When selecting the recognition unit, a speech recognition system requires that the unit be precisely defined, that sufficient training data be available, and that the unit generalize well. English systems generally use context-dependent phoneme modeling; since coarticulation in Chinese is less severe than in English, syllable-based modeling can be used. The amount of training data a system needs depends on model complexity. If the model is designed to be more complex than the available training data can support, performance declines sharply.
In the prior art, voice information is input through a microphone, and if the input is wrong it can only be deleted and re-entered. This is unfavorable to the quick recognition of voice information and reduces working efficiency.
Summary of the invention
The technical problem to be solved by the present invention is to provide a speech recognition method that, when the input voice information contains an error, can quickly recognize the voice information without deleting the voice information already entered.
In order to solve the above technical problem, the present invention provides a speech recognition method comprising the following steps:
S1, obtaining voice information input by a user that contains an error, together with voice error-correction information that corrects the voice information, and storing the voice information and the voice error-correction information separately;
S2, performing preliminary processing on the voice information and the voice error-correction information, extracting feature information from the processed voice information and voice error-correction information and encoding it, to obtain a voice information code and a voice error-correction information code;
S3, deriving the corresponding text from the voice information code and from the voice error-correction information code respectively, and comparing the text derived from the voice information code with the text derived from the voice error-correction information code, to obtain a first recognition result;
S4, obtaining information about the environment in which the user input the voice information, and obtaining a second recognition result according to the environmental information;
S5, comparing the second recognition result with dictionary information to obtain a final recognition result, and presenting the final recognition result to the user.
Wherein, in step S2, the preliminary processing of the voice information and the voice error-correction information specifically includes:
filtering the voice information and the voice error-correction information respectively, and sampling the filtered voice information and voice error-correction information respectively;
encoding the sampled voice information and the sampled voice error-correction information respectively, to obtain the voice information code and the voice error-correction information code.
Wherein, step S3 specifically includes:
comparing the voice information code with an existing acoustic model and language model, obtaining the codes of the acoustic model and language model that are similar to the voice information code, and deriving first text corresponding to the voice information code from the similar codes;
comparing the voice error-correction information code with the existing acoustic model and language model, obtaining the codes of the acoustic model and language model that are similar to the voice error-correction information code, and deriving second text corresponding to the voice error-correction information code from the similar codes;
comparing the first text with the second text, finding the first text and second text with the highest similarity, and replacing the part of the first text that is similar to the second text with the second text, to form the first recognition result.
Wherein, the acoustic model is a Hidden Markov Model.
Wherein, step S4 specifically includes:
capturing an image of the environment in which the user input the voice information, and recognizing the environmental information in the image;
inferring the user's likely needs from the environmental information, and screening out the second recognition result according to those likely needs.
Wherein, step S5 specifically includes:
comparing the second recognition result with dictionary information, and rejecting any second recognition result that does not conform to the language format, to obtain a third recognition result;
comparing the third recognition result with recognition results stored by the user for similarity, arranging the results in descending order of similarity, and presenting them to the user.
The present invention also provides a speech recognition system, the system comprising:
an acquiring unit, for obtaining voice information input by a user that contains an error, together with voice error-correction information that corrects the voice information, and storing the voice information and the voice error-correction information separately;
a processing unit, for performing preliminary processing on the voice information and the voice error-correction information, extracting feature information from the processed voice information and voice error-correction information and encoding it, to obtain a voice information code and a voice error-correction information code;
a derivation recognition unit, for deriving the corresponding text from the voice information code and from the voice error-correction information code respectively, and comparing the text derived from the voice information code with the text derived from the voice error-correction information code, to obtain a first recognition result;
an environment recognition unit, for obtaining information about the environment in which the user input the voice information, and obtaining a second recognition result according to the environmental information;
a comparison recognition unit, for comparing the second recognition result with recognition results stored by the user, obtaining a final recognition result, and presenting the final recognition result to the user.
Wherein, the derivation recognition unit includes:
a first comparison and derivation unit, for comparing the voice information code with an existing acoustic model and language model, obtaining the codes of the acoustic model and language model that are similar to the voice information code, and deriving first text corresponding to the voice information code from the similar codes;
a second comparison and derivation unit, for comparing the voice error-correction information code with the existing acoustic model and language model, obtaining the codes of the acoustic model and language model that are similar to the voice error-correction information code, and deriving second text corresponding to the voice error-correction information code from the similar codes;
a comparison and replacement unit, for comparing the first text with the second text, finding the first text and second text with the highest similarity, and replacing the part of the first text that is similar to the second text with the second text, to form the first recognition result.
The beneficial effects of the embodiments of the present invention are as follows: the present invention encodes the acquired voice information and voice error-correction information, derives text from the voice information code and from the voice error-correction information code respectively, compares the two derived texts, and replaces the corresponding part of the text derived from the voice information code with the highly similar text derived from the voice error-correction information code, obtaining the first recognition result; it then obtains information about the environment in which the user input the voice information, screens the first recognition result according to the environmental information to obtain the second recognition result, and compares the second recognition result with dictionary information to obtain the final recognition result. With the speech recognition method of the embodiments of the present invention, when the voice input contains an error there is no need to delete it and re-enter it, which is conducive to the quick recognition of voice information and improves working efficiency.
Detailed description of the invention
In order to more clearly explain the embodiment of the invention or the technical proposal in the existing technology, to embodiment or will show below
There is attached drawing needed in technical description to be briefly described, it should be apparent that, the accompanying drawings in the following description is only this
Some embodiments of invention for those of ordinary skill in the art without creative efforts, can be with
It obtains other drawings based on these drawings.
Fig. 1 is a schematic flow diagram of a speech recognition method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a speech recognition system according to an embodiment of the present invention.
Specific embodiment
The following embodiments are described with reference to the accompanying drawings, which illustrate specific embodiments in which the present invention may be practiced.
Referring to Fig. 1, Embodiment 1 of the present invention provides a speech recognition method comprising the following steps:
S1, obtaining voice information input by a user that contains an error, together with voice error-correction information that corrects the voice information, and storing the voice information and the voice error-correction information separately.
Specifically, voice input is performed through a voice input option, and the entered voice information may contain an error. If a small mistake occurs during input, the user selects the voice error-correction entry option to enter a correction; the error-correction entry only needs the information at the position of the mistake, and this entry is the voice error-correction information. The voice information and the voice error-correction information are stored separately.
For example, suppose the voice information the user intends to enter is "find the nearest gas station", but during entry, for some reason, the entered voice information is "find a close gas station". The user then selects the voice error-correction entry option and enters the voice error-correction information "nearest".
S2, performing preliminary processing on the voice information and the voice error-correction information, extracting feature information from the processed voice information and voice error-correction information and encoding it, to obtain a voice information code and a voice error-correction information code.
Specifically, the voice information and the voice error-correction information are filtered to eliminate noise and echo and improve their quality. The filtered voice information and voice error-correction information are sampled, and the analog signals are converted into digital signals by an A/D converter. Feature information is then extracted from, and encoding is applied to, the digital signal converted from the voice information and the digital signal converted from the voice error-correction information respectively, to obtain the voice information code and the voice error-correction information code.
The feature information is the Mel-frequency cepstral coefficient (MFCC) feature. The MFCC feature is based on a linear transform of the logarithmic energy spectrum on the nonlinear Mel scale of audio frequency: the time-domain signal is first converted to the frequency domain with an FFT; its logarithmic energy spectrum is then convolved with a triangular filter bank distributed according to the Mel scale; finally, a discrete cosine transform is applied to the vector formed by the filter outputs, and the first N coefficients are taken. PLP still uses the Durbin method to compute LPC parameters, but when computing the autocorrelation parameters it likewise applies a DCT to the logarithmic energy spectrum of the auditory excitation. Performing this preliminary processing on the voice information improves the quality of the voice information and the error-correction voice information, which helps improve the quality of subsequent recognition.
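The MFCC pipeline described above (FFT, Mel-scale triangular filter bank, log energy, DCT) can be sketched in NumPy as follows. This is a minimal illustration, not the patent's implementation; the frame length, filter count, and number of coefficients are illustrative choices.

```python
import numpy as np

def hz_to_mel(f):
    # Nonlinear Mel-scale warping of frequency
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the Mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    # 1) Time domain -> frequency domain via FFT (power spectrum)
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    # 2) Log energy through Mel-spaced triangular filters
    energies = np.log(mel_filterbank(n_filters, len(frame), sample_rate) @ spectrum + 1e-10)
    # 3) DCT-II of the filter outputs; keep the first N coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return dct @ energies

# A 25 ms frame of a synthetic 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(400) / sr
coeffs = mfcc_frame(np.sin(2 * np.pi * 440 * t), sr)
print(coeffs.shape)  # (13,)
```

In practice each utterance is split into overlapping frames and this computation runs per frame, yielding one feature vector per frame.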
S3, deriving the corresponding text from the voice information code and from the voice error-correction information code respectively, and comparing the text derived from the voice information code with the text derived from the voice error-correction information code, to obtain a first recognition result.
Specifically, the voice information code is compared with an existing acoustic model and language model to obtain the codes of the acoustic model and language model that are similar to the voice information code, and the first text corresponding to the voice information code is derived from the similar codes. The voice error-correction information code is compared with the existing acoustic model and language model to obtain the codes of the acoustic model and language model that are similar to the voice error-correction information code, and the second text corresponding to the voice error-correction information code is derived from the similar codes. The first text is compared with the second text, the first text and second text with the highest similarity are found, and the part of the first text that is similar to the second text is replaced with the second text, forming the first recognition result.
The acoustic model is one of the most important parts of a speech recognition system, and most current mainstream systems use Hidden Markov Models for modeling. Conceptually, a hidden Markov model is a discrete time-domain finite-state automaton: the internal states of this Markov model are invisible to the outside world, which can only see the output value at each moment. A language model is a simple, unified, abstract formal system; once linguistic facts are described by a language model they are better suited to automatic processing by a computer, so language models are of great significance for natural language information processing. Through comparison and analysis, matching candidates are assembled: each unit has a corresponding code, which is compared with the stored acoustic model and language model, and all similar codes are selected, completing the preliminary recognition and helping to improve its efficiency and quality. The output values of the acoustic model are usually the acoustic features computed from each frame, and these features constitute the acoustic codes; the language model is an abstract mathematical model built from linguistic facts, and its features constitute the language codes. This makes it convenient to cross-validate against the collected voice codes, compare among the obtained results, select the candidates with the highest similarity, and then derive the text from the codes.
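The hidden-state/visible-output distinction above can be made concrete with the forward algorithm of a discrete HMM, which scores how well an observation sequence matches the model. This is a minimal sketch: the two states, three output symbols, and all probabilities are toy values invented for illustration, not part of the patent.

```python
import numpy as np

# Toy discrete HMM: 2 hidden states, 3 observable symbols.
# A[i, j] = P(state j at t+1 | state i at t)   -- hidden transitions
# B[i, k] = P(symbol k emitted | state i)       -- the visible output values
# pi[i]   = P(initial state i)
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])

def forward_likelihood(obs):
    """Forward algorithm: P(observation sequence | model)."""
    alpha = pi * B[:, obs[0]]          # initialize with the first symbol
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate states, then emit
    return alpha.sum()

print(round(forward_likelihood([0, 1, 2]), 6))  # 0.03628
```

In recognition, one such model is trained per unit (e.g. per syllable), and the unit whose model gives the highest likelihood for the observed feature sequence is selected.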
For example, taking "find the nearest gas station": the codes with the highest similarity are obtained through the acoustic model and language model, from which multiple groups of text can be derived. Then, by comparing the voice error-correction information with the voice information, the voice information and voice error-correction information with the highest similarity can be selected. For instance, "close" in "find a close gas station" is most similar to "nearest" in the error-correction voice information, so it can be replaced. Of course, the input may also be recognized as a near-homophonous but incorrect phrase, with the error-correction voice information misrecognized accordingly; after replacement, such a phrase becomes another alternative. It can be seen that the first recognition result is "find the nearest gas station" or one of these near-homophonous alternatives.
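The replacement step in S3, swapping out the span of the first text most similar to the error-correction text, can be sketched with Python's standard-library difflib. The English example strings are stand-ins chosen so that their character similarity mirrors the patent's example; the sliding-window approach is one illustrative way to find the most similar part, not the patent's specified algorithm.

```python
from difflib import SequenceMatcher

def apply_correction(first_text, correction):
    """Replace the span of first_text most similar to the correction.

    Slides a window of the correction's word length across first_text,
    keeps the window with the highest similarity ratio, and replaces it
    with the correction, forming the first recognition result.
    """
    words = first_text.split()
    corr_words = correction.split()
    n = len(corr_words)
    best_i, best_ratio = 0, -1.0
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n])
        ratio = SequenceMatcher(None, window, correction).ratio()
        if ratio > best_ratio:
            best_i, best_ratio = i, ratio
    return " ".join(words[:best_i] + corr_words + words[best_i + n:])

# Misrecognized input plus the user's error-correction entry
print(apply_correction("find the near gas station", "nearest"))
# -> find the nearest gas station
```

A production system would compare at the phonetic or code level rather than on surface text, but the replace-most-similar-span logic is the same.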
S4, obtaining information about the environment in which the user input the voice information, and obtaining a second recognition result according to the environmental information.
Specifically, a camera device photographs the user's surroundings at the time the voice information is input; a high-definition camera device is used so that the environment at the time can be recognized. By recognizing the location where the user is, the user's needs can be roughly judged. For example, the user's location may be an urban district, a highway, or a suburb. Nouns strongly associated with an urban district may be office buildings, residential compounds, or hotels; nouns strongly associated with a highway may be gas stations, parking lots, garages, and the like; and nouns strongly associated with a suburb may be suburban village names and the like. By recognizing the user's location, the nouns strongly associated with that location can be obtained, and according to these strongly associated nouns, the recognition results in the first recognition result that clearly do not fit can be rejected.
For example, again taking "find the nearest gas station": from the photo taken when the user input the voice information, it is known that the user was on a highway at the time, and the nouns most strongly associated with a highway may be gas stations, parking lots, garages, and so on. The near-homophonous alternatives in the first recognition result that are unrelated to a highway can therefore be rejected, so that the second recognition result retains "find a close gas station" and the alternatives that still mention a gas station.
S5, comparing the second recognition result with the stored dictionary information to obtain a final recognition result, and presenting the final recognition result to the user.
By comparing the second recognition result with the stored dictionary information, the recognition results that clearly do not conform to the rules of the language are deleted, yielding the final recognition results. The final recognition results are compared with the past recognition results stored by the user to obtain the similarity of each final recognition result, and the final recognition results are shown to the user in descending order of similarity, making it convenient for the user to look through them and select the expected recognition result, which improves the efficiency and quality of recognition.
After the user has selected the final recognition result, it is played back through a loudspeaker, and the correct recognition result is stored, which also serves as a reminder to other personnel. By confirming the recognition result again and storing it, the stored results can be expanded for the user's next use.
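The two parts of S5, rejecting candidates that fail the dictionary check and then ordering the survivors by similarity to the user's stored past results, can be sketched as follows. The word-set dictionary check and the similarity measure are simplifications chosen for illustration.

```python
from difflib import SequenceMatcher

def rank_final_results(candidates, dictionary_words, history):
    """S5: drop candidates containing out-of-dictionary words, then sort
    the rest by best similarity to any stored past result, descending."""
    def in_dictionary(sentence):
        return all(w in dictionary_words for w in sentence.split())

    def best_similarity(sentence):
        return max((SequenceMatcher(None, sentence, h).ratio() for h in history),
                   default=0.0)

    valid = [c for c in candidates if in_dictionary(c)]
    return sorted(valid, key=best_similarity, reverse=True)

dictionary = {"find", "the", "nearest", "gas", "station", "a", "close"}
history = ["find the nearest gas station"]   # user's stored past results
candidates = [
    "find a close gas station",
    "find the nearest gas station",
    "find the nearest gsa sttaion",          # fails the dictionary check
]
print(rank_final_results(candidates, dictionary, history))
# ['find the nearest gas station', 'find a close gas station']
```

The top-ranked entries are then presented to the user in order, matching the descending-similarity display described above.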
With the speech recognition method of the embodiments of the present invention, the acquired voice information and voice error-correction information are encoded; text is derived from the voice information code and from the voice error-correction information code respectively; the two derived texts are compared, and the corresponding part of the text derived from the voice information code is replaced with the highly similar text derived from the voice error-correction information code, obtaining the first recognition result; the environmental information at the time the user input the voice information is obtained, and the first recognition result is screened according to the environmental information to obtain the second recognition result; the second recognition result is compared with dictionary information to obtain the final recognition result. When the voice input contains an error, there is no need to delete it and re-enter it, which is conducive to the quick recognition of voice information and improves working efficiency.
Based on Embodiment 1, Embodiment 2 of the present invention provides a speech recognition system. As shown in Fig. 2, the system 1 includes:
an acquiring unit 11, for obtaining the voice information input by a user and the voice error-correction information that corrects the input voice information, and storing the voice information and the voice error-correction information separately;
a processing unit 12, for performing preliminary processing on the voice information and the voice error-correction information, extracting feature information from the processed voice information and voice error-correction information and encoding it, to obtain a voice information code and a voice error-correction information code;
a derivation recognition unit 13, for deriving the corresponding text from the voice information code and from the voice error-correction information code respectively, and comparing the text derived from the voice information code with the text derived from the voice error-correction information code, to obtain a first recognition result;
an environment recognition unit 14, for obtaining information about the environment in which the user input the voice information, and rejecting, according to the environmental information, the recognition results in the preliminary recognition result that are unrelated to the environmental information, to obtain a second recognition result;
a comparison recognition unit 15, for comparing the second recognition result with the recognition results stored by the user, obtaining a final recognition result, and presenting the final recognition result to the user.
The back-derivation recognition unit 13 comprises:
a first comparison and back-derivation unit, configured to compare the voice information code with the existing acoustic model and language model, obtain the codes in the acoustic model and language model similar to the voice information code, and derive from the similar codes the first text information corresponding to the voice information code;
a second comparison and back-derivation unit, configured to compare the voice error-correction information code with the existing acoustic model and language model, obtain the codes in the acoustic model and language model similar to the voice error-correction information code, and derive from the similar codes the second text information corresponding to the voice error-correction information code;
a comparison and replacement unit, configured to compare the first text information with the second text information, obtain the first text information and second text information with the highest similarity, and replace the portion of the first text information similar to the second text information with the second text information, forming the first recognition result.
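The patent does not disclose an implementation of the comparison and replacement unit, so the following is purely an illustrative sketch, not the patented method. It assumes word-level matching with Python's standard `difflib`; the function name `merge_with_correction` and the similarity cutoff are inventions for this example.

```python
import difflib

def merge_with_correction(first_text: str, second_text: str) -> str:
    """Replace the words of first_text most similar to the corrected
    second_text (the claim's 'similar part'), yielding the first
    recognition result. Cutoff 0.5 is an arbitrary illustrative choice."""
    words = first_text.split()
    for corrected in second_text.split():
        # Find the recognized word closest to this corrected word, if any.
        close = difflib.get_close_matches(corrected, words, n=1, cutoff=0.5)
        if close:
            words[words.index(close[0])] = corrected
    return " ".join(words)

print(merge_with_correction("turn on the lite", "light"))  # → turn on the light
```

A character-level variant using `SequenceMatcher.get_opcodes` would also fit the claim wording; word-level replacement is simply easier to follow.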
The above disclosure describes only preferred embodiments of the present invention and certainly cannot limit the scope of its claims; equivalent changes made in accordance with the claims of the present invention therefore remain within the scope of the present invention.
Claims (8)
1. A voice recognition method, characterized by comprising the steps of:
S1: obtaining voice information containing error information input by a user and voice error-correction information that corrects the voice information, and storing the voice information and the voice error-correction information respectively;
S2: performing preliminary processing on the voice information and the voice error-correction information, extracting feature information from the processed voice information and voice error-correction information, and encoding it to obtain a voice information code and a voice error-correction information code;
S3: deriving the corresponding text information from the voice information code and from the voice error-correction information code respectively, comparing the text information derived from the voice information code with that derived from the voice error-correction information code, and obtaining a first recognition result;
S4: obtaining the environment information of the setting in which the user inputs the voice information, and obtaining a second recognition result according to the environment information;
S5: comparing the second recognition result with dictionary information to obtain a final recognition result, and presenting the final recognition result to the user.
2. The method according to claim 1, characterized in that the preliminary processing of the voice information and the voice error-correction information in step S2 specifically comprises:
filtering the voice information and the voice error-correction information respectively, and sampling the filtered voice information and voice error-correction information respectively;
encoding the sampled voice information and the sampled voice error-correction information respectively to obtain the voice information code and the voice error-correction information code.
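Claim 2 names three stages (filter, sample, encode) without fixing any algorithm. As an illustration only, a toy pipeline might look like the sketch below; the moving-average filter, decimation factor, and 8-bit quantization are all assumptions, not part of the claim.

```python
def preprocess(signal, factor=2, levels=256):
    """Toy version of the claim-2 pipeline: filter (3-point moving
    average), sample (keep every `factor`-th value), then encode
    (quantize amplitudes in [-1, 1] to `levels` integer codes)."""
    # Filtering: pad the edges, then smooth to suppress noise spikes.
    padded = [signal[0]] + list(signal) + [signal[-1]]
    smoothed = [(padded[i - 1] + padded[i] + padded[i + 1]) / 3
                for i in range(1, len(padded) - 1)]
    # Sampling: simple decimation.
    sampled = smoothed[::factor]
    # Encoding: map amplitude to an integer code, clamped to range.
    return [min(levels - 1, max(0, int((x + 1) / 2 * (levels - 1))))
            for x in sampled]
```

A real system would use a proper anti-aliasing filter and spectral features (e.g. MFCCs) rather than raw amplitude codes; this sketch only mirrors the claim's three-stage structure.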
3. The method according to claim 2, characterized in that step S3 specifically comprises:
comparing the voice information code with the existing acoustic model and language model, obtaining the codes in the acoustic model and language model similar to the voice information code, and deriving from the similar codes the first text information corresponding to the voice information code;
comparing the voice error-correction information code with the existing acoustic model and language model, obtaining the codes in the acoustic model and language model similar to the voice error-correction information code, and deriving from the similar codes the second text information corresponding to the voice error-correction information code;
comparing the first text information with the second text information, obtaining the first text information and second text information with the highest similarity, and replacing the portion of the first text information similar to the second text information with the second text information, forming the first recognition result.
4. The method according to claim 3, characterized in that:
the acoustic model is a hidden Markov model.
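Claim 4 only names a hidden Markov model; it does not specify how the model is queried. A standard way to score an observation sequence against an HMM is the forward algorithm, sketched below for a discrete-output HMM (the toy parameters in the usage note are assumptions for illustration, not values from the patent).

```python
def forward_prob(obs, init, trans, emit):
    """P(obs | model) for a discrete-output HMM via the forward
    algorithm. A decoder would evaluate this likelihood under each
    word/phoneme model and pick the model that maximizes it."""
    n = len(init)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)

# Two-state toy model: start in state 0, states never switch.
p = forward_prob([0, 1], init=[1.0, 0.0],
                 trans=[[1.0, 0.0], [0.0, 1.0]],
                 emit=[[0.5, 0.5], [0.9, 0.1]])  # → 0.25
```

Production recognizers use continuous emission densities (e.g. Gaussian mixtures) and log-space arithmetic to avoid underflow; this discrete version keeps the recurrence visible.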
5. The method according to claim 4, characterized in that step S4 specifically comprises:
capturing an image of the environment in which the user inputs the voice information, and recognizing the environment information in the image;
deriving the user's possible demand from the environment information, and screening out the second recognition result according to the possible demand.
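The claim leaves the screening criterion open. One minimal, purely illustrative reading (keyword overlap between scene labels and candidate transcripts; the function name and fallback behavior are assumptions) is:

```python
def screen_by_environment(candidates, scene_labels):
    """Keep only recognition candidates sharing a word with the labels
    detected in the image of the user's surroundings (the claim's
    'possible demand'); fall back to all candidates if none match."""
    labels = {w.lower() for w in scene_labels}
    kept = [c for c in candidates if labels & {w.lower() for w in c.split()}]
    return kept or list(candidates)

# Scene labels would come from an image classifier in the full system.
print(screen_by_environment(["turn on the oven", "open the garage"],
                            ["kitchen", "oven"]))  # → ['turn on the oven']
```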
6. The method according to claim 5, characterized in that step S5 specifically comprises:
comparing the second recognition result with dictionary information, rejecting second recognition results that do not conform to the language format, and obtaining a third recognition result;
comparing the third recognition result with the recognition results stored by the user for similarity, arranging the results in descending order of similarity, and presenting them to the user.
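The claim does not define the similarity measure. As a sketch only, ranking by edit-based similarity against the user's stored history could look like this (the `difflib` ratio and the best-match aggregation are assumptions for illustration):

```python
import difflib

def rank_results(third_results, stored_results):
    """Order the surviving results by their best similarity to the
    user's previously stored recognition results, descending, as in
    the last step of claim 6."""
    def best_score(result):
        # Highest ratio against any stored result.
        return max(difflib.SequenceMatcher(None, result, s).ratio()
                   for s in stored_results)
    return sorted(third_results, key=best_score, reverse=True)

print(rank_results(["pay tax", "play jazz music"],
                   ["play some jazz music"]))
```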
7. A voice recognition system, characterized in that the system comprises:
an acquiring unit, configured to obtain voice information containing error information input by a user and voice error-correction information that corrects the voice information, and to store the voice information and the voice error-correction information respectively;
a processing unit, configured to perform preliminary processing on the voice information and the voice error-correction information, extract feature information from the processed voice information and voice error-correction information, and encode it to obtain a voice information code and a voice error-correction information code;
a back-derivation recognition unit, configured to derive the corresponding text information from the voice information code and from the voice error-correction information code respectively, compare the text information derived from the voice information code with that derived from the voice error-correction information code, and obtain a first recognition result;
an environment recognition unit, configured to obtain the environment information of the setting in which the user inputs the voice information, and to obtain a second recognition result according to the environment information;
a comparison recognition unit, configured to compare the second recognition result with the recognition results stored by the user, obtain a final recognition result, and present the final recognition result to the user.
8. The system according to claim 7, characterized in that the back-derivation recognition unit comprises:
a first comparison and back-derivation unit, configured to compare the voice information code with the existing acoustic model and language model, obtain the codes in the acoustic model and language model similar to the voice information code, and derive from the similar codes the first text information corresponding to the voice information code;
a second comparison and back-derivation unit, configured to compare the voice error-correction information code with the existing acoustic model and language model, obtain the codes in the acoustic model and language model similar to the voice error-correction information code, and derive from the similar codes the second text information corresponding to the voice error-correction information code;
a comparison and replacement unit, configured to compare the first text information with the second text information, obtain the first text information and second text information with the highest similarity, and replace the portion of the first text information similar to the second text information with the second text information, forming the first recognition result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811599441.2A CN109686365B (en) | 2018-12-26 | 2018-12-26 | Voice recognition method and voice recognition system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109686365A true CN109686365A (en) | 2019-04-26 |
CN109686365B CN109686365B (en) | 2021-07-13 |
Family
ID=66188586
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811599441.2A Active CN109686365B (en) | 2018-12-26 | 2018-12-26 | Voice recognition method and voice recognition system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109686365B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110334271A (en) * | 2019-05-21 | 2019-10-15 | 北京奇艺世纪科技有限公司 | A kind of search result optimization method, system, electronic equipment and storage medium |
CN111356022A (en) * | 2020-04-18 | 2020-06-30 | 徐琼琼 | Video file processing method based on voice recognition |
CN111524511A (en) * | 2020-04-01 | 2020-08-11 | 黑龙江省农业科学院农业遥感与信息研究所 | Agricultural technology consultation man-machine conversation method and system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104951077A (en) * | 2015-06-24 | 2015-09-30 | 百度在线网络技术(北京)有限公司 | Man-machine interaction method and device based on artificial intelligence and terminal equipment |
CN105206260A (en) * | 2015-08-31 | 2015-12-30 | 努比亚技术有限公司 | Terminal voice broadcasting method, device and terminal voice operation method |
CN105374356A (en) * | 2014-08-29 | 2016-03-02 | 株式会社理光 | Speech recognition method, speech assessment method, speech recognition system, and speech assessment system |
CN107818781A (en) * | 2017-09-11 | 2018-03-20 | 远光软件股份有限公司 | Intelligent interactive method, equipment and storage medium |
CN107993653A (en) * | 2017-11-30 | 2018-05-04 | 南京云游智能科技有限公司 | The incorrect pronunciations of speech recognition apparatus correct update method and more new system automatically |
CN108595412A (en) * | 2018-03-19 | 2018-09-28 | 百度在线网络技术(北京)有限公司 | Correction processing method and device, computer equipment and readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||