CN110503958A - Audio recognition method, system, mobile terminal and storage medium - Google Patents
- Publication number: CN110503958A
- Application number: CN201910812864.6A
- Authority
- CN
- China
- Prior art keywords
- identification
- sentence
- confidence
- text
- target text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G10L 15/02 — Feature extraction for speech recognition; Selection of recognition unit
- G10L 15/20 — Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
- G10L 15/26 — Speech to text systems
- G10L 15/28 — Constructional details of speech recognition systems
- G10L 2015/027 — Syllables being the recognition units
Abstract
The present invention, which falls within the technical field of speech recognition, provides a speech recognition method, system, mobile terminal, and storage medium. The method comprises: inputting speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences; performing a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and sorting the recognition sentences accordingly to obtain a recognition ranking table; and obtaining a target text feature, sequentially matching it against the recognition sentences in the recognition ranking table, and outputting the current recognition sentence when the match succeeds. By ranking recognition sentences by reliability according to their sentence confidences, and by matching the target text feature against the sentences in the ranking table, the invention reduces the influence of speakers' accents, environmental noise, and training-model data on the recognition result, thereby improving the accuracy and usability of speech recognition.
Description
Technical field
The invention belongs to the technical field of speech recognition, and more particularly relates to a speech recognition method, system, mobile terminal, and storage medium.
Background technique
With the development of speech recognition technology and the continuous improvement of recognition rates, speech recognition is increasingly applied in everyday scenarios. In the voiceprint field there are two kinds of identification: text-content speaker identification and dynamic-digit speaker identification. Both rely on speech recognition, and further speaker identification is performed only when the speech content is recognized correctly; the accuracy of speech recognition is therefore particularly important.
In existing speech recognition methods, factors such as speakers' accents, environmental noise, and the data used to train the model lower the recognition accuracy and thus reduce the usability of voiceprint recognition.
Summary of the invention
Embodiments of the present invention aim to provide a speech recognition method, system, mobile terminal, and storage medium, so as to solve the problem that existing speech recognition methods based on an acoustic model achieve low recognition accuracy.
An embodiment of the invention is implemented as a speech recognition method, the method comprising:
obtaining speech to be recognized, and inputting the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
performing a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and sorting the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
obtaining a target text feature, sequentially matching the target text feature against the recognition sentences in the recognition ranking table, and outputting the current recognition sentence when the match succeeds.
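The three steps above can be sketched end to end. This is a minimal illustration, not the patent's implementation: `decode` is a hypothetical stand-in for the finite state transducer, and the confidence table, candidate sentences, and target length are invented for the example.

```python
# Hypothetical stand-in for the finite state transducer: in a real system a
# WFST decoder would emit the candidate sentences for the input audio.
def decode(audio):
    return ["open the door", "open the tour", "oven the door"]

def sentence_confidence(sentence, table):
    # Sum per-word confidence values from the (assumed) text table.
    return sum(table.get(word, 0.0) for word in sentence.split())

def recognize(audio, table, target_length):
    candidates = decode(audio)
    # Rank candidates by descending sentence confidence (the "ranking table").
    ranked = sorted(candidates, key=lambda s: sentence_confidence(s, table),
                    reverse=True)
    # Output the first ranked candidate whose text feature matches
    # (here the feature is a target word count).
    for sentence in ranked:
        if len(sentence.split()) == target_length:
            return sentence
    return None

table = {"open": 0.9, "the": 0.8, "door": 0.9, "tour": 0.3, "oven": 0.2}
print(recognize(b"", table, 3))  # prints "open the door"
```

The matching pass walks the ranking table in order, so the highest-confidence candidate that satisfies the target feature wins.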
Further, the step of performing a confidence calculation on each recognition sentence comprises:
sequentially obtaining the syllables of the speech to be recognized to obtain a plurality of target syllables, each target syllable corresponding to a recognition text in the corresponding recognition sentence;
matching the target syllables against a locally pre-stored text database to obtain a target text table, the target text table storing correspondences between groups of different texts and confidence values;
sequentially matching, in each recognition sentence, the recognition text corresponding to the current target syllable against the target text table to obtain a target confidence;
sequentially summing the target confidences corresponding to all recognition texts in each recognition sentence to obtain a plurality of sentence confidences.
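The four sub-steps above amount to a table lookup followed by a sum. A minimal sketch, assuming syllable extraction has already produced the target syllables and that the target text table maps each syllable's candidate texts to confidence values (the table contents here are invented):

```python
def sentence_confidence(syllables, texts, target_text_table):
    """Sum the target confidence of each recognition text given its syllable."""
    total = 0.0
    for syllable, text in zip(syllables, texts):
        # Confidence stored for this (syllable, text) pair; 0.0 if unseen.
        total += target_text_table.get(syllable, {}).get(text, 0.0)
    return total

# Illustrative target text table for the syllables "ni" and "hao".
table = {
    "ni": {"你": 0.9, "尼": 0.1},
    "hao": {"好": 0.8, "号": 0.2},
}
print(round(sentence_confidence(["ni", "hao"], ["你", "好"], table), 2))  # prints 1.7
```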
Further, after the step of summing the target confidences corresponding to all recognition texts in each recognition sentence, the method further comprises:
judging whether an association relation exists between adjacent recognition texts;
if so, obtaining an associated confidence and summing it with the corresponding target confidences.
Further, the step of judging whether an association relation exists between adjacent recognition texts comprises:
grouping the adjacent recognition texts into words to obtain a plurality of recognition phrases;
obtaining a locally pre-stored association phrase table, and matching the recognition phrases against the association phrase table;
when a recognition phrase matches the association phrase table, determining that an association relation exists between the adjacent recognition texts corresponding to that phrase.
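A minimal sketch of this phrase check, assuming the association phrase table is a set of known two-character phrases; the set representation and its contents are assumptions, since the patent only states that the table is locally pre-stored:

```python
def has_association(text_a, text_b, association_table):
    """Group two adjacent recognition texts into a phrase and look it up."""
    return (text_a + text_b) in association_table

association_table = {"你好", "晚安"}  # illustrative contents
print(has_association("你", "好", association_table))  # prints True
print(has_association("你", "号", association_table))  # prints False
```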
Further, before the step of performing a confidence calculation on each recognition sentence, the method further comprises:
sequentially judging whether each recognition sentence contains a preset specified word;
if not, deleting that recognition sentence.
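This pre-filter can be sketched in a few lines; the candidate sentences and the specified-word set are invented for illustration:

```python
def filter_candidates(sentences, specified_words):
    """Keep only recognition sentences containing at least one preset specified word."""
    return [s for s in sentences if any(w in s for w in specified_words)]

candidates = ["open the door", "often the floor"]
print(filter_candidates(candidates, {"door"}))  # prints ['open the door']
```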
Further, the step of sorting the recognition sentences according to the sentence confidences comprises:
sorting the recognition sentences by the magnitude of their sentence confidences, and stopping the sort when the sort serial number equals a preset serial number.
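Sorting only up to a preset serial number is equivalent to keeping the top-N candidates. A sketch using `heapq.nlargest`, which avoids fully sorting the candidate list; the scores are illustrative:

```python
import heapq

def top_candidates(scored_sentences, cutoff=100):
    """Return the `cutoff` highest-confidence (sentence, confidence) pairs,
    highest first, without sorting the whole list."""
    return heapq.nlargest(cutoff, scored_sentences, key=lambda pair: pair[1])

scored = [("a", 0.2), ("b", 0.9), ("c", 0.5)]
print(top_candidates(scored, cutoff=2))  # prints [('b', 0.9), ('c', 0.5)]
```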
Further, the step of sequentially matching the target text feature against the recognition sentences in the recognition ranking table comprises:
when the target text feature is a target text length, sequentially performing length matching between the target text length and the recognition sentences in the recognition ranking table;
when the target text feature is target text content, sequentially performing text matching between the target text content and the recognition sentences in the recognition ranking table.
Another object of embodiments of the invention is to provide a speech recognition system, the system comprising:
a voice acquisition module, configured to obtain speech to be recognized and input it into a finite state transducer to obtain a plurality of recognition sentences;
a confidence calculation module, configured to perform a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and to sort the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
a sentence output module, configured to obtain a target text feature, sequentially match it against the recognition sentences in the recognition ranking table, and output the current recognition sentence when the match succeeds.
Another object of embodiments of the invention is to provide a mobile terminal comprising a storage device and a processor, the storage device storing a computer program and the processor running the computer program so that the mobile terminal executes the above speech recognition method.
Another object of embodiments of the invention is to provide a storage medium storing the computer program used in the above mobile terminal, the program implementing the steps of the above speech recognition method when executed by a processor.
Through the design based on confidence calculation, embodiments of the invention rank recognition sentences by reliability using sentence confidences, and through the design of matching the target text feature against the recognition sentences in the recognition ranking table, they reduce the influence of speakers' accents, environmental noise, and training-model data on the recognition result, thereby improving the accuracy and usability of speech recognition.
Brief description of the drawings
Fig. 1 is a flow chart of the speech recognition method provided by the first embodiment of the invention;
Fig. 2 is a flow chart of the speech recognition method provided by the second embodiment of the invention;
Fig. 3 is a flow chart of the speech recognition method provided by the third embodiment of the invention;
Fig. 4 is a structural schematic diagram of the speech recognition system provided by the fourth embodiment of the invention;
Fig. 5 is a structural schematic diagram of the mobile terminal provided by the fifth embodiment of the invention.
Specific embodiment
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the invention is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
The following specific embodiments illustrate the technical solutions of the invention.
Embodiment one
Referring to Fig. 1, which is a flow chart of the speech recognition method provided by the first embodiment of the invention, the method comprises the steps of:
Step S10: obtaining speech to be recognized, and inputting it into a finite state transducer to obtain a plurality of recognition sentences;
The speech to be recognized may be captured with a sound pickup: when an acquisition instruction output by the user is received, the sound pickup is opened under control to trigger acquisition of the speech to be recognized. Preferably, in this step the model parameters of the finite state transducer can be selected independently according to the user's needs.
Specifically, the texts of different recognition sentences may be identical or not, but the sentences composed of those texts all differ from one another, i.e., no duplicate sentence exists among the recognition sentences.
Step S20: performing a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and sorting the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
The confidence calculation may be based on a preset model or a preset algorithm. The sentence confidence reflects the accuracy of the corresponding recognition sentence: the higher the sentence confidence, the more accurate the corresponding recognition sentence is judged to be.
Specifically, the number of entries in the recognition ranking table can be configured according to user demand. In this step 100 recognition sentences are retained, i.e., sorting stops when the sort serial number in the ranking table reaches 100.
Preferably, in this step the sentence confidences may also be screened by setting a confidence threshold: before the step of sorting the recognition sentences according to their sentence confidences, each sentence confidence is sequentially compared with the threshold, and when a sentence confidence is below the threshold the corresponding recognition sentence is deleted so that it does not enter the sort, which effectively improves the efficiency of subsequent data processing.
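The threshold screening described here is a plain filter applied before ranking; the threshold and scores below are illustrative:

```python
def screen_by_threshold(scored_sentences, threshold):
    """Drop (sentence, confidence) pairs whose confidence is below the
    threshold so they never enter the sorting step."""
    return [(s, c) for s, c in scored_sentences if c >= threshold]

scored = [("a", 0.2), ("b", 0.9)]
print(screen_by_threshold(scored, 0.5))  # prints [('b', 0.9)]
```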
Step S30: obtaining a target text feature, sequentially matching it against the recognition sentences in the recognition ranking table, and outputting the current recognition sentence when the match succeeds;
The target text feature comprises text content or a text length: the text content is composed of text and/or digits entered by the user in advance, and the text length is a word length preset by the user. In this step the target text feature can therefore be matched against the recognition sentences by character matching or by length matching, and when the match is judged successful, the current recognition sentence is determined to be the most reliable speech recognition result.
In this embodiment, through the design based on confidence calculation, recognition sentences are ranked by reliability using sentence confidences, and through the design of matching the target text feature against the recognition sentences in the recognition ranking table, the influence of speakers' accents, environmental noise, and training-model data on the recognition result is reduced, thereby improving the accuracy and usability of speech recognition.
Embodiment two
Referring to Fig. 2, which is a flow chart of the speech recognition method provided by the second embodiment of the invention, the method comprises the steps of:
Step S11: obtaining speech to be recognized, and inputting it into a finite state transducer to obtain a plurality of recognition sentences;
The speech to be recognized may be captured with a sound pickup: when an acquisition instruction output by the user is received, the sound pickup is opened under control to trigger acquisition. Preferably, in this step the model parameters of the finite state transducer can be selected independently according to the user's needs.
Step S21: sequentially obtaining the syllables of the speech to be recognized to obtain a plurality of target syllables, each target syllable corresponding to a recognition text in the corresponding recognition sentence;
The target syllables may be obtained by means of acoustic wave analysis, yielding, for example, "wan", "wo", or "ni"; in this embodiment the obtained target syllables correspond in sequence to the recognition texts.
Step S31: matching the target syllables against a locally pre-stored text database to obtain a target text table;
The text database stores mappings between different text tables and their corresponding syllables; by matching the target syllables against the text database, this step can therefore accurately obtain the corresponding target text table.
Step S41: sequentially matching, in each recognition sentence, the recognition text corresponding to the current target syllable against the target text table to obtain a target confidence;
The target text table stores correspondences between groups of different texts and confidence values; by matching the recognition texts against the target text table, this step therefore obtains the target confidence corresponding to each syllable in the recognition sentence.
Step S51: sequentially summing the target confidences corresponding to all recognition texts in each recognition sentence to obtain a plurality of sentence confidences, and sorting the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
In this step the sentence confidence may also be computed by averaging: after the sum of all target confidences in the corresponding recognition sentence is calculated, it is averaged over the word length of that recognition sentence to obtain the sentence confidence.
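The averaging variant normalizes by sentence length, so longer sentences are not favored merely for accumulating more terms. A sketch with invented confidence values:

```python
def averaged_confidence(target_confidences):
    """Average the per-text target confidences instead of summing them."""
    if not target_confidences:
        return 0.0
    return sum(target_confidences) / len(target_confidences)

print(round(averaged_confidence([0.9, 0.8, 0.7]), 2))  # prints 0.8
```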
Specifically, in this step the step of sorting the recognition sentences according to the sentence confidences comprises:
sorting the recognition sentences by the magnitude of their sentence confidences, and stopping the sort when the sort serial number equals a preset serial number; for example, when the preset serial number is 100, sorting stops when the sort serial number in the recognition ranking table reaches 100.
Preferably, in this step the sentence confidences may also be screened by setting a confidence threshold: before the step of sorting the recognition sentences according to their sentence confidences, each sentence confidence is sequentially compared with the threshold, and when a sentence confidence is below the threshold the corresponding recognition sentence is deleted so that it does not enter the sort, which effectively improves the efficiency of subsequent data processing.
Step S61: obtaining a target text feature, sequentially matching it against the recognition sentences in the recognition ranking table, and outputting the current recognition sentence when the match succeeds;
The target text feature comprises text content or a text length: the text content is composed of text and/or digits preset by the user, and the text length is a word length preset by the user. In this step the target text feature can therefore be matched against the recognition sentences by character matching or by length matching, and when the match is judged successful, the current recognition sentence is determined to be the most reliable speech recognition result.
Preferably, in this step, the step of sequentially matching the target text feature against the recognition sentences in the recognition ranking table comprises:
when the target text feature is a target text length, sequentially performing length matching between the target text length and the recognition sentences in the recognition ranking table;
when the target text feature is target text content, sequentially performing text matching between the target text content and the recognition sentences in the recognition ranking table.
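The two branches can be expressed as one dispatch on the feature's type. The convention that an integer means a target length and a string means target content is an assumption of this sketch, as are the candidate sentences:

```python
def match_candidate(sentence, feature):
    """Length matching for an integer feature, text matching for a string."""
    if isinstance(feature, int):
        return len(sentence) == feature
    return sentence == feature

ranked = ["你好", "你号", "您好吗"]
# Length matching with a user-preset length of 2 picks the first
# two-character candidate in ranking order.
print(next(s for s in ranked if match_candidate(s, 2)))  # prints "你好"
```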
In this embodiment, through the design based on confidence calculation, recognition sentences are ranked by reliability using sentence confidences, and through the design of matching the target text feature against the recognition sentences in the recognition ranking table, the influence of speakers' accents, environmental noise, and training-model data on the recognition result is reduced, thereby improving the accuracy and usability of speech recognition.
Embodiment three
Referring to Fig. 3, which is a flow chart of the speech recognition method provided by the third embodiment of the invention, the method comprises the steps of:
Step S12: obtaining speech to be recognized, and inputting it into a finite state transducer to obtain a plurality of recognition sentences;
Step S22: sequentially judging whether each recognition sentence contains a preset specified word;
The preset specified word can be set textually according to user demand and may be composed of a plurality of texts or phrases; the judgment based on the preset specified word in this step effectively determines whether the corresponding recognition sentence is accurate.
When the judgment of step S22 is negative, step S32 is executed;
Step S32: deleting the recognition sentence;
When the judgment of step S22 is affirmative, step S42 is executed;
Step S42: sequentially obtaining the syllables of the speech to be recognized to obtain a plurality of target syllables, each target syllable corresponding to a recognition text in the corresponding recognition sentence;
Step S52: matching the target syllables against a locally pre-stored text database to obtain a target text table;
The target text table stores correspondences between groups of different texts and confidence values;
Step S62: sequentially matching, in each recognition sentence, the recognition text corresponding to the current target syllable against the target text table to obtain a target confidence;
Step S72: judging whether an association relation exists between adjacent recognition texts;
Specifically, in this step the step of judging whether an association relation exists between adjacent recognition texts comprises:
grouping the adjacent recognition texts into words to obtain a plurality of recognition phrases;
obtaining a locally pre-stored association phrase table, and matching the recognition phrases against the association phrase table;
when a recognition phrase matches the association phrase table, determining that an association relation exists between the adjacent recognition texts corresponding to that phrase;
When the judgment of step S72 is affirmative, step S82 is executed;
Step S82: obtaining an associated confidence, and summing it with the corresponding target confidences;
When the judgment of step S72 is negative, step S92 is executed directly;
Step S92: sequentially summing the target confidences corresponding to all recognition texts in each recognition sentence to obtain a plurality of sentence confidences, and sorting the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
Step S102: obtaining a target text feature, sequentially matching it against the recognition sentences in the recognition ranking table, and outputting the current recognition sentence when the match succeeds;
In this embodiment, through the design based on confidence calculation, recognition sentences are ranked by reliability using sentence confidences, and through the design of matching the target text feature against the recognition sentences in the recognition ranking table, the influence of speakers' accents, environmental noise, and training-model data on the recognition result is reduced, thereby improving the accuracy and usability of speech recognition.
Embodiment four
Referring to Fig. 4, which is a structural schematic diagram of the speech recognition system 100 provided by the fourth embodiment of the invention, the system comprises a voice acquisition module 10, a confidence calculation module 11, and a sentence output module 12, in which:
The voice acquisition module 10 is configured to obtain speech to be recognized and input it into a finite state transducer to obtain a plurality of recognition sentences.
The confidence calculation module 11 is configured to perform a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and to sort the recognition sentences according to the sentence confidences to obtain a recognition ranking table.
The confidence calculation module 11 is further configured to: sequentially obtain the syllables of the speech to be recognized to obtain a plurality of target syllables, each target syllable corresponding to a recognition text in the corresponding recognition sentence; match the target syllables against a locally pre-stored text database to obtain a target text table, the target text table storing correspondences between groups of different texts and confidence values; sequentially match, in each recognition sentence, the recognition text corresponding to the current target syllable against the target text table to obtain a target confidence; and sequentially sum the target confidences corresponding to all recognition texts in each recognition sentence to obtain a plurality of sentence confidences.
Preferably, the confidence calculation module 11 is further configured to: judge whether an association relation exists between adjacent recognition texts; and, if so, obtain an associated confidence and sum it with the corresponding target confidences.
Further, the confidence calculation module 11 is further configured to: group the adjacent recognition texts into words to obtain a plurality of recognition phrases; obtain a locally pre-stored association phrase table and match the recognition phrases against it; and, when a recognition phrase matches the association phrase table, determine that an association relation exists between the adjacent recognition texts corresponding to that phrase.
In this embodiment, the confidence calculation module 11 is further configured to sort the recognition sentences by the magnitude of their sentence confidences and to stop the sort when the sort serial number equals a preset serial number.
The sentence output module 12 is configured to obtain a target text feature, sequentially match it against the recognition sentences in the recognition ranking table, and output the current recognition sentence when the match succeeds.
The sentence output module 12 is further configured to: when the target text feature is a target text length, sequentially perform length matching between the target text length and the recognition sentences in the recognition ranking table; and, when the target text feature is target text content, sequentially perform text matching between the target text content and the recognition sentences in the recognition ranking table.
In addition, in this embodiment the speech recognition system 100 further comprises:
a sentence deletion module 13, configured to sequentially judge whether each recognition sentence contains a preset specified word, and, if not, delete that recognition sentence.
In this embodiment, through the design based on confidence calculation, recognition sentences are ranked by reliability using sentence confidences, and through the design of matching the target text feature against the recognition sentences in the recognition ranking table, the influence of speakers' accents, environmental noise, and training-model data on the recognition result is reduced, thereby improving the accuracy and usability of speech recognition.
Embodiment five
Referring to Fig. 5, a mobile terminal 101 provided by the fifth embodiment of the present invention includes a storage device and a processor, the storage device being configured to store a computer program, and the processor running the computer program so that the mobile terminal 101 executes the above speech recognition method.
The present embodiment further provides a storage medium on which the computer program used by the above mobile terminal 101 is stored; when executed, the program includes the following steps:
Obtaining a voice to be identified, and inputting the voice to be identified into a finite state transducer to obtain a plurality of identification sentences;
Performing confidence calculation on each identification sentence respectively to obtain multiple sentence confidences, and ranking the identification sentences according to the sentence confidences to obtain an identification sequencing table;
Obtaining a target text feature, matching the target text feature in order against the identification sentences in the identification sequencing table, and outputting the current identification sentence when a match succeeds.
The storage medium is, for example, a ROM/RAM, a magnetic disk, or an optical disc.
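Taken together, the three steps above can be sketched end to end. The FST decoder and the confidence scores below are toy stand-ins (all names and values are assumptions); only the control flow follows the description:

```python
def decode_with_fst(voice: str):
    """Stub for the finite state transducer: yields candidate sentences."""
    return [voice, voice.replace("light", "night")]

def sentence_confidence(sentence: str) -> float:
    """Toy per-sentence confidence: sum of fixed per-word scores."""
    scores = {"turn": 0.9, "on": 0.8, "the": 0.7, "light": 0.9, "night": 0.3}
    return sum(scores.get(word, 0.1) for word in sentence.split())

def recognize(voice: str, target_length: int):
    candidates = decode_with_fst(voice)                                # step 1
    table = sorted(candidates, key=sentence_confidence, reverse=True)  # step 2
    for sentence in table:                                             # step 3
        if len(sentence) == target_length:    # target text feature = length
            return sentence
    return None
```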
It is apparent to those skilled in the art that, for convenience and brevity of description, the division into the above functional units and modules is merely illustrative; in practical applications, the above functions may be allocated to different functional units or modules as needed, i.e., the internal structure of the storage device may be divided into different functional units or modules to complete all or part of the functions described above. The functional units in the embodiments may be integrated into one processing unit, may each exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are merely for convenience of distinguishing them from one another and are not intended to limit the protection scope of this application.
Those skilled in the art will understand that the structure shown in Fig. 4 does not constitute a limitation of the speech recognition system of the present invention, which may include more or fewer components than illustrated, combine certain components, or adopt a different component arrangement; likewise, the speech recognition method in Figs. 1-3 may be realized with more or fewer components than shown in Fig. 4, with certain components combined, or with a different component arrangement. The units and modules referred to in the present invention are series of computer programs that can be executed by a processor (not shown) in the speech recognition system to complete specific functions, and may be stored in a storage device (not shown) of the speech recognition system.
The foregoing is merely preferred embodiments of the present invention and is not intended to limit the present invention; any modifications, equivalent replacements, and improvements made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (10)
1. A speech recognition method, characterized in that the method comprises:
obtaining a voice to be identified, and inputting the voice to be identified into a finite state transducer to obtain a plurality of identification sentences;
performing confidence calculation on each identification sentence respectively to obtain multiple sentence confidences, and ranking the identification sentences according to the sentence confidences to obtain an identification sequencing table; and
obtaining a target text feature, matching the target text feature in order against the identification sentences in the identification sequencing table, and outputting the current identification sentence when a match succeeds.
2. The speech recognition method according to claim 1, characterized in that the step of performing confidence calculation on each identification sentence respectively comprises:
obtaining the syllables of the voice to be identified in order to obtain multiple target syllables, each target syllable corresponding to an identification text in the corresponding identification sentence;
matching the target syllables against a locally pre-stored text database to obtain a target text table, the target text table storing correspondences between multiple groups of different texts and confidence values;
for each identification sentence in order, matching the identification text corresponding to the current target syllable against the target text table to obtain a target confidence; and
summing, in order, the target confidences corresponding to all the identification texts in each identification sentence to obtain the multiple sentence confidences.
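Claim 2's scoring reduces to a table lookup plus a sum; a sketch with an illustrative target text table (the texts and confidence values are made up):

```python
# Illustrative target text table: identification text -> confidence value.
TARGET_TEXT_TABLE = {"da": 0.9, "kai": 0.8, "deng": 0.7}

def sentence_confidence(identification_texts):
    """Sum the target confidences of all identification texts in one
    identification sentence, per claim 2; unknown texts score 0."""
    return sum(TARGET_TEXT_TABLE.get(text, 0.0) for text in identification_texts)
```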
3. The speech recognition method according to claim 2, characterized in that, after the step of summing, in order, the target confidences corresponding to all the identification texts in each identification sentence, the method further comprises:
judging whether an association relationship exists between adjacent identification texts; and
if so, obtaining an associated confidence and summing the associated confidence with the corresponding target confidence.
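Claim 3's adjustment, sketched with an illustrative associated-confidence table: when an adjacent pair of identification texts is associated, its associated confidence is added on top of the summed target confidences.

```python
# Illustrative associated confidences for adjacent identification texts.
ASSOCIATED_CONFIDENCE = {("da", "kai"): 0.5}

def add_associated_confidence(texts, base_confidence):
    """Add the associated confidence of every associated adjacent pair
    of identification texts to the summed target confidences."""
    bonus = sum(ASSOCIATED_CONFIDENCE.get(pair, 0.0)
                for pair in zip(texts, texts[1:]))
    return base_confidence + bonus
```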
4. The speech recognition method according to claim 3, characterized in that the step of judging whether an association relationship exists between adjacent identification texts comprises:
grouping the adjacent identification texts into words respectively to obtain multiple identification phrases;
obtaining a locally pre-stored association phrase table, and matching the identification phrases against the association phrase table; and
when an identification phrase matches the association phrase table successfully, determining that an association relationship exists between the adjacent identification texts corresponding to that identification phrase.
5. The speech recognition method according to claim 1, characterized in that, before the step of performing confidence calculation on each identification sentence respectively, the method further comprises:
judging in order whether each identification sentence contains a preset specified word; and
if not, deleting the identification sentence.
6. The speech recognition method according to claim 1, characterized in that the step of ranking the identification sentences according to the sentence confidences comprises:
sorting the identification sentences by magnitude according to the sentence confidences, and stopping the sorting when the sorting serial number equals a preset serial number.
7. The speech recognition method according to claim 1, characterized in that the step of matching the target text feature in order against the identification sentences in the identification sequencing table comprises:
when the target text feature is a target text length, performing length matching in order between the target text length and the identification sentences in the identification sequencing table; and
when the target text feature is target text content, performing text matching in order between the target text content and the identification sentences in the identification sequencing table.
8. A speech recognition system, characterized in that the system comprises:
a voice obtaining module, configured to obtain a voice to be identified and input the voice to be identified into a finite state transducer to obtain a plurality of identification sentences;
a confidence calculation module, configured to perform confidence calculation on each identification sentence respectively to obtain multiple sentence confidences, and to rank the identification sentences according to the sentence confidences to obtain an identification sequencing table; and
a sentence output module, configured to obtain a target text feature, match the target text feature in order against the identification sentences in the identification sequencing table, and output the current identification sentence when a match succeeds.
9. A mobile terminal, characterized by comprising a storage device and a processor, the storage device being configured to store a computer program, and the processor running the computer program so that the mobile terminal executes the speech recognition method according to any one of claims 1 to 7.
10. A storage medium, characterized in that it stores the computer program used by the mobile terminal according to claim 9, and that the steps of the speech recognition method according to any one of claims 1 to 7 are realized when the computer program is executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910812864.6A CN110503958A (en) | 2019-08-30 | 2019-08-30 | Audio recognition method, system, mobile terminal and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110503958A true CN110503958A (en) | 2019-11-26 |
Family
ID=68590573
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910812864.6A Pending CN110503958A (en) | 2019-08-30 | 2019-08-30 | Audio recognition method, system, mobile terminal and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110503958A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101067780A (en) * | 2007-06-21 | 2007-11-07 | 腾讯科技(深圳)有限公司 | Character inputting system and method for intelligent equipment |
CN101802812A (en) * | 2007-08-01 | 2010-08-11 | 金格软件有限公司 | Automatic context sensitive language correction and enhancement using an internet corpus |
CN106486126A (en) * | 2016-12-19 | 2017-03-08 | 北京云知声信息技术有限公司 | Speech recognition error correction method and device |
CN106847288A (en) * | 2017-02-17 | 2017-06-13 | 上海创米科技有限公司 | The error correction method and device of speech recognition text |
CN107564528A (en) * | 2017-09-20 | 2018-01-09 | 深圳市空谷幽兰人工智能科技有限公司 | A kind of speech recognition text and the method and apparatus of order word text matches |
CN109065031A (en) * | 2018-08-02 | 2018-12-21 | 阿里巴巴集团控股有限公司 | Voice annotation method, device and equipment |
CN109192194A (en) * | 2018-08-22 | 2019-01-11 | 北京百度网讯科技有限公司 | Voice data mask method, device, computer equipment and storage medium |
CN109524008A (en) * | 2018-11-16 | 2019-03-26 | 广东小天才科技有限公司 | Voice recognition method, device and equipment |
CN109801628A (en) * | 2019-02-11 | 2019-05-24 | 龙马智芯(珠海横琴)科技有限公司 | A kind of corpus collection method, apparatus and system |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111627446A (en) * | 2020-05-29 | 2020-09-04 | 国网浙江省电力有限公司信息通信分公司 | Communication conference system based on intelligent voice recognition technology |
CN112633201A (en) * | 2020-12-29 | 2021-04-09 | 交通银行股份有限公司 | Multi-mode in-vivo detection method and device, computer equipment and storage medium |
CN117033612A (en) * | 2023-08-18 | 2023-11-10 | 中航信移动科技有限公司 | Text matching method, electronic equipment and storage medium |
CN117033612B (en) * | 2023-08-18 | 2024-06-04 | 中航信移动科技有限公司 | Text matching method, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20191126 |