CN110503958A - Audio recognition method, system, mobile terminal and storage medium - Google Patents

Audio recognition method, system, mobile terminal and storage medium Download PDF

Info

Publication number
CN110503958A
Authority
CN
China
Prior art keywords
identification
sentence
confidence
text
target text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910812864.6A
Other languages
Chinese (zh)
Inventor
洪国强
肖龙源
李稀敏
蔡振华
刘晓葳
王静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Kuaishangtong Technology Co Ltd filed Critical Xiamen Kuaishangtong Technology Co Ltd
Priority to CN201910812864.6A
Publication of CN110503958A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L 15/26 Speech to text systems
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 2015/027 Syllables being the recognition units

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)

Abstract

The present invention is applicable to the technical field of speech recognition and provides a speech recognition method, a system, a mobile terminal and a storage medium. The method comprises: inputting speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences; performing a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and sorting the sentences to obtain a recognition ranking table; obtaining a target text feature, sequentially matching it against the recognition sentences in the ranking table, and outputting the current recognition sentence when a match succeeds. By building the method around confidence calculation, the recognition sentences are ordered by reliability using the sentence confidences, and by matching the target text feature against the sentences in the recognition ranking table, the influence of speakers' accents, environmental noise and the model's training data on the recognition result is reduced, which improves the accuracy and usability of speech recognition.

Description

Audio recognition method, system, mobile terminal and storage medium
Technical field
The present invention belongs to the technical field of speech recognition, and in particular relates to a speech recognition method, a system, a mobile terminal and a storage medium.
Background art
With the development of speech recognition technology and the continuous improvement of recognition rates, speech recognition is increasingly applied in everyday scenarios. In the voiceprint field there are two kinds of identification: text-content speaker identification and dynamic-digit speaker identification. Both rely on speech recognition, and speaker identification proceeds only when the speech content has been recognized correctly, so the accuracy of speech recognition is particularly important.
In existing speech recognition methods, the speaker's accent, environmental noise, the data used to train the model and other factors lower the recognition accuracy, which reduces the usability of voiceprint recognition.
Summary of the invention
The embodiments of the present invention aim to provide a speech recognition method, a system, a mobile terminal and a storage medium, in order to solve the problem that existing speech recognition methods based on acoustic models yield low recognition accuracy.
The embodiments of the present invention are implemented as follows. A speech recognition method comprises:
obtaining speech to be recognized, and inputting the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
performing a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and sorting the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
obtaining a target text feature, sequentially matching the target text feature against the recognition sentences in the recognition ranking table, and outputting the current recognition sentence when a match succeeds.
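For illustration only, the following is a minimal sketch of the overall flow described above, assuming a decoder object that returns n-best hypotheses from a finite state transducer; the function names, the decoder interface and the scoring and matching helpers are assumptions, not part of the claimed method.

```python
# Illustrative sketch of the claimed flow; names and interfaces are assumptions.

def recognize(audio, fst_decoder, target_text_feature,
              sentence_confidence, matches_target):
    # 1. Decode the speech with a finite state transducer to get several candidate sentences.
    hypotheses = fst_decoder.decode_nbest(audio)

    # 2. Score every candidate and sort by sentence confidence to build the ranking table.
    ranking = sorted(((sentence_confidence(h), h) for h in hypotheses),
                     key=lambda pair: pair[0], reverse=True)

    # 3. Walk the ranking table in order and output the first candidate that
    #    matches the target text feature (length or content).
    for _, hypothesis in ranking:
        if matches_target(hypothesis, target_text_feature):
            return hypothesis
    return None
```

In this sketch the refinements from the embodiments below (specified-word filtering, threshold screening, the preset ranking cut-off) are omitted for brevity.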
Further, the step of performing a confidence calculation on each recognition sentence comprises:
sequentially obtaining the syllables of the speech to be recognized to obtain a plurality of target syllables, each target syllable corresponding to a recognized character in the corresponding recognition sentence;
matching the target syllables against a locally pre-stored text database to obtain a target text table, the target text table storing correspondences between groups of different characters and confidence values;
sequentially matching, for each recognition sentence, the recognized character corresponding to the current target syllable against the target text table to obtain a target confidence;
sequentially summing the target confidences corresponding to all recognized characters in each recognition sentence to obtain the plurality of sentence confidences.
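As a non-authoritative illustration of the summation just described, the sketch below assumes the locally pre-stored text database can be represented as a dictionary mapping a syllable to its target text table (itself a dictionary from candidate characters to confidence values); these data shapes are assumptions.

```python
# Sketch of the per-character confidence summation; data structures are assumptions.

def sentence_confidence(target_syllables, recognized_chars, text_database):
    """target_syllables and recognized_chars are aligned one-to-one for a single
    recognition sentence; text_database maps a syllable to its target text table."""
    total = 0.0
    for syllable, char in zip(target_syllables, recognized_chars):
        target_text_table = text_database.get(syllable, {})   # {character: confidence value}
        total += target_text_table.get(char, 0.0)             # unknown characters add 0
    return total
```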
Further, after the step of sequentially summing the target confidences corresponding to all recognized characters in each recognition sentence, the method further comprises:
judging whether an association relation exists between adjacent recognized characters;
if so, obtaining an associated confidence and adding the associated confidence to the corresponding target confidences in the sum.
Further, the step of judging whether an association relation exists between adjacent recognized characters comprises:
grouping the adjacent recognized characters into words to obtain a plurality of recognized phrases;
obtaining a locally pre-stored association phrase table and matching the recognized phrases against the association phrase table;
when a recognized phrase matches the association phrase table, determining that an association relation exists between the adjacent recognized characters corresponding to that recognized phrase.
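A minimal sketch of this association check, assuming the association phrase table can be held as a set of character strings and that adjacent characters are grouped pairwise; both assumptions are illustrative rather than taken from the disclosure.

```python
# Sketch of the adjacent-character association check; the phrase-table format is an assumption.

def has_association(prev_char, curr_char, association_phrase_table):
    # Group the two adjacent recognized characters into a candidate phrase and check
    # whether it appears in the locally pre-stored association phrase table.
    return (prev_char + curr_char) in association_phrase_table
```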
Further, before the step of performing a confidence calculation on each recognition sentence, the method further comprises:
sequentially judging whether each recognition sentence contains a preset specified word;
if not, deleting that recognition sentence.
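Purely as an illustration, a filter of this kind could look like the sketch below; the list of preset specified words and the simple string-containment test are assumptions.

```python
# Sketch of the preset specified-word filter; the word list is an assumption.

def filter_by_specified_words(recognition_sentences, specified_words):
    # Keep only sentences that contain at least one preset specified word;
    # the remaining sentences are deleted before the confidence calculation.
    return [s for s in recognition_sentences
            if any(word in s for word in specified_words)]
```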
Further, the step of sorting the recognition sentences according to the sentence confidences comprises:
sorting the recognition sentences by the magnitude of their sentence confidences, and stopping the sorting when the ranking index reaches a preset index.
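The following sketch shows one way such a truncated ranking could be built; the preset index of 100 is taken from the embodiments below, and the data layout is an assumption.

```python
# Sketch of building the recognition ranking table with a preset cut-off; layout is an assumption.

def build_ranking_table(scored_sentences, preset_index=100):
    """scored_sentences: iterable of (sentence_confidence, recognition_sentence) pairs."""
    ranked = sorted(scored_sentences, key=lambda pair: pair[0], reverse=True)
    # Stop once the ranking index reaches the preset index; later entries are discarded.
    return ranked[:preset_index]
```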
Further, the step of sequentially matching the target text feature against the recognition sentences in the recognition ranking table comprises:
when the target text feature is a target text length, sequentially performing length matching between the target text length and the recognition sentences in the recognition ranking table;
when the target text feature is target text content, sequentially performing text matching between the target text content and the recognition sentences in the recognition ranking table.
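A minimal sketch of the two matching modes, assuming the target text feature is carried as a (kind, value) pair; this representation and the example values are assumptions.

```python
# Sketch of the target-text-feature matching; the feature representation is an assumption.

def matches_target(recognition_sentence, target_text_feature):
    kind, value = target_text_feature        # e.g. ("length", 6) or ("content", "12345678")
    if kind == "length":
        # Length matching: the sentence must have the preset character length.
        return len(recognition_sentence) == value
    if kind == "content":
        # Content matching: the sentence must equal the pre-registered text and/or digits.
        return recognition_sentence == value
    return False
```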
Another object of the embodiments of the present invention is to provide a speech recognition system, the system comprising:
a speech acquisition module, configured to obtain speech to be recognized and input the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
a confidence calculation module, configured to perform a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and to sort the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
a sentence output module, configured to obtain a target text feature, sequentially match the target text feature against the recognition sentences in the recognition ranking table, and output the current recognition sentence when a match succeeds.
Another object of the embodiments of the present invention is to provide a mobile terminal comprising a storage device and a processor, the storage device being configured to store a computer program, and the processor running the computer program so that the mobile terminal executes the above speech recognition method.
Another object of the embodiments of the present invention is to provide a storage medium storing the computer program used in the above mobile terminal, the computer program implementing the steps of the above speech recognition method when executed by a processor.
In the embodiments of the present invention, the design based on confidence calculation orders the recognition sentences by reliability using the sentence confidences, and the design of matching the target text feature against the recognition sentences in the recognition ranking table reduces the influence of speakers' accents, environmental noise and the model's training data on the recognition result, thereby improving the accuracy and usability of speech recognition.
Brief description of the drawings
Fig. 1 is a flowchart of the speech recognition method provided by the first embodiment of the present invention;
Fig. 2 is a flowchart of the speech recognition method provided by the second embodiment of the present invention;
Fig. 3 is a flowchart of the speech recognition method provided by the third embodiment of the present invention;
Fig. 4 is a structural schematic diagram of the speech recognition system provided by the fourth embodiment of the present invention;
Fig. 5 is a structural schematic diagram of the mobile terminal provided by the fifth embodiment of the present invention.
Detailed description of the embodiments
In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only serve to illustrate the present invention and are not intended to limit it.
In order to explain the technical solutions of the present invention, specific embodiments are described below.
Embodiment one
Referring to Fig. 1, which is a flowchart of the speech recognition method provided by the first embodiment of the present invention, the method comprises the steps of:
Step S10: obtaining speech to be recognized, and inputting the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
wherein the speech to be recognized may be collected with a sound pickup, i.e. when a collection instruction issued by the user is received, the sound pickup is switched on under control to trigger collection of the speech to be recognized; preferably, in this step, the model parameters of the finite state transducer can be selected independently according to the user's needs;
specifically, the characters of different recognition sentences may or may not be identical, but the sentences composed of those characters all differ from one another, i.e. there are no duplicate sentences among the recognition sentences;
Step S20: performing a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and sorting the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
wherein the sentence confidence can be calculated with a preset model or a preset algorithm, and the sentence confidence expresses the accuracy of the corresponding recognition sentence, i.e. the higher the sentence confidence, the higher the accuracy of the corresponding recognition sentence is judged to be;
specifically, the number of entries in the recognition ranking table can be configured according to the user's needs; in this step, 100 recognition sentences are retained, i.e. the sorting stops when the ranking index in the recognition ranking table reaches 100;
preferably, in this step, the sentence confidences can also be screened by setting a confidence threshold: before the step of sorting the recognition sentences according to the sentence confidences, it is judged sequentially whether each sentence confidence exceeds the confidence threshold, and when a sentence confidence is judged to be below the confidence threshold, the corresponding recognition sentence is deleted, so that it is neither recognized nor sorted, which effectively improves the efficiency of subsequent data processing;
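As an illustration only, the threshold screening described in the preceding paragraph might be expressed as below; the threshold value is an assumption.

```python
# Sketch of the optional confidence-threshold screening; the threshold value is an assumption.

def screen_by_threshold(scored_sentences, confidence_threshold=0.5):
    """scored_sentences: list of (sentence_confidence, recognition_sentence) pairs.
    Sentences whose confidence falls below the threshold are deleted before sorting."""
    return [(confidence, sentence)
            for confidence, sentence in scored_sentences
            if confidence >= confidence_threshold]
```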
Step S30: obtaining a target text feature, sequentially matching the target text feature against the recognition sentences in the recognition ranking table, and outputting the current recognition sentence when a match succeeds;
wherein the target text feature comprises text content or a text length; the text content is composed of text and/or digits registered by the user in advance, and the text length is a character-length limit preset by the user; therefore, in this step, the matching between the target text feature and the recognition sentences can be performed by character matching or by length matching, and when a match is judged successful, the current recognition sentence is judged to be the most reliable speech recognition result;
In this embodiment, the design based on confidence calculation orders the recognition sentences by reliability using the sentence confidences, and the design of matching the target text feature against the recognition sentences in the recognition ranking table reduces the influence of speakers' accents, environmental noise and the model's training data on the recognition result, thereby improving the accuracy and usability of speech recognition.
Embodiment two
Referring to Fig. 2, which is a flowchart of the speech recognition method provided by the second embodiment of the present invention, the method comprises the steps of:
Step S11: obtaining speech to be recognized, and inputting the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
wherein the speech to be recognized may be collected with a sound pickup, i.e. when a collection instruction issued by the user is received, the sound pickup is switched on under control to trigger collection of the speech to be recognized; preferably, in this step, the model parameters of the finite state transducer can be selected independently according to the user's needs;
Step S21: sequentially obtaining the syllables of the speech to be recognized to obtain a plurality of target syllables, each target syllable corresponding to a recognized character in the corresponding recognition sentence;
wherein the target syllables can be obtained by means of acoustic-wave analysis, for example target syllables such as "wan", "wo" and "ni"; in this embodiment the obtained target syllables correspond one-to-one, in order, with the recognized characters;
Step S31: matching the target syllables against a locally pre-stored text database to obtain a target text table;
wherein the text database stores mapping relations between different text tables and the corresponding syllables, so in this step the corresponding target text table can be obtained accurately through the matching between the target syllables and the text database;
Step S41: sequentially matching, for each recognition sentence, the recognized character corresponding to the current target syllable against the target text table to obtain a target confidence;
wherein the target text table stores correspondences between groups of different characters and confidence values, so in this step the design of matching the recognized characters against the target text table yields the target confidence corresponding to each syllable in the recognition sentence;
Step S51: sequentially summing the target confidences corresponding to all recognized characters in each recognition sentence to obtain a plurality of sentence confidences, and sorting the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
in this step, the sentence confidence can also be calculated by averaging, i.e. after computing the sum of all target confidences for the corresponding recognition sentence, the sum is averaged over the character length of that recognition sentence to obtain the sentence confidence;
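A short sketch of the averaging variant just mentioned; the list-based representation of the per-character confidences is an assumption.

```python
# Sketch of the averaging variant of the sentence confidence; data shape is an assumption.

def sentence_confidence_mean(target_confidences):
    """target_confidences: the per-character target confidences of one recognition sentence."""
    if not target_confidences:
        return 0.0
    # Sum all target confidences, then average over the character length of the sentence.
    return sum(target_confidences) / len(target_confidences)
```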
specifically, in this step, the step of sorting the recognition sentences according to the sentence confidences comprises:
sorting the recognition sentences by the magnitude of their sentence confidences, and stopping the sorting when the ranking index reaches a preset index; for example, when the preset index is 100, the sorting stops when the ranking index in the recognition ranking table reaches 100;
preferably, in this step, the sentence confidences can also be screened by setting a confidence threshold: before the step of sorting the recognition sentences according to the sentence confidences, it is judged sequentially whether each sentence confidence exceeds the confidence threshold, and when a sentence confidence is judged to be below the confidence threshold, the corresponding recognition sentence is deleted, so that it is neither recognized nor sorted, which effectively improves the efficiency of subsequent data processing;
Step S61: obtaining a target text feature, sequentially matching the target text feature against the recognition sentences in the recognition ranking table, and outputting the current recognition sentence when a match succeeds;
wherein the target text feature comprises text content or a text length; the text content is composed of text and/or digits preset by the user, and the text length is a character-length limit preset by the user; therefore, in this step, the matching between the target text feature and the recognition sentences can be performed by character matching or by length matching, and when a match is judged successful, the current recognition sentence is judged to be the most reliable speech recognition result;
preferably, in this step, the step of sequentially matching the target text feature against the recognition sentences in the recognition ranking table comprises:
when the target text feature is a target text length, sequentially performing length matching between the target text length and the recognition sentences in the recognition ranking table;
when the target text feature is target text content, sequentially performing text matching between the target text content and the recognition sentences in the recognition ranking table;
In this embodiment, the design based on confidence calculation orders the recognition sentences by reliability using the sentence confidences, and the design of matching the target text feature against the recognition sentences in the recognition ranking table reduces the influence of speakers' accents, environmental noise and the model's training data on the recognition result, thereby improving the accuracy and usability of speech recognition.
Embodiment three
Referring to Fig. 3, which is a flowchart of the speech recognition method provided by the third embodiment of the present invention, the method comprises the steps of:
Step S12: obtaining speech to be recognized, and inputting the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
Step S22: sequentially judging whether each recognition sentence contains a preset specified word;
wherein the preset specified word can be set as text according to the user's needs and may consist of several characters or phrases; in this step, the judgement based on the preset specified word effectively determines whether the corresponding recognition sentence is accurate;
when the judgement result of step S22 is no, step S32 is executed;
Step S32: deleting the recognition sentence;
when the judgement result of step S22 is yes, step S42 is executed;
Step S42: sequentially obtaining the syllables of the speech to be recognized to obtain a plurality of target syllables, each target syllable corresponding to a recognized character in the corresponding recognition sentence;
Step S52: matching the target syllables against a locally pre-stored text database to obtain a target text table;
wherein the target text table stores correspondences between groups of different characters and confidence values;
Step S62: sequentially matching, for each recognition sentence, the recognized character corresponding to the current target syllable against the target text table to obtain a target confidence;
Step S72: judging whether an association relation exists between adjacent recognized characters;
specifically, in this step, the step of judging whether an association relation exists between adjacent recognized characters comprises:
grouping the adjacent recognized characters into words to obtain a plurality of recognized phrases;
obtaining a locally pre-stored association phrase table and matching the recognized phrases against the association phrase table;
when a recognized phrase matches the association phrase table, determining that an association relation exists between the adjacent recognized characters corresponding to that recognized phrase;
when the judgement result of step S72 is yes, step S82 is executed;
Step S82: obtaining an associated confidence and adding the associated confidence to the corresponding target confidences in the sum;
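For illustration, the following sketch folds such an association bonus into the sentence total; the pairwise grouping, the bonus value and the data shapes are assumptions rather than details taken from the disclosure.

```python
# Sketch of adding associated confidences into the sentence total; values and shapes are assumptions.

def sentence_confidence_with_association(recognized_chars, target_confidences,
                                         association_phrase_table,
                                         associated_confidence=0.1):
    total = sum(target_confidences)
    # For every pair of adjacent characters that forms a phrase in the locally
    # pre-stored association phrase table, add the associated confidence to the sum.
    for prev_char, curr_char in zip(recognized_chars, recognized_chars[1:]):
        if prev_char + curr_char in association_phrase_table:
            total += associated_confidence
    return total
```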
when the judgement result of step S72 is no, step S92 is executed directly;
Step S92: sequentially summing the target confidences corresponding to all recognized characters in each recognition sentence to obtain a plurality of sentence confidences, and sorting the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
Step S102: obtaining a target text feature, sequentially matching the target text feature against the recognition sentences in the recognition ranking table, and outputting the current recognition sentence when a match succeeds;
In this embodiment, the design based on confidence calculation orders the recognition sentences by reliability using the sentence confidences, and the design of matching the target text feature against the recognition sentences in the recognition ranking table reduces the influence of speakers' accents, environmental noise and the model's training data on the recognition result, thereby improving the accuracy and usability of speech recognition.
Embodiment four
Referring to Fig. 4, which is a structural schematic diagram of the speech recognition system 100 provided by the fourth embodiment of the present invention, the system comprises a speech acquisition module 10, a confidence calculation module 11 and a sentence output module 12, wherein:
the speech acquisition module 10 is configured to obtain speech to be recognized and input the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
the confidence calculation module 11 is configured to perform a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and to sort the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
wherein the confidence calculation module 11 is further configured to: sequentially obtain the syllables of the speech to be recognized to obtain a plurality of target syllables, each target syllable corresponding to a recognized character in the corresponding recognition sentence; match the target syllables against a locally pre-stored text database to obtain a target text table, the target text table storing correspondences between groups of different characters and confidence values; sequentially match, for each recognition sentence, the recognized character corresponding to the current target syllable against the target text table to obtain a target confidence; and sequentially sum the target confidences corresponding to all recognized characters in each recognition sentence to obtain the plurality of sentence confidences;
preferably, the confidence calculation module 11 is further configured to: judge whether an association relation exists between adjacent recognized characters; and if so, obtain an associated confidence and add the associated confidence to the corresponding target confidences in the sum;
further, the confidence calculation module 11 is further configured to: group the adjacent recognized characters into words to obtain a plurality of recognized phrases; obtain a locally pre-stored association phrase table and match the recognized phrases against the association phrase table; and when a recognized phrase matches the association phrase table, determine that an association relation exists between the adjacent recognized characters corresponding to that recognized phrase;
in this embodiment, the confidence calculation module 11 is further configured to sort the recognition sentences by the magnitude of their sentence confidences and to stop the sorting when the ranking index reaches a preset index;
the sentence output module 12 is configured to obtain a target text feature, sequentially match the target text feature against the recognition sentences in the recognition ranking table, and output the current recognition sentence when a match succeeds;
wherein the sentence output module 12 is further configured to: when the target text feature is a target text length, sequentially perform length matching between the target text length and the recognition sentences in the recognition ranking table; and when the target text feature is target text content, sequentially perform text matching between the target text content and the recognition sentences in the recognition ranking table;
in addition, in this embodiment, the speech recognition system 100 further comprises:
a sentence deletion module 13, configured to sequentially judge whether each recognition sentence contains a preset specified word, and if not, to delete that recognition sentence.
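To make the module split concrete, here is a minimal, non-authoritative sketch of how the four modules could be arranged as a class; the class name, method names and the injected helpers (an n-best FST decoder, a sentence-scoring function and a target-feature matcher) are all assumptions, not part of the disclosed system.

```python
# Sketch of the module layout of the described system; names and interfaces are assumptions.

class SpeechRecognitionSystem:
    def __init__(self, fst_decoder, score_sentence, matches_target, specified_words):
        self.fst_decoder = fst_decoder          # returns n-best recognition sentences
        self.score_sentence = score_sentence    # confidence calculation for one sentence
        self.matches_target = matches_target    # target-text-feature matcher
        self.specified_words = specified_words  # preset specified words

    def acquire(self, audio):
        # Speech acquisition module: decode the speech into a plurality of recognition sentences.
        return self.fst_decoder.decode_nbest(audio)

    def delete_invalid(self, sentences):
        # Sentence deletion module: drop sentences lacking any preset specified word.
        return [s for s in sentences if any(w in s for w in self.specified_words)]

    def rank(self, sentences, preset_index=100):
        # Confidence calculation module: score, sort, and truncate to the recognition ranking table.
        scored = sorted(((self.score_sentence(s), s) for s in sentences),
                        key=lambda pair: pair[0], reverse=True)
        return scored[:preset_index]

    def output(self, ranking, target_text_feature):
        # Sentence output module: return the first sentence that matches the target text feature.
        for _, sentence in ranking:
            if self.matches_target(sentence, target_text_feature):
                return sentence
        return None
```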
In this embodiment, the design based on confidence calculation orders the recognition sentences by reliability using the sentence confidences, and the design of matching the target text feature against the recognition sentences in the recognition ranking table reduces the influence of speakers' accents, environmental noise and the model's training data on the recognition result, thereby improving the accuracy and usability of speech recognition.
Embodiment five
Referring to Fig. 5, which shows the mobile terminal 101 provided by the fifth embodiment of the present invention, the mobile terminal comprises a storage device and a processor; the storage device is configured to store a computer program, and the processor runs the computer program so that the mobile terminal 101 executes the above speech recognition method.
This embodiment also provides a storage medium on which the computer program used in the above mobile terminal 101 is stored; when executed, the program comprises the following steps:
obtaining speech to be recognized, and inputting the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
performing a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and sorting the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
obtaining a target text feature, sequentially matching the target text feature against the recognition sentences in the recognition ranking table, and outputting the current recognition sentence when a match succeeds. The storage medium may be, for example, a ROM/RAM, a magnetic disk or an optical disc.
It is clear to those skilled in the art that, for convenience and brevity of description, the division into the above functional units and modules is only an example; in practical applications, the above functions can be allocated to different functional units or modules as needed, i.e. the internal structure of the storage device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit; an integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only intended to distinguish them from one another and are not intended to limit the protection scope of the present application.
It will be understood by those skilled in the art that the structure shown in Fig. 4 does not limit the speech recognition system of the present invention, which may comprise more or fewer components than illustrated, combine certain components, or arrange the components differently; likewise, the speech recognition method of Figs. 1-3 may be implemented with more or fewer components than shown in Fig. 4, with certain components combined, or with a different component arrangement. The units and modules referred to in the present invention are series of computer programs that can be executed by a processor (not shown) of the speech recognition system to perform specific functions, and they can be stored in a storage device (not shown) of the speech recognition system.
The above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (10)

1. A speech recognition method, characterized in that the method comprises:
obtaining speech to be recognized, and inputting the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
performing a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and sorting the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
obtaining a target text feature, sequentially matching the target text feature against the recognition sentences in the recognition ranking table, and outputting the current recognition sentence when a match succeeds.
2. The speech recognition method according to claim 1, characterized in that the step of performing a confidence calculation on each recognition sentence comprises:
sequentially obtaining the syllables of the speech to be recognized to obtain a plurality of target syllables, each target syllable corresponding to a recognized character in the corresponding recognition sentence;
matching the target syllables against a locally pre-stored text database to obtain a target text table, the target text table storing correspondences between groups of different characters and confidence values;
sequentially matching, for each recognition sentence, the recognized character corresponding to the current target syllable against the target text table to obtain a target confidence;
sequentially summing the target confidences corresponding to all recognized characters in each recognition sentence to obtain the plurality of sentence confidences.
3. The speech recognition method according to claim 2, characterized in that, after the step of sequentially summing the target confidences corresponding to all recognized characters in each recognition sentence, the method further comprises:
judging whether an association relation exists between adjacent recognized characters;
if so, obtaining an associated confidence and adding the associated confidence to the corresponding target confidences in the sum.
4. The speech recognition method according to claim 3, characterized in that the step of judging whether an association relation exists between adjacent recognized characters comprises:
grouping the adjacent recognized characters into words to obtain a plurality of recognized phrases;
obtaining a locally pre-stored association phrase table and matching the recognized phrases against the association phrase table;
when a recognized phrase matches the association phrase table, determining that an association relation exists between the adjacent recognized characters corresponding to that recognized phrase.
5. The speech recognition method according to claim 1, characterized in that, before the step of performing a confidence calculation on each recognition sentence, the method further comprises:
sequentially judging whether each recognition sentence contains a preset specified word;
if not, deleting that recognition sentence.
6. The speech recognition method according to claim 1, characterized in that the step of sorting the recognition sentences according to the sentence confidences comprises:
sorting the recognition sentences by the magnitude of their sentence confidences, and stopping the sorting when the ranking index reaches a preset index.
7. The speech recognition method according to claim 1, characterized in that the step of sequentially matching the target text feature against the recognition sentences in the recognition ranking table comprises:
when the target text feature is a target text length, sequentially performing length matching between the target text length and the recognition sentences in the recognition ranking table;
when the target text feature is target text content, sequentially performing text matching between the target text content and the recognition sentences in the recognition ranking table.
8. A speech recognition system, characterized in that the system comprises:
a speech acquisition module, configured to obtain speech to be recognized and input the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
a confidence calculation module, configured to perform a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and to sort the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
a sentence output module, configured to obtain a target text feature, sequentially match the target text feature against the recognition sentences in the recognition ranking table, and output the current recognition sentence when a match succeeds.
9. A mobile terminal, characterized in that it comprises a storage device and a processor, the storage device being configured to store a computer program, and the processor running the computer program so that the mobile terminal executes the speech recognition method according to any one of claims 1 to 7.
10. A storage medium, characterized in that it stores the computer program used in the mobile terminal according to claim 9, and the computer program, when executed by a processor, implements the steps of the speech recognition method according to any one of claims 1 to 7.
CN201910812864.6A (priority date 2019-08-30, filing date 2019-08-30) · Audio recognition method, system, mobile terminal and storage medium · Pending · CN110503958A (en)

Priority Applications (1)

Application Number: CN201910812864.6A · Priority Date: 2019-08-30 · Filing Date: 2019-08-30 · Title: Audio recognition method, system, mobile terminal and storage medium · Publication: CN110503958A (en)

Applications Claiming Priority (1)

Application Number: CN201910812864.6A · Priority Date: 2019-08-30 · Filing Date: 2019-08-30 · Title: Audio recognition method, system, mobile terminal and storage medium · Publication: CN110503958A (en)

Publications (1)

Publication Number Publication Date
CN110503958A true CN110503958A (en) 2019-11-26

Family

ID=68590573

Family Applications (1)

Application Number: CN201910812864.6A · Title: Audio recognition method, system, mobile terminal and storage medium · Status: Pending · Publication: CN110503958A (en)

Country Status (1)

Country Link
CN (1) CN110503958A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101067780A (en) * 2007-06-21 2007-11-07 腾讯科技(深圳)有限公司 Character inputting system and method for intelligent equipment
CN101802812A (en) * 2007-08-01 2010-08-11 金格软件有限公司 Automatic context sensitive language correction and enhancement using an internet corpus
CN106486126A (en) * 2016-12-19 2017-03-08 北京云知声信息技术有限公司 Speech recognition error correction method and device
CN106847288A (en) * 2017-02-17 2017-06-13 上海创米科技有限公司 The error correction method and device of speech recognition text
CN107564528A (en) * 2017-09-20 2018-01-09 深圳市空谷幽兰人工智能科技有限公司 A kind of speech recognition text and the method and apparatus of order word text matches
CN109065031A (en) * 2018-08-02 2018-12-21 阿里巴巴集团控股有限公司 Voice annotation method, device and equipment
CN109192194A (en) * 2018-08-22 2019-01-11 北京百度网讯科技有限公司 Voice data mask method, device, computer equipment and storage medium
CN109524008A (en) * 2018-11-16 2019-03-26 广东小天才科技有限公司 Voice recognition method, device and equipment
CN109801628A (en) * 2019-02-11 2019-05-24 龙马智芯(珠海横琴)科技有限公司 A kind of corpus collection method, apparatus and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627446A (en) * 2020-05-29 2020-09-04 国网浙江省电力有限公司信息通信分公司 Communication conference system based on intelligent voice recognition technology
CN112633201A (en) * 2020-12-29 2021-04-09 交通银行股份有限公司 Multi-mode in-vivo detection method and device, computer equipment and storage medium
CN117033612A (en) * 2023-08-18 2023-11-10 中航信移动科技有限公司 Text matching method, electronic equipment and storage medium
CN117033612B (en) * 2023-08-18 2024-06-04 中航信移动科技有限公司 Text matching method, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191126)