CN110503958A - Audio recognition method, system, mobile terminal and storage medium - Google Patents

Audio recognition method, system, mobile terminal and storage medium Download PDF

Info

Publication number
CN110503958A
Authority
CN
China
Prior art keywords
identification
sentence
confidence
text
target text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910812864.6A
Other languages
Chinese (zh)
Inventor
洪国强
肖龙源
李稀敏
蔡振华
刘晓葳
王静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Kuaishangtong Technology Co Ltd filed Critical Xiamen Kuaishangtong Technology Co Ltd
Priority to CN201910812864.6A
Publication of CN110503958A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L 15/26 Speech to text systems
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 2015/027 Syllables being the recognition units

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)

Abstract

The present invention is applicable to the technical field of speech recognition and provides a speech recognition method, a system, a mobile terminal and a storage medium. The method comprises: inputting speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences; performing a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and sorting the sentences to obtain a recognition ranking table; obtaining a target text feature, sequentially matching it against the recognition sentences in the ranking table, and outputting the current recognition sentence when a match succeeds. By building the method around confidence calculation, the recognition sentences are ordered by reliability using the sentence confidences, and by matching the target text feature against the sentences in the recognition ranking table, the influence of speakers' accents, environmental noise and the model's training data on the recognition result is reduced, which improves the accuracy and usability of speech recognition.

Description

Audio recognition method, system, mobile terminal and storage medium
Technical field
The present invention belongs to the technical field of speech recognition, and in particular relates to a speech recognition method, a system, a mobile terminal and a storage medium.
Background art
With the development of speech recognition technology and the continuous improvement of recognition rates, speech recognition is increasingly applied in everyday scenarios. In the voiceprint field there are two kinds of identification: text-content speaker identification and dynamic-digit speaker identification. Both rely on speech recognition, and speaker identification proceeds only when the speech content has been recognized correctly, so the accuracy of speech recognition is particularly important.
In existing speech recognition methods, the speaker's accent, environmental noise, the data used to train the model and other factors lower the recognition accuracy, which reduces the usability of voiceprint recognition.
Summary of the invention
The embodiments of the present invention aim to provide a speech recognition method, a system, a mobile terminal and a storage medium, in order to solve the problem that existing speech recognition methods based on acoustic models yield low recognition accuracy.
The embodiments of the present invention are implemented as follows. A speech recognition method comprises:
obtaining speech to be recognized, and inputting the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
performing a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and sorting the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
obtaining a target text feature, sequentially matching the target text feature against the recognition sentences in the recognition ranking table, and outputting the current recognition sentence when a match succeeds.
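For illustration only, the following is a minimal sketch of the overall flow described above, assuming a decoder object that returns n-best hypotheses from a finite state transducer; the function names, the decoder interface and the scoring and matching helpers are assumptions, not part of the claimed method.

```python
# Illustrative sketch of the claimed flow; names and interfaces are assumptions.

def recognize(audio, fst_decoder, target_text_feature,
              sentence_confidence, matches_target):
    # 1. Decode the speech with a finite state transducer to get several candidate sentences.
    hypotheses = fst_decoder.decode_nbest(audio)

    # 2. Score every candidate and sort by sentence confidence to build the ranking table.
    ranking = sorted(((sentence_confidence(h), h) for h in hypotheses),
                     key=lambda pair: pair[0], reverse=True)

    # 3. Walk the ranking table in order and output the first candidate that
    #    matches the target text feature (length or content).
    for _, hypothesis in ranking:
        if matches_target(hypothesis, target_text_feature):
            return hypothesis
    return None
```

In this sketch the refinements from the embodiments below (specified-word filtering, threshold screening, the preset ranking cut-off) are omitted for brevity.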
Further, the step of performing a confidence calculation on each recognition sentence comprises:
sequentially obtaining the syllables of the speech to be recognized to obtain a plurality of target syllables, each target syllable corresponding to a recognized character in the corresponding recognition sentence;
matching the target syllables against a locally pre-stored text database to obtain a target text table, the target text table storing correspondences between groups of different characters and confidence values;
sequentially matching, for each recognition sentence, the recognized character corresponding to the current target syllable against the target text table to obtain a target confidence;
sequentially summing the target confidences corresponding to all recognized characters in each recognition sentence to obtain the plurality of sentence confidences.
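As a non-authoritative illustration of the summation just described, the sketch below assumes the locally pre-stored text database can be represented as a dictionary mapping a syllable to its target text table (itself a dictionary from candidate characters to confidence values); these data shapes are assumptions.

```python
# Sketch of the per-character confidence summation; data structures are assumptions.

def sentence_confidence(target_syllables, recognized_chars, text_database):
    """target_syllables and recognized_chars are aligned one-to-one for a single
    recognition sentence; text_database maps a syllable to its target text table."""
    total = 0.0
    for syllable, char in zip(target_syllables, recognized_chars):
        target_text_table = text_database.get(syllable, {})   # {character: confidence value}
        total += target_text_table.get(char, 0.0)             # unknown characters add 0
    return total
```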
Further, after the step of sequentially summing the target confidences corresponding to all recognized characters in each recognition sentence, the method further comprises:
judging whether an association relation exists between adjacent recognized characters;
if so, obtaining an associated confidence and adding the associated confidence to the corresponding target confidences in the sum.
Further, the step of judging whether an association relation exists between adjacent recognized characters comprises:
grouping the adjacent recognized characters into words to obtain a plurality of recognized phrases;
obtaining a locally pre-stored association phrase table and matching the recognized phrases against the association phrase table;
when a recognized phrase matches the association phrase table, determining that an association relation exists between the adjacent recognized characters corresponding to that recognized phrase.
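A minimal sketch of this association check, assuming the association phrase table can be held as a set of character strings and that adjacent characters are grouped pairwise; both assumptions are illustrative rather than taken from the disclosure.

```python
# Sketch of the adjacent-character association check; the phrase-table format is an assumption.

def has_association(prev_char, curr_char, association_phrase_table):
    # Group the two adjacent recognized characters into a candidate phrase and check
    # whether it appears in the locally pre-stored association phrase table.
    return (prev_char + curr_char) in association_phrase_table
```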
Further, before the step of performing a confidence calculation on each recognition sentence, the method further comprises:
sequentially judging whether each recognition sentence contains a preset specified word;
if not, deleting that recognition sentence.
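Purely as an illustration, a filter of this kind could look like the sketch below; the list of preset specified words and the simple string-containment test are assumptions.

```python
# Sketch of the preset specified-word filter; the word list is an assumption.

def filter_by_specified_words(recognition_sentences, specified_words):
    # Keep only sentences that contain at least one preset specified word;
    # the remaining sentences are deleted before the confidence calculation.
    return [s for s in recognition_sentences
            if any(word in s for word in specified_words)]
```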
Further, the step of sorting the recognition sentences according to the sentence confidences comprises:
sorting the recognition sentences by the magnitude of their sentence confidences, and stopping the sorting when the ranking index reaches a preset index.
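The following sketch shows one way such a truncated ranking could be built; the preset index of 100 is taken from the embodiments below, and the data layout is an assumption.

```python
# Sketch of building the recognition ranking table with a preset cut-off; layout is an assumption.

def build_ranking_table(scored_sentences, preset_index=100):
    """scored_sentences: iterable of (sentence_confidence, recognition_sentence) pairs."""
    ranked = sorted(scored_sentences, key=lambda pair: pair[0], reverse=True)
    # Stop once the ranking index reaches the preset index; later entries are discarded.
    return ranked[:preset_index]
```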
Further, the step of sequentially matching the target text feature against the recognition sentences in the recognition ranking table comprises:
when the target text feature is a target text length, sequentially performing length matching between the target text length and the recognition sentences in the recognition ranking table;
when the target text feature is target text content, sequentially performing text matching between the target text content and the recognition sentences in the recognition ranking table.
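A minimal sketch of the two matching modes, assuming the target text feature is carried as a (kind, value) pair; this representation and the example values are assumptions.

```python
# Sketch of the target-text-feature matching; the feature representation is an assumption.

def matches_target(recognition_sentence, target_text_feature):
    kind, value = target_text_feature        # e.g. ("length", 6) or ("content", "12345678")
    if kind == "length":
        # Length matching: the sentence must have the preset character length.
        return len(recognition_sentence) == value
    if kind == "content":
        # Content matching: the sentence must equal the pre-registered text and/or digits.
        return recognition_sentence == value
    return False
```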
Another object of the embodiments of the present invention is to provide a speech recognition system, the system comprising:
a speech acquisition module, configured to obtain speech to be recognized and input the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
a confidence calculation module, configured to perform a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and to sort the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
a sentence output module, configured to obtain a target text feature, sequentially match the target text feature against the recognition sentences in the recognition ranking table, and output the current recognition sentence when a match succeeds.
Another object of the embodiments of the present invention is to provide a mobile terminal comprising a storage device and a processor, the storage device being configured to store a computer program, and the processor running the computer program so that the mobile terminal executes the above speech recognition method.
Another object of the embodiments of the present invention is to provide a storage medium storing the computer program used in the above mobile terminal, the computer program implementing the steps of the above speech recognition method when executed by a processor.
In the embodiments of the present invention, the design based on confidence calculation orders the recognition sentences by reliability using the sentence confidences, and the design of matching the target text feature against the recognition sentences in the recognition ranking table reduces the influence of speakers' accents, environmental noise and the model's training data on the recognition result, thereby improving the accuracy and usability of speech recognition.
Brief description of the drawings
Fig. 1 is a flowchart of the speech recognition method provided by the first embodiment of the present invention;
Fig. 2 is a flowchart of the speech recognition method provided by the second embodiment of the present invention;
Fig. 3 is a flowchart of the speech recognition method provided by the third embodiment of the present invention;
Fig. 4 is a structural schematic diagram of the speech recognition system provided by the fourth embodiment of the present invention;
Fig. 5 is a structural schematic diagram of the mobile terminal provided by the fifth embodiment of the present invention.
Detailed description of the embodiments
In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only serve to illustrate the present invention and are not intended to limit it.
In order to explain the technical solutions of the present invention, specific embodiments are described below.
Embodiment one
Referring to Fig. 1, which is a flowchart of the speech recognition method provided by the first embodiment of the present invention, the method comprises the steps of:
Step S10: obtaining speech to be recognized, and inputting the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
wherein the speech to be recognized may be collected with a sound pickup, i.e. when a collection instruction issued by the user is received, the sound pickup is switched on under control to trigger collection of the speech to be recognized; preferably, in this step, the model parameters of the finite state transducer can be selected independently according to the user's needs;
specifically, the characters of different recognition sentences may or may not be identical, but the sentences composed of those characters all differ from one another, i.e. there are no duplicate sentences among the recognition sentences;
Step S20: performing a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and sorting the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
wherein the sentence confidence can be calculated with a preset model or a preset algorithm, and the sentence confidence expresses the accuracy of the corresponding recognition sentence, i.e. the higher the sentence confidence, the higher the accuracy of the corresponding recognition sentence is judged to be;
specifically, the number of entries in the recognition ranking table can be configured according to the user's needs; in this step, 100 recognition sentences are retained, i.e. the sorting stops when the ranking index in the recognition ranking table reaches 100;
preferably, in this step, the sentence confidences can also be screened by setting a confidence threshold: before the step of sorting the recognition sentences according to the sentence confidences, it is judged sequentially whether each sentence confidence exceeds the confidence threshold, and when a sentence confidence is judged to be below the confidence threshold, the corresponding recognition sentence is deleted, so that it is neither recognized nor sorted, which effectively improves the efficiency of subsequent data processing;
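As an illustration only, the threshold screening described in the preceding paragraph might be expressed as below; the threshold value is an assumption.

```python
# Sketch of the optional confidence-threshold screening; the threshold value is an assumption.

def screen_by_threshold(scored_sentences, confidence_threshold=0.5):
    """scored_sentences: list of (sentence_confidence, recognition_sentence) pairs.
    Sentences whose confidence falls below the threshold are deleted before sorting."""
    return [(confidence, sentence)
            for confidence, sentence in scored_sentences
            if confidence >= confidence_threshold]
```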
Step S30: obtaining a target text feature, sequentially matching the target text feature against the recognition sentences in the recognition ranking table, and outputting the current recognition sentence when a match succeeds;
wherein the target text feature comprises text content or a text length; the text content is composed of text and/or digits registered by the user in advance, and the text length is a character-length limit preset by the user; therefore, in this step, the matching between the target text feature and the recognition sentences can be performed by character matching or by length matching, and when a match is judged successful, the current recognition sentence is judged to be the most reliable speech recognition result;
In this embodiment, the design based on confidence calculation orders the recognition sentences by reliability using the sentence confidences, and the design of matching the target text feature against the recognition sentences in the recognition ranking table reduces the influence of speakers' accents, environmental noise and the model's training data on the recognition result, thereby improving the accuracy and usability of speech recognition.
Embodiment two
Referring to Fig. 2, which is a flowchart of the speech recognition method provided by the second embodiment of the present invention, the method comprises the steps of:
Step S11: obtaining speech to be recognized, and inputting the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
wherein the speech to be recognized may be collected with a sound pickup, i.e. when a collection instruction issued by the user is received, the sound pickup is switched on under control to trigger collection of the speech to be recognized; preferably, in this step, the model parameters of the finite state transducer can be selected independently according to the user's needs;
Step S21: sequentially obtaining the syllables of the speech to be recognized to obtain a plurality of target syllables, each target syllable corresponding to a recognized character in the corresponding recognition sentence;
wherein the target syllables can be obtained by means of acoustic-wave analysis, for example target syllables such as "wan", "wo" and "ni"; in this embodiment the obtained target syllables correspond one-to-one, in order, with the recognized characters;
Step S31: matching the target syllables against a locally pre-stored text database to obtain a target text table;
wherein the text database stores mapping relations between different text tables and the corresponding syllables, so in this step the corresponding target text table can be obtained accurately through the matching between the target syllables and the text database;
Step S41: sequentially matching, for each recognition sentence, the recognized character corresponding to the current target syllable against the target text table to obtain a target confidence;
wherein the target text table stores correspondences between groups of different characters and confidence values, so in this step the design of matching the recognized characters against the target text table yields the target confidence corresponding to each syllable in the recognition sentence;
Step S51: sequentially summing the target confidences corresponding to all recognized characters in each recognition sentence to obtain a plurality of sentence confidences, and sorting the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
in this step, the sentence confidence can also be calculated by averaging, i.e. after computing the sum of all target confidences for the corresponding recognition sentence, the sum is averaged over the character length of that recognition sentence to obtain the sentence confidence;
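A short sketch of the averaging variant just mentioned; the list-based representation of the per-character confidences is an assumption.

```python
# Sketch of the averaging variant of the sentence confidence; data shape is an assumption.

def sentence_confidence_mean(target_confidences):
    """target_confidences: the per-character target confidences of one recognition sentence."""
    if not target_confidences:
        return 0.0
    # Sum all target confidences, then average over the character length of the sentence.
    return sum(target_confidences) / len(target_confidences)
```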
specifically, in this step, the step of sorting the recognition sentences according to the sentence confidences comprises:
sorting the recognition sentences by the magnitude of their sentence confidences, and stopping the sorting when the ranking index reaches a preset index; for example, when the preset index is 100, the sorting stops when the ranking index in the recognition ranking table reaches 100;
preferably, in this step, the sentence confidences can also be screened by setting a confidence threshold: before the step of sorting the recognition sentences according to the sentence confidences, it is judged sequentially whether each sentence confidence exceeds the confidence threshold, and when a sentence confidence is judged to be below the confidence threshold, the corresponding recognition sentence is deleted, so that it is neither recognized nor sorted, which effectively improves the efficiency of subsequent data processing;
Step S61: obtaining a target text feature, sequentially matching the target text feature against the recognition sentences in the recognition ranking table, and outputting the current recognition sentence when a match succeeds;
wherein the target text feature comprises text content or a text length; the text content is composed of text and/or digits preset by the user, and the text length is a character-length limit preset by the user; therefore, in this step, the matching between the target text feature and the recognition sentences can be performed by character matching or by length matching, and when a match is judged successful, the current recognition sentence is judged to be the most reliable speech recognition result;
preferably, in this step, the step of sequentially matching the target text feature against the recognition sentences in the recognition ranking table comprises:
when the target text feature is a target text length, sequentially performing length matching between the target text length and the recognition sentences in the recognition ranking table;
when the target text feature is target text content, sequentially performing text matching between the target text content and the recognition sentences in the recognition ranking table;
In this embodiment, the design based on confidence calculation orders the recognition sentences by reliability using the sentence confidences, and the design of matching the target text feature against the recognition sentences in the recognition ranking table reduces the influence of speakers' accents, environmental noise and the model's training data on the recognition result, thereby improving the accuracy and usability of speech recognition.
Embodiment three
Referring to Fig. 3, which is a flowchart of the speech recognition method provided by the third embodiment of the present invention, the method comprises the steps of:
Step S12: obtaining speech to be recognized, and inputting the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
Step S22: sequentially judging whether each recognition sentence contains a preset specified word;
wherein the preset specified word can be set as text according to the user's needs and may consist of several characters or phrases; in this step, the judgement based on the preset specified word effectively determines whether the corresponding recognition sentence is accurate;
when the judgement result of step S22 is no, step S32 is executed;
Step S32: deleting the recognition sentence;
when the judgement result of step S22 is yes, step S42 is executed;
Step S42: sequentially obtaining the syllables of the speech to be recognized to obtain a plurality of target syllables, each target syllable corresponding to a recognized character in the corresponding recognition sentence;
Step S52: matching the target syllables against a locally pre-stored text database to obtain a target text table;
wherein the target text table stores correspondences between groups of different characters and confidence values;
Step S62: sequentially matching, for each recognition sentence, the recognized character corresponding to the current target syllable against the target text table to obtain a target confidence;
Step S72: judging whether an association relation exists between adjacent recognized characters;
specifically, in this step, the step of judging whether an association relation exists between adjacent recognized characters comprises:
grouping the adjacent recognized characters into words to obtain a plurality of recognized phrases;
obtaining a locally pre-stored association phrase table and matching the recognized phrases against the association phrase table;
when a recognized phrase matches the association phrase table, determining that an association relation exists between the adjacent recognized characters corresponding to that recognized phrase;
when the judgement result of step S72 is yes, step S82 is executed;
Step S82: obtaining an associated confidence and adding the associated confidence to the corresponding target confidences in the sum;
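For illustration, the following sketch folds such an association bonus into the sentence total; the pairwise grouping, the bonus value and the data shapes are assumptions rather than details taken from the disclosure.

```python
# Sketch of adding associated confidences into the sentence total; values and shapes are assumptions.

def sentence_confidence_with_association(recognized_chars, target_confidences,
                                         association_phrase_table,
                                         associated_confidence=0.1):
    total = sum(target_confidences)
    # For every pair of adjacent characters that forms a phrase in the locally
    # pre-stored association phrase table, add the associated confidence to the sum.
    for prev_char, curr_char in zip(recognized_chars, recognized_chars[1:]):
        if prev_char + curr_char in association_phrase_table:
            total += associated_confidence
    return total
```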
when the judgement result of step S72 is no, step S92 is executed directly;
Step S92: sequentially summing the target confidences corresponding to all recognized characters in each recognition sentence to obtain a plurality of sentence confidences, and sorting the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
Step S102: obtaining a target text feature, sequentially matching the target text feature against the recognition sentences in the recognition ranking table, and outputting the current recognition sentence when a match succeeds;
In this embodiment, the design based on confidence calculation orders the recognition sentences by reliability using the sentence confidences, and the design of matching the target text feature against the recognition sentences in the recognition ranking table reduces the influence of speakers' accents, environmental noise and the model's training data on the recognition result, thereby improving the accuracy and usability of speech recognition.
Embodiment four
Referring to Fig. 4, which is a structural schematic diagram of the speech recognition system 100 provided by the fourth embodiment of the present invention, the system comprises a speech acquisition module 10, a confidence calculation module 11 and a sentence output module 12, wherein:
the speech acquisition module 10 is configured to obtain speech to be recognized and input the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
the confidence calculation module 11 is configured to perform a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and to sort the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
wherein the confidence calculation module 11 is further configured to: sequentially obtain the syllables of the speech to be recognized to obtain a plurality of target syllables, each target syllable corresponding to a recognized character in the corresponding recognition sentence; match the target syllables against a locally pre-stored text database to obtain a target text table, the target text table storing correspondences between groups of different characters and confidence values; sequentially match, for each recognition sentence, the recognized character corresponding to the current target syllable against the target text table to obtain a target confidence; and sequentially sum the target confidences corresponding to all recognized characters in each recognition sentence to obtain the plurality of sentence confidences;
preferably, the confidence calculation module 11 is further configured to: judge whether an association relation exists between adjacent recognized characters; and if so, obtain an associated confidence and add the associated confidence to the corresponding target confidences in the sum;
further, the confidence calculation module 11 is further configured to: group the adjacent recognized characters into words to obtain a plurality of recognized phrases; obtain a locally pre-stored association phrase table and match the recognized phrases against the association phrase table; and when a recognized phrase matches the association phrase table, determine that an association relation exists between the adjacent recognized characters corresponding to that recognized phrase;
in this embodiment, the confidence calculation module 11 is further configured to sort the recognition sentences by the magnitude of their sentence confidences and to stop the sorting when the ranking index reaches a preset index;
the sentence output module 12 is configured to obtain a target text feature, sequentially match the target text feature against the recognition sentences in the recognition ranking table, and output the current recognition sentence when a match succeeds;
wherein the sentence output module 12 is further configured to: when the target text feature is a target text length, sequentially perform length matching between the target text length and the recognition sentences in the recognition ranking table; and when the target text feature is target text content, sequentially perform text matching between the target text content and the recognition sentences in the recognition ranking table;
in addition, in this embodiment, the speech recognition system 100 further comprises:
a sentence deletion module 13, configured to sequentially judge whether each recognition sentence contains a preset specified word, and if not, to delete that recognition sentence.
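To make the module split concrete, here is a minimal, non-authoritative sketch of how the four modules could be arranged as a class; the class name, method names and the injected helpers (an n-best FST decoder, a sentence-scoring function and a target-feature matcher) are all assumptions, not part of the disclosed system.

```python
# Sketch of the module layout of the described system; names and interfaces are assumptions.

class SpeechRecognitionSystem:
    def __init__(self, fst_decoder, score_sentence, matches_target, specified_words):
        self.fst_decoder = fst_decoder          # returns n-best recognition sentences
        self.score_sentence = score_sentence    # confidence calculation for one sentence
        self.matches_target = matches_target    # target-text-feature matcher
        self.specified_words = specified_words  # preset specified words

    def acquire(self, audio):
        # Speech acquisition module: decode the speech into a plurality of recognition sentences.
        return self.fst_decoder.decode_nbest(audio)

    def delete_invalid(self, sentences):
        # Sentence deletion module: drop sentences lacking any preset specified word.
        return [s for s in sentences if any(w in s for w in self.specified_words)]

    def rank(self, sentences, preset_index=100):
        # Confidence calculation module: score, sort, and truncate to the recognition ranking table.
        scored = sorted(((self.score_sentence(s), s) for s in sentences),
                        key=lambda pair: pair[0], reverse=True)
        return scored[:preset_index]

    def output(self, ranking, target_text_feature):
        # Sentence output module: return the first sentence that matches the target text feature.
        for _, sentence in ranking:
            if self.matches_target(sentence, target_text_feature):
                return sentence
        return None
```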
In this embodiment, the design based on confidence calculation orders the recognition sentences by reliability using the sentence confidences, and the design of matching the target text feature against the recognition sentences in the recognition ranking table reduces the influence of speakers' accents, environmental noise and the model's training data on the recognition result, thereby improving the accuracy and usability of speech recognition.
Embodiment five
Referring to Fig. 5, which shows the mobile terminal 101 provided by the fifth embodiment of the present invention, the mobile terminal comprises a storage device and a processor; the storage device is configured to store a computer program, and the processor runs the computer program so that the mobile terminal 101 executes the above speech recognition method.
This embodiment also provides a storage medium on which the computer program used in the above mobile terminal 101 is stored; when executed, the program comprises the following steps:
obtaining speech to be recognized, and inputting the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
performing a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and sorting the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
obtaining a target text feature, sequentially matching the target text feature against the recognition sentences in the recognition ranking table, and outputting the current recognition sentence when a match succeeds. The storage medium may be, for example, a ROM/RAM, a magnetic disk or an optical disc.
It is clear to those skilled in the art that, for convenience and brevity of description, the division into the above functional units and modules is only an example; in practical applications, the above functions can be allocated to different functional units or modules as needed, i.e. the internal structure of the storage device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit; an integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only intended to distinguish them from one another and are not intended to limit the protection scope of the present application.
It will be understood by those skilled in the art that the structure shown in Fig. 4 does not limit the speech recognition system of the present invention, which may comprise more or fewer components than illustrated, combine certain components, or arrange the components differently; likewise, the speech recognition method of Figs. 1-3 may be implemented with more or fewer components than shown in Fig. 4, with certain components combined, or with a different component arrangement. The units and modules referred to in the present invention are series of computer programs that can be executed by a processor (not shown) of the speech recognition system to perform specific functions, and they can be stored in a storage device (not shown) of the speech recognition system.
The above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (10)

1. A speech recognition method, characterized in that the method comprises:
obtaining speech to be recognized, and inputting the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
performing a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and sorting the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
obtaining a target text feature, sequentially matching the target text feature against the recognition sentences in the recognition ranking table, and outputting the current recognition sentence when a match succeeds.
2. The speech recognition method according to claim 1, characterized in that the step of performing a confidence calculation on each recognition sentence comprises:
sequentially obtaining the syllables of the speech to be recognized to obtain a plurality of target syllables, each target syllable corresponding to a recognized character in the corresponding recognition sentence;
matching the target syllables against a locally pre-stored text database to obtain a target text table, the target text table storing correspondences between groups of different characters and confidence values;
sequentially matching, for each recognition sentence, the recognized character corresponding to the current target syllable against the target text table to obtain a target confidence;
sequentially summing the target confidences corresponding to all recognized characters in each recognition sentence to obtain the plurality of sentence confidences.
3. The speech recognition method according to claim 2, characterized in that, after the step of sequentially summing the target confidences corresponding to all recognized characters in each recognition sentence, the method further comprises:
judging whether an association relation exists between adjacent recognized characters;
if so, obtaining an associated confidence and adding the associated confidence to the corresponding target confidences in the sum.
4. The speech recognition method according to claim 3, characterized in that the step of judging whether an association relation exists between adjacent recognized characters comprises:
grouping the adjacent recognized characters into words to obtain a plurality of recognized phrases;
obtaining a locally pre-stored association phrase table and matching the recognized phrases against the association phrase table;
when a recognized phrase matches the association phrase table, determining that an association relation exists between the adjacent recognized characters corresponding to that recognized phrase.
5. The speech recognition method according to claim 1, characterized in that, before the step of performing a confidence calculation on each recognition sentence, the method further comprises:
sequentially judging whether each recognition sentence contains a preset specified word;
if not, deleting that recognition sentence.
6. The speech recognition method according to claim 1, characterized in that the step of sorting the recognition sentences according to the sentence confidences comprises:
sorting the recognition sentences by the magnitude of their sentence confidences, and stopping the sorting when the ranking index reaches a preset index.
7. The speech recognition method according to claim 1, characterized in that the step of sequentially matching the target text feature against the recognition sentences in the recognition ranking table comprises:
when the target text feature is a target text length, sequentially performing length matching between the target text length and the recognition sentences in the recognition ranking table;
when the target text feature is target text content, sequentially performing text matching between the target text content and the recognition sentences in the recognition ranking table.
8. A speech recognition system, characterized in that the system comprises:
a speech acquisition module, configured to obtain speech to be recognized and input the speech to be recognized into a finite state transducer to obtain a plurality of recognition sentences;
a confidence calculation module, configured to perform a confidence calculation on each recognition sentence to obtain a plurality of sentence confidences, and to sort the recognition sentences according to the sentence confidences to obtain a recognition ranking table;
a sentence output module, configured to obtain a target text feature, sequentially match the target text feature against the recognition sentences in the recognition ranking table, and output the current recognition sentence when a match succeeds.
9. A mobile terminal, characterized in that it comprises a storage device and a processor, the storage device being configured to store a computer program, and the processor running the computer program so that the mobile terminal executes the speech recognition method according to any one of claims 1 to 7.
10. A storage medium, characterized in that it stores the computer program used in the mobile terminal according to claim 9, and the computer program, when executed by a processor, implements the steps of the speech recognition method according to any one of claims 1 to 7.
CN201910812864.6A (priority date 2019-08-30, filing date 2019-08-30) · Audio recognition method, system, mobile terminal and storage medium · Pending · CN110503958A (en)

Priority Applications (1)

Application Number: CN201910812864.6A · Priority Date: 2019-08-30 · Filing Date: 2019-08-30 · Title: Audio recognition method, system, mobile terminal and storage medium · Publication: CN110503958A (en)

Applications Claiming Priority (1)

Application Number: CN201910812864.6A · Priority Date: 2019-08-30 · Filing Date: 2019-08-30 · Title: Audio recognition method, system, mobile terminal and storage medium · Publication: CN110503958A (en)

Publications (1)

Publication Number Publication Date
CN110503958A true CN110503958A (en) 2019-11-26

Family

ID=68590573

Family Applications (1)

Application Number: CN201910812864.6A · Title: Audio recognition method, system, mobile terminal and storage medium · Status: Pending · Publication: CN110503958A (en)

Country Status (1)

Country Link
CN (1) CN110503958A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101067780A (en) * 2007-06-21 2007-11-07 腾讯科技(深圳)有限公司 Character inputting system and method for intelligent equipment
CN101802812A (en) * 2007-08-01 2010-08-11 金格软件有限公司 Automatic context sensitive language correction and enhancement using an internet corpus
CN106486126A (en) * 2016-12-19 2017-03-08 北京云知声信息技术有限公司 Speech recognition error correction method and device
CN106847288A (en) * 2017-02-17 2017-06-13 上海创米科技有限公司 The error correction method and device of speech recognition text
CN107564528A (en) * 2017-09-20 2018-01-09 深圳市空谷幽兰人工智能科技有限公司 A kind of speech recognition text and the method and apparatus of order word text matches
CN109065031A (en) * 2018-08-02 2018-12-21 阿里巴巴集团控股有限公司 Voice annotation method, device and equipment
CN109192194A (en) * 2018-08-22 2019-01-11 北京百度网讯科技有限公司 Voice data mask method, device, computer equipment and storage medium
CN109524008A (en) * 2018-11-16 2019-03-26 广东小天才科技有限公司 Voice recognition method, device and equipment
CN109801628A (en) * 2019-02-11 2019-05-24 龙马智芯(珠海横琴)科技有限公司 A kind of corpus collection method, apparatus and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627446A (en) * 2020-05-29 2020-09-04 国网浙江省电力有限公司信息通信分公司 Communication conference system based on intelligent voice recognition technology
CN112633201A (en) * 2020-12-29 2021-04-09 交通银行股份有限公司 Multi-mode in-vivo detection method and device, computer equipment and storage medium
CN117033612A (en) * 2023-08-18 2023-11-10 中航信移动科技有限公司 Text matching method, electronic equipment and storage medium
CN117033612B (en) * 2023-08-18 2024-06-04 中航信移动科技有限公司 Text matching method, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191126)