CN104112445A - Terminal and voice identification method - Google Patents
Abstract
The invention discloses a terminal and a voice recognition method. The terminal comprises a determination module for determining a recognition algorithm according to a preset rule; a first receiving module for receiving voice information to be recognized input by a user; and a recognition module for recognizing, according to the recognition algorithm determined by the determination module, the voice information to be recognized received by the first receiving module. The technical scheme solves the problem in existing approaches that, because no voice information matching the user's input exists in the voice model library, the voice input by the user cannot be recognized and the corresponding operation cannot be carried out. Since the recognition algorithm is determined according to a preset rule, the voice recognition system can be customized for different users, achieving "special treatment for special individuals", avoiding a one-size-fits-all processing method, providing a degree of re-learnability, greatly increasing the voice recognition rate, and improving the user experience.
Description
Technical Field
The present invention relates to the field of speech recognition, and in particular, to a terminal and a speech recognition method.
Background
To recognize speech, feature parameters of the speech to be recognized are extracted and then matched against a network or local speech model library according to a certain recognition algorithm, so that the speech is recognized and the corresponding action is executed. This, however, does not handle the case where the speech to be recognized has no counterpart in the speech model library. In addition, the feature parameters of the voice to be recognized differ from person to person, so the same action may correspond to different feature parameters; and once the recognition algorithm is formed it cannot be changed, which can lower the recognition rate for a specific individual. "Special treatment for a specific individual" therefore cannot be achieved, and the user experience is greatly reduced.
Disclosure of Invention
The invention provides a terminal and a voice recognition method that solve the prior-art problem that, because the voice model library contains no voice information matching that input by a user, the voice input by the user cannot be recognized and the corresponding operation cannot be executed.
In order to solve the technical problems, the invention adopts the following technical scheme:
a terminal, comprising:
the determining module is used for determining a recognition algorithm according to a preset rule;
the first receiving module is used for receiving the voice information to be recognized input by a user;
and the recognition module is used for recognizing the voice information to be recognized received by the first receiving module according to the recognition algorithm determined by the determination module.
In an embodiment of the present invention, the terminal further includes a second receiving module, a comparison module and a correction module;
the first receiving module is also used for receiving voice test information input by a user;
the second receiving module is used for receiving voice target information set by a user;
the recognition module is further used for recognizing the voice test information received by the first receiving module according to a test algorithm to obtain voice recognition information;
the comparison module is used for comparing the voice recognition information obtained by the recognition module with the voice target information;
the correction module is used for correcting the test algorithm according to the difference if the difference between the voice recognition information and the voice target information is not within an error range;
the determining module is specifically configured to determine the test algorithm as the recognition algorithm if the difference between the speech recognition information and the speech target information is within an error range.
In an embodiment of the present invention, the identification module includes an extraction sub-module and a calculation sub-module;
the extraction submodule is used for extracting at least one voice test value of the voice test information;
the calculation submodule is used for calculating at least one voice test value extracted by the extraction submodule according to a test algorithm to obtain at least one voice recognition value, and the voice recognition value is used as voice recognition information.
In an embodiment of the present invention,
the extraction submodule is also used for extracting at least one voice target numerical value of the voice target information;
the calculation submodule is also used for calculating the average error of the at least one voice recognition value extracted by the extraction submodule and the at least one voice target value;
the comparison module is specifically configured to determine whether the average error calculated by the calculation submodule is within a certain range.
In an embodiment of the present invention, the modification module is specifically configured to modify a structure and parameters of the test algorithm according to the difference.
A speech recognition method comprising:
determining a recognition algorithm according to a preset rule;
receiving voice information to be recognized input by a user;
and recognizing the voice information to be recognized according to the recognition algorithm.
In an embodiment of the present invention, determining the recognition algorithm according to the preset rule specifically includes:
step A, receiving voice test information input by a user and voice target information set by the user;
Step B, recognizing the voice test information according to a test algorithm to obtain voice recognition information;
Step C, comparing the voice recognition information with the voice target information; if the difference between the voice recognition information and the voice target information is within an error range, entering Step E, otherwise entering Step D;
Step D, correcting the test algorithm according to the difference, and returning to Step B;
Step E, determining the test algorithm as the recognition algorithm.
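Steps A through E amount to a recognize-compare-correct loop. The following is an illustrative sketch only; the `TestAlgorithm` class, its single-gain "correction", and every name below are hypothetical stand-ins, not taken from the patent:

```python
class TestAlgorithm:
    """Hypothetical stand-in for the patent's 'test algorithm': a single
    gain applied to each voice test value, corrected by a small step."""
    def __init__(self, gain=0.5, step=0.1):
        self.gain, self.step = gain, step

    def recognize(self, test_values):        # Step B: compute recognition values
        return [self.gain * v for v in test_values]

    def error(self, recognized, targets):    # Step C: signed average difference
        return sum(t - r for r, t in zip(recognized, targets)) / len(targets)

    def correct(self, diff):                 # Step D: adjust the parameter
        self.gain += self.step * diff


def determine_recognition_algorithm(test_values, target_values, algorithm,
                                    tolerance=0.01, max_iters=1000):
    """Loop over Steps B-D until the difference is within the error range,
    then accept the test algorithm as the recognition algorithm (Step E)."""
    for _ in range(max_iters):
        recognized = algorithm.recognize(test_values)
        diff = algorithm.error(recognized, target_values)
        if abs(diff) <= tolerance:
            break
        algorithm.correct(diff)
    return algorithm
```

The detailed description replaces this crude scalar correction with BP neural network training, but the control flow is the same loop.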
In an embodiment of the present invention, recognizing the voice test information according to a test algorithm, and obtaining the voice recognition information specifically includes:
extracting at least one voice test value of the voice test information;
and calculating the at least one voice test value according to a test algorithm to obtain at least one voice recognition value, and taking the voice recognition value as voice recognition information.
In an embodiment of the present invention, comparing the voice recognition information with the voice target information specifically includes:
extracting at least one voice target value of the voice target information;
and calculating the average error of the at least one voice recognition value and the at least one voice target value, and judging whether the average error is in a certain range.
In an embodiment of the present invention, modifying the test algorithm according to the difference specifically includes:
and correcting the structure and parameters of the test algorithm according to the difference.
The invention has the beneficial effects that:
the invention provides a terminal and a voice recognition method, wherein a recognition algorithm is determined according to a preset rule, the recognition of voice information of a user is completed through the recognition algorithm, the recognition algorithm determined according to the preset rule can be used for customizing a voice recognition system aiming at different users, the recognition algorithm has the advantage of special individual treatment, the method of unified processing is avoided, certain re-learning is realized, the voice recognition rate is greatly increased, and the experience effect of the user is improved.
Drawings
FIG. 1 is a flow chart of a speech recognition method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a BP neural network algorithm according to an embodiment of the present invention.
Fig. 4 is a flowchart of a determination method of a recognition algorithm according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings.
Fig. 1 is a flowchart of a speech recognition method according to an embodiment of the present invention, and as shown in fig. 1, the speech recognition method includes:
s101: determining a recognition algorithm according to a preset rule;
specifically, when the voice recognition function of the terminal is used, the terminal cannot recognize the voice of the user due to the inaccurate pronunciation of some users or the existence of accents and other problems, so that in order to increase the voice recognition rate and improve the experience effect of the user, a recognition algorithm for a special user is determined according to a preset rule, the recognition algorithm can recognize the voice information input by the special user, and corresponding behavior actions are completed.
In this embodiment, before determining the recognition algorithm according to the preset rule, it is determined whether the terminal is currently in the determination mode of the recognition algorithm, and for the determination mode of the recognition algorithm, the terminal may default to be in the determination mode in real time, or may start the determination mode according to the operation of the user.
If the terminal is not in the determination mode of the recognition algorithm, the voice information to be recognized input by the user is recognized directly, i.e. it is determined whether voice information matching the voice information to be recognized exists in the network or local voice model library, so as to obtain a corresponding recognition result.
If the terminal is currently in the determination mode of the identification algorithm, the terminal determines the identification algorithm according to a preset rule, which includes but is not limited to the following modes:
step A, receiving voice test information input by a user and voice target information set by the user;
specifically, when a user needs to "customize" a recognition algorithm for himself, it is necessary to input not only voice test information but also set voice target information to the terminal, and the terminal receives the voice test information input by the user and the voice target information set by the user, and for the voice target information, the setting manner of the user includes, but is not limited to, the following manners: and the terminal displays a target voice input interface, and a user actively inputs voice information in the voice model library, or the terminal displays the voice information in the voice model library in a chart form for the user to select. The voice model library can be a local voice model library or a voice model library called through a network, the voice model library comprises voice information of at least one user, each voice information has a corresponding action behavior, the action behavior corresponding to each voice information can be set by the user in a self-defined mode or a default mode of a system, and after the user inputs or selects voice target information, the terminal receives the voice target information.
Step B, recognizing the voice test information according to a test algorithm to obtain voice recognition information;
specifically, after the terminal receives the voice test information input by the user, the voice test information is processed, at least one voice test value of the voice test information is extracted to represent the voice test information input by the user, after the terminal obtains the at least one voice test value, the voice test information input by the user is recognized according to a test algorithm, namely, the at least one voice test value is calculated to obtain at least one voice recognition value, and the voice recognition value is used as the voice recognition information.
It should be noted that the test algorithm includes, but is not limited to, a BP neural network algorithm. Fig. 3 is a schematic structural diagram of the BP neural network algorithm, in which {x1, x2, x3, ..., xN} are the N feature values extracted from the voice test information input by the user and {o1, o2, o3, ..., oN} are the N output values calculated by the network. The middle layer includes at least one hidden layer; the number of hidden-layer nodes may be generated randomly within a certain range. Between every two connected nodes there is a connection weight Wij, which may also be generated randomly within a certain range, such as (-1, 1). In this embodiment, the middle layer preferably includes one hidden layer.
1) Initialization: select a reasonable network structure, take the N feature values of the user's voice test information as the N inputs of the network, set all adjustable parameters Wij to uniformly distributed small random values, and randomly generate a certain number of hidden-layer nodes.
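A minimal sketch of this initialization in Python follows; the array shapes, the hidden-size range, and the NumPy formulation are my assumptions, not the patent's:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_network(n_inputs, n_outputs, hidden_range=(4, 16)):
    """Step 1): choose a hidden-layer size at random within a range and
    draw every connection weight Wij uniformly from (-1, 1)."""
    n_hidden = int(rng.integers(*hidden_range))            # random node count
    w_hidden = rng.uniform(-1, 1, (n_hidden, n_inputs))    # input -> hidden
    w_output = rng.uniform(-1, 1, (n_outputs, n_hidden))   # hidden -> output
    return w_hidden, w_output
```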
2) The N inputs to the network are calculated as follows:
Forward calculation: for unit $j$ of the $l$-th layer,

$$v_j^{(l)}(n) = \sum_{i=0}^{p} w_{ji}^{(l)}(n)\, y_i^{(l-1)}(n)$$
where $y_i^{(l-1)}(n)$ is the output value of unit $i$ of the previous layer $(l-1)$, $y_j^{(l)}(n)$ is the output value of unit $j$ of the current layer $(l)$, and $p$ is the number of inputs to unit $j$;
if the function of the unit j in the input layer is sigmoid function, the output of each node in the input layer is
If neuron j belongs to the first hidden layer (l ═ 1), then there are
If neuron j belongs to the output layer (L ═ L), then there is, and
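For the one-hidden-layer case preferred in this embodiment, the forward computation above can be sketched as follows (function names are illustrative):

```python
import numpy as np

def sigmoid(v):
    """Sigmoid activation: y = 1 / (1 + e^{-v})."""
    return 1.0 / (1.0 + np.exp(-v))

def forward(x, w_hidden, w_output):
    """Forward pass: v_j = sum_i w_ji * y_i at each layer, with the
    input layer passing the feature values x straight through."""
    y_hidden = sigmoid(w_hidden @ x)    # first hidden layer (l = 1)
    o = sigmoid(w_output @ y_hidden)    # output layer (l = L)
    return y_hidden, o
```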
secondly, error feedback reverse calculation:
For the output unit:

$$\delta_j^{(L)}(n) = e_j^{(L)}(n)\, O_j(n)\,\bigl(1 - O_j(n)\bigr);$$
For the hidden-layer unit:

$$\delta_j^{(l)}(n) = y_j^{(l)}(n)\,\bigl[1 - y_j^{(l)}(n)\bigr]\, \sum_k \delta_k^{(l+1)}(n)\, w_{kj}^{(l+1)}(n);$$
The weights are then corrected layer by layer in the reverse direction according to:

$$w_{ji}^{(l)}(n+1) = w_{ji}^{(l)}(n) + \eta\, \delta_j^{(l)}(n)\, y_i^{(l-1)}(n);$$
where $w_{ji}^{(l)}(n+1)$ and $w_{ji}^{(l)}(n)$ are the newly corrected connection weight between the two nodes and the connection weight from the previous correction, respectively, and $\eta$ is the learning rate.
3) Carry out the next round of forward propagation and correction of the network.
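The error-feedback backward calculation and layer-by-layer weight correction of steps 2) and 3) might be sketched as follows for the one-hidden-layer case; the learning rate `eta` and the NumPy formulation are my assumptions:

```python
import numpy as np

def backward(x, y_hidden, o, d, w_hidden, w_output, eta=0.5):
    """One backward pass: delta_L = e * O * (1 - O) at the output,
    delta_l = y(1-y) * sum_k delta_k * w_kj at the hidden layer,
    then w(n+1) = w(n) + eta * delta * y_prev for every weight."""
    e = d - o                                    # e_j = d_j - O_j
    delta_out = e * o * (1.0 - o)                # output units
    delta_hid = y_hidden * (1.0 - y_hidden) * (w_output.T @ delta_out)
    w_output = w_output + eta * np.outer(delta_out, y_hidden)
    w_hidden = w_hidden + eta * np.outer(delta_hid, x)
    return w_hidden, w_output
```

Repeated alternation of a forward pass and this correction shrinks the output error, which is exactly the iteration the patent describes.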
Step C, comparing the voice recognition information with the voice target information, if the difference between the voice recognition information and the voice target information is within an error range, entering the step E, otherwise, entering the step D;
Specifically, after receiving the voice target information set by the user, the terminal processes it and extracts at least one voice target value to represent the voice target information input by the user.
In this embodiment, comparing the voice recognition information with the voice target information means comparing the voice recognition value representing the former with the voice target value representing the latter, and determining whether the difference between them is within an error range. For example, the average error between the at least one voice recognition value and the at least one voice target value is calculated and it is determined whether this average error is within a certain error range; if so, Step E is performed, otherwise Step D is performed.
In this embodiment, the average error between the at least one speech recognition value and the at least one speech target value may be calculated by:
let the quadratic error function of the input-output mode of each sample p be defined as:
the average error cost function of the system is:
where p is the number of mode samples, l is the number of nodes output, dpkAnd OpkRespectively the target output and the actual output of the network, and iterate all the time according to the steps untilUntil the error meets the requirement.
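The two error functions can be written directly as a short sketch (the row-per-sample array layout is an assumption):

```python
import numpy as np

def average_error(d, o):
    """E_p = 1/2 * sum_k (d_pk - O_pk)^2 for each sample p (one row per
    sample); the system cost E is the mean of E_p over all p samples."""
    per_sample = 0.5 * np.sum((np.asarray(d) - np.asarray(o)) ** 2, axis=1)
    return per_sample.mean()
```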
Step D, correcting the test algorithm according to the difference, and returning to Step B;
Specifically, when the difference between the voice recognition information and the voice target information is not within the error range, e.g. when the average error between the voice recognition values and the voice target values is not within a certain error range, the average error is fed back to the test algorithm. It should be noted that the test algorithm of Step B starts as an initial test algorithm, which can be actively input by the user or pre-configured in the terminal. In Step D, the initial test algorithm of Step B is corrected according to the difference; after the correction is completed the method returns to Step B, and secondary recognition is performed on the initial voice test values, or on the voice recognition values from the previous recognition, according to the corrected test algorithm to obtain new voice recognition values, which are used as the voice recognition information. The voice recognition information after secondary recognition is compared with the voice target information: if the difference between them is within a certain error range, the method enters Step E; if not, the terminal corrects the corrected test algorithm again according to the difference, and the feedback stops only when recognizing the voice test information with the corrected test algorithm yields voice recognition information whose difference from the voice target information is within a certain error range, after which the method enters Step E.
In the above technical solution, the manner of correcting the test algorithm in step B according to the difference includes, but is not limited to, the following manners: optimizing the structure of the test algorithm and adjusting the parameters of the test algorithm;
the correction mode is based on a gradient descent method, and local convergence is possible to cause a large error of network output, so that a certain disturbance can be added to enable the algorithm to jump out of the local convergence in the parameter optimization process, which relates to the adjustment of a network structure, namely, the number of certain hidden network nodes can be increased or reduced to enable the network to jump out of the local optimum condition. And (C) aiming at the adjustment of the parameters, namely, adjusting the connection weight values among the nodes, and performing reverse calculation according to the error feedback in the step (B) so as to adjust the connection weight values.
Step E, determining the test algorithm as the recognition algorithm.
Specifically, if the difference between the voice recognition information obtained according to the initial test algorithm and the voice target information is within a certain error range, determining the initial test algorithm as the recognition algorithm; and if the difference between the voice recognition information obtained according to the initial test algorithm and the voice target information is not within a certain error range, correcting the initial test algorithm according to the difference until the difference between the voice recognition information obtained according to the corrected test algorithm and the voice target information is within a certain error range, and determining the finally corrected test algorithm as the recognition algorithm.
S102: receiving voice information to be recognized input by a user;
specifically, due to the problems of inaccurate pronunciation or accent of some users, after the terminal determines a recognition algorithm according to a preset rule, the recognition algorithm can recognize the voice information of a special user, for example, when the user determines the recognition algorithm, the input voice test information is dialect voice information, the determined recognition algorithm is a recognition algorithm for the dialect voice information of the user, and then, when the voice information to be recognized input by the user is the dialect voice information, the recognition algorithm determined according to the preset rule can recognize the voice information, so that corresponding behavior actions are completed. When a user needs to enable the terminal to complete corresponding action according to the voice information, the user needs to input the voice information to be recognized to the terminal, and the terminal receives the voice information to be recognized.
S103: and recognizing the voice information to be recognized according to a recognition algorithm.
Specifically, the network or local speech model library includes the voice information of at least one user, and each piece of voice information has a corresponding action behavior. After the terminal receives the voice information to be recognized input by the user, it recognizes that information according to the determined recognition algorithm; when the terminal recognizes the input as target voice information in the network or local speech model library, it identifies the action behavior corresponding to the voice information to be recognized from the action behavior corresponding to that target voice information, and executes the identified action behavior.
For example: at present, a terminal generally has a voice photographing function. When a user says "photograph" or "eggplant" to the terminal, the terminal performs voice recognition and judges whether the utterance matches the corresponding word, and thereby executes the photographing function. However, a user with inaccurate pronunciation or a stutter may still fail to trigger the photographing function after saying "photograph" or "eggplant", which is embarrassing. At this point the user can start the voice calibration mode: the to-be-recognized voice information such as "photograph" or "eggplant" provided by the user is calibrated against the target voice information provided by the user, such as "photograph", so that the terminal learns that the user wants to execute the photographing function. In this way a voice recognition system belonging to that user is customized for him, and the voice photographing function is realized.
Through the above technical scheme, tailor-made voice recognition can be performed for particular users, achieving "special treatment for special individuals", avoiding a one-size-fits-all processing method, providing a degree of re-learnability, greatly increasing the voice recognition rate, and improving the user experience.
As shown in fig. 2, a schematic structural diagram of a terminal according to an embodiment of the present invention is shown in fig. 2, where the terminal includes:
the determining module 1 is used for determining a recognition algorithm according to a preset rule;
the first receiving module 2 is used for receiving the voice information to be recognized input by a user;
and the recognition module 3 is configured to recognize the speech information to be recognized, which is received by the first receiving module 2, according to the recognition algorithm determined by the determining module 1.
Specifically, when the voice recognition function of the terminal is used, the terminal may fail to recognize a user's voice because of inaccurate pronunciation, an accent, or similar problems. Therefore, to increase the voice recognition rate and improve the user experience, the determining module 1 determines a recognition algorithm for that particular user according to a preset rule, so that the algorithm can recognize the voice information input by that user and the corresponding behavior action is completed.
In this embodiment, before the determining module 1 determines the recognition algorithm according to the preset rule, the terminal determines whether the terminal is currently in the determination mode of the recognition algorithm, and for the determination mode of the recognition algorithm, the terminal may default to be in the determination mode in real time, or may start the determination mode according to the operation of the user.
If the terminal is not in the determined mode of the recognition algorithm, the voice information to be recognized input by the user is directly recognized, namely whether the voice information to be recognized exists in the network or the local voice model library is determined, so that a corresponding recognition result is obtained.
If the terminal is currently in the determination mode of the recognition algorithm, the terminal further comprises a second receiving module 4, a comparing module 5 and a correcting module 6;
when a user needs to customize a recognition algorithm for himself, not only the voice test information but also the set voice target information need to be input to the terminal, the first receiving module 2 receives the voice test information input by the user, the second receiving module 4 receives the voice target information set by the user, and the setting manner of the user includes, but is not limited to, the following manners: and the terminal displays a target voice input interface, and a user actively inputs voice information in the voice model library, or the terminal displays the voice information in the voice model library in a chart form for the user to select.
In the above technical solution, the recognition module 3 includes an extraction submodule 31 and a calculation submodule 32. When the first receiving module 2 receives the voice test information input by the user, that information is processed: the extraction submodule 31 extracts at least one voice test value to represent it, and the calculation submodule 32 then calculates the at least one voice test value according to the test algorithm to obtain at least one voice recognition value, which is used as the voice recognition information. When the second receiving module 4 receives the voice target information set by the user, that information is likewise processed: the extraction submodule 31 extracts at least one voice target value to represent the voice target information input by the user, so that the calculation submodule 32 can calculate the average error between the at least one voice recognition value and the at least one voice target value.
In the above technical solution, the comparing module 5 compares the voice recognition information with the voice target information; that is, it compares the voice recognition values representing the voice recognition information with the voice target values representing the voice target information, and determines whether the difference between them is within an error range. For example, the average error between the at least one voice recognition value and the at least one voice target value is calculated, and the comparing module judges whether that average error falls within a certain error range.
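The comparing module's average-error check can be sketched as follows. The tolerance value is an assumption; the patent only speaks of "a certain error range".

```python
def average_error(recognition_values, target_values):
    """Comparing module 5: mean absolute difference between the voice
    recognition values and the voice target values."""
    return sum(abs(r - t) for r, t in zip(recognition_values, target_values)) / len(target_values)

def within_error_range(recognition_values, target_values, tolerance=0.1):
    """True if the average error falls within the assumed tolerance."""
    return average_error(recognition_values, target_values) <= tolerance
```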
When the difference between the speech recognition information and the speech target information is not within the error range, for example when the average error between the speech recognition values and the speech target values exceeds the allowed range, the difference is fed back to the test algorithm. It should be noted that when the first receiving module 2 receives the speech test information and the second receiving module 4 receives the speech target information, the test algorithm that first recognizes the speech test information is the initial test algorithm. The initial test algorithm may be actively input by the user, in which case the second receiving module 4 receives it, or it may be configured in the terminal in advance. The correcting module 6 corrects the initial test algorithm according to the difference. After the correction is completed, the recognition module 3 performs secondary recognition: it applies the corrected test algorithm to the initial speech test values, or to the speech recognition values produced by the previous pass, to obtain new speech recognition values, which are taken as the speech recognition information, and the comparing module 5 compares this re-recognized speech recognition information with the speech target information. If the difference after secondary recognition is still not within the error range, the correcting module 6 corrects the corrected test algorithm again according to the difference. The feedback stops once the recognition module 3, recognizing the speech test information with the corrected test algorithm, obtains speech recognition information whose difference from the speech target information is within the error range.
If the difference between the voice recognition information obtained with the initial test algorithm and the voice target information is within the error range, the determining module 1 determines the initial test algorithm as the recognition algorithm. If it is not, the initial test algorithm is corrected according to the difference until the voice recognition information obtained with the corrected test algorithm differs from the voice target information by an amount within the error range, and the determining module 1 determines the finally corrected test algorithm as the recognition algorithm.
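The recognize-compare-correct feedback loop described above can be sketched as a simple parameter-fitting iteration. This is an illustrative stand-in: the patent does not specify how the correction module adjusts the test algorithm, so a gradient-style update of a hypothetical linear gain/offset test algorithm is assumed, with tolerance and learning rate chosen arbitrarily.

```python
def determine_recognition_algorithm(test_values, target_values,
                                    params=(1.0, 0.0), tolerance=0.05,
                                    learning_rate=0.1, max_rounds=1000):
    """Feedback loop: recognize, compare, correct, repeat until the
    average error is within the tolerance. `params` plays the role of
    the initial test algorithm; the returned pair is the determined
    recognition algorithm."""
    gain, offset = params
    for _ in range(max_rounds):
        # recognition module: apply the (possibly corrected) test algorithm
        recognized = [gain * v + offset for v in test_values]
        # comparing module: signed errors and their average magnitude
        errors = [r - t for r, t in zip(recognized, target_values)]
        avg_err = sum(abs(e) for e in errors) / len(errors)
        if avg_err <= tolerance:
            return gain, offset
        # correction module: adjust parameters against the error
        grad_gain = sum(e * v for e, v in zip(errors, test_values)) / len(errors)
        mean_signed = sum(errors) / len(errors)
        gain -= learning_rate * grad_gain
        offset -= learning_rate * mean_signed
    return gain, offset
```

Each pass through the loop corresponds to one round of secondary recognition and secondary correction in the text; the loop exits exactly when the comparing step succeeds.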
When a user needs the terminal to complete a corresponding action according to voice information, the user inputs the voice information to be recognized, which here is dialect voice information, and the first receiving module 2 receives it. The network or local voice model library contains the voice information of at least one user, and each piece of voice information has a corresponding action behavior. After the first receiving module 2 receives the voice information to be recognized, the recognition module 3 recognizes it according to the recognition algorithm determined by the determining module 1. When the recognition module 3 recognizes the input as target voice information in the network or local voice model library, the terminal identifies the action behavior corresponding to that target voice information and executes it.
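The final dispatch from recognized target voice information to its action behavior amounts to a table lookup, sketched below. The library entries and action names are hypothetical; the patent only states that each piece of voice information in the model library has a corresponding action behavior.

```python
# Hypothetical voice model library mapping target voice entries to
# action behaviours (names are illustrative, not from the patent).
VOICE_MODEL_LIBRARY = {
    "open_camera": "launch_camera_app",
    "call_home": "dial_home_number",
}

def action_for(recognized_target):
    """Return the action behaviour corresponding to the recognized
    target voice information, or None if no library entry matches."""
    return VOICE_MODEL_LIBRARY.get(recognized_target)
```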
Fig. 3 is a flowchart of a method for determining a recognition algorithm according to an embodiment of the present invention, and as shown in fig. 3, the method includes:
S301: receive voice information;
S302: extract voice characteristic parameters;
S303: judge whether the terminal has enabled the calibration mode; if so, go to S304, otherwise go to S309;
S304: receive voice target information;
S305: extract voice target parameters;
S306: recognize the voice characteristic parameters according to the test algorithm;
S307: calculate the difference;
S308: judge whether the difference is within the allowable range; if so, go to S309; otherwise, correct the test algorithm according to the difference and return to S306;
S309: recognize the voice information;
S310: execute the action behavior.
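The branching in the flow above, with S303 choosing between the calibration path and direct recognition, can be sketched as a small driver. The `determine_algorithm` and `recognize` callables are placeholders for the steps described, not APIs from the patent.

```python
def fig3_flow(voice_values, calibration_mode, determine_algorithm, recognize):
    """Sketch of the Fig. 3 flow.

    determine_algorithm: callable covering S304-S308, returning a
    recognition algorithm; recognize: callable covering S309-S310.
    """
    if calibration_mode:                                # S303: mode enabled
        algorithm = determine_algorithm(voice_values)   # S304-S308
    else:
        algorithm = None                                # existing mode
    return recognize(voice_values, algorithm)           # S309, S310
```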
In this embodiment, the test algorithm is modified based on the difference to determine the recognition algorithm.
The terminal receives voice information input by a user, processes it, and extracts at least one voice characteristic parameter to represent the voice information;
before determining the recognition algorithm, the terminal judges whether the calibration mode is currently enabled, where the calibration mode indicates whether the terminal needs to determine the recognition algorithm according to the preset rule;
if the terminal has not enabled the calibration mode, the voice information input by the user is recognized directly, in either of the following two ways:
(1) recognizing the voice information in the existing recognition mode, that is, determining whether the voice model library contains a voice characteristic parameter matching that of the voice information and, if so, executing the corresponding action;
(2) recognizing the voice information according to the recognition algorithm most recently determined by the terminal according to the preset rule, so that the recognition result matches the voice information in the voice model library and the action behavior desired by the user is executed.
If the terminal has enabled the calibration mode, it receives the voice target information set by the user, processes it, and extracts at least one voice target parameter to represent it. The voice characteristic parameters are then recognized according to the test algorithm, and the difference between the recognized voice characteristic parameters and the voice target parameters is calculated. If the difference is within the allowable range, the initial test algorithm is determined as the recognition algorithm, the voice information input by the user is recognized as the voice target information through this recognition algorithm, and the corresponding action behavior is executed. Otherwise, the difference is fed back to the test algorithm, which is modified accordingly: its structure is optimized and its parameters are adjusted. After the modification is completed, secondary recognition is performed on the original voice characteristic parameters, or on those produced by the previous recognition pass, and the difference between the re-recognized voice characteristic parameters and the voice target parameters is calculated. If this difference is within the allowable range, the corrected test algorithm is determined as the recognition algorithm, the user's voice information is recognized as the voice target information through it, and the corresponding action behavior is executed; if not, the corrected test algorithm is corrected again according to the difference, the feedback stopping only when recognition with the corrected test algorithm yields voice characteristic parameters whose difference from the voice target parameters is within the error range. The finally corrected test algorithm is then determined as the recognition algorithm, the user's voice information is recognized as the voice target information through it, and the corresponding action behavior is executed.
Through this technical solution, once the difference between the voice information input by the user and the voice target information is within the allowable range, the terminal has determined, according to the preset rule, a recognition algorithm unique to that user; this recognition algorithm improves the user's voice recognition rate and enhances the user experience.
The foregoing is a more detailed description of the present invention in conjunction with specific embodiments, and the practice of the invention is not limited to these descriptions. Those skilled in the art to which the invention pertains may make several simple deductions or substitutions without departing from the spirit of the invention, and all such variations fall within its protection scope.
Claims (10)
1. A terminal, comprising:
the determining module is used for determining a recognition algorithm according to a preset rule;
the first receiving module is used for receiving the voice information to be recognized input by a user;
and the recognition module is used for recognizing the voice information to be recognized received by the first receiving module according to the recognition algorithm determined by the determination module.
2. The terminal of claim 1, further comprising a second receiving module, a comparing module, and a correction module;
the first receiving module is also used for receiving voice test information input by a user;
the second receiving module is used for receiving voice target information set by a user;
the recognition module is further used for recognizing the voice test information received by the first receiving module according to a test algorithm to obtain voice recognition information;
the comparison module is used for comparing the voice recognition information obtained by the recognition module with the voice target information;
the correction module is used for correcting the test algorithm according to the difference if the difference between the voice recognition information and the voice target information is not within an error range;
the determining module is specifically configured to determine the test algorithm as the recognition algorithm if the difference between the speech recognition information and the speech target information is within an error range.
3. The terminal of claim 2, wherein the recognition module comprises an extraction submodule and a calculation submodule;
the extraction submodule is used for extracting at least one voice test value of the voice test information;
the calculation submodule is used for calculating at least one voice test value extracted by the extraction submodule according to a test algorithm to obtain at least one voice recognition value, and the voice recognition value is used as voice recognition information.
4. The terminal of claim 3,
the extraction submodule is also used for extracting at least one voice target numerical value of the voice target information;
the calculation submodule is also used for calculating the average error of the at least one voice recognition value extracted by the extraction submodule and the at least one voice target value;
the comparison module is specifically configured to determine whether the average error calculated by the calculation submodule is within a certain range.
5. The terminal according to any of claims 2 to 4, wherein the correction module is specifically configured to correct the structure and parameters of the test algorithm according to the difference.
6. A speech recognition method, comprising:
determining a recognition algorithm according to a preset rule;
receiving voice information to be recognized input by a user;
and recognizing the voice information to be recognized according to the recognition algorithm.
7. The speech recognition method according to claim 6, wherein the determining a recognition algorithm according to the preset rule specifically comprises:
step A, receiving voice test information input by a user and voice target information set by the user;
step B, recognizing the voice test information according to a test algorithm to obtain voice recognition information;
step C, comparing the voice recognition information with the voice target information; if the difference between the voice recognition information and the voice target information is within an error range, proceeding to step E, otherwise proceeding to step D;
step D, correcting the test algorithm according to the difference, and returning to step B;
step E, determining the test algorithm as the recognition algorithm.
8. The speech recognition method according to claim 7, wherein recognizing the speech test information according to a test algorithm to obtain the speech recognition information specifically comprises:
extracting at least one voice test value of the voice test information;
and calculating the at least one voice test value according to a test algorithm to obtain at least one voice recognition value, and taking the voice recognition value as voice recognition information.
9. The speech recognition method of claim 8, wherein comparing the speech recognition information with the speech target information specifically comprises:
extracting at least one voice target value of the voice target information;
and calculating the average error of the at least one voice recognition value and the at least one voice target value, and judging whether the average error is in a certain range.
10. The speech recognition method of any one of claims 7-9, wherein correcting the test algorithm according to the difference specifically comprises:
and correcting the structure and parameters of the test algorithm according to the difference.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410371025.2A CN104112445A (en) | 2014-07-30 | 2014-07-30 | Terminal and voice identification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104112445A true CN104112445A (en) | 2014-10-22 |
Family
ID=51709206
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410371025.2A Pending CN104112445A (en) | 2014-07-30 | 2014-07-30 | Terminal and voice identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104112445A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102034472A (en) * | 2009-09-28 | 2011-04-27 | 戴红霞 | Speaker recognition method based on Gaussian mixture model embedded with time delay neural network |
US20110301953A1 (en) * | 2010-06-07 | 2011-12-08 | Seoby Electronic Co., Ltd | System and method of multi model adaptation and voice recognition |
CN102800316A (en) * | 2012-08-30 | 2012-11-28 | 重庆大学 | Optimal codebook design method for voiceprint recognition system based on nerve network |
CN103514879A (en) * | 2013-09-18 | 2014-01-15 | 广东欧珀移动通信有限公司 | Local voice recognition method based on BP neural network |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106611599A (en) * | 2015-10-21 | 2017-05-03 | 展讯通信(上海)有限公司 | Voice recognition method and device based on artificial neural network and electronic equipment |
CN107301862A (en) * | 2016-04-01 | 2017-10-27 | 北京搜狗科技发展有限公司 | A kind of audio recognition method, identification model method for building up, device and electronic equipment |
CN106057194A (en) * | 2016-06-25 | 2016-10-26 | 浙江合众新能源汽车有限公司 | Voice interaction system |
CN106328124A (en) * | 2016-08-24 | 2017-01-11 | 安徽咪鼠科技有限公司 | Voice recognition method based on user behavior characteristics |
CN106997767A (en) * | 2017-03-24 | 2017-08-01 | 百度在线网络技术(北京)有限公司 | Method of speech processing and device based on artificial intelligence |
CN107170454A (en) * | 2017-05-31 | 2017-09-15 | 广东欧珀移动通信有限公司 | Audio recognition method and Related product |
CN107170454B (en) * | 2017-05-31 | 2022-04-05 | Oppo广东移动通信有限公司 | Speech recognition method and related product |
CN108597500A (en) * | 2018-03-30 | 2018-09-28 | 四川斐讯信息技术有限公司 | A kind of intelligent wearable device and the audio recognition method based on intelligent wearable device |
CN111968620A (en) * | 2019-05-20 | 2020-11-20 | 北京声智科技有限公司 | Algorithm testing method and device, electronic equipment and storage medium |
CN111968620B (en) * | 2019-05-20 | 2024-05-28 | 北京声智科技有限公司 | Algorithm testing method and device, electronic equipment and storage medium |
CN110379418A (en) * | 2019-06-28 | 2019-10-25 | 西安交通大学 | A kind of voice confrontation sample generating method |
CN110379418B (en) * | 2019-06-28 | 2021-08-13 | 西安交通大学 | Voice confrontation sample generation method |
Legal Events
Date | Code | Title | Description
---|---|---|---
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20141022 |