CN104112445A - Terminal and voice identification method - Google Patents


Info

Publication number: CN104112445A
Application number: CN201410371025.2A
Authority: CN (China)
Other languages: Chinese (zh)
Inventor: 张军 (Zhang Jun)
Original and current assignee: Yulong Computer Telecommunication Scientific Shenzhen Co Ltd (assignee listing per Google Patents; Google has not performed a legal analysis and makes no warranty as to its accuracy)
Prior art keywords: voice, information, recognition, test, algorithm
Legal status: Pending (the legal status is an assumption, not a legal conclusion)
Application filed by Yulong Computer Telecommunication Scientific Shenzhen Co Ltd
Priority to CN201410371025.2A
Publication of CN104112445A

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a terminal and a voice recognition method. The terminal comprises: a determination module for determining a recognition algorithm according to a preset rule; a first receiving module for receiving to-be-recognized voice information input by a user; and a recognition module for recognizing, according to the recognition algorithm determined by the determination module, the to-be-recognized voice information received by the first receiving module. This technical scheme solves the existing problem that, when the voice model library contains no voice information matching the voice information input by the user, the input voice cannot be recognized and the corresponding operation cannot be executed. Because the recognition algorithm is determined according to a preset rule, the voice recognition system can be customized for different users, giving "special treatment for special individuals" instead of a uniform processing method. The scheme achieves a certain degree of re-learnability, greatly increases the voice recognition rate, and improves the user experience.

Description

Terminal and voice recognition method
Technical Field
The present invention relates to the field of speech recognition, and in particular, to a terminal and a speech recognition method.
Background
For speech recognition, the feature parameters of the speech to be recognized are extracted, and the speech is then matched against a network or local speech model library according to a certain recognition algorithm, so that the speech is recognized and the corresponding action executed. This, however, does not handle the case where the speech to be recognized does not exist in the speech model library. In addition, the feature parameters of the speech input to be recognized may differ from person to person, so the same action may correspond to different feature parameters; and once the recognition algorithm is formed, it cannot be changed. For a specific individual this can lower the voice recognition rate, so special treatment of specific individuals cannot be achieved and the user experience is greatly reduced.
Disclosure of Invention
The invention provides a terminal and a voice recognition method to solve the prior-art problem that, when the voice model library contains no voice information matching the voice information input by a user, the voice input by the user cannot be recognized and the corresponding operation cannot be executed.
In order to solve the technical problems, the invention adopts the following technical scheme:
a terminal, comprising:
the determining module is used for determining a recognition algorithm according to a preset rule;
the first receiving module is used for receiving the voice information to be recognized input by a user;
and the recognition module is used for recognizing the voice information to be recognized received by the first receiving module according to the recognition algorithm determined by the determination module.
In an embodiment of the present invention, the terminal further includes a second receiving module, a comparison module and a correction module;
the first receiving module is also used for receiving voice test information input by a user;
the second receiving module is used for receiving voice target information set by a user;
the recognition module is further used for recognizing the voice test information received by the first receiving module according to a test algorithm to obtain voice recognition information;
the comparison module is used for comparing the voice recognition information obtained by the recognition module with the voice target information;
the correction module is used for correcting the test algorithm according to the difference if the difference between the voice recognition information and the voice target information is not within an error range;
the determining module is specifically configured to determine the test algorithm as the recognition algorithm if the difference between the speech recognition information and the speech target information is within an error range.
In an embodiment of the present invention, the identification module includes an extraction sub-module and a calculation sub-module;
the extraction submodule is used for extracting at least one voice test value of the voice test information;
the calculation submodule is used for calculating at least one voice test value extracted by the extraction submodule according to a test algorithm to obtain at least one voice recognition value, and the voice recognition value is used as voice recognition information.
In one embodiment of the present invention,
the extraction submodule is also used for extracting at least one voice target numerical value of the voice target information;
the calculation submodule is also used for calculating the average error of the at least one voice recognition value extracted by the extraction submodule and the at least one voice target value;
the comparison module is specifically configured to determine whether the average error calculated by the calculation submodule is within a certain range.
In an embodiment of the present invention, the correction module is specifically configured to correct the structure and parameters of the test algorithm according to the difference.
A speech recognition method comprising:
determining a recognition algorithm according to a preset rule;
receiving voice information to be recognized input by a user;
and recognizing the voice information to be recognized according to the recognition algorithm.
In an embodiment of the present invention, the determining, according to a preset rule, an identification algorithm specifically includes:
Step A: receiving voice test information input by a user and voice target information set by the user;
Step B: recognizing the voice test information according to a test algorithm to obtain voice recognition information;
Step C: comparing the voice recognition information with the voice target information; if the difference between them is within an error range, entering Step E, otherwise entering Step D;
Step D: correcting the test algorithm according to the difference, and returning to Step B;
Step E: determining the test algorithm as the recognition algorithm.
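The Step A to Step E loop above can be expressed as a short sketch. The callables `recognize` and `correct`, the scalar information values, and the tolerance are illustrative assumptions; the patent fixes none of them:

```python
def determine_recognition_algorithm(test_info, target_value, algorithm,
                                    recognize, correct,
                                    tolerance=1e-3, max_rounds=1000):
    """Steps A-E: refine a test algorithm until its recognition output is
    close enough to the user-set target, then adopt it as the recognition
    algorithm."""
    for _ in range(max_rounds):                        # guard against non-convergence
        recognized = recognize(algorithm, test_info)   # Step B
        diff = recognized - target_value               # Step C
        if abs(diff) <= tolerance:
            return algorithm                           # Step E: adopt the algorithm
        algorithm = correct(algorithm, diff)           # Step D, then back to Step B
    raise RuntimeError("test algorithm did not converge")
```

For instance, with `recognize = lambda a, x: a * x` and `correct = lambda a, d: a - 0.5 * d`, the returned algorithm parameter converges toward the target.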
In an embodiment of the present invention, recognizing the voice test information according to a test algorithm, and obtaining the voice recognition information specifically includes:
extracting at least one voice test value of the voice test information;
and calculating the at least one voice test value according to a test algorithm to obtain at least one voice recognition value, and taking the voice recognition value as voice recognition information.
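As a sketch of this step: the patent does not specify which feature is extracted, so the per-frame mean absolute amplitude below is an illustrative assumption, and the test algorithm is modeled as an arbitrary callable applied to each test value:

```python
def extract_voice_values(samples, frame_len=160):
    """Split an audio sample sequence into frames and extract one voice
    test value per frame (mean absolute amplitude; an assumed feature)."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples), frame_len)]
    return [sum(abs(s) for s in f) / len(f) for f in frames]

def apply_test_algorithm(test_values, algorithm):
    """Calculate each voice test value with the test algorithm to obtain
    the voice recognition values; `algorithm` could be, e.g., a trained
    network's forward pass."""
    return [algorithm(v) for v in test_values]
```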
In an embodiment of the present invention, comparing the voice recognition information with the voice target information specifically includes:
extracting at least one voice target value of the voice target information;
and calculating the average error of the at least one voice recognition value and the at least one voice target value, and judging whether the average error is in a certain range.
In an embodiment of the present invention, modifying the test algorithm according to the difference specifically includes:
and correcting the structure and parameters of the test algorithm according to the difference.
The invention has the beneficial effects that:
the invention provides a terminal and a voice recognition method, wherein a recognition algorithm is determined according to a preset rule, the recognition of voice information of a user is completed through the recognition algorithm, the recognition algorithm determined according to the preset rule can be used for customizing a voice recognition system aiming at different users, the recognition algorithm has the advantage of special individual treatment, the method of unified processing is avoided, certain re-learning is realized, the voice recognition rate is greatly increased, and the experience effect of the user is improved.
Drawings
Fig. 1 is a flow chart of a speech recognition method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a BP neural network algorithm according to an embodiment of the present invention;
Fig. 4 is a flowchart of a determination method of a recognition algorithm according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings.
Fig. 1 is a flowchart of a speech recognition method according to an embodiment of the present invention, and as shown in fig. 1, the speech recognition method includes:
s101: determining a recognition algorithm according to a preset rule;
specifically, when the voice recognition function of the terminal is used, the terminal cannot recognize the voice of the user due to the inaccurate pronunciation of some users or the existence of accents and other problems, so that in order to increase the voice recognition rate and improve the experience effect of the user, a recognition algorithm for a special user is determined according to a preset rule, the recognition algorithm can recognize the voice information input by the special user, and corresponding behavior actions are completed.
In this embodiment, before the recognition algorithm is determined according to the preset rule, it is determined whether the terminal is currently in the determination mode of the recognition algorithm. The terminal may be in this mode by default at all times, or may enter it according to the user's operation.
If the terminal is not in the determination mode of the recognition algorithm, the voice information to be recognized input by the user is recognized directly, that is, it is determined whether voice information matching the voice information to be recognized exists in the network or local voice model library, so as to obtain a corresponding recognition result.
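Direct recognition against the model library can be sketched as follows. The library layout (feature value mapped to action) and nearest-neighbour matching with a threshold are illustrative assumptions, since the patent only says that a match is looked up:

```python
def match_in_model_library(voice_value, library, threshold=0.1):
    """Find the library entry whose stored voice value is closest to the
    input value; each entry carries the action to execute.  Returns None
    when nothing is within the (assumed) matching threshold."""
    best_action, best_dist = None, float("inf")
    for stored_value, action in library.items():
        dist = abs(stored_value - voice_value)
        if dist < best_dist:
            best_action, best_dist = action, dist
    return best_action if best_dist <= threshold else None
```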
If the terminal is currently in the determination mode of the identification algorithm, the terminal determines the identification algorithm according to a preset rule, which includes but is not limited to the following modes:
Step A: receiving voice test information input by a user and voice target information set by the user;
specifically, when a user needs to "customize" a recognition algorithm for himself, it is necessary to input not only voice test information but also set voice target information to the terminal, and the terminal receives the voice test information input by the user and the voice target information set by the user, and for the voice target information, the setting manner of the user includes, but is not limited to, the following manners: and the terminal displays a target voice input interface, and a user actively inputs voice information in the voice model library, or the terminal displays the voice information in the voice model library in a chart form for the user to select. The voice model library can be a local voice model library or a voice model library called through a network, the voice model library comprises voice information of at least one user, each voice information has a corresponding action behavior, the action behavior corresponding to each voice information can be set by the user in a self-defined mode or a default mode of a system, and after the user inputs or selects voice target information, the terminal receives the voice target information.
Step B: recognizing the voice test information according to a test algorithm to obtain voice recognition information;
specifically, after the terminal receives the voice test information input by the user, the voice test information is processed, at least one voice test value of the voice test information is extracted to represent the voice test information input by the user, after the terminal obtains the at least one voice test value, the voice test information input by the user is recognized according to a test algorithm, namely, the at least one voice test value is calculated to obtain at least one voice recognition value, and the voice recognition value is used as the voice recognition information.
It should be noted that the test algorithm includes, but is not limited to, a BP neural network algorithm. Fig. 3 is a schematic structural diagram of the BP neural network algorithm, where {x1, x2, x3, ..., xN} are the N extracted feature values of the voice test information input by the user, and {o1, o2, o3, ..., oN} are the N output values calculated by the network. The middle layer includes at least one hidden layer, whose nodes are the hidden-layer nodes of the network; the number of hidden-layer nodes may be randomly generated within a certain range. Between every two connected nodes there is a connection weight Wij, which may be randomly generated within a certain range, such as (-1, 1). In this embodiment, the middle layer preferably includes one hidden layer.
1) Initialization: select a reasonable network structure, take the N feature values of the user's voice test information as the N inputs of the network, set all adjustable parameters Wij to small, uniformly distributed random values, and randomly generate a certain number of hidden-layer nodes.
2) The N inputs to the network are calculated as follows:
Forward computation: for unit j of layer l,
$v_j^{(l)}(n) = \sum_{i=0}^{p} w_{ji}^{(l)}(n)\, y_i^{(l-1)}(n)$
where $y_i^{(l-1)}(n)$ is the output value of unit i of the previous layer (l-1), $v_j^{(l)}(n)$ is the net input of unit j of the current layer l, and p is the number of units in layer (l-1);
if unit j uses the sigmoid function, the output of each node is $y_j^{(l)}(n) = \frac{1}{1 + \exp(-v_j^{(l)}(n))}$;
if neuron j belongs to the first hidden layer (l = 1), then $y_j^{(0)}(n) = x_j(n)$;
if neuron j belongs to the output layer (l = L), then $o_j(n) = y_j^{(L)}(n)$ and $e_j(n) = d_j(n) - o_j(n)$.
Error-feedback backward computation:
for an output unit, $\delta_j^{(L)}(n) = e_j^{(L)}(n)\, O_j(n)\,(1 - O_j(n))$;
for a hidden-layer unit, $\delta_j^{(l)}(n) = y_j^{(l)}(n)\,[1 - y_j^{(l)}(n)] \sum_k \delta_k^{(l+1)}(n)\, w_{kj}^{(l+1)}(n)$;
the weights are then corrected backwards, layer by layer: $w_{ji}^{(l)}(n+1) = w_{ji}^{(l)}(n) + \eta\, \delta_j^{(l)}(n)\, y_i^{(l-1)}(n)$,
where $w_{ji}^{(l)}(n)$ and $w_{ji}^{(l)}(n+1)$ are the connection weight between the two nodes before and after this correction, respectively, and $\eta$ is the learning rate.
3) Perform the next round of forward propagation and weight correction of the network.
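The forward pass, error-feedback deltas, and weight corrections above can be sketched for a single-hidden-layer network. Bias terms are omitted for brevity, and the tiny 1-input network in the usage example is purely illustrative:

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def bp_train_step(x, d, W1, W2, eta=0.5):
    """One BP iteration: forward pass with sigmoid units, output deltas
    e_j * O_j * (1 - O_j), hidden deltas y_j * (1 - y_j) * sum_k delta_k * w_kj,
    then in-place weight updates w += eta * delta * y.  Returns the outputs."""
    # forward: v_j = sum_i w_ji * y_i, then y_j = sigmoid(v_j)
    y1 = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    o = [sigmoid(sum(w * yi for w, yi in zip(row, y1))) for row in W2]
    # backward: output-layer deltas from the error e_j = d_j - o_j
    d2 = [(dk - ok) * ok * (1.0 - ok) for dk, ok in zip(d, o)]
    # hidden-layer deltas propagate the output deltas through W2
    d1 = [y1j * (1.0 - y1j) * sum(d2k * W2[k][j] for k, d2k in enumerate(d2))
          for j, y1j in enumerate(y1)]
    # correct the weights backwards, layer by layer
    for k, row in enumerate(W2):
        for j in range(len(row)):
            row[j] += eta * d2[k] * y1[j]
    for j, row in enumerate(W1):
        for i in range(len(row)):
            row[i] += eta * d1[j] * x[i]
    return o
```

Repeated calls on the same training pattern drive the output toward the target, which is the iteration described in step 3).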
Step C, comparing the voice recognition information with the voice target information, if the difference between the voice recognition information and the voice target information is within an error range, entering the step E, otherwise, entering the step D;
specifically, after receiving the voice target information input by the user, the terminal processes the voice test information, and extracts at least one voice target value of the voice test information to represent the voice test information input by the user.
In this embodiment, comparing the voice recognition information with the voice target information means comparing the voice recognition value(s) representing the voice recognition information with the voice target value(s) representing the voice target information, and judging whether the difference between them is within an error range. For example, the average error between the at least one voice recognition value and the at least one voice target value is calculated, and it is judged whether the average error is within a certain error range; if so, Step E is performed, otherwise Step D.
In this embodiment, the average error between the at least one speech recognition value and the at least one speech target value may be calculated by:
let the quadratic error function of the input-output mode of each sample p be defined as:
the average error cost function of the system is:
where p is the number of mode samples, l is the number of nodes output, dpkAnd OpkRespectively the target output and the actual output of the network, and iterate all the time according to the steps untilUntil the error meets the requirement.
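The per-sample quadratic error and its average over all samples can be computed directly from these definitions:

```python
def average_error(targets, outputs):
    """E = (1/p) * sum_p E_p, with E_p = 1/2 * sum_k (d_pk - O_pk)^2,
    where p indexes the mode samples and k the output nodes.
    `targets` and `outputs` are lists of per-sample output vectors."""
    p = len(targets)
    total = 0.0
    for d_p, o_p in zip(targets, outputs):
        total += 0.5 * sum((d - o) ** 2 for d, o in zip(d_p, o_p))
    return total / p
```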
Step D: correcting the test algorithm according to the difference, and returning to Step B;
specifically, when the difference between the speech recognition information and the speech target information is not within the error range, if the average error between the speech recognition value and the speech target value is not within a certain error range, the average error is fed back to the test algorithm, it should be noted that, as for the test algorithm in step B, it is an initial test algorithm, which can be actively input by the user, or pre-configured in the terminal, in step D, the initial test algorithm in step B is modified according to the difference, after the modification is completed, the test algorithm returns to step B, and then secondary recognition is performed on the initial speech test value or the speech recognition value after the last recognition according to the modified test algorithm to obtain a new speech recognition value, which is used as the speech recognition information, and the speech recognition information after the secondary recognition is compared with the speech target information, if the difference between the voice recognition information after the secondary recognition and the voice target information is within a certain error range, entering the step E; and if the difference between the voice recognition information after secondary recognition and the voice target information is not within a certain error range, the terminal performs secondary correction on the corrected test algorithm according to the difference until the difference between the obtained voice recognition information and the voice target information can stop feedback within a certain error range after the voice test information is recognized according to the corrected test algorithm, and then the step E is carried out.
In the above technical solution, the manner of correcting the test algorithm in step B according to the difference includes, but is not limited to, the following manners: optimizing the structure of the test algorithm and adjusting the parameters of the test algorithm;
the correction mode is based on a gradient descent method, and local convergence is possible to cause a large error of network output, so that a certain disturbance can be added to enable the algorithm to jump out of the local convergence in the parameter optimization process, which relates to the adjustment of a network structure, namely, the number of certain hidden network nodes can be increased or reduced to enable the network to jump out of the local optimum condition. And (C) aiming at the adjustment of the parameters, namely, adjusting the connection weight values among the nodes, and performing reverse calculation according to the error feedback in the step (B) so as to adjust the connection weight values.
Step E: determining the test algorithm as the recognition algorithm.
Specifically, if the difference between the voice recognition information obtained according to the initial test algorithm and the voice target information is within a certain error range, determining the initial test algorithm as the recognition algorithm; and if the difference between the voice recognition information obtained according to the initial test algorithm and the voice target information is not within a certain error range, correcting the initial test algorithm according to the difference until the difference between the voice recognition information obtained according to the corrected test algorithm and the voice target information is within a certain error range, and determining the finally corrected test algorithm as the recognition algorithm.
S102: receiving voice information to be recognized input by a user;
specifically, due to the problems of inaccurate pronunciation or accent of some users, after the terminal determines a recognition algorithm according to a preset rule, the recognition algorithm can recognize the voice information of a special user, for example, when the user determines the recognition algorithm, the input voice test information is dialect voice information, the determined recognition algorithm is a recognition algorithm for the dialect voice information of the user, and then, when the voice information to be recognized input by the user is the dialect voice information, the recognition algorithm determined according to the preset rule can recognize the voice information, so that corresponding behavior actions are completed. When a user needs to enable the terminal to complete corresponding action according to the voice information, the user needs to input the voice information to be recognized to the terminal, and the terminal receives the voice information to be recognized.
S103: and recognizing the voice information to be recognized according to a recognition algorithm.
Specifically, the network or local voice model library contains the voice information of at least one user, and each piece of voice information has a corresponding action. After the terminal receives the voice information to be recognized input by the user, it recognizes that information according to the determined recognition algorithm. When the terminal recognizes the input as target voice information in the network or local voice model library, it can identify the corresponding action from the action associated with that target voice information, and then execute it.
For example: terminals commonly have a voice photographing function. When the user says "photograph" or "eggplant" to the terminal, the terminal performs voice recognition and judges whether the utterance matches the corresponding text, and if so executes the photographing function. A user with inaccurate pronunciation or a stutter, however, may say "photograph" or "eggplant" and still fail to trigger the photographing function, which is embarrassing. In that case the user can start the voice calibration mode: the to-be-recognized voice information the user provides (his own way of saying "photograph" or "eggplant") is calibrated against the target voice information the user selects, such as "photograph", so that the terminal learns that this utterance means the photographing function. In this way a voice recognition system belonging to that user is customized, and the voice photographing function is realized.
Through the above technical scheme, tailor-made voice recognition can be performed for particular users, with the advantage of "special treatment for special individuals". A uniform processing method is avoided, a certain degree of re-learnability is achieved, the voice recognition rate is greatly increased, and the user experience is improved.
As shown in fig. 2, a schematic structural diagram of a terminal according to an embodiment of the present invention is shown in fig. 2, where the terminal includes:
the determining module 1 is used for determining a recognition algorithm according to a preset rule;
the first receiving module 2 is used for receiving the voice information to be recognized input by a user;
and the recognition module 3 is configured to recognize the speech information to be recognized, which is received by the first receiving module 2, according to the recognition algorithm determined by the determining module 1.
Specifically, when the terminal's speech recognition function is used, the terminal may fail to recognize a user's speech because of inaccurate pronunciation, accents, or similar problems. To increase the speech recognition rate and improve the user experience, the determining module 1 determines a recognition algorithm for the particular user according to a preset rule, so that the recognition algorithm can recognize the speech information input by that user and the corresponding action is completed.
In this embodiment, before the determining module 1 determines the recognition algorithm according to the preset rule, the terminal determines whether the terminal is currently in the determination mode of the recognition algorithm, and for the determination mode of the recognition algorithm, the terminal may default to be in the determination mode in real time, or may start the determination mode according to the operation of the user.
If the terminal is not in the determined mode of the recognition algorithm, the voice information to be recognized input by the user is directly recognized, namely whether the voice information to be recognized exists in the network or the local voice model library is determined, so that a corresponding recognition result is obtained.
If the terminal is currently in the determination mode of the recognition algorithm, the terminal further comprises a second receiving module 4, a comparing module 5 and a correcting module 6;
when a user needs to customize a recognition algorithm for himself, not only the voice test information but also the set voice target information need to be input to the terminal, the first receiving module 2 receives the voice test information input by the user, the second receiving module 4 receives the voice target information set by the user, and the setting manner of the user includes, but is not limited to, the following manners: and the terminal displays a target voice input interface, and a user actively inputs voice information in the voice model library, or the terminal displays the voice information in the voice model library in a chart form for the user to select.
In the above technical solution, the recognition module 3 includes an extraction submodule 31 and a calculation submodule 32. When the first receiving module 2 receives the voice test information input by the user, the voice test information is processed: the extraction submodule 31 extracts at least one voice test value to represent it, and the calculation submodule 32 calculates the at least one voice test value according to a test algorithm to obtain at least one voice recognition value, which is taken as the voice recognition information. When the second receiving module 4 receives the voice target information set by the user, the voice target information is likewise processed: the extraction submodule 31 extracts at least one voice target value of the voice target information to represent it, for comparison with the voice recognition value(s).
In the above technical solution, the comparison module 5 compares the voice recognition information with the voice target information; that is, it compares the voice recognition value(s) representing the voice recognition information with the voice target value(s) representing the voice target information and judges whether the difference between them is within an error range, for example, by calculating the average error between the at least one voice recognition value and the at least one voice target value and judging whether the average error is within a certain range.
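As an illustration only, the extraction, calculation and comparison steps above can be sketched as follows. The frame-mean feature, the linear (gain, bias) "test algorithm", and all names are hypothetical assumptions made for the sketch, not taken from the patent.

```python
def extract_values(samples):
    """Extract simple per-frame feature values (here: frame means)."""
    frame = 4
    return [sum(samples[i:i + frame]) / frame
            for i in range(0, len(samples) - frame + 1, frame)]

def apply_test_algorithm(values, gain, bias):
    """Map extracted voice test values to voice recognition values."""
    return [gain * v + bias for v in values]

def average_error(recognized, targets):
    """Mean absolute difference between recognition and target values."""
    return sum(abs(r - t) for r, t in zip(recognized, targets)) / len(targets)

test_values = extract_values([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
recognized = apply_test_algorithm(test_values, gain=1.0, bias=0.0)
print(round(average_error(recognized, [0.26, 0.66]), 3))  # prints 0.01
```

The comparison module's check then reduces to testing whether this average error falls within the configured error range.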
When the difference between the voice recognition information and the voice target information is not within the error range, for example when the average error between the voice recognition values and the voice target values exceeds a certain error range, the average error is fed back to the test algorithm. It should be noted that when the first receiving module 2 receives the voice test information and the second receiving module 4 receives the voice target information, the test algorithm that first recognizes the voice test information is the initial test algorithm. The initial test algorithm can either be actively input by the user, in which case the second receiving module 4 receives it, or be configured in the terminal in advance. The correction module 6 corrects the initial test algorithm according to the difference. After the correction is completed, the recognition module 3 performs secondary recognition, according to the corrected test algorithm, on the initial voice test values or on the voice recognition values obtained in the previous round, to obtain new voice recognition values, which are taken as the voice recognition information; the comparison module 5 then compares this voice recognition information with the voice target information. If the difference between the voice recognition information after secondary recognition and the voice target information is still not within the error range, the correction module 6 corrects the corrected test algorithm again according to the difference. The feedback continues until the difference between the voice recognition information obtained by the recognition module 3 with the corrected test algorithm and the voice target information falls within the error range, at which point the feedback can stop.
If the difference between the voice recognition information obtained with the initial test algorithm and the voice target information is within the error range, the determination module 1 determines the initial test algorithm as the recognition algorithm. Otherwise, the initial test algorithm is corrected according to the difference until the difference between the voice recognition information obtained with the corrected test algorithm and the voice target information is within the error range, and the determination module 1 determines the finally corrected test algorithm as the recognition algorithm.
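The feedback loop just described (recognize, compare, correct, repeat until within tolerance) can be sketched under the simplifying assumption that the test algorithm is a linear map with two correctable parameters; the gradient-style update rule and all numeric values are illustrative choices, not specified by the patent.

```python
def calibrate(test_values, target_values, tolerance=0.01, lr=1.0, max_iters=5000):
    """Correct a simple (gain, bias) test algorithm by feeding the average
    error back until it is within tolerance of the target values."""
    gain, bias = 1.0, 0.0  # initial test algorithm (could also be user-supplied)
    for _ in range(max_iters):
        recognized = [gain * v + bias for v in test_values]          # recognize
        errors = [r - t for r, t in zip(recognized, target_values)]  # compare
        if sum(abs(e) for e in errors) / len(errors) <= tolerance:
            break  # feedback stops; this becomes the recognition algorithm
        # correct: feed the difference back into the algorithm's parameters
        gain -= lr * sum(e * v for e, v in zip(errors, test_values)) / len(errors)
        bias -= lr * sum(errors) / len(errors)
    return gain, bias

# The targets below follow t = 2*v + 0.1, so calibration should approach
# gain 2.0 and bias 0.1 as the feedback converges.
gain, bias = calibrate([0.2, 0.4, 0.6], [0.5, 0.9, 1.3])
```

The loop mirrors the module roles above: the list comprehension plays the recognition module 3, the error check the comparison module 5, and the parameter update the correction module 6.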
When a user needs the terminal to complete a corresponding action according to voice information, the user inputs the voice information to be recognized (for example, dialect voice information) to the terminal, and the first receiving module 2 receives it. Because the network or local voice model library contains the voice information of at least one user, each item of which has a corresponding action behavior, the recognition module 3 recognizes the received voice information according to the recognition algorithm determined by the determination module 1. When the recognition module 3 recognizes the input voice information as target voice information in the network or local voice model library, the terminal determines the action behavior corresponding to the voice information to be recognized from the action behavior corresponding to that target voice information, and executes it.
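Once the recognition algorithm maps the input to an entry in the voice model library, executing the corresponding action behavior amounts to a lookup. The sketch below uses hypothetical utterances and actions purely for illustration; the patent does not name any specific commands.

```python
# Hypothetical voice model library mapping recognized utterances to actions.
VOICE_MODEL_LIBRARY = {
    "open camera": lambda: "camera opened",
    "call home":   lambda: "dialing home",
}

def recognize_and_act(recognized_text):
    """Look up the recognized utterance; run its action behavior if present."""
    action = VOICE_MODEL_LIBRARY.get(recognized_text)
    return action() if action else None

print(recognize_and_act("open camera"))  # prints: camera opened
```

An unrecognized utterance simply yields no action, matching the case where no target voice information in the library is matched.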
Fig. 3 is a flowchart of a method for determining a recognition algorithm according to an embodiment of the present invention. As shown in Fig. 3, the method includes:
S301: receiving voice information;
S302: extracting voice characteristic parameters;
S303: judging whether the terminal has enabled the calibration mode; if so, turning to S304, otherwise turning to S309;
S304: receiving voice target information;
S305: extracting voice target parameters;
S306: recognizing the voice test parameters according to a test algorithm;
S307: calculating a difference value;
S308: judging whether the difference is within an allowable range; if so, turning to S309, otherwise correcting the test algorithm according to the difference and returning to S306;
S309: recognizing the voice information;
S310: executing an action behavior.
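Under illustrative assumptions (features already extracted, a hypothetical (gain, bias) test algorithm, and a bias-only correction rule chosen for the sketch), the S301–S310 flow might look like:

```python
def process(params, calibration_mode, targets=None, algorithm=(1.0, 0.0), tol=0.05):
    """Sketch of S301-S310: optionally calibrate the algorithm, then recognize.

    'params' stands for the voice characteristic parameters of S302; the
    (gain, bias) pair is an assumed parametric test algorithm."""
    gain, bias = algorithm
    if calibration_mode:                       # S303: calibration mode enabled?
        for _ in range(100):                   # S306-S308 feedback loop
            recognized = [gain * p + bias for p in params]            # S306
            diffs = [r - t for r, t in zip(recognized, targets)]      # S307
            if sum(abs(d) for d in diffs) / len(diffs) <= tol:        # S308
                break
            bias -= 0.5 * (sum(diffs) / len(diffs))  # correct test algorithm
    return [gain * p + bias for p in params], (gain, bias)  # S309-S310
```

With params [0.2, 0.4] and targets [0.3, 0.5] (a pure bias shift), each pass halves the mean difference, and the loop terminates once the average difference is within tol.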
In this embodiment, the test algorithm is modified based on the difference to determine the recognition algorithm.
The terminal receives voice information input by a user, processes the voice information, and extracts at least one voice characteristic parameter from it to represent the voice information input by the user.
Before determining the recognition algorithm, the terminal judges whether the calibration mode is currently enabled; the calibration mode indicates whether the terminal needs to determine the recognition algorithm according to the preset rule.
If the terminal has not enabled the calibration mode, the voice information input by the user is recognized directly, in either of the following two ways:
(1) recognizing the voice information according to the existing recognition mode, i.e., determining whether the voice model library contains a voice characteristic parameter matching that of the voice information, and if so, executing the corresponding action;
(2) recognizing the voice information according to the recognition algorithm most recently determined by the terminal according to the preset rule, so that the recognition result matches the voice information in the voice model library and the action behavior desired by the user is executed.
If the terminal has enabled the calibration mode, the terminal receives the voice target information set by the user, processes it, and extracts at least one voice target parameter representing the voice target information set by the user. The voice characteristic parameters are then recognized according to a test algorithm, the difference between the recognized voice characteristic parameters and the voice target parameters is calculated, and the terminal judges whether the difference is within an allowable range. If so, the initial test algorithm is determined as the recognition algorithm, the voice information input by the user is recognized as the voice target information through this recognition algorithm, and the corresponding action is executed. Otherwise, the difference is fed back to the test algorithm, and the test algorithm is modified according to the difference, for example by optimizing its structure and adjusting its parameters. After the modification is completed, secondary recognition is performed on the voice characteristic parameters, or on the parameters obtained in the previous recognition, and the difference between the parameters after secondary recognition and the voice target parameters is calculated. If this difference is within the allowable range, the corrected test algorithm is determined as the recognition algorithm, the voice information input by the user is recognized as the voice target information through the corrected recognition algorithm, and the corresponding action behavior is executed. If not, the corrected test algorithm is corrected again according to the difference, and the feedback stops only when the difference between the recognized voice characteristic parameters and the voice target parameters falls within a certain error range; the finally corrected test algorithm is then determined as the recognition algorithm, the voice information input by the user is recognized as the voice target information through it, and the corresponding action behavior is executed.
Through the above technical scheme, once the difference between the voice information input by the user and the voice target information falls within the allowable range, the terminal determines, according to the preset rule, a recognition algorithm unique to that user; this recognition algorithm improves the user's voice recognition rate and enhances the user experience.
The foregoing is a more detailed description of the present invention in conjunction with specific embodiments, and the practice of the invention is not limited to these descriptions. Those skilled in the art to which the invention pertains can make several simple deductions or substitutions without departing from the spirit of the invention, all of which shall fall within the protection scope of the invention.

Claims (10)

1. A terminal, comprising:
the determining module is used for determining a recognition algorithm according to a preset rule;
the first receiving module is used for receiving the voice information to be recognized input by a user;
and the recognition module is used for recognizing the voice information to be recognized received by the first receiving module according to the recognition algorithm determined by the determination module.
2. The terminal of claim 1, further comprising a second receiving module, a comparing module, and a modifying module;
the first receiving module is also used for receiving voice test information input by a user;
the second receiving module is used for receiving voice target information set by a user;
the recognition module is further used for recognizing the voice test information received by the first receiving module according to a test algorithm to obtain voice recognition information;
the comparison module is used for comparing the voice recognition information obtained by the recognition module with the voice target information;
the correction module is used for correcting the test algorithm according to the difference if the difference between the voice recognition information and the voice target information is not within an error range;
the determining module is specifically configured to determine the test algorithm as the recognition algorithm if the difference between the speech recognition information and the speech target information is within an error range.
3. The terminal of claim 2, wherein the identification module comprises an extraction sub-module, a calculation sub-module;
the extraction submodule is used for extracting at least one voice test value of the voice test information;
the calculation submodule is used for calculating at least one voice test value extracted by the extraction submodule according to a test algorithm to obtain at least one voice recognition value, and the voice recognition value is used as voice recognition information.
4. The terminal of claim 3,
the extraction submodule is also used for extracting at least one voice target value of the voice target information;
the calculation submodule is also used for calculating the average error of the at least one voice recognition value extracted by the extraction submodule and the at least one voice target value;
the comparison module is specifically configured to determine whether the average error calculated by the calculation submodule is within a certain range.
5. The terminal according to any of claims 2 to 4, wherein the modification module is specifically configured to modify the structure and parameters of the test algorithm according to the difference.
6. A speech recognition method, comprising:
determining a recognition algorithm according to a preset rule;
receiving voice information to be recognized input by a user;
and recognizing the voice information to be recognized according to the recognition algorithm.
7. The speech recognition method according to claim 6, wherein the determining a recognition algorithm according to the preset rule specifically comprises:
Step A, receiving voice test information input by a user and voice target information set by the user;
Step B, recognizing the voice test information according to a test algorithm to obtain voice recognition information;
Step C, comparing the voice recognition information with the voice target information; if the difference between the voice recognition information and the voice target information is within an error range, proceeding to Step E, otherwise proceeding to Step D;
Step D, correcting the test algorithm according to the difference, and returning to Step B;
Step E, determining the test algorithm as the recognition algorithm.
8. The speech recognition method according to claim 7, wherein recognizing the speech test information according to a test algorithm to obtain the speech recognition information specifically comprises:
extracting at least one voice test value of the voice test information;
and calculating the at least one voice test value according to a test algorithm to obtain at least one voice recognition value, and taking the voice recognition value as voice recognition information.
9. The speech recognition method of claim 8, wherein comparing the speech recognition information with the speech target information specifically comprises:
extracting at least one voice target value of the voice target information;
and calculating the average error of the at least one voice recognition value and the at least one voice target value, and judging whether the average error is in a certain range.
10. The speech recognition method of any one of claims 7-9, wherein modifying the test algorithm based on the difference specifically comprises:
correcting the structure and parameters of the test algorithm according to the difference.
CN201410371025.2A 2014-07-30 2014-07-30 Terminal and voice identification method Pending CN104112445A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410371025.2A CN104112445A (en) 2014-07-30 2014-07-30 Terminal and voice identification method


Publications (1)

Publication Number Publication Date
CN104112445A true CN104112445A (en) 2014-10-22

Family

ID=51709206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410371025.2A Pending CN104112445A (en) 2014-07-30 2014-07-30 Terminal and voice identification method

Country Status (1)

Country Link
CN (1) CN104112445A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102034472A (en) * 2009-09-28 2011-04-27 戴红霞 Speaker recognition method based on Gaussian mixture model embedded with time delay neural network
US20110301953A1 (en) * 2010-06-07 2011-12-08 Seoby Electronic Co., Ltd System and method of multi model adaptation and voice recognition
CN102800316A (en) * 2012-08-30 2012-11-28 重庆大学 Optimal codebook design method for voiceprint recognition system based on nerve network
CN103514879A (en) * 2013-09-18 2014-01-15 广东欧珀移动通信有限公司 Local voice recognition method based on BP neural network


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106611599A (en) * 2015-10-21 2017-05-03 展讯通信(上海)有限公司 Voice recognition method and device based on artificial neural network and electronic equipment
CN107301862A (en) * 2016-04-01 2017-10-27 北京搜狗科技发展有限公司 A kind of audio recognition method, identification model method for building up, device and electronic equipment
CN106057194A (en) * 2016-06-25 2016-10-26 浙江合众新能源汽车有限公司 Voice interaction system
CN106328124A (en) * 2016-08-24 2017-01-11 安徽咪鼠科技有限公司 Voice recognition method based on user behavior characteristics
CN106997767A (en) * 2017-03-24 2017-08-01 百度在线网络技术(北京)有限公司 Method of speech processing and device based on artificial intelligence
CN107170454A (en) * 2017-05-31 2017-09-15 广东欧珀移动通信有限公司 Audio recognition method and Related product
CN107170454B (en) * 2017-05-31 2022-04-05 Oppo广东移动通信有限公司 Speech recognition method and related product
CN108597500A (en) * 2018-03-30 2018-09-28 四川斐讯信息技术有限公司 A kind of intelligent wearable device and the audio recognition method based on intelligent wearable device
CN111968620A (en) * 2019-05-20 2020-11-20 北京声智科技有限公司 Algorithm testing method and device, electronic equipment and storage medium
CN111968620B (en) * 2019-05-20 2024-05-28 北京声智科技有限公司 Algorithm testing method and device, electronic equipment and storage medium
CN110379418A (en) * 2019-06-28 2019-10-25 西安交通大学 A kind of voice confrontation sample generating method
CN110379418B (en) * 2019-06-28 2021-08-13 西安交通大学 Voice confrontation sample generation method

Similar Documents

Publication Publication Date Title
CN104112445A (en) Terminal and voice identification method
US5212821A (en) Machine-based learning system
EP3971786B1 (en) Feedforward generative neural networks
US9400955B2 (en) Reducing dynamic range of low-rank decomposition matrices
EP3301675B1 (en) Parameter prediction device and parameter prediction method for acoustic signal processing
CN110992934A (en) Defense method and defense device for black box attack model of voice recognition system
KR20190113952A (en) Batch renormalization layer
CN109947940B (en) Text classification method, device, terminal and storage medium
JP2020061173A (en) Answer learning device, answer learning method, answer generating device, answer generating method, and program
CN110277090B (en) Self-adaptive correction method and system for pronunciation dictionary model of user person
CN111191450A (en) Corpus cleaning method, corpus entry device and computer-readable storage medium
WO2012094014A1 (en) Automatic updating of confidence scoring functionality for speech recognition systems
CN110503149B (en) Method and system for classifying local features in image
US20170270909A1 (en) Method for correcting false recognition contained in recognition result of speech of user
CN110362814A (en) A kind of name entity recognition method and device based on improvement loss function
CN113222883A (en) Apparatus and method for handling exception detection
CN115329744B (en) Natural language processing method, system, equipment and storage medium
CN111260056A (en) Network model distillation method and device
CN106384587A (en) Voice recognition method and system thereof
CN106776335B (en) Test case clustering analysis method and system
CN107066560B (en) Text classification method and device
CN112214592A (en) Reply dialogue scoring model training method, dialogue reply method and device
CN108288074A (en) A kind of selection method and system of data characteristics
CN111696555A (en) Method and system for confirming awakening words
WO2020134829A1 (en) Neural network batch normalization optimization method and apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20141022