CN115188381B - Voice recognition result optimization method and device based on click ordering

Voice recognition result optimization method and device based on click ordering

Info

Publication number
CN115188381B
CN115188381B
Authority
CN
China
Prior art keywords
recognition result
voice recognition
target
training
model
Prior art date
Legal status
Active
Application number
CN202210540446.8A
Other languages
Chinese (zh)
Other versions
CN115188381A (en)
Inventor
郑宏
郑善福
阮海鹏
Current Assignee
Seashell Housing Beijing Technology Co Ltd
Original Assignee
Seashell Housing Beijing Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Seashell Housing Beijing Technology Co Ltd
Priority to CN202210540446.8A
Publication of CN115188381A
Application granted
Publication of CN115188381B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Abstract

The application provides a click-ordering-based voice recognition result optimization method and device, wherein the method comprises the following steps: acquiring a first set containing a plurality of voice recognition results; generating a second set based on the click rate of each voice recognition result in the first set; and training a sorting model by taking the objects in the second set as samples to obtain a target model; the click rate of a voice recognition result is the probability that the voice recognition result generated based on a voice input is selected; the target model is used to predict the probability that a voice recognition result generated based on the voice input is selected. According to the click-ordering-based voice recognition result optimization method provided by the embodiment of the application, the sorting model is trained according to the target parameters of the acquired voice recognition results, the probability that a voice recognition result is selected is predicted by the sorting model, the recognition results of the voice input are ranked based on that probability, and the accuracy of the voice recognition results is thereby improved.

Description

Voice recognition result optimization method and device based on click ordering
Technical Field
The present application relates to the field of speech recognition, and in particular, to a method and apparatus for optimizing speech recognition results based on click ordering.
Background
The speech recognition technology can convert speech input by a user into text sentences; for example, the user can perform speech input in an input method. In the related art, after the input method obtains the user's voice input, an N-best algorithm selects the voice recognition result with the highest combined language model and acoustic model score as the final voice recognition result, and the N candidate results are displayed to the user for selection; the user completes the input by choosing, from the N candidate results, the sentence closest to his or her intention.
Due to environmental noise, limitations of the training corpus, and similar factors, the first-ranked speech recognition candidate is not necessarily the best one. The traditional optimization approach is to keep improving the speech recognition front end and the noise processing stage, to optimize the acoustic and language model training corpora, or to optimize the speech recognition model directly. However, these methods can hardly achieve refined processing of each individual audio clip and recognition result, so the accuracy of the speech recognition results is difficult to raise to a high level.
Disclosure of Invention
The application aims to provide a click ordering-based voice recognition result optimization method and device, which are used for improving the accuracy of voice recognition results.
The application provides a click ordering-based voice recognition result optimization method, which comprises the following steps:
acquiring a first set containing a plurality of voice recognition results; generating a second set based on the click rate of each speech recognition result in the first set; training the sorting model by taking the objects in the second set as samples to obtain a target model; the click rate of the voice recognition result is the probability that the voice recognition result generated based on voice input is selected; the target model is used to predict a probability that a speech recognition result generated based on the speech input is selected.
Optionally, the generating a second set based on the click rate of each voice recognition result in the first set includes: generating a second set according to the target parameters of each voice recognition result in the first set; wherein the target parameters include at least one of: the length of the sentence, whether the sentence contains English characters, whether the sentence contains numbers, the confidence of the voice recognition result, the number of times the sentence is selected, the number of times the sentence is displayed, and the probability of the sentence being selected.
Optionally, before training the sorting model by using the objects in the second set as samples to obtain the target model, the method further includes: dividing the objects in the second set into a training set and a prediction set according to a preset proportion; labeling each training sample in the training set based on the selected condition of the voice recognition result indicated by each sample in the training set; wherein, the sample containing the selected speech recognition result is marked as 1, and the sample containing the unselected speech recognition result is marked as 0.
Optionally, training the sorting model by using the objects in the second set as samples to obtain a target model, including: generating parallel decision trees based on first parameters of samples in the training set, and respectively predicting scores of the samples in the training set; obtaining a prediction result of the target sample based on the prediction result of each decision tree in the parallel decision trees aiming at the target sample; wherein the first parameter is a parameter in the target parameters; the target sample is any sample in the training set.
Optionally, the first parameter is a confidence level of the speech recognition result.
Optionally, after training the sorting model by using the objects in the second set as samples to obtain the target model, the method further includes: acquiring a third set containing a plurality of voice recognition results obtained based on the target voice input; generating a fourth set according to the target parameters of each voice recognition result in the third set; and predicting each voice recognition result in the fourth set by using the target model to obtain the probability that each voice recognition result in the fourth set is selected.
The application also provides a device for optimizing the voice recognition result based on click ordering, which comprises the following steps:
the acquisition module is used for acquiring a first set containing a plurality of voice recognition results; the data processing module is used for generating a second set based on the click rate of each voice recognition result in the first set acquired by the acquisition module; the training module is used for training the sorting model by taking the objects in the second set as samples to obtain a target model; the click rate of the voice recognition result is the probability that the voice recognition result generated based on voice input is selected; the target model is used to predict a probability that a speech recognition result generated based on the speech input is selected.
Optionally, the data processing module is specifically configured to generate a second set according to the target parameters of each voice recognition result in the first set; wherein the target parameters include at least one of: the length of the sentence, whether the sentence contains English characters, whether the sentence contains numbers, the confidence of the voice recognition result, the number of times the sentence is selected, the number of times the sentence is displayed, and the probability of the sentence being selected.
Optionally, the apparatus further comprises: the dividing module is used for dividing the objects in the second set into a training set and a prediction set according to a preset proportion; the training module is specifically configured to label each training sample in the training set based on a selected condition of a speech recognition result indicated by each sample in the training set; wherein, the sample containing the selected speech recognition result is marked as 1, and the sample containing the unselected speech recognition result is marked as 0.
Optionally, the training module is specifically configured to generate a parallel decision tree based on a first parameter of the samples in the training set, and respectively predict scores of the samples in the training set; the training module is specifically further configured to obtain a prediction result of the target sample based on the prediction result of each decision tree in the parallel decision trees for the target sample; wherein the first parameter is a parameter in the target parameters; the target sample is any sample in the training set.
Optionally, the first parameter is a confidence level of the speech recognition result.
Optionally, the apparatus further comprises: a prediction module; the acquisition module is further used for acquiring a third set containing a plurality of voice recognition results obtained based on target voice input; the data processing module is further used for generating a fourth set according to the target parameters of each voice recognition result in the third set; and the prediction module is used for predicting each voice recognition result in the fourth set by using the target model to obtain the probability that each voice recognition result in the fourth set is selected.
The application also provides a computer program product comprising computer programs/instructions which when executed by a processor implement the steps of a click-ordering based speech recognition result optimization method as described in any of the above.
The application also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor realizes the steps of the click sequencing-based voice recognition result optimizing method when executing the program.
The present application also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the click-ordering based speech recognition result optimization method as described in any of the above.
According to the click-ordering-based voice recognition result optimizing method and device, a first set containing a plurality of voice recognition results is obtained, and a second set is generated based on the click rate of each voice recognition result in the first set, wherein the click rate of a voice recognition result is the probability that the voice recognition result generated based on a voice input is selected. The sorting model is then trained by taking the objects in the second set as samples to obtain a target model. After a voice input of a user is obtained and N voice recognition results are obtained from the voice input, the probability that each of the N voice recognition results is selected can be predicted by the target model, so that the voice recognition result with the highest probability of being selected is displayed to the user, and the accuracy of the voice recognition results is improved.
Drawings
In order to more clearly illustrate the application or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a click ordering-based voice recognition result optimization method provided by the application;
FIG. 2 is a schematic diagram of a click ordering-based speech recognition result optimizing apparatus according to the present application;
fig. 3 is a schematic structural diagram of an electronic device provided by the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first", "second" and the like in the description and in the claims are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein, and that the objects identified by "first", "second", etc. are generally of one class and do not limit the number of objects; for example, the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally means that the associated objects are in an "or" relationship.
The voice recognition technology, which is a man-machine interaction technology, can be applied to various scenes, such as a voice input method, voice control and the like. The purpose of this is to communicate with the machine in voice, letting the machine understand what you say. Taking a voice input method as an example, after a user inputs voice, the input method recognizes the voice input of the user, sorts a plurality of obtained voice recognition candidate results for the user to select, and the user selects a candidate item with the closest semantic meaning from the plurality of voice recognition candidate results to finish the input operation; taking voice control as an example, after the equipment receives voice input of a user, a plurality of voice recognition results are generated, each voice recognition result has a corresponding score, the equipment takes the voice recognition result with the highest score as a control instruction of the user, and corresponding actions are executed according to the control instruction.
However, due to environmental noise, limitations of the training corpus, and other factors, the first-ranked, i.e. highest-scoring, speech recognition candidate is not necessarily the best speech recognition result, which may force the user to select another candidate or re-enter the speech.
Aiming at the problem of low accuracy of the voice recognition result in the related technology, the application discloses a voice recognition result optimizing method based on click sequencing, which improves the accuracy of the voice recognition result.
The click sequencing-based voice recognition result optimization method provided by the embodiment of the application is described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.
As shown in fig. 1, the method for optimizing a speech recognition result based on click sequencing according to the embodiment of the present application may include the following steps 101 to 103:
step 101, a first set including a plurality of speech recognition results is obtained.
The plurality of speech recognition results are, for example, a plurality of speech recognition results obtained by performing speech recognition on a speech input.
For example, taking the above speech recognition application scenario as an input method scenario as an example, after a user performs speech input, the input method performs recognition on the speech input through a local speech recognition model or a speech recognition model on a server, so as to obtain a plurality of speech recognition candidate results. The first set includes a plurality of speech recognition results for a plurality of speech inputs.
Step 102, generating a second set based on the click rate of each voice recognition result in the first set.
The click rate of the voice recognition result is the probability that the voice recognition result generated based on voice input is selected.
Illustratively, the second set is used for training the ranking model; therefore, the objects in the second set need to be converted into a data format that the ranking model can handle. The processed objects are then used as samples to train the ranking model.
Illustratively, the data format that the ranking model can handle may include at least one of the following parameters: the length of the sentence, whether the sentence contains English characters, whether the sentence contains numbers, the confidence of the voice recognition result, the number of times the sentence is selected, the number of times the sentence is displayed, and the probability of the sentence being selected.
It should be noted that the data format may include several of the above parameters, selected according to the actual situation. In the embodiment of the application, the number of times the sentence is selected, the number of times the sentence is displayed, and the ratio of the former to the latter, namely the probability of the sentence being selected, are mainly used as parameters.
"The sentence is selected" can be understood, in the input method scenario, as the speech recognition candidate being chosen by the user; in the voice control scenario, it can be understood as the highest-scoring speech recognition result being taken by the machine as the user's control instruction. Different scenarios may interpret it differently.
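As a concrete illustration of how one recognition candidate can be converted into the parameter vector described above, the following Python sketch builds such a feature list; the function name and field layout are assumptions made for illustration and are not specified by the application.

    import re

    def candidate_features(sentence: str, confidence: float,
                           times_selected: int, times_shown: int) -> list:
        """Build the feature vector described above for one recognition candidate.

        All field names here are illustrative; the application only states which
        quantities are used, not how they are encoded.
        """
        has_english = int(bool(re.search(r"[A-Za-z]", sentence)))
        has_digit = int(bool(re.search(r"\d", sentence)))
        # Click rate: times the candidate was selected / times it was displayed.
        click_rate = times_selected / times_shown if times_shown > 0 else 0.0
        return [
            len(sentence),      # sentence length
            has_english,        # contains English characters
            has_digit,          # contains numbers
            confidence,         # recognizer confidence
            times_selected,     # times the sentence was selected
            times_shown,        # times the sentence was displayed
            click_rate,         # probability of being selected
        ]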
And step 103, training the sorting model by taking the objects in the second set as samples to obtain a target model.
Wherein the target model is used for predicting the probability that a speech recognition result generated based on the speech input is selected.
Illustratively, the ranking model described above may fall into the following three categories: 1. Pointwise: each sample is scored individually by classification or regression; typical representatives are logistic regression and XGBoost. 2. Pairwise: the partial-order relationship between every pair of samples is considered; typical representatives are RankSVM and LambdaMART. 3. Listwise: the whole ranking is taken as the optimization target, and the model is optimized by minimizing the gap between the predicted ranking distribution and the true ranking; a typical representative is ListNet.
It should be noted that the click-ordering-based voice recognition result optimization method provided in the embodiment of the present application is described by taking XGBoost as an example of the above-mentioned ranking model.
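For concreteness, the pointwise and pairwise settings mentioned above could be configured with the XGBoost Python package roughly as follows; the hyperparameter values are illustrative assumptions rather than values prescribed by this application.

    import xgboost as xgb

    # Pointwise setting (used as the example in this application): each
    # candidate is scored independently with a binary logistic objective.
    pointwise_model = xgb.XGBClassifier(
        objective="binary:logistic",
        n_estimators=200,
        max_depth=6,
    )

    # Pairwise setting for comparison: candidates of the same utterance are
    # grouped into one query and ranked against each other.
    pairwise_model = xgb.XGBRanker(
        objective="rank:pairwise",
        n_estimators=200,
        max_depth=6,
    )

The pointwise setting fits naturally with the 1/0 labelling described later, which is presumably why a pointwise scorer is used as the running example here.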
Illustratively, based on the click rate of each speech recognition result in the first set, the objects in the first set are subjected to data processing to obtain a second set for training the ranking model. And then training the ordering model by taking the objects in the second set as samples to obtain a target model for ordering the voice recognition result.
For example, after the target model is obtained, the target model may be used to sort a plurality of speech recognition results of the speech input, so as to obtain a speech recognition result with the highest probability of being selected.
In this manner, a first set containing a plurality of voice recognition results is obtained, and a second set is generated based on the click rate of each voice recognition result in the first set, wherein the click rate of a voice recognition result is the probability that the voice recognition result generated based on a voice input is selected. The sorting model is then trained by taking the objects in the second set as samples to obtain a target model. After a voice input of a user is obtained and N voice recognition results are obtained from the voice input, the probability that each of the N voice recognition results is selected can be predicted by the target model, so that the voice recognition result with the highest probability of being selected is displayed to the user, and the accuracy of the voice recognition results is improved.
Optionally, in the embodiment of the present application, specifically, according to the target parameter of each speech recognition result in the first set, the object in the first set may be processed to obtain the second set for training the ranking model.
Illustratively, the step 102 described above may include the following step 102a:
102a, generating a second set according to the target parameters of each voice recognition result in the first set;
wherein the target parameters include at least one of: the length of the sentence, whether the sentence contains English characters, whether the sentence contains numbers, the confidence of the voice recognition result, the number of times the sentence is selected, the number of times the sentence is displayed, and the probability of the sentence being selected.
It can be appreciated that the confidence of the speech recognition result indicates how reliable the recognition result is; the number of times the sentence is displayed indicates how many times the speech recognition result was presented as a candidate regarded as closest to the intent of the speech input.
Illustratively, the speech recognition results in the first set may be processed based on at least one of the above target parameters. To maximize the accuracy of the target model's predictions, in one possible implementation the following are obtained for each speech recognition result: the length of the sentence, whether the sentence contains English characters, whether the sentence contains numbers, the confidence of the speech recognition result, the number of times the sentence is selected, the number of times the sentence is displayed, and the probability of the sentence being selected.
Further, in order to be able to use all objects in the second set for training of the ranking model, it is also necessary to divide the second set into training data and test data.
Illustratively, before the step 103, the method for optimizing a speech recognition result based on click sequencing according to the embodiment of the present application may further include the following steps 102b1 and 102b2:
step 102b1, dividing the objects in the second set into a training set and a prediction set according to a preset proportion.
Step 102b2, labeling each training sample in the training set based on the selected condition of the speech recognition result indicated by each sample in the training set.
Wherein, the sample containing the selected speech recognition result is marked as 1, and the sample containing the unselected speech recognition result is marked as 0.
For example, the preset ratio may be 7:3, that is, the number of samples in the training set is 70% of the number of objects in the second set, and the number of samples in the prediction set is 30% of the number of objects in the second set. The training set is used for training the sorting model, the prediction set is used for verifying the training result of the sorting model, and the optimal model parameters are obtained after multiple rounds of parameter adjustment.
Illustratively, each sample in the training set may need to be labeled, and the sample may be labeled based on the selected condition of the speech recognition result indicated by each sample. Taking the input method scene as an example, when a voice input result indicated by a certain sample is selected by a user, the sample is marked as 1, otherwise, the sample is marked as 0.
Illustratively, as shown in Table 1 below, the objects in the second set may include the following:
TABLE 1
Illustratively, the samples in the training set may be processed in the data format described in Table 1 above.
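A minimal sketch of the 7:3 division and the 1/0 labelling might look as follows; the toy rows follow the Table-1-style field layout described above and are invented purely for illustration.

    from sklearn.model_selection import train_test_split

    # Toy samples in the Table-1-style layout: (sentence length, contains
    # English, contains numbers, confidence, times selected, times displayed,
    # selection probability). Values are illustrative only.
    X = [
        [12, 0, 0, 0.91, 8, 10, 0.8],
        [12, 0, 1, 0.74, 1, 10, 0.1],
        [11, 1, 0, 0.62, 0, 10, 0.0],
        [13, 0, 0, 0.55, 1, 10, 0.1],
    ]
    # Label: 1 if the candidate indicated by the sample was selected, else 0.
    y = [1, 0, 0, 0]

    # Preset ratio 7:3 -> 70% training set, 30% prediction set.
    X_train, X_pred, y_train, y_pred = train_test_split(
        X, y, train_size=0.7, random_state=42
    )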
Therefore, after the data processing is carried out on the objects in the first set, a sample for training the sequencing model is obtained, and the probability of the voice recognition candidate item being selected can be accurately predicted based on the target model obtained by training the sample, so that the accuracy of the voice recognition result is improved.
Optionally, in the embodiment of the present application, the training process of the XGBoost ranking model may specifically be performed with reference to the following steps.
Illustratively, the step 103 may specifically include the following steps 103a1 and 103a2:
step 103a1, generating a parallel decision tree based on the first parameters of the samples in the training set, and respectively predicting the scores of the samples in the training set.
Step 103a2, obtaining a prediction result of the target sample based on the prediction result of each decision tree in the parallel decision trees for the target sample.
Wherein the first parameter is a parameter in the target parameters; the target sample is any sample in the training set.
It will be appreciated that XGBoost is an optimized distributed gradient boosting library designed to be efficient, flexible and portable. It implements machine learning algorithms under the gradient boosting framework. XGBoost provides parallel tree boosting (also known as GBDT or GBM) that can solve many data science problems quickly and accurately. That is, XGBoost can process data efficiently and quickly through its parallel tree training method.
Illustratively, since XGBoost builds an ensemble of boosted trees, multiple parallel trees may be derived based on different parameters. For example, a speech recognition result with high confidence is usually closer to the intention than one with low confidence and therefore has a higher probability of being selected; and because many users' English pronunciation is non-standard, a speech recognition result containing English tends to be less accurate than one without English. Based on this, the samples can first be split by confidence and then further split by whether English is contained; each sample is scored one by one, and the scores that the sample obtains in each tree of the parallel decision trees are added to obtain the comprehensive score of the sample.
In one possible implementation manner, the first parameter is a confidence level of the speech recognition result.
In this way, a parallel tree is generated according to the confidence of the speech recognition of the sample, and the prediction result of the target sample is obtained based on the prediction result of the parallel decision tree for the target sample, so that the accuracy of the prediction result is further improved.
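The per-tree scoring and summation described above is how a gradient-boosted tree ensemble produces its output. A self-contained sketch, with invented toy data and illustrative hyperparameters, might look as follows.

    import numpy as np
    import xgboost as xgb

    # Toy training data in the Table-1-style layout used earlier; the rows and
    # all hyperparameter values below are illustrative assumptions.
    X_train = np.array([
        [12, 0, 0, 0.91, 8, 10, 0.80],   # high-confidence candidate, often chosen
        [12, 0, 1, 0.74, 1, 10, 0.10],
        [11, 1, 0, 0.62, 0, 10, 0.00],   # contains English, never chosen
        [13, 0, 0, 0.55, 1, 10, 0.10],
        [ 9, 0, 0, 0.88, 6,  8, 0.75],
        [ 9, 1, 1, 0.40, 0,  8, 0.00],
    ])
    y_train = np.array([1, 0, 0, 0, 1, 0])

    model = xgb.XGBClassifier(
        objective="binary:logistic",
        n_estimators=50,
        max_depth=3,
        learning_rate=0.1,
    )
    model.fit(X_train, y_train)

    # The final score of a sample is the sum of the leaf scores it reaches in
    # every boosted tree (plus the base score), matching the "add per-tree
    # scores to get a comprehensive score" description above.
    margins = model.get_booster().predict(xgb.DMatrix(X_train), output_margin=True)
    print("summed tree scores (margins):", margins)
    print("predicted selection probabilities:", model.predict_proba(X_train)[:, 1])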
Optionally, in the embodiment of the present application, after the target model is obtained through the above steps, the target model may be used to predict the voice input result of the voice input.
Illustratively, after the step 103, the method for optimizing a speech recognition result based on click sequencing according to the embodiment of the present application may further include the following steps 104 to 106:
step 104, obtaining a third set of a plurality of speech recognition results comprising input of the target speech.
Step 105, generating a fourth set according to the target parameters of each voice recognition result in the third set.
And 106, predicting each voice recognition result in the fourth set by using the target model to obtain the probability that each voice recognition result in the fourth set is selected.
The speech recognition result with the highest predicted probability of being selected in the fourth set is taken as the optimal speech recognition result.
For example, the target voice input may be a voice input of a user; after the device obtains the target voice input, it performs voice recognition on it through a language model and an acoustic model to obtain the N-best list, i.e., the N best-scoring voice recognition results.
It can be understood that the N-Best is a plurality of speech recognition results obtained based on the target speech input.
Illustratively, after obtaining the plurality of speech recognition results, the target parameters of each speech input result are obtained and a fourth set is generated according to the method in step 102. Then, using each object in the fourth set as input data of the target model, predicting the probability that each speech recognition result is selected by using the target model. And the speech recognition result with the highest selected probability among the multiple speech recognition results predicted by the target model is the optimal speech recognition result.
For example, taking the input method scenario as an example, after the voice input of the user is obtained, a plurality of voice recognition candidate results are obtained based on the voice input. At this point the candidates are not displayed to the user directly; instead, they are scored by the target model, sorted in descending order of the predicted probability of being selected by the user, and the sorted results are then displayed to the user.
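A sketch of this prediction-and-sort step is given below; the function and variable names are assumptions for illustration, and `model` denotes a classifier trained as in the earlier sketch.

    import numpy as np

    def rank_candidates(model, candidates):
        """Re-rank N-best candidates by predicted selection probability.

        Each element of `candidates` is a (sentence, feature_vector) pair; the
        names are illustrative, not taken from the application.
        """
        features = np.array([feature_vec for _, feature_vec in candidates])
        probabilities = model.predict_proba(features)[:, 1]
        order = np.argsort(probabilities)[::-1]   # descending by probability
        return [(candidates[i][0], float(probabilities[i])) for i in order]

The first element of the returned list corresponds to the candidate most likely to be selected, which is the one the input method would display at the top.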
According to the click-ordering-based voice recognition result optimization method provided by the embodiment of the application, data samples built from the click rates of voice recognition results are used to train the sorting model to obtain the target model. A plurality of voice recognition results obtained by recognizing a voice input are then scored by the target model, so that the voice recognition result with the highest selection probability is obtained. The method greatly improves the accuracy of the voice recognition results, so that the first-ranked voice recognition result, or the voice recognition result selected by the machine, is always the best voice recognition result.
It should be noted that, in the click-ordering-based voice recognition result optimizing method provided in the embodiments of the present application, the execution subject may be a click-ordering-based voice recognition result optimizing device, or a control module within that device for executing the method. In the embodiment of the application, the click-ordering-based voice recognition result optimizing device provided by the embodiment of the application is described by taking the device executing the method as an example.
In the embodiment of the present application, the method is illustrated with reference to the drawings. The click-ordering-based voice recognition result optimization method is described above in combination with one of the drawings of the embodiment of the application. In specific implementation, the method shown in the above method drawing may also be implemented in combination with any other combinable drawing illustrated in the above embodiment, which is not repeated here.
The click-ordering-based voice recognition result optimizing device provided by the application is described below; the device described below and the click-ordering-based voice recognition result optimization method described above may be referred to in correspondence with each other.
Fig. 2 is a schematic diagram of a structure of a click ordering-based voice recognition result optimizing apparatus according to an embodiment of the present application, where, as shown in fig. 2, the structure specifically includes:
an acquisition module 201, configured to acquire a first set including a plurality of speech recognition results; a data processing module 202, configured to generate a second set based on the click rate of each speech recognition result in the first set acquired by the acquiring module 201; the training module 203 is configured to train the ranking model by using the objects in the second set as samples, so as to obtain a target model; the click rate of the voice recognition result is the probability that the voice recognition result generated based on voice input is selected; the target model is used to predict a probability that a speech recognition result generated based on the speech input is selected.
In this manner, a first set containing a plurality of voice recognition results is obtained, and a second set is generated based on the click rate of each voice recognition result in the first set, wherein the click rate of a voice recognition result is the probability that the voice recognition result generated based on a voice input is selected. The sorting model is then trained by taking the objects in the second set as samples to obtain a target model. After a voice input of a user is obtained and N voice recognition results are obtained from the voice input, the probability that each of the N voice recognition results is selected can be predicted by the target model, so that the voice recognition result with the highest probability of being selected is displayed to the user, and the accuracy of the voice recognition results is improved.
Optionally, the data processing module 202 is specifically configured to generate a second set according to the target parameters of each voice recognition result in the first set; wherein the target parameters include at least one of: the length of the sentence, whether the sentence contains English characters, whether the sentence contains numbers, the confidence of the voice recognition result, the number of times the sentence is selected, the number of times the sentence is displayed, and the probability of the sentence being selected.
Optionally, the apparatus further comprises: a dividing module 204, configured to divide the objects in the second set into a training set and a prediction set according to a preset proportion; the training module 203 is specifically configured to label each training sample in the training set based on a selected condition of a speech recognition result indicated by each sample in the training set; wherein, the sample containing the selected speech recognition result is marked as 1, and the sample containing the unselected speech recognition result is marked as 0.
Therefore, after the data processing is carried out on the objects in the first set, a sample for training the sequencing model is obtained, and the probability of the voice recognition candidate item being selected can be accurately predicted based on the target model obtained by training the sample, so that the accuracy of the voice recognition result is improved.
Optionally, the training module 203 is specifically configured to generate a parallel decision tree based on a first parameter of the samples in the training set, and respectively predict scores of the samples in the training set; the training module 203 is specifically further configured to obtain a prediction result of the target sample based on the prediction result of each decision tree in the parallel decision trees for the target sample; wherein the first parameter is a parameter in the target parameters; the target sample is any sample in the training set.
Optionally, the first parameter is a confidence level of the speech recognition result.
In this way, a parallel tree is generated according to the confidence of the speech recognition of the sample, and the prediction result of the target sample is obtained based on the prediction result of the parallel decision tree for the target sample, so that the accuracy of the prediction result is further improved.
Optionally, the apparatus further comprises: a prediction module 204; the obtaining module 201 is further configured to obtain a third set including a plurality of speech recognition results obtained based on the target speech input; the data processing module 202 is further configured to generate a fourth set according to the target parameter of each speech recognition result in the third set; the prediction module 204 is configured to predict each speech recognition result in the fourth set using the target model, so as to obtain a probability that each speech recognition result in the fourth set is selected.
According to the voice recognition result optimizing device based on click ordering provided by the embodiment of the application, data samples built from the click rates of voice recognition results are used to train the sorting model, so that the target model is obtained. A plurality of voice recognition results obtained by recognizing a voice input are then scored by the target model, so that the voice recognition result with the highest selection probability is obtained. This greatly improves the accuracy of the voice recognition results, so that the first-ranked voice recognition result, or the voice recognition result selected by the machine, is always the best voice recognition result.
Fig. 3 illustrates a physical schematic diagram of an electronic device, as shown in fig. 3, where the electronic device may include: processor 310, communication interface (Communications Interface) 320, memory 330 and communication bus 340, wherein processor 310, communication interface 320, memory 330 accomplish communication with each other through communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform a click-ordering based speech recognition result optimization method comprising: acquiring a first set containing a plurality of voice recognition results; generating a second set based on the click rate of each speech recognition result in the first set; training the sorting model by taking the objects in the second set as samples to obtain a target model; the click rate of the voice recognition result is the probability that the voice recognition result generated based on voice input is selected; the target model is used to predict a probability that a speech recognition result generated based on the speech input is selected.
Further, the logic instructions in the memory 330 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art or in part, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present application also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, are capable of performing the click-sequencing based speech recognition result optimization method provided by the above methods, the method comprising: acquiring a first set containing a plurality of voice recognition results; generating a second set based on the click rate of each speech recognition result in the first set; training the sorting model by taking the objects in the second set as samples to obtain a target model; the click rate of the voice recognition result is the probability that the voice recognition result generated based on voice input is selected; the target model is used to predict a probability that a speech recognition result generated based on the speech input is selected.
In yet another aspect, the present application also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the above provided click ordering based speech recognition result optimization method, the method comprising: acquiring a first set containing a plurality of voice recognition results; generating a second set based on the click rate of each speech recognition result in the first set; training the sorting model by taking the objects in the second set as samples to obtain a target model; the click rate of the voice recognition result is the probability that the voice recognition result generated based on voice input is selected; the target model is used to predict a probability that a speech recognition result generated based on the speech input is selected.
The apparatus embodiments described above are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present application without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (9)

1. The click ordering-based voice recognition result optimizing method is characterized by comprising the following steps of:
acquiring a first set containing a plurality of voice recognition results;
processing the voice recognition result in the first set based on the target parameter to generate a second set;
training the sorting model by taking the objects in the second set as samples to obtain a target model;
wherein the target parameter comprises a probability that the sentence is selected, and at least one of: the length of the sentence, whether the sentence contains English characters, whether the sentence contains numbers, the confidence level of the voice recognition result, the number of times the sentence is selected, and the number of times the sentence is displayed; the target model is used to predict a probability that a speech recognition result generated based on the speech input is selected.
2. The method of claim 1, wherein the training the ranking model using the objects in the second set as samples further comprises, prior to obtaining the target model:
dividing the objects in the second set into a training set and a prediction set according to a preset proportion;
labeling each training sample in the training set based on the selected condition of the voice recognition result indicated by each sample in the training set;
wherein, the sample containing the selected speech recognition result is marked as 1, and the sample containing the unselected speech recognition result is marked as 0.
3. The method of claim 2, wherein training the ranking model using the objects in the second set as samples results in a target model, comprising:
generating parallel decision trees based on first parameters of samples in the training set, and respectively predicting scores of the samples in the training set;
obtaining a prediction result of the target sample based on the prediction result of each decision tree in the parallel decision trees aiming at the target sample;
wherein the first parameter is a parameter in the target parameters; the target sample is any sample in the training set.
4. A method according to claim 3, wherein the first parameter is a confidence level of the speech recognition result.
5. The method according to any one of claims 1 to 4, wherein after training the ranking model with the objects in the second set as samples to obtain the target model, the method further comprises:
acquiring a third set containing a plurality of voice recognition results obtained based on the target voice input;
generating a fourth set according to the target parameters of each voice recognition result in the third set;
and predicting each voice recognition result in the fourth set by using the target model to obtain the probability that each voice recognition result in the fourth set is selected.
6. A click ordering based speech recognition result optimizing apparatus, the apparatus comprising:
the acquisition module is used for acquiring a first set containing a plurality of voice recognition results;
the data processing module is used for processing the voice recognition result in the first set based on the target parameter to generate a second set;
the training module is used for training the sorting model by taking the objects in the second set as samples to obtain a target model;
wherein the target parameter comprises a probability that the sentence is selected, and at least one of: the length of the sentence, whether the sentence contains English characters, whether the sentence contains numbers, the confidence of the voice recognition result, and the number of times the sentence is selected; the target model is used to predict a probability that a speech recognition result generated based on the speech input is selected.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the click-sequencing based speech recognition result optimization method of any one of claims 1 to 5 when the program is executed by the processor.
8. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the click-sequencing based speech recognition result optimization method according to any one of claims 1 to 5.
9. A computer program product comprising computer programs/instructions which when executed by a processor implement the steps of the click-sequencing based speech recognition result optimization method of any one of claims 1 to 5.
CN202210540446.8A 2022-05-17 2022-05-17 Voice recognition result optimization method and device based on click ordering Active CN115188381B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210540446.8A CN115188381B (en) 2022-05-17 2022-05-17 Voice recognition result optimization method and device based on click ordering


Publications (2)

Publication Number Publication Date
CN115188381A CN115188381A (en) 2022-10-14
CN115188381B (granted) 2023-10-24

Family

ID=83513741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210540446.8A Active CN115188381B (en) 2022-05-17 2022-05-17 Voice recognition result optimization method and device based on click ordering

Country Status (1)

Country Link
CN (1) CN115188381B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110764A (en) * 2019-04-22 2019-08-09 福建天晴数码有限公司 Random forest policy optimization method, storage medium based on hybrid network
CN111259933A (en) * 2020-01-09 2020-06-09 中国科学院计算技术研究所 High-dimensional feature data classification method and system based on distributed parallel decision tree
CN111554276A (en) * 2020-05-15 2020-08-18 深圳前海微众银行股份有限公司 Speech recognition method, device, equipment and computer readable storage medium
CN114067786A (en) * 2020-07-28 2022-02-18 腾讯科技(深圳)有限公司 Voice recognition method and device, electronic equipment and storage medium
CN113763925A (en) * 2021-05-26 2021-12-07 腾讯科技(深圳)有限公司 Speech recognition method, speech recognition device, computer equipment and storage medium
CN114299920A (en) * 2021-09-01 2022-04-08 腾讯科技(深圳)有限公司 Method and device for training language model for speech recognition and speech recognition method and device

Also Published As

Publication number Publication date
CN115188381A (en) 2022-10-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant