WO2021138898A1 - Speech recognition result detection method and device, and storage medium - Google Patents

Speech recognition result detection method and device, and storage medium

Info

Publication number
WO2021138898A1
Authority
WO
WIPO (PCT)
Prior art keywords
result
score
recognition result
tested
translation
Prior art date
Application number
PCT/CN2020/071389
Other languages
English (en)
French (fr)
Inventor
薛征山
Original Assignee
深圳市欢太科技有限公司
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市欢太科技有限公司, Oppo广东移动通信有限公司
Priority to PCT/CN2020/071389 priority Critical patent/WO2021138898A1/zh
Priority to CN202080088999.3A priority patent/CN114846543A/zh
Publication of WO2021138898A1 publication Critical patent/WO2021138898A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • the embodiments of the present application relate to the field of voice recognition technology, and in particular, to a method and device for detecting a voice recognition result, and a storage medium.
  • at present, linguistic features of the language to which the speech recognition result belongs are mainly used to train a corresponding error detection model, which then performs error detection on the speech recognition result.
  • for example, for Chinese speech recognition results, an error detection model can be designed based on word collocation and context information to detect errors in Chinese speech recognition results.
  • however, when only an error detection model corresponding to the language of the speech recognition result is used for error detection, few detection features are available and the detection accuracy is low.
  • the embodiments of the present application expect to provide a voice recognition result detection method, device, and storage medium.
  • the embodiment of the present application provides a method for detecting a voice recognition result, including:
  • the first evaluation feature is used to characterize the recognition and translation effect of the speech recognition result to be tested;
  • the determining the first evaluation feature based on the translation result to be tested and the speech recognition result to be tested includes:
  • the first translation score, the first perplexity score, and the first language model score are determined as the first evaluation feature.
  • the evaluating the voice recognition result to be tested based on the first evaluation feature to obtain the first evaluation result includes:
  • the first translation score, the first perplexity score, and the first language model score are weighted using the three feature weights to obtain the first evaluation result.
  • before obtaining the feature weight corresponding to each of the first translation score, the first perplexity score, and the first language model score to obtain three feature weights, the method further includes:
  • the three feature weights are determined.
  • the second translation score, the second perplexity score, the second language model score, the preset detection result, and the three preset weights are used to determine the three feature weights, including:
  • the second evaluation features are used to characterize the recognition and translation effects of the sample speech recognition result;
  • the three preset weights are adjusted to obtain the three feature weights.
  • the adjusting the three preset weights based on the error detection result of the sample speech recognition result and the preset detection result to obtain the three feature weights includes:
  • the three preset weights are adjusted according to the weight adjustment algorithm until the error detection result of the sample speech recognition result is the same as the preset detection result, and the three feature weights are obtained.
  • the determining the error detection result of the voice recognition result to be tested according to the first judgment result includes:
  • the error detection result of the voice recognition result to be tested is no error.
  • the judging whether the first evaluation result meets a preset condition and obtaining the first judgment result includes:
  • the first evaluation result is greater than or equal to the evaluation threshold, it is determined that the first judgment result is that the first evaluation result satisfies the preset condition.
  • the embodiment of the present application provides a voice recognition result detection device, including:
  • the translation module is configured to obtain the voice recognition result to be tested, and to use the machine translation model to translate the voice recognition result to be tested from the first language to the second language to obtain the translation result to be tested;
  • a determining module configured to determine a first evaluation feature based on the translation result to be tested and the speech recognition result to be tested; the first evaluation feature is used to characterize the recognition and translation effect of the speech recognition result to be tested;
  • An evaluation module configured to evaluate the voice recognition result to be tested based on the first evaluation feature to obtain a first evaluation result
  • the judgment module is configured to judge whether the first evaluation result meets a preset condition, obtain a first judgment result, and determine an error detection result of the voice recognition result to be tested according to the first judgment result.
  • the embodiment of the present application provides a voice recognition result detection device, the device includes a processor and a memory;
  • the processor is configured to execute a voice recognition result detection program stored in the memory to implement the above-mentioned voice recognition result detection method.
  • the embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the above-mentioned method for detecting a voice recognition result is realized.
  • the embodiments of the present application provide a method and device for detecting a voice recognition result, and a storage medium.
  • the method includes: obtaining a voice recognition result to be tested, and using a machine translation model to translate the voice recognition result to be tested from a first language to a second language to obtain the translation result to be tested; determining a first evaluation feature based on the translation result to be tested and the speech recognition result to be tested, the first evaluation feature being used to characterize the recognition and translation effect of the speech recognition result to be tested; evaluating the speech recognition result to be tested based on the first evaluation feature to obtain a first evaluation result; and judging whether the first evaluation result meets a preset condition to obtain a first judgment result, and determining the error detection result of the speech recognition result to be tested according to the first judgment result.
  • the technical solution provided by the embodiments of the present application translates the speech recognition result to be tested into another language, so that error detection can be performed on the speech recognition result under test in combination with the translation result. Compared with error detection based only on features of a single language, more features can be obtained for error detection, which improves the accuracy of error detection.
  • FIG. 1 is a schematic flowchart of a method for detecting a voice recognition result provided by an embodiment of this application;
  • FIG. 2 is a schematic diagram of an exemplary error detection process of a speech recognition result provided by an embodiment of the application
  • FIG. 3 is a schematic diagram of a process for determining three feature weights according to an embodiment of this application.
  • FIG. 4 is a schematic diagram of a system architecture of an exemplary speech recognition result detection method application provided by an embodiment of the application;
  • FIG. 5 is a first structural diagram of a voice recognition result detection device provided by an embodiment of the application.
  • FIG. 6 is a second structural diagram of a voice recognition result detection device provided by an embodiment of the application.
  • FIG. 1 is a schematic flowchart of a method for detecting a voice recognition result provided by an embodiment of the application. As shown in Figure 1, the voice recognition result detection method mainly includes the following steps:
  • the speech recognition result detection device can obtain the speech recognition result to be tested, and use the machine translation model to translate the speech recognition result to be tested from the first language to the second language to obtain the translation result to be tested.
  • the voice recognition result to be tested is the voice recognition result that requires error detection, and is the text generated after voice recognition.
  • the voice recognition result detection device may receive the voice recognition result to be tested that is obtained by a voice recognition processing device.
  • the voice recognition result detection device itself may also be equipped with a recognition module to perform voice recognition to obtain the voice recognition result to be tested.
  • the specific voice recognition result to be tested and the source of the voice recognition result to be tested are not limited in this embodiment of the application.
  • the language of the voice recognition result to be tested is the first language.
  • the machine translation model can translate input text from the first language to the second language.
  • the first language and the second language are two different languages.
  • the specific machine translation model, as well as the first language and the second language can be selected according to actual needs, which are not limited in the embodiment of the present application.
  • the language of the speech recognition result to be tested is Chinese, that is, the first language is Chinese, and the machine translation model can translate the speech recognition result to be tested from Chinese to English, that is, the second language is English, so as to obtain the translation result to be tested in English.
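The patent does not name a particular machine translation model. As a rough illustration of step S101, the sketch below uses an open-source MarianMT Chinese-to-English checkpoint from the Hugging Face transformers library as a stand-in; the model name, beam size, and length limit are assumptions for illustration only.

```python
# Minimal sketch of step S101: translating the ASR hypothesis under test from the
# first language (Chinese) to the second language (English) with an assumed MT model.
from transformers import MarianMTModel, MarianTokenizer

MODEL_NAME = "Helsinki-NLP/opus-mt-zh-en"  # assumed checkpoint, not specified by the patent
tokenizer = MarianTokenizer.from_pretrained(MODEL_NAME)
model = MarianMTModel.from_pretrained(MODEL_NAME)

def translate(asr_hypothesis: str) -> str:
    """Translate the speech recognition result to be tested into the second language."""
    inputs = tokenizer(asr_hypothesis, return_tensors="pt")
    output_ids = model.generate(**inputs, num_beams=5, max_new_tokens=64)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]

print(translate("今天天气怎么样"))  # e.g. "What's the weather like today"
```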
  • S102 Determine a first evaluation feature based on the translation result to be tested and the speech recognition result to be tested; the first evaluation feature is used to characterize the recognition and translation effect of the speech recognition result to be tested.
  • the voice recognition result detection device may determine the first evaluation feature based on the translation result to be tested and the voice recognition result to be tested; the first evaluation feature is used to characterize the recognition and translation effects of the voice recognition result to be tested.
  • the voice recognition result detection device determines the first evaluation feature based on the translation result to be tested and the voice recognition result to be tested by: obtaining the translation score and perplexity score that the machine translation model gives to the translation result to be tested, as the first translation score and the first perplexity score; inputting the voice recognition result to be tested into the language model corresponding to the first language to obtain the first language model score; and determining the first translation score, the first perplexity score, and the first language model score as the first evaluation feature.
  • when the voice recognition result detection device uses the machine translation model to translate the voice recognition result under test in step S101, the model can actually produce multiple candidate translations in the second language, and a corresponding translation score and perplexity score are determined for each candidate translation.
  • the translation score represents the overall translation effect of a translation result, for example its fluency and degree of semantic matching;
  • the perplexity score represents the degree of perplexity associated with the translation result during the translation process.
  • the machine translation model determines the translation result with the highest translation score among the multiple candidates as the translation result to be tested that corresponds to the speech recognition result to be tested; therefore, the speech recognition result detection device can directly obtain the translation score and perplexity score that the machine translation model assigns to the translation result to be tested, and determine them as the first translation score and the first perplexity score.
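How the machine translation model exposes its translation score and perplexity score is not specified in the patent. The sketch below, written under that assumption, derives both from the per-token log-probabilities of a candidate translation: the length-normalized log-probability serves as the translation score and its exponentiated negative as the perplexity. With this convention a lower perplexity is better, so the corresponding feature weight λ2 in formula (1) would have to absorb the sign during weight tuning.

```python
import math
from typing import List, Tuple

def translation_and_perplexity_scores(token_logprobs: List[float]) -> Tuple[float, float]:
    """Derive a sentence-level translation score and perplexity score from the
    per-token log-probabilities the MT decoder assigns to one candidate translation.
    This is one common formulation, not necessarily the one used by the patent."""
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    translation_score = avg_logprob            # higher is better (TM in formula (1))
    perplexity_score = math.exp(-avg_logprob)  # lower is better (PP in formula (1))
    return translation_score, perplexity_score

def pick_best(candidates: List[Tuple[str, List[float]]]) -> Tuple[str, float, float]:
    """Return the candidate with the highest translation score, mirroring how the
    patent selects the translation result to be tested among multiple candidates."""
    scored = [(text, *translation_and_perplexity_scores(lp)) for text, lp in candidates]
    return max(scored, key=lambda item: item[1])
```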
  • the language of the voice recognition result to be tested is the first language; therefore, the voice recognition result detection device can input the voice recognition result to be tested into the language model corresponding to the first language. That language model evaluates the voice recognition result to be tested in terms of fluency, sentence structure, and similar aspects, based on word collocations and context structures designed for the first language, to obtain the corresponding language model score. In other words, the first language model score represents how well the speech recognition result to be tested is expressed semantically in the first-language environment.
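The patent does not prescribe a particular first-language language model. As one possibility, the sketch below scores the ASR hypothesis with a pre-trained n-gram model through the KenLM Python bindings; the model file path and the length normalization are illustrative assumptions.

```python
import kenlm  # n-gram language model toolkit; one possible choice, not named in the patent

# Hypothetical pre-trained first-language (Chinese) n-gram model; the path is illustrative.
lm = kenlm.Model("zh_5gram.arpa")

def language_model_score(asr_hypothesis_tokens):
    """Length-normalised log10-probability of the ASR hypothesis under the
    first-language LM, used here as the LM term in formula (1)."""
    sentence = " ".join(asr_hypothesis_tokens)  # KenLM expects space-separated tokens
    total_log10_prob = lm.score(sentence, bos=True, eos=True)
    return total_log10_prob / (len(asr_hypothesis_tokens) + 1)  # +1 accounts for </s>
```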
  • for example, the voice recognition result to be tested is "今天天气怎么样" ("How is the weather today");
  • the voice recognition result detection device may input this voice recognition result into the Chinese language model, which evaluates it in terms of sentence fluency and structure to obtain the first language model score A1;
  • the speech recognition result detection device then translates "今天天气怎么样" into English using the machine translation model, obtaining the translation result to be tested "What's the weather like today", together with its first translation score A2 and first perplexity score A3. A1, A2, and A3 are all first evaluation features.
  • by determining the first translation score, the first perplexity score, and the first language model score as the first evaluation feature, the voice recognition result detection device considers not only the recognition effect in terms of the linguistic features of the voice recognition result to be tested, but also how the voice recognition result to be tested is expressed in another language. More information characterizing the recognition effect of the speech recognition result to be tested can therefore be obtained, so that the first evaluation feature supports more accurate subsequent error detection.
  • the voice recognition result detection device may also determine the first evaluation feature based on the translation result to be tested and the voice recognition result to be tested from other angles in other ways.
  • the speech recognition result detection device can also use corresponding models to score the translation results to be tested and the speech recognition results to be tested in terms of smoothness, language logic, etc., and determine each score as a first evaluation feature.
  • the specific number and type of the first evaluation feature are not limited in the embodiment of this application.
  • after the voice recognition result detection device obtains the first evaluation feature, it further evaluates the voice recognition result to be tested based on the first evaluation feature to obtain the first evaluation result.
  • the speech recognition result detection device evaluates the speech recognition result to be tested based on the first evaluation feature to obtain the first evaluation result by: obtaining the feature weight corresponding to each of the first translation score, the first perplexity score, and the first language model score, to obtain three feature weights; and weighting the first translation score, the first perplexity score, and the first language model score with the three feature weights to obtain the first evaluation result.
  • for each type of evaluation feature in the first evaluation feature, a corresponding feature weight is set to characterize the importance of that feature in the process of evaluating the voice recognition result to be tested.
  • the speech recognition result detection device can multiply each of the first evaluation features by its corresponding feature weight to obtain three products, and the sum of the three products is the first evaluation result.
  • the first evaluation feature includes the translation score, the perplexity score, and the language model score;
  • the speech recognition result detection device calculates the first evaluation result of the speech recognition result to be tested according to the following formula (1): Score(s) = λ1 × TM + λ2 × PP + λ3 × LM    (1)
  • where Score(s) is the first evaluation result, TM is the translation score, λ1 is the feature weight corresponding to the translation score, PP is the perplexity score, λ2 is the feature weight corresponding to the perplexity score, LM is the language model score, and λ3 is the feature weight corresponding to the language model score.
  • when the speech recognition result detection device performs weighting on the first evaluation feature, the first evaluation result can be obtained not only by the weighted summation described above but also by other forms of weighting, for example a weighted average, which is not limited in the embodiments of the present application.
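A minimal sketch of formula (1) and of the weighted-average alternative mentioned above; the function names are placeholders, not taken from the patent.

```python
def first_evaluation_result(tm, pp, lm, w_tm, w_pp, w_lm):
    """Formula (1): Score(s) = λ1·TM + λ2·PP + λ3·LM, the weighted sum of the
    translation score, perplexity score and language model score."""
    return w_tm * tm + w_pp * pp + w_lm * lm

def first_evaluation_result_avg(tm, pp, lm, w_tm, w_pp, w_lm):
    """Weighted-average variant, one of the other weighting forms the patent allows."""
    total_weight = w_tm + w_pp + w_lm
    return (w_tm * tm + w_pp * pp + w_lm * lm) / total_weight
```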
  • S104 Determine whether the first evaluation result meets the preset condition, obtain the first judgment result, and determine the error detection result of the voice recognition result to be tested according to the first judgment result.
  • after the speech recognition result detection device obtains the first evaluation result, it can judge whether the first evaluation result meets the preset condition to obtain the first judgment result, and determine the error detection result of the speech recognition result to be tested according to the first judgment result.
  • the voice recognition result detection device determines the error detection result of the voice recognition result to be tested according to the first judgment result as follows: when the first judgment result is that the first evaluation result does not meet the preset condition, it is determined that the error detection result of the voice recognition result to be tested is that an error exists; when the first judgment result is that the first evaluation result meets the preset condition, it is determined that the error detection result of the voice recognition result to be tested is that there is no error.
  • the voice recognition result detection device judges whether the first evaluation result meets the preset condition and obtains the first judgment result by: comparing the first evaluation result with an evaluation threshold; when the first evaluation result is less than the evaluation threshold, determining that the first judgment result is that the first evaluation result does not meet the preset condition; and when the first evaluation result is greater than or equal to the evaluation threshold, determining that the first judgment result is that the first evaluation result meets the preset condition.
  • the first evaluation result represents the overall score of the speech recognition result to be tested in terms of fluency, typos, etc. Therefore, when the first evaluation result is less than the evaluation threshold, It indicates that there is an error in the voice recognition result to be tested. Accordingly, when the first evaluation result is greater than or equal to the evaluation threshold, it indicates that the voice recognition result to be tested is error-free.
  • the specific evaluation threshold can be set according to actual needs, and is not limited in the embodiment of the present application.
  • preset conditions may be preset according to actual error detection standards, and the specific preset conditions are not limited in the embodiments of the present application.
  • the preset condition may also be a preset interval, that is, when the first evaluation result exceeds the preset interval, it is determined that the error detection result of the voice recognition result to be tested is an error, and the first evaluation result is within the preset interval range. In the case of internal, it is determined that the error detection result of the voice recognition result to be tested is no error.
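The two forms of the preset condition described above (an evaluation threshold and a preset interval) can be sketched as follows; the threshold and interval values are placeholders to be set according to actual needs.

```python
def error_detection_result(score, threshold):
    """Threshold form: a score below the evaluation threshold indicates an error
    in the speech recognition result to be tested, otherwise no error."""
    return "no_error" if score >= threshold else "error"

def error_detection_result_interval(score, low, high):
    """Interval form mentioned in the patent: a score outside the preset
    interval indicates an error."""
    return "no_error" if low <= score <= high else "error"
```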
  • FIG. 2 is a schematic diagram of an exemplary error detection process of a speech recognition result provided by an embodiment of the application.
  • the speech recognition result detection device inputs the speech recognition result to be tested into the machine translation model, and the machine translation model can translate the speech recognition result to be tested from the first language to the second language, and output the translation result to be tested.
  • the translation score and perplexity score of the translation result to be tested and the language model score of the voice recognition result to be tested are then obtained and combined by weighted summation with the corresponding feature weights to obtain the first evaluation result; finally, the first evaluation result is compared with the evaluation threshold to determine the error detection result of the voice recognition result to be tested.
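Putting the steps of Figure 2 together, a rough end-to-end sketch might look as follows. It reuses the helper functions sketched earlier in this section (translate, translation_and_perplexity_scores, language_model_score, first_evaluation_result); the hard-coded token log-probabilities, default weights, and threshold are placeholders, since in practice the MT decoder would supply its own scores and the weights would come from the tuning procedure described below.

```python
def detect_asr_error(asr_hypothesis, weights=(1.0, 1.0, 1.0), threshold=0.0):
    """End-to-end sketch of Figure 2 under the assumptions stated above."""
    translation = translate(asr_hypothesis)                        # S101: MT to second language
    # Assume per-token log-probabilities are available for the chosen translation;
    # the values below are placeholders standing in for the decoder's own scores.
    tm, pp = translation_and_perplexity_scores([-0.2, -0.5, -0.1])
    lm = language_model_score(list(asr_hypothesis))                # S102: first-language LM score
    score = first_evaluation_result(tm, pp, lm, *weights)          # S103: weighted evaluation
    return {"translation": translation,
            "score": score,
            "error_detected": score < threshold}                   # S104: threshold decision
```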
  • the voice recognition result detection device may actually determine the three feature weights in a certain manner before acquiring the three feature weights in step S103.
  • FIG. 3 is a schematic diagram of a process for determining three feature weights according to an embodiment of this application. As shown in Figure 3, it mainly includes the following steps:
  • the voice recognition result detection device may first obtain the sample voice recognition result and the preset detection result of the sample voice recognition result.
  • the number of sample speech recognition results can be multiple. For example, hundreds of thousands.
  • the preset detection result of a sample speech recognition result is the manual judgment of whether that sample speech recognition result contains an error.
  • for example, one sample speech recognition result is "第一大学2019年新生开学" ("First University's new students start school in 2019"), and its preset detection result is that there is no error;
  • another sample speech recognition result is "第二大学热列欢迎2019年新生开学", which contains a typo ("热列" instead of "热烈", "warmly"), so its preset detection result is that an error exists.
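The labelled training data can be represented simply as pairs of sample ASR results and their manually assigned preset detection results, for example (the label strings are illustrative, not from the patent):

```python
# Hypothetical labelled samples: (sample speech recognition result in the first
# language, preset detection result). The second sentence contains the typo
# "热列" for "热烈", so it is labelled as erroneous.
samples = [
    ("第一大学2019年新生开学", "no_error"),
    ("第二大学热列欢迎2019年新生开学", "error"),
]
```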
  • sample voice recognition result and the voice recognition result to be tested are in the same language, that is, both are the first language.
  • after the voice recognition result detection device obtains the sample voice recognition results, it can use the machine translation model to translate each sample voice recognition result from the first language to the second language to obtain a sample translation result.
  • the process of the voice recognition result detection device using the machine translation model to translate the sample voice recognition result is the same as translating the voice recognition result to be tested in step S101, and will not be repeated here.
  • S303 Determine a second translation score, a second perplexity score, and a second language model score based on the sample translation result and the sample voice recognition result.
  • after obtaining the sample translation result and the sample voice recognition result, the voice recognition result detection device further determines the second translation score, the second perplexity score, and the second language model score based on them.
  • the process by which the voice recognition result detection device determines the second translation score, the second perplexity score, and the second language model score based on the sample translation result and the sample voice recognition result is similar to the process in step S102 of determining the first translation score, the first perplexity score, and the first language model score based on the translation result to be tested and the speech recognition result to be tested, and is not repeated here.
  • the speech recognition result detection device can also obtain the preset weight corresponding to each of the second translation score, the second perplexity score, and the second language model score, to obtain three preset weights.
  • because the relative importance of the different feature types for error detection cannot be judged initially, three preset weights can be set in advance; for example, each of the three preset weights can be set to 1.
  • the specific three preset weights are not limited in the embodiment of this application.
  • S305 Determine three feature weights by using the second translation score, the second perplexity score, the second language model score, the preset detection result, and the three preset weights.
  • after obtaining the second translation score, the second perplexity score, the second language model score, the preset detection result, and the three preset weights, the voice recognition result detection device can use them to determine the three feature weights.
  • the voice recognition result detection device determines the three feature weights by: weighting the second translation score, the second perplexity score, and the second language model score with the three preset weights to obtain a second evaluation result; judging whether the second evaluation result meets the preset condition to obtain a second judgment result, and determining the error detection result of the sample speech recognition result according to the second judgment result; and adjusting the three preset weights based on the error detection result of the sample speech recognition result and the preset detection result to obtain the three feature weights.
  • the process in which the speech recognition result detection device weights the second evaluation features with the three preset weights is similar to the weighting of the first evaluation features with the three feature weights in step S103; the only difference lies in the specific values of the features and weights, so it is not repeated here.
  • the process in which the voice recognition result detection device judges whether the second evaluation result meets the preset condition is similar to judging whether the first evaluation result meets the preset condition in step S104; the only difference is the object being judged, so it is not repeated here.
  • the voice recognition result detection device adjusts the three preset weights based on the error detection result of the sample voice recognition result and the preset detection result to obtain the three feature weights by: adjusting the three preset weights according to a weight adjustment algorithm until the error detection result of the sample speech recognition result is the same as the preset detection result, thereby obtaining the three feature weights.
  • when the error detection result of the sample speech recognition result is the same as the preset detection result, the weight settings are considered appropriate; therefore, the three adjusted preset weights can be determined as the three feature weights.
  • the weight adjustment algorithm can be preset according to actual needs, for example, the minimum error rate training (MERT) algorithm; the specific weight adjustment algorithm is not limited in the embodiments of this application.
  • the speech recognition result detection device can use a large number of sample speech recognition results to determine the three feature weights; therefore, during weight adjustment, once the error detection results of a sufficiently high proportion of the sample speech recognition results match the preset detection results, the resulting weights can be determined as the feature weights.
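The patent names minimum error rate training (MERT) as one possible weight adjustment algorithm but does not fix the procedure. The sketch below substitutes a simple grid search that maximizes agreement between the error detection results and the preset detection results; it illustrates the same objective without being MERT itself, and the candidate weight values and threshold are assumptions.

```python
import itertools

def tune_feature_weights(features, labels, threshold=0.0,
                         candidate_values=(0.5, 1.0, 1.5, 2.0)):
    """Simplified stand-in for the weight-adjustment step. `features` is a list of
    (TM, PP, LM) triples for the sample ASR results and `labels` the matching list
    of preset detection results ("error" / "no_error"). The grid of (λ1, λ2, λ3)
    combinations is searched exhaustively and the combination whose error-detection
    results agree with the preset results on the largest share of samples is kept."""
    best_weights, best_agreement = (1.0, 1.0, 1.0), -1.0
    for w in itertools.product(candidate_values, repeat=3):
        agree = 0
        for (tm, pp, lm), label in zip(features, labels):
            score = w[0] * tm + w[1] * pp + w[2] * lm
            predicted = "no_error" if score >= threshold else "error"
            agree += (predicted == label)
        agreement = agree / len(labels)
        if agreement > best_agreement:
            best_weights, best_agreement = w, agreement
    return best_weights, best_agreement
```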
  • the above voice recognition result detection method can be applied to various application scenarios that require voice recognition, so as to realize the error detection of the voice recognition result.
  • FIG. 4 is a schematic diagram of a system architecture of an exemplary speech recognition result detection method application provided by an embodiment of the application.
  • the system may include: a client, a cloud, a voice processing server, and a display screen.
  • the cloud is integrated with the voice recognition result detection method provided by the present application.
  • the client terminal collects the speech data of the speaker, and sends the collected speech data to the speech processing server.
  • the speech processing server recognizes the speech data and obtains the result of the speech recognition to be tested.
  • the voice processing server can send the voice recognition result to be tested to the cloud, the cloud performs error detection on the voice recognition result to be tested according to the voice recognition result detection method and returns the error detection result to the voice processing server; if the error detection result is that an error exists, the voice processing server can correct the voice recognition result to be tested according to the error detection result to obtain a correct voice recognition result, and finally the correct voice recognition result is projected onto the display screen.
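As a rough sketch of how the cloud side of this architecture might expose the detection method to the speech processing server, the example below wraps the end-to-end function from the earlier sketch in a small Flask HTTP endpoint; the route, payload fields, port, and the detect_asr_error helper are assumptions for illustration, not part of the patent.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/detect", methods=["POST"])
def detect():
    """Cloud-side endpoint sketch: the speech processing server posts the ASR
    result to be tested and receives the error detection result in the response."""
    payload = request.get_json()
    result = detect_asr_error(payload["asr_result"])  # helper sketched earlier
    return jsonify(result)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```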
  • the above-mentioned voice recognition result detection method is implemented by software, which can not only be integrated in the above-mentioned cloud, but also can be integrated on a mobile terminal, which is not limited in the embodiment of the present application.
  • the embodiment of the application provides a method for detecting a voice recognition result, including: obtaining a voice recognition result to be tested, and using a machine translation model to translate the voice recognition result to be tested from a first language to a second language to obtain the translation result to be tested; determining a first evaluation feature based on the translation result to be tested and the voice recognition result to be tested, the first evaluation feature being used to characterize the recognition and translation effect of the voice recognition result to be tested; evaluating the speech recognition result to be tested based on the first evaluation feature to obtain a first evaluation result; and judging whether the first evaluation result meets a preset condition to obtain a first judgment result, and determining the error detection result of the voice recognition result to be tested according to the first judgment result.
  • the technical solution provided by the embodiments of the present application translates the speech recognition result to be tested into another language so that error detection can be performed on the speech recognition result under test in combination with the translation result; compared with error detection based only on features of a single language, more features can be obtained for error detection, which improves the accuracy of error detection.
  • FIG. 5 is a first structural diagram of a voice recognition result detection device provided by an embodiment of the application. As shown in Figure 5, the voice recognition result detection device includes:
  • the translation module 501 is configured to obtain a voice recognition result to be tested, and to use a machine translation model to translate the voice recognition result to be tested from a first language to a second language to obtain a translation result to be tested;
  • the determining module 502 is configured to determine a first evaluation feature based on the translation result to be tested and the speech recognition result to be tested; the first evaluation feature is used to characterize the recognition and translation effects of the speech recognition result to be tested;
  • An evaluation module 503, configured to evaluate the voice recognition result to be tested based on the first evaluation feature to obtain a first evaluation result
  • the judgment module 504 is configured to judge whether the first evaluation result meets a preset condition, obtain a first judgment result, and determine an error detection result of the voice recognition result to be tested according to the first judgment result.
  • the determining module 502 is configured to obtain the translation score and the perplexity score that the machine translation model gives to the translation result to be tested, as the first translation score and the first perplexity score; to input the speech recognition result to be tested into the language model corresponding to the first language to obtain a first language model score; and to determine the first translation score, the first perplexity score, and the first language model score as the first evaluation feature.
  • the evaluation module 503 is configured to obtain the feature weight corresponding to each of the first translation score, the first perplexity score, and the first language model score to obtain three feature weights, and to weight the first translation score, the first perplexity score, and the first language model score with the three feature weights to obtain the first evaluation result.
  • the determining module 502 is configured to obtain a sample speech recognition result and a preset detection result of the sample speech recognition result; use the machine translation model to translate the sample speech recognition result from the first language to the second language to obtain a sample translation result; determine a second translation score, a second perplexity score, and a second language model score based on the sample translation result and the sample speech recognition result; obtain the preset weight corresponding to each of the second translation score, the second perplexity score, and the second language model score to obtain three preset weights; and determine the three feature weights by using the second translation score, the second perplexity score, the second language model score, the preset detection result, and the three preset weights.
  • the determining module 502 is configured to weight the second translation score, the second perplexity score, and the second language model score with the three preset weights to obtain a second evaluation result; judge whether the second evaluation result meets the preset condition to obtain a second judgment result, and determine the error detection result of the sample speech recognition result according to the second judgment result; and adjust the three preset weights based on the error detection result of the sample speech recognition result and the preset detection result to obtain the three feature weights.
  • the determining module 502 is configured to adjust the three preset weights according to a weight adjustment algorithm until the error detection result of the sample speech recognition result is the same as the preset detection result, to obtain the three feature weights.
  • the judgment module 504 is configured to determine that the error detection result of the voice recognition result to be tested is that an error exists when the first judgment result is that the first evaluation result does not meet the preset condition, and to determine that the error detection result of the voice recognition result to be tested is that there is no error when the first judgment result is that the first evaluation result meets the preset condition.
  • the judgment module 504 is configured to compare the first evaluation result with an evaluation threshold; when the first evaluation result is less than the evaluation threshold, determine that the first judgment result is that the first evaluation result does not meet the preset condition; and when the first evaluation result is greater than or equal to the evaluation threshold, determine that the first judgment result is that the first evaluation result meets the preset condition.
  • the steps performed by the translation module 501, the determination module 502, the evaluation module 503, and the judgment module 504 can be implemented by a processor.
  • when the voice recognition result detection device provided in the above embodiment performs error detection on a voice recognition result, the division into the above program modules is used only as an example; in practical applications, the above processing can be allocated to different program modules as needed, that is, the internal structure of the device can be divided into different program modules to complete all or part of the processing described above.
  • the voice recognition result detection device provided in the foregoing embodiment and the voice recognition result detection method embodiment belong to the same concept, and the specific implementation process is detailed in the method embodiment, and will not be repeated here.
  • FIG. 6 is a second structural diagram of a voice recognition result detection device provided by an embodiment of the application.
  • the voice recognition result detection device includes: a processor 601, a memory 602, and a communication bus 603;
  • the communication bus 603 is configured to implement a communication connection between the processor 601 and the memory 602;
  • the processor 601 is configured to execute a voice recognition result detection program stored in the memory 602 to implement the above-mentioned voice recognition result detection method.
  • the embodiment of the application provides a speech recognition result detection device, which obtains the speech recognition result to be tested and uses the machine translation model to translate the speech recognition result to be tested from the first language to the second language to obtain the translation result to be tested; determines the first evaluation feature based on the translation result to be tested and the speech recognition result to be tested, the first evaluation feature being used to characterize the recognition and translation effect of the speech recognition result to be tested; evaluates the speech recognition result to be tested based on the first evaluation feature to obtain the first evaluation result; and judges whether the first evaluation result meets the preset condition to obtain the first judgment result, and determines the error detection result of the speech recognition result to be tested according to the first judgment result.
  • the speech recognition result detection device provided by the embodiment of the present application translates the speech recognition result to be tested into another language so as to perform error detection on it in combination with the translation result; compared with error detection based only on features of a single language, more features can be obtained for error detection, which improves the accuracy of error detection.
  • the embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by one or more processors, the above voice recognition result detection method is implemented.
  • the computer-readable storage medium may be a volatile memory, such as random-access memory (RAM), or a non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); it may also be a device including one of the above memories or any combination of them, such as a mobile phone, computer, tablet device, or personal digital assistant.
  • this application can be provided as methods, systems, or computer program products. Therefore, this application may adopt the form of hardware embodiments, software embodiments, or embodiments combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, optical storage, etc.) containing computer-usable program codes.
  • these computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, and the instructions executed on the computer or other programmable equipment thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Machine Translation (AREA)

Abstract

A speech recognition result detection method and device, and a storage medium. The method includes: obtaining a speech recognition result to be tested, and translating the speech recognition result to be tested from a first language to a second language with a machine translation model to obtain a translation result to be tested (S101); determining a first evaluation feature based on the translation result to be tested and the speech recognition result to be tested, the first evaluation feature being used to characterize the recognition and translation effect of the speech recognition result to be tested (S102); evaluating the speech recognition result to be tested based on the first evaluation feature to obtain a first evaluation result (S103); and judging whether the first evaluation result satisfies a preset condition to obtain a first judgment result, and determining an error detection result of the speech recognition result to be tested according to the first judgment result (S104).

Description

语音识别结果检测方法及装置、存储介质 技术领域
本申请实施例涉及语音识别技术领域,尤其涉及一种语音识别结果检测方法及装置、存储介质。
背景技术
受限于语音识别的准确性,语音识别结果常常会出现错误,这将对后续的理解将造成障碍,增加了语音理解的难度。通常情况下,在对语音进行识别,得到语音识别结果之后,对语音识别结果进行错误检测,从而可以进一步纠正其中的错误。
目前,主要是利用语音识别结果所属语种的一些语言特征,训练出相应的错误检测模型,以进行语音识别结果的错误检测。例如,对于中文语音识别结果,可以基于词语搭配、上下文信息设计错误检测模型,实现中文语音识别结果的错误检测。然而,采用与语音识别结果所属语种相应的错误检测模型进行错误检测,可以利用的检测特征较少,检测准确率较低。
发明内容
为解决相关技术问题,本申请实施例期望提供一种语音识别结果检测方法及装置、存储介质。
本申请实施例的技术方案可以如下实现:
本申请实施例提供了一种语音识别结果检测方法,包括:
获取待测语音识别结果,并利用机器翻译模型将所述待测语音识别结果从第一语种翻译至第二语种,得到待测翻译结果;
基于所述待测翻译结果和所述待测语音识别结果,确定第一评估特征;所述第一评估特征用于表征所述待测语音识别结果的识别和翻译效果;
基于所述第一评估特征对所述待测语音识别结果进行评估,得到第一评估结果;
判断所述第一评估结果是否满足预设条件,得到第一判断结果,并根据所述第一判断结果确定所述待测语音识别结果的错误检测结果。
在上述方案中,所述基于所述待测翻译结果和所述待测语音识别结果,确定第一评估特征,包括:
获取所述机器翻译模型对所述待测翻译结果的翻译评分和困惑度评分,得到第一翻译评分和第一困惑度评分;
将所述待测语音识别结果输入所述第一语种对应的语言模型,得到第 一语言模型得分;
将所述第一翻译评分、所述第一困惑度评分和所述第一语言模型得分确定为所述第一评估特征。
在上述方案中,所述基于所述第一评估特征对所述待测语音识别结果进行评估,得到第一评估结果,包括:
获取所述第一翻译评分、所述第一困惑度评分和所述第一语言模型得分中每一个对应的特征权重,得到三个特征权重;
利用所述三个特征权重对所述第一翻译评分、所述第一困惑度评分和所述第一语言模型得分进行加权处理,得到所述第一评估结果。
在上述方案中,所述获取所述第一翻译评分、所述第一困惑度评分和所述第一语言模型得分中每一个对应的特征权重,得到三个特征权重之前,所述方法还包括:
获取样本语音识别结果,以及所述样本语音识别结果的预设检测结果;
利用所述机器翻译模型将所述样本语音识别结果从所述第一语种翻译至所述第二语种,得到样本翻译结果;
基于所述样本翻译结果和所述样本语音识别结果,确定第二翻译评分、第二困惑度评分和第二语言模型得分;
获取所述第二翻译评分、所述第二困惑度评分和所述第二语言模型得分中每一个对应的预设权重,得到三个预设权重;
利用所述第二翻译评分、所述第二困惑度评分、所述第二语言模型得分、所述预设检测结果和所述三个预设权重,确定所述三个特征权重。
在上述方案中,所述利用所述第二翻译评分、所述第二困惑度评分、所述第二语言模型得分、所述预设检测结果和三个所述预设权重,确定所述三个特征权重,包括:
利用所述三个预设权重对所述第二翻译评分、所述第二困惑度评分和所述第二语言模型得分进行加权处理,得到第二评估结果;所述第二评估特征用于表征所述样本语音识别结果的识别和翻译效果;
判断所述第二评估结果是否满足所述预设条件,得到第二判断结果,并根据所述第二判断结果确定所述样本语音识别结果的错误检测结果;
基于所述样本语音识别结果的错误检测结果和所述预设检测结果,调整所述三个预设权重,得到所述三个特征权重。
在上述方案中,所述基于所述样本语音识别结果的错误检测结果和所述预设检测结果,调整所述三个预设权重,得到所述三个特征权重,包括:
按照权重调整算法调整所述三个预设权重,直至所述样本语音识别结果的错误检测结果与所述预设检测结果相同,得到所述三个特征权重。
在上述方案中,所述根据第一判断结果确定所述待测语音识别结果的错误检测结果,包括:
在所述第一判断结果为所述第一评估结果不满足所述预设条件的情况 下,确定所述待测语音识别结果的错误检测结果为存在错误;
在所述第一判断结果为所述第一评估结果满足所述预设条件的情况下,确定所述待测语音识别结果的错误检测结果为无错误。
在上述方案中,所述判断所述第一评估结果是否满足预设条件,得到第一判断结果,包括:
比较所述第一评估结果和评估阈值;
在所述第一评估结果小于所述评估阈值的情况下,确定所述第一判断结果为所述第一评估结果不满足所述预设条件;
在所述第一评估结果大于或者等于所述评估阈值的情况下,确定所述第一判断结果为所述第一评估结果满足所述预设条件。
本申请实施例提供了一种语音识别结果检测装置,包括:
翻译模块,配置为获取待测语音识别结果,并利用机器翻译模型将所述待测语音识别结果从第一语种翻译至第二语种,得到待测翻译结果;
确定模块,配置为基于所述待测翻译结果和所述待测语音识别结果,确定第一评估特征;所述第一评估特征用于表征所述待测语音识别结果的识别和翻译效果;
评估模块,配置为基于所述第一评估特征对所述待测语音识别结果进行评估,得到第一评估结果;
判断模块,配置为判断所述第一评估结果是否满足预设条件,得到第一判断结果,并根据所述第一判断结果确定所述待测语音识别结果的错误检测结果。
本申请实施例提供了一种语音识别结果检测装置,所述装置包括处理器和存储器;
所述处理器,配置为执行所述存储器中存储的语音识别结果检测程序,以实现上述语音识别结果检测方法。
本申请实施例提供了一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现上述语音识别结果检测方法。
本申请实施例提供了一种语音识别结果检测方法及装置、存储介质,方法包括:获取待测语音识别结果,并利用机器翻译模型将待测语音识别结果从第一语种翻译至第二语种,得到待测翻译结果;基于待测翻译结果和待测语音识别结果,确定第一评估特征;第一评估特征用于表征所述待测语音识别结果的识别和翻译效果;基于第一评估特征对待测语音识别结果进行评估,得到第一评估结果;判断第一评估结果是否满足预设条件,得到第一判断结果,并根据第一判断结果确定待测语音识别结果的错误检测结果。本申请实施例提供的技术方案,将待测语音识别结果翻译成另一种语种,以结合翻译结果对待测语音识别进行错误检测,相比于仅基于单一语种相关特征进行错误检测,可以得到更多的特征以实现错误检测,提高了错误检测的准确率。
附图说明
图1为本申请实施例提供的一种语音识别结果检测方法的流程示意图;
图2为本申请实施例提供的一种示例性的语音识别结果的错误检测过程示意图;
图3为本申请实施例提供的一种确定三个特征权重的流程示意图;
图4为本申请实施例提供的一种示例性的语音识别结果检测方法应用的系统架构示意图;
图5为本申请实施例提供的一种语音识别结果检测装置的结构示意图一;
图6为本申请实施例提供的一种语音识别结果检测装置的结构示意图二。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述。可以理解的是,此处所描述的具体实施例仅仅用于解释相关申请,而非对该申请的限定。另外还需要说明的是,为了便于描述,附图中仅示出了与有关申请相关的部分。
本申请实施例提供了一种语音识别结果检测方法,通过语音识别结果检测装置实现。图1为本申请实施例提供的一种语音识别结果检测方法的流程示意图。如图1所示,语音识别结果检测方法主要包括以下步骤:
S101、获取待测语音识别结果,并利用机器翻译模型将待测语音识别结果从第一语种翻译至第二语种,得到待测翻译结果。
在本申请的实施例中,语音识别结果检测装置可以获取到待测语音识别结果,并利用机器翻译模型将待测语音识别结果从第一语种翻译至第二语种,得到待测翻译结果。
需要说明的是,在本申请的实施例中,待测语音识别结果为需要进行错误检测的语音识别结果,是语音识别后生成的文本。语音识别结果检测装置可以接收语音识别处理装置将语音识别得到的待测语音识别结果,此外,语音识别结果检测装置自身也可以配置有识别模块,以进行语音识别,得到待测语音识别结果。具体的待测语音识别结果,以及待测语音识别结果的来源本申请实施例不作限定。
需要说明的是,在本申请的实施例中,待测语音识别结果的语种为第一语种。机器翻译模型为可以将输入的文本从第一语种翻译至第二语种,第一语种和第二语种为两种不同的语种。具体的机器翻译模型,以及第一语种和第二语种可以根据实际需求选择,本申请实施例不作限定。
示例性的,在本申请的实施例中,待测语音识别结果的语种为汉语,即第一语种为汉语,机器翻译模型可以将待测语音识别结果从汉语翻译至 英语,即第二语种为英语,从而得到语种为英语的待测翻译结果。
S102、基于待测翻译结果和待测语音识别结果,确定第一评估特征;第一评估特征用于表征待测语音识别结果的识别和翻译效果。
在本申请的实施例中,语音识别结果检测装置在获得待测语音识别结果和待测语音识别结果之后,可以基于待测翻译结果和待测语音识别结果,确定第一评估特征,第一评估特征用于表征待测语音识别结果的识别和翻译效果。
具体地,在本申请的实施例中,语音识别结果检测装置基于待测翻译结果和待测语音识别结果,确定第一评估特征,包括:获取机器翻译模型对待测翻译结果的第一翻译评分和第一困惑度评分;将待测语音识别结果输入第一语种对应的语言模型,得到第一语言模型得分;将第一翻译评分、第一困惑度评分和第一语言模型得分确定为第一评估特征。
需要说明的是,在本申请的实施例中,语音识别结果检测装置在步骤S101利用机器翻译模型对待测语音识别结果进行翻译的过程中,实际上可以翻译出语种为第二语种的多个翻译结果,并针对每一个翻译结果确定出相应的翻译评分和困惑度评分。其中,翻译评分表征翻译结果的整体翻译效果,例如,流畅程度和语义匹配程度,困惑度评分表征翻译结果在翻译过程中受到的混淆程度。机器翻译模型实际上是将多个翻译结果中翻译评分最高的翻译结果,确定为待测语音识别结果对应的待测翻译结果,因此,语音识别结果检测装置可以直接获取到机器翻译模型对待测翻译结果的翻译评分和困惑度评分,确定为第一翻译评分和第一困惑度评分。
需要说明的是,在本申请的实施例中,待测语音识别结果的语种为第一语种,因此,语音识别结果检测装置可以将待测语音识别结果输入第一语种对应的语言模型,该语言模型可以基于设计的第一语种相关的词语搭配,上下文结构对待测语音识别结果从流畅度、语句结构方面等进行评估,从而得到相应的语言模型得分。也就是说,第一语音模型评分表征了待侧语音识别结果在第一语种环境下语义表达的效果。
示例性地,在本申请的实施例中,待测语音识别结果为“今天天气怎么样”,语音识别结果检测装置可以将该待测语音识别结果输入汉语语种的语言模型,该模型从语句流畅度和结构等方面进行评估,从而得到第一语言模型得分A1。此外,语音识别结果检测装置将“今天天气怎么样”利用机器翻译模型翻译至英语,得到待测翻译结果为“What’s the weather like today”,以及待测翻译结果的第一翻译评分A2和第一困惑度评分A3。其中,A1、A2和A3均为第一评估特征。
可以理解的是,在本申请的实施例中,语音识别结果检测装置将第一翻译评分、第一困惑度评分和第一语言模型评分确定为第一评估特征,不仅从待测语音识别结果的语种特征上进行识别效果的考量,还从待测语音识别结果的另一种语种的表达结果上进行考量,可以获得更多表征待测语 音识别结果识别效果的信息,从而利用第一评估特征可以更准确的进行后续错误检测。
需要说明的是,在本申请的实施例中,语音识别结果检测装置还可以按照其它方式,从其它角度基于待测翻译结果和待测语音识别结果,确定第一评估特征。例如,语音识别结果检测装置也可以对待测翻译结果和待测语音识别结果从通顺度,语言逻辑等方面利用相应的模型各自进行评分,将每一个分数确定为一个第一评估特征。具体的第一评估特征的数量和类型本申请实施例不作限定。
S103、基于第一评估特征对待测语音识别结果进行评估,得到第一评估结果。
在本申请的实施例中,语音识别结果检测装置在获得第一评估特征之后,进一步的,基于第一评估特征对待测语音识别结果进行评估,得到第一评估结果。
具体地,在本申请的实施例中,语音识别结果检测装置基于第一评估特征对待测语音识别结果进行评估,得到第一评估结果,包括:获取第一第一翻译评分、第一困惑度评分和第一语言模型得分中每一个对应的特征权重,得到三个特征权重;利用三个特征权重对第一翻译评分、第一困惑度评分和所述第一语言模型得分进行加权处理,得到第一评估结果。
需要说明的是,在本申请的实施例中,针对第一评估特征中每一种类型的评估特征,设置有对应的特征权重,以表征利用该评估特征在进行待测语音识别结果评估的过程中的重要程度。语音识别结果检测装置可以将第一评估特征中的每一个评估特征,与相应的特征权重相乘,得到三个乘积结果,之后,计算三个乘积结果的总和即为第一评估结果。
具体地,在本申请的实施例中,第一评估特征包括:翻译评分、困惑度评分和语言模型得分,语音识别结果检测装置按照以下公式(1)计算待测语音识别结果的第一评估结果:
Score(s)=λ 1×TM+λ 2×PP+λ 3×LM     (1)
其中,Score(s)为第一评估结果,TM为翻译评分,λ 1为翻译评分对应的特征权重,PP为困惑度评分,λ 2为困惑度评分对应的特征权重,LM为语音模型得分,λ 3为语言模型得分对应的特征权重。
需要说明的是,在本申请的实施例中,语音识别结果检测装置对第一评估特征进行加权处理,不仅可以按照上述加权求和的方式得到第一评估结果,还可以进行其它形式的加权处理,例如,加权平均等,本申请实施例不作限定。
S104、判断第一评估结果是否满足预设条件,得到第一判断结果,并根据第一判断结果确定得到待测语音识别结果的错误检测结果。
在本申请的实施例中,语音识别结果检测装置在得到第一评估结果之后,即可判断第一评估结果是否满足预设条件,得到第一判断结果,并根 据第一判断结果确定确定出待测语音识别结果的错误检测结果。
具体地,在本申请的实施例中,语音识别结果检测装置根据第一判断结果确定待测语音识别结果的错误检测结果,包括:在第一判断结果为第一评估结果不满足预设条件的情况下,确定待测语音识别结果的错误检测结果为存在错误;在第一判断结果为第一评估结果满足预设条件的情况下,确定待测语音识别结果的错误检测结果为无错误。
具体地,在本申请的实施例中,语音识别结果检测装置判断第一评估结果是否满足预设条件,得到第一判断结果,包括:比较第一评估结果和评估阈值;在第一评估结果小于评估阈值的情况下,确定第一判断结果为第一评估结果不满足所述预设条件;在第一评估结果大于或者等于评估阈值的情况下,确定第一判断结果为第一评估结果不满足预设条件。
可以理解的是,在本申请的实施例中,第一评估结果表征了待测语音识别结果在流畅度、错别字等各方面的整体评分,因此,在第一评估结果小于评估阈值的情况下,说明待测语音识别结果存在错误,相应的,在第一评估结果大于或者等于评估阈值的情况下,说明待测语音识别结果无错误。具体的评估阈值可以根据实际需求设置,本申请实施例不作限定。
需要说明的是,在本申请的实施例中,可以根据实际错误检测标准预先设置预设条件,具体的预设条件本申请实施例不作限定。例如,预设条件还可以为预设区间,即在第一评估结果超出预设区间的情况下,确定待测语音识别结果的错误检测结果为存在错误,在第一评估结果处于预设区间范围内的情况下,确定待测语音识别结果的错误检测结果为无错误。
图2为本申请实施例提供的一种示例性的语音识别结果的错误检测过程示意图。如图2所示,语音识别结果检测装置将待测语音识别结果输入机器翻译模型,机器翻译模型即可将待测语音识别结果第一语种翻译至第二语种,输出待测翻译结果,之后,获取待测翻译结果翻译评分、困惑度评分,以及待测语音识别结果的语言模型得分,从而利用相应的特征权重进行加权求和,得到第一评估结果,最后,将第一评估结果与评估阈值进行比较,确定待测语音识别结果的错误检测结果。
需要说明的是,在本申请的实施例中,语音识别结果检测装置在上述步骤S103中获取三个特征权重之前,实际上可以按照一定方式确定出三个特征权重。
图3为本申请实施例提供的一种确定三个特征权重的流程示意图。如图3所示,主要包括以下步骤:
S301、获取样本语音识别结果,以及样本语音识别结果的预设检测结果。
在本申请的实施例中,语音识别结果检测装置可以先获取到样本语音识别结果,以及样本语音识别结果的预设检测结果。
需要说明的是,在本申请的实施例中,样本语音识别结果的数量可以 为多个。例如几十万个。样本语音识别结果的预设检测结果,即为人工对样本语音识别结果是否存在错误的判断结果。
示例性的,在本申请的实施例中,一个样本语音识别结果为“第一大学2019年新生开学”,其预设检测结果为无错误。此外,还有一个样本语音识别结果为“第二大学热列欢迎2019年新生开学”,其预设检测结果为存在错误。
需要说明的是,在本申请的实施例中,可以按照一定比例的无错误和存在错误的预设检测结果,获取大量的样本语音识别结果,具体的比例本申请实施例不作限定。
需要说明的是,在本申请的实施例中,样本语音识别结果与待测语音识别结果的语种相同,即均为第一语种。
S302、利用机器翻译模型将样本语音识别结果从第一语种翻译至第二语种,得到样本翻译结果。
在本申请的实施例中,语音识别结果检测装置在获得样本语音识别结果之后,可以利用机器翻译模型将样本语音识别结果从第一语种翻译至第二语种,得到样本翻译结果。
需要说明的是,在本申请的实施例中,语音识别结果检测装置利用机器翻译模型翻译样本语音识别结果的过程,与上述步骤S101中翻译待测语音识别结果相同,在此不再赘述。
S303、基于样本翻译结果和样本语音识别结果,确定第二翻译评分、第二困惑度评分和第二语言模型得分。
在本申请的实施例中,语音识别结果检测装置在得到样本翻译结果和样本语音识别结果之后,进一步的,基于样本翻译结果和样本语音识别结果,确定第二翻译评分、第二困惑度评分和第二语言模型得分。
需要说明的是,在本申请的实施例中,语音识别结果检测装置基于样本翻译结果和样本语音识别结果,确定第二翻译评分、第二困惑度评分和第二语言模型得分,与上述步骤S102中基于待测翻译结果和待测语音识别结果,确定第一翻译评分、第一困惑度评分和第一语言模型得分的过程类似,在此不再赘述。
S304、获取第二翻译评分、第二困惑度评分和第二语言模型得分中每一个对应的预设权重,得到三个预设权重。
在本申请的实施例中,语音识别结果检测装置还可以获取到第二翻译评分、第二困惑度评分和第二语言模型得分中每一个对应的预设权重,得到三个预设权重。
需要说明的是,在本申请的实施例中,因为在初始情况下,无法判断不同类型的特征在进行错误检测过程中的重要程度,因此,可以预先设置三个预设权重,例如,可以将三个预设权重中的每一个权重均设置为1。具体的三个预设权重本申请实施例不作限定。
S305、利用第二翻译评分、第二困惑度评分、第二语言模型得分、预设检测结果和三个预设权重,确定三个特征权重。
在本申请的实施例中,语音识别结果检测装置在得到第二翻译评分、第二困惑度评分、第二语言模型得分、预设检测结果和三个预设权重之后,即可利用第二翻译评分、第二困惑度评分、第二语言模型得分、预设检测结果和三个预设权重,确定三个特征权重。
具体地,在本申请的实施例中,语音识别结果检测装置利用第二翻译评分、第二困惑度评分、第二语言模型得分、预设检测结果和三个预设权重,确定三个特征权重,包括:利用三个预设权重对第二翻译评分、第二困惑度评分和第二语言模型得分进行加权处理,得到第二评估结果;判断第二评估结果是否满足预设条件,得到第二判断结果,并根据第二判断结果确定样本语音识别结果的错误检测结果;基于样本语音识别结果的错误检测结果和预设检测结果,调整三个预设权重,得到三个特征权重。
需要说明的是,在本申请的实施例中,语音识别结果检测装置利用三个预设权重对第二评估特征进行加权处理,与上述步骤S103中利用三个特征权重对第一评估特征加权处理的过程类似,区别仅在于特征和权重的具体值,在此不再赘述。
需要说明的是,在本申请的实施例中,语音识别结果检测装置判断第二评估结果是否满足预设条件,与上述步骤S104中判断第一评估结果是否满足预设条件的过程类似,区别仅在于判断的对象不同,在此不再赘述。
具体地,在本申请的实施例中,语音识别结果检测装置基于样本语音识别结果的错误检测结果和预设检测结果,调整三个预设权重,得到三个特征权重,包括:按照权重调整算法调整三个预设权重,直至样本语音识别结果的错误检测结果与预设检测结果相同,得到三个特征权重。
可以理解的是,在本申请的实施例中,样本语音识别结果的错误检测结果与预设检测结果相同,即说明权重设置的较为合适,因此,可以将调整后的三个预设权重确定为三个特征权重。
需要说明的是,在本申请的实施例中,可以根据实际需求预设权重调整算法,例如最小错误率训练(Minimum error rate training,MERT)算法等。具体的权重调整算法本申请实施例不作限定。
可以理解的是,在本申请的实施例中,语音识别结果检测装置可以利用大量的样本语音识别结果进行三个特征权重的确定,因此,语音识别结果检测装置可以在调整权重过程中,当大量的样本语音识别结果中较高比例的样本语音识别结果的错误检测结果与预设检测结果相同时,即可将得到的权重确定为特征权重。
需要说明的是,在本申请的实施例中,上述语音识别结果检测方法可以应用于各种需要语音识别的应用场景中,以实现语音识别结果的错误检测。
图4为本申请实施例提供的一种示例性的语音识别结果检测方法应用的系统架构示意图。如图4所示,所述系统可包括:客户端、云端、语音处理服务器和显示屏幕,其中,云端中集成有本申请提供的语音识别结果检测方法。
实际应用中,在进行会议演讲的过程中,客户端采集演讲者的语音数据,将采集的语音数据发送给语音处理服务器,该语音处理服务器对语音数据进行识别,得到待测语音识别结果,之后,语音处理服务器可以将待测语音识别结果发送云端,由云端按照语音识别结果检测方法对待测语音识别结果进行错误检测,并将错误检测结果返回给语音处理服务器,如果错误检测结果为存在错误,语音处理服务器即可根据错误检测结果对待测语音识别结果按照一定方式进行纠正,得到正确的语音识别结果,最终将正确的语音识别结果投屏到显示屏幕上进行展示。
需要说明的是,在本申请的实施例中,上述语音识别结果检测方法采用软件的方式实现,不仅可以集成在上述云端,还可以集成在移动终端上,本申请实施例不作限定。
本申请实施例提供了一种语音识别结果检测方法,包括:获取待测语音识别结果,并利用机器翻译模型将待测语音识别结果从第一语种翻译至第二语种,得到待测翻译结果;基于待测翻译结果和待测语音识别结果,确定第一评估特征;第一评估特征用于表征待测语音识别结果的识别和翻译效果;基于第一评估特征对待测语音识别结果进行评估,得到第一评估结果;判断第一评估结果是否满足预设条件,得到第一判断结果,并根据第一判断结果确定待测语音识别结果的错误检测结果。本申请实施例提供的技术方案,将待测语音识别结果翻译成另一种语种,以结合翻译结果对待测语音识别进行错误检测,相比于仅基于单一语种相关特征进行错误检测,可以得到更多的特征以实现错误检测,提高了错误检测的准确率。
本申请实施例提供了一种语音识别结果检测装置。图5为本申请实施例提供的一种语音识别结果检测装置的结构示意图一。如图5所示,语音识别结果检测装置包括:
翻译模块501,配置为获取待测语音识别结果,并利用机器翻译模型将所述待测语音识别结果从第一语种翻译至第二语种,得到待测翻译结果;
确定模块502,配置为基于所述待测翻译结果和所述待测语音识别结果,确定第一评估特征;所述第一评估特征用于表征所述待测语音识别结果的识别和翻译效果;
评估模块503,配置为基于所述第一评估特征对所述待测语音识别结果进行评估,得到第一评估结果;
判断模块504,配置为判断所述第一评估结果是否满足预设条件,得到得到第一判断结果,并根据所述第一判断结果确定所述待测语音识别结果的错误检测结果。
在一实施例中,所述确定模块502,配置为获取所述机器翻译模型对所述待测翻译结果的翻译评分和困惑度评分,得到第一翻译评分和第一困惑度评分;将所述待测语音识别结果输入所述第一语种对应的语言模型,得到第一语言模型得分;将所述第一翻译评分、所述第一困惑度评分和所述第一语言模型得分确定为第一评估特征。
在一实施例中,所述评估模块503,配置为获取所述第一翻译评分、所述第一困惑度评分和所述第一语言模型得分中每一个对应的特征权重,得到三个特征权重;利用所述三个特征权重对所述第一翻译评分、所述第一困惑度评分和所述第一语言模型得分进行加权处理,得到所述第一评估结果。
在一实施例中,所述确定模块502,配置为获取样本语音识别结果,以及所述样本语音识别结果的预设检测结果;利用所述机器翻译模型将所述样本语音识别结果从所述第一语种翻译至所述第二语种,得到样本翻译结果;基于所述样本翻译结果和所述样本语音识别结果,确定第二翻译评分、第二困惑度评分和第二语言模型得分;获取所述第二翻译评分、所述第二困惑度评分和所述第二语言模型得分中每一个对应的预设权重,得到三个预设权重;利用所述第二翻译评分、所述第二困惑度评分、所述第二语言模型得分、所述预设检测结果和所述三个预设权重,确定所述三个特征权重。
在一实施例中,所述确定模块502,配置为利用所述三个预设权重对所述第二翻译评分、所述第二困惑度评分、所述第二语言模型得分进行加权处理,得到第二评估结果;判断所述第二评估结果是否满足所述预设条件,得到得到第二判断结果,并根据所述第二判断结果确定所述样本语音识别结果的错误检测结果;基于所述样本语音识别结果的错误检测结果和所述预设检测结果,调整所述三个预设权重,得到所述三个特征权重。
在一实施例中,所述确定模块502,配置为按照权重调整算法调整所述三个预设权重,直至所述样本语音识别结果的错误检测结果与所述预设检测结果相同,得到所述三个特征权重。
在一实施例中,所述判断模块504,配置为在所述第一判断结果为所述第一评估结果不满足所述预设条件的情况下,确定所述待测语音识别结果的错误检测结果为存在错误;在所述第一判断结果为所述第一评估结果满足所述预设条件的情况下,确定所述待测语音识别结果的错误检测结果为无错误。
在一实施例中,所述判断模块504,配置为比较所述第一评估结果和评估阈值;在所述第一评估结果小于所述评估阈值的情况下,确定所述第一判断结果为所述第一评估结果不满足所述预设条件;在所述第一评估结果大于或者等于所述评估阈值的情况下,确定所述第一判断结果为所述第一评估结果满足所述预设条件。
需要说明的是,实际应用时,所述翻译模块501、所述确定模块502、所述评估模块503和所述判断模块504所执行的步骤可由处理器实现。
需要说明的是:上述实施例提供的语音识别结果检测装置在进行语音识别结果的错误检测时,仅以上述各程序模块的划分进行举例说明,实际应用中,可以根据需要而将上述处理分配由不同的程序模块完成,即将装置的内部结构划分成不同的程序模块,以完成以上描述的全部或者模块处理。另外,上述实施例提供的语音识别结果检测装置与语音识别结果检测方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
基于上述程序模块的硬件实现,且为了实现申请实施例的方法,本申请实施例还提供了一种语音识别结果检测装置。图6为本申请实施例提供的一种语音识别结果检测装置的结构示意图二。如图6所示,语音识别结果检测装置包括:处理器601、存储器602和通信总线603;
所述通信总线603,配置为实现所述处理器601和所述存储器602之间的通信连接;
所述处理器601,配置为执行所述存储器602中存储的语音识别结果检测程序,以实现上述语音识别结果检测方法。
本申请实施例提供了一种语音识别结果检测装置,获取待测语音识别结果,并利用机器翻译模型将待测语音识别结果从第一语种翻译至第二语种,得到待测翻译结果;基于待测翻译结果和待测语音识别结果,确定第一评估特征;第一评估特征用于表征所述待测语音识别结果的识别和翻译效果;基于第一评估特征对待测语音识别结果进行评估,得到第一评估结果;判断第一评估结果是否满足预设条件,得到第一判断结果,并根据第一判断结果确定待测语音识别结果的错误检测结果。本申请实施例提供的语音识别结果检测装置,将待测语音识别结果翻译成另一种语种,以结合翻译结果对待测语音识别进行错误检测,相比于仅基于单一语种相关特征进行错误检测,可以得到更多的特征以实现错误检测,提高了错误检测的准确率。
本申请实施例还提供了一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被一个或者多个处理器执行时实现上述同声传译方法。计算机可读存储介质可以是易失性存储器(volatile memory),例如随机存取存储器(Random-Access Memory,RAM);或者非易失性存储器(non-volatile memory),例如只读存储器(Read-Only Memory,ROM),快闪存储器(flash memory),硬盘(Hard Disk Drive,HDD)或固态硬盘(Solid-State Drive,SSD);也可以是包括上述存储器之一或任意组合的各自设备,如移动电话、计算机、平板设备、个人数字助理等。
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, optical storage and the like) containing computer-usable program code.
The present application is described with reference to schematic flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the schematic flowcharts and/or block diagrams, and combinations of flows and/or blocks in the schematic flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the schematic flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, where the instruction apparatus implements the functions specified in one or more flows of the schematic flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, and thus the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the schematic flowcharts and/or one or more blocks of the block diagrams.
The above are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

  1. A speech recognition result detection method, comprising:
    obtaining a speech recognition result to be tested, and translating the speech recognition result to be tested from a first language into a second language by using a machine translation model to obtain a translation result to be tested;
    determining a first evaluation feature based on the translation result to be tested and the speech recognition result to be tested, wherein the first evaluation feature is used to characterize the recognition and translation effect of the speech recognition result to be tested;
    evaluating the speech recognition result to be tested based on the first evaluation feature to obtain a first evaluation result; and
    judging whether the first evaluation result satisfies a preset condition to obtain a first judgment result, and determining an error detection result of the speech recognition result to be tested according to the first judgment result.
  2. The method according to claim 1, wherein the determining a first evaluation feature based on the translation result to be tested and the speech recognition result to be tested comprises:
    obtaining a translation score and a perplexity score of the machine translation model for the translation result to be tested to obtain a first translation score and a first perplexity score;
    inputting the speech recognition result to be tested into a language model corresponding to the first language to obtain a first language model score; and
    determining the first translation score, the first perplexity score and the first language model score as the first evaluation feature.
  3. The method according to claim 2, wherein the evaluating the speech recognition result to be tested based on the first evaluation feature to obtain a first evaluation result comprises:
    obtaining a feature weight corresponding to each of the first translation score, the first perplexity score and the first language model score to obtain three feature weights; and
    performing weighted processing on the first translation score, the first perplexity score and the first language model score by using the three feature weights to obtain the first evaluation result.
  4. The method according to claim 3, wherein before the obtaining a feature weight corresponding to each of the first translation score, the first perplexity score and the first language model score to obtain three feature weights, the method further comprises:
    obtaining a sample speech recognition result and a preset detection result of the sample speech recognition result;
    translating the sample speech recognition result from the first language into the second language by using the machine translation model to obtain a sample translation result;
    determining a second translation score, a second perplexity score and a second language model score based on the sample translation result and the sample speech recognition result;
    obtaining a preset weight corresponding to each of the second translation score, the second perplexity score and the second language model score to obtain three preset weights; and
    determining the three feature weights by using the second translation score, the second perplexity score, the second language model score, the preset detection result and the three preset weights.
  5. The method according to claim 4, wherein the determining the three feature weights by using the second translation score, the second perplexity score, the second language model score, the preset detection result and the three preset weights comprises:
    performing weighted processing on the second translation score, the second perplexity score and the second language model score by using the three preset weights to obtain a second evaluation result, wherein the second evaluation result is used to characterize the recognition and translation effect of the sample speech recognition result;
    judging whether the second evaluation result satisfies the preset condition to obtain a second judgment result, and determining an error detection result of the sample speech recognition result according to the second judgment result; and
    adjusting the three preset weights based on the error detection result of the sample speech recognition result and the preset detection result to obtain the three feature weights.
  6. The method according to claim 5, wherein the adjusting the three preset weights based on the error detection result of the sample speech recognition result and the preset detection result to obtain the three feature weights comprises:
    adjusting the three preset weights according to a weight adjustment algorithm until the error detection result of the sample speech recognition result is the same as the preset detection result, so as to obtain the three feature weights.
  7. The method according to any one of claims 1 to 6, wherein the determining an error detection result of the speech recognition result to be tested according to the first judgment result comprises:
    in a case where the first judgment result is that the first evaluation result does not satisfy the preset condition, determining that the error detection result of the speech recognition result to be tested is that an error exists; and
    in a case where the first judgment result is that the first evaluation result satisfies the preset condition, determining that the error detection result of the speech recognition result to be tested is that no error exists.
  8. The method according to any one of claims 1 to 7, wherein the judging whether the first evaluation result satisfies a preset condition to obtain a first judgment result comprises:
    comparing the first evaluation result with an evaluation threshold;
    in a case where the first evaluation result is less than the evaluation threshold, determining that the first judgment result is that the first evaluation result does not satisfy the preset condition; and
    in a case where the first evaluation result is greater than or equal to the evaluation threshold, determining that the first judgment result is that the first evaluation result satisfies the preset condition.
  9. A speech recognition result detection apparatus, comprising:
    a translation module, configured to obtain a speech recognition result to be tested, and translate the speech recognition result to be tested from a first language into a second language by using a machine translation model to obtain a translation result to be tested;
    a determination module, configured to determine a first evaluation feature based on the translation result to be tested and the speech recognition result to be tested, wherein the first evaluation feature is used to characterize the recognition and translation effect of the speech recognition result to be tested;
    an evaluation module, configured to evaluate the speech recognition result to be tested based on the first evaluation feature to obtain a first evaluation result; and
    a judgment module, configured to judge whether the first evaluation result satisfies a preset condition to obtain a first judgment result, and determine an error detection result of the speech recognition result to be tested according to the first judgment result.
  10. A speech recognition result detection apparatus, comprising a processor and a memory;
    wherein the processor is configured to execute a speech recognition result detection program stored in the memory to implement the speech recognition result detection method according to any one of claims 1 to 8.
  11. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the speech recognition result detection method according to any one of claims 1 to 8.
PCT/CN2020/071389 2020-01-10 2020-01-10 语音识别结果检测方法及装置、存储介质 WO2021138898A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/071389 WO2021138898A1 (zh) 2020-01-10 2020-01-10 语音识别结果检测方法及装置、存储介质
CN202080088999.3A CN114846543A (zh) 2020-01-10 2020-01-10 语音识别结果检测方法及装置、存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/071389 WO2021138898A1 (zh) 2020-01-10 2020-01-10 语音识别结果检测方法及装置、存储介质

Publications (1)

Publication Number Publication Date
WO2021138898A1 true WO2021138898A1 (zh) 2021-07-15

Family

ID=76787656

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/071389 WO2021138898A1 (zh) 2020-01-10 2020-01-10 语音识别结果检测方法及装置、存储介质

Country Status (2)

Country Link
CN (1) CN114846543A (zh)
WO (1) WO2021138898A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113746989A (zh) * 2021-08-23 2021-12-03 北京高阳捷迅信息技术有限公司 客服智能质检的方法、装置、设备和存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000222406A (ja) * 1999-01-27 2000-08-11 Sony Corp 音声認識翻訳装置及び方法
CN104050160A (zh) * 2014-03-12 2014-09-17 北京紫冬锐意语音科技有限公司 一种机器与人工翻译相融合的口语翻译方法和装置
CN105336342A (zh) * 2015-11-17 2016-02-17 科大讯飞股份有限公司 语音识别结果评价方法及系统
CN107086040A (zh) * 2017-06-23 2017-08-22 歌尔股份有限公司 语音识别能力测试方法和装置
CN107544726A (zh) * 2017-07-04 2018-01-05 百度在线网络技术(北京)有限公司 基于人工智能的语音识别结果纠错方法、装置及存储介质
CN110211571A (zh) * 2019-04-26 2019-09-06 平安科技(深圳)有限公司 错句检测方法、装置及计算机可读存储介质
CN110556127A (zh) * 2019-09-24 2019-12-10 北京声智科技有限公司 语音识别结果的检测方法、装置、设备及介质

Also Published As

Publication number Publication date
CN114846543A (zh) 2022-08-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20911494

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.12.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20911494

Country of ref document: EP

Kind code of ref document: A1