WO2022095353A1 - Speech recognition result evaluation method, apparatus and device, and storage medium - Google Patents

Speech recognition result evaluation method, apparatus and device, and storage medium

Publication number
WO2022095353A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
characters
detected
preset
sequence
Prior art date
Application number
PCT/CN2021/090436
Other languages
French (fr)
Chinese (zh)
Inventor
陈益
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022095353A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/01 Assessment or evaluation of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present application relates to the field of artificial intelligence, and in particular to a method, apparatus, device and storage medium for evaluating a speech recognition result.
  • A video return visit is one of the ways a company maintains its customers.
  • The company's operation and maintenance personnel conduct video return visits with customers so that the company can better understand customer needs.
  • One of the technologies used in video return visits is speech recognition technology, also called automatic speech recognition (ASR).
  • In a video return visit project, the customer's spoken replies are recognized by the speech recognition technology and converted into the corresponding text. After the speech is converted into text, the accuracy of the speech-to-text conversion is usually checked by random inspection.
  • The inventor realized that checking speech-to-text conversion by random inspection not only involves complicated steps but also consumes a lot of time, which leads to low efficiency in evaluating the accuracy of converting the initial speech into the initial text.
  • The present application provides a method, apparatus, device and storage medium for evaluating speech recognition results, which are used to improve the efficiency of evaluating the accuracy of converting initial speech into initial text.
  • A first aspect of the present application provides a method for evaluating a speech recognition result, including: acquiring initial speech in a video return visit project, and converting the initial speech based on a speech recognition function to obtain converted initial text; performing space-character deletion preprocessing, sorting preprocessing, and punctuation-character deletion preprocessing on the initial text to obtain text to be detected; obtaining a word sequence to be detected in the text to be detected based on a preset sequence function, proofreading the word sequence to be detected according to a preset standard word sequence, and making proofreading marks in the word sequence to be detected to obtain proofreading text; calculating a character recognition error rate of the proofreading text using a preset calculation formula; and selecting a preset comparison result by comparing the character recognition error rate with a standard error rate, and determining a conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
  • A second aspect of the present application provides a device for evaluating speech recognition results, including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions: acquiring initial speech in a video return visit project, and converting the initial speech based on a speech recognition function to obtain converted initial text; performing space-character deletion preprocessing, sorting preprocessing, and punctuation-character deletion preprocessing on the initial text to obtain text to be detected; obtaining a word sequence to be detected in the text to be detected based on a preset sequence function, proofreading the word sequence to be detected according to a preset standard word sequence, and making proofreading marks in the word sequence to be detected to obtain proofreading text; calculating a character recognition error rate of the proofreading text using a preset calculation formula; and selecting a preset comparison result by comparing the character recognition error rate with a standard error rate, and determining a conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
  • A third aspect of the present application provides a computer-readable storage medium storing computer instructions which, when executed on a computer, cause the computer to perform the following steps: acquiring initial speech in a video return visit project, and converting the initial speech based on a speech recognition function to obtain converted initial text; and performing space-character deletion preprocessing, sorting preprocessing, and punctuation-character deletion preprocessing on the initial text to obtain text to be detected.
  • A fourth aspect of the present application provides an apparatus for evaluating speech recognition results, comprising: a conversion module, used to acquire initial speech in a video return visit project and convert the initial speech based on a speech recognition function to obtain converted initial text; a preprocessing module, used to perform space-character deletion preprocessing, sorting preprocessing, and punctuation-character deletion preprocessing on the initial text to obtain text to be detected; a proofreading module, used to obtain a word sequence to be detected in the text to be detected based on a preset sequence function, proofread the word sequence to be detected according to a preset standard word sequence, and make proofreading marks in the word sequence to be detected to obtain proofreading text; a calculation module, used to calculate a character recognition error rate of the proofreading text using a preset calculation formula; and a determination module, used to select a preset comparison result by comparing the character recognition error rate with a standard error rate, and determine a conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
  • In the present application, the initial speech in the video return visit project is acquired and converted based on a speech recognition function to obtain the converted initial text; space-character deletion preprocessing, sorting preprocessing, and punctuation-character deletion preprocessing are performed on the initial text to obtain the text to be detected; the word sequence to be detected in the text to be detected is obtained based on a preset sequence function, proofread according to the preset standard word sequence, and given proofreading marks to obtain the proofreading text; the character recognition error rate of the proofreading text is calculated using a preset calculation formula; and a preset comparison result is selected by comparing the character recognition error rate with the standard error rate, and the conversion evaluation result of the speech-to-text conversion is determined according to the preset comparison result.
  • The initial speech in the video return visit project is converted by the speech recognition function to obtain the initial text; the initial text is then preprocessed, its word sequence is proofread, and the error rate is calculated to obtain the character recognition error rate; finally, a preset comparison result is selected by comparing the character recognition error rate with the standard error rate to obtain the conversion evaluation result of the speech-to-text conversion. This improves the efficiency of evaluating the accuracy of converting the initial speech into the initial text.
  • FIG. 1 is a schematic diagram of an embodiment of a method for evaluating a speech recognition result in an embodiment of the present application
  • FIG. 2 is a schematic diagram of another embodiment of a method for evaluating a speech recognition result in an embodiment of the present application
  • FIG. 3 is a schematic diagram of an embodiment of a device for evaluating speech recognition results in an embodiment of the present application
  • FIG. 4 is a schematic diagram of another embodiment of the apparatus for evaluating the speech recognition result in the embodiment of the present application.
  • FIG. 5 is a schematic diagram of an embodiment of a device for evaluating a speech recognition result in an embodiment of the present application.
  • Embodiments of the present application provide a method, apparatus, device, and storage medium for evaluating a speech recognition result, which are used to improve the efficiency of evaluating the accuracy of converting initial speech into initial text.
  • An embodiment of the method for evaluating the speech recognition result in the embodiment of the present application includes:
  • the execution subject of the present application may be a device for evaluating a speech recognition result, or may be a terminal or a server, which is not specifically limited here.
  • the embodiments of the present application take the server as an execution subject as an example for description.
  • The server collects the initial speech in the video return visit through a voice collector.
  • The initial speech refers to the speech of the call or dialogue in the video return visit project, and its content may cover different business matters.
  • The format of the initial speech may be the CDA track index format (CD audio format), the WAVE format, the audio interchange file format (AIFF), or the Moving Picture Experts Group Audio Layer III format (MP3).
  • The present application does not limit the format of the initial speech.
  • After the server collects the initial speech, it converts the initial speech into text through the speech recognition function to obtain the initial text. Since the accuracy of converting speech into text with a speech recognition system is not 100%, the server needs to process the initial text and check the accuracy of converting the initial speech into the initial text.
  • The initial text converted from the initial speech by the speech recognition function is saved in the project log file. It should be emphasized that, in order to further ensure the privacy and security of the initial text, the initial text can also be stored in a node of a blockchain.
  • Before checking the initial text, the server needs to preprocess it to obtain the text to be detected. Preprocessing reduces interference with the character recognition error rate that the server calculates in the subsequent steps for the conversion of the initial speech into the initial text.
  • After the server obtains the preprocessed text to be detected, it obtains the word sequence to be detected in that text and proofreads it against a preset standard word sequence. There are multiple preset standard word sequences.
  • The server calculates the basic similarity between the word sequence to be detected and each preset standard word sequence, takes the largest basic similarity as the target similarity, and takes the preset standard word sequence corresponding to the target similarity as the target standard word sequence.
  • The server then compares the number of characters in the word sequence to be detected with the number of characters in the target standard word sequence, proofreads the word sequence to be detected in the text to be detected accordingly, and obtains the final proofreading text.
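The selection of the target standard word sequence can be sketched as follows. The text does not say how the basic similarity is computed, so this sketch assumes the ratio from Python's `difflib.SequenceMatcher` as a stand-in; the function names `basic_similarity` and `select_target_standard` are hypothetical.

```python
from difflib import SequenceMatcher


def basic_similarity(to_detect: str, standard: str) -> float:
    """Similarity in [0, 1]. SequenceMatcher is an assumed stand-in for the
    similarity measure, which the source text does not specify."""
    return SequenceMatcher(None, to_detect, standard).ratio()


def select_target_standard(to_detect: str, standards: list[str]) -> tuple[str, float]:
    """Return the preset standard sequence with the largest basic similarity,
    together with that similarity (the target similarity)."""
    if not standards:
        raise ValueError("at least one preset standard word sequence is required")
    scored = [(std, basic_similarity(to_detect, std)) for std in standards]
    return max(scored, key=lambda pair: pair[1])
```

The returned sequence plays the role of the target standard word sequence in the character-count comparison that follows.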
  • The character recognition error rate of the proofreading text is calculated with the preset calculation formula. The character recognition error rate is the error rate of converting the initial speech into the initial text: it reflects how many characters were converted incorrectly, and the incorrectly converted characters are one of the factors for judging the conversion efficiency.
  • After the server obtains the character recognition error rate, it compares the value of the character recognition error rate with the value of the standard error rate to determine the conversion evaluation result of the speech-to-text conversion.
  • The comparison results here include a first comparison result and a second comparison result, where the first comparison result is that the accuracy of the speech-to-text conversion is low, and the second comparison result is that the accuracy of the speech-to-text conversion is high.
  • When the value of the character recognition error rate is greater than the value of the standard error rate, the selected comparison result is the first comparison result, and the first comparison result is determined as the conversion evaluation result of the speech-to-text conversion; when the value of the character recognition error rate is less than or equal to the value of the standard error rate, the selected comparison result is the second comparison result, and the second comparison result is determined as the conversion evaluation result of the speech-to-text conversion.
  • The initial speech in the video return visit project is converted by the speech recognition function to obtain the initial text; the initial text is then preprocessed, its word sequence is proofread, and the error rate is calculated to obtain the character recognition error rate; finally, a preset comparison result is selected by comparing the character recognition error rate with the standard error rate to obtain the conversion evaluation result of the speech-to-text conversion. This improves the efficiency of evaluating the accuracy of converting the initial speech into the initial text.
  • another embodiment of the method for evaluating the speech recognition result in the embodiment of the present application includes:
  • The server first obtains the initial speech in the video return visit project, inputs the initial speech into the speech recognition function, and extracts the speech features in the initial speech through the speech recognition function; the server then converts the speech features into phoneme information through a preset translation model, where the phoneme information indicates the smallest phonetic units that make up a syllable; finally, the server matches the phoneme information with a preset standard text to generate the initial text corresponding to the initial speech.
  • After the server obtains the initial speech in the video return visit project, it uses the speech recognition function to recognize and convert the initial speech.
  • The main principle of the speech recognition function is as follows. The server first collects a large number of speech samples for training, analyzes and integrates the speech feature parameters of each speech in the samples, and establishes speech feature templates for those parameters in a speech comparison library. The server then obtains the speech information to be recognized and processes it in the same way to obtain the target speech parameters. The speech feature parameters that match the target speech parameters, found by a decision method, determine the speech recognition result.
  • Recognition frameworks such as the dynamic time warping method based on pattern matching and the hidden Markov model method based on statistical models are used to convert the initial sentences of multiple target speeches conveniently and quickly.
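As an illustration of template matching with dynamic time warping, the sketch below compares a feature sequence against stored templates and returns the closest one. It is a toy under stated assumptions: features are single numbers per frame rather than real acoustic feature vectors, and the names `dtw_distance` and `best_template` are hypothetical, not taken from the source.

```python
def dtw_distance(a: list[float], b: list[float]) -> float:
    """Classic dynamic time warping distance between two 1-D sequences,
    using absolute difference as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] repeats against the same b frame
                cost[i][j - 1],      # b[j-1] repeats against the same a frame
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def best_template(features: list[float], templates: dict[str, list[float]]) -> str:
    """Return the name of the template with the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(features, templates[name]))
```

Because the warping path may repeat frames, two utterances of the same content spoken at different speeds can still align with low cost.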
  • Phoneme information is the smallest phonetic unit divided according to the natural attributes of speech. Speech is analyzed according to the articulatory actions within a syllable, and each action corresponds to one phoneme.
  • With phoneme information, the speech can be combined into text information more accurately.
  • For example, the server obtains the target speech "Your company's service is good".
  • The server extracts the speech features in the target speech.
  • The obtained speech features are: [1 2 8 4 7 6 0 9 3]
  • The server converts the extracted speech features into phoneme information through the acoustic model.
  • The obtained phoneme information is: g u i g o n g s i f u w u h a o
  • The server matches the characters corresponding to the phoneme information in a preset dictionary, for example: cabinet: g u i; expensive: g u i; worker: g o n g; public: g o n g; four: s i; company: s i; service: f u; service: w u; good: h a o. The server then looks up the association probability between pieces of text information in the preset association probabilities, for example: expensive: 0.1786; public: 0.0546; company: 0.7898; service: 0.8967; good: 0.3982; good service: 0.6785. Finally, the server selects the text information with the highest association probability as the target text; the higher the association probability, the more likely the sentence is to appear. The server combines the target texts in order to obtain the target sentence.
  • the obtained target sentence is:
  • the above-mentioned initial text can also be stored in a node of a blockchain.
  • The server first obtains the text characters of the initial text and determines whether there are space characters between them. If there are, the server deletes the space characters and takes the text characters remaining after the deletion as the first preprocessed text characters.
  • Next, the server obtains the positions of the punctuation characters in the first preprocessed text characters, takes the character following each punctuation character as the first character of the next line, and sorts the first preprocessed text characters into segments to obtain the second preprocessed text characters. Punctuation characters are symbols that assist in recording written language. Finally, the server deletes the punctuation characters from the second preprocessed text characters and takes the remaining characters as the target text characters, obtaining the text to be detected.
  • Deleting the space characters between the text characters of the initial text first prevents garbled characters and makes it easier for the server to sort the text characters. Sorting by the positions of the punctuation characters ensures that each line contains one punctuation character and at least one text character, which facilitates proofreading.
  • Because punctuation characters only assist in recording written language, whether they are recognized correctly does not affect the accuracy of the text characters. Therefore, the punctuation characters need to be removed.
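The three preprocessing steps above can be sketched as follows. This is a minimal illustration, assuming that "punctuation character" means any character in a Unicode punctuation category; the function names `is_punctuation` and `preprocess` are hypothetical.

```python
import unicodedata


def is_punctuation(ch: str) -> bool:
    """True for characters in a Unicode punctuation category (P*).
    This definition of punctuation is an assumption of the sketch."""
    return unicodedata.category(ch).startswith("P")


def preprocess(initial_text: str) -> list[str]:
    """Return the text to be detected as a list of punctuation-free lines."""
    # Step 1: delete space characters (first preprocessed text characters).
    no_spaces = "".join(ch for ch in initial_text if not ch.isspace())

    # Step 2: the character after each punctuation mark starts a new line
    # (second preprocessed text characters).
    lines: list[str] = []
    current: list[str] = []
    for ch in no_spaces:
        current.append(ch)
        if is_punctuation(ch):
            lines.append("".join(current))
            current = []
    if current:
        lines.append("".join(current))

    # Step 3: delete punctuation characters and drop lines left empty.
    cleaned = ("".join(ch for ch in line if not is_punctuation(ch)) for line in lines)
    return [line for line in cleaned if line]
```

For example, `preprocess("你好, 世界。")` returns `["你好", "世界"]`. Deleting all spaces suits unsegmented scripts such as Chinese; for space-delimited languages this step would merge words.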
  • The server first obtains the basic text characters in the text to be detected and the initial observation sequence, where the initial observation sequence indicates the text character sequence of the basic text characters. Next, the server divides the basic text characters into predicted observation sequences according to the division rules in the preset sequence function, where a predicted observation sequence indicates a combination of the text character sequence. The server then uses a preset conditional probability formula to calculate the basic conditional probability that the basic text characters are arranged according to a predicted observation sequence, given the arrangement of the initial observation sequence.
  • For example, the text to be detected is "Your company serves well", the basic text characters are "Your/Company/Company/Service/Service/Good", where each character is one text character, and the initial observation sequence is "Your/…; the initial observation sequence here indicates the text character sequence of the basic text characters.
  • The server divides the basic text characters into predicted observation sequences through the division rules in the preset sequence function; the resulting predicted observation sequences can be "your/company/good service", "your company/service/good", and "your company/service is good". The server then uses the preset conditional probability formula to calculate the basic conditional probability that the basic text characters are arranged according to each predicted observation sequence, given the arrangement of the initial observation sequence.
  • According to the conditional probability formula, the basic conditional probability of "your company/company/good service" is 0.682, the basic conditional probability of "your company/service/good" is 0.798, and the basic conditional probability of "your company/good service" is
  • If the number of characters in the word sequence to be detected is greater than the number of characters in the preset standard word sequence, the server marks the preset insertion character at the position of the word sequence to be detected.
  • For example, the known standard text is "I am short of money temporarily", for which the preset standard word sequence has 5 characters, and the recognized text to be detected is "I am not short of money temporarily", for which the word sequence to be detected has 6 characters (character counts refer to the original Chinese text).
  • Since 6 is greater than 5, the server directly marks the preset insertion character at the position of the word sequence to be detected.
  • If the number of characters in the word sequence to be detected is less than the number of characters in the preset standard word sequence, the server marks the preset deletion character at the position of the word sequence to be detected.
  • For example, the known standard text is "I am not short of money temporarily", for which the preset standard word sequence has 6 characters, and the recognized text to be detected is "I am short of money temporarily", for which the word sequence to be detected has 5 characters.
  • Since 5 is less than 6, the server directly marks the preset deletion character at the position of the word sequence to be detected.
  • If the number of characters in the word sequence to be detected is equal to the number of characters in the preset standard word sequence, the recognized text to be detected may or may not be the same as the standard text, so the server needs to further judge whether the word sequence to be detected is the same as the preset standard word sequence. The standard text here is the text content corresponding to the preset standard word sequence.
  • If the word sequence to be detected is not the same as the preset standard word sequence, the corresponding text to be detected is not the same as the standard text; that is, there are replaced characters in the text to be detected. The server directly marks the preset replacement character at the position of the word sequence to be detected, and then determines the text to be detected with its proofreading marks as the proofreading text.
  • For example, the known standard text is "I am not short of money temporarily", for which the preset standard word sequence has 6 characters, and the recognized text to be detected is "I am short of money temporarily", for which the word sequence to be detected has 6 characters.
  • The server determines whether the word sequence to be detected is the same as the preset standard word sequence. If the server detects that the two differ, it marks the preset replacement character at the position of the word sequence to be detected. Finally, the server determines the text to be detected marked with the preset insertion characters, the preset deletion characters, and the preset replacement characters as the proofreading text.
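The three-way decision described above can be sketched as follows. The marker strings and the function name `proofreading_mark` are hypothetical; the source only says that preset insertion, deletion, and replacement characters are used.

```python
# Hypothetical marker values; the source only calls them "preset" characters.
INSERT_MARK = "<ins>"
DELETE_MARK = "<del>"
REPLACE_MARK = "<sub>"
NO_MARK = ""


def proofreading_mark(to_detect: str, standard: str) -> str:
    """Choose the proofreading mark for one word sequence by comparing its
    character count, and then its content, with the standard sequence."""
    if len(to_detect) > len(standard):
        return INSERT_MARK   # extra characters were recognized
    if len(to_detect) < len(standard):
        return DELETE_MARK   # characters were dropped
    if to_detect != standard:
        return REPLACE_MARK  # same length, different content
    return NO_MARK           # identical: nothing to mark
```

This mirrors the length-based rule as described, which yields one mark per word sequence. Counting individual inserted, deleted, and replaced characters more finely would need a character-level alignment such as an edit-distance computation, which is not described in the text.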
  • The server counts the number of inserted characters, the number of deleted characters, the number of replaced characters, and the number of characters in the proofreading text, and obtains the character recognition error rate of the proofreading text based on the preset calculation formula, wherein the preset calculation formula is:
  • $\mathrm{WER} = \dfrac{i + s + d}{t}$, where:
  • WER is the character recognition error rate
  • i is the number of inserted characters
  • s is the number of replaced characters
  • d is the number of deleted characters
  • t is the number of characters in the proofreading text.
  • Before the server calculates the character recognition error rate of the proofreading text, it first needs to determine the number of inserted characters, the number of deleted characters, the number of replaced characters, and the number of characters in the proofreading text. Only with these variables and the preset calculation formula can the character recognition error rate of the proofreading text be calculated.
  • The number of inserted characters is obtained by counting the preset insertion characters, the number of deleted characters by counting the preset deletion characters, and the number of replaced characters by counting the preset replacement characters; the number of characters in the proofreading text is obtained by counting its characters directly. Substituting these values into the preset calculation formula gives the character recognition error rate of the proofreading text.
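A minimal sketch of the error rate calculation, assuming the standard form WER = (i + s + d) / t, which is consistent with the variable definitions above. The function name `character_error_rate` is hypothetical.

```python
def character_error_rate(inserted: int, replaced: int, deleted: int, total: int) -> float:
    """Compute (i + s + d) / t.

    inserted (i), replaced (s), deleted (d): counts of the corresponding
    proofreading marks; total (t): number of characters in the proofreading
    text. The formula is the assumed standard form, not quoted from the source.
    """
    if total <= 0:
        raise ValueError("the proofreading text must contain at least one character")
    if min(inserted, replaced, deleted) < 0:
        raise ValueError("character counts cannot be negative")
    return (inserted + replaced + deleted) / total
```

For instance, one inserted, two replaced, and one deleted character in a 20-character proofreading text give 4 / 20, an error rate of 20%. Because insertions are counted, the value can exceed 1 for very poor transcripts.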
  • The server compares the character recognition error rate with the standard error rate and determines whether the character recognition error rate is greater than the standard error rate. If it is, the server determines the preset first comparison result as the conversion evaluation result of the speech-to-text conversion, where the preset first comparison result is that the accuracy of the speech-to-text conversion is low. If the character recognition error rate is not greater than the standard error rate, the server determines the preset second comparison result as the conversion evaluation result of the speech-to-text conversion, where the preset second comparison result is that the accuracy of the speech-to-text conversion is high.
  • After the server obtains the character recognition error rate, it compares the value of the character recognition error rate with the value of the standard error rate to determine the conversion evaluation result of the speech-to-text conversion.
  • The comparison results here include the first comparison result and the second comparison result, where the first comparison result is that the accuracy of the speech-to-text conversion is low, and the second comparison result is that the accuracy of the speech-to-text conversion is high.
  • When the value of the character recognition error rate is greater than the value of the standard error rate, the selected comparison result is the first comparison result, and the first comparison result is determined as the conversion evaluation result of the speech-to-text conversion; when the value of the character recognition error rate is less than or equal to the value of the standard error rate, the selected comparison result is the second comparison result, and the second comparison result is determined as the conversion evaluation result of the speech-to-text conversion.
  • The standard error rate here refers to the standard for judging the conversion of the initial speech into the initial text.
  • The value of the standard error rate can be 60% or 88%. This application does not limit the value of the standard error rate, which can be set according to the actual situation.
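The comparison step reduces to a single threshold test, sketched below. The result strings and the function name `evaluate_conversion` are hypothetical placeholders for the preset first and second comparison results.

```python
# Hypothetical wording for the two preset comparison results.
FIRST_RESULT = "speech-to-text accuracy is low"
SECOND_RESULT = "speech-to-text accuracy is high"


def evaluate_conversion(error_rate: float, standard_error_rate: float) -> str:
    """Return the first result when the error rate exceeds the standard
    error rate, and the second result when it is less than or equal to it."""
    if error_rate > standard_error_rate:
        return FIRST_RESULT
    return SECOND_RESULT
```

With a standard error rate of 60%, `evaluate_conversion(0.7, 0.6)` selects the first result, while an error rate exactly equal to the standard selects the second.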
  • Referring to FIG. 3, an embodiment of the apparatus for evaluating the speech recognition result in the embodiment of the present application includes:
  • a conversion module 301, used to obtain the initial speech in the video return visit project and convert the initial speech based on the speech recognition function to obtain the converted initial text;
  • a preprocessing module 302, used to perform space-character deletion preprocessing, sorting preprocessing, and punctuation-character deletion preprocessing on the initial text to obtain the text to be detected;
  • a proofreading module 303, used to obtain the word sequence to be detected in the text to be detected based on a preset sequence function, proofread the word sequence to be detected according to the preset standard word sequence, and make proofreading marks in the word sequence to be detected to obtain the proofreading text;
  • a calculation module 304, used to calculate the character recognition error rate of the proofreading text using a preset calculation formula;
  • a determination module 305, used to select a preset comparison result by comparing the character recognition error rate with the standard error rate, and determine the conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
  • Referring to FIG. 4, another embodiment of the apparatus for evaluating speech recognition results in the embodiment of the present application includes:
  • a conversion module 301, used to obtain the initial speech in the video return visit project and convert the initial speech based on the speech recognition function to obtain the converted initial text; a preprocessing module 302, used to perform space-character deletion preprocessing, sorting preprocessing, and punctuation-character deletion preprocessing on the initial text to obtain the text to be detected; a proofreading module 303, used to obtain the word sequence to be detected in the text to be detected based on a preset sequence function, proofread the word sequence to be detected according to the preset standard word sequence, and make proofreading marks in the word sequence to be detected to obtain the proofreading text; a calculation module 304, used to calculate the character recognition error rate of the proofreading text using a preset calculation formula; and a determination module 305, used to select a preset comparison result by comparing the character recognition error rate with the standard error rate, and determine the conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
  • the proofreading module 303 includes: a comparison unit 3031, configured to obtain the word sequence to be detected in the text to be detected based on a preset sequence function, and compare the word sequence to be detected with the preset standard word sequence. Perform a comparison to determine the relationship between the number of characters of the word sequence to be detected and the number of characters of the preset standard word sequence; the first marking unit 3032, if the number of characters of the word sequence to be detected is greater than the preset standard The number of characters of the word sequence, it is used to mark the preset insertion character at the position of the word sequence to be detected; the second marking unit 3033, if the number of characters of the word sequence to be detected is less than the preset standard word sequence The number of characters of the word sequence to be detected is used to mark the preset deletion character at the position of the word sequence to be detected; the judgment unit 3034, if the number of characters of the word sequence to be detected is equal to the number of characters of the preset standard word sequence , then it is used to judge whether the word sequence
  • the comparison unit 3031 is specifically configured to: acquire basic text characters in the text to be detected and an initial observation sequence, where the initial observation sequence is used to indicate the text character sequence of the basic text characters;
  • the conversion module 301 is specifically configured to: obtain the initial speech in the video return visit project, input the initial speech into the speech recognition function, and extract the speech features in the initial speech through the speech recognition function; convert the speech features into phoneme information through a preset translation model, wherein the phoneme information indicates the smallest phonetic units that constitute a speech syllable; and match the phoneme information with preset standard characters to generate the initial text corresponding to the initial speech.
  • the preprocessing module 302 is specifically configured to: obtain the text characters of the initial text and determine whether there are space characters between the text characters; if there are, delete the space characters and determine the text characters remaining after the deletion as first preprocessed text characters; obtain the positions of the punctuation characters in the first preprocessed text characters, and segment and sort the first preprocessed text characters by taking the character following each punctuation character as the first character of the next line, to obtain second preprocessed text characters, wherein a punctuation character is a symbol that assists the written recording of language; and delete the punctuation characters from the second preprocessed text characters, and determine the second preprocessed text characters remaining after the deletion as target text characters, to obtain the text to be detected.
  • the calculation module 304 is specifically configured to: count the number of inserted characters, the number of deleted characters, the number of replaced characters and the number of characters in the proofread text, respectively; and input the number of inserted characters, the number of deleted characters, the number of replaced characters and the number of characters in the proofread text into a preset calculation formula to obtain the character recognition error rate of the proofread text, wherein the preset calculation formula is:
  • WER is the character recognition error rate
  • i is the number of inserted characters
  • s is the number of replaced characters
  • d is the number of deleted characters
  • t is the number of characters in the proofread text.
  • the determining module 305 is specifically configured to: compare the character recognition error rate with the standard error rate and determine whether the character recognition error rate is greater than the standard error rate; if the character recognition error rate is greater than the standard error rate, determine the preset first comparison result as the conversion evaluation result of the speech-to-text conversion, wherein the preset first comparison result is that the accuracy of the speech-to-text conversion is low; and if the character recognition error rate is not greater than the standard error rate, determine the preset second comparison result as the conversion evaluation result of the speech-to-text conversion, wherein the preset second comparison result is that the accuracy of the speech-to-text conversion is high.
  • Figures 3 and 4 above describe in detail the apparatus for evaluating speech recognition results in the embodiment of the present application from the perspective of modular functional entities, and the following describes the device for evaluating speech recognition results in the embodiment of the present application in detail from the perspective of hardware processing.
  • the device 500 for evaluating speech recognition results may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 510 (e.g., one or more processors), a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) that store application programs 533 or data 532.
  • the memory 520 and the storage medium 530 may be short-term storage or persistent storage.
  • the program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations in the apparatus 500 for evaluating the speech recognition result.
  • the processor 510 may be configured to communicate with the storage medium 530, and execute a series of instruction operations in the storage medium 530 on the device 500 for evaluating the speech recognition result.
  • the device 500 for evaluating speech recognition results may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux and FreeBSD.
  • the present application also provides a device for evaluating speech recognition results. The device includes a memory and a processor; the memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the method for evaluating speech recognition results in the above embodiments.
  • the present application also provides a computer-readable storage medium, and the computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
  • the computer-readable storage medium stores computer instructions, and when the computer instructions are executed on the computer, the computer performs the following steps:
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
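The linking of data blocks by cryptographic methods described above can be illustrated with a minimal sketch. It is not the storage scheme of the present application: the function names `make_block` and `chain_is_valid`, the block fields and the choice of SHA-256 are all assumptions made for illustration, and consensus and networking are omitted entirely.

```python
import hashlib
import json


def _digest(data: str, prev_hash: str) -> str:
    """Hash a block body (hypothetical layout: data plus previous hash)."""
    body = json.dumps({"data": data, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def make_block(data: str, prev_hash: str) -> dict:
    """Create a block whose hash covers its data and the previous block's hash."""
    return {"data": data, "prev_hash": prev_hash, "hash": _digest(data, prev_hash)}


def chain_is_valid(chain: list) -> bool:
    """Check that every stored hash is correct and every link points backwards."""
    for index, block in enumerate(chain):
        if block["hash"] != _digest(block["data"], block["prev_hash"]):
            return False  # block content was altered after hashing
        if index > 0 and block["prev_hash"] != chain[index - 1]["hash"]:
            return False  # link to the previous block is broken
    return True


genesis = make_block("initial text of call A", "0" * 64)
second = make_block("initial text of call B", genesis["hash"])
print(chain_is_valid([genesis, second]))
```

Because each hash covers the previous block's hash, altering a stored initial text invalidates that block and every link after it, which is the anti-counterfeiting property the paragraph refers to.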

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

A speech recognition result evaluation method, apparatus and device, and a storage medium, which relate to the field of artificial intelligence and are used for improving the efficiency of evaluating the accuracy of converting initial speech into initial text. The evaluation method comprises: converting initial speech in a video return visit project on the basis of a speech recognition function, so as to obtain initial text (101); performing space character deletion pre-processing, sorting pre-processing and punctuation character deletion pre-processing on the initial text, so as to obtain text to be detected (102); acquiring a word sequence to be detected in the text to be detected, and proofreading and marking the word sequence to be detected according to a pre-set standard word sequence, so as to obtain proofread text (103); calculating a character recognition error rate of the proofread text by using a pre-set calculation formula (104); and selecting a pre-set comparison result by comparing the character recognition error rate with a standard error rate, and determining a conversion evaluation result of the speech-to-text conversion (105). The present invention further relates to blockchain technology, and the initial text can be stored in a blockchain.

Description

Evaluation method, apparatus, device and storage medium for speech recognition results
This application claims priority to Chinese patent application No. 202011215789.4, filed with the China Patent Office on November 4, 2020 and entitled "Evaluation method, apparatus, device and storage medium for speech recognition results", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence, and in particular to an evaluation method, apparatus, device and storage medium for speech recognition results.
Background
Video return visits are one of the means by which companies now maintain customer relationships: the company's operation and maintenance staff conduct return visits with customers over video, so that the company can better understand customer needs. One of the technologies used in video return visits is speech recognition, also known as automatic speech recognition (ASR), whose main purpose is to convert the lexical content of human speech into computer-readable input. That is, in a video return visit project, the customer's spoken replies are recognized by speech recognition technology and then converted into the corresponding text. After speech has been converted into text in this way, the accuracy of the conversion is usually determined by random spot checks.
The inventor realized that checking speech-to-text conversion by random spot checks not only involves cumbersome steps but also consumes a great deal of time, which makes evaluating the accuracy of converting the initial speech into the initial text inefficient.
Summary of the Invention
The present application provides an evaluation of speech recognition results, for improving the efficiency of evaluating the accuracy of converting initial speech into initial text.
A first aspect of the present application provides a method for evaluating speech recognition results, including: acquiring initial speech in a video return visit project, and converting the initial speech based on a speech recognition function to obtain converted initial text; performing space-character deletion preprocessing, sorting preprocessing and punctuation-character deletion preprocessing on the initial text to obtain text to be detected; obtaining a word sequence to be detected in the text to be detected based on a preset sequence function, proofreading the word sequence to be detected against a preset standard word sequence, and adding proofreading marks in the word sequence to be detected to obtain proofread text; calculating a character recognition error rate of the proofread text by using a preset calculation formula; and selecting a preset comparison result by comparing the character recognition error rate with a standard error rate, and determining a conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
A second aspect of the present application provides a device for evaluating speech recognition results, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions: acquiring initial speech in a video return visit project, and converting the initial speech based on a speech recognition function to obtain converted initial text; performing space-character deletion preprocessing, sorting preprocessing and punctuation-character deletion preprocessing on the initial text to obtain text to be detected; obtaining a word sequence to be detected in the text to be detected based on a preset sequence function, proofreading the word sequence to be detected against a preset standard word sequence, and adding proofreading marks in the word sequence to be detected to obtain proofread text; calculating a character recognition error rate of the proofread text by using a preset calculation formula; and selecting a preset comparison result by comparing the character recognition error rate with a standard error rate, and determining a conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
A third aspect of the present application provides a computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to perform the following steps: acquiring initial speech in a video return visit project, and converting the initial speech based on a speech recognition function to obtain converted initial text; performing space-character deletion preprocessing, sorting preprocessing and punctuation-character deletion preprocessing on the initial text to obtain text to be detected; obtaining a word sequence to be detected in the text to be detected based on a preset sequence function, proofreading the word sequence to be detected against a preset standard word sequence, and adding proofreading marks in the word sequence to be detected to obtain proofread text; calculating a character recognition error rate of the proofread text by using a preset calculation formula; and selecting a preset comparison result by comparing the character recognition error rate with a standard error rate, and determining a conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
A fourth aspect of the present application provides an apparatus for evaluating speech recognition results, including: a conversion module, configured to acquire initial speech in a video return visit project and convert the initial speech based on a speech recognition function to obtain converted initial text; a preprocessing module, configured to perform space-character deletion preprocessing, sorting preprocessing and punctuation-character deletion preprocessing on the initial text to obtain text to be detected; a proofreading module, configured to obtain a word sequence to be detected in the text to be detected based on a preset sequence function, proofread the word sequence to be detected against a preset standard word sequence, and add proofreading marks in the word sequence to be detected to obtain proofread text; a calculation module, configured to calculate a character recognition error rate of the proofread text by using a preset calculation formula; and a determining module, configured to select a preset comparison result by comparing the character recognition error rate with a standard error rate, and determine a conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
In the technical solution provided by the present application, initial speech in a video return visit project is acquired and converted based on a speech recognition function to obtain converted initial text; space-character deletion preprocessing, sorting preprocessing and punctuation-character deletion preprocessing are performed on the initial text to obtain text to be detected; a word sequence to be detected in the text to be detected is obtained based on a preset sequence function, the word sequence to be detected is proofread against a preset standard word sequence, and proofreading marks are added in the word sequence to be detected to obtain proofread text; a character recognition error rate of the proofread text is calculated by using a preset calculation formula; and a preset comparison result is selected by comparing the character recognition error rate with a standard error rate, and a conversion evaluation result of the speech-to-text conversion is determined according to the preset comparison result. In the embodiments of the present application, the initial speech in the video return visit project is converted by the speech recognition function to obtain the initial text; the initial text is then preprocessed, its word sequence is proofread and the error rate is calculated to obtain the character recognition error rate; finally, a preset comparison result is selected by comparing the character recognition error rate with the standard error rate to obtain the conversion evaluation result of the speech-to-text conversion. This improves the efficiency of evaluating the accuracy of converting the initial speech into the initial text.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an embodiment of the method for evaluating speech recognition results in the embodiments of the present application;
FIG. 2 is a schematic diagram of another embodiment of the method for evaluating speech recognition results in the embodiments of the present application;
FIG. 3 is a schematic diagram of an embodiment of the apparatus for evaluating speech recognition results in the embodiments of the present application;
FIG. 4 is a schematic diagram of another embodiment of the apparatus for evaluating speech recognition results in the embodiments of the present application;
FIG. 5 is a schematic diagram of an embodiment of the device for evaluating speech recognition results in the embodiments of the present application.
Detailed Description
The embodiments of the present application provide a method, apparatus, device and storage medium for evaluating speech recognition results, which are used to improve the efficiency of evaluating the accuracy of converting initial speech into initial text.
For ease of understanding, the specific flow of the embodiments of the present application is described below. Referring to FIG. 1, an embodiment of the method for evaluating speech recognition results in the embodiments of the present application includes:
101. Acquire the initial speech in the video return visit project, and convert the initial speech based on the speech recognition function to obtain the converted initial text;
It can be understood that the execution subject of the present application may be an apparatus for evaluating speech recognition results, or may be a terminal or a server, which is not specifically limited here. The embodiments of the present application are described taking a server as the execution subject.
The server collects the initial speech in the video return visit through a voice collector. The initial speech refers to the speech of the call or conversation in the video return visit project, and its content may cover different business matters. The initial speech may be in CD audio track index format (CDA), WAVE format, Audio Interchange File Format (AIFF) or Moving Picture Experts Group Audio Layer III format (MP3); the present application does not limit the format of the initial speech.
After collecting the initial speech, the server converts it into text through the speech recognition function to obtain the initial text. Because a speech recognition system does not convert speech into text with 100% accuracy, the server needs to process the initial text and measure the accuracy of the conversion from the initial speech to the initial text.
It should be noted that the initial text converted from the initial speech by the speech recognition function is saved in the project log file. It should be emphasized that, to further ensure the privacy and security of the initial text, the initial text may also be stored in a node of a blockchain.
102. Perform space-character deletion preprocessing, sorting preprocessing and punctuation-character deletion preprocessing on the initial text to obtain the text to be detected;
Before checking the initial text, the server needs to preprocess it to obtain the text to be detected. The preprocessing comprises space-character deletion, sorting and punctuation-character deletion. Preprocessing the initial text reduces interference with the character recognition error rate that the server calculates for the speech-to-text conversion in the subsequent steps.
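The three preprocessing operations can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `preprocess` and the punctuation set `PUNCTUATION` are hypothetical, and the sorting step is interpreted as starting a new line after each punctuation character.

```python
import re

# Hypothetical punctuation set; the source does not enumerate one.
PUNCTUATION = ",.!?;:，。！？；：、"


def preprocess(initial_text: str) -> list:
    """Return the text to be detected as a list of punctuation-free segments."""
    # 1. Delete space characters (here: any whitespace).
    no_spaces = re.sub(r"\s+", "", initial_text)
    # 2. Sorting: the character after each punctuation mark starts a new line.
    pattern = "([" + re.escape(PUNCTUATION) + "])"
    with_breaks = re.sub(pattern, r"\1\n", no_spaces)
    # 3. Delete the punctuation characters and drop empty segments.
    strip_table = str.maketrans("", "", PUNCTUATION)
    segments = [line.translate(strip_table) for line in with_breaks.split("\n")]
    return [segment for segment in segments if segment]


print(preprocess("贵公司 服务好，谢谢。"))
```

Splitting before deleting the punctuation keeps the segment boundaries that the punctuation marked, so each line can later be compared with a standard word sequence on its own.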
103. Obtain the word sequence to be detected in the text to be detected based on a preset sequence function, proofread the word sequence to be detected against the preset standard word sequence, and add proofreading marks in the word sequence to be detected to obtain the proofread text;
After obtaining the preprocessed text to be detected, the server obtains the word sequence to be detected from it and proofreads that sequence against a preset standard word sequence. There are many preset standard word sequences. The server first calculates the basic similarity between the word sequence to be detected and each preset standard word sequence, takes the largest basic similarity as the target similarity, and takes the preset standard word sequence corresponding to the target similarity as the target standard word sequence. The server then determines the relationship between the number of characters of the word sequence to be detected and the number of characters of the target standard word sequence, proofreads the word sequence to be detected in the text to be detected accordingly, and obtains the final proofread text.
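The selection of the target standard word sequence and the counting of marked characters can be sketched as follows. The source does not specify the similarity measure or the alignment procedure, so this sketch substitutes the standard-library `difflib.SequenceMatcher`: its `ratio()` stands in for the basic similarity, and its opcodes stand in for the proofreading marks. The function names `select_target_sequence` and `count_edits` are hypothetical.

```python
from difflib import SequenceMatcher


def select_target_sequence(detected: str, standards: list) -> str:
    """Pick the standard sequence most similar to the detected one."""
    return max(standards, key=lambda ref: SequenceMatcher(None, detected, ref).ratio())


def count_edits(detected: str, target: str) -> dict:
    """Count inserted, deleted and replaced characters relative to the target."""
    counts = {"inserted": 0, "deleted": 0, "replaced": 0}
    # Opcodes describe how to turn the target (standard) into the detected text.
    matcher = SequenceMatcher(None, target, detected)
    for tag, t1, t2, d1, d2 in matcher.get_opcodes():
        target_len, detected_len = t2 - t1, d2 - d1
        if tag == "insert":      # extra characters in the detected text
            counts["inserted"] += detected_len
        elif tag == "delete":    # characters missing from the detected text
            counts["deleted"] += target_len
        elif tag == "replace":   # unequal spans: pair up, rest is insert/delete
            counts["replaced"] += min(target_len, detected_len)
            counts["inserted"] += max(detected_len - target_len, 0)
            counts["deleted"] += max(target_len - detected_len, 0)
    return counts


target = select_target_sequence("abxdef", ["abcdef", "zzzz"])
print(target, count_edits("abxdef", target))
```

The three counts, together with the length of the proofread text, are the inputs that the error rate calculation in step 104 requires.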
104. Calculate the character recognition error rate of the proofread text by using a preset calculation formula;
After obtaining the proofread text, the server calculates its character recognition error rate using the preset calculation formula. The character recognition error rate is the error rate of converting the initial speech into the initial text. By calculating it, the server can determine how many characters were converted incorrectly during the conversion; the incorrectly converted characters are one of the factors for judging the quality of the conversion.
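A minimal sketch of this calculation follows, assuming the preset calculation formula takes the conventional form WER = (i + s + d) / t, where i, s and d are the numbers of inserted, replaced and deleted characters and t is the number of characters in the proofread text. The exact formula used by the applicant is an assumption here, and the function name `character_error_rate` is hypothetical.

```python
def character_error_rate(i: int, s: int, d: int, t: int) -> float:
    """Assumed form of the preset formula: WER = (i + s + d) / t."""
    if t <= 0:
        raise ValueError("t (number of characters) must be positive")
    return (i + s + d) / t


# One inserted, two replaced and one deleted character in 20 characters.
print(character_error_rate(i=1, s=2, d=1, t=20))
```

Note that with this form the rate can exceed 1 when the recognizer inserts many spurious characters, so a standard error rate used as a threshold should be chosen with that in mind.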
105. Select a preset comparison result by comparing the character recognition error rate with the standard error rate, and determine the conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
After obtaining the character recognition error rate, the server compares it with the standard error rate to determine the conversion evaluation result of the speech-to-text conversion. The comparison results comprise a first comparison result, namely that the accuracy of the speech-to-text conversion is low, and a second comparison result, namely that the accuracy of the speech-to-text conversion is high. When the character recognition error rate is greater than the standard error rate, the first comparison result is selected and determined as the conversion evaluation result; when the character recognition error rate is less than or equal to the standard error rate, the second comparison result is selected and determined as the conversion evaluation result.
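The selection rule can be written down directly. In this sketch the function name `evaluate_conversion` and the result strings are hypothetical; only the comparison itself, including the treatment of equality, comes from the description above.

```python
def evaluate_conversion(error_rate: float, standard_error_rate: float) -> str:
    """Select the preset comparison result for a character recognition error rate."""
    if error_rate > standard_error_rate:
        return "low accuracy"   # preset first comparison result
    return "high accuracy"      # preset second comparison result (<= threshold)


print(evaluate_conversion(0.2, 0.1))
print(evaluate_conversion(0.1, 0.1))
```

An error rate exactly equal to the standard error rate falls under the second comparison result, as the description specifies.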
In the embodiments of the present application, the initial speech in the video return visit project is converted by the speech recognition function to obtain the initial text; the initial text is then preprocessed, its word sequence is proofread and the error rate is calculated to obtain the character recognition error rate; finally, a preset comparison result is selected by comparing the character recognition error rate with the standard error rate to obtain the conversion evaluation result of the speech-to-text conversion. This improves the efficiency of evaluating the accuracy of converting the initial speech into the initial text.
Referring to FIG. 2, another embodiment of the method for evaluating speech recognition results in the embodiments of the present application includes:
201. Acquire the initial speech in the video return visit project, and convert the initial speech based on the speech recognition function to obtain the converted initial text;
Specifically, the server first acquires the initial speech in the video return visit project, inputs the initial speech into the speech recognition function, and extracts the speech features in the initial speech through the speech recognition function; the server converts the speech features into phoneme information through a preset translation model, wherein the phoneme information indicates the smallest phonetic units that constitute a speech syllable; finally, the server matches the phoneme information with preset standard characters to generate the initial text corresponding to the initial speech.
After acquiring the initial speech in the video return visit project, the server uses the speech recognition function to recognize and convert it. The speech recognition function works as follows: the server first collects a large number of speech samples for training, analyzes and integrates each speech feature parameter in the samples, and builds speech feature templates for the speech feature parameters in a speech comparison library; the server then obtains the speech information to be recognized, processes it in the same way to obtain target speech parameters, matches the target speech parameters to the corresponding speech feature parameters using a decision method, and determines the speech recognition result. Throughout the recognition process, recognition frameworks such as dynamic time warping based on pattern matching and hidden Markov models based on statistical modeling are used to convert multiple target utterances into multiple initial sentences quickly and conveniently.
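Template matching by dynamic time warping, one of the frameworks named above, can be sketched as follows. This is a textbook formulation under simplifying assumptions, not the recognizer of the present application: each feature is a single number rather than a feature vector per frame, and the names `dtw_distance` and `best_template` and the template data are hypothetical.

```python
def dtw_distance(a: list, b: list) -> float:
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b frame is repeated
                cost[i][j - 1],      # b advances, a frame is repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def best_template(features: list, templates: dict) -> str:
    """Return the label of the stored template closest to the input features."""
    return min(templates, key=lambda name: dtw_distance(features, templates[name]))


templates = {"yes": [1.0, 2.0, 2.0, 3.0], "no": [5.0, 5.0, 5.0]}
print(best_template([1.0, 2.0, 3.0], templates))
```

Because the alignment may repeat frames, an utterance spoken faster or slower than its template can still match it at low cost, which is why the method suits comparing speech against stored feature templates.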
It can be understood that a phoneme is the smallest phonetic unit, divided according to the natural attributes of speech. Speech is analyzed according to the articulatory actions within a syllable, and each action corresponds to one phoneme. By analyzing the phoneme units and matching the phoneme information against the preset standard characters, the phoneme information can be assembled into text more accurately.
As an example, consider recognizing and converting the target speech "贵公司服务好" ("your company's service is good"). The server first obtains the target speech and extracts its speech features, for example [1 2 8 4 7 6 0 9 3]. The server then converts the extracted speech features into phoneme information through the acoustic model, for example: g u i g o n g s i f u w u h a o. Once the phoneme information is available, the server looks up the characters that match it in a preset dictionary, for example: 柜: g u i; 贵: g u i; 工: g o n g; 公: g o n g; 四: s i; 司: s i; 服: f u; 务: w u; 好: h a o. Next, the server obtains the association probabilities between the characters from the preset association probabilities, for example: 贵: 0.1786; 公: 0.0546; 公司: 0.7898; 服务: 0.8967; 好: 0.3982; 服务好: 0.6785. Finally, the server selects the text with the highest association probability as the target text; the higher the association probability, the more likely the word or sentence formed by that combination is to occur. The server combines the target text in order to obtain the target sentence, here "贵公司服务好".
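The dictionary lookup and probability-based selection in this example can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how candidates are combined, so the sketch assumes a greedy left-to-right choice of the highest-probability candidate at each position, treats candidates without a listed probability as 0, and groups the phonemes into syllables beforehand. The names `LEXICON`, `ASSOC_PROB`, and `decode` are hypothetical.

```python
# Hypothetical lexicon built from the example: syllable tuple -> candidate words.
LEXICON = {
    ("gui",): ["柜", "贵"],
    ("gong",): ["工", "公"],
    ("si",): ["四", "司"],
    ("fu",): ["服"],
    ("wu",): ["务"],
    ("hao",): ["好"],
    ("gong", "si"): ["公司"],
    ("fu", "wu"): ["服务"],
    ("fu", "wu", "hao"): ["服务好"],
}

# Association probabilities quoted in the example; unlisted words default to 0.
ASSOC_PROB = {
    "贵": 0.1786, "公": 0.0546, "公司": 0.7898,
    "服务": 0.8967, "好": 0.3982, "服务好": 0.6785,
}


def decode(syllables):
    """Greedy decoding (an assumption): at each position, take the candidate
    word with the highest association probability, then skip past it."""
    words = []
    pos = 0
    while pos < len(syllables):
        best = None  # (probability, word, number of syllables consumed)
        for key, candidates in LEXICON.items():
            if tuple(syllables[pos:pos + len(key)]) != key:
                continue
            for word in candidates:
                prob = ASSOC_PROB.get(word, 0.0)
                if best is None or prob > best[0]:
                    best = (prob, word, len(key))
        if best is None:
            raise ValueError(f"no dictionary entry for syllable {syllables[pos]!r}")
        words.append(best[1])
        pos += best[2]
    return "".join(words)


sentence = decode(["gui", "gong", "si", "fu", "wu", "hao"])
```

With the probabilities above, the greedy choice picks 贵, 公司, 服务, and 好 in turn, which reproduces the target sentence of the example. A production recognizer would normally search over whole-sentence scores rather than decide greedily.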
It should be emphasized that, to further ensure the privacy and security of the initial text, the initial text may also be stored in a node of a blockchain.
202. Perform space-character deletion preprocessing, sorting preprocessing, and punctuation-character deletion preprocessing on the initial text to obtain the text to be detected;
Specifically, the server first obtains the text characters of the initial text and determines whether there are space characters between them. If there are, the server deletes the space characters and takes the remaining text characters as the first preprocessed text characters. The server then locates the punctuation characters in the first preprocessed text characters and, treating the character that follows each punctuation character as the first character of the next line, segments and sorts the first preprocessed text characters to obtain the second preprocessed text characters; punctuation characters here are symbols that assist in recording written language. Finally, the server deletes the punctuation characters from the second preprocessed text characters and takes the remaining characters as the target text characters, which gives the text to be detected.
During preprocessing, the server first deletes the space characters between the text characters of the initial text to obtain the first preprocessed text characters, which prevents garbled characters and makes the text characters easier to sort. The server then sorts the text characters by the positions of the punctuation characters in the first preprocessed text characters, so that each line after sorting contains one punctuation character and at least one text character. This gives the second preprocessed text characters, and sorting the text in this way makes it easier to proofread. Finally, the server deletes the punctuation characters from the second preprocessed text characters and takes the remaining characters as the target text characters, which gives the text to be detected. Punctuation characters only assist in recording written language, so whether they are recognized correctly does not affect the accuracy of the text characters; leaving them in, however, would reduce the accuracy of the subsequent proofreading, so they need to be deleted.
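The three preprocessing passes can be sketched in a few lines. This is a minimal illustration, not the implementation of the application: it assumes that "punctuation character" means any character in a Unicode punctuation category and that "space character" means anything `str.isspace` accepts; the function names are hypothetical.

```python
import unicodedata


def is_punctuation(ch):
    """Assumption: a punctuation character is any Unicode category P* character."""
    return unicodedata.category(ch).startswith("P")


def preprocess(initial_text):
    """Return the text to be detected as a list of punctuation-free lines."""
    # Pass 1: delete space characters -> first preprocessed text characters.
    first = "".join(ch for ch in initial_text if not ch.isspace())

    # Pass 2: the character after each punctuation character starts a new line
    # -> second preprocessed text characters (one segment per line).
    lines, current = [], ""
    for ch in first:
        current += ch
        if is_punctuation(ch):
            lines.append(current)
            current = ""
    if current:
        lines.append(current)

    # Pass 3: delete punctuation characters; drop lines left empty by
    # consecutive punctuation marks.
    cleaned = ("".join(ch for ch in line if not is_punctuation(ch)) for line in lines)
    return [line for line in cleaned if line]


text_to_detect = preprocess("贵 公司,服务 好。")
```

For the input shown, the result is the two lines 贵公司 and 服务好, each of which can then be compared against the corresponding line of the standard text.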
203. Obtain the word sequence to be detected from the text to be detected based on a preset sequence function, compare the word sequence to be detected with a preset standard word sequence, and determine the relationship between the number of characters in the word sequence to be detected and the number of characters in the preset standard word sequence;
Specifically, the server first obtains the basic text characters in the text to be detected and the initial observation sequence, where the initial observation sequence indicates the text character sequence of the basic text characters. Next, the server divides the basic text characters into predicted observation sequences using the division rules in the preset sequence function, where a predicted observation sequence indicates a combination of the text character sequence. The server then uses a preset conditional probability formula to calculate the basic conditional probability that the basic text characters, given the arrangement of the initial observation sequence, are arranged according to each predicted observation sequence. The preset conditional probability formula is $S^{*} = \arg\max_{S} P(S \mid O)$, where $S^{*}$ is the target observation sequence, $S = (s_1, s_2, \ldots, s_T)$ is a predicted observation sequence, $T$ is the length of the initial observation sequence, $s_1$ is the first word sequence obtained by dividing the basic text characters according to the predicted observation sequence, $O = (o_1, o_2, \ldots, o_T)$ is the initial observation sequence, and $o_1$ is the first character sequence obtained by dividing the basic text characters according to the initial observation sequence. The server takes the predicted observation sequence whose basic conditional probability is the largest (the target conditional probability) as the target observation sequence, and divides the basic text characters according to the target observation sequence to obtain the word sequence to be detected. Finally, the server compares the word sequence to be detected with the preset standard word sequence and determines the relationship between the number of characters in the word sequence to be detected and the number of characters in the preset standard word sequence.
As an example, suppose the text to be detected is "贵公司服务好" ("your company's service is good"). The basic text characters are "贵/公/司/服/务/好", with each character counted as one text character, and the initial observation sequence is "贵/公司/服务/好", which indicates the text character sequence of the basic text characters. Next, the server divides the basic text characters into predicted observation sequences using the division rules in the preset sequence function; the resulting predicted observation sequences may be "贵/公司/服务好", "贵公司/服务/好", and "贵公司/服务好". The server then uses the preset conditional probability formula to calculate, given the arrangement of the initial observation sequence, the basic conditional probability of each predicted observation sequence: 0.682 for "贵/公司/服务好", 0.798 for "贵公司/服务/好", and 0.865 for "贵公司/服务好". The server selects the predicted observation sequence with the basic conditional probability of 0.865 as the target observation sequence and divides "贵/公/司/服/务/好" according to "贵公司/服务好", which gives the word sequence to be detected, "贵公司/服务好". Finally, the server compares the word sequence to be detected with the preset standard word sequence and determines the relationship between the number of characters in the word sequence to be detected and the number of characters in the preset standard word sequence.
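The selection step in this example, taking the candidate segmentation with the largest conditional probability, can be sketched as below. The probabilities are the ones quoted in the example; how P(S|O) is actually computed is not specified in the text, so the sketch takes the values as given. The function names and the use of "/" as a separator are illustrative assumptions.

```python
def pick_target_sequence(candidate_probs):
    """Return the predicted observation sequence S* = argmax_S P(S | O)."""
    if not candidate_probs:
        raise ValueError("no predicted observation sequences supplied")
    return max(candidate_probs, key=candidate_probs.get)


def divide(basic_chars, target_sequence):
    """Split the basic text characters into words according to the target sequence."""
    words = target_sequence.split("/")
    # Sanity check: the segmentation must cover exactly the same characters.
    if "".join(words) != "".join(basic_chars):
        raise ValueError("segmentation does not match the basic text characters")
    return words


# Basic conditional probabilities quoted in the example above.
candidates = {
    "贵/公司/服务好": 0.682,
    "贵公司/服务/好": 0.798,
    "贵公司/服务好": 0.865,
}

target = pick_target_sequence(candidates)
words_to_detect = divide(list("贵公司服务好"), target)
```

Here `target` is the segmentation with probability 0.865, and `words_to_detect` holds the two words 贵公司 and 服务好.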
204. If the number of characters in the word sequence to be detected is greater than the number of characters in the preset standard word sequence, mark a preset insertion character at the position of the word sequence to be detected;
When the number of characters in the word sequence to be detected is greater than the number of characters in the preset standard word sequence, the word sequence recognized by the server contains more characters than the standard word sequence. In other words, the word sequence to be detected contains extra, inserted characters, and the server marks the preset insertion character at the position of the word sequence to be detected.
For example, if the known standard text is "我暂时缺钱" ("I am short of money for now"), the preset standard word sequence has 5 characters. If the recognized text to be detected is "我暂时不缺钱" ("I am not short of money for now"), the word sequence to be detected has 6 characters, so the server marks the preset insertion character at the position of the word sequence to be detected.
205. If the number of characters in the word sequence to be detected is less than the number of characters in the preset standard word sequence, mark a preset deletion character at the position of the word sequence to be detected;
When the number of characters in the word sequence to be detected is less than the number of characters in the preset standard word sequence, the word sequence recognized by the server contains fewer characters than the standard word sequence. In other words, characters are missing (deleted) from the word sequence to be detected, and the server marks the preset deletion character at the position of the word sequence to be detected.
For example, if the known standard text is "我暂时不缺钱" ("I am not short of money for now"), the preset standard word sequence has 6 characters. If the recognized text to be detected is "我暂时缺钱" ("I am short of money for now"), the word sequence to be detected has 5 characters, so the server marks the preset deletion character at the position of the word sequence to be detected.
206. If the number of characters in the word sequence to be detected is equal to the number of characters in the preset standard word sequence, determine whether the word sequence to be detected is identical to the preset standard word sequence;
When the number of characters in the word sequence to be detected is equal to the number of characters in the preset standard word sequence, the text to be detected recognized by the server may be identical to the standard text, so the server needs to further determine whether the word sequence to be detected is identical to the preset standard word sequence. The standard text here is the text content corresponding to the preset standard word sequence.
207. If the word sequence to be detected is not identical to the preset standard word sequence, mark a preset replacement character at the position of the word sequence to be detected, and take the text to be detected, with its proofreading marks, as the proofread text;
When the word sequence to be detected is not identical to the preset standard word sequence, the corresponding text to be detected differs from the standard text. In other words, the text to be detected contains replaced characters. The server marks the preset replacement character at the position of the word sequence to be detected and then takes the marked text to be detected as the proofread text.
For example, if the known standard text is "我暂时不缺钱" ("I am not short of money for now"), the preset standard word sequence has 6 characters. If the recognized text to be detected is "我暂时很缺钱" ("I am very short of money for now"), the word sequence to be detected also has 6 characters, so the server checks whether the two sequences are identical. Since they are not, the server marks the preset replacement character at the position of the word sequence to be detected. Finally, the server takes the text to be detected, marked with the preset insertion, deletion, and replacement characters, as the proofread text.
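The decision logic of steps 204 to 207 reduces to a comparison of character counts followed by an equality check, which can be sketched as follows. The mark names and the function `proofread_mark` are hypothetical; the text only speaks of "preset" insertion, deletion, and replacement characters without giving their concrete form.

```python
INSERTION = "I"    # hypothetical stand-in for the preset insertion character
DELETION = "D"     # hypothetical stand-in for the preset deletion character
REPLACEMENT = "S"  # hypothetical stand-in for the preset replacement character


def proofread_mark(detected, standard):
    """Return the mark for one word sequence, or None if it matches the standard.

    Follows the rule described above: compare character counts first, and only
    compare contents when the counts are equal.
    """
    if len(detected) > len(standard):
        return INSERTION
    if len(detected) < len(standard):
        return DELETION
    if detected != standard:
        return REPLACEMENT
    return None


marks = [
    proofread_mark("我暂时不缺钱", "我暂时缺钱"),    # 6 > 5 characters
    proofread_mark("我暂时缺钱", "我暂时不缺钱"),    # 5 < 6 characters
    proofread_mark("我暂时很缺钱", "我暂时不缺钱"),  # same length, different text
    proofread_mark("我暂时不缺钱", "我暂时不缺钱"),  # identical
]
```

Note that this count-based rule, as described, assigns one mark per word sequence; it does not locate individual characters the way an edit-distance alignment would.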
208. Calculate the character recognition error rate of the proofread text using a preset calculation formula;
Specifically, the server counts the number of inserted characters, the number of deleted characters, the number of replaced characters, and the total number of characters in the proofread text. The server then inputs these four counts into the preset calculation formula to obtain the character recognition error rate of the proofread text, where the preset calculation formula is:
Figure PCTCN2021090436-appb-000001
In the formula, WER denotes the character recognition error rate, i the number of inserted characters, s the number of replaced characters, d the number of deleted characters, and t the number of characters in the proofread text.
Before calculating the character recognition error rate of the proofread text, the server must first determine the number of inserted characters, deleted characters, and replaced characters in the proofread text, as well as the number of characters in the proofread text; the error rate can only be calculated from these variables and the preset calculation formula. Steps 203 to 207 are the process by which the server proofreads the text to be detected using word sequences. During that process, the server obtains the number of inserted characters by counting the preset insertion characters, the number of deleted characters by counting the preset deletion characters, and the number of replaced characters by counting the preset replacement characters; the number of characters in the proofread text is obtained by counting its characters directly. Inputting these values into the preset calculation formula gives the character recognition error rate of the proofread text.
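The calculation can be sketched as below. The formula itself appears only as an image in the source, so this sketch assumes the conventional error-rate form, (i + s + d) / t, built from the four quantities defined above; the actual preset formula may differ (for example, by a percentage factor). The function name is hypothetical.

```python
def character_recognition_error_rate(i, s, d, t):
    """Assumed form of the preset formula: WER = (i + s + d) / t.

    i: number of inserted characters
    s: number of replaced characters
    d: number of deleted characters
    t: number of characters in the proofread text
    """
    if t <= 0:
        raise ValueError("the proofread text must contain at least one character")
    if min(i, s, d) < 0:
        raise ValueError("error counts cannot be negative")
    return (i + s + d) / t


# One replaced character in a 6-character proofread text.
wer = character_recognition_error_rate(i=0, s=1, d=0, t=6)
```

Under this assumed form, a single error in a six-character text gives an error rate of one sixth.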
209. Select a preset comparison result by comparing the character recognition error rate with a standard error rate, and determine the conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
Specifically, the server compares the character recognition error rate with the standard error rate and determines whether the character recognition error rate is greater than the standard error rate. If it is, the server takes the preset first comparison result as the conversion evaluation result of the speech-to-text conversion, where the preset first comparison result is that the accuracy of the speech-to-text conversion is low. If the character recognition error rate is not greater than the standard error rate, the server takes the preset second comparison result as the conversion evaluation result, where the preset second comparison result is that the accuracy of the speech-to-text conversion is high.
After obtaining the character recognition error rate, the server determines the conversion evaluation result of the speech-to-text conversion by comparing the character recognition error rate with the standard error rate. The comparison results comprise a first comparison result, that the accuracy of the speech-to-text conversion is low, and a second comparison result, that the accuracy of the speech-to-text conversion is high. When the character recognition error rate is greater than the standard error rate, the first comparison result is selected and taken as the conversion evaluation result; when the character recognition error rate is less than or equal to the standard error rate, the second comparison result is selected and taken as the conversion evaluation result.
It can be understood that the standard error rate here is the criterion for judging the conversion of the initial speech into the initial text. Its value may be, for example, 60% or 88%; this application does not limit the value of the standard error rate, which can be set according to the actual situation.
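The comparison in step 209 is a single threshold test, sketched below. The result strings and the function name are illustrative; the 0.60 threshold is one of the example values mentioned in the text, not a recommended setting.

```python
FIRST_RESULT = "speech-to-text accuracy is low"    # preset first comparison result
SECOND_RESULT = "speech-to-text accuracy is high"  # preset second comparison result


def conversion_evaluation(error_rate, standard_error_rate):
    """Select the preset comparison result for a character recognition error rate."""
    if error_rate > standard_error_rate:
        return FIRST_RESULT
    # Error rate less than or equal to the standard error rate.
    return SECOND_RESULT


result_low = conversion_evaluation(0.75, 0.60)
result_high = conversion_evaluation(0.60, 0.60)
```

An error rate exactly equal to the standard error rate falls into the second (high accuracy) result, matching the "not greater than" wording above.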
The method for evaluating speech recognition results in the embodiments of this application has been described above; the apparatus for evaluating speech recognition results is described below. Referring to FIG. 3, one embodiment of the apparatus for evaluating speech recognition results includes: a conversion module 301, configured to obtain the initial speech from the video return-visit project and convert the initial speech based on a speech recognition function to obtain the converted initial text; a preprocessing module 302, configured to perform space-character deletion preprocessing, sorting preprocessing, and punctuation-character deletion preprocessing on the initial text to obtain the text to be detected; a proofreading module 303, configured to obtain the word sequence to be detected from the text to be detected based on a preset sequence function, proofread the word sequence to be detected against a preset standard word sequence, and add proofreading marks to the word sequence to be detected to obtain the proofread text; a calculation module 304, configured to calculate the character recognition error rate of the proofread text using a preset calculation formula; and a determination module 305, configured to select a preset comparison result by comparing the character recognition error rate with a standard error rate, and determine the conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
Referring to FIG. 4, another embodiment of the apparatus for evaluating speech recognition results in the embodiments of this application includes:
a conversion module 301, configured to obtain the initial speech from the video return-visit project and convert the initial speech based on a speech recognition function to obtain the converted initial text; a preprocessing module 302, configured to perform space-character deletion preprocessing, sorting preprocessing, and punctuation-character deletion preprocessing on the initial text to obtain the text to be detected; a proofreading module 303, configured to obtain the word sequence to be detected from the text to be detected based on a preset sequence function, proofread the word sequence to be detected against a preset standard word sequence, and add proofreading marks to the word sequence to be detected to obtain the proofread text; a calculation module 304, configured to calculate the character recognition error rate of the proofread text using a preset calculation formula; and a determination module 305, configured to select a preset comparison result by comparing the character recognition error rate with a standard error rate, and determine the conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
Optionally, the proofreading module 303 includes: a comparison unit 3031, configured to obtain the word sequence to be detected from the text to be detected based on a preset sequence function, compare the word sequence to be detected with a preset standard word sequence, and determine the relationship between the number of characters in the word sequence to be detected and the number of characters in the preset standard word sequence; a first marking unit 3032, configured to mark a preset insertion character at the position of the word sequence to be detected if the number of characters in the word sequence to be detected is greater than the number of characters in the preset standard word sequence; a second marking unit 3033, configured to mark a preset deletion character at the position of the word sequence to be detected if the number of characters in the word sequence to be detected is less than the number of characters in the preset standard word sequence; a judgment unit 3034, configured to determine whether the word sequence to be detected is identical to the preset standard word sequence if the two have the same number of characters; and a third marking unit 3035, configured to mark a preset replacement character at the position of the word sequence to be detected if the word sequence to be detected is not identical to the preset standard word sequence, and to take the text to be detected, with its proofreading marks, as the proofread text.
Optionally, the comparison unit 3031 is specifically configured to: obtain the basic text characters in the text to be detected and the initial observation sequence, where the initial observation sequence indicates the text character sequence of the basic text characters; divide the basic text characters into predicted observation sequences using the division rules in the preset sequence function, where a predicted observation sequence indicates a combination of the text character sequence; use a preset conditional probability formula to calculate the basic conditional probability that the basic text characters, given the arrangement of the initial observation sequence, are arranged according to each predicted observation sequence, where the preset conditional probability formula is $S^{*} = \arg\max_{S} P(S \mid O)$, in which $S^{*}$ is the target observation sequence, $S = (s_1, s_2, \ldots, s_T)$ is a predicted observation sequence, $T$ is the length of the initial observation sequence, $s_1$ is the first word sequence obtained by dividing the basic text characters according to the predicted observation sequence, $O = (o_1, o_2, \ldots, o_T)$ is the initial observation sequence, and $o_1$ is the first character sequence obtained by dividing the basic text characters according to the initial observation sequence; take the predicted observation sequence whose basic conditional probability is the largest (the target conditional probability) as the target observation sequence; divide the basic text characters according to the target observation sequence to obtain the word sequence to be detected; and compare the word sequence to be detected with the preset standard word sequence to determine the relationship between the number of characters in the word sequence to be detected and the number of characters in the preset standard word sequence.
Optionally, the conversion module 301 is specifically configured to: obtain the initial speech from the video return-visit project, input the initial speech into the speech recognition function, and extract the speech features of the initial speech through the speech recognition function; convert the speech features into phoneme information through a preset translation model, where the phoneme information indicates the smallest phonetic units that make up a spoken syllable; and match the phoneme information against preset standard characters to generate the initial text corresponding to the initial speech.
Optionally, the preprocessing module 302 is specifically configured to: obtain the text characters of the initial text and determine whether there are space characters between them; if there are, delete the space characters and take the remaining text characters as the first preprocessed text characters; locate the punctuation characters in the first preprocessed text characters and, treating the character that follows each punctuation character as the first character of the next line, segment and sort the first preprocessed text characters to obtain the second preprocessed text characters, where punctuation characters are symbols that assist in recording written language; and delete the punctuation characters from the second preprocessed text characters and take the remaining characters as the target text characters, which gives the text to be detected.
Optionally, the calculation module 304 is specifically configured to: count the number of inserted characters, the number of deleted characters, the number of replaced characters, and the total number of characters in the proofread text; and input these four counts into a preset calculation formula to obtain the character recognition error rate of the proofread text, where the preset calculation formula is:
Figure PCTCN2021090436-appb-000002
In the formula, WER is the character recognition error rate, i is the number of inserted characters, s is the number of replaced characters, d is the number of deleted characters, and t is the number of characters in the proofread text.
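The formula itself is published only as an image and is not reproduced in this text. As a hedged sketch, the code below assumes the conventional error-rate form, WER = (i + s + d) / t, using the symbols defined above; the function name and the choice to return a fraction rather than a percentage are assumptions.

```python
def character_error_rate(i: int, s: int, d: int, t: int) -> float:
    """Assumed conventional form: (insertions + substitutions + deletions) / total.

    i, s, d, t follow the symbol definitions in the text; the exact published
    formula is an image and may differ (for example, by a percentage factor).
    """
    if t <= 0:
        raise ValueError("t (number of characters in the proofread text) must be positive")
    return (i + s + d) / t
```

For example, with 1 inserted, 2 replaced and 1 deleted character in a 20-character proofread text, this assumed form gives 4 / 20.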
Optionally, the determination module 305 is specifically configured to: compare the character recognition error rate with a standard error rate and determine whether the character recognition error rate is greater than the standard error rate; if the character recognition error rate is greater than the standard error rate, determine the preset first comparison result as the conversion evaluation result of the speech-to-text conversion, where the preset first comparison result is that the accuracy of the speech-to-text conversion is low; and if the character recognition error rate is not greater than the standard error rate, determine the preset second comparison result as the conversion evaluation result of the speech-to-text conversion, where the preset second comparison result is that the accuracy of the speech-to-text conversion is high.
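The comparison above reduces to a single threshold test. The sketch below is illustrative only; the constant names, the result strings, and the function name `evaluate_conversion` are hypothetical.

```python
# Hypothetical labels for the two preset comparison results.
FIRST_RESULT = "speech-to-text accuracy is low"
SECOND_RESULT = "speech-to-text accuracy is high"


def evaluate_conversion(error_rate: float, standard_error_rate: float) -> str:
    """Select the preset comparison result for a given character recognition error rate."""
    # Strictly greater than the standard selects the first (low-accuracy) result;
    # an error rate equal to the standard falls under "not greater than".
    if error_rate > standard_error_rate:
        return FIRST_RESULT
    return SECOND_RESULT
```

Note that an error rate exactly equal to the standard error rate yields the second (high-accuracy) result, as the text specifies "not greater than".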
FIG. 3 and FIG. 4 above describe the apparatus for evaluating a speech recognition result in the embodiments of the present application in detail from the perspective of modular functional entities. The following describes the device for evaluating a speech recognition result in the embodiments of the present application in detail from the perspective of hardware processing.
FIG. 5 is a schematic structural diagram of a device for evaluating a speech recognition result provided by an embodiment of the present application. The device 500 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 510 (for example, one or more processors), a memory 520, and one or more storage media 530 (for example, one or more mass storage devices) that store application programs 533 or data 532. The memory 520 and the storage medium 530 may provide transient or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the device 500. Further, the processor 510 may be configured to communicate with the storage medium 530 and to execute, on the device 500, the series of instruction operations in the storage medium 530.
The device 500 for evaluating a speech recognition result may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and so on. Those skilled in the art will understand that the structure shown in FIG. 5 does not limit the device for evaluating a speech recognition result, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The present application further provides a device for evaluating a speech recognition result. The device includes a memory and a processor; the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, the processor performs the steps of the method for evaluating a speech recognition result described in the above embodiments.
The present application further provides a computer-readable storage medium, which may be a non-volatile or a volatile computer-readable storage medium. The computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to perform the following steps:
obtaining the initial speech in a video return-visit project, and converting the initial speech based on a speech recognition function to obtain the converted initial text; performing space-character deletion preprocessing, ordering preprocessing, and punctuation-character deletion preprocessing on the initial text to obtain the text to be detected; obtaining the word sequence to be detected in the text to be detected based on a preset sequence function, proofreading the word sequence to be detected against a preset standard word sequence, and applying proofreading marks in the word sequence to be detected to obtain the proofread text; calculating the character recognition error rate of the proofread text using a preset calculation formula; and selecting a preset comparison result by comparing the character recognition error rate with a standard error rate, and determining the conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

  1. A method for evaluating a speech recognition result, the method comprising:
    obtaining the initial speech in a video return-visit project, and converting the initial speech based on a speech recognition function to obtain the converted initial text;
    performing space-character deletion preprocessing, ordering preprocessing, and punctuation-character deletion preprocessing on the initial text to obtain the text to be detected;
    obtaining the word sequence to be detected in the text to be detected based on a preset sequence function, proofreading the word sequence to be detected against a preset standard word sequence, and applying proofreading marks in the word sequence to be detected to obtain the proofread text;
    calculating the character recognition error rate of the proofread text using a preset calculation formula; and
    selecting a preset comparison result by comparing the character recognition error rate with a standard error rate, and determining the conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
  2. The method for evaluating a speech recognition result according to claim 1, wherein obtaining the word sequence to be detected in the text to be detected based on the preset sequence function, proofreading the word sequence to be detected against the preset standard word sequence, and applying proofreading marks in the word sequence to be detected to obtain the proofread text comprises:
    obtaining the word sequence to be detected in the text to be detected based on the preset sequence function, comparing the word sequence to be detected with the preset standard word sequence, and determining the relationship between the number of characters in the word sequence to be detected and the number of characters in the preset standard word sequence;
    if the number of characters in the word sequence to be detected is greater than the number of characters in the preset standard word sequence, marking a preset insertion character at the position of the word sequence to be detected;
    if the number of characters in the word sequence to be detected is less than the number of characters in the preset standard word sequence, marking a preset deletion character at the position of the word sequence to be detected;
    if the number of characters in the word sequence to be detected is equal to the number of characters in the preset standard word sequence, determining whether the word sequence to be detected is identical to the preset standard word sequence; and
    if the word sequence to be detected is not identical to the preset standard word sequence, marking a preset replacement character at the position of the word sequence to be detected, and determining the text to be detected, after proofreading marks have been applied, as the proofread text.
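As an illustration only, the marking rule recited above can be sketched for a single pair of word sequences. The sketch assumes the detected and standard word sequences have already been paired position by position; the mark values "I", "D", "S" and the function name are hypothetical stand-ins for the preset insertion, deletion, and replacement characters.

```python
from typing import Optional


def proofreading_mark(detected: str, standard: str) -> Optional[str]:
    """Return the proofreading mark for one detected/standard word-sequence pair."""
    if len(detected) > len(standard):
        return "I"  # more characters than the standard: insertion mark
    if len(detected) < len(standard):
        return "D"  # fewer characters than the standard: deletion mark
    if detected != standard:
        return "S"  # same length but different content: replacement mark
    return None     # identical: no mark needed
```

Counting the marks over all pairs would yield the insertion, deletion, and replacement counts used in the error-rate calculation.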
  3. The method for evaluating a speech recognition result according to claim 2, wherein obtaining the word sequence to be detected in the text to be detected based on the preset sequence function, comparing the word sequence to be detected with the preset standard word sequence, and determining the relationship between the number of characters in the word sequence to be detected and the number of characters in the preset standard word sequence comprises:
    obtaining the basic text characters in the text to be detected and an initial observation sequence, where the initial observation sequence indicates the text character sequence of the basic text characters;
    dividing the basic text characters into predicted observation sequences according to the division rules in the preset sequence function, where each predicted observation sequence indicates a combination of the text character sequence;
    using a preset conditional probability formula to calculate the basic conditional probability that the basic text characters, given their arrangement in the initial observation sequence, are arranged according to the predicted observation sequence, where the preset conditional probability formula is:
    $S^{*} = \arg\max_{S} P(S \mid O)$, where $S^{*}$ is the target observation sequence, $S$ is the predicted observation sequence with $S = (s_1, s_2, \ldots, s_T)$, $T$ is the length of the initial observation sequence, $s_1$ is the first word sequence obtained by dividing the basic text characters according to the predicted observation sequence, $O$ is the initial observation sequence with $O = (o_1, o_2, \ldots, o_T)$, and $o_1$ is the first character sequence obtained by dividing the basic text characters according to the initial observation sequence;
    taking the predicted observation sequence corresponding to the target conditional probability, that is, the largest of the basic conditional probabilities, as the target observation sequence;
    dividing the basic text characters according to the target observation sequence to obtain the word sequence to be detected; and
    comparing the word sequence to be detected with the preset standard word sequence, and determining the relationship between the number of characters in the word sequence to be detected and the number of characters in the preset standard word sequence.
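The selection of the most probable division can be illustrated with a small dynamic-programming sketch. This is not the model recited above: the text does not specify how P(S|O) is computed, so the sketch substitutes a simple unigram assumption (the probability of a division is the product of independent word probabilities). The dictionary `word_prob`, the parameters `max_len` and `unknown_prob`, and the function name are all hypothetical.

```python
import math
from typing import Dict, List


def best_segmentation(chars: str, word_prob: Dict[str, float],
                      max_len: int = 4, unknown_prob: float = 1e-8) -> List[str]:
    """Return the division of `chars` into words with the highest total probability."""
    n = len(chars)
    best = [-math.inf] * (n + 1)  # best[k]: best log-probability of chars[:k]
    back = [0] * (n + 1)          # back[k]: start index of the last word in that division
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = chars[start:end]
            # Unknown single characters get a small floor probability so that
            # some division always exists; unknown longer words are skipped.
            p = word_prob.get(word, unknown_prob if len(word) == 1 else 0.0)
            if p <= 0.0:
                continue
            score = best[start] + math.log(p)
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers from the end to recover the winning division.
    words: List[str] = []
    k = n
    while k > 0:
        words.append(chars[back[k]:k])
        k = back[k]
    return words[::-1]
```

Working in log-probabilities avoids numerical underflow on long inputs, and the back-pointer table means the candidate divisions never have to be enumerated explicitly.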
  4. The method for evaluating a speech recognition result according to claim 1, wherein obtaining the initial speech in the video return-visit project and converting the initial speech based on the speech recognition function to obtain the converted initial text comprises:
    obtaining the initial speech in the video return-visit project, inputting the initial speech into the speech recognition function, and extracting the speech features of the initial speech through the speech recognition function;
    converting the speech features into phoneme information through a preset translation model, where the phoneme information indicates the smallest phonetic units that make up a spoken syllable; and
    matching the phoneme information against preset standard characters to generate the initial text corresponding to the initial speech.
  5. The method for evaluating a speech recognition result according to claim 1, wherein performing space-character deletion preprocessing, ordering preprocessing, and punctuation-character deletion preprocessing on the initial text to obtain the text to be detected comprises:
    obtaining the text characters of the initial text, and determining whether there are space characters between the text characters;
    if there are space characters between the text characters, deleting the space characters and determining the text characters remaining after the deletion as the first preprocessed text characters;
    locating the punctuation characters in the first preprocessed text characters and, taking the character that follows each punctuation character as the first character of the next line, segmenting and ordering the first preprocessed text characters to obtain the second preprocessed text characters, where punctuation characters are the symbols that assist the written recording of language; and
    deleting the punctuation characters from the second preprocessed text characters, and determining the second preprocessed text characters remaining after the deletion as the target text characters, thereby obtaining the text to be detected.
  6. The method for evaluating a speech recognition result according to claim 1, wherein calculating the character recognition error rate of the proofread text using the preset calculation formula comprises:
    counting the number of inserted characters, the number of deleted characters, the number of replaced characters, and the number of characters in the proofread text;
    inputting the number of inserted characters, the number of deleted characters, the number of replaced characters, and the number of characters in the proofread text into the preset calculation formula to obtain the character recognition error rate of the proofread text, where the preset calculation formula is:
    Figure PCTCN2021090436-appb-100001
    in the formula, WER is the character recognition error rate, i is the number of inserted characters, s is the number of replaced characters, d is the number of deleted characters, and t is the number of characters in the proofread text.
  7. The method for evaluating a speech recognition result according to any one of claims 1-6, wherein selecting the preset comparison result by comparing the character recognition error rate with the standard error rate, and determining the conversion evaluation result of the speech-to-text conversion according to the preset comparison result comprises:
    comparing the character recognition error rate with the standard error rate, and determining whether the character recognition error rate is greater than the standard error rate;
    if the character recognition error rate is greater than the standard error rate, determining the preset first comparison result as the conversion evaluation result of the speech-to-text conversion, where the preset first comparison result is that the accuracy of the speech-to-text conversion is low; and
    if the character recognition error rate is not greater than the standard error rate, determining the preset second comparison result as the conversion evaluation result of the speech-to-text conversion, where the preset second comparison result is that the accuracy of the speech-to-text conversion is high.
  8. A device for evaluating a speech recognition result, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    obtaining the initial speech in a video return-visit project, and converting the initial speech based on a speech recognition function to obtain the converted initial text;
    performing space-character deletion preprocessing, ordering preprocessing, and punctuation-character deletion preprocessing on the initial text to obtain the text to be detected;
    obtaining the word sequence to be detected in the text to be detected based on a preset sequence function, proofreading the word sequence to be detected against a preset standard word sequence, and applying proofreading marks in the word sequence to be detected to obtain the proofread text;
    calculating the character recognition error rate of the proofread text using a preset calculation formula; and
    selecting a preset comparison result by comparing the character recognition error rate with a standard error rate, and determining the conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
  9. The device for evaluating a speech recognition result according to claim 8, wherein, when the processor executes the computer-readable instructions to implement the step of obtaining the word sequence to be detected in the text to be detected based on the preset sequence function, proofreading the word sequence to be detected against the preset standard word sequence, and applying proofreading marks in the word sequence to be detected to obtain the proofread text, the following steps are included:
    obtaining the word sequence to be detected in the text to be detected based on the preset sequence function, comparing the word sequence to be detected with the preset standard word sequence, and determining the relationship between the number of characters in the word sequence to be detected and the number of characters in the preset standard word sequence;
    if the number of characters in the word sequence to be detected is greater than the number of characters in the preset standard word sequence, marking a preset insertion character at the position of the word sequence to be detected;
    if the number of characters in the word sequence to be detected is less than the number of characters in the preset standard word sequence, marking a preset deletion character at the position of the word sequence to be detected;
    if the number of characters in the word sequence to be detected is equal to the number of characters in the preset standard word sequence, determining whether the word sequence to be detected is identical to the preset standard word sequence; and
    if the word sequence to be detected is not identical to the preset standard word sequence, marking a preset replacement character at the position of the word sequence to be detected, and determining the text to be detected, after proofreading marks have been applied, as the proofread text.
  10. The device for evaluating a speech recognition result according to claim 9, wherein, when the processor executes the computer-readable instructions to implement the step of obtaining the word sequence to be detected in the text to be detected based on the preset sequence function, comparing the word sequence to be detected with the preset standard word sequence, and determining the relationship between the number of characters in the word sequence to be detected and the number of characters in the preset standard word sequence, the following steps are included:
    obtaining the basic text characters in the text to be detected and an initial observation sequence, where the initial observation sequence indicates the text character sequence of the basic text characters;
    dividing the basic text characters into predicted observation sequences according to the division rules in the preset sequence function, where each predicted observation sequence indicates a combination of the text character sequence;
    using a preset conditional probability formula to calculate the basic conditional probability that the basic text characters, given their arrangement in the initial observation sequence, are arranged according to the predicted observation sequence, where the preset conditional probability formula is:
    $S^{*} = \arg\max_{S} P(S \mid O)$, where $S^{*}$ is the target observation sequence, $S$ is the predicted observation sequence with $S = (s_1, s_2, \ldots, s_T)$, $T$ is the length of the initial observation sequence, $s_1$ is the first word sequence obtained by dividing the basic text characters according to the predicted observation sequence, $O$ is the initial observation sequence with $O = (o_1, o_2, \ldots, o_T)$, and $o_1$ is the first character sequence obtained by dividing the basic text characters according to the initial observation sequence;
    taking the predicted observation sequence corresponding to the target conditional probability, that is, the largest of the basic conditional probabilities, as the target observation sequence;
    dividing the basic text characters according to the target observation sequence to obtain the word sequence to be detected; and
    comparing the word sequence to be detected with the preset standard word sequence, and determining the relationship between the number of characters in the word sequence to be detected and the number of characters in the preset standard word sequence.
  11. The device for evaluating a speech recognition result according to claim 8, wherein, when the processor executes the computer-readable instructions to implement the step of obtaining the initial speech in the video return-visit project and converting the initial speech based on the speech recognition function to obtain the converted initial text, the following steps are included:
    obtaining the initial speech in the video return-visit project, inputting the initial speech into the speech recognition function, and extracting the speech features of the initial speech through the speech recognition function;
    converting the speech features into phoneme information through a preset translation model, where the phoneme information indicates the smallest phonetic units that make up a spoken syllable; and
    matching the phoneme information against preset standard characters to generate the initial text corresponding to the initial speech.
  12. The device for evaluating a speech recognition result according to claim 8, wherein, when the processor executes the computer-readable instructions to implement the step of performing space-character deletion preprocessing, ordering preprocessing, and punctuation-character deletion preprocessing on the initial text to obtain the text to be detected, the following steps are included:
    obtaining the text characters of the initial text, and determining whether there are space characters between the text characters;
    if there are space characters between the text characters, deleting the space characters and determining the text characters remaining after the deletion as the first preprocessed text characters;
    locating the punctuation characters in the first preprocessed text characters and, taking the character that follows each punctuation character as the first character of the next line, segmenting and ordering the first preprocessed text characters to obtain the second preprocessed text characters, where punctuation characters are the symbols that assist the written recording of language; and
    deleting the punctuation characters from the second preprocessed text characters, and determining the second preprocessed text characters remaining after the deletion as the target text characters, thereby obtaining the text to be detected.
  13. 根据权利要求8所述的语音识别结果的测评设备,其中,所述处理器执行所述计算机可读指令实现所述采用预置的计算公式计算所述校对文本的字符识别错误率时,还包括以下步骤:The device for evaluating speech recognition results according to claim 8, wherein when the processor executes the computer-readable instructions to realize the calculation of the character recognition error rate of the proofreading text by using a preset calculation formula, the method further comprises: The following steps:
    分别统计所述校对文本中的插入字符数量、删除字符数量、替换字符数量与校对文本的字符数量;Respectively count the number of inserted characters, the number of deleted characters, the number of replacement characters and the number of characters in the proofreading text in the proofreading text;
    将所述插入字符数量、所述删除字符数量、所述替换字符数量与所述校对文本的字符数量输入至预置的计算公式中,得到所述校对文本的字符识别错误率,其中,所述预置计算公式为:Input the number of inserted characters, the number of deleted characters, the number of replacement characters and the number of characters of the proofreading text into a preset calculation formula to obtain the character recognition error rate of the proofreading text, wherein the The preset calculation formula is:
    Figure PCTCN2021090436-appb-100002
    In the formula, WER denotes the character recognition error rate, i the number of inserted characters, s the number of replaced characters, d the number of deleted characters, and t the number of characters in the proofread text.
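The calculation step of claim 13 can be sketched as follows. The formula itself survives only as an image reference in this text, so the sketch assumes the conventional definition WER = (i + s + d) / t, built from the four quantities defined above. The function name `character_error_rate` is hypothetical and is not taken from the application.

```python
def character_error_rate(i: int, s: int, d: int, t: int) -> float:
    """Assumed formula: (insertions + substitutions + deletions) / total characters.

    i: number of inserted characters
    s: number of replaced characters
    d: number of deleted characters
    t: number of characters in the proofread text
    """
    if t <= 0:
        raise ValueError("t (number of characters in the proofread text) must be positive")
    return (i + s + d) / t


# Example: 2 insertions, 3 replacements and 1 deletion in a 100-character text.
rate = character_error_rate(i=2, s=3, d=1, t=100)
print(f"{rate:.2%}")
```

If the application's formula also multiplies by 100%, the value returned here would simply be rendered as a percentage, as the `print` call does.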
  14. The device for evaluating speech recognition results according to any one of claims 8-13, wherein, when the processor executes the computer-readable instructions to select a preset comparison result by comparing the character recognition error rate with the standard error rate and to determine the conversion evaluation result of the speech-to-text conversion according to the preset comparison result, the following steps are included:
    Compare the character recognition error rate with the standard error rate, and determine whether the character recognition error rate is greater than the standard error rate;
    If the character recognition error rate is greater than the standard error rate, determine the preset first comparison result as the conversion evaluation result of the speech-to-text conversion, wherein the preset first comparison result is that the accuracy of the speech-to-text conversion is low;
    If the character recognition error rate is not greater than the standard error rate, determine the preset second comparison result as the conversion evaluation result of the speech-to-text conversion, wherein the preset second comparison result is that the accuracy of the speech-to-text conversion is high.
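The comparison in claim 14 reduces to a single threshold test. The sketch below is illustrative only: the function name `evaluate_conversion`, the two result strings, and the example threshold of 0.10 are assumptions, since the application does not fix a value for the standard error rate in this passage.

```python
# Preset comparison results (the wording is a hypothetical rendering of the claim text).
FIRST_RESULT = "accuracy of the speech-to-text conversion is low"
SECOND_RESULT = "accuracy of the speech-to-text conversion is high"


def evaluate_conversion(error_rate: float, standard_error_rate: float) -> str:
    """Return the first result if error_rate exceeds the standard, else the second."""
    if error_rate > standard_error_rate:
        return FIRST_RESULT
    # "Not greater than" includes the case where the two rates are equal.
    return SECOND_RESULT


print(evaluate_conversion(0.12, 0.10))
print(evaluate_conversion(0.10, 0.10))
```

Note that an error rate exactly equal to the standard error rate falls into the second (high accuracy) branch, matching the "not greater than" wording of the claim.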
  15. A computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to perform the following steps:
    Obtain the initial speech in a video return visit project, and convert the initial speech based on a speech recognition function to obtain the converted initial text;
    Perform space-character deletion preprocessing, ordering preprocessing, and punctuation-character deletion preprocessing on the initial text to obtain the text to be detected;
    Obtain the word sequence to be detected in the text to be detected based on a preset sequence function, proofread the word sequence to be detected against a preset standard word sequence, and place proofreading marks in the word sequence to be detected to obtain the proofread text;
    Calculate the character recognition error rate of the proofread text using a preset calculation formula;
    Select a preset comparison result by comparing the character recognition error rate with the standard error rate, and determine the conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
  16. The computer-readable storage medium according to claim 15, wherein the computer instructions, when run on a computer, further cause the computer to perform the following steps:
    Obtain the word sequence to be detected in the text to be detected based on the preset sequence function, compare the word sequence to be detected with the preset standard word sequence, and determine the relationship between the number of characters of the word sequence to be detected and the number of characters of the preset standard word sequence;
    If the number of characters of the word sequence to be detected is greater than the number of characters of the preset standard word sequence, mark a preset insertion character at the position of the word sequence to be detected;
    If the number of characters of the word sequence to be detected is less than the number of characters of the preset standard word sequence, mark a preset deletion character at the position of the word sequence to be detected;
    If the number of characters of the word sequence to be detected is equal to the number of characters of the preset standard word sequence, determine whether the word sequence to be detected is the same as the preset standard word sequence;
    If the word sequence to be detected is not the same as the preset standard word sequence, mark a preset replacement character at the position of the word sequence to be detected, and determine the text to be detected, after the proofreading marks have been placed, as the proofread text.
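The marking rules of claim 16 can be sketched as below. The marker characters `"I"`, `"D"` and `"S"`, the function names, and the assumption that detected and standard word sequences have already been paired one-to-one are all hypothetical; the claim only states that preset insertion, deletion and replacement characters are marked.

```python
from typing import List, Optional, Tuple

# Hypothetical preset marker characters for insertion, deletion and replacement.
INSERT_MARK, DELETE_MARK, REPLACE_MARK = "I", "D", "S"


def proofread_mark(detected: str, standard: str) -> Optional[str]:
    """Compare one detected word sequence with its standard word sequence."""
    if len(detected) > len(standard):
        return INSERT_MARK       # more characters than the standard
    if len(detected) < len(standard):
        return DELETE_MARK       # fewer characters than the standard
    if detected != standard:
        return REPLACE_MARK      # same length, different content
    return None                  # identical: no mark needed


def proofread(detected_seq: List[str], standard_seq: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Attach a mark (or None) to every detected word sequence; pairs are assumed aligned."""
    return [(d, proofread_mark(d, s)) for d, s in zip(detected_seq, standard_seq)]


marked = proofread(["helloo", "wrld", "tast", "ok"], ["hello", "world", "test", "ok"])
print(marked)
```

Counting the marks of each kind in `marked` yields the insertion, deletion and replacement counts that the error-rate formula takes as input.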
  17. The computer-readable storage medium according to claim 16, wherein the computer instructions, when run on a computer, further cause the computer to perform the following steps:
    Acquire the basic text characters in the text to be detected and an initial observation sequence, wherein the initial observation sequence indicates the text character sequence of the basic text characters;
    Divide the basic text characters into predicted observation sequences according to the division rule in the preset sequence function, wherein each predicted observation sequence indicates a combination of the text character sequence;
    Use a preset conditional probability formula to calculate the basic conditional probability that the basic text characters, given their arrangement in the initial observation sequence, are arranged according to the predicted observation sequence, wherein the preset conditional probability formula is:
    $S^{*} = \arg\max_{S} P(S \mid O)$, where $S^{*}$ is the target observation sequence; $S$ is the predicted observation sequence, with $S = (s_1, s_2, \ldots, s_T)$; $T$ is the length of the initial observation sequence; $s_1$ is the first word sequence obtained by dividing the basic text characters according to the predicted observation sequence; $O$ is the initial observation sequence, with $O = (o_1, o_2, \ldots, o_T)$; and $o_1$ is the first character sequence obtained by dividing the basic text characters according to the initial observation sequence;
    Take the predicted observation sequence corresponding to the target conditional probability, that is, the largest of the basic conditional probabilities, as the target observation sequence;
    Divide the basic text characters according to the target observation sequence to obtain the word sequence to be detected;
    Compare the word sequence to be detected with the preset standard word sequence, and determine the relationship between the number of characters of the word sequence to be detected and the number of characters of the preset standard word sequence.
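The selection of the most probable division in claim 17 can be illustrated with a toy search. The claim does not say how P(S|O) is estimated, so this sketch substitutes a simple assumption: each candidate division is scored by the product of hypothetical unigram word probabilities, and every division is enumerated exhaustively. The table `WORD_PROB`, the floor value `UNKNOWN_PROB` and the function names are all illustrative.

```python
from itertools import combinations
from math import prod
from typing import Iterator, List

# Hypothetical toy word probabilities; unseen words get a small floor value.
WORD_PROB = {"the": 0.05, "table": 0.01, "tab": 0.002, "le": 0.0005, "he": 0.02, "t": 0.001}
UNKNOWN_PROB = 1e-8


def segmentations(chars: str) -> Iterator[List[str]]:
    """Yield every way of cutting chars into contiguous, non-empty pieces."""
    n = len(chars)
    for k in range(n):                           # k = number of cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [chars[a:b] for a, b in zip(bounds, bounds[1:])]


def score(segmentation: List[str]) -> float:
    """Stand-in for P(S|O): product of per-word probabilities."""
    return prod(WORD_PROB.get(word, UNKNOWN_PROB) for word in segmentation)


def best_segmentation(chars: str) -> List[str]:
    """Return the division with the highest score (the arg max over candidates)."""
    return max(segmentations(chars), key=score)


print(best_segmentation("thetable"))
```

Exhaustive enumeration grows exponentially with the text length, so a practical system would more likely use dynamic programming (for example a Viterbi-style search); the brute-force version is kept here only because it mirrors the arg max definition directly.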
  18. The computer-readable storage medium according to claim 15, wherein the computer instructions, when run on a computer, further cause the computer to perform the following steps:
    Obtain the initial speech in the video return visit project, input the initial speech into the speech recognition function, and extract the speech features of the initial speech through the speech recognition function;
    Convert the speech features into phoneme information through a preset translation model, wherein the phoneme information indicates the smallest speech units that make up a speech syllable;
    Match the phoneme information against preset standard text to generate the initial text corresponding to the initial speech.
  19. The computer-readable storage medium according to claim 15, wherein the computer instructions, when run on a computer, further cause the computer to perform the following steps:
    Obtain the text characters of the initial text, and determine whether space characters exist between the text characters;
    If space characters exist between the text characters, delete the space characters and determine the text characters remaining after the deletion as first preprocessed text characters;
    Obtain the positions of the punctuation characters in the first preprocessed text characters, take the character following each punctuation character as the first character of the next line, and segment and order the first preprocessed text characters accordingly to obtain second preprocessed text characters, wherein the punctuation characters are symbols that assist the written recording of language;
    Delete the punctuation characters from the second preprocessed text characters, and determine the second preprocessed text characters remaining after the deletion as target text characters, to obtain the text to be detected.
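The three preprocessing steps of claim 19 can be sketched in order: delete spaces, break the text into lines after each punctuation character, then delete the punctuation. The punctuation set `PUNCTUATION`, the function name `preprocess`, and the choice to treat all whitespace as space characters are assumptions made for this illustration.

```python
import re
from typing import List

# Assumed punctuation set covering common ASCII and Chinese marks.
PUNCTUATION = ",.!?;:\uFF0C\u3002\uFF01\uFF1F\uFF1B\uFF1A\u3001"
PUNCT_CLASS = f"[{re.escape(PUNCTUATION)}]"


def preprocess(initial_text: str) -> List[str]:
    """Return the text to be detected as a list of lines without spaces or punctuation."""
    # Step 1: delete space characters (all whitespace is treated as a space here).
    first = re.sub(r"\s+", "", initial_text)
    # Step 2: the character after each punctuation mark starts a new line.
    second = re.sub(f"({PUNCT_CLASS})", r"\1\n", first)
    # Step 3: delete the punctuation characters themselves.
    target = re.sub(PUNCT_CLASS, "", second)
    # Drop empty lines left behind by trailing or consecutive punctuation.
    return [line for line in target.splitlines() if line]


print(preprocess("Hello there, how are you? Fine."))
```

Deleting spaces merges English words, as the example output shows; the step is natural for Chinese text, where words are not separated by spaces in the first place.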
  20. An apparatus for evaluating speech recognition results, the apparatus comprising:
    a conversion module, configured to obtain the initial speech in a video return visit project and convert the initial speech based on a speech recognition function to obtain the converted initial text;
    a preprocessing module, configured to perform space-character deletion preprocessing, ordering preprocessing, and punctuation-character deletion preprocessing on the initial text to obtain the text to be detected;
    a proofreading module, configured to obtain the word sequence to be detected in the text to be detected based on a preset sequence function, proofread the word sequence to be detected against a preset standard word sequence, and place proofreading marks in the word sequence to be detected to obtain the proofread text;
    a calculation module, configured to calculate the character recognition error rate of the proofread text using a preset calculation formula;
    a determination module, configured to select a preset comparison result by comparing the character recognition error rate with the standard error rate, and to determine the conversion evaluation result of the speech-to-text conversion according to the preset comparison result.
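The five modules of claim 20 form a straight pipeline, which can be sketched as a small container of callables. The class name `EvaluationApparatus`, the type signatures, and the stub modules in the usage example are hypothetical placeholders; they show only how the modules hand results to one another, not how any module is implemented.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvaluationApparatus:
    convert: Callable[[bytes], str]               # conversion module
    preprocess: Callable[[str], List[str]]        # preprocessing module
    proofread: Callable[[List[str]], List[str]]   # proofreading module
    calculate: Callable[[List[str]], float]       # calculation module
    determine: Callable[[float], str]             # determination module

    def evaluate(self, initial_speech: bytes) -> str:
        """Run the modules in the order given in the claim."""
        initial_text = self.convert(initial_speech)
        text_to_detect = self.preprocess(initial_text)
        proofread_text = self.proofread(text_to_detect)
        error_rate = self.calculate(proofread_text)
        return self.determine(error_rate)


# Stub modules for illustration only; a real system would plug in actual components.
apparatus = EvaluationApparatus(
    convert=lambda speech: speech.decode("utf-8"),
    preprocess=lambda text: text.split(),
    proofread=lambda words: words,
    calculate=lambda words: 0.05,
    determine=lambda rate: "low accuracy" if rate > 0.10 else "high accuracy",
)
print(apparatus.evaluate(b"hello world"))
```

Passing the modules in as callables keeps each stage replaceable, so a different recognizer or error-rate formula can be substituted without touching the rest of the pipeline.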
PCT/CN2021/090436 2020-11-04 2021-04-28 Speech recognition result evaluation method, apparatus and device, and storage medium WO2022095353A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011215789.4 2020-11-04
CN202011215789.4A CN112151014B (en) 2020-11-04 2020-11-04 Speech recognition result evaluation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022095353A1 (en) 2022-05-12

Family

ID=73953912

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090436 WO2022095353A1 (en) 2020-11-04 2021-04-28 Speech recognition result evaluation method, apparatus and device, and storage medium

Country Status (2)

Country Link
CN (1) CN112151014B (en)
WO (1) WO2022095353A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112151014B (en) * 2020-11-04 2023-07-21 平安科技(深圳)有限公司 Speech recognition result evaluation method, device, equipment and storage medium
CN112599129B (en) * 2021-03-01 2021-05-28 北京世纪好未来教育科技有限公司 Speech recognition method, apparatus, device and storage medium
CN113129935B (en) * 2021-06-16 2021-08-31 北京新唐思创教育科技有限公司 Audio dotting data acquisition method and device, storage medium and electronic equipment
CN113312456A (en) * 2021-06-28 2021-08-27 中国平安人寿保险股份有限公司 Short video text generation method, device, equipment and storage medium
CN115687334B (en) * 2023-01-05 2023-05-16 粤港澳大湾区数字经济研究院(福田) Data quality inspection method, device, equipment and storage medium
CN116403604B (en) * 2023-06-07 2023-11-03 北京奇趣万物科技有限公司 Child reading ability evaluation method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1571013A (en) * 2003-02-13 2005-01-26 微软公司 Method and device for predicting word error rate from text
CN109637536A (en) * 2018-12-27 2019-04-16 苏州思必驰信息科技有限公司 A kind of method and device of automatic identification semantic accuracy
CN111179939A (en) * 2020-04-13 2020-05-19 北京海天瑞声科技股份有限公司 Voice transcription method, voice transcription device and computer storage medium
CN112151014A (en) * 2020-11-04 2020-12-29 平安科技(深圳)有限公司 Method, device and equipment for evaluating voice recognition result and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105653517A (en) * 2015-11-05 2016-06-08 乐视致新电子科技(天津)有限公司 Recognition rate determining method and apparatus
CN108766437B (en) * 2018-05-31 2020-06-23 平安科技(深圳)有限公司 Speech recognition method, speech recognition device, computer equipment and storage medium
CN110968730B (en) * 2019-12-16 2023-06-09 Oppo(重庆)智能科技有限公司 Audio mark processing method, device, computer equipment and storage medium
CN111223498A (en) * 2020-01-10 2020-06-02 平安科技(深圳)有限公司 Intelligent emotion recognition method and device and computer readable storage medium
CN111681642B (en) * 2020-06-03 2022-04-15 北京字节跳动网络技术有限公司 Speech recognition evaluation method, device, storage medium and equipment
CN111696557A (en) * 2020-06-23 2020-09-22 深圳壹账通智能科技有限公司 Method, device and equipment for calibrating voice recognition result and storage medium
CN111816165A (en) * 2020-07-07 2020-10-23 北京声智科技有限公司 Voice recognition method and device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FINDYOU: "[HResults calculates word error rate (WER) and sentence error rate (SER)]", CN BLOGS, CNBLOGS, 3 April 2019 (2019-04-03), pages 1 - 11, XP055931090, Retrieved from the Internet <URL:https://www.cnblogs.com/FINDYOU/P/10646312.HTML> [retrieved on 20220614] *
MICHAELLIU_DEV: "[Detailed explanation of CTC algorithm]", BLOG CSDN, CSDN, 2 November 2018 (2018-11-02), pages 1 - 8, XP055931095, Retrieved from the Internet <URL:https://blog.csdn.net/michaelshare/article/details/83660557> [retrieved on 20220614] *

Also Published As

Publication number Publication date
CN112151014A (en) 2020-12-29
CN112151014B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
WO2022095353A1 (en) Speech recognition result evaluation method, apparatus and device, and storage medium
CN109887497B (en) Modeling method, device and equipment for speech recognition
US8185376B2 (en) Identifying language origin of words
US7421387B2 (en) Dynamic N-best algorithm to reduce recognition errors
US6836760B1 (en) Use of semantic inference and context-free grammar with speech recognition system
CN107229627B (en) Text processing method and device and computing equipment
JP2016536652A (en) Real-time speech evaluation system and method for mobile devices
CN112992125B (en) Voice recognition method and device, electronic equipment and readable storage medium
CN109858025B (en) Word segmentation method and system for address standardized corpus
CN107766560B (en) Method and system for evaluating customer service flow
CN113626573A (en) Sales session objection and response extraction method and system
CN113221542A (en) Chinese text automatic proofreading method based on multi-granularity fusion and Bert screening
CN115687621A (en) Short text label labeling method and device
CN115759119A (en) Financial text emotion analysis method, system, medium and equipment
JP5376341B2 (en) Model adaptation apparatus, method and program thereof
JP5897718B2 (en) Voice search device, computer-readable storage medium, and voice search method
CN112287657A (en) Information matching system based on text similarity
JP5590549B2 (en) Voice search apparatus and voice search method
JP5253317B2 (en) Summary sentence creation device, summary sentence creation method, program
CN107886233B (en) Service quality evaluation method and system for customer service
JP2017191278A (en) Phoneme error acquisition device, dictionary addition device, speech recognition device, phoneme error acquisition method, speech recognition method, and program
CN114254628A (en) Method and device for quickly extracting hot words by combining user text in voice transcription, electronic equipment and storage medium
JP2938865B1 (en) Voice recognition device
CN114444491A (en) New word recognition method and device
JP2008165718A (en) Intention determination device, intention determination method, and program

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21888057; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 21888057; Country of ref document: EP; Kind code of ref document: A1)