CN111696557A - Method, device and equipment for calibrating voice recognition result and storage medium - Google Patents

Method, device and equipment for calibrating voice recognition result and storage medium

Info

Publication number
CN111696557A
CN111696557A
Authority
CN
China
Prior art keywords
calibration, statement, sentence, target, keywords
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010581203.XA
Other languages
Chinese (zh)
Inventor
王振华 (Wang Zhenhua)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai filed Critical OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202010581203.XA priority Critical patent/CN111696557A/en
Publication of CN111696557A publication Critical patent/CN111696557A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The scheme relates to artificial intelligence and provides a method, an apparatus, a device, and a storage medium for calibrating a speech recognition result, which are used to reduce the high text misrecognition rate that arises when speech is converted into text. The calibration method comprises the following steps: acquiring a plurality of target speeches and converting them into a plurality of initial sentences; screening a plurality of keywords in a target sentence through a fuzzy matching algorithm, and replacing the keywords with basic standard words according to a conversion threshold to obtain a first calibration sentence; performing matching calibration on the first calibration sentence according to other standard words in the preceding sentence to obtain a second calibration sentence; calculating, with a similarity algorithm, a first intention matching degree of the first calibration sentence and a second intention matching degree of the second calibration sentence; and if the second intention matching degree is greater than the first intention matching degree and greater than a matching threshold, determining the second calibration sentence as the output sentence, otherwise determining the first calibration sentence as the output sentence.

Description

Method, device and equipment for calibrating voice recognition result and storage medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a method, an apparatus, a device, and a storage medium for calibrating a speech recognition result.
Background
Speech recognition technology in artificial intelligence enables a machine to convert a speech signal into corresponding text or commands through a process of recognition and understanding. With the continuous progress of science and technology, speech recognition has been applied in many fields such as industry, household appliances, communications, medical treatment, and electronic products. Among speech recognition technologies, Automatic Speech Recognition (ASR), which uses a model to convert recognized speech information into corresponding text information, has the widest range of application.
The inventor of the present application found in research that, when speech recognition is performed with existing speech recognition technology, the lack of understanding and analysis of the sentences preceding and following a target sentence leads to a high word misrecognition rate and low conversion efficiency.
Disclosure of Invention
The main object of the present invention is to solve the problem of a high character misrecognition rate when speech is converted into text.
The first aspect of the present invention provides a method for calibrating a speech recognition result, including: acquiring a plurality of target speeches based on a speech recognition algorithm, and converting the target speeches into text to obtain a plurality of initial sentences; screening a plurality of keywords in a target sentence through a fuzzy matching algorithm, and replacing the keywords with basic standard words according to a conversion threshold to obtain a first calibration sentence, where the target sentence is any one of the initial sentences and the basic standard words are common words in service data; performing matching calibration on the first calibration sentence according to other standard words in the preceding sentence to obtain a second calibration sentence, where the preceding sentence is the sentence immediately before the first calibration sentence and the other standard words are common words in the service data other than the basic standard words; calculating, with a similarity algorithm, a first intention matching degree of the first calibration sentence and a second intention matching degree of the second calibration sentence; and if the second intention matching degree is greater than the first intention matching degree and greater than a matching threshold, determining the second calibration sentence as the output sentence, otherwise determining the first calibration sentence as the output sentence.
Optionally, in a first implementation manner of the first aspect of the present invention, the obtaining a plurality of target voices based on a voice recognition algorithm, and converting the plurality of target voices into characters to obtain a plurality of initial sentences includes: acquiring a plurality of target voices based on a voice recognition algorithm, and extracting voice features in the target voices; converting the voice features into phoneme information through a preset acoustic model, wherein the phoneme information is used for indicating the minimum voice unit forming the syllable; and matching the corresponding text information by using the phoneme information to obtain a plurality of initial sentences.
Optionally, in a second implementation manner of the first aspect of the present invention, the obtaining, by using the phoneme information to match with corresponding text information, a plurality of initial sentences includes: matching character information corresponding to the phoneme information in a preset dictionary, wherein the character information comprises a single character or a word; acquiring the association probability of the character information from preset association probabilities, and extracting the character information with the maximum association probability as a target character, wherein the preset association probability is used for indicating the probability of mutual association between any two single characters or words; and combining the target characters together according to the arrangement sequence to obtain a plurality of initial sentences, wherein the number of the plurality of initial sentences is the same as that of the plurality of target voices.
Optionally, in a third implementation manner of the first aspect of the present invention, the screening, by using a fuzzy matching algorithm, a plurality of keywords in a target sentence, and replacing the plurality of keywords with a plurality of basic standard words according to a conversion threshold to obtain a first calibration sentence includes: converting the target sentence into a pinyin sentence through a fuzzy matching algorithm; screening out a target phonetic symbol in the pinyin sentence, and converting the target phonetic symbol into a near phonetic symbol to obtain a converted pinyin sentence, wherein the target phonetic symbol comprises a simple or compound vowel and/or an initial consonant which are easy to be confused; extracting a plurality of keywords with near phonetic symbols in the converted pinyin sentences, and calculating the similarity between the keywords and corresponding basic standard words, wherein the basic standard words are common words in service data; and when the numerical value of the target similarity is larger than the replacement threshold, replacing the keywords corresponding to the target similarity with the corresponding basic standard words to obtain a first calibration statement.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the performing, according to another standard word in the above sentence, matching and calibrating the first calibration statement to obtain a second calibration statement includes: judging whether the previous sentence of the first calibration sentence comprises other standard words or not; if the above sentence comprises the other standard words, judging whether the first calibration sentence comprises keywords with similar properties corresponding to the other standard words, wherein the keywords with similar properties comprise similar meaning keywords and homophonic keywords; and if the first calibration statement comprises the keywords with similar properties, replacing the keywords with similar properties with other corresponding standard words to obtain a second calibration statement.
Optionally, in a fifth implementation manner of the first aspect of the present invention, if the above sentence includes the other standard words, determining whether the first calibration sentence includes keywords having similar properties to the other standard words includes: if the above sentence comprises the other standard words, calculating a plurality of intention similarities between the other standard words and the first calibration sentence; judging whether the first calibration statement comprises a near sense keyword or not based on a first preset algorithm and the target intention similarity; if the first calibration statement does not include the near-meaning keyword, converting the first calibration statement into a calibration pinyin statement, and calculating a plurality of pinyin similarities between the calibration pinyin statement and the pinyins of the other standard words; and judging whether the calibrated pinyin sentence comprises homophonic keywords or not based on a second preset algorithm and the target pinyin similarity.
Optionally, in a sixth implementation manner of the first aspect of the present invention, the separately calculating the first intention matching degree of the first calibration statement and the second intention matching degree of the second calibration statement includes: extracting a basic standard word in the first calibration statement; calculating a first intention matching degree between the basic standard words and the first calibration sentences by adopting a similarity algorithm, wherein the first intention matching degree is used for indicating that the preset keywords accord with a matching value of the first calibration sentence expression meaning; extracting other standard words in the second calibration statement; and calculating a second intention matching degree between the other standard words and the second calibration sentences by adopting the similarity algorithm.
A second aspect of the present invention provides a device for calibrating a speech recognition result, including: an acquisition and conversion module, configured to acquire a plurality of target speeches based on a speech recognition algorithm and convert them into text to obtain a plurality of initial sentences; a screening and replacing module, configured to screen a plurality of keywords in a target sentence through a fuzzy matching algorithm and replace the keywords with basic standard words according to a conversion threshold to obtain a first calibration sentence, where the target sentence is any one of the initial sentences and the basic standard words are common words in service data; a calibration module, configured to perform matching calibration on the first calibration sentence according to other standard words in the preceding sentence to obtain a second calibration sentence, where the preceding sentence is the sentence immediately before the first calibration sentence and the other standard words are common words in the service data other than the basic standard words; a calculation module, configured to calculate, with a similarity algorithm, a first intention matching degree of the first calibration sentence and a second intention matching degree of the second calibration sentence; and an output module, configured to determine the second calibration sentence as the output sentence if the second intention matching degree is greater than the first intention matching degree and greater than a matching threshold, and otherwise determine the first calibration sentence as the output sentence.
Optionally, in a first implementation manner of the second aspect of the present invention, the obtaining and converting module includes: the extraction unit is used for acquiring a plurality of target voices based on a voice recognition algorithm and extracting voice features in the target voices; a converting unit, configured to convert the speech features into phoneme information through a preset acoustic model, where the phoneme information is used to indicate a minimum speech unit constituting a syllable; and the matching unit is used for matching the corresponding character information by utilizing the phoneme information to obtain a plurality of initial sentences.
Optionally, in a second implementation manner of the second aspect of the present invention, the matching unit is specifically configured to: matching character information corresponding to the phoneme information in a preset dictionary, wherein the character information comprises a single character or a word; acquiring the association probability of the character information from preset association probabilities, and extracting the character information with the maximum association probability as a target character, wherein the preset association probability is used for indicating the probability of mutual association between any two single characters or words; and combining the target characters together according to the arrangement sequence to obtain a plurality of initial sentences, wherein the number of the plurality of initial sentences is the same as that of the plurality of target voices.
Optionally, in a third implementation manner of the second aspect of the present invention, the screening and replacing module is specifically configured to: converting the target sentence into a pinyin sentence through a fuzzy matching algorithm; screening out a target phonetic symbol in the pinyin sentence, and converting the target phonetic symbol into a near phonetic symbol to obtain a converted pinyin sentence, wherein the target phonetic symbol comprises a simple or compound vowel and/or an initial consonant which are easy to be confused; extracting a plurality of keywords with near phonetic symbols in the converted pinyin sentences, and calculating the similarity between the keywords and corresponding basic standard words, wherein the basic standard words are common words in service data; and when the numerical value of the target similarity is larger than the replacement threshold, replacing the keywords corresponding to the target similarity with the corresponding basic standard words to obtain a first calibration statement.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the calibration module includes: a first judging unit, configured to judge whether an upper sentence of the first calibration sentence includes other standard words; a second determining unit, configured to determine whether the first calibration sentence includes keywords with similar properties corresponding to the other standard words if the previous sentence includes the other standard words, where the keywords with similar properties include near-meaning keywords and homophonic keywords; and a replacing unit, configured to replace the keywords with similar properties with corresponding other standard words to obtain a second calibration statement if the first calibration statement includes the keywords with similar properties.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the second determining unit is specifically configured to: if the above sentence comprises the other standard words, calculating a plurality of intention similarities between the other standard words and the first calibration sentence; judging whether the first calibration statement comprises a near sense keyword or not based on a first preset algorithm and the target intention similarity; if the first calibration statement does not include the near-meaning keyword, converting the first calibration statement into a calibration pinyin statement, and calculating a plurality of pinyin similarities between the calibration pinyin statement and the pinyins of the other standard words; and judging whether the calibrated pinyin sentence comprises homophonic keywords or not based on a second preset algorithm and the target pinyin similarity.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the calculation module is specifically configured to: extracting a basic standard word in the first calibration statement; calculating a first intention matching degree between the basic standard words and the first calibration sentences by adopting a similarity algorithm, wherein the first intention matching degree is used for indicating that the preset keywords accord with a matching value of the first calibration sentence expression meaning; extracting other standard words in the second calibration statement; and calculating a second intention matching degree between the other standard words and the second calibration sentences by adopting the similarity algorithm.
A third aspect of the present invention provides a device for calibrating a speech recognition result, comprising a memory having instructions stored therein and at least one processor, the memory and the at least one processor being interconnected by a line; the at least one processor invokes the instructions in the memory to cause the device to perform the above calibration method for a speech recognition result.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to execute the above-described calibration method of a speech recognition result.
In the technical solution provided by the present invention, a plurality of target speeches are acquired based on a speech recognition algorithm and converted into text to obtain a plurality of initial sentences; a plurality of keywords in a target sentence are screened through a fuzzy matching algorithm and replaced with basic standard words according to a conversion threshold to obtain a first calibration sentence, where the target sentence is any one of the initial sentences and the basic standard words are common words in the service data; the first calibration sentence is matched and calibrated according to other standard words in the preceding sentence to obtain a second calibration sentence, where the preceding sentence is the sentence immediately before the first calibration sentence and the other standard words are common words in the service data other than the basic standard words; a first intention matching degree of the first calibration sentence and a second intention matching degree of the second calibration sentence are calculated; and if the second intention matching degree is greater than the first intention matching degree and greater than a preset threshold, the second calibration sentence is determined as the output sentence, otherwise the first calibration sentence is determined as the output sentence. In the embodiment of the present invention, the target sentence is corrected according to the preceding sentence of the target speech, and the output sentence is then determined according to a preset threshold, which reduces the character misrecognition rate during speech recognition and improves the conversion efficiency of speech recognition.
Drawings
FIG. 1 is a diagram of an embodiment of a method for calibrating speech recognition results according to an embodiment of the present invention;
FIG. 2 is a diagram of another embodiment of a method for calibrating speech recognition results according to an embodiment of the present invention;
FIG. 3 is a diagram of an embodiment of a device for calibrating speech recognition results according to an embodiment of the present invention;
FIG. 4 is a diagram of another embodiment of a calibration apparatus for speech recognition results according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of a device for calibrating a speech recognition result according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention provide a method, an apparatus, a device, and a storage medium for calibrating a speech recognition result, which correct a target sentence according to the preceding sentence of the target speech and then determine the output sentence according to a preset threshold, thereby reducing the character misrecognition rate during speech recognition and improving the conversion efficiency of speech recognition.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, the detailed flow of an embodiment of the present invention is described below. Referring to FIG. 1, an embodiment of a method for calibrating a speech recognition result according to an embodiment of the present invention includes:
101. acquiring a plurality of target voices based on a voice recognition algorithm, and converting the plurality of target voices into characters to obtain a plurality of initial sentences;
it is to be understood that the execution subject of the present invention may be a calibration apparatus of a speech recognition result, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as an execution subject.
The server obtains a plurality of target voices based on a voice recognition algorithm, and converts the target voices into characters to obtain a plurality of initial sentences.
An automatic speech recognition algorithm is used to convert the plurality of target speeches into sentences. Its principle is as follows. The server first collects a large number of speech samples for training, analyzes their speech feature parameters, builds the parameters into speech templates, and stores the templates in a speech parameter library. The server then obtains the speech to be recognized and processes it through the same steps as in training to obtain speech recognition parameters. These parameters are compared one by one with the reference templates in the speech parameter library, and a decision method is used to find the best-matching template and produce the recognition result; a distortion measure is applied while comparing the parameters with the reference templates so as to optimize the comparison result in time. The recognition frameworks adopted in the whole recognition process include dynamic time warping (DTW) based on pattern matching and the hidden Markov model (HMM) method based on a statistical model, from which the initial sentences converted from the target speeches are obtained.
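The dynamic time warping alignment mentioned above can be illustrated with a minimal sketch. This is an illustrative example using one-dimensional per-frame features, not the patent's implementation; real systems compare multi-dimensional acoustic feature vectors such as MFCCs.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    `a` and `b` are lists of per-frame feature values (scalars here for
    simplicity). DTW stretches or compresses the time axis so that
    sequences spoken at different speeds can still be compared.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    # dp[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local frame distance
            dp[i][j] = cost + min(dp[i - 1][j],      # shrink template
                                  dp[i][j - 1],      # stretch template
                                  dp[i - 1][j - 1])  # one-to-one match
    return dp[n][m]
```

The reference template whose DTW distance to the input features is smallest would be selected as the best match.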
102. Screening a plurality of keywords in a target sentence through a fuzzy matching algorithm, and replacing the plurality of keywords with a plurality of basic standard words according to a conversion threshold value to obtain a first calibration sentence, wherein the target sentence is any one of a plurality of initial sentences, and the basic standard words are common words in service data;
the server screens a plurality of keywords in any one of a plurality of initial sentences through a fuzzy matching algorithm, and replaces the plurality of keywords with a plurality of basic standard words according to a conversion threshold value to obtain a first calibration sentence, wherein the basic standard words are common words in the service data.
The fuzzy matching algorithm is used to screen keywords in the target sentence. Its principle is to convert the target sentence into its corresponding pinyin and then convert confusable target phonetic symbols into near-phonetic symbols. This yields several sentences that are phonetically close to the target sentence, i.e. several possible readings of the recognized sentence. The sentence that best fits the scene of the preceding sentence is then selected from these candidates, giving the final recognized calibration sentence.
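A minimal sketch of the near-phonetic-symbol substitution, assuming an illustrative set of confusable initial/final pairs (zh/z, ch/c, sh/s, n/l, ing/in); the actual confusion set and the pinyin conversion step are not specified by the source.

```python
# Illustrative pairs of easily-confused pinyin initials and finals.
# These pairs are assumptions for the sketch, not the patent's list.
INITIAL_PAIRS = [("zh", "z"), ("ch", "c"), ("sh", "s"), ("n", "l")]
FINAL_PAIRS = [("ing", "in")]

def near_phonetic_variants(syllable):
    """Generate near-phonetic variants of one pinyin syllable by swapping
    each confusable initial or final for its counterpart."""
    variants = set()
    for long_form, short_form in INITIAL_PAIRS:
        if syllable.startswith(long_form):
            variants.add(short_form + syllable[len(long_form):])
        elif syllable.startswith(short_form):
            variants.add(long_form + syllable[len(short_form):])
    for long_form, short_form in FINAL_PAIRS:
        if syllable.endswith(long_form):
            variants.add(syllable[:-len(long_form)] + short_form)
        elif syllable.endswith(short_form):
            variants.add(syllable[:-len(short_form)] + long_form)
    return variants
```

Each variant would then be matched against the pinyin of the basic standard words to compute the similarities described in the text.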
It should be noted that basic standard words are common words that appear many times in the service data. Taking an insurance scenario as the service data, for example, basic standard words include premium, claim, exemption, amount, and other words that occur frequently in insurance conversations. Using basic standard words makes the recognition of target sentences closer to the actual situation and strengthens recognition of the scene. There can be many kinds of service data, and each kind has more than one basic standard word, all of which are common words related to the service scenario.
103. Performing matching calibration on the first calibration statement according to other standard words in the previous statement to obtain a second calibration statement, wherein the previous statement is a previous statement of the first calibration statement, and the other standard words are common words except the basic standard words in the service data;
and the server performs matching calibration on the first calibration statement according to other standard words in the sentence above the previous statement indicating the first calibration statement to obtain a second calibration statement, wherein the other standard words are common words except the basic standard words in the service data.
And performing matching calibration on the first calibration statement according to other standard words in the previous statement used for indicating the previous statement of the first calibration statement to obtain a second calibration statement, wherein the other standard words are a plurality of common words except the basic standard word in the service data.
The preceding sentence of the first calibration sentence is the sentence immediately before it. It is obtained so that the other standard words it contains can be extracted, and so that the server can determine whether the target sentence contains a keyword with properties similar to those standard words. The other standard words are the common words in the service data other than the basic standard words: because the service data contains many common words, the target sentence may still include a common word of the service data, or a word with similar properties, even after its keywords have been replaced with basic standard words. The server therefore needs to determine whether the preceding sentence includes other standard words. Keywords with similar properties include near-meaning keywords and homophonic keywords; if the first calibration sentence includes such a keyword, it is replaced with the corresponding other standard word. By performing this operation, the server further calibrates the words in the target sentence, making it closer to the scene of the service data and improving the recognition accuracy of the target sentence.
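The context-based replacement step can be sketched as follows. The token list and the variant dictionary here are illustrative assumptions; in practice the near-meaning and homophonic variants would come from the intention-similarity and pinyin-similarity checks described in this document.

```python
def calibrate_with_context(first_calibration, context_words, similar_of):
    """Replace keywords that are near in meaning or sound to a standard
    word found in the preceding sentence.

    first_calibration : list of tokens of the first calibration sentence
    context_words     : other standard words found in the preceding sentence
    similar_of        : dict mapping a standard word to the set of its
                        near-meaning / homophonic variants (assumed to be
                        built offline from the service vocabulary)
    """
    result = []
    for token in first_calibration:
        replaced = token
        for std in context_words:
            if token in similar_of.get(std, set()):
                replaced = std  # swap the confusable keyword for the standard word
                break
        result.append(replaced)
    return result
```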
104. Respectively calculating a first intention matching degree of the first calibration statement and a second intention matching degree of the second calibration statement by adopting a similarity algorithm;
the server respectively calculates a first intention matching degree of the first calibration statement and a second intention matching degree of the second calibration statement by adopting a similarity algorithm.
After the basic standard words or other standard words are replaced, the server judges whether the replaced sentence conforms to logic, so the server needs to calculate the first intention matching degree of the first calibration statement and the second intention matching degree of the second calibration statement. The first intention matching degree is calculated with a cosine similarity algorithm, using the following formula:
$$\cos(\theta)=\frac{\sum_{i=1}^{n} w_i d_i}{\sqrt{\sum_{i=1}^{n} w_i^{2}}\,\sqrt{\sum_{i=1}^{n} d_i^{2}}}$$
In the formula, cos(θ) represents the first intention matching degree, n represents the number of terms in the summation, i indexes the ith term, w_i denotes the ith preset keyword, and d_i represents the ith first calibration statement. The server first inputs the basic standard words and the first calibration statement into a Word2vec network model and a Doc2vec network model in sequence, and the first intention matching degree is calculated by the cosine similarity algorithm across the two model networks. The method for calculating the second intention matching degree is the same as that for the first, and is therefore not repeated here.
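The cosine similarity calculation described above can be sketched in plain Python. Here `w` and `d` stand for already-embedded numeric vectors; the Word2vec/Doc2vec embedding step is assumed to have produced them, so this is an illustrative sketch rather than the patent's implementation:

```python
import math

def cosine_match(w, d):
    """cos(theta) = sum(w_i * d_i) / (||w|| * ||d||), the intention matching degree."""
    num = sum(wi * di for wi, di in zip(w, d))
    den = (math.sqrt(sum(wi * wi for wi in w))
           * math.sqrt(sum(di * di for di in d)))
    return num / den if den else 0.0  # guard against zero vectors
```

Identical vectors score 1.0 and orthogonal vectors score 0.0, matching the intuition that a higher value means a closer intention match.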
After the first and second intention matching degrees are obtained, the server needs to compare their numerical relationship and check whether the second intention matching degree is higher than the first. The higher the matching degree, the higher the logical accuracy of the statement and the more justified the step of replacing keywords with basic standard words; otherwise, that replacement step was meaningless.
105. And if the second intention matching degree is greater than the first intention matching degree and the numerical value of the second intention matching degree is greater than the matching threshold, determining the second calibration statement as the output statement, otherwise, determining the first calibration statement as the output statement.
If the second intention matching degree is larger than the first intention matching degree and the numerical value of the second intention matching degree is larger than the matching threshold, the server determines the second calibration statement as the output statement; otherwise, the server determines the first calibration statement as the output statement.
After obtaining the first and second intention matching degrees, the server compares them. When the second intention matching degree is greater than the first, the second calibration statement better conforms to the logical relationships of the language. The second intention matching degree must also be greater than the matching threshold, which is the basic threshold for a sentence to conform to those logical relationships; if an intention matching degree is less than or equal to the matching threshold, the sentence fails basic language logic and cannot be identified as a logically clear sentence, in which case the server outputs the sentence without the keyword replacement. That is, when the second intention matching degree is greater than the first intention matching degree and also greater than the matching threshold, the second calibration statement is used as the output statement; otherwise, the first calibration statement is used as the output statement. This also ensures that the output statement restores, as far as possible, the meaning expressed by the target voice while taking the relevance of the target voice's context into account.
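The selection rule above reduces to a small decision function, sketched here with hypothetical argument names:

```python
def choose_output(first_match, second_match, match_threshold,
                  first_sentence, second_sentence):
    """Pick the second calibration sentence only when it both beats the first
    intention matching degree and clears the basic logic threshold;
    otherwise fall back to the first calibration sentence."""
    if second_match > first_match and second_match > match_threshold:
        return second_sentence
    return first_sentence
```

Note the fallback is asymmetric: a second sentence that beats the first but still sits below the threshold is rejected, because it fails basic language logic.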
In the embodiment of the invention, the target sentence is corrected according to the above sentence of the target voice, and then the output sentence is determined according to the preset threshold, so that the character error recognition rate during voice recognition is reduced, and the conversion efficiency of voice recognition is improved.
Referring to fig. 2, another embodiment of the calibration method for the speech recognition result according to the embodiment of the present invention includes:
201. acquiring a plurality of target voices based on a voice recognition algorithm, and extracting voice features in the target voices;
the server acquires a plurality of target voices based on a voice recognition algorithm and extracts voice features in the target voices.
An automatic speech recognition algorithm is used to convert the multiple target voices into multiple sentences. Its principle is mainly as follows: the server first collects a large number of voice samples for training, analyzes their voice feature parameters, makes them into voice templates, and stores the templates in a voice parameter library. The server then obtains the voice to be recognized, applies the same steps as in training to obtain voice recognition parameters, and compares those parameters one by one with the reference templates in the voice parameter library; a decision method finds the best-matching template to produce the recognition result, and a distortion measure is applied during the comparison to refine the result in time. Across the whole recognition process, the adopted recognition frameworks include dynamic time warping based on pattern matching and the hidden Markov model method based on statistical models, so that the multiple initial sentences converted from the multiple target voices can be obtained.
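The template comparison mentioned above relies on dynamic time warping. A minimal DTW distance between two feature sequences might look like the sketch below; scalar features are used for brevity, whereas a real system would compare multidimensional feature vectors (e.g., MFCCs):

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    INF = float("inf")
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local distortion measure
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

Because the warping path may repeat elements, a template pronounced more slowly (e.g., `[1, 2, 2, 3]` against `[1, 2, 3]`) still yields distance 0, which is exactly why DTW suits speech of varying tempo.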
202. Converting the voice features into phoneme information through a preset acoustic model, wherein the phoneme information is used for indicating the minimum voice unit forming the syllable;
the server converts the voice feature into phoneme information indicating a minimum voice unit constituting a syllable through a preset acoustic model.
It is understood that a phoneme is the minimum phonetic unit divided according to the natural attributes of speech, that is, the minimum linear phonetic unit divided from the perspective of sound quality; it is analyzed according to the pronunciation actions within a syllable, with one action constituting one phoneme. Phonemes are divided into two major categories, vowels and consonants. For example, the Chinese syllable "ā" (啊) has only one phoneme, "ài" (爱, love) has two phonemes, and "dài" (代, generation) has three phonemes. The server converts the voice features into the minimum voice units forming syllables through the preset acoustic model, and by analyzing these phoneme units the phoneme information can be accurately assembled into character information.
203. Matching corresponding text information by using the phoneme information to obtain a plurality of initial sentences;
the server matches the corresponding text information by using the phoneme information to obtain a plurality of initial sentences. Specifically, the method comprises the following steps:
The server first matches, in a preset dictionary, the character information corresponding to the phoneme information, where the character information comprises single characters or words. The server then obtains the association probability of the character information from the preset association probabilities, which indicate the probability that any two single characters or words are associated with each other, and extracts the character information with the maximum association probability as the target characters. Finally, the server combines the target characters in their arrangement order to obtain the multiple initial sentences, whose number is the same as the number of the multiple target voices.
It should be noted that the preset association probabilities are obtained by training a language model on a large amount of text information, where the text information includes single characters or words. The server first collects a large amount of text information (for example, the single character "I" or the word "us"), inputs the different pieces of text information into the language model, and calculates, through a deep neural network, the association probability between them, where the association probability refers to the probability that different pieces of text information can be put together and combined into a complete word or sentence. For example, through the language model's calculation, the association probability of the single character "I" (我) is 0.0786, that of the single character "们" is 0.0359, and that of the word "us" (我们) is 0.8572, indicating that when both characters appear in the character information, the probability that they form "us" is higher. The server calculates the association probabilities of a large amount of character information through the language model, takes them as the preset association probabilities, and compares the character information acquired in real time against them, so that the target sentence converted from voice to characters can be obtained through the preset association probabilities.
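The "extract the character information with the maximum association probability" step can be sketched as a simple lookup; the probability values below are the illustrative ones from the text, and the table itself is a stand-in for the trained language model:

```python
# Illustrative: among candidate characters/words matched from the same phonemes,
# pick the text with the highest preset association probability.
ASSOC_PROB = {"I": 0.0786, "is": 0.0546, "I am": 0.0898,
              "machine": 0.0967, "robot": 0.6785}

def pick_target_text(candidates):
    """Return the candidate with the maximum association probability."""
    return max(candidates, key=lambda c: ASSOC_PROB.get(c, 0.0))
```

A production system would score whole candidate sequences (e.g., with beam search) rather than single tokens, but the selection criterion is the same.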
It can be understood that after obtaining the target voice, the server first preprocesses it; the purpose of preprocessing is to make the result of subsequent speech recognition more accurate. The preprocessing generally includes: 1. cutting off the silence at the head and tail of the target voice, reducing interference with subsequent steps; 2. framing the target voice, i.e., cutting the sound into small segments with a moving window function, each segment being called a frame, where frames generally overlap. In addition, there can be one or more target voices, and the number of recognized target characters is the same as, and corresponds one-to-one with, the number of target voices.
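The framing step with overlapping windows can be sketched as follows; the frame length and hop size are hypothetical parameters (real systems typically use frames of 20-30 ms with roughly 10 ms hops):

```python
def frame_signal(samples, frame_len, hop):
    """Cut a sample sequence into overlapping frames with a moving window.
    Consecutive frames overlap by (frame_len - hop) samples."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames
```

A windowing function (e.g., Hamming) would normally be multiplied onto each frame before feature extraction; it is omitted here for brevity.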
For example, take recognizing the target voice "I am a robot" (我是机器人, "wo shi ji qi ren"). First, the server acquires the target voice and preprocesses it; after preprocessing, the server extracts the voice features, for example: [1 2 3 4 5 6 0]. The server then converts the voice features into phoneme information through the acoustic model, for example: "w o sh i j i q i r e n". After obtaining the phoneme information, the server matches the characters corresponding to the phonemes in the preset dictionary; each phoneme group matches several candidate characters (for example, both "I" and "nest" correspond to "w o", "machine" and "stage" both correspond to "j i", and "human" and "honeysuckle" both correspond to "r e n"). The server then obtains the association probabilities between the candidate character information from the preset association probabilities, for example: "I": 0.0786; "am": 0.0546; "I am": 0.0898; "machine": 0.0967; "robot": 0.6785. Finally, the server selects the character information with the maximum association probability as the target characters, since the greater the association probability, the more likely the combined word or sentence is to occur, and combines the target characters in order to obtain the target sentence, for example: "I am a robot."
204. Screening a plurality of keywords in a target sentence through a fuzzy matching algorithm, and replacing the plurality of keywords with a plurality of basic standard words according to a conversion threshold value to obtain a first calibration sentence, wherein the target sentence is any one of a plurality of initial sentences, and the basic standard words are common words in service data;
the server screens a plurality of keywords in any one of a plurality of initial sentences through a fuzzy matching algorithm, and replaces the plurality of keywords with a plurality of basic standard words according to a conversion threshold value to obtain a first calibration sentence, wherein the basic standard words are common words in the service data. Specifically, the method comprises the following steps:
firstly, a server converts a target sentence into a pinyin sentence through a fuzzy matching algorithm; secondly, the server screens out a target phonetic symbol in the pinyin sentence, and converts the target phonetic symbol into a near phonetic symbol to obtain a converted pinyin sentence, wherein the target phonetic symbol comprises a simple or compound vowel and/or an initial consonant which are easy to be confused; then the server extracts a plurality of keywords with near phonetic symbols in the converted pinyin sentences, and calculates the similarity between the keywords and corresponding basic standard words, wherein the basic standard words are common words in the service data; and finally, when the numerical value of the target similarity is larger than the replacement threshold value, the server replaces the keywords corresponding to the target similarity with the corresponding basic standard words to obtain a first calibration statement.
The server screens the keywords in the target sentence in order to replace them with basic standard words related to the service data, so that the replaced sentence is closely related to the service scenario and the fit between the target sentence and the actual target voice is improved. The server filters the keywords in the target sentence with a fuzzy matching algorithm, whose principle is to convert the target sentence into its corresponding pinyin and convert each easily confused target phonetic symbol into its near phonetic symbol. Examples of confusable target phonetic symbols and their corresponding near phonetic symbols are: confusable consonants: b/p; confusable front and back nasal finals: en/eng; confusable flat and retroflex initials: z/zh. After converting the target phonetic symbols into near phonetic symbols, the server calculates the similarity between each keyword carrying a near phonetic symbol and the basic standard words; when the calculated similarity is greater than the replacement threshold, the keyword is replaced with the basic standard word to obtain the replaced sentence. The replacement threshold here refers to the standard for replacing a keyword with a basic standard word; its value may be set according to the specific service data and is not limited in this application.
For example: take the target sentence "flowing milk" ("liu nai") and the basic standard word "milk" ("niu nai"). First, the server converts the target sentence into the corresponding pinyin sentence "liu nai". It then screens out the easily confused target phonetic symbols, here the mutually confusable pair n/l, and converts them into their near phonetic symbols, giving candidate pinyin sentences such as "niu nai" and "niu lai". The server calculates the similarity between the keywords carrying near phonetic symbols and the known basic standard word "milk" ("niu nai"), obtaining, for example: "niu nai": 0.86; "niu lai": 0.32; "liu nai": 0.45. With the preset conversion threshold at 0.56, only the similarity of "niu nai" is greater than the threshold, so the keyword "flowing milk" is replaced with the basic standard word "milk".
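The pinyin fuzzy matching above can be sketched as follows. The confusable pairs come from the description; `difflib.SequenceMatcher` is used here as an illustrative stand-in for the patent's unspecified similarity calculation:

```python
import difflib

# Confusable phonetic-symbol pairs from the description (both directions apply).
CONFUSABLE_PAIRS = [("n", "l"), ("b", "p"), ("en", "eng"), ("z", "zh")]

def near_variants(word):
    """Near-phonetic variants of one pinyin word (swap a confusable initial)."""
    out = set()
    for a, b in CONFUSABLE_PAIRS:
        if word.startswith(a):
            out.add(b + word[len(a):])
        if word.startswith(b):
            out.add(a + word[len(b):])
    return out

def best_match(pinyin, standard_words, threshold):
    """Replace with a standard word if any variant's similarity clears the threshold."""
    candidates = {pinyin} | near_variants(pinyin)
    def score(std):
        return max(difflib.SequenceMatcher(None, c, std).ratio() for c in candidates)
    best = max(standard_words, key=score)
    return best if score(best) > threshold else pinyin
```

With the worked example, `best_match("liunai", ["niunai"], 0.56)` replaces the keyword because the l-to-n variant matches the standard word exactly.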
205. Performing matching calibration on the first calibration statement according to other standard words in the previous statement to obtain a second calibration statement, wherein the previous statement is a previous statement of the first calibration statement, and the other standard words are common words except the basic standard words in the service data;
The server performs matching calibration on the first calibration statement according to the other standard words in the above sentence, i.e., the sentence preceding the first calibration statement, to obtain the second calibration statement, wherein the other standard words are common words in the service data other than the basic standard words. Specifically, the method comprises the following steps:
The server first judges whether the above sentence of the first calibration statement includes other standard words; if the above sentence includes other standard words, the server judges whether the first calibration statement includes keywords with similar properties corresponding to those standard words, where keywords with similar properties include near-meaning keywords and homophone keywords; and if the first calibration statement includes such keywords, the server replaces them with the corresponding other standard words to obtain the second calibration statement.
It should be noted that the basic standard words and other standard words are common words appearing multiple times in the service data, and are related to the scene of the service data. After the server judges that the sentence includes other standard words, it is also required to judge whether keywords with similar properties to the other standard words appear in the first calibration sentence, where the keywords with similar properties to the other standard words include near-meaning keywords and homophonic keywords, and the server is required to screen the two keywords with similar properties respectively and then execute a replacement instruction of the keywords with similar properties.
If the above sentence includes other standard words, the server determines whether the first calibration statement includes keywords with properties similar to those standard words. Specifically: the server first calculates multiple intention similarities between the other standard words and the first calibration statement; second, the server judges, based on a first preset algorithm and the intention similarities, whether the first calibration statement includes a near-meaning keyword; if it does not, the server converts the first calibration statement into a calibration pinyin statement and calculates multiple pinyin similarities between the calibration pinyin statement and the pinyin of the other standard words; finally, the server judges, based on a second preset algorithm and the pinyin similarities, whether the calibration pinyin statement includes a homophone keyword.
To judge whether a near-meaning keyword exists in the first calibration statement, the server first calculates multiple intention similarities between the other standard words and the first calibration statement, that is, it judges from the calculation result whether a word similar in intention to the other standard words exists in the first calibration statement. The first preset algorithm used here is an intention identification algorithm: it preprocesses the first calibration statement (removing punctuation marks, stop words, and the like), then converts the statement into data and generates word vectors. The server uses a long short-term memory network (LSTM) to extract features from the word vectors and finally classifies those features, i.e., calculates the intention similarities between the other standard words and the word-vector features of the first calibration statement. When a target intention similarity exceeds the first threshold, the first calibration statement includes a near-meaning keyword. The value of the first threshold may be set according to the specific service data and is not limited in this application.
To judge whether a homophone keyword exists in the first calibration statement, the server first converts the Chinese in the first calibration statement into its corresponding pinyin to obtain a calibration pinyin statement, and then calculates multiple pinyin similarities between the calibration pinyin statement and the pinyin of the other standard words. The second preset algorithm used here is a similarity calculation method, namely an edit distance algorithm; whether the calibration pinyin statement includes homophone keywords is judged through the edit distance algorithm and the calculated pinyin similarities. When a target pinyin similarity exceeds the second threshold, the first calibration statement includes a homophone keyword. The value of the second threshold may be set according to the specific service data and is not limited in this application.
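The edit-distance-based pinyin similarity can be sketched as follows; normalizing the Levenshtein distance by the longer string's length is an illustrative choice, as the patent does not specify how distance maps to similarity:

```python
def edit_distance(a, b):
    """Levenshtein distance, used to spot homophone keywords in pinyin."""
    dp = list(range(len(b) + 1))        # dp[j] = distance(a[:i], b[:j])
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # delete from a
                                     dp[j - 1] + 1,      # insert into a
                                     prev + (ca != cb))  # substitute / match
    return dp[len(b)]

def pinyin_similarity(a, b):
    """Map edit distance into a 0..1 similarity for the second threshold."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - edit_distance(a, b) / longest
```

For example, the pinyin strings "niunai" and "liunai" differ by one substitution, giving a similarity of 5/6, which would exceed a second threshold of, say, 0.8.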
206. Respectively calculating a first intention matching degree of the first calibration statement and a second intention matching degree of the second calibration statement by adopting a similarity algorithm;
the server respectively calculates a first intention matching degree of the first calibration statement and a second intention matching degree of the second calibration statement by adopting a similarity algorithm. Specifically, the method comprises the following steps:
the server firstly extracts a basic standard word in a first calibration statement; secondly, the server calculates a first intention matching degree between the basic standard words and the first calibration sentences by adopting a similarity algorithm, wherein the first intention matching degree is used for indicating a matching value of the preset keywords according with the expression meaning of the first calibration sentences; then the server extracts other standard words in the second calibration statement; and finally, the server calculates second intention matching degrees between the other standard words and the second calibration statement by adopting a similarity algorithm.
After the basic standard words or other standard words are replaced, the server judges whether the replaced sentence conforms to logic, so the server needs to calculate the first intention matching degree of the first calibration statement and the second intention matching degree of the second calibration statement. The first intention matching degree is calculated with a cosine similarity algorithm, using the following formula:
$$\cos(\theta)=\frac{\sum_{i=1}^{n} w_i d_i}{\sqrt{\sum_{i=1}^{n} w_i^{2}}\,\sqrt{\sum_{i=1}^{n} d_i^{2}}}$$
In the formula, cos(θ) represents the first intention matching degree, n represents the number of terms in the summation, i indexes the ith term, w_i denotes the ith preset keyword, and d_i represents the ith first calibration statement. The server first inputs the basic standard words and the first calibration statement into a Word2vec network model and a Doc2vec network model in sequence, and the first intention matching degree is calculated by the cosine similarity algorithm across the two model networks. The method for calculating the second intention matching degree is the same as that for the first, and is therefore not repeated here.
After the first and second intention matching degrees are obtained, the server needs to compare their numerical relationship and check whether the second intention matching degree is higher than the first. The higher the matching degree, the higher the logical accuracy of the statement and the more justified the step of replacing keywords with basic standard words; otherwise, that replacement step was meaningless.
207. And if the second intention matching degree is greater than the first intention matching degree and the numerical value of the second intention matching degree is greater than the matching threshold, determining the second calibration statement as the output statement, otherwise, determining the first calibration statement as the output statement.
And if the second intention matching degree is greater than the first intention matching degree and the numerical value of the second intention matching degree is greater than the matching threshold, the server determines the second calibration statement as the output statement, and otherwise, determines the first calibration statement as the output statement.
After obtaining the first and second intention matching degrees, the server compares them. When the second intention matching degree is greater than the first, the second calibration statement better conforms to the logical relationships of the language. The second intention matching degree must also be greater than the matching threshold, which is the basic threshold for a sentence to conform to those logical relationships; if an intention matching degree is less than or equal to the matching threshold, the sentence fails basic language logic and cannot be identified as a logically clear sentence, in which case the server outputs the sentence without the keyword replacement. That is, when the second intention matching degree is greater than the first intention matching degree and also greater than the matching threshold, the second calibration statement is used as the output statement; otherwise, the first calibration statement is used as the output statement. This also ensures that the output statement restores, as far as possible, the meaning expressed by the target voice while taking the relevance of the target voice's context into account.
In the embodiment of the invention, the target sentence is corrected according to the above sentence of the target voice, and then the output sentence is determined according to the preset threshold, so that the character error recognition rate during voice recognition is reduced, and the conversion efficiency of voice recognition is improved.
With reference to fig. 3, the method for calibrating a speech recognition result in the embodiment of the present invention is described above, and a device for calibrating a speech recognition result in the embodiment of the present invention is described below, where an embodiment of the device for calibrating a speech recognition result in the embodiment of the present invention includes:
the obtaining and converting module 301 is configured to obtain multiple pieces of target voices based on a voice recognition algorithm, and convert the multiple pieces of target voices into characters to obtain multiple pieces of initial sentences;
a screening and replacing module 302, configured to screen a plurality of keywords in a target sentence through a fuzzy matching algorithm, and replace the plurality of keywords with a plurality of basic standard words according to a conversion threshold, so as to obtain a first calibration sentence, where the target sentence is any one of a plurality of initial sentences, and the basic standard words are common words in service data;
a calibration module 303, configured to perform matching calibration on the first calibration statement according to other standard words in the previous statement to obtain a second calibration statement, where the previous statement is a previous statement of the first calibration statement, and the other standard words are common words in the service data except for the basic standard word;
a calculating module 304, configured to calculate a first intention matching degree of the first calibration statement and a second intention matching degree of the second calibration statement respectively by using a similarity algorithm;
the output module 305 is configured to determine the second calibration statement as the output statement if the second intention matching degree is greater than the first intention matching degree and the value of the second intention matching degree is greater than the matching threshold, otherwise, determine the first calibration statement as the output statement.
In the embodiment of the invention, the target sentence is corrected according to the above sentence of the target voice, and then the output sentence is determined according to the preset threshold, so that the character error recognition rate during voice recognition is reduced, and the conversion efficiency of voice recognition is improved.
Referring to fig. 4, another embodiment of the calibration apparatus for speech recognition results according to the embodiment of the present invention includes:
the obtaining and converting module 301 is configured to obtain multiple pieces of target voices based on a voice recognition algorithm, and convert the multiple pieces of target voices into characters to obtain multiple pieces of initial sentences;
a screening and replacing module 302, configured to screen a plurality of keywords in a target sentence through a fuzzy matching algorithm, and replace the plurality of keywords with a plurality of basic standard words according to a conversion threshold, so as to obtain a first calibration sentence, where the target sentence is any one of a plurality of initial sentences, and the basic standard words are common words in service data;
a calibration module 303, configured to perform matching calibration on the first calibration statement according to other standard words in the previous statement to obtain a second calibration statement, where the previous statement is a previous statement of the first calibration statement, and the other standard words are common words in the service data except for the basic standard word;
a calculating module 304, configured to calculate a first intention matching degree of the first calibration statement and a second intention matching degree of the second calibration statement respectively by using a similarity algorithm;
the output module 305 is configured to determine the second calibration statement as the output statement if the second intention matching degree is greater than the first intention matching degree and the value of the second intention matching degree is greater than the matching threshold, otherwise, determine the first calibration statement as the output statement.
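The selection rule implemented by the output module 305 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and the concrete threshold value are assumptions for the example, since the patent does not fix a numeric matching threshold.

```python
# Hypothetical sketch of the output-selection rule in module 305: prefer the
# second calibration sentence only when its intention matching degree beats
# both the first sentence's score and a fixed matching threshold.

MATCHING_THRESHOLD = 0.7  # assumed value; the patent leaves it unspecified


def select_output(first_sentence: str, second_sentence: str,
                  first_score: float, second_score: float,
                  threshold: float = MATCHING_THRESHOLD) -> str:
    """Return the sentence to output according to the two intention scores."""
    if second_score > first_score and second_score > threshold:
        return second_sentence
    return first_sentence
```

For example, with scores 0.5 and 0.9 the second calibration sentence is emitted, while a second score of 0.6 (above the first score but below the threshold) falls back to the first calibration sentence.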
Optionally, the obtaining and converting module 301 includes:
an extracting unit 3011, configured to obtain multiple pieces of target speech based on a speech recognition algorithm, and extract speech features in the multiple pieces of target speech;
a converting unit 3012, configured to convert the speech features into phoneme information through a preset acoustic model, where the phoneme information is used to indicate a minimum speech unit constituting a syllable;
a matching unit 3013, configured to match the corresponding text information with the phoneme information to obtain multiple initial sentences.
Optionally, the matching unit 3013 may be further specifically configured to:
matching character information corresponding to the phoneme information in a preset dictionary, wherein the character information comprises single characters or words;
acquiring the association probability of the character information from preset association probabilities, and extracting the character information with the maximum association probability as the target characters, wherein the preset association probabilities indicate the probability that any two single characters or words are associated with each other;
and combining the target characters together according to the arrangement sequence to obtain a plurality of initial sentences, wherein the number of the plurality of initial sentences is the same as that of the plurality of target voices.
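The matching unit's dictionary-plus-association-probability step can be illustrated with a toy greedy decoder. All candidate words and probabilities below are invented for illustration; a production recognizer would typically use a full language model with beam or Viterbi search rather than this greedy left-to-right pass.

```python
# Toy illustration of matching unit 3013: each phoneme group yields a list of
# candidate words from a dictionary, and for each position the candidate with
# the highest association (bigram) probability with the previously chosen
# word is kept, then the choices are joined in order.
from typing import Dict, List, Tuple


def decode_candidates(candidate_lists: List[List[str]],
                      bigram_prob: Dict[Tuple[str, str], float]) -> List[str]:
    """Greedily pick, per position, the candidate most associated with the
    previously chosen word; the first position takes its first candidate."""
    if not candidate_lists:
        return []
    chosen = [candidate_lists[0][0]]
    for candidates in candidate_lists[1:]:
        prev = chosen[-1]
        # Unseen word pairs get probability 0.0.
        best = max(candidates, key=lambda w: bigram_prob.get((prev, w), 0.0))
        chosen.append(best)
    return chosen
```

With candidates `[["speech"], ["wreck", "recognition"]]` and a table giving `("speech", "recognition")` the higher probability, the decoder emits `["speech", "recognition"]`.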
Optionally, the screening and replacing module 302 may be further specifically configured to:
converting the target sentence into a pinyin sentence through a fuzzy matching algorithm;
screening out target phonetic symbols in the pinyin sentence, and converting the target phonetic symbols into near phonetic symbols to obtain a converted pinyin sentence, wherein the target phonetic symbols comprise easily confused finals (simple or compound vowels) and/or initials;
extracting a plurality of keywords with near phonetic symbols in the converted pinyin sentences, and calculating the similarity between the keywords and corresponding basic standard words, wherein the basic standard words are common words in the service data;
and when the numerical value of the target similarity is larger than the replacement threshold, replacing the keywords corresponding to the target similarity with the corresponding basic standard words to obtain a first calibration statement.
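A hedged sketch of this screening-and-replacing step, assuming a hand-made table of easily confused initials (e.g. n/l, zh/z) and `difflib` string similarity as a stand-in for the patent's unspecified similarity measure; the confusion pairs and threshold are assumptions for the example:

```python
# Sketch of module 302: normalize easily confused pinyin initials to a shared
# "near phonetic symbol", then replace a keyword with the basic standard word
# when the normalized pinyin similarity exceeds the replacement threshold.
from difflib import SequenceMatcher

# Assumed confusion pairs; a real system would tune this table to its users.
CONFUSED_PHONEMES = {"n": "l", "zh": "z", "sh": "s"}


def normalize_pinyin(syllable: str) -> str:
    """Map a confusable initial to its canonical near phonetic symbol."""
    for src in sorted(CONFUSED_PHONEMES, key=len, reverse=True):
        if syllable.startswith(src):
            return CONFUSED_PHONEMES[src] + syllable[len(src):]
    return syllable


def replace_keyword(keyword_pinyin: str, standard_pinyin: str,
                    standard_word: str, keyword: str,
                    threshold: float = 0.8) -> str:
    """Return the standard word when the normalized pinyin forms are
    similar enough, otherwise keep the original keyword."""
    a = normalize_pinyin(keyword_pinyin)
    b = normalize_pinyin(standard_pinyin)
    similarity = SequenceMatcher(None, a, b).ratio()
    return standard_word if similarity > threshold else keyword
```

For instance, `nan` normalizes to `lan`, so a keyword pronounced `nan` can be replaced by a standard word pronounced `lan`, while unrelated syllables are left untouched.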
Optionally, the calibration module 303 includes:
a first determining unit 3031, configured to determine whether the above statement of the first calibration statement includes other standard words;
a second determining unit 3032, configured to determine whether the first calibration sentence includes keywords with similar properties corresponding to other standard words if the sentence includes other standard words, where the keywords with similar properties include a near-meaning keyword and a homophonic keyword;
a replacing unit 3033, configured to, if the first calibration sentence includes the keywords with similar properties, replace the keywords with similar properties with the corresponding other standard words to obtain a second calibration sentence.
Optionally, the second determining unit 3032 may be further specifically configured to:
if the above sentence includes other standard words, calculating a plurality of intention similarities between the other standard words and the first calibration sentence;
judging whether the first calibration statement comprises a near sense keyword or not based on a first preset algorithm and the target intention similarity;
if the first calibration statement does not include the near-meaning key words, converting the first calibration statement into a calibration pinyin statement, and calculating a plurality of pinyin similarities between the calibration pinyin statement and pinyins of other standard words;
and judging, based on a second preset algorithm and the target pinyin similarity, whether the calibrated pinyin sentence includes homophonic keywords.
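The homophone check in the second judging unit can be sketched as a tone-insensitive pinyin comparison. The digit-suffix tone notation (e.g. `shi4`) is an assumption for the example; a real system would obtain pinyin from a converter library, which is not used here.

```python
# Minimal sketch: a keyword counts as a "homophonic keyword" of a standard
# word when their pinyin renderings match after stripping tone digits.

def strip_tones(pinyin: str) -> str:
    """Remove digit tone marks (e.g. 'shi4' -> 'shi')."""
    return "".join(ch for ch in pinyin if not ch.isdigit())


def is_homophone(keyword_pinyin: str, standard_pinyin: str) -> bool:
    """Compare tone-stripped pinyin strings for equality."""
    return strip_tones(keyword_pinyin) == strip_tones(standard_pinyin)
```

So `shi4` and `shi1` are treated as homophones, while `shi4` and `si4` are not (a system that also wants to catch sh/s confusion would normalize initials first, as in the earlier step).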
Optionally, the calculating module 304 may be further specifically configured to:
extracting a basic standard word in the first calibration statement;
calculating a first intention matching degree between the basic standard words and the first calibration sentence by adopting a similarity algorithm, wherein the first intention matching degree indicates how well the preset keywords match the expressed meaning of the first calibration sentence;
extracting other standard words in the second calibration statement;
and calculating a second intention matching degree between the other standard words and the second calibration sentences by adopting a similarity algorithm.
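Since the patent names only "a similarity algorithm" for the intention matching degree, a simple Jaccard overlap between character sets can stand in for it in a sketch; this is an assumed measure, not the patented one:

```python
# Hedged stand-in for module 304's similarity algorithm: the intention
# matching degree between a standard word and a calibration sentence is
# computed as Jaccard similarity over their character sets.

def intention_matching_degree(standard_word: str, sentence: str) -> float:
    """Jaccard similarity between the character sets of the two strings."""
    a, b = set(standard_word), set(sentence)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

The same function would be applied twice, once to the basic standard word against the first calibration sentence and once to the other standard word against the second calibration sentence, before the two scores are compared by the output module.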
In the embodiment of the invention, the target sentence is calibrated according to the preceding sentence of the target speech, and the output sentence is then determined according to a preset threshold, so that the character misrecognition rate in speech recognition is reduced and the efficiency of speech-to-text conversion is improved.
Fig. 3 and fig. 4 describe the calibration apparatus for the speech recognition result in the embodiment of the present invention in detail from the perspective of modular functional entities; the following describes the calibration apparatus for the speech recognition result in the embodiment of the present invention in detail from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of a calibration apparatus 500 for speech recognition results according to an embodiment of the present invention. The calibration apparatus 500 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 510, a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing applications 533 or data 532. The memory 520 and the storage media 530 may be transient or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the calibration apparatus 500. Further, the processor 510 may be configured to communicate with the storage medium 530 and to execute, on the calibration apparatus 500, the series of instruction operations stored in the storage medium 530.
The calibration apparatus 500 for speech recognition results may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input-output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the configuration shown in fig. 5 does not constitute a limitation of the calibration apparatus for speech recognition results, which may include more or fewer components than those shown, combine some components, or arrange the components differently.
The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, and which may also be a volatile computer-readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the calibration method for speech recognition results.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for calibrating a speech recognition result, the method comprising:
acquiring a plurality of target voices based on a voice recognition algorithm, and converting the target voices into characters to obtain a plurality of initial sentences;
screening a plurality of keywords in a target sentence through a fuzzy matching algorithm, and replacing the plurality of keywords with a plurality of basic standard words according to a conversion threshold value to obtain a first calibration sentence, wherein the target sentence is any one of the plurality of initial sentences, and the basic standard words are common words in service data;
performing matching calibration on the first calibration statement according to other standard words in the previous statement to obtain a second calibration statement, wherein the previous statement is a previous statement of the first calibration statement, and the other standard words are common words except the basic standard words in the service data;
respectively calculating a first intention matching degree of the first calibration statement and a second intention matching degree of the second calibration statement by adopting a similarity algorithm;
and if the second intention matching degree is greater than the first intention matching degree and the numerical value of the second intention matching degree is greater than a matching threshold, determining the second calibration statement as an output statement, otherwise, determining the first calibration statement as an output statement.
2. The method of calibrating speech recognition results of claim 1, wherein the obtaining a plurality of target speeches based on a speech recognition algorithm and converting the plurality of target speeches into text to obtain a plurality of initial sentences comprises:
acquiring a plurality of target voices based on a voice recognition algorithm, and extracting voice features in the target voices;
converting the voice features into phoneme information through a preset acoustic model, wherein the phoneme information is used for indicating the minimum voice unit forming the syllable;
and matching the corresponding text information by using the phoneme information to obtain a plurality of initial sentences.
3. The method of claim 2, wherein the obtaining a plurality of initial sentences by matching the phoneme information with the corresponding text information comprises:
matching character information corresponding to the phoneme information in a preset dictionary, wherein the character information comprises a single character or a word;
acquiring the association probability of the character information from preset association probabilities, and extracting the character information with the maximum association probability as a target character, wherein the preset association probability is used for indicating the probability of mutual association between any two single characters or words;
and combining the target characters together according to the arrangement sequence to obtain a plurality of initial sentences, wherein the number of the plurality of initial sentences is the same as that of the plurality of target voices.
4. The method of calibrating speech recognition results according to claim 1, wherein the screening a plurality of keywords in a target sentence by a fuzzy matching algorithm, and replacing the plurality of keywords with a plurality of basic standard words according to a conversion threshold to obtain a first calibration sentence comprises:
converting the target sentence into a pinyin sentence through a fuzzy matching algorithm;
screening out a target phonetic symbol in the pinyin sentence, and converting the target phonetic symbol into a near phonetic symbol to obtain a converted pinyin sentence, wherein the target phonetic symbol comprises an easily confused final (a simple or compound vowel) and/or an easily confused initial;
extracting a plurality of keywords with near phonetic symbols in the converted pinyin sentences, and calculating the similarity between the keywords and corresponding basic standard words, wherein the basic standard words are common words in service data;
and when the numerical value of the target similarity is larger than the replacement threshold, replacing the keywords corresponding to the target similarity with the corresponding basic standard words to obtain a first calibration statement.
5. The method of calibrating speech recognition results according to claim 1, wherein the performing matching calibration on the first calibration sentence according to other standard words in the above sentence to obtain the second calibration sentence comprises:
judging whether the previous sentence of the first calibration sentence comprises other standard words or not;
if the above sentence comprises the other standard words, judging whether the first calibration sentence comprises keywords with similar properties corresponding to the other standard words, wherein the keywords with similar properties comprise similar meaning keywords and homophonic keywords;
and if the first calibration statement comprises the keywords with similar properties, replacing the keywords with similar properties with other corresponding standard words to obtain a second calibration statement.
6. The method of claim 5, wherein if the above sentence includes the other standard words, the determining whether the first calibration sentence includes keywords having similar properties to the other standard words comprises:
if the above sentence comprises the other standard words, calculating a plurality of intention similarities between the other standard words and the first calibration sentence;
judging whether the first calibration statement comprises a near sense keyword or not based on a first preset algorithm and the target intention similarity;
if the first calibration statement does not include the near-meaning keyword, converting the first calibration statement into a calibration pinyin statement, and calculating a plurality of pinyin similarities between the calibration pinyin statement and the pinyins of the other standard words;
and judging whether the calibrated pinyin sentence comprises homophonic keywords or not based on a second preset algorithm and the target pinyin similarity.
7. The method of calibrating speech recognition results according to claim 6, wherein said calculating a first degree of matching of intention of the first calibration sentence and a second degree of matching of intention of the second calibration sentence using a similarity algorithm comprises:
extracting a basic standard word in the first calibration statement;
calculating a first intention matching degree between the basic standard words and the first calibration sentence by adopting a similarity algorithm, wherein the first intention matching degree is used for indicating how well the preset keywords accord with the expressed meaning of the first calibration sentence;
extracting other standard words in the second calibration statement;
and calculating a second intention matching degree between the other standard words and the second calibration sentences by adopting the similarity algorithm.
8. A device for calibrating a result of speech recognition, the device comprising:
the acquisition and conversion module is used for acquiring a plurality of target voices based on a voice recognition algorithm and converting the target voices into characters to obtain a plurality of initial sentences;
the system comprises a screening and replacing module, a first calibration statement and a second calibration statement, wherein the screening and replacing module is used for screening a plurality of keywords in a target statement through a fuzzy matching algorithm and replacing the keywords with a plurality of basic standard words according to a conversion threshold value to obtain the first calibration statement, the target statement is any one of the initial statements, and the basic standard words are common words in service data;
a calibration module, configured to perform matching calibration on the first calibration statement according to other standard words in an upper statement to obtain a second calibration statement, where the upper statement is a previous statement of the first calibration statement, and the other standard words are common words in the service data except for the basic standard word;
a calculation module, configured to calculate a first intention matching degree of the first calibration statement and a second intention matching degree of the second calibration statement respectively by using a similarity algorithm;
and the output module is used for determining the second calibration statement as an output statement if the second intention matching degree is greater than the first intention matching degree and the numerical value of the second intention matching degree is greater than a matching threshold, otherwise, determining the first calibration statement as an output statement.
9. A calibration apparatus for a speech recognition result, characterized by comprising: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;
the at least one processor invokes the instructions in the memory to cause the calibration device of speech recognition results to perform the calibration method of speech recognition results according to any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method of calibrating speech recognition results according to any one of claims 1 to 7.
CN202010581203.XA 2020-06-23 2020-06-23 Method, device and equipment for calibrating voice recognition result and storage medium Pending CN111696557A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010581203.XA CN111696557A (en) 2020-06-23 2020-06-23 Method, device and equipment for calibrating voice recognition result and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010581203.XA CN111696557A (en) 2020-06-23 2020-06-23 Method, device and equipment for calibrating voice recognition result and storage medium

Publications (1)

Publication Number Publication Date
CN111696557A 2020-09-22

Family

ID=72483500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010581203.XA Pending CN111696557A (en) 2020-06-23 2020-06-23 Method, device and equipment for calibrating voice recognition result and storage medium

Country Status (1)

Country Link
CN (1) CN111696557A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112114926A (en) * 2020-09-25 2020-12-22 北京百度网讯科技有限公司 Page operation method, device, equipment and medium based on voice recognition
CN112151014A (en) * 2020-11-04 2020-12-29 平安科技(深圳)有限公司 Method, device and equipment for evaluating voice recognition result and storage medium
CN112417102A (en) * 2020-11-26 2021-02-26 中国科学院自动化研究所 Voice query method, device, server and readable storage medium
CN112435512A (en) * 2020-11-12 2021-03-02 郑州大学 Voice behavior assessment and evaluation method for rail transit simulation training
CN112541342A (en) * 2020-12-08 2021-03-23 北京百度网讯科技有限公司 Text error correction method and device, electronic equipment and storage medium
CN112562684A (en) * 2020-12-08 2021-03-26 维沃移动通信有限公司 Voice recognition method and device and electronic equipment
CN112634903A (en) * 2020-12-15 2021-04-09 平安科技(深圳)有限公司 Quality inspection method, device, equipment and storage medium of service voice
CN112836039A (en) * 2021-01-27 2021-05-25 成都网安科技发展有限公司 Voice data processing method and device based on deep learning
CN113360623A (en) * 2021-06-25 2021-09-07 达闼机器人有限公司 Text matching method, electronic device and readable storage medium
CN113408274A (en) * 2021-07-13 2021-09-17 北京百度网讯科技有限公司 Method for training language model and label setting method
CN114328389A (en) * 2021-12-31 2022-04-12 浙江汇鼎华链科技有限公司 Big data file analysis processing system and method under cloud computing environment
CN114724544A (en) * 2022-04-13 2022-07-08 北京百度网讯科技有限公司 Voice chip, voice recognition method, device and equipment and intelligent automobile
CN115797878A (en) * 2023-02-13 2023-03-14 中建科技集团有限公司 Equipment operation safety detection method and system based on image processing and related equipment
CN116578675A (en) * 2023-07-11 2023-08-11 北京中关村科金技术有限公司 Statement intention correction method and device, electronic equipment and storage medium
CN117729677A (en) * 2023-12-20 2024-03-19 广州市安贝电子有限公司 Stage lamp calibration system, method, equipment and medium


Similar Documents

Publication Publication Date Title
CN111696557A (en) Method, device and equipment for calibrating voice recognition result and storage medium
CN109255113B (en) Intelligent proofreading system
US6910012B2 (en) Method and system for speech recognition using phonetically similar word alternatives
US7496512B2 (en) Refining of segmental boundaries in speech waveforms using contextual-dependent models
US6836760B1 (en) Use of semantic inference and context-free grammar with speech recognition system
KR100277694B1 (en) Automatic Pronunciation Dictionary Generation in Speech Recognition System
CN113707125B (en) Training method and device for multi-language speech synthesis model
Qian et al. Capturing L2 segmental mispronunciations with joint-sequence models in computer-aided pronunciation training (CAPT)
CN112151014A (en) Method, device and equipment for evaluating voice recognition result and storage medium
KR101424193B1 (en) System And Method of Pronunciation Variation Modeling Based on Indirect data-driven method for Foreign Speech Recognition
US8219386B2 (en) Arabic poetry meter identification system and method
CN113744722A (en) Off-line speech recognition matching device and method for limited sentence library
CN110853669B (en) Audio identification method, device and equipment
US20080120108A1 (en) Multi-space distribution for pattern recognition based on mixed continuous and discrete observations
CN115240655A (en) Chinese voice recognition system and method based on deep learning
Deng et al. Transitional speech units and their representation by regressive Markov states: Applications to speech recognition
CN115312030A (en) Display control method and device of virtual role and electronic equipment
CN106297769A (en) A kind of distinctive feature extracting method being applied to languages identification
Azim et al. Large vocabulary Arabic continuous speech recognition using tied states acoustic models
Carofilis et al. Improvement of accent classification models through Grad-Transfer from Spectrograms and Gradient-weighted Class Activation Mapping
Mehra et al. Improving word recognition in speech transcriptions by decision-level fusion of stemming and two-way phoneme pruning
Biswas et al. Spoken language identification of Indian languages using MFCC features
Wang et al. A multi-space distribution (MSD) approach to speech recognition of tonal languages
US10783873B1 (en) Native language identification with time delay deep neural networks trained separately on native and non-native english corpora
Ijima et al. Prosody Aware Word-Level Encoder Based on BLSTM-RNNs for DNN-Based Speech Synthesis.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination