WO2022126969A1 - Quality inspection method, apparatus, device and storage medium for service voice - Google Patents

Quality inspection method, apparatus, device and storage medium for service voice

Info

Publication number
WO2022126969A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice data
text
detected
service
declaration
Prior art date
Application number
PCT/CN2021/090410
Other languages
English (en)
French (fr)
Inventor
石英伦
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022126969A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/26 - Speech to text systems
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/08 - Speech classification or search
    • G10L 15/16 - Speech classification or search using artificial neural networks
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis

Definitions

  • the present application relates to the field of speech and semantics in artificial intelligence, and in particular to a quality inspection method, apparatus, device, and storage medium for service voice.
  • telephone service voice quality inspection is mainly used to check whether phenomena such as inducement, abuse of customers, or evasion of required disclaimers occur while telephone service personnel communicate, so as to avoid customer complaints or legal risks caused by non-compliant voice behavior during the communication process.
  • the existing quality inspection method for telephone service voice mainly monitors the telephone voice and judges whether any non-compliant (risk) voice exists in the monitored content; if non-compliant voice exists, the telephone service voice is judged non-compliant. However, the service voice that needs to be inspected is large in volume and miscellaneous, so the accuracy and efficiency of quality inspection with this method are low.
  • the present application provides a quality inspection method, device, equipment and storage medium for service voice, which are used to improve the accuracy and efficiency of quality inspection of service voice.
  • a first aspect of the present application provides a service voice quality inspection method, comprising: acquiring service voice data, and encoding the service voice data with an encoder to obtain encoded voice data; calculating the basic similarity between the encoded voice data and preset declaration encoded data, and screening the declaration voice encoded data to be detected from the encoded voice data according to the value of the basic similarity; converting the declaration voice encoded data to be detected into declaration text to be detected based on a speech recognition algorithm, and using the bert network model to generate multiple declaration sentence vectors to be detected for the declaration text to be detected; calculating the basic similarity probability value between each declaration sentence vector to be detected and a standard declaration sentence vector, and determining the declaration text to be detected whose basic similarity probability is greater than a standard threshold as the target text to be corrected; correcting the target text to be corrected with a fuzzy matching algorithm to obtain the text to be determined, screening the text to be determined for preset keywords, and, if a preset keyword exists in the text to be determined, determining that the service voice data contains declaration-type risk voice data; generating the service recognition intent corresponding to the service voice data through an intent recognition algorithm, judging whether a preset risk intent exists in the service recognition intent, and, if so, determining that the service voice data contains semantic risk voice data.
  • a second aspect of the present application provides a service voice quality inspection device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions: acquiring service voice data, and encoding the service voice data with an encoder to obtain encoded voice data; calculating the basic similarity between the encoded voice data and preset declaration encoded data, and screening the declaration voice encoded data to be detected from the encoded voice data according to the value of the basic similarity; converting the declaration voice encoded data to be detected into declaration text to be detected based on a speech recognition algorithm, and using the bert network model to generate multiple declaration sentence vectors to be detected for the declaration text to be detected; calculating the basic similarity probability value between each declaration sentence vector to be detected and the standard declaration sentence vector, and determining the declaration text to be detected whose basic similarity probability is greater than the standard threshold as the target text to be corrected; correcting the target text to be corrected with a fuzzy matching algorithm to obtain the text to be determined, screening the text to be determined for preset keywords, and, if a preset keyword exists in the text to be determined, determining that the service voice data contains declaration-type risk voice data; generating the service recognition intent corresponding to the service voice data through an intent recognition algorithm, judging whether a preset risk intent exists in the service recognition intent, and, if so, determining that the service voice data contains semantic risk voice data.
  • a third aspect of the present application provides a computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to perform the following steps: acquiring service voice data, and encoding the service voice data with an encoder to obtain encoded voice data; calculating the basic similarity between the encoded voice data and preset declaration encoded data, and screening the declaration voice encoded data to be detected from the encoded voice data according to the value of the basic similarity; converting the declaration voice encoded data to be detected into declaration text to be detected based on a speech recognition algorithm, and using the bert network model to generate multiple declaration sentence vectors to be detected for the declaration text to be detected; calculating the basic similarity probability value between each declaration sentence vector to be detected and the standard declaration sentence vector, and determining the declaration text to be detected whose basic similarity probability is greater than the standard threshold as the target text to be corrected; correcting the target text to be corrected with a fuzzy matching algorithm to obtain the text to be determined, screening the text to be determined for preset keywords, and, if a preset keyword exists in the text to be determined, determining that the service voice data contains declaration-type risk voice data; generating the service recognition intent corresponding to the service voice data through an intent recognition algorithm, judging whether a preset risk intent exists in the service recognition intent, and, if so, determining that the service voice data contains semantic risk voice data.
  • a fourth aspect of the present application provides a service voice quality inspection apparatus, comprising: a screening module, configured to acquire service voice data, encode the service voice data with an encoder to obtain encoded voice data, calculate the basic similarity between the encoded voice data and preset declaration encoded data, and screen the declaration voice encoded data to be detected from the encoded voice data according to the value of the basic similarity; a conversion module, configured to convert the declaration voice encoded data to be detected into declaration text to be detected based on a speech recognition algorithm, and use the bert network model to generate multiple declaration sentence vectors to be detected for the declaration text to be detected; a determination module, configured to calculate the basic similarity probability value between each declaration sentence vector to be detected and the standard declaration sentence vector, and determine the declaration text to be detected whose basic similarity probability is greater than the standard threshold as the target text to be corrected; a first determination module, configured to correct the target text to be corrected with a fuzzy matching algorithm to obtain the text to be determined, screen the text to be determined for preset keywords, and, if a preset keyword exists in the text to be determined, determine that the service voice data contains declaration-type risk voice data; and a second determination module, configured to generate the service recognition intent corresponding to the service voice data through an intent recognition algorithm, judge whether a preset risk intent exists in the service recognition intent, and, if so, determine that the service voice data contains semantic risk voice data.
  • in the technical solution provided by the present application, service voice data is acquired and encoded with an encoder to obtain encoded voice data; the basic similarity between the encoded voice data and the preset declaration encoded data is calculated, and the declaration voice encoded data to be detected is screened from the encoded voice data according to the value of the basic similarity; the declaration voice encoded data to be detected is converted into declaration text to be detected based on a speech recognition algorithm, and the bert network model is used to generate multiple declaration sentence vectors to be detected for the declaration text to be detected; the basic similarity probability value between each declaration sentence vector to be detected and the standard declaration sentence vector is calculated, and the declaration text to be detected whose basic similarity probability is greater than the standard threshold is determined as the target text to be corrected; the target text to be corrected is corrected with a fuzzy matching algorithm to obtain the text to be determined, the text to be determined is screened for preset keywords, and, if a preset keyword exists, it is determined that the service voice data contains declaration-type risk voice data; the service recognition intent corresponding to the service voice data is generated through an intent recognition algorithm, and, if a preset risk intent exists in the service recognition intent, it is determined that the service voice data contains semantic risk voice data. In the embodiments of the present application, the declaration voice encoded data to be detected is screened out after the service voice data is encoded; the speech recognition algorithm and the bert network model are used to generate the declaration sentence vectors to be detected corresponding to the declaration voice encoded data to be detected; the fuzzy matching algorithm is used to correct the text of the declaration sentences to be detected, and declaration-risk judgment is performed on the corrected text; then the service recognition intent corresponding to the service voice data is generated through the intent recognition algorithm and semantic-risk judgment is performed on it, finally yielding the quality inspection result for the service voice data, which improves the accuracy and efficiency of the quality inspection of service voice.
  • FIG. 1 is a schematic diagram of an embodiment of a quality inspection method for service voice in an embodiment of the application
  • FIG. 2 is a schematic diagram of another embodiment of the quality inspection method for service voice in the embodiment of the present application.
  • FIG. 3 is a schematic diagram of an embodiment of a quality inspection apparatus for service voice in an embodiment of the present application
  • FIG. 4 is a schematic diagram of another embodiment of the apparatus for quality inspection of service voice in the embodiment of the present application.
  • FIG. 5 is a schematic diagram of an embodiment of a quality inspection device for service voice in an embodiment of the present application.
  • Embodiments of the present application provide a quality inspection method, device, device, and storage medium for service voice, which are used to improve the accuracy and efficiency of quality inspection of service voice.
  • an embodiment of the quality inspection method of the service voice in the embodiment of the present application includes:
  • the execution subject of the present application may be a quality inspection device for service voice, and may also be a terminal or a server, which is not specifically limited here.
  • the embodiments of the present application take the server as an execution subject as an example for description.
  • the main quality inspection contents of the voice quality inspection of telephone services are divided into voice quality inspection of declaration services and voice quality inspection of semantic services.
  • the quality inspection points for declaration-type service voice can be, for example: when a user signs or agrees to an agreement, the terms of the agreement, the corresponding fees, and the follow-up operation plan must be made clear. The quality inspection points for declaration-type service voice usually have standard wording and keywords that have been confirmed by the legal-compliance department, and the service voice data must contain the corresponding standard wording and keywords to prove that it is not declaration-type risk voice data.
  • the quality inspection points for semantic-type service voice have no standard wording and may be expressed in many ways, for example inducing users to fill in false information, misleading users about the correct process, or misreporting identity information; if voice carrying such an intent exists in the service voice data, the service voice data is semantic risk voice data.
  • when processing the service voice data, the server first needs to encode the service voice data with the encoder to obtain the encoded voice data; this step converts the service voice data into a digital encoding so that the computer can process the encoded voice data directly. The server then calculates the basic similarity between the encoded voice data and the preset declaration encoded data using a similarity algorithm. The preset declaration encoded data refers to the encoded data of the voice corresponding to the agreement terms, corresponding fees, follow-up operation plan, and so on that need to be made clear. The higher the value of the basic similarity between the two, the more similar they are, which proves that the service voice data contains the agreement terms, corresponding fees, follow-up operation plan, and so on that need to be made clear.
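  • As a rough illustration of this screening step, the sketch below assumes each encoded voice segment and each preset declaration encoding is already a fixed-length numeric vector, and uses cosine similarity to stand in for the unspecified similarity algorithm; all names and the choice of measure are illustrative, not taken from the patent.

```python
import numpy as np

def basic_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity standing in for the patent's unspecified similarity algorithm."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def screen_declaration_segments(encoded_segments, preset_declaration_codes):
    """Return the index and score of the encoded segment most similar to any preset
    declaration encoding; that segment becomes the declaration voice data to be detected."""
    best_idx, best_sim = None, -1.0
    for i, segment in enumerate(encoded_segments):
        sim = max(basic_similarity(segment, code) for code in preset_declaration_codes)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    return best_idx, best_sim
```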
  • the above-mentioned service voice data may also be stored in a node of a blockchain.
  • the server uses a speech recognition algorithm to convert the speech coded data of the statement to be detected into the text of the statement to be detected, that is, converts the voice data into text data, and can further detect the text of the statement to be detected.
  • the speech recognition algorithm is a conventional technical means in the technical field, so it is not repeated here.
  • the server uses the bert network model to generate multiple statement sentence vectors to be detected for the statement text to be detected.
  • BERT (Bidirectional Encoder Representations from Transformers) is a method of pre-training language representations and can serve as an alternative to Word2Vec; during pre-training, text can be converted into corresponding sentence vectors. In this technical solution, the bert network model is used to generate multiple declaration sentence vectors to be detected for the declaration text to be detected.
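  • A minimal sketch of this step, assuming the Hugging Face transformers library and the public bert-base-chinese checkpoint (the patent only says "bert network model" and names no checkpoint); using the final-layer [CLS] hidden state as the sentence vector is likewise an assumption.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
model = BertModel.from_pretrained("bert-base-chinese")
model.eval()

def declaration_sentence_vectors(sentences):
    """Turn each declaration sentence to be detected into a sentence vector
    (here: the final-layer [CLS] hidden state)."""
    vectors = []
    with torch.no_grad():
        for sentence in sentences:
            inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=128)
            outputs = model(**inputs)
            vectors.append(outputs.last_hidden_state[:, 0, :].squeeze(0))
    return vectors
```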
  • the server needs to further calculate the basic similarity probability value between each declarative sentence vector to be detected and the standard declarative sentence vector, and determine the declarative text to be detected whose basic similarity probability is greater than the standard threshold as the target text to be corrected.
  • since there are certain errors in the speech-to-text conversion process, the converted text may contain grammatical or wording errors, for example transcribing "同意" (agree) as "朋友" (friend) or "利息" (interest) as "李西" (Li Xi). Transcription errors affect the accuracy of the model, so the server needs to further determine which text data in the declaration text to be detected requires the next operation, thereby obtaining the target text to be corrected.
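  • The comparison against the standard declaration sentence vector could then look like the sketch below; mapping cosine similarity into [0, 1] as the "basic similarity probability" and the 0.85 threshold are illustrative assumptions, since the patent fixes neither.

```python
import torch
import torch.nn.functional as F

STANDARD_THRESHOLD = 0.85  # illustrative; the patent does not specify a value

def select_target_texts(sentence_vectors, texts, standard_vector):
    """Keep the declaration texts whose similarity probability to the standard
    declaration sentence vector exceeds the standard threshold."""
    targets = []
    for vector, text in zip(sentence_vectors, texts):
        probability = (F.cosine_similarity(vector, standard_vector, dim=0).item() + 1.0) / 2.0
        if probability > STANDARD_THRESHOLD:
            targets.append(text)  # this text becomes a target text to be corrected
    return targets
```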
  • after determining the target text to be corrected, the server can use the fuzzy matching algorithm to correct it. The principle of the fuzzy matching algorithm is to convert the target text to be corrected into its corresponding pinyin, screen out the target phonetic symbols that are easily confused, and convert them into similar phonetic symbols; in this way several sentences that sound close to the target text to be corrected are obtained, i.e. several possible readings of the sentence, and the most standard one (the one matching a standard text in the preset dictionary) is then chosen from these candidates, giving the corrected text to be determined.
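  • A toy sketch of this pinyin-based error correction, assuming the third-party pypinyin package for the Hanzi-to-pinyin step; the confusion pairs come from the n/l, an/ang, c/ch examples given later in this description, while the preset dictionary entries here are purely illustrative.

```python
from itertools import product
from pypinyin import lazy_pinyin  # assumed third-party dependency (Hanzi -> pinyin)

# Confusable sounds named later in the description: n/l, an/ang, c/ch.
SWAPS = [("ang", "an"), ("an", "ang"), ("ch", "c"), ("c", "ch"), ("n", "l"), ("l", "n")]
PRESET_DICTIONARY = {"li xi": "利息"}  # illustrative standard text ("interest")

def syllable_variants(syllable):
    """The syllable itself plus variants with one confusable sound swapped."""
    variants = {syllable}
    for a, b in SWAPS:
        if a in syllable:
            variants.add(syllable.replace(a, b, 1))
    return sorted(variants)

def correct(target_text):
    """Expand the target text into near-sounding pinyin sentences and return the
    first one that matches a standard text in the preset dictionary."""
    options = [syllable_variants(s) for s in lazy_pinyin(target_text)]
    for candidate in product(*options):
        key = " ".join(candidate)
        if key in PRESET_DICTIONARY:
            return PRESET_DICTIONARY[key]
    return target_text  # no confident correction found

# e.g. correct("李西") -> "利息", since lazy_pinyin("李西") == ["li", "xi"]
```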
  • after obtaining the corrected text to be determined, the server directly screens the text to be determined for preset keywords; if a preset keyword exists in the text to be determined, it is determined that the service voice data contains declaration-type risk voice data.
  • 105. Generate the service recognition intent corresponding to the service voice data through an intent recognition algorithm, and judge whether a preset risk intent exists in the service recognition intent; if a preset risk intent exists in the service recognition intent, determine that the service voice data contains semantic risk voice data.
  • when the server performs semantic quality inspection on the service voice data, it needs to generate the service recognition intent corresponding to the service voice data through the intent recognition algorithm, and then judge whether a preset risk intent exists in the service recognition intent; if a preset risk intent exists, it means that risk voice data exists in the service voice data, and it is directly determined that the service voice data contains semantic risk voice data.
  • in the embodiment of the present application, the declaration voice encoded data to be detected is screened out after the service voice data is encoded; the speech recognition algorithm and the bert network model are used to generate the declaration sentence vectors to be detected corresponding to the declaration voice encoded data to be detected; the fuzzy matching algorithm is used to correct the text of the declaration sentences to be detected, and declaration-risk judgment is performed on the corrected text; then the service recognition intent corresponding to the service voice data is generated through the intent recognition algorithm and semantic-risk judgment is performed on it, finally yielding the quality inspection result for the service voice data, which improves the accuracy and efficiency of the quality inspection of service voice.
  • another embodiment of the quality inspection method for service voice in the embodiment of the present application includes:
  • specifically, the server first obtains the service voice data and samples it with the Nyquist sampling algorithm to obtain the service voice waveform; next, the server quantizes the service voice waveform to obtain quantized voice data and converts the quantized voice data into digital pulses to generate the encoded voice data; then the server uses a similarity algorithm to calculate the similarity value between the encoded voice data and the preset declaration encoded data to obtain the basic similarity; finally, the server determines the encoded voice data corresponding to the basic similarity with the largest value as the declaration voice encoded data to be detected.
  • sampling is to take several representative sample values from an analog signal that changes continuously in time to represent the continuously changing analog signal.
  • according to the Nyquist sampling theorem, to recover the original waveform completely from the sequence of sample values, the sampling frequency must be greater than twice the highest frequency of the original signal; only when the sampling frequency is greater than twice the highest signal frequency can aliasing be avoided and the service voice waveform be obtained.
  • the quantization process divides the sampled signal into a finite set of sections according to the amplitude of the whole sound wave, classifies the samples that fall into a given section into one class, and assigns them the same quantization value; it is common to divide the vertical axis with 8-bit or 16-bit resolution.
  • the non-uniform quantization method is used here to quantize the service voice waveform, thereby obtaining quantized voice data.
  • the quantized speech data obtained after sampling and quantization is not a digital signal, so it is necessary to convert the quantized speech data into digital pulses, and this conversion process is encoding, thereby obtaining encoded speech data.
  • Sampling, quantization, and encoding are the basic processing of audio.
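  • The sketch below illustrates these three basic steps on a waveform already normalized to [-1, 1]; the 16 kHz rate, 8-bit depth, and the use of mu-law companding as the non-uniform quantization scheme are assumptions for illustration, not values fixed by the patent.

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz; must exceed twice the highest voice frequency (Nyquist)

def quantize_mu_law(waveform: np.ndarray, bits: int = 8, mu: float = 255.0) -> np.ndarray:
    """Non-uniform (mu-law) quantization of samples in [-1, 1] to integer codes;
    mu-law is one common non-uniform scheme, assumed here for illustration."""
    compressed = np.sign(waveform) * np.log1p(mu * np.abs(waveform)) / np.log1p(mu)
    levels = 2 ** bits
    codes = np.round((compressed + 1.0) / 2.0 * (levels - 1))
    return np.clip(codes, 0, levels - 1).astype(np.uint8)

def encode(waveform: np.ndarray) -> bytes:
    """Turn the quantized samples into a digital byte stream ("digital pulses")."""
    return quantize_mu_law(waveform).tobytes()
```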
  • the server can use the similarity algorithm to calculate the similarity value between the encoded voice data and the preset declaration encoded data to obtain the basic similarity. The preset declaration encoded data is the encoded voice data corresponding to the standard wording and keywords used in declaration-type quality inspection; calculating the similarity value between the two makes it possible to determine which part of the service voice needs declaration-type quality inspection.
  • it should be noted that the preset declaration encoded data here is the encoded data corresponding to the preset declaration voice, and the content of the preset declaration voice consists of the terms or notices that must be made clear to the user, for example: your loan amount this time is XX, the loan term is XX periods, the repayment method is principal plus fees repaid monthly with the monthly repayment amount decreasing month by month; you need to repay XX yuan in the first month and XX yuan in the last month; the specific repayment amount is subject to the actual monthly repayment notice, and you can open the repayment schedule to view the monthly repayment amount.
  • the above-mentioned service voice data may also be stored in a node of a blockchain.
  • the server first obtains the voice data of the service to be detected corresponding to the voice coding data of the declaration to be detected based on the voice recognition algorithm, and extracts the voice features in the corresponding voice data of the service to be detected; then the server converts the voice features into phoneme information, wherein the phoneme The information is used to indicate the smallest phonetic unit that constitutes a syllable; finally, the server matches the text information that is the same as the phoneme information in the preset dictionary to obtain the statement text to be detected.
  • the speech recognition algorithm is used to convert the speech code of the statement to be detected into the text of the statement to be detected.
  • the server first extracts the voice features from the service voice data to be detected and then converts the voice features into phoneme information, where the phoneme information is used to indicate the smallest phonetic units that make up a syllable; a phoneme is the smallest phonetic unit divided according to the natural attributes of speech, analyzed according to the articulatory actions within a syllable, with one action constituting one phoneme.
  • the text information corresponding to the phoneme information is matched in the preset dictionary, and the to-be-detected declaration text corresponding to the voice coding data of the declaration to be detected is generated.
  • the preset dictionary includes standard words or sentences and their corresponding phonemes.
  • therefore, the corresponding declaration text to be detected can be obtained by directly matching the phoneme information to the corresponding text information in the preset dictionary.
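  • The acoustic feature extraction and phoneme recognition themselves are left abstract here; the sketch below only shows the final dictionary-lookup step, with a tiny illustrative phoneme dictionary (real entries would come from an ASR lexicon and the compliance wording).

```python
# Illustrative preset dictionary mapping phoneme sequences to standard words.
PRESET_DICTIONARY = {
    ("l", "i4", "x", "i1"): "利息",   # "interest"
    ("t", "ong2", "y", "i4"): "同意",  # "agree"
}

def phonemes_to_text(phoneme_sequence):
    """Greedy longest-match of the phoneme information against the preset dictionary."""
    pieces, i = [], 0
    while i < len(phoneme_sequence):
        for j in range(len(phoneme_sequence), i, -1):
            word = PRESET_DICTIONARY.get(tuple(phoneme_sequence[i:j]))
            if word:
                pieces.append(word)
                i = j
                break
        else:
            i += 1  # skip phonemes with no dictionary entry
    return "".join(pieces)
```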
  • specifically, the server first obtains multiple sentence sequences of the declaration text to be detected and adds a preset marker character at the initial position of each sentence sequence to obtain multiple first marker sequences; then the server adds a preset interval character between two adjacent first marker sequences to obtain multiple second marker sequences; finally, the server trains the multiple second marker sequences with the bert network model to generate multiple declaration sentence vectors to be detected.
  • the server uses the bert network model to generate multiple statement sentence vectors of the statement text to be detected.
  • the server obtains multiple sentence sequences of the declaration text to be detected and adds a preset marker character at the initial position of each sentence sequence; the preset marker character is [CLS], which is mainly used to store the semantic information of the whole input sequence, giving multiple first marker sequences. The server then adds a preset interval character between two adjacent first marker sequences; the preset interval character is [SEP], which is mainly used to separate the different declaration sentences to be detected, giving multiple second marker sequences. By training the multiple second marker sequences with the bert network model, the corresponding declaration sentence vectors to be detected are generated.
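  • A small sketch of how the [CLS] marker and [SEP] interval characters might be arranged before the sequences are fed to BERT; the tokenizer and checkpoint are assumptions, and the subsequent training step is omitted.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint

def build_marked_sequences(sentences):
    """Add [CLS] at the start of every sentence sequence (first marker sequences),
    then a [SEP] interval character between adjacent sequences (second marker sequences)."""
    first_marked = [["[CLS]"] + tokenizer.tokenize(sentence) for sentence in sentences]
    second_marked = [seq + ["[SEP]"] for seq in first_marked[:-1]] + first_marked[-1:]
    return second_marked
```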
  • the server needs to further calculate the basic similarity probability value between each declarative sentence vector to be detected and the standard declarative sentence vector, and determine the declarative text to be detected whose basic similarity probability is greater than the standard threshold as the target text to be corrected.
  • since there are certain errors in the speech-to-text conversion process, the converted text may contain grammatical or wording errors, for example transcribing "同意" (agree) as "朋友" (friend) or "利息" (interest) as "李西" (Li Xi). Transcription errors affect the accuracy of the model, so the server needs to further determine which text data in the declaration text to be detected requires the next operation, thereby obtaining the target text to be corrected.
  • specifically, the server first uses the fuzzy matching algorithm to convert the target text to be corrected into a pinyin sentence to be corrected, screens the target phonetic symbols out of the pinyin sentence to be corrected, and converts the target phonetic symbols into similar phonetic symbols to generate a converted pinyin sentence, where the target phonetic symbols include easily confused finals and/or initials; next, the server extracts the error-correction text corresponding to the similar phonetic symbols from the converted pinyin sentence and calculates the matching value between the error-correction text and the standard text in the preset dictionary to obtain multiple basic matching values; then, when a target matching value is greater than the error-correction threshold, the server replaces the error-correction text corresponding to that target matching value with the corresponding standard text to obtain the text to be determined; finally, the server screens the text to be determined for preset keywords, and if a preset keyword exists in the text to be determined, it determines that the service voice data contains declaration-type risk voice data.
  • the server first uses the fuzzy matching algorithm to convert the target text to be corrected into a pinyin sentence to be corrected, screens out the target phonetic symbols with easily confused finals and/or initials in the pinyin sentence to be corrected, and converts the target phonetic symbols into the corresponding easily confused similar phonetic symbols to generate the converted pinyin sentence. Examples of easily confused target phonetic symbols and their corresponding similar phonetic symbols are: easily confused initials: n/l; easily confused front and back nasal finals: an/ang; easily confused flat and retroflex sibilants: c/ch.
  • then the error-correction text corresponding to the similar phonetic symbols is extracted from the converted pinyin sentence, and the matching value between the error-correction text and the standard text in the preset dictionary is calculated to obtain multiple basic matching values. The purpose of calculating the basic matching value is to detect whether the error-correction text is standard text in the preset dictionary (a word with actual meaning); if a calculated basic matching value is greater than the error-correction threshold, the corresponding error-correction text is replaced with the corresponding standard text to obtain the text to be determined.
  • after obtaining the text to be determined, the server directly screens the text to be determined for preset keywords and judges whether a preset keyword exists in it. The preset keywords here are terms that must be mentioned in the sales script; taking insurance sales as an example, the corresponding preset keywords may be "annual interest rate", "monthly service fee", "monthly insurance premium", and so on, and the content of the preset keywords is not specifically limited. If a preset keyword exists in the text to be determined, it is directly determined that the service voice data contains declaration-type risk voice data.
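  • The keyword screening itself is a simple containment check; the keywords below are the insurance-sales examples from this paragraph, and the decision follows the rule stated above (a preset keyword being present marks the data as declaration-type risk voice data).

```python
# Preset keywords from the insurance-sales example above.
PRESET_KEYWORDS = ["年利率", "月服务费", "月保险费"]  # annual interest rate, monthly service fee, monthly premium

def has_declaration_risk(text_to_determine: str) -> bool:
    """Per the rule above: if any preset keyword appears in the corrected text to be
    determined, the service voice data is judged to contain declaration-type risk voice data."""
    return any(keyword in text_to_determine for keyword in PRESET_KEYWORDS)
```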
  • specifically, the server first inputs the service voice data into a language model, performs word-embedding processing on the service voice data to generate service word vectors, and sorts the service word vectors in descending order of sentence length to obtain the word vectors to be recognized; next, the server performs feature extraction on the word vectors to be recognized through the bidirectional long short-term memory network in the intent recognition algorithm to generate the corresponding feature values; then the server assigns a value to the length of the word vectors to be recognized, computes a weighted sum of the assigned length and the feature values to obtain the feature weight parameter, and multiplies the feature weight parameter by the word vectors to be recognized to obtain the recognized text vector; the server then queries the preset intent list for the basic intent corresponding to the recognized text vector, determines the basic intent as the service recognition intent of the word vectors to be recognized, and judges whether a preset risk intent exists in the service recognition intent; if a preset risk intent exists in the service recognition intent, the server determines that the service voice data contains semantic risk voice data.
  • when performing intent recognition on the service voice data, the server needs to convert the service voice data into the corresponding text and then perform intent recognition on that text.
  • the server first loads the pre-trained language model, inputs the service voice data into the language model, and performs word-embedding processing on the input to generate the corresponding service word vectors; next, the server sorts the service word vectors in descending order of sentence length and packs the processed service word vectors to obtain the word vectors to be recognized; then the server inputs the word vectors to be recognized into a bidirectional long short-term memory (LSTM) network and performs feature extraction on them through the LSTM network to generate the corresponding feature values, after which the server begins intent recognition.
  • the server assigns the length of the word vector to be identified, and performs a weighted sum of the assigned length and the feature value to obtain the feature weight parameter.
  • the feature weight parameter is multiplied by the word vectors to be recognized to obtain the recognized text vector; the recognized text vector is then spliced with the standard text vector list in the preset dictionary, and the basic intent corresponding to the entry in the standard text vector list that is the same as the recognized text vector is queried directly; that basic intent is determined as the service recognition intent of the word vectors to be recognized, completing the intent recognition of the service voice data.
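  • A PyTorch sketch of the recognition step described above: bidirectional LSTM feature extraction over the word vectors to be recognized, a length-based feature weight, and a lookup against a preset intent list; all dimensions, the exact weighting formula, and the risk-intent indices are illustrative assumptions rather than values from the patent.

```python
import torch
import torch.nn as nn

class IntentRecognizer(nn.Module):
    """BiLSTM feature extraction plus length-weighted pooling, then a projection that
    stands in for the preset intent list lookup (all sizes are illustrative)."""

    def __init__(self, embed_dim=128, hidden_dim=64, num_intents=10):
        super().__init__()
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.intent_list = nn.Linear(2 * hidden_dim, num_intents)

    def forward(self, word_vectors, lengths):
        # word_vectors: (batch, seq_len, embed_dim); lengths: (batch,) float tensor
        features, _ = self.bilstm(word_vectors)                    # feature values
        weight = torch.softmax(features.mean(dim=-1), dim=1)       # per-token weight (assumed form)
        weight = weight * (lengths.unsqueeze(1) / lengths.max())   # weighted with the assigned lengths
        recognized = (features * weight.unsqueeze(-1)).sum(dim=1)  # recognized text vector
        return self.intent_list(recognized).argmax(dim=-1)         # index of the basic intent

PRESET_RISK_INTENTS = {3, 7}  # illustrative indices; a hit would mark semantic risk voice data
```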
  • after obtaining the intent corresponding to the service voice data, the server can determine whether the service voice data is semantic risk voice data. For example, below are the text data corresponding to two pieces of service voice data, where the first passage is voice data spoken by a first telemarketer and the second passage is voice data spoken by a second telemarketer:
  • 1. Because your initial limit, like every customer's initial limit, is 10,000, and since you have insurance we can raise it to 20 to 40 times your annual insurance premium, so if your insurance is 30,000, even at 20 times we can raise your limit to the highest amount.
  • 2. If you need it again and apply again on the APP, you will count as an existing customer, right? As an existing customer, your limit this time is 204,000, and the next time you borrow, your limit will be raised on top of that 204,000.
  • although the wording of the two passages is completely different, after intent recognition the server finds that their semantics are the same: both belong to the intent of promising a credit limit. When the service voice data contains the necessary preset standard intents and no preset risk intent, the service voice data is normal, being neither semantic risk voice data nor declaration-type risk voice data.
  • in the embodiment of the present application, the declaration voice encoded data to be detected is screened out after the service voice data is encoded; the speech recognition algorithm and the bert network model are used to generate the declaration sentence vectors to be detected corresponding to the declaration voice encoded data to be detected; the fuzzy matching algorithm is used to correct the text of the declaration sentences to be detected, and declaration-risk judgment is performed on the corrected text; then the service recognition intent corresponding to the service voice data is generated through the intent recognition algorithm and semantic-risk judgment is performed on it, finally yielding the quality inspection result for the service voice data, which improves the accuracy and efficiency of the quality inspection of service voice.
  • the quality inspection method for service voice in the embodiment of the present application is described above.
  • the following describes the quality inspection device for service voice in the embodiment of the present application.
  • a screening module 301, configured to acquire service voice data, encode the service voice data with an encoder to obtain encoded voice data, calculate the basic similarity between the encoded voice data and the preset declaration encoded data, and screen the declaration voice encoded data to be detected from the encoded voice data according to the value of the basic similarity; a conversion module 302, configured to convert the declaration voice encoded data to be detected into declaration text to be detected based on the speech recognition algorithm, and use the bert network model to generate multiple declaration sentence vectors to be detected for the declaration text to be detected; a determination module 303, configured to calculate the basic similarity probability value between each declaration sentence vector to be detected and the standard declaration sentence vector, and determine the declaration text to be detected whose basic similarity probability is greater than the standard threshold as the target text to be corrected; a first determination module 304, configured to correct the target text to be corrected with the fuzzy matching algorithm to obtain the text to be determined, screen the text to be determined for preset keywords, and, if a preset keyword exists in the text to be determined, determine that the service voice data contains declaration-type risk voice data; and a second determination module 305, configured to generate the service recognition intent corresponding to the service voice data through the intent recognition algorithm, judge whether a preset risk intent exists in the service recognition intent, and, if so, determine that the service voice data contains semantic risk voice data.
  • another embodiment of the service voice quality inspection apparatus in the embodiment of the present application includes: a screening module 301, configured to acquire service voice data, encode the service voice data with an encoder to obtain encoded voice data, calculate the basic similarity between the encoded voice data and the preset declaration encoded data, and screen the declaration voice encoded data to be detected from the encoded voice data according to the value of the basic similarity; a conversion module 302, configured to convert the declaration voice encoded data to be detected into declaration text to be detected based on the speech recognition algorithm, and use the bert network model to generate multiple declaration sentence vectors to be detected for the declaration text to be detected; a determination module 303, configured to calculate the basic similarity probability value between each declaration sentence vector to be detected and the standard declaration sentence vector, and determine the declaration text to be detected whose basic similarity probability is greater than the standard threshold as the target text to be corrected; a first determination module 304, configured to correct the target text to be corrected with the fuzzy matching algorithm to obtain the text to be determined, screen the text to be determined for preset keywords, and, if a preset keyword exists in the text to be determined, determine that the service voice data contains declaration-type risk voice data; and a second determination module 305, configured to generate the service recognition intent corresponding to the service voice data through the intent recognition algorithm, judge whether a preset risk intent exists in the service recognition intent, and, if so, determine that the service voice data contains semantic risk voice data.
  • the screening module 301 is specifically configured to: acquire service voice data, and sample the service voice data with the Nyquist sampling algorithm to obtain the service voice waveform; quantize the service voice waveform to obtain quantized voice data, and convert the quantized voice data into digital pulses to generate the encoded voice data; calculate the similarity value between the encoded voice data and the preset declaration encoded data with the similarity algorithm to obtain the basic similarity; and determine the encoded voice data corresponding to the basic similarity with the largest value as the declaration voice encoded data to be detected.
  • the conversion module 302 includes: a matching unit 3021, configured to obtain, based on the speech recognition algorithm, the service voice data to be detected corresponding to the declaration voice encoded data to be detected, extract the voice features from the corresponding service voice data to be detected, and obtain the declaration text to be detected by matching according to the voice features;
  • the generating unit 3022 is configured to generate a plurality of statement sentence vectors to be detected of the statement text to be detected by using the bert network model.
  • the matching unit 3021 is specifically configured to: obtain the service voice data to be detected corresponding to the voice coding data of the declaration to be detected based on the voice recognition algorithm, and extract the corresponding voice features of the service voice data to be detected; convert the voice features into phoneme information , wherein the phoneme information is used to indicate the smallest phonetic unit that constitutes a syllable; the text information that is the same as the phoneme information is matched in the preset dictionary to obtain the statement text to be detected.
  • the generating unit 3022 is specifically configured to: obtain multiple sentence sequences of the declaration text to be detected, and add a preset marker character at the initial position of each sentence sequence to obtain multiple first marker sequences; add a preset interval character between two adjacent first marker sequences to obtain multiple second marker sequences; and train the multiple second marker sequences with the bert network model to generate multiple declaration sentence vectors to be detected.
  • the first determination module 304 is specifically configured to: use the fuzzy matching algorithm to convert the target text to be corrected into a pinyin sentence to be corrected, screen the target phonetic symbols out of the pinyin sentence to be corrected, and convert the target phonetic symbols into similar phonetic symbols to generate a converted pinyin sentence, where the target phonetic symbols include easily confused finals and/or initials; extract the error-correction text corresponding to the similar phonetic symbols from the converted pinyin sentence, and calculate the matching value between the error-correction text and the standard text in the preset dictionary to obtain multiple basic matching values; when a target matching value is greater than the error-correction threshold, replace the error-correction text corresponding to that target matching value with the corresponding standard text to obtain the text to be determined; and screen the text to be determined for preset keywords, and, if a preset keyword exists in the text to be determined, determine that the service voice data contains declaration-type risk voice data.
  • the second determination module 305 is specifically configured to: input the service voice data into the language model, perform word-embedding processing on the service voice data to generate service word vectors, and sort the service word vectors in descending order of sentence length to obtain the word vectors to be recognized; perform feature extraction on the word vectors to be recognized through the bidirectional long short-term memory network in the intent recognition algorithm to generate the corresponding feature values; assign a value to the length of the word vectors to be recognized, compute a weighted sum of the assigned length and the feature values to obtain the feature weight parameter, and multiply the feature weight parameter by the word vectors to be recognized to obtain the recognized text vector; query the preset intent list for the basic intent corresponding to the recognized text vector, determine the basic intent as the service recognition intent of the word vectors to be recognized, and judge whether a preset risk intent exists in the service recognition intent; if a preset risk intent exists in the service recognition intent, determine that the service voice data contains semantic risk voice data.
  • the service voice quality inspection device 500 may vary greatly due to different configurations or performances, and may include one or more processors (central processing units, CPU) 510 (eg, one or more processors) and memory 520, one or more storage media 530 (eg, one or more mass storage devices) that store application programs 533 or data 532.
  • the memory 520 and the storage medium 530 may be short-term storage or persistent storage.
  • the program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations in the quality inspection apparatus 500 for the service voice.
  • the processor 510 may be configured to communicate with the storage medium 530, and execute a series of instruction operations in the storage medium 530 on the service voice quality inspection device 500.
  • the quality inspection device 500 for service voice may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input and output interfaces 560, and/or, one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and more.
  • the present application further provides a service voice quality inspection device; the computer device includes a memory and a processor, the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, they cause the processor to perform the steps of the service voice quality inspection method in the above embodiments.
  • the present application also provides a computer-readable storage medium, and the computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
  • the computer-readable storage medium stores computer instructions, and when the computer instructions are executed on the computer, the computer performs the following steps:
  • acquiring service voice data, and encoding the service voice data with an encoder to obtain encoded voice data; calculating the basic similarity between the encoded voice data and the preset declaration encoded data, and screening the declaration voice encoded data to be detected from the encoded voice data according to the value of the basic similarity; converting the declaration voice encoded data to be detected into declaration text to be detected based on the speech recognition algorithm, and using the bert network model to generate multiple declaration sentence vectors to be detected for the declaration text to be detected; calculating the basic similarity probability value between each declaration sentence vector to be detected and the standard declaration sentence vector, and determining the declaration text to be detected whose basic similarity probability is greater than the standard threshold as the target text to be corrected; correcting the target text to be corrected with the fuzzy matching algorithm to obtain the text to be determined, screening the text to be determined for preset keywords, and, if a preset keyword exists in the text to be determined, determining that the service voice data contains declaration-type risk voice data; generating the service recognition intent corresponding to the service voice data through the intent recognition algorithm, judging whether a preset risk intent exists in the service recognition intent, and, if a preset risk intent exists in the service recognition intent, determining that the service voice data contains semantic risk voice data.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • a blockchain is essentially a decentralized database: a chain of data blocks produced in association by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • the integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A quality inspection method, apparatus, device, and storage medium for service voice, relating to the field of artificial intelligence and used to improve the accuracy and efficiency of quality inspection of service voice. The quality inspection method for service voice comprises: screening declaration voice encoded data to be detected according to the basic similarity between the service voice data and preset declaration encoded data (101); generating multiple declaration sentence vectors to be detected for the declaration text to be detected (102); determining the target text to be corrected according to the declaration sentence vectors to be detected (103); correcting the target text to be corrected and screening for preset keywords, and, if a preset keyword exists, determining that the service voice data contains declaration-type risk voice data (104); generating the service recognition intent corresponding to the service voice data through an intent recognition algorithm, and, if a preset risk intent exists in the service recognition intent, determining that the service voice data contains semantic risk voice data (105). The method also relates to blockchain technology, and the service voice data can be stored in a blockchain.

Description

业务语音的质检方法、装置、设备及存储介质
本申请要求于2020年12月15日提交中国专利局、申请号为202011476012.3、发明名称为“业务语音的质检方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在申请中。
技术领域
本申请涉及人工智能中的语音语义领域,尤其涉及一种业务语音的质检方法、装置、设备及存储介质。
背景技术
电话业务语音质检主要是用于检查电话业务人员在沟通过程中是否存在诱导、辱骂客户以及规避免责声明等现象,避免沟通过程中因不规范的语音行为导致客户投诉或法律风险。现有对电话业务语音质检的质检方式主要是对电话语音进行监听,判断监听内容中是否存在违规(风险)语音,若存在违规语音则说明该电话业务语音违规。
但是在利用这种质检方式对电话业务语音进行质检时,发明人意识到,需要进行质检的业务语音繁多冗杂,导致对业务语音进行质检的准确率以及质检效率低下。
发明内容
本申请提供了一种业务语音的质检方法、装置、设备及存储介质,用于提高对业务语音进行质检的准确率以及质检效率。
本申请第一方面提供了一种业务语音的质检方法,包括:获取业务语音数据,并利用编码器对所述业务语音数据进行编码,得到编码语音数据,计算所述编码语音数据与预置声明编码数据之间的基础相似度,根据所述基础相似度的数值在所述编码语音数据中筛选待检测声明语音编码数据;基于语音识别算法将所述待检测声明语音编码数据转化为待检测声明文本,利用bert网络模型生成所述待检测声明文本的多个待检测声明句向量;计算每个待检测声明句向量与标准声明句向量之间的基础相似概率值,将基础相似概率大于标准阈值所对应的待检测声明文本确定为目标待纠错文本;利用模糊匹配算法对所述目标待纠错文本进行纠错,得到待判定文本,在所述待判定文本中筛查预置关键词,若所述待判定文本中存在所述预置关键词,则判定所述业务语音数据存在声明类风险语音数据;通过意图识别算法生成所述业务语音数据对应的业务识别意图,判断所述业务识别意图中是否存在预置风险意图,若所述业务识别意图中存在预置风险意图,则判定所述业务语音数据存在语义类风险语音数据。
本申请第二方面提供了一种业务语音的质检设备,包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的计算机可读指令,所述处理器执行所述计算机可读指令时实现如下步骤:获取业务语音数据,并利用编码器对所述业务语音数据进行编码,得到编码语音数据,计算所述编码语音数据与预置声明编码数据之间的基础相似度,根据所 述基础相似度的数值在所述编码语音数据中筛选待检测声明语音编码数据;基于语音识别算法将所述待检测声明语音编码数据转化为待检测声明文本,利用bert网络模型生成所述待检测声明文本的多个待检测声明句向量;计算每个待检测声明句向量与标准声明句向量之间的基础相似概率值,将基础相似概率大于标准阈值所对应的待检测声明文本确定为目标待纠错文本;利用模糊匹配算法对所述目标待纠错文本进行纠错,得到待判定文本,在所述待判定文本中筛查预置关键词,若所述待判定文本中存在所述预置关键词,则判定所述业务语音数据存在声明类风险语音数据;通过意图识别算法生成所述业务语音数据对应的业务识别意图,判断所述业务识别意图中是否存在预置风险意图,若所述业务识别意图中存在预置风险意图,则判定所述业务语音数据存在语义类风险语音数据。
本申请的第三方面提供了一种计算机可读存储介质,所述计算机可读存储介质中存储计算机指令,当所述计算机指令在计算机上运行时,使得计算机执行如下步骤:获取业务语音数据,并利用编码器对所述业务语音数据进行编码,得到编码语音数据,计算所述编码语音数据与预置声明编码数据之间的基础相似度,根据所述基础相似度的数值在所述编码语音数据中筛选待检测声明语音编码数据;基于语音识别算法将所述待检测声明语音编码数据转化为待检测声明文本,利用bert网络模型生成所述待检测声明文本的多个待检测声明句向量;计算每个待检测声明句向量与标准声明句向量之间的基础相似概率值,将基础相似概率大于标准阈值所对应的待检测声明文本确定为目标待纠错文本;利用模糊匹配算法对所述目标待纠错文本进行纠错,得到待判定文本,在所述待判定文本中筛查预置关键词,若所述待判定文本中存在所述预置关键词,则判定所述业务语音数据存在声明类风险语音数据;通过意图识别算法生成所述业务语音数据对应的业务识别意图,判断所述业务识别意图中是否存在预置风险意图,若所述业务识别意图中存在预置风险意图,则判定所述业务语音数据存在语义类风险语音数据。
本申请第四方面提供了一种业务语音的质检装置,包括:筛选模块,用于获取业务语音数据,并利用编码器对所述业务语音数据进行编码,得到编码语音数据,计算所述编码语音数据与预置声明编码数据之间的基础相似度,根据所述基础相似度的数值在所述编码语音数据中筛选待检测声明语音编码数据;转化模块,用于基于语音识别算法将所述待检测声明语音编码数据转化为待检测声明文本,利用bert网络模型生成所述待检测声明文本的多个待检测声明句向量;确定模块,用于计算每个待检测声明句向量与标准声明句向量之间的基础相似概率值,将基础相似概率大于标准阈值所对应的待检测声明文本确定为目标待纠错文本;第一判定模块,用于利用模糊匹配算法对所述目标待纠错文本进行纠错,得到待判定文本,在所述待判定文本中筛查预置关键词,若所述待判定文本中存在所述预置关键词,则判定所述业务语音数据存在声明类风险语音数据;第二判定模块,用于通过意图识别算法生成所述业务语音数据对应的业务识别意图,判断所述业务识别意图中是否存在预置风险意图,若所述业务识别意图中存在预置风险意图,则判定所述业务语音数据 存在语义类风险语音数据。
本申请提供的技术方案中,获取业务语音数据,并利用编码器对业务语音数据进行编码,得到编码语音数据,计算编码语音数据与预置声明编码数据之间的基础相似度,根据基础相似度的数值在编码语音数据中筛选待检测声明语音编码数据;基于语音识别算法将待检测声明语音编码数据转化为待检测声明文本,利用bert网络模型生成待检测声明文本的多个待检测声明句向量;计算每个待检测声明句向量与标准声明句向量之间的基础相似概率值,将基础相似概率大于标准阈值所对应的待检测声明文本确定为目标待纠错文本;利用模糊匹配算法对目标待纠错文本进行纠错,得到待判定文本,在待判定文本中筛查预置关键词,若待判定文本中存在预置关键词,则判定业务语音数据存在声明类风险语音数据;通过意图识别算法生成业务语音数据对应的业务识别意图,判断业务识别意图中是否存在预置风险意图,若业务识别意图中存在预置风险意图,则判定业务语音数据存在语义类风险语音数据。本申请实施例中,通过对业务语音数据进行编码后筛选出待检测声明语音编码数据,利用语音识别算法与bert网络模型生成待检测声明语音编码数据对应的待检测声明句向量,利用模糊匹配算法对待检测声明句向量进行文本纠错,并对纠错后的文本进行声明风险判定,然后通过意图识别算法生成业务语音数据对应的业务识别意图,对业务识别意图进行语义风险判定,最终得到对业务语音数据的质检结果,提高了对业务语音进行质检的准确率以及质检效率。
附图说明
图1为本申请实施例中业务语音的质检方法的一个实施例示意图;
图2为本申请实施例中业务语音的质检方法的另一个实施例示意图;
图3为本申请实施例中业务语音的质检装置的一个实施例示意图;
图4为本申请实施例中业务语音的质检装置的另一个实施例示意图;
图5为本申请实施例中业务语音的质检设备的一个实施例示意图。
具体实施方式
本申请实施例提供了一种业务语音的质检方法、装置、设备及存储介质,用于提高对业务语音进行质检的准确率以及质检效率。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的实施例能够以除了在这里图示或描述的内容以外的顺序实施。此外,术语“包括”或“具有”及其任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
为便于理解,下面对本申请实施例的具体流程进行描述,请参阅图1,本申请实施例 中业务语音的质检方法的一个实施例包括:
101、获取业务语音数据,并利用编码器对业务语音数据进行编码,得到编码语音数据,计算编码语音数据与预置声明编码数据之间的基础相似度,根据基础相似度的数值在编码语音数据中筛选待检测声明语音编码数据;
可以理解的是,本申请的执行主体可以为业务语音的质检装置,还可以是终端或者服务器,具体此处不做限定。本申请实施例以服务器为执行主体为例进行说明。
电话业务语音质检的主要质检内容分为声明类业务语音质检和语义类业务语音质检,声明类业务语音质检对应的质检点可以为:用户在签约或同意某项协议时,需要明确协议对应的条款项、对应的费用和后续操作方案等,声明类业务语音的质检点通常存在经过法律合规部门确认过后的标准的话术和关键词,业务语音数据中需要存在相应的标准话术和关键词,才能证明该业务语音数据不为声明类风险语音数据。
语义类业务语音质检对应的质检点没有标准话术,可以存在多种语言表达方式,例如:引导教唆用户填写虚假信息、误导用户明确错误的流程或谎报错误身份信息等。在业务语音数据中存在了该种意图的语音,即证明业务语音数据为语义类风险语音数据。
在对业务语音数据进行处理时,首先需要利用编码器对业务语音数据进行编码,得到编码语音数据,这个过程即将业务语音数据转化为数字编码的过程,使得计算机可以直接对编码语音数据进行处理,然后服务器极端编码语音数据与预置声明编码数据之间的基础相似度,这里利用到的是相似度算法,预置声明编码数据指的是需要明确协议对应的条款项、对应的费用和后续操作方案等对应的语音数据的编码数据,两者之间的基础相似度的数值越高,则说明两者相似,也就是证明业务语音数据中存在需要明确协议对应的条款项、对应的费用和后续操作方案等。
需要强调的是,为进一步保证上述业务语音数据的私密和安全性,上述业务语音数据还可以存储于一区块链的节点中。
102、基于语音识别算法将待检测声明语音编码数据转化为待检测声明文本,利用bert网络模型生成待检测声明文本的多个待检测声明句向量;
这里服务器利用语音识别算法将待检测声明语音编码数据转化为待检测声明文本,即将语音数据转化为文本数据,可以对待检测声明文本进行进一步的检测。语音识别算法为本技术领域中的惯用技术手段,故在此并不赘述。这里得到待检测声明文本之后,服务器利用bert网络模型生成待检测声明文本的多个待检测声明句向量,BERT(Bidirectional Encoder Representations from Transformers)是一种预训练语言表示的方法,可以作为Word2Vec的替代者,在进行预训练的过程中,可以将文本文字转化为对应的句向量,在本技术方案中,利用bert网络模型生成待检测声明文本的多个待检测声明句向量。
103、计算每个待检测声明句向量与标准声明句向量之间的基础相似概率值,将基础相似概率大于标准阈值所对应的待检测声明文本确定为目标待纠错文本;
服务器需要进一步计算每个待检测声明句向量与标准声明句向量之间的基础相似概率值,将基础相似概率大于标准阈值所对应的待检测声明文本确定为目标待纠错文本,由于由语音转化为文本的过程中存在一定的误差,转化后的文本中可能存在语法错误或字词错误,例如:将“同意”转写成“朋友”,将“利息”转写成“李西”等。转写的错误会影响模型的准确率,因此需要服务器进一步确定待检测声明文本中哪些文本数据为需要进行下一步操作的文本,得到目标待纠错文本。
104、利用模糊匹配算法对目标待纠错文本进行纠错,得到待判定文本,在待判定文本中筛查预置关键词,若待判定文本中存在预置关键词,则判定业务语音数据存在声明类风险语音数据;
服务器在确定目标待纠错文本之后,即可以利用模糊匹配算法对目标待纠错文本进行纠错,模糊匹配算法的原理是将目标待纠错文本转化为与其对应的拼音,将具有易混淆音标的目标音标转化为相似音标,这样就可以得到多种与目标待纠错文本近音的语句,得到识别出语句的多种可能性,再从这些可能性的语句中选择一个最标准的语句(与预置字典中的标准文本进行匹配),即会得到纠正后的待判定文本。
服务器在得到纠正后的待判定文本之后,直接在待判定文本中进行预置关键词的筛查,若待判定文本中存在预置关键词,则判定业务语音数据存在声明类风险语音数据。
105、通过意图识别算法生成业务语音数据对应的业务识别意图,判断业务识别意图中是否存在预置风险意图,若业务识别意图中存在预置风险意图,则判定业务语音数据存在语义类风险语音数据。
服务器在对业务语音数据进行语义类质检时,需要通过意图识别算法生成业务语音数据对应的业务识别意图,进而判断业务识别意图中是否存在预置风险意图,若业务识别意图中存在预置风险意图,则说明将业务语音数据中存在风险语音数据,则直接判定业务语音数据存在语义类风险语音数据。
本申请实施例中,通过对业务语音数据进行编码后筛选出待检测声明语音编码数据,利用语音识别算法与bert网络模型生成待检测声明语音编码数据对应的待检测声明句向量,利用模糊匹配算法对待检测声明句向量进行文本纠错,并对纠错后的文本进行声明风险判定,然后通过意图识别算法生成业务语音数据对应的业务识别意图,对业务识别意图进行语义风险判定,最终得到对业务语音数据的质检结果,提高了对业务语音进行质检的准确率以及质检效率。
请参阅图2,本申请实施例中业务语音的质检方法的另一个实施例包括:
201、获取业务语音数据,并利用编码器对业务语音数据进行编码,得到编码语音数据,计算编码语音数据与预置声明编码数据之间的基础相似度,根据基础相似度的数值在编码语音数据中筛选待检测声明语音编码数据;
具体的,服务器首先获取业务语音数据,利用奈奎斯特采样算法对业务语音数据进行 采样,得到业务语音波形;其次服务器对业务语音波形进行量化处理,得到量化语音数据,将量化语音数据转化为数字脉冲,生成编码语音数据;然后服务器利用相似度算法计算编码语音数据与预置声明编码数据之间的相似度数值,得到基础相似度;最后服务器将数值最大的基础相似度所对应的编码语音数据确定为待检测声明语音编码数据。
首先对业务语音数据进行采样,采样即为从一个时间上连续变化的模拟信号中取出若干个有代表性的样本值,来代表这个连续变化的模拟信号,按照奈奎斯特采样定理:要从采样值序列中完全恢复成原始波形,采样频率必须大于原始信号最高频率的2倍,只有当采样频率大于两倍的信号最高频率,才能避免混叠现象的发生,进而得到业务语音波形。
在采样之后需要进一步对业务语音波形进行量化处理,量化的过程就是将采样后的信号按整个声波的幅度划分成有限个区段的集合,把落入某个区段的样值归为一类,并赋予相同的量化值,常见有8bit和16bit来划分纵轴。需要说明的是,这里采用的是非均匀量化方式对业务语音波形进行量化,进而得到量化语音数据。采样、量化后得到量化语音数据并不为数字信号,因此需要将量化语音数据转化成数字脉冲,这个转化过程即为编码,从而得到编码语音数据。
采样、量化和编码是对音频的基础处理,将处理过后的编码语音数据与对应预置声明编码数据进行相似度的计算即可明确业务语音中具体的需要进行声明质检的部分在哪里。服务器可以利用相似度算法计算编码语音数据与预置声明编码数据之间的相似度数值,得到基础相似度,预置声明编码数据为声明类质检中的标准话术和关键词对应的编码语音数据,计算两者之间的相似度数值,可以进一步确定业务语音中需要进行声明质检的部分。
需要说明的是,这里预置声明编码数据为预置声明语音对应编码数据,预置声明语音内容为用户需要明确的条款或须知,如:您本次借款金额为XX、借款期限XX期、还款方式为本金加费用按月还款,月还款金额逐月递减;您首月需还款XX元,最后一个月需还款XX元,具体还款金额以每月实际还款通知为准,您可打开还款计划表查看每月还款额。
需要强调的是,为进一步保证上述业务语音数据的私密和安全性,上述业务语音数据还可以存储于一区块链的节点中。
202、基于语音识别算法获取待检测声明语音编码数据对应的待检测业务语音数据,提取对应的待检测业务语音数据中的语音特征,根据语音特征匹配得到待检测声明文本;
具体的,服务器首先基于语音识别算法获取待检测声明语音编码数据对应的待检测业务语音数据,提取对应的待检测业务语音数据中的语音特征;然后服务器将语音特征转化为音素信息,其中,音素信息用于指示构成音节的最小语音单位;最后服务器在预置字典中匹配与音素信息相同的文字信息,得到待检测声明文本。
这里利用的是语音识别算法将待检测声明语音编码转化为待检测声明文本,服务器首先提取待检测业务语音数据中的语音特征,然后将语音特征转化为音素信息,其中,音素信息用于指示构成音节的最小语音单位,音素信息是根据语音的自然属性划分出来的最小 语音单位,其依据音节里的发音动作来分析,一个动作构成一个音素。最后在预置字典中匹配音素信息对应的文字信息,生成待检测声明语音编码数据对应的待检测声明文本,预置字典中包括标准词语或语句以及两者对应的音素,因此直接在预置字典中对音素信息匹配对应的文字信息即可得到对应的待检测声明文本。
203、利用bert网络模型生成待检测声明文本的多个待检测声明句向量;
具体的,服务器首先获取待检测声明文本的多个语句序列,在每个语句序列的初始位置添加预置标记字符,得到多个第一标记序列;然后服务器在相邻两个第一标记序列之间添加预置间隔字符,得到多个第二标记序列;最后服务器利用bert网络模型对多个第二标记序列进行训练,生成多个待检测声明句向量。
在得到待检测声明文本之后服务器利用bert网络模型生成待检测声明文本的多个待检测声明句向量,首先服务器获取待检测声明文本的多个语句序列,在每个语句序列的初始位置添加预置标记字符,预置标记字符为[CLS],该字符主要是用于存储整个输入序列的语义信息,进而得到多个第一标记序列,然后服务器在相邻两个第一标记序列之间添加预置间隔字符,预置间隔字符为[SEP],该字符主要是用于存间隔不同的待检测声明句向量,得到多个第二标记序列,利用bert网络模型对多个第二标记序列进行训练,即可生成对应的多个待检测声明句向量。
204、计算每个待检测声明句向量与标准声明句向量之间的基础相似概率值,将基础相似概率大于标准阈值所对应的待检测声明文本确定为目标待纠错文本;
服务器需要进一步计算每个待检测声明句向量与标准声明句向量之间的基础相似概率值,将基础相似概率大于标准阈值所对应的待检测声明文本确定为目标待纠错文本,由于由语音转化为文本的过程中存在一定的误差,转化后的文本中可能存在语法错误或字词错误,例如:将“同意”转写成“朋友”,将“利息”转写成“李西”等。转写的错误会影响模型的准确率,因此需要服务器进一步确定待检测声明文本中哪些文本数据为需要进行下一步操作的文本,得到目标待纠错文本。
205、利用模糊匹配算法对目标待纠错文本进行纠错,得到待判定文本,在待判定文本中筛查预置关键词,若待判定文本中存在预置关键词,则判定业务语音数据存在声明类风险语音数据;
具体的,服务器首先利用模糊匹配算法将目标待纠错文本转化为待纠错拼音语句,在待纠错拼音语句中筛选出目标音标,并将目标音标转化为相似音标,生成转化拼音语句,其中,目标音标包括具有易混淆的韵母和/或声母;其次服务器在转化拼音语句中提取出相似音标对应的纠错文本,计算纠错文本与预置字典中标准文本之间的匹配值,得到多个基础匹配值;然后服务器当目标匹配值大于纠错阈值时,将目标匹配值对应的纠错文本替换为对应的标准文本,得到待判定文本;最后服务器在待判定文本中进行预置关键词筛查,若待判定文本中存在预置关键词,则判定业务语音数据存在声明类风险语音数据。
服务器首先利用模糊匹配算法将目标待纠错文本转化为待纠错拼音语句,在待纠错拼音语句中筛选出存在易混淆的韵母和/或声母的目标音标,并将目标音标转化为对应易混淆的相似音标,生成转化拼音语句,具有易混淆音标的目标音标以及其对应的相似音标为:辅音易混淆:n/l;前后鼻音易混淆:an/ang;平翘舌易混淆:c/ch。然后在转化拼音语句中提取出相似音标对应纠错文本,计算纠错文本与预置字典中标准文本之间的匹配值,得到多个基础匹配值,计算基础匹配值的目的是为了检测纠错文本是否是预置字典中的标准文本(存在实际意义的词语),若计算出来的基础匹配值大于纠错阈值,则将对应的纠错文本替换为对应的标准文本,得到待判定文本。
服务器在得到待判定文本之后,直接在待判定文本中进行预置关键词筛查,判断待判定文本中是否存在预置关键词,这里的预置关键词为销售话术中必须提到的词汇,以销售保险为例,对应的预置关键词可以为“年利率”、“月服务费”、“月保险费”等,具体并不对预置关键词的内容进行限定。若待判定文本中存在预置关键词,则直接判定业务语音数据中存在声明类风险语音数据。
206、通过意图识别算法生成业务语音数据对应的业务识别意图,判断业务识别意图中是否存在预置风险意图,若业务识别意图中存在预置风险意图,则判定业务语音数据存在语义类风险语音数据。
具体的,服务器首先将业务语音数据输入至语言模型中,对业务语音数据作字嵌入处理,生成业务字向量,并对业务字向量按照语句长度进行降序处理,得到待识别字向量;其次服务器通过意图识别算法中的双向长短期记忆网络对待识别字向量进行特征提取,生成对应的特征值;然后服务器对待识别字向量的长度进行赋值,并将赋值后的长度与特征值进行加权求和,得到特征权重参数,将特征权重参数与待识别字向量相乘,得到识别文本向量;服务器在预置意图列表中查询识别文本向量对应的基础意图,并将基础意图确定为待识别字向量的业务识别意图,判断业务识别意图中是否存在预置风险意图;若业务识别意图中存在预置风险意图,则服务器判定业务语音数据存在语义类风险语音数据。
服务器在进行对业务语音数据进行意图识别时,需要将业务语音数据转化为对应的文本,然后在对其进行意图识别。服务器首先加载预训练的语言模型,并将业务语音数据输入至语言模型中,对输入的业务语音数据进行字嵌入处理,生成对应的业务字向量,其次服务器对业务字向量按照语句长度的长短进行降序处理,并对处理后的业务字向量进行打包处理,得到待识别字向量,然后服务器将待识别字向量输入至双向长短期记忆网络(long short-term memory,LSTM)中,通过LSTM网络对待识别字向量进行特征提取,生成对应的特征值,之后服务器开始进行意图识别,服务器对待识别字向量的长度进行赋值,并将赋值后的长度与特征值进行加权求和,得到特征权重参数,将特征权重参数与待识别字向量相乘,得到识别文本向量,然后将识别文本向量与预置字典中的标准文本向量列表进行拼接,直接在标准文本向量列表中查询与识别文本向量相同所对应的基础意图,即将基础 意图确定为待识别字向量的业务识别意图,完成了对业务语音数据的意图识别。
After obtaining the intent corresponding to the service voice data, the server can determine whether the service voice data is semantic-type risk voice data. For example, below are the text data corresponding to two pieces of service voice data, where the first passage is voice data spoken by a first telephone salesperson and the second passage is voice data spoken by a second telephone salesperson:
1. Because your initial limit — the initial limit for all customers is 10,000 — and if you have an insurance policy it can help you raise it, giving you 20 to 40 times the annual premium of your policy; so think about it, if your policy is 30,000, then even at 20 times it can raise your limit to the highest possible amount.
2. If you need it again, when you come back to apply on the APP, you will count as an existing customer, right? For existing customers, your limit this time is 204,000, and the next time you come to borrow, your limit will be raised on the basis of 204,000.
Although the wording of the two passages is completely different, the server finds after intent recognition that their semantics are the same: both belong to the intent of promising a credit limit. When the necessary preset standard intents exist in the service voice data and no preset risk intent exists, the service voice data is normal and is neither semantic-type risk voice data nor declaration-type risk voice data.
In this embodiment of the present application, the service voice data is encoded and the declaration voice encoded data to be detected is screened out; a speech recognition algorithm and the bert network model are used to generate the declaration sentence vectors to be detected corresponding to the declaration voice encoded data to be detected; a fuzzy matching algorithm is used to perform text correction on the declaration sentence vectors to be detected, and declaration risk judgment is performed on the corrected text; then an intent recognition algorithm is used to generate the service recognition intent corresponding to the service voice data, and semantic risk judgment is performed on the service recognition intent, finally yielding the quality inspection result of the service voice data. This improves the accuracy and efficiency of quality inspection of service voice.
The quality inspection method for service voice in the embodiments of the present application has been described above; the quality inspection apparatus for service voice in the embodiments of the present application is described below. Referring to FIG. 3, one embodiment of the quality inspection apparatus for service voice in the embodiments of the present application includes: a screening module 301, configured to obtain service voice data, encode the service voice data with an encoder to obtain encoded voice data, compute a basic similarity between the encoded voice data and preset declaration encoded data, and screen declaration voice encoded data to be detected from the encoded voice data according to the value of the basic similarity; a conversion module 302, configured to convert the declaration voice encoded data to be detected into declaration text to be detected based on a speech recognition algorithm, and generate multiple declaration sentence vectors to be detected from the declaration text to be detected by using a bert network model; a determination module 303, configured to compute a basic similarity probability value between each declaration sentence vector to be detected and a standard declaration sentence vector, and determine the declaration text to be detected whose basic similarity probability is greater than a standard threshold as target text to be corrected; a first judging module 304, configured to correct the target text to be corrected by using a fuzzy matching algorithm to obtain text to be judged, screen the text to be judged for preset keywords, and if a preset keyword exists in the text to be judged, determine that declaration-type risk voice data exists in the service voice data; a second judging module 305, configured to generate a service recognition intent corresponding to the service voice data through an intent recognition algorithm, determine whether a preset risk intent exists in the service recognition intent, and if a preset risk intent exists in the service recognition intent, determine that semantic-type risk voice data exists in the service voice data.
Referring to FIG. 4, another embodiment of the quality inspection apparatus for service voice in the embodiments of the present application includes: a screening module 301, configured to obtain service voice data, encode the service voice data with an encoder to obtain encoded voice data, compute a basic similarity between the encoded voice data and preset declaration encoded data, and screen declaration voice encoded data to be detected from the encoded voice data according to the value of the basic similarity; a conversion module 302, configured to convert the declaration voice encoded data to be detected into declaration text to be detected based on a speech recognition algorithm, and generate multiple declaration sentence vectors to be detected from the declaration text to be detected by using a bert network model; a determination module 303, configured to compute a basic similarity probability value between each declaration sentence vector to be detected and a standard declaration sentence vector, and determine the declaration text to be detected whose basic similarity probability is greater than a standard threshold as target text to be corrected; a first judging module 304, configured to correct the target text to be corrected by using a fuzzy matching algorithm to obtain text to be judged, screen the text to be judged for preset keywords, and if a preset keyword exists in the text to be judged, determine that declaration-type risk voice data exists in the service voice data; a second judging module 305, configured to generate a service recognition intent corresponding to the service voice data through an intent recognition algorithm, determine whether a preset risk intent exists in the service recognition intent, and if a preset risk intent exists in the service recognition intent, determine that semantic-type risk voice data exists in the service voice data.
Optionally, the screening module 301 is specifically configured to: obtain service voice data, and sample the service voice data by using a Nyquist sampling algorithm to obtain a service voice waveform; quantize the service voice waveform to obtain quantized voice data, and convert the quantized voice data into digital pulses to generate encoded voice data; compute, by using a similarity algorithm, a similarity value between the encoded voice data and the preset declaration encoded data to obtain a basic similarity; and determine the encoded voice data corresponding to the largest basic similarity value as the declaration voice encoded data to be detected.
Optionally, the conversion module 302 includes: a matching unit 3021, configured to obtain, based on the speech recognition algorithm, the service voice data to be detected corresponding to the declaration voice encoded data to be detected, extract voice features from the corresponding service voice data to be detected, and match the voice features to obtain the declaration text to be detected; and a generating unit 3022, configured to generate multiple declaration sentence vectors to be detected from the declaration text to be detected by using the bert network model.
Optionally, the matching unit 3021 is specifically configured to: obtain, based on the speech recognition algorithm, the service voice data to be detected corresponding to the declaration voice encoded data to be detected, and extract the voice features from the corresponding service voice data to be detected; convert the voice features into phoneme information, where the phoneme information indicates the smallest phonetic units that make up syllables; and match text information identical to the phoneme information in a preset dictionary to obtain the declaration text to be detected.
Optionally, the generating unit 3022 is specifically configured to: obtain multiple sentence sequences of the declaration text to be detected, and add a preset marker character at the initial position of each sentence sequence to obtain multiple first marked sequences; add a preset separator character between every two adjacent first marked sequences to obtain multiple second marked sequences; and train the multiple second marked sequences by using the bert network model to generate multiple declaration sentence vectors to be detected.
Optionally, the first judging module 304 is specifically configured to: convert the target text to be corrected into a pinyin sentence to be corrected by using the fuzzy matching algorithm, screen out target phonetic symbols in the pinyin sentence to be corrected, convert the target phonetic symbols into similar phonetic symbols, and generate a converted pinyin sentence, where the target phonetic symbols include easily confused finals and/or initials; extract corrected text corresponding to the similar phonetic symbols from the converted pinyin sentence, and compute matching values between the corrected text and standard text in the preset dictionary to obtain multiple basic matching values; when a target matching value is greater than the correction threshold, replace the corrected text corresponding to the target matching value with the corresponding standard text to obtain text to be judged; and screen the text to be judged for preset keywords, and if a preset keyword exists in the text to be judged, determine that declaration-type risk voice data exists in the service voice data.
Optionally, the second judging module 305 is specifically configured to: input the service voice data into a language model, perform character embedding on the service voice data to generate service character vectors, and sort the service character vectors in descending order of sentence length to obtain character vectors to be recognized; perform feature extraction on the character vectors to be recognized through a bidirectional long short-term memory network in the intent recognition algorithm to generate corresponding feature values; assign a value to the length of the character vectors to be recognized, perform a weighted sum of the assigned length and the feature values to obtain a feature weight parameter, and multiply the feature weight parameter by the character vectors to be recognized to obtain a recognized text vector; query a basic intent corresponding to the recognized text vector in a preset intent list, determine the basic intent as the service recognition intent of the character vectors to be recognized, and determine whether a preset risk intent exists in the service recognition intent; and if a preset risk intent exists in the service recognition intent, determine that semantic-type risk voice data exists in the service voice data.
FIG. 3 and FIG. 4 above describe the quality inspection apparatus for service voice in the embodiments of the present application in detail from the perspective of modular functional entities; the quality inspection device for service voice in the embodiments of the present application is described in detail below from the perspective of hardware processing.
FIG. 5 is a schematic structural diagram of a quality inspection device for service voice provided by an embodiment of the present application. The quality inspection device 500 for service voice may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 510 (for example, one or more processors), a memory 520, and one or more storage media 530 (for example, one or more mass storage devices) storing application programs 533 or data 532. The memory 520 and the storage medium 530 may be transient storage or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the quality inspection device 500 for service voice. Furthermore, the processor 510 may be configured to communicate with the storage medium 530 and execute, on the quality inspection device 500 for service voice, the series of instruction operations in the storage medium 530.
The quality inspection device 500 for service voice may further include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, and FreeBSD. Those skilled in the art can understand that the structure of the quality inspection device for service voice shown in FIG. 5 does not constitute a limitation on the quality inspection device for service voice; the device may include more or fewer components than shown, a combination of some components, or a different arrangement of components.
The present application further provides a quality inspection device for service voice. The computer device includes a memory and a processor, where computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processor, the processor is caused to perform the steps of the quality inspection method for service voice in the foregoing embodiments.
The present application further provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores computer instructions, and when the computer instructions are run on a computer, the computer is caused to perform the following steps:
obtaining service voice data, encoding the service voice data with an encoder to obtain encoded voice data, computing a basic similarity between the encoded voice data and preset declaration encoded data, and screening declaration voice encoded data to be detected from the encoded voice data according to the value of the basic similarity; converting the declaration voice encoded data to be detected into declaration text to be detected based on a speech recognition algorithm, and generating multiple declaration sentence vectors to be detected from the declaration text to be detected by using a bert network model; computing a basic similarity probability value between each declaration sentence vector to be detected and a standard declaration sentence vector, and determining the declaration text to be detected whose basic similarity probability is greater than a standard threshold as target text to be corrected; correcting the target text to be corrected by using a fuzzy matching algorithm to obtain text to be judged, screening the text to be judged for preset keywords, and if a preset keyword exists in the text to be judged, determining that declaration-type risk voice data exists in the service voice data; and generating a service recognition intent corresponding to the service voice data through an intent recognition algorithm, determining whether a preset risk intent exists in the service recognition intent, and if a preset risk intent exists in the service recognition intent, determining that semantic-type risk voice data exists in the service voice data.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
The blockchain referred to in the present application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database; it is a chain of data blocks generated in association with one another by cryptographic methods, and each data block contains information of a batch of network transactions, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features therein, and these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

  1. A quality inspection method for service voice, the quality inspection method for service voice comprising:
    obtaining service voice data, encoding the service voice data with an encoder to obtain encoded voice data, computing a basic similarity between the encoded voice data and preset declaration encoded data, and screening declaration voice encoded data to be detected from the encoded voice data according to the value of the basic similarity;
    converting the declaration voice encoded data to be detected into declaration text to be detected based on a speech recognition algorithm, and generating a plurality of declaration sentence vectors to be detected from the declaration text to be detected by using a bert network model;
    computing a basic similarity probability value between each declaration sentence vector to be detected and a standard declaration sentence vector, and determining the declaration text to be detected whose basic similarity probability is greater than a standard threshold as target text to be corrected;
    correcting the target text to be corrected by using a fuzzy matching algorithm to obtain text to be judged, screening the text to be judged for preset keywords, and if a preset keyword exists in the text to be judged, determining that declaration-type risk voice data exists in the service voice data;
    generating a service recognition intent corresponding to the service voice data through an intent recognition algorithm, determining whether a preset risk intent exists in the service recognition intent, and if a preset risk intent exists in the service recognition intent, determining that semantic-type risk voice data exists in the service voice data.
  2. The quality inspection method for service voice according to claim 1, wherein the obtaining service voice data, encoding the service voice data with an encoder to obtain encoded voice data, computing a basic similarity between the encoded voice data and preset declaration encoded data, and screening declaration voice encoded data to be detected from the encoded voice data according to the value of the basic similarity comprises:
    obtaining service voice data, and sampling the service voice data by using a Nyquist sampling algorithm to obtain a service voice waveform;
    quantizing the service voice waveform to obtain quantized voice data, and converting the quantized voice data into digital pulses to generate encoded voice data;
    computing, by using a similarity algorithm, a similarity value between the encoded voice data and the preset declaration encoded data to obtain a basic similarity;
    determining the encoded voice data corresponding to the largest basic similarity value as the declaration voice encoded data to be detected.
  3. The quality inspection method for service voice according to claim 1, wherein the converting the declaration voice encoded data to be detected into declaration text to be detected based on a speech recognition algorithm and generating a plurality of declaration sentence vectors to be detected from the declaration text to be detected by using a bert network model comprises:
    obtaining, based on the speech recognition algorithm, service voice data to be detected corresponding to the declaration voice encoded data to be detected, extracting voice features from the corresponding service voice data to be detected, and matching the voice features to obtain the declaration text to be detected;
    generating a plurality of declaration sentence vectors to be detected from the declaration text to be detected by using the bert network model.
  4. The quality inspection method for service voice according to claim 3, wherein the obtaining, based on the speech recognition algorithm, service voice data to be detected corresponding to the declaration voice encoded data to be detected, extracting voice features from the corresponding service voice data to be detected, and matching the voice features to obtain the declaration text to be detected comprises:
    obtaining, based on the speech recognition algorithm, the service voice data to be detected corresponding to the declaration voice encoded data to be detected, and extracting the voice features from the corresponding service voice data to be detected;
    converting the voice features into phoneme information, wherein the phoneme information indicates the smallest phonetic units that make up syllables;
    matching text information identical to the phoneme information in a preset dictionary to obtain the declaration text to be detected.
  5. The quality inspection method for service voice according to claim 3, wherein the generating a plurality of declaration sentence vectors to be detected from the declaration text to be detected by using the bert network model comprises:
    obtaining a plurality of sentence sequences of the declaration text to be detected, and adding a preset marker character at the initial position of each sentence sequence to obtain a plurality of first marked sequences;
    adding a preset separator character between every two adjacent first marked sequences to obtain a plurality of second marked sequences;
    training the plurality of second marked sequences by using the bert network model to generate a plurality of declaration sentence vectors to be detected.
  6. The quality inspection method for service voice according to claim 4, wherein the correcting the target text to be corrected by using a fuzzy matching algorithm to obtain text to be judged, screening the text to be judged for preset keywords, and if a preset keyword exists in the text to be judged, determining that declaration-type risk voice data exists in the service voice data comprises:
    converting the target text to be corrected into a pinyin sentence to be corrected by using the fuzzy matching algorithm, screening out target phonetic symbols in the pinyin sentence to be corrected, converting the target phonetic symbols into similar phonetic symbols, and generating a converted pinyin sentence, wherein the target phonetic symbols comprise easily confused finals and/or initials;
    extracting corrected text corresponding to the similar phonetic symbols from the converted pinyin sentence, and computing matching values between the corrected text and standard text in a preset dictionary to obtain a plurality of basic matching values;
    when a target matching value is greater than a correction threshold, replacing the corrected text corresponding to the target matching value with the corresponding standard text to obtain the text to be judged;
    screening the text to be judged for preset keywords, and if a preset keyword exists in the text to be judged, determining that declaration-type risk voice data exists in the service voice data.
  7. The quality inspection method for service voice according to any one of claims 1-5, wherein the generating a service recognition intent corresponding to the service voice data through an intent recognition algorithm, determining whether a preset risk intent exists in the service recognition intent, and if a preset risk intent exists in the service recognition intent, determining that semantic-type risk voice data exists in the service voice data comprises:
    inputting the service voice data into a language model, performing character embedding on the service voice data to generate service character vectors, and sorting the service character vectors in descending order of sentence length to obtain character vectors to be recognized;
    performing feature extraction on the character vectors to be recognized through a bidirectional long short-term memory network in the intent recognition algorithm to generate corresponding feature values;
    assigning a value to the length of the character vectors to be recognized, performing a weighted sum of the assigned length and the feature values to obtain a feature weight parameter, and multiplying the feature weight parameter by the character vectors to be recognized to obtain a recognized text vector;
    querying a basic intent corresponding to the recognized text vector in a preset intent list, determining the basic intent as the service recognition intent of the character vectors to be recognized, and determining whether a preset risk intent exists in the service recognition intent;
    if a preset risk intent exists in the service recognition intent, determining that semantic-type risk voice data exists in the service voice data.
  8. A quality inspection device for service voice, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    obtaining service voice data, encoding the service voice data with an encoder to obtain encoded voice data, computing a basic similarity between the encoded voice data and preset declaration encoded data, and screening declaration voice encoded data to be detected from the encoded voice data according to the value of the basic similarity;
    converting the declaration voice encoded data to be detected into declaration text to be detected based on a speech recognition algorithm, and generating a plurality of declaration sentence vectors to be detected from the declaration text to be detected by using a bert network model;
    computing a basic similarity probability value between each declaration sentence vector to be detected and a standard declaration sentence vector, and determining the declaration text to be detected whose basic similarity probability is greater than a standard threshold as target text to be corrected;
    correcting the target text to be corrected by using a fuzzy matching algorithm to obtain text to be judged, screening the text to be judged for preset keywords, and if a preset keyword exists in the text to be judged, determining that declaration-type risk voice data exists in the service voice data;
    generating a service recognition intent corresponding to the service voice data through an intent recognition algorithm, determining whether a preset risk intent exists in the service recognition intent, and if a preset risk intent exists in the service recognition intent, determining that semantic-type risk voice data exists in the service voice data.
  9. The quality inspection device for service voice according to claim 8, wherein when the processor executes the computer-readable instructions to implement the obtaining service voice data, encoding the service voice data with an encoder to obtain encoded voice data, computing the basic similarity between the encoded voice data and the preset declaration encoded data, and screening declaration voice encoded data to be detected from the encoded voice data according to the value of the basic similarity, the following steps are included:
    obtaining service voice data, and sampling the service voice data by using a Nyquist sampling algorithm to obtain a service voice waveform;
    quantizing the service voice waveform to obtain quantized voice data, and converting the quantized voice data into digital pulses to generate encoded voice data;
    computing, by using a similarity algorithm, a similarity value between the encoded voice data and the preset declaration encoded data to obtain a basic similarity;
    determining the encoded voice data corresponding to the largest basic similarity value as the declaration voice encoded data to be detected.
  10. The quality inspection device for service voice according to claim 8, wherein when the processor executes the computer-readable instructions to implement the converting the declaration voice encoded data to be detected into declaration text to be detected based on the speech recognition algorithm and generating a plurality of declaration sentence vectors to be detected from the declaration text to be detected by using the bert network model, the following steps are included:
    obtaining, based on the speech recognition algorithm, service voice data to be detected corresponding to the declaration voice encoded data to be detected, extracting voice features from the corresponding service voice data to be detected, and matching the voice features to obtain the declaration text to be detected;
    generating a plurality of declaration sentence vectors to be detected from the declaration text to be detected by using the bert network model.
  11. The quality inspection device for service voice according to claim 10, wherein when the processor executes the computer-readable instructions to implement the obtaining, based on the speech recognition algorithm, the service voice data to be detected corresponding to the declaration voice encoded data to be detected, extracting the voice features from the corresponding service voice data to be detected, and matching the voice features to obtain the declaration text to be detected, the following steps are included:
    obtaining, based on the speech recognition algorithm, the service voice data to be detected corresponding to the declaration voice encoded data to be detected, and extracting the voice features from the corresponding service voice data to be detected;
    converting the voice features into phoneme information, wherein the phoneme information indicates the smallest phonetic units that make up syllables;
    matching text information identical to the phoneme information in a preset dictionary to obtain the declaration text to be detected.
  12. The quality inspection device for service voice according to claim 10, wherein when the processor executes the computer-readable instructions to implement the generating a plurality of declaration sentence vectors to be detected from the declaration text to be detected by using the bert network model, the following steps are included:
    obtaining a plurality of sentence sequences of the declaration text to be detected, and adding a preset marker character at the initial position of each sentence sequence to obtain a plurality of first marked sequences;
    adding a preset separator character between every two adjacent first marked sequences to obtain a plurality of second marked sequences;
    training the plurality of second marked sequences by using the bert network model to generate a plurality of declaration sentence vectors to be detected.
  13. The quality inspection device for service voice according to claim 11, wherein when the processor executes the computer-readable instructions to implement the correcting the target text to be corrected by using the fuzzy matching algorithm to obtain text to be judged, screening the text to be judged for preset keywords, and if a preset keyword exists in the text to be judged, determining that declaration-type risk voice data exists in the service voice data, the following steps are further included:
    converting the target text to be corrected into a pinyin sentence to be corrected by using the fuzzy matching algorithm, screening out target phonetic symbols in the pinyin sentence to be corrected, converting the target phonetic symbols into similar phonetic symbols, and generating a converted pinyin sentence, wherein the target phonetic symbols comprise easily confused finals and/or initials;
    extracting corrected text corresponding to the similar phonetic symbols from the converted pinyin sentence, and computing matching values between the corrected text and standard text in a preset dictionary to obtain a plurality of basic matching values;
    when a target matching value is greater than a correction threshold, replacing the corrected text corresponding to the target matching value with the corresponding standard text to obtain the text to be judged;
    screening the text to be judged for preset keywords, and if a preset keyword exists in the text to be judged, determining that declaration-type risk voice data exists in the service voice data.
  14. The quality inspection device for service voice according to any one of claims 8-12, wherein when the processor executes the computer-readable instructions to implement the generating the service recognition intent corresponding to the service voice data through the intent recognition algorithm, determining whether a preset risk intent exists in the service recognition intent, and if a preset risk intent exists in the service recognition intent, determining that semantic-type risk voice data exists in the service voice data, the following steps are included:
    inputting the service voice data into a language model, performing character embedding on the service voice data to generate service character vectors, and sorting the service character vectors in descending order of sentence length to obtain character vectors to be recognized;
    performing feature extraction on the character vectors to be recognized through a bidirectional long short-term memory network in the intent recognition algorithm to generate corresponding feature values;
    assigning a value to the length of the character vectors to be recognized, performing a weighted sum of the assigned length and the feature values to obtain a feature weight parameter, and multiplying the feature weight parameter by the character vectors to be recognized to obtain a recognized text vector;
    querying a basic intent corresponding to the recognized text vector in a preset intent list, determining the basic intent as the service recognition intent of the character vectors to be recognized, and determining whether a preset risk intent exists in the service recognition intent;
    if a preset risk intent exists in the service recognition intent, determining that semantic-type risk voice data exists in the service voice data.
  15. A computer-readable storage medium, wherein computer instructions are stored in the computer-readable storage medium, and when the computer instructions are run on a computer, the computer is caused to perform the following steps:
    obtaining service voice data, encoding the service voice data with an encoder to obtain encoded voice data, computing a basic similarity between the encoded voice data and preset declaration encoded data, and screening declaration voice encoded data to be detected from the encoded voice data according to the value of the basic similarity;
    converting the declaration voice encoded data to be detected into declaration text to be detected based on a speech recognition algorithm, and generating a plurality of declaration sentence vectors to be detected from the declaration text to be detected by using a bert network model;
    computing a basic similarity probability value between each declaration sentence vector to be detected and a standard declaration sentence vector, and determining the declaration text to be detected whose basic similarity probability is greater than a standard threshold as target text to be corrected;
    correcting the target text to be corrected by using a fuzzy matching algorithm to obtain text to be judged, screening the text to be judged for preset keywords, and if a preset keyword exists in the text to be judged, determining that declaration-type risk voice data exists in the service voice data;
    generating a service recognition intent corresponding to the service voice data through an intent recognition algorithm, determining whether a preset risk intent exists in the service recognition intent, and if a preset risk intent exists in the service recognition intent, determining that semantic-type risk voice data exists in the service voice data.
  16. The computer-readable storage medium according to claim 15, wherein when the computer instructions are run on a computer, the computer is caused to further perform the following steps:
    obtaining service voice data, and sampling the service voice data by using a Nyquist sampling algorithm to obtain a service voice waveform;
    quantizing the service voice waveform to obtain quantized voice data, and converting the quantized voice data into digital pulses to generate encoded voice data;
    computing, by using a similarity algorithm, a similarity value between the encoded voice data and the preset declaration encoded data to obtain a basic similarity;
    determining the encoded voice data corresponding to the largest basic similarity value as the declaration voice encoded data to be detected.
  17. The computer-readable storage medium according to claim 15, wherein when the computer instructions are run on a computer, the computer is caused to further perform the following steps:
    obtaining, based on the speech recognition algorithm, service voice data to be detected corresponding to the declaration voice encoded data to be detected, extracting voice features from the corresponding service voice data to be detected, and matching the voice features to obtain the declaration text to be detected;
    generating a plurality of declaration sentence vectors to be detected from the declaration text to be detected by using the bert network model.
  18. The computer-readable storage medium according to claim 17, wherein when the computer instructions are run on a computer, the computer is caused to further perform the following steps:
    obtaining, based on the speech recognition algorithm, the service voice data to be detected corresponding to the declaration voice encoded data to be detected, and extracting the voice features from the corresponding service voice data to be detected;
    converting the voice features into phoneme information, wherein the phoneme information indicates the smallest phonetic units that make up syllables;
    matching text information identical to the phoneme information in a preset dictionary to obtain the declaration text to be detected.
  19. The computer-readable storage medium according to claim 17, wherein when the computer instructions are run on a computer, the computer is caused to further perform the following steps:
    obtaining a plurality of sentence sequences of the declaration text to be detected, and adding a preset marker character at the initial position of each sentence sequence to obtain a plurality of first marked sequences;
    adding a preset separator character between every two adjacent first marked sequences to obtain a plurality of second marked sequences;
    training the plurality of second marked sequences by using the bert network model to generate a plurality of declaration sentence vectors to be detected.
  20. A quality inspection apparatus for service voice, the quality inspection apparatus for service voice comprising:
    a screening module, configured to obtain service voice data, encode the service voice data with an encoder to obtain encoded voice data, compute a basic similarity between the encoded voice data and preset declaration encoded data, and screen declaration voice encoded data to be detected from the encoded voice data according to the value of the basic similarity;
    a conversion module, configured to convert the declaration voice encoded data to be detected into declaration text to be detected based on a speech recognition algorithm, and generate a plurality of declaration sentence vectors to be detected from the declaration text to be detected by using a bert network model;
    a determination module, configured to compute a basic similarity probability value between each declaration sentence vector to be detected and a standard declaration sentence vector, and determine the declaration text to be detected whose basic similarity probability is greater than a standard threshold as target text to be corrected;
    a first judging module, configured to correct the target text to be corrected by using a fuzzy matching algorithm to obtain text to be judged, screen the text to be judged for preset keywords, and if a preset keyword exists in the text to be judged, determine that declaration-type risk voice data exists in the service voice data;
    a second judging module, configured to generate a service recognition intent corresponding to the service voice data through an intent recognition algorithm, determine whether a preset risk intent exists in the service recognition intent, and if a preset risk intent exists in the service recognition intent, determine that semantic-type risk voice data exists in the service voice data.
PCT/CN2021/090410 2020-12-15 2021-04-28 Quality inspection method, apparatus, device and storage medium for service voice WO2022126969A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011476012.3A CN112634903B (zh) 2020-12-15 2020-12-15 Quality inspection method, apparatus, device and storage medium for service voice
CN202011476012.3 2020-12-15

Publications (1)

Publication Number Publication Date
WO2022126969A1 true WO2022126969A1 (zh) 2022-06-23

Family

ID=75313574

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090410 WO2022126969A1 (zh) 2020-12-15 2021-04-28 业务语音的质检方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN112634903B (zh)
WO (1) WO2022126969A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634903B (zh) 2020-12-15 2023-09-29 平安科技(深圳)有限公司 Quality inspection method, apparatus, device and storage medium for service voice
CN114049890A (zh) 2021-11-03 2022-02-15 杭州逗酷软件科技有限公司 Voice control method and apparatus, and electronic device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108737667A (zh) * 2018-05-03 2018-11-02 平安科技(深圳)有限公司 Voice quality inspection method and apparatus, computer device, and storage medium
CN109389971A (zh) * 2018-08-17 2019-02-26 深圳壹账通智能科技有限公司 Speech-recognition-based insurance recording quality inspection method, apparatus, device, and medium
CN110176252A (zh) * 2019-05-08 2019-08-27 江西尚通科技发展股份有限公司 Intelligent voice quality inspection method and system based on a risk management and control mode
CN110378562A (zh) * 2019-06-17 2019-10-25 中国平安人寿保险股份有限公司 Voice quality inspection method and apparatus, computer device, and storage medium
CN111445928A (zh) * 2020-03-31 2020-07-24 深圳前海微众银行股份有限公司 Voice quality inspection method, apparatus, device, and storage medium
CN111696528A (zh) * 2020-06-20 2020-09-22 龙马智芯(珠海横琴)科技有限公司 Voice quality inspection method and apparatus, quality inspection device, and readable storage medium
CN111883115A (zh) * 2020-06-17 2020-11-03 马上消费金融股份有限公司 Method and apparatus for voice process quality inspection
CN112036705A (zh) * 2020-08-05 2020-12-04 苏宁金融科技(南京)有限公司 Quality inspection result data acquisition method, apparatus, and device
CN112634903A (zh) * 2020-12-15 2021-04-09 平安科技(深圳)有限公司 Quality inspection method, apparatus, device and storage medium for service voice

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016095399A (ja) * 2014-11-14 2016-05-26 日本電信電話株式会社 Speech recognition result shaping apparatus, method, and program
US10176798B2 (en) * 2015-08-28 2019-01-08 Intel Corporation Facilitating dynamic and intelligent conversion of text into real user speech
US9865249B2 (en) * 2016-03-22 2018-01-09 GM Global Technology Operations LLC Realtime assessment of TTS quality using single ended audio quality measurement
CN109658923B (zh) * 2018-10-19 2024-01-30 平安科技(深圳)有限公司 Artificial-intelligence-based voice quality inspection method, device, storage medium, and apparatus
CN110597964B (zh) * 2019-09-27 2023-04-07 神州数码融信软件有限公司 Dual-recording quality inspection semantic analysis method and apparatus, and dual-recording quality inspection system
CN111405128B (zh) * 2020-03-24 2022-02-18 中国—东盟信息港股份有限公司 Call quality inspection system based on speech-to-text conversion
CN111696557A (zh) * 2020-06-23 2020-09-22 深圳壹账通智能科技有限公司 Method, apparatus, device, and storage medium for calibrating speech recognition results
CN112069796B (zh) * 2020-09-03 2023-08-04 阳光保险集团股份有限公司 Voice quality inspection method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN112634903B (zh) 2023-09-29
CN112634903A (zh) 2021-04-09

Similar Documents

Publication Publication Date Title
CN110263322B (zh) Audio corpus screening method and apparatus for speech recognition, and computer device
CN108737667B (zh) Voice quality inspection method and apparatus, computer device, and storage medium
WO2020200178A1 (zh) Speech synthesis method and apparatus, and computer-readable storage medium
JP5014785B2 (ja) Phonetic-based speech recognition system and method
CN109979432B (zh) Dialect translation method and apparatus
KR102625184B1 (ko) Speech synthesis training for generating a unique speech sound
WO2022126969A1 (zh) Quality inspection method, apparatus, device and storage medium for service voice
JP2020027193A (ja) Voice conversion learning apparatus, voice conversion apparatus, method, and program
CN114038447A (zh) Training method for a speech synthesis model, speech synthesis method, apparatus, and medium
CA3162378A1 (en) A text-to-speech synthesis method and system, a method of training a text-to-speech synthesis system, and a method of calculating an expressivity score
CN110503956B (zh) Speech recognition method, apparatus, medium, and electronic device
JP2016062069A (ja) Speech recognition method and speech recognition apparatus
JP2017032738A (ja) Utterance intent model learning apparatus, utterance intent extraction apparatus, utterance intent model learning method, utterance intent extraction method, and program
CN114360557B (zh) Voice timbre conversion method, model training method, apparatus, device, and medium
CN113450757A (zh) Speech synthesis method and apparatus, electronic device, and computer-readable storage medium
Kurian et al. Continuous speech recognition system for Malayalam language using PLP cepstral coefficient
Sahu et al. A study on automatic speech recognition toolkits
JP2021039219A (ja) Audio signal processing apparatus, audio signal processing method, audio signal processing program, learning apparatus, learning method, and learning program
CN112885335A (zh) Speech recognition method and related apparatus
CN112669810A (zh) Effect evaluation method and apparatus for speech synthesis, computer device, and storage medium
Tarján et al. A bilingual study on the prediction of morph-based improvement.
Chen et al. Mismatched crowdsourcing based language perception for under-resourced languages
Kurian et al. Connected digit speech recognition system for Malayalam language
Rahim et al. Robust numeric recognition in spoken language dialogue
Sabu et al. Improving the Noise Robustness of Prominence Detection for Children's Oral Reading Assessment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21904908

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21904908

Country of ref document: EP

Kind code of ref document: A1