WO2021004128A1 - Method, apparatus, computer device and storage medium for voice quality inspection - Google Patents

Method, apparatus, computer device and storage medium for voice quality inspection

Info

Publication number
WO2021004128A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
audio data
detected
keyword set
customer
Prior art date
Application number
PCT/CN2020/086625
Other languages
English (en)
French (fr)
Inventor
熊玮
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2021004128A1 publication Critical patent/WO2021004128A1/zh


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/04 — Segmentation; Word boundary detection
    • G10L 15/06 — Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/08 — Speech classification or search
    • G10L 21/00 — Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208 — Noise filtering
    • G10L 25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/03 — characterised by the type of extracted parameters
    • G10L 25/18 — the extracted parameters being spectral information of each sub-band
    • G10L 25/48 — specially adapted for particular use
    • G10L 25/51 — for comparison or discrimination
    • G10L 25/60 — for measuring the quality of voice signals
    • G10L 2015/088 — Word spotting
    • G10L 2021/02087 — Noise filtering, the noise being separate speech, e.g. cocktail party

Definitions

  • This application relates to the technical field of speech processing in artificial intelligence, and in particular to a method, device, computer equipment and storage medium for speech quality inspection.
  • monitoring the business service process traditionally includes: recording audio and video of the service process simultaneously; obtaining the business service video after the business service ends; manually and repeatedly listening to and quality-checking the dialogue in the business service video in the back office; and, when the quality inspection finds a problem in a particular dialogue, notifying the salesperson and the customer to make a supplementary recording.
  • a method for voice quality inspection includes:
  • the audio to be detected is segmented into multiple audio segments according to a preset voice segmentation algorithm, and the audio segments belonging to the same speaker among the multiple segments are merged according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data;
  • the second keyword set includes a script keyword set and a prohibited keyword set;
  • a device for voice quality inspection includes:
  • the acquiring module is used to acquire, in real time, the video to be detected of each recording node in the video recording process and the count threshold corresponding to that video, and to extract the audio to be detected of each recording node from the video to be detected;
  • the extraction module is used to segment the audio to be detected into multiple audio segments according to a preset voice segmentation algorithm, and to merge the audio segments belonging to the same speaker among the multiple segments according to a preset voice clustering algorithm, obtaining salesperson audio data and customer audio data;
  • the detection module is used to detect the customer audio data according to a preset first keyword set and to detect the salesperson audio data according to a preset second keyword set, where the second keyword set includes a script keyword set and a prohibited keyword set;
  • the processing module is used to determine that the detection result of the audio to be detected is a failed detection and to generate a supplementary recording prompt when the number of occurrences of a mandatory keyword of the first keyword set in the customer audio data is not equal to the count threshold, or no script keyword of the script keyword set is present in the salesperson audio data, or a prohibited keyword of the prohibited keyword set is present in the salesperson audio data;
  • the detection module is further used to obtain multiple mandatory keywords from the preset first keyword set, convert the customer audio data into customer text data, traverse the customer text data according to each mandatory keyword, count the number of occurrences of each mandatory keyword in the customer text data, and determine the number of occurrences of each mandatory keyword in the customer audio data from those counts.
  • a computer device includes a memory and a processor, the memory stores a computer program, and the processor implements the following steps when executing the computer program:
  • the audio to be detected is segmented into multiple audio segments according to a preset voice segmentation algorithm, and the audio segments belonging to the same speaker among the multiple segments are merged according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data;
  • the second keyword set includes a script keyword set and a prohibited keyword set;
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed by a processor, the following steps are implemented:
  • the audio to be detected is segmented into multiple audio segments according to a preset voice segmentation algorithm, and the audio segments belonging to the same speaker among the multiple segments are merged according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data;
  • the second keyword set includes a script keyword set and a prohibited keyword set;
  • the detection result of the audio to be detected is determined to be a failed detection, and a supplementary recording prompt is generated.
  • the above method, apparatus, computer device and storage medium for voice quality inspection detect the customer audio data according to the preset first keyword set and detect the salesperson audio data according to the preset second keyword set, so that the customer audio data and the salesperson audio data are detected separately, and the detection result of the audio to be detected is determined from those detection results.
  • when the detection result of the audio to be detected is a failed detection, a supplementary recording prompt is generated.
  • FIG. 1 is a schematic flowchart of a method for voice quality inspection in an embodiment
  • FIG. 2 is a schematic diagram of a sub-flow of step S106 in FIG. 1 in an embodiment
  • FIG. 3 is a schematic diagram of a sub-flow of step S102 in FIG. 2 in an embodiment
  • FIG. 4 is a schematic diagram of a sub-process of step S106 in FIG. 1 in an embodiment
  • FIG. 5 is a schematic flowchart of a voice quality inspection method in another embodiment
  • FIG. 6 is a schematic diagram of a sub-flow of step S104 in FIG. 1 in an embodiment;
  • FIG. 7 is a structural block diagram of a voice quality inspection device in an embodiment;
  • FIG. 8 is an internal structure diagram of a computer device in an embodiment.
  • a method for voice quality inspection which includes the following steps:
  • S102: Acquire, in real time, the video to be detected of each recording node in the video recording process and the count threshold corresponding to the video to be detected, and extract the audio to be detected of each recording node from the video to be detected.
  • the video to be detected refers to the video data of each recording node that the terminal collects and sends to the server during the video recording process.
  • the video recording process includes multiple recording stages, and each recording stage has a corresponding recording node.
  • after obtaining the video to be detected, the server strips apart the audio and the images in the video to be detected and extracts the audio to be detected of each recording node; a minimal extraction sketch follows below.
  • the count threshold corresponding to the video to be detected refers to the number of times the mandatory keywords corresponding to the video to be detected must appear.
  • the mandatory keywords are the words that the customer must mention in the recording stage corresponding to the recording node, and are used to detect the customer audio data.
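  • As an illustration of the audio-stripping step only (the application does not name any particular tool), the audio track of a recording node's video can be extracted with ffmpeg; the file names below are hypothetical.

```python
import subprocess

def extract_audio(video_path: str, audio_path: str, sample_rate: int = 16000) -> None:
    """Strip the audio track from one recording node's video into a mono 16 kHz WAV file."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path,      # video to be detected for this recording node
         "-vn",                                  # drop the image stream, keep only the audio
         "-acodec", "pcm_s16le",                 # uncompressed 16-bit PCM
         "-ar", str(sample_rate), "-ac", "1",    # fixed sample rate, single channel
         audio_path],
        check=True,
    )

# extract_audio("node_03_video.mp4", "node_03_audio.wav")
```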
  • S104: Segment the audio to be detected into multiple audio segments according to a preset voice segmentation algorithm, and merge the audio segments belonging to the same speaker among the multiple audio segments according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data.
  • the audio to be detected contains both salesperson audio data and customer audio data, so when detecting the audio the server needs to separate the two.
  • when separating the audio to be detected, a voice segmentation algorithm and a voice clustering algorithm can be used, following a segment-then-cluster approach: the voice segmentation algorithm first splits the audio to be detected into multiple audio segments, and the voice clustering algorithm then merges the segments belonging to the same speaker, yielding salesperson audio data and customer audio data; a structural sketch follows.
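  • The segment-then-cluster separation can be pictured with the minimal sketch below; segment_audio and cluster_segments stand in for the preset segmentation and clustering algorithms, which the application leaves unspecified, and the cluster labels are mapped to salesperson or customer afterwards (e.g. by voiceprint matching, as described later).

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AudioSegment:
    start: float           # segment start time (seconds)
    end: float             # segment end time (seconds)
    speaker: str = ""      # cluster label filled in by the clustering step

def separate_speakers(audio,
                      segment_audio: Callable[[object], List[AudioSegment]],
                      cluster_segments: Callable[[List[AudioSegment]], List[AudioSegment]]):
    """Split the audio to be detected, then merge segments of the same speaker."""
    segments = segment_audio(audio)          # each segment contains one speaker only
    labeled = cluster_segments(segments)     # assigns a cluster label such as "A" / "B"
    by_speaker = {}
    for seg in labeled:
        by_speaker.setdefault(seg.speaker, []).append(seg)
    return by_speaker                        # two clusters: salesperson and customer audio
```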
  • S106: Detect the customer audio data according to the preset first keyword set, and detect the salesperson audio data according to the preset second keyword set, where the second keyword set includes a script keyword set and a prohibited keyword set.
  • the preset first keyword set includes multiple mandatory keywords. The mandatory keywords are the words that the customer must mention in the recording stage corresponding to the recording node; the prohibited keywords are the words that the salesperson must not mention in that recording stage; and the script keywords are the words that the salesperson must mention in that recording stage.
  • the server detects the customer audio data according to the preset first keyword set, counts the number of times the mandatory keywords appear in the customer audio data, and determines the detection result of the customer audio data by comparing the counts with the count threshold corresponding to the video to be detected.
  • by detecting the salesperson audio data, the server can determine whether the salesperson mentioned the script keywords and whether the salesperson avoided the prohibited keywords, and then determines the detection result of the salesperson audio data from what was mentioned.
  • when the number of occurrences of a mandatory keyword of the first keyword set in the customer audio data is not equal to the count threshold, the detection result of the customer audio data is a failed detection.
  • when the detection result of either the customer audio data or the salesperson audio data is a failed detection, the detection result of the audio to be detected is a failed detection; the server then generates a supplementary recording prompt that tells the customer and the salesperson why the recording failed, so that they can avoid the same mistakes when making the supplementary recording on site.
  • the voice quality inspection method described above detects the customer audio data according to the preset first keyword set and detects the salesperson audio data according to the preset second keyword set, so that the customer audio data and the salesperson audio data are detected separately, and the detection result of the audio to be detected is determined from those detection results; when that result is a failed detection, a supplementary recording prompt is generated.
  • in this way, the audio to be detected of each recording node is quality-checked in real time during the video recording process, the dialogue in every stage of the video recording process is monitored in a timely manner, and the efficiency of monitoring the business service process is improved; a minimal sketch of the pass/fail rule follows.
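  • The failure rule above reduces to three conditions; a minimal sketch of the decision, with all counts and keyword hits assumed to be computed beforehand, is:

```python
def inspect_node(mandatory_counts: dict, count_threshold: int,
                 script_hits: set, prohibited_hits: set) -> dict:
    """Pass/fail decision for one recording node's audio to be detected."""
    customer_ok = all(n == count_threshold for n in mandatory_counts.values())
    salesperson_ok = bool(script_hits) and not prohibited_hits
    result = {"passed": customer_ok and salesperson_ok}
    if not result["passed"]:
        reasons = []
        if not customer_ok:
            reasons.append("mandatory keyword counts do not match the count threshold")
        if not script_hits:
            reasons.append("no script keyword was mentioned by the salesperson")
        if prohibited_hits:
            reasons.append("prohibited keywords were mentioned: " + ", ".join(sorted(prohibited_hits)))
        # The supplementary recording prompt tells both parties why the node failed.
        result["supplementary_recording_prompt"] = "; ".join(reasons)
    return result
```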
  • S106 includes:
  • S202: Obtain multiple mandatory keywords from the preset first keyword set;
  • S204: Convert the customer audio data into customer text data;
  • S206: Traverse the customer text data according to each mandatory keyword and count the number of times each mandatory keyword appears in the customer text data;
  • S208: Obtain the number of occurrences of each mandatory keyword in the customer audio data from the number of occurrences of each mandatory keyword in the customer text data.
  • the mandatory keywords are the words that the customer must mention in the recording stage corresponding to the recording node; the server can obtain multiple mandatory keywords from the preset first keyword set, which contains multiple mandatory keywords. When detecting the customer audio data against the mandatory keywords, the customer audio data is first converted into customer text data.
  • in the recording stage corresponding to each recording node the salesperson asks the customer questions, and the customer answers by mentioning the mandatory keywords, so the customer audio data can be detected on the basis of the mandatory keywords.
  • because the number of questions differs between recording stages, the number of times the customer mentions the mandatory keywords also differs; the number of times the customer should mention the mandatory keywords in the recording stage corresponding to the recording node, i.e. the count threshold corresponding to the video to be detected, is therefore determined first, and the count threshold is then compared with the number of occurrences of each mandatory keyword in the customer audio data to determine the detection result of the customer audio data. Only when the number of occurrences of every mandatory keyword in the customer audio data equals the count threshold can the detection result of the customer audio data be regarded as a passed detection.
  • the count threshold can be determined from the dialogue template of the recording node.
  • in the above embodiment, the number of occurrences of each mandatory keyword in the customer audio data is obtained from the number of occurrences of each mandatory keyword in the customer text data, so that the server can determine the detection result of the customer audio data from those counts, realizing the detection of the customer audio data; a counting sketch follows.
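  • A minimal sketch of the counting step, assuming the customer audio has already been converted to text by some speech-to-text service (the service itself is not specified by the application); the transcript below is invented for illustration.

```python
def count_mandatory_keywords(customer_text: str, mandatory_keywords: list) -> dict:
    """Traverse the customer text data and count each mandatory keyword."""
    return {kw: customer_text.count(kw) for kw in mandatory_keywords}

customer_text = "是的，我确认购买该产品，确认知晓相关风险。"     # example transcript
counts = count_mandatory_keywords(customer_text, ["确认", "知晓"])
# counts == {"确认": 2, "知晓": 1}; these are then compared with the count threshold
```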
  • S102 includes:
  • S302: Acquire, in real time, the dialogue template of the recording node corresponding to the video to be detected;
  • S304: Count, according to the first keyword set, the number of times each mandatory keyword appears in the dialogue template;
  • S306: Obtain the count threshold from the number of times each mandatory keyword appears in the dialogue template.
  • through the node identifier carried by the recording node, the server can acquire the dialogue template corresponding to the recording node from a preset dialogue-template database in real time, obtain multiple mandatory keywords from the first keyword set, traverse the dialogue template according to each mandatory keyword, and count the number of times each mandatory keyword appears in the dialogue template; the number of times each mandatory keyword appears in the dialogue template is the number of times the customer should mention that mandatory keyword in the recording stage corresponding to the recording node, i.e. the count threshold.
  • in the above embodiment, the dialogue template of the recording node corresponding to the video to be detected is acquired in real time, the number of times each mandatory keyword appears in the dialogue template is counted according to the first keyword set, and the count threshold is obtained from those counts, so that the server can detect the customer audio data against the count threshold; a sketch follows.
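  • A sketch of deriving the count threshold from the dialogue template and of the customer-side check; the template text is invented for illustration.

```python
def count_threshold_from_template(dialogue_template: str, mandatory_keywords: list) -> dict:
    """The expected count of each mandatory keyword is its count in the node's dialogue template."""
    return {kw: dialogue_template.count(kw) for kw in mandatory_keywords}

def detect_customer_audio(counts: dict, thresholds: dict) -> bool:
    """Customer audio passes only when every mandatory keyword count equals its threshold."""
    return all(counts.get(kw, 0) == t for kw, t in thresholds.items())

template = "请您确认投保内容……请您再次确认并表示知晓相关风险。"
thresholds = count_threshold_from_template(template, ["确认", "知晓"])   # {"确认": 2, "知晓": 1}
```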
  • S106 includes:
  • S402: Convert the salesperson audio data into salesperson text data;
  • S404: Obtain the script template of the recording node corresponding to the video to be detected, and extract the corresponding script information from the salesperson text data according to the script template;
  • S406: Obtain the script keywords from the second keyword set, and match the script information against the script keywords;
  • S408: Obtain the prohibited keywords from the second keyword set, and traverse the salesperson text data according to the prohibited keywords.
  • when detecting the salesperson audio data, the server needs to convert the salesperson audio data into salesperson text data, obtain the script template of the recording node corresponding to the video to be detected, extract the corresponding script information from the salesperson text data according to the script template, and obtain the script keywords from the second keyword set. The script keywords are the words that the salesperson must mention in the recording stage corresponding to the recording node; by detecting the salesperson audio data, the server determines whether the salesperson mentioned the script keywords, and when the salesperson did mention them, the first detection result of the salesperson audio data is a passed detection.
  • besides detecting the salesperson audio data against the script keywords, the server also needs to detect it against the prohibited keywords, which can be obtained from the second keyword set. The prohibited keywords are the words that the salesperson must not mention in the recording stage corresponding to the recording node; the salesperson audio data is checked to determine whether the salesperson avoided the prohibited keywords, and when the salesperson did not mention any prohibited keyword, the second detection result of the salesperson audio data is a passed detection. Only when both the first detection result and the second detection result are passed detections can the detection result of the salesperson audio data be determined to be a passed detection.
  • in the above embodiment, the salesperson audio data is detected against the script keywords and the prohibited keywords, realizing the detection of the salesperson audio data; a sketch of the two checks follows.
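  • A minimal sketch of the two salesperson-side checks, assuming the salesperson audio has already been transcribed; how the script information is extracted from the script template is left abstract here, since the application does not detail it.

```python
def detect_salesperson_text(salesperson_text: str,
                            script_keywords: set,
                            prohibited_keywords: set) -> bool:
    """First check: at least one script keyword was mentioned.
    Second check: no prohibited keyword was mentioned."""
    first_result = any(kw in salesperson_text for kw in script_keywords)
    second_result = not any(kw in salesperson_text for kw in prohibited_keywords)
    return first_result and second_result
```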
  • after S106, the method further includes:
  • S502: When the number of occurrences of the mandatory keywords of the first keyword set in the customer audio data reaches the count threshold, script keywords of the script keyword set are present in the salesperson audio data, and no prohibited keyword of the prohibited keyword set is present in the salesperson audio data, determine that the detection result of the audio to be detected is a passed detection.
  • when the number of occurrences of the mandatory keywords of the first keyword set in the customer audio data reaches the count threshold, the server may determine that the detection result of the customer audio data is a passed detection.
  • when script keywords of the script keyword set are present in the salesperson audio data and no prohibited keyword of the prohibited keyword set is present in the salesperson audio data, the server may determine that the detection result of the salesperson audio data is a passed detection; when the detection results of both the customer audio data and the salesperson audio data are passed detections, the server can determine that the detection result of the audio to be detected is a passed detection.
  • in the above embodiment, the detection result of the audio to be detected is determined from the detection results of the customer audio data and the salesperson audio data, realizing the determination of the detection result of the audio to be detected.
  • S104 includes:
  • S602 Perform filtering processing on the audio to be detected to filter out noise and environmental sounds in the audio to be detected;
  • S604 Split the filtered audio to be detected into multiple audio segments according to a preset voice segmentation algorithm
  • S606 Combine audio segments belonging to the same speaker among multiple audio segments according to a preset voice clustering algorithm to obtain salesperson audio data and customer audio data.
  • because the audio to be detected may contain noise and environmental sounds, the server first needs to filter the audio to be detected to remove the noise and environmental sounds, and then processes the filtered audio with the voice segmentation algorithm and the voice clustering algorithm to obtain the salesperson audio data and the customer audio data.
  • the voice segmentation algorithm performs speaker change-point detection, i.e. it locates the points in the voice data at which the speaker identity changes.
  • common voice segmentation algorithms are usually based on a sliding-window change-point detection algorithm built on Gaussian models: the distance between adjacent voice windows is observed and computed, and a threshold or penalty factor is used to decide whether the two stretches of speech come from the same speaker.
  • the threshold or penalty factor can be obtained by collecting training-set data.
  • the voice segmentation algorithm splits the audio to be detected into multiple audio segments, each of which contains the audio data of only one person.
  • the voice clustering algorithm builds on the voice segmentation algorithm and merges the audio segments belonging to the same speaker.
  • common voice clustering algorithms fall into two categories, top-down clustering and bottom-up clustering: each audio segment obtained after segmentation is treated as one class, and the two closest classes are merged repeatedly according to the BIC (Bayesian Information Criterion) distance until merging voice segments no longer causes the BIC value to increase, which yields two classes of audio data.
  • after obtaining the two classes of audio data, the server further analyses them and extracts the voiceprint features of each class; by matching these voiceprint features against the salesperson voiceprint features in a preset salesperson information database, the server determines which of the two classes is the salesperson audio data, and the other class is the customer audio data.
  • in the above embodiment, the audio to be detected is filtered to remove noise and environmental sounds, the voice segmentation algorithm splits the filtered audio into multiple audio segments, and the voice clustering algorithm clusters the segments into salesperson audio data and customer audio data, realizing the extraction of the salesperson audio data and the customer audio data; a clustering sketch follows.
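  • The bottom-up BIC clustering described above can be sketched with numpy as below. Each segment is a matrix of per-frame feature vectors (e.g. MFCCs, not computed here); delta_bic follows the standard Gaussian-model formulation, and the penalty weight and the exact stopping convention used by the application are not specified, so they are assumptions of this sketch.

```python
import numpy as np

def delta_bic(x: np.ndarray, y: np.ndarray, lam: float = 1.0) -> float:
    """BIC change when two segments are modelled by one Gaussian instead of two.
    Negative values favour merging, i.e. the two segments likely share a speaker."""
    n1, n2, d = len(x), len(y), x.shape[1]
    def logdet(m):
        # log-determinant of the (regularised) covariance of the frame features
        return np.linalg.slogdet(np.cov(m, rowvar=False) + 1e-6 * np.eye(d))[1]
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n1 + n2)
    return 0.5 * ((n1 + n2) * logdet(np.vstack([x, y])) - n1 * logdet(x) - n2 * logdet(y)) - penalty

def cluster_by_bic(segments):
    """Greedily merge the most similar pair of segment clusters until only two remain
    or no merge is favourable; the two clusters ideally correspond to the two speakers."""
    clusters = [np.asarray(s, dtype=float) for s in segments]
    while len(clusters) > 2:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: delta_bic(clusters[p[0]], clusters[p[1]]))
        if delta_bic(clusters[i], clusters[j]) > 0:
            break
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        del clusters[j]
    return clusters
```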
  • a voice quality inspection device which includes: an acquisition module 702, an extraction module 704, a detection module 706, and a processing module 708, wherein:
  • the obtaining module 702 is configured to obtain, in real time, the video to be detected of each recording node in the video recording process and the count threshold corresponding to that video, and to extract the audio to be detected of each recording node from the video to be detected;
  • the extraction module 704 is used to segment the audio to be detected into multiple audio segments according to a preset voice segmentation algorithm, and to merge the audio segments belonging to the same speaker among the multiple segments according to a preset voice clustering algorithm, obtaining salesperson audio data and customer audio data;
  • the detection module 706 is configured to detect the customer audio data according to a preset first keyword set and to detect the salesperson audio data according to a preset second keyword set, where the second keyword set includes a script keyword set and a prohibited keyword set;
  • the processing module 708 is configured to determine that the detection result of the audio to be detected is a failed detection and to generate a supplementary recording prompt when the number of occurrences of a mandatory keyword of the first keyword set in the customer audio data is not equal to the count threshold, or no script keyword of the script keyword set is present in the salesperson audio data, or a prohibited keyword of the prohibited keyword set is present in the salesperson audio data.
  • the voice quality inspection device described above detects the customer audio data according to the preset first keyword set and detects the salesperson audio data according to the preset second keyword set, so that the customer audio data and the salesperson audio data are detected separately, and the detection result of the audio to be detected is determined from those detection results; when that result is a failed detection, a supplementary recording prompt is generated.
  • in this way, the audio to be detected of each recording node is quality-checked in real time during the video recording process, the dialogue in every stage of the video recording process is monitored in a timely manner, and the efficiency of monitoring the business service process is improved.
  • in one embodiment, the detection module is further used to obtain multiple mandatory keywords from the preset first keyword set, convert the customer audio data into customer text data, traverse the customer text data according to each mandatory keyword, count the number of occurrences of each mandatory keyword in the customer text data, and obtain the number of occurrences of each mandatory keyword in the customer audio data from those counts.
  • in one embodiment, the acquisition module is further used to acquire, in real time, the dialogue template of the recording node corresponding to the video to be detected, count the number of times each mandatory keyword appears in the dialogue template according to the first keyword set, and obtain the count threshold from those counts.
  • in one embodiment, the detection module is further used to convert the salesperson audio data into salesperson text data, obtain the script template of the recording node corresponding to the video to be detected, extract the corresponding script information from the salesperson text data according to the script template, obtain the script keywords from the second keyword set, match the script information against the script keywords, obtain the prohibited keywords from the second keyword set, and traverse the salesperson text data according to the prohibited keywords.
  • in one embodiment, the detection module is further used to determine that the detection result of the audio to be detected is a passed detection when the number of occurrences of the mandatory keywords of the first keyword set in the customer audio data reaches the count threshold, script keywords of the script keyword set are present in the salesperson audio data, and no prohibited keyword of the prohibited keyword set is present in the salesperson audio data.
  • in one embodiment, the extraction module is further used to filter the audio to be detected to remove noise and environmental sounds, segment the filtered audio into multiple audio segments according to a preset voice segmentation algorithm, and merge the audio segments belonging to the same speaker among the multiple segments according to a preset voice clustering algorithm to obtain salesperson audio data and customer audio data.
  • Each module in the above-mentioned voice quality inspection device can be implemented in whole or in part by software, hardware, and a combination thereof.
  • the foregoing modules may be embedded, in hardware form, in or be independent of the processor of the computer device, or may be stored, in software form, in the memory of the computer device, so that the processor can invoke them and perform the operations corresponding to each module.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 8.
  • the computer device includes a processor, a memory, a network interface and a database connected through a system bus, where the processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer device is used to store mandatory keyword data, prohibited keyword data, and dialogue template data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize a voice quality inspection method.
  • FIG. 8 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a computer device including a memory and a processor, the memory stores a computer program, and the processor implements the following steps when executing the computer program:
  • the audio to be detected is segmented into multiple audio segments according to a preset voice segmentation algorithm, and the audio segments belonging to the same speaker among the multiple segments are merged according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data;
  • the second keyword set includes a script keyword set and a prohibited keyword set;
  • the above computer device for voice quality inspection detects the customer audio data according to the preset first keyword set and detects the salesperson audio data according to the preset second keyword set, so that the customer audio data and the salesperson audio data are detected separately, and the detection result of the audio to be detected is determined from those detection results.
  • when the detection result of the audio to be detected is a failed detection, a supplementary recording prompt is generated.
  • in this way, the audio to be detected of each recording node is quality-checked in real time during the video recording process, the dialogue in every stage of the video recording process is monitored in a timely manner, and the efficiency of monitoring the business service process is improved.
  • the processor further implements the following steps when executing the computer program:
  • traverse the customer text data according to each mandatory keyword and count the number of times each mandatory keyword appears in the customer text data;
  • obtain the number of occurrences of each mandatory keyword in the customer audio data from the number of occurrences of each mandatory keyword in the customer text data.
  • the processor further implements the following steps when executing the computer program:
  • the count threshold is obtained.
  • the processor further implements the following steps when executing the computer program:
  • the processor further implements the following steps when executing the computer program:
  • the detection result of the audio to be detected is determined to be a passed detection.
  • the processor further implements the following steps when executing the computer program:
  • the audio segments belonging to the same speaker among the multiple audio segments are merged to obtain salesperson audio data and customer audio data.
  • a computer-readable storage medium is provided.
  • the storage medium is a volatile storage medium or a non-volatile storage medium, on which a computer program is stored.
  • when the computer program is executed by a processor, the following steps are implemented:
  • the audio to be detected is segmented into multiple audio segments according to a preset voice segmentation algorithm, and the audio segments belonging to the same speaker among the multiple segments are merged according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data;
  • the second keyword set includes a script keyword set and a prohibited keyword set;
  • the above storage medium for voice quality inspection detects the customer audio data according to the preset first keyword set and detects the salesperson audio data according to the preset second keyword set, so that the customer audio data and the salesperson audio data are detected separately, and the detection result of the audio to be detected is determined from those detection results.
  • when the detection result of the audio to be detected is a failed detection, a supplementary recording prompt is generated.
  • in this way, the audio to be detected of each recording node is quality-checked in real time during the video recording process, the dialogue in every stage of the video recording process is monitored in a timely manner, and the efficiency of monitoring the business service process is improved.
  • the computer program further implements the following steps when being executed by the processor:
  • traverse the customer text data according to each mandatory keyword and count the number of times each mandatory keyword appears in the customer text data;
  • obtain the number of occurrences of each mandatory keyword in the customer audio data from the number of occurrences of each mandatory keyword in the customer text data.
  • the computer program further implements the following steps when being executed by the processor:
  • the count threshold is obtained.
  • the computer program further implements the following steps when being executed by the processor:
  • the computer program further implements the following steps when being executed by the processor:
  • the detection result of the audio to be detected is determined to be a passed detection.
  • the computer program further implements the following steps when being executed by the processor:
  • the audio segments belonging to the same speaker among the multiple audio segments are merged to obtain salesperson audio data and customer audio data.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A method, apparatus, computer device and storage medium for voice quality inspection are provided, relating to the technical field of speech processing in artificial intelligence. The voice quality inspection method includes: detecting customer audio data according to a preset first keyword set, and detecting salesperson audio data according to a preset second keyword set, the second keyword set including a script keyword set and a prohibited keyword set (S106); and when the number of occurrences in the customer audio data of a mandatory keyword of the first keyword set is not equal to a count threshold, or no script keyword of the script keyword set is present in the salesperson audio data, or a prohibited keyword of the prohibited keyword set is present in the salesperson audio data, determining that the detection result of the audio to be detected is a failed detection and generating a supplementary recording prompt (S108). With this method, the audio to be detected of each recording node can be quality-checked in real time, thereby improving the efficiency of monitoring the business service process.

Description

Method, apparatus, computer device and storage medium for voice quality inspection
This application claims priority to the Chinese patent application filed with the China Patent Office on July 9, 2019, with application number 201910616721.8 and title "Method, apparatus, computer device and storage medium for voice quality inspection", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of speech processing in artificial intelligence, and in particular to a method, apparatus, computer device and storage medium for voice quality inspection.
Background
With the development of the service industry, more and more enterprises need to monitor the business service process when providing business services to customers. Traditionally, monitoring the business service process includes: recording audio and video of the service process simultaneously; obtaining the business service video after the business service ends; manually and repeatedly listening to and quality-checking the dialogue in the business service video in the back office; and, when the quality inspection finds a problem in a particular dialogue, notifying the salesperson and the customer to make a supplementary recording.
However, the inventor realized that with this traditional way of monitoring the business service process, dialogue problems in the individual stages can only be found, and supplementary recordings made, during the final quality inspection, so the monitoring efficiency is low.
Summary
On this basis, it is necessary to provide, in view of the above technical problem, a method, apparatus, computer device and storage medium for voice quality inspection that can improve monitoring efficiency.
A method for voice quality inspection, the method including:
acquiring, in real time, the video to be detected of each recording node in a video recording process and a count threshold corresponding to the video to be detected, and extracting the audio to be detected of each recording node from the video to be detected;
segmenting the audio to be detected into multiple audio segments according to a preset voice segmentation algorithm, and merging the audio segments belonging to the same speaker among the multiple audio segments according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data;
detecting the customer audio data according to a preset first keyword set, and detecting the salesperson audio data according to a preset second keyword set, the second keyword set including a script keyword set and a prohibited keyword set;
when the number of occurrences in the customer audio data of a mandatory keyword of the first keyword set is not equal to the count threshold, or no script keyword of the script keyword set is present in the salesperson audio data, or a prohibited keyword of the prohibited keyword set is present in the salesperson audio data, determining that the detection result of the audio to be detected is a failed detection, and generating a supplementary recording prompt.
An apparatus for voice quality inspection, the apparatus including:
an acquisition module, configured to acquire, in real time, the video to be detected of each recording node in a video recording process and a count threshold corresponding to the video to be detected, and extract the audio to be detected of each recording node from the video to be detected;
an extraction module, configured to segment the audio to be detected into multiple audio segments according to a preset voice segmentation algorithm, and merge the audio segments belonging to the same speaker among the multiple audio segments according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data;
a detection module, configured to detect the customer audio data according to a preset first keyword set, and detect the salesperson audio data according to a preset second keyword set, the second keyword set including a script keyword set and a prohibited keyword set;
a processing module, configured to determine that the detection result of the audio to be detected is a failed detection and generate a supplementary recording prompt when the number of occurrences in the customer audio data of a mandatory keyword of the first keyword set is not equal to the count threshold, or no script keyword of the script keyword set is present in the salesperson audio data, or a prohibited keyword of the prohibited keyword set is present in the salesperson audio data. In one embodiment, the detection module is further configured to obtain multiple mandatory keywords from the preset first keyword set, convert the customer audio data into customer text data, traverse the customer text data according to each mandatory keyword, count the number of occurrences of each mandatory keyword in the customer text data, and determine the number of occurrences of each mandatory keyword in the customer audio data according to the number of occurrences of each mandatory keyword in the customer text data.
A computer device, including a memory and a processor, the memory storing a computer program, where the processor, when executing the computer program, implements the following steps:
acquiring, in real time, the video to be detected of each recording node in a video recording process and a count threshold corresponding to the video to be detected, and extracting the audio to be detected of each recording node from the video to be detected;
segmenting the audio to be detected into multiple audio segments according to a preset voice segmentation algorithm, and merging the audio segments belonging to the same speaker among the multiple audio segments according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data;
detecting the customer audio data according to a preset first keyword set, and detecting the salesperson audio data according to a preset second keyword set, the second keyword set including a script keyword set and a prohibited keyword set;
when the number of occurrences in the customer audio data of a mandatory keyword of the first keyword set is not equal to the count threshold, or no script keyword of the script keyword set is present in the salesperson audio data, or a prohibited keyword of the prohibited keyword set is present in the salesperson audio data, determining that the detection result of the audio to be detected is a failed detection, and generating a supplementary recording prompt.
A computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the following steps:
acquiring, in real time, the video to be detected of each recording node in a video recording process and a count threshold corresponding to the video to be detected, and extracting the audio to be detected of each recording node from the video to be detected;
segmenting the audio to be detected into multiple audio segments according to a preset voice segmentation algorithm, and merging the audio segments belonging to the same speaker among the multiple audio segments according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data;
detecting the customer audio data according to a preset first keyword set, and detecting the salesperson audio data according to a preset second keyword set, the second keyword set including a script keyword set and a prohibited keyword set;
when the number of occurrences in the customer audio data of a mandatory keyword of the first keyword set is not equal to the count threshold, or no script keyword of the script keyword set is present in the salesperson audio data, or a prohibited keyword of the prohibited keyword set is present in the salesperson audio data, determining that the detection result of the audio to be detected is a failed detection, and generating a supplementary recording prompt. With the above method, apparatus, computer device and storage medium for voice quality inspection, the customer audio data is detected according to the preset first keyword set and the salesperson audio data is detected according to the preset second keyword set, so that the customer audio data and the salesperson audio data are detected separately, the detection result of the audio to be detected is determined from those detection results, and a supplementary recording prompt is generated when the detection result of the audio to be detected is a failed detection.
In this way, during the video recording process the audio to be detected of each recording node is quality-checked in real time, so that the dialogue in every stage of the video recording process is monitored in a timely manner, improving the efficiency of monitoring the business service process.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for voice quality inspection in an embodiment;
FIG. 2 is a schematic sub-flow diagram of step S106 of FIG. 1 in an embodiment;
FIG. 3 is a schematic sub-flow diagram of step S102 of FIG. 2 in an embodiment;
FIG. 4 is a schematic sub-flow diagram of step S106 of FIG. 1 in an embodiment;
FIG. 5 is a schematic flowchart of a method for voice quality inspection in another embodiment;
FIG. 6 is a schematic sub-flow diagram of step S104 of FIG. 1 in an embodiment;
FIG. 7 is a structural block diagram of an apparatus for voice quality inspection in an embodiment;
FIG. 8 is an internal structure diagram of a computer device in an embodiment.
Detailed Description of the Embodiments
In one embodiment, as shown in FIG. 1, a method for voice quality inspection is provided, including the following steps:
S102: Acquire, in real time, the video to be detected of each recording node in the video recording process and the count threshold corresponding to the video to be detected, and extract the audio to be detected of each recording node from the video to be detected.
The video to be detected refers to the video data of each recording node that the terminal collects and sends to the server during the video recording process. The video recording process includes multiple recording stages, and each recording stage has a corresponding recording node. After obtaining the video to be detected, the server strips apart the audio and the images in the video to be detected and extracts the audio to be detected of each recording node. The count threshold corresponding to the video to be detected refers to the number of times the mandatory keywords corresponding to the video to be detected must appear. The mandatory keywords are the words that the customer must mention in the recording stage corresponding to the recording node, and are used to detect the customer audio data.
S104: Segment the audio to be detected into multiple audio segments according to a preset voice segmentation algorithm, and merge the audio segments belonging to the same speaker among the multiple audio segments according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data.
Since the audio to be detected may contain noise and environmental sounds, before the audio to be detected is analysed it is first filtered to remove the noise and environmental sounds. The audio to be detected includes salesperson audio data and customer audio data, so when detecting the audio to be detected the server needs to separate the salesperson audio data from the customer audio data. When separating the audio to be detected, a voice segmentation algorithm and a voice clustering algorithm can be used to process it, following a segment-then-cluster approach: the voice segmentation algorithm first splits the audio to be detected into multiple audio segments, and the voice clustering algorithm then merges the audio segments belonging to the same speaker among the multiple segments, yielding the salesperson audio data and the customer audio data.
S106: Detect the customer audio data according to the preset first keyword set, and detect the salesperson audio data according to the preset second keyword set, the second keyword set including a script keyword set and a prohibited keyword set.
The preset first keyword set includes multiple mandatory keywords. The mandatory keywords are the words that the customer must mention in the recording stage corresponding to the recording node; the prohibited keywords are the words that the salesperson must not mention in the recording stage corresponding to the recording node; and the script keywords are the words that the salesperson must mention in the recording stage corresponding to the recording node. The server detects the customer audio data according to the preset first keyword set, counts the number of times the mandatory keywords appear in the customer audio data, and determines the detection result of the customer audio data by comparing the counting result with the count threshold corresponding to the video to be detected. By detecting the salesperson audio data, the server can determine whether the salesperson mentioned the script keywords and whether the salesperson avoided the prohibited keywords, and then determines the detection result of the salesperson audio data from what was mentioned in the salesperson audio data.
S108: When the number of occurrences in the customer audio data of a mandatory keyword of the first keyword set is not equal to the count threshold, or no script keyword of the script keyword set is present in the salesperson audio data, or a prohibited keyword of the prohibited keyword set is present in the salesperson audio data, determine that the detection result of the audio to be detected is a failed detection, and generate a supplementary recording prompt.
When the number of occurrences in the customer audio data of a mandatory keyword of the first keyword set is not equal to the count threshold, the detection result of the customer audio data is a failed detection. When no script keyword of the script keyword set is present in the salesperson audio data, or a prohibited keyword of the prohibited keyword set is present in the salesperson audio data, the detection result of the salesperson audio data is determined to be a failed detection. When the detection result of either the customer audio data or the salesperson audio data is a failed detection, the detection result of the audio to be detected is a failed detection; the server then generates a supplementary recording prompt, which tells the customer and the salesperson why the recording failed the detection, so that they can avoid making the same mistakes when making the supplementary recording on site.
With the above method for voice quality inspection, the customer audio data is detected according to the preset first keyword set and the salesperson audio data is detected according to the preset second keyword set, so that the customer audio data and the salesperson audio data are detected separately, the detection result of the audio to be detected is determined from those detection results, and a supplementary recording prompt is generated when that result is a failed detection. In this way, the audio to be detected of each recording node is quality-checked in real time during the video recording process, the dialogue in every stage of the video recording process is monitored in a timely manner, and the efficiency of monitoring the business service process is improved.
In one embodiment, as shown in FIG. 2, S106 includes:
S202: Obtain multiple mandatory keywords from the preset first keyword set;
S204: Convert the customer audio data into customer text data;
S206: Traverse the customer text data according to each mandatory keyword and count the number of times each mandatory keyword appears in the customer text data;
S208: Obtain the number of occurrences of each mandatory keyword in the customer audio data from the number of occurrences of each mandatory keyword in the customer text data.
The mandatory keywords are the words that the customer must mention in the recording stage corresponding to the recording node. The server can obtain multiple mandatory keywords from the preset first keyword set, which includes multiple mandatory keywords. When detecting the customer audio data against the mandatory keywords, the customer audio data is first converted into customer text data; the customer text data is then traversed according to each mandatory keyword, and the number of times each mandatory keyword appears in the customer text data is counted. Finally, the number of occurrences of each mandatory keyword in the customer audio data is obtained from the number of occurrences of each mandatory keyword in the customer text data.
In the recording stage corresponding to each recording node the salesperson asks the customer questions, and the customer answers the questions by mentioning the mandatory keywords, so the customer audio data can be detected on the basis of the mandatory keywords. Because the number of questions asked by the salesperson differs between recording stages, the number of times the customer mentions the mandatory keywords also differs; therefore the number of times the customer should mention the mandatory keywords in the recording stage corresponding to the recording node, i.e. the count threshold corresponding to the video to be detected, is determined, and the count threshold is then compared with the number of occurrences of each mandatory keyword in the customer audio data to determine the detection result of the customer audio data. Only when the number of occurrences of every mandatory keyword in the customer audio data equals the count threshold can the detection result of the customer audio data be regarded as a passed detection. The count threshold can be determined from the dialogue template of the recording node.
In the above embodiment, the number of occurrences of each mandatory keyword in the customer audio data is obtained from the number of occurrences of each mandatory keyword in the customer text data, so that the server can determine the detection result of the customer audio data from those counts, realizing the detection of the customer audio data.
In one embodiment, as shown in FIG. 3, S102 includes:
S302: Acquire, in real time, the dialogue template of the recording node corresponding to the video to be detected;
S304: Count, according to the first keyword set, the number of times each mandatory keyword appears in the dialogue template;
S306: Obtain the count threshold from the number of times each mandatory keyword appears in the dialogue template.
Through the node identifier carried by the recording node, the server can acquire, in real time, the dialogue template corresponding to the recording node from a preset dialogue-template database, obtain multiple mandatory keywords from the first keyword set, traverse the dialogue template according to each mandatory keyword, and count the number of times each mandatory keyword appears in the dialogue template; the number of times each mandatory keyword appears in the dialogue template is the number of times the customer should mention that mandatory keyword in the recording stage corresponding to the recording node, i.e. the count threshold.
In the above embodiment, the dialogue template of the recording node corresponding to the video to be detected is acquired in real time, the number of times each mandatory keyword appears in the dialogue template is counted according to the first keyword set, and the count threshold is obtained from those counts, so that the server can detect the customer audio data against the count threshold.
In one embodiment, as shown in FIG. 4, S106 includes:
S402: Convert the salesperson audio data into salesperson text data;
S404: Obtain the script template of the recording node corresponding to the video to be detected, and extract the corresponding script information from the salesperson text data according to the script template;
S406: Obtain the script keywords from the second keyword set, and match the script information against the script keywords;
S408: Obtain the prohibited keywords from the second keyword set, and traverse the salesperson text data according to the prohibited keywords.
When detecting the salesperson audio data, the server needs to convert the salesperson audio data into salesperson text data, obtain the script template of the recording node corresponding to the video to be detected, extract the corresponding script information from the salesperson text data according to the script template, and obtain the script keywords from the second keyword set. The script keywords are the words that the salesperson must mention in the recording stage corresponding to the recording node; by detecting the salesperson audio data, the server determines whether the salesperson mentioned the script keywords, and when the salesperson did mention the script keywords, the first detection result of the salesperson audio data is determined to be a passed detection.
Besides detecting the salesperson audio data against the script keywords, the server also needs to detect the salesperson audio data against the prohibited keywords, which can be obtained from the second keyword set. The prohibited keywords are the words that the salesperson must not mention in the recording stage corresponding to the recording node; by detecting the salesperson audio data, the server determines whether the salesperson avoided the prohibited keywords, and when the salesperson did not mention any prohibited keyword, the second detection result of the salesperson audio data is determined to be a passed detection. Only when both the first detection result and the second detection result are passed detections can the detection result of the salesperson audio data be determined to be a passed detection.
In the above embodiment, the salesperson audio data is detected against the script keywords and the prohibited keywords, realizing the detection of the salesperson audio data.
In one embodiment, as shown in FIG. 5, after S106 the method further includes:
S502: When the number of occurrences in the customer audio data of the mandatory keywords of the first keyword set reaches the count threshold, script keywords of the script keyword set are present in the salesperson audio data, and no prohibited keyword of the prohibited keyword set is present in the salesperson audio data, determine that the detection result of the audio to be detected is a passed detection.
When the number of occurrences in the customer audio data of the mandatory keywords of the first keyword set reaches the count threshold, the server may determine that the detection result of the customer audio data is a passed detection. When script keywords of the script keyword set are present in the salesperson audio data and no prohibited keyword of the prohibited keyword set is present in the salesperson audio data, the server may determine that the detection result of the salesperson audio data is a passed detection. When the detection results of both the customer audio data and the salesperson audio data are passed detections, the server can determine that the detection result of the audio to be detected is a passed detection.
In the above embodiment, the detection result of the audio to be detected is determined from the detection results of the customer audio data and the salesperson audio data, realizing the determination of the detection result of the audio to be detected.
In one embodiment, as shown in FIG. 6, S104 includes:
S602: Filter the audio to be detected to remove the noise and environmental sounds in the audio to be detected;
S604: Segment the filtered audio to be detected into multiple audio segments according to a preset voice segmentation algorithm;
S606: Merge the audio segments belonging to the same speaker among the multiple audio segments according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data.
Because the audio to be detected may include noise and environmental sounds, when processing the audio to be detected the server first needs to filter it to remove the noise and environmental sounds, and then processes the filtered audio with the voice segmentation algorithm and the voice clustering algorithm to obtain the salesperson audio data and the customer audio data. The voice segmentation algorithm performs speaker change-point detection, i.e. it locates the points in the voice data at which the speaker identity changes. Common voice segmentation algorithms are usually based on a sliding-window change-point detection algorithm built on Gaussian models: the distance between adjacent voice windows is observed and computed, and a threshold or penalty factor is used to decide whether the two stretches of speech come from the same speaker. The threshold or penalty factor can be obtained by collecting training-set data. The voice segmentation algorithm splits the audio to be detected into multiple audio segments, each of which contains the audio data of only one person.
The voice clustering algorithm builds on the voice segmentation algorithm and merges the audio segments belonging to the same speaker. Common voice clustering algorithms fall into two categories, top-down clustering and bottom-up clustering: each audio segment obtained after segmentation is treated as one class, and the two closest classes are merged repeatedly according to the BIC (Bayesian Information Criterion) distance until merging voice segments no longer causes the BIC value to increase, which yields two classes of audio data. After obtaining the two classes of audio data, the server further analyses them and extracts the voiceprint features of each class; by matching these voiceprint features against the salesperson voiceprint features in a preset salesperson information database, the server determines which of the two classes is the salesperson audio data, and the other class is the customer audio data.
In the above embodiment, the audio to be detected is filtered to remove noise and environmental sounds, the voice segmentation algorithm splits the filtered audio to be detected into multiple audio segments, and the voice clustering algorithm clusters the multiple audio segments into salesperson audio data and customer audio data, realizing the extraction of the salesperson audio data and the customer audio data.
It should be understood that, although the steps in the flowcharts of FIGS. 1-6 are displayed in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 1-6 may include multiple sub-steps or stages; these sub-steps or stages are not necessarily completed at the same moment but may be executed at different moments, and their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 7, an apparatus for voice quality inspection is provided, including an acquisition module 702, an extraction module 704, a detection module 706 and a processing module 708, wherein:
the acquisition module 702 is configured to acquire, in real time, the video to be detected of each recording node in the video recording process and the count threshold corresponding to the video to be detected, and to extract the audio to be detected of each recording node from the video to be detected;
the extraction module 704 is configured to segment the audio to be detected into multiple audio segments according to a preset voice segmentation algorithm, and to merge the audio segments belonging to the same speaker among the multiple audio segments according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data;
the detection module 706 is configured to detect the customer audio data according to a preset first keyword set and to detect the salesperson audio data according to a preset second keyword set, the second keyword set including a script keyword set and a prohibited keyword set;
the processing module 708 is configured to determine that the detection result of the audio to be detected is a failed detection and to generate a supplementary recording prompt when the number of occurrences in the customer audio data of a mandatory keyword of the first keyword set is not equal to the count threshold, or no script keyword of the script keyword set is present in the salesperson audio data, or a prohibited keyword of the prohibited keyword set is present in the salesperson audio data. With the above apparatus for voice quality inspection, the customer audio data is detected according to the preset first keyword set and the salesperson audio data is detected according to the preset second keyword set, so that the customer audio data and the salesperson audio data are detected separately, the detection result of the audio to be detected is determined from those detection results, and a supplementary recording prompt is generated when that result is a failed detection. In this way, the audio to be detected of each recording node is quality-checked in real time during the video recording process, the dialogue in every stage of the video recording process is monitored in a timely manner, and the efficiency of monitoring the business service process is improved.
In one embodiment, the detection module is further configured to obtain multiple mandatory keywords from the preset first keyword set, convert the customer audio data into customer text data, traverse the customer text data according to each mandatory keyword, count the number of occurrences of each mandatory keyword in the customer text data, and obtain the number of occurrences of each mandatory keyword in the customer audio data from the number of occurrences of each mandatory keyword in the customer text data.
In one embodiment, the acquisition module is further configured to acquire, in real time, the dialogue template of the recording node corresponding to the video to be detected, count, according to the first keyword set, the number of times each mandatory keyword appears in the dialogue template, and obtain the count threshold from the number of times each mandatory keyword appears in the dialogue template.
In one embodiment, the detection module is further configured to convert the salesperson audio data into salesperson text data, obtain the script template of the recording node corresponding to the video to be detected, extract the corresponding script information from the salesperson text data according to the script template, obtain the script keywords from the second keyword set, match the script information against the script keywords, obtain the prohibited keywords from the second keyword set, and traverse the salesperson text data according to the prohibited keywords.
In one embodiment, the detection module is further configured to determine that the detection result of the audio to be detected is a passed detection when the number of occurrences in the customer audio data of the mandatory keywords of the first keyword set reaches the count threshold, script keywords of the script keyword set are present in the salesperson audio data, and no prohibited keyword of the prohibited keyword set is present in the salesperson audio data.
In one embodiment, the extraction module is further configured to filter the audio to be detected to remove the noise and environmental sounds in it, segment the filtered audio to be detected into multiple audio segments according to a preset voice segmentation algorithm, and merge the audio segments belonging to the same speaker among the multiple audio segments according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data.
For the specific limitations on the apparatus for voice quality inspection, reference may be made to the limitations on the method for voice quality inspection above, which are not repeated here. Each module in the above apparatus for voice quality inspection may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded, in hardware form, in or be independent of the processor of the computer device, or may be stored, in software form, in the memory of the computer device, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 8. The computer device includes a processor, a memory, a network interface and a database connected through a system bus, where the processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store mandatory keyword data, prohibited keyword data and dialogue template data. The network interface of the computer device is used to communicate with external terminals through a network connection. The computer program, when executed by the processor, implements a method for voice quality inspection.
Those skilled in the art can understand that the structure shown in FIG. 8 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution of this application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program, where the processor, when executing the computer program, implements the following steps:
acquiring, in real time, the video to be detected of each recording node in the video recording process and the count threshold corresponding to the video to be detected, and extracting the audio to be detected of each recording node from the video to be detected;
segmenting the audio to be detected into multiple audio segments according to a preset voice segmentation algorithm, and merging the audio segments belonging to the same speaker among the multiple audio segments according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data;
detecting the customer audio data according to a preset first keyword set, and detecting the salesperson audio data according to a preset second keyword set, the second keyword set including a script keyword set and a prohibited keyword set;
when the number of occurrences in the customer audio data of a mandatory keyword of the first keyword set is not equal to the count threshold, or no script keyword of the script keyword set is present in the salesperson audio data, or a prohibited keyword of the prohibited keyword set is present in the salesperson audio data, determining that the detection result of the audio to be detected is a failed detection, and generating a supplementary recording prompt. With the above computer device for voice quality inspection, the customer audio data is detected according to the preset first keyword set and the salesperson audio data is detected according to the preset second keyword set, so that the customer audio data and the salesperson audio data are detected separately, the detection result of the audio to be detected is determined from those detection results, and a supplementary recording prompt is generated when that result is a failed detection. In this way, the audio to be detected of each recording node is quality-checked in real time during the video recording process, the dialogue in every stage of the video recording process is monitored in a timely manner, and the efficiency of monitoring the business service process is improved.
In one embodiment, the processor, when executing the computer program, further implements the following steps:
obtaining multiple mandatory keywords from the preset first keyword set;
converting the customer audio data into customer text data;
traversing the customer text data according to each mandatory keyword and counting the number of times each mandatory keyword appears in the customer text data;
obtaining the number of occurrences of each mandatory keyword in the customer audio data from the number of occurrences of each mandatory keyword in the customer text data.
In one embodiment, the processor, when executing the computer program, further implements the following steps:
acquiring, in real time, the dialogue template of the recording node corresponding to the video to be detected;
counting, according to the first keyword set, the number of times each mandatory keyword appears in the dialogue template;
obtaining the count threshold from the number of times each mandatory keyword appears in the dialogue template.
In one embodiment, the processor, when executing the computer program, further implements the following steps:
converting the salesperson audio data into salesperson text data;
obtaining the script template of the recording node corresponding to the video to be detected, and extracting the corresponding script information from the salesperson text data according to the script template;
obtaining the script keywords from the second keyword set, and matching the script information against the script keywords;
obtaining the prohibited keywords from the second keyword set, and traversing the salesperson text data according to the prohibited keywords.
In one embodiment, the processor, when executing the computer program, further implements the following steps:
when the number of occurrences in the customer audio data of the mandatory keywords of the first keyword set reaches the count threshold, script keywords of the script keyword set are present in the salesperson audio data, and no prohibited keyword of the prohibited keyword set is present in the salesperson audio data, determining that the detection result of the audio to be detected is a passed detection.
In one embodiment, the processor, when executing the computer program, further implements the following steps:
filtering the audio to be detected to remove the noise and environmental sounds in the audio to be detected;
segmenting the filtered audio to be detected into multiple audio segments according to a preset voice segmentation algorithm;
merging the audio segments belonging to the same speaker among the multiple audio segments according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data.
In one embodiment, a computer-readable storage medium is provided; the storage medium is a volatile storage medium or a non-volatile storage medium, and a computer program is stored on it. When the computer program is executed by a processor, the following steps are implemented:
acquiring, in real time, the video to be detected of each recording node in the video recording process and the count threshold corresponding to the video to be detected, and extracting the audio to be detected of each recording node from the video to be detected;
segmenting the audio to be detected into multiple audio segments according to a preset voice segmentation algorithm, and merging the audio segments belonging to the same speaker among the multiple audio segments according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data;
detecting the customer audio data according to a preset first keyword set, and detecting the salesperson audio data according to a preset second keyword set, the second keyword set including a script keyword set and a prohibited keyword set;
when the number of occurrences in the customer audio data of a mandatory keyword of the first keyword set is not equal to the count threshold, or no script keyword of the script keyword set is present in the salesperson audio data, or a prohibited keyword of the prohibited keyword set is present in the salesperson audio data, determining that the detection result of the audio to be detected is a failed detection, and generating a supplementary recording prompt. With the above storage medium for voice quality inspection, the customer audio data is detected according to the preset first keyword set and the salesperson audio data is detected according to the preset second keyword set, so that the customer audio data and the salesperson audio data are detected separately, the detection result of the audio to be detected is determined from those detection results, and a supplementary recording prompt is generated when that result is a failed detection. In this way, the audio to be detected of each recording node is quality-checked in real time during the video recording process, the dialogue in every stage of the video recording process is monitored in a timely manner, and the efficiency of monitoring the business service process is improved.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented:
obtaining multiple mandatory keywords from the preset first keyword set;
converting the customer audio data into customer text data;
traversing the customer text data according to each mandatory keyword and counting the number of times each mandatory keyword appears in the customer text data;
obtaining the number of occurrences of each mandatory keyword in the customer audio data from the number of occurrences of each mandatory keyword in the customer text data.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented:
acquiring, in real time, the dialogue template of the recording node corresponding to the video to be detected;
counting, according to the first keyword set, the number of times each mandatory keyword appears in the dialogue template;
obtaining the count threshold from the number of times each mandatory keyword appears in the dialogue template.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented:
converting the salesperson audio data into salesperson text data;
obtaining the script template of the recording node corresponding to the video to be detected, and extracting the corresponding script information from the salesperson text data according to the script template;
obtaining the script keywords from the second keyword set, and matching the script information against the script keywords;
obtaining the prohibited keywords from the second keyword set, and traversing the salesperson text data according to the prohibited keywords.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented:
when the number of occurrences in the customer audio data of the mandatory keywords of the first keyword set reaches the count threshold, script keywords of the script keyword set are present in the salesperson audio data, and no prohibited keyword of the prohibited keyword set is present in the salesperson audio data, determining that the detection result of the audio to be detected is a passed detection.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented:
filtering the audio to be detected to remove the noise and environmental sounds in the audio to be detected;
segmenting the filtered audio to be detected into multiple audio segments according to a preset voice segmentation algorithm;
merging the audio segments belonging to the same speaker among the multiple audio segments according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program; the computer program may be stored in a non-volatile computer-readable storage medium, and when executed may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Claims (20)

  1. A method for voice quality inspection, wherein the method comprises:
    acquiring, in real time, the video to be detected of each recording node in a video recording process and a count threshold corresponding to the video to be detected, and extracting the audio to be detected of each recording node from the video to be detected;
    segmenting the audio to be detected into multiple audio segments according to a preset voice segmentation algorithm, and merging the audio segments belonging to the same speaker among the multiple audio segments according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data;
    detecting the customer audio data according to a preset first keyword set, and detecting the salesperson audio data according to a preset second keyword set, the second keyword set comprising a script keyword set and a prohibited keyword set;
    when the number of occurrences in the customer audio data of a mandatory keyword of the first keyword set is not equal to the count threshold, or no script keyword of the script keyword set is present in the salesperson audio data, or a prohibited keyword of the prohibited keyword set is present in the salesperson audio data, determining that the detection result of the audio to be detected is a failed detection, and generating a supplementary recording prompt.
  2. The method according to claim 1, wherein detecting the customer audio data according to the preset first keyword set comprises:
    obtaining multiple mandatory keywords from the preset first keyword set;
    converting the customer audio data into customer text data;
    traversing the customer text data according to each of the mandatory keywords, and counting the number of times each mandatory keyword appears in the customer text data;
    obtaining the number of occurrences of each mandatory keyword in the customer audio data from the number of occurrences of each mandatory keyword in the customer text data.
  3. The method according to claim 1, wherein acquiring, in real time, the count threshold corresponding to the video to be detected comprises:
    acquiring, in real time, the dialogue template of the recording node corresponding to the video to be detected;
    counting, according to the first keyword set, the number of times each mandatory keyword appears in the dialogue template;
    obtaining the count threshold from the number of times each mandatory keyword appears in the dialogue template.
  4. The method according to claim 1, wherein detecting the salesperson audio data according to the preset second keyword set, the second keyword set comprising a script keyword set and a prohibited keyword set, comprises:
    converting the salesperson audio data into salesperson text data;
    obtaining the script template of the recording node corresponding to the video to be detected, and extracting the corresponding script information from the salesperson text data according to the script template;
    obtaining the script keywords from the second keyword set, and matching the script information against the script keywords;
    obtaining the prohibited keywords from the second keyword set, and traversing the salesperson text data according to the prohibited keywords.
  5. The method according to claim 1, wherein after detecting the customer audio data according to the preset first keyword set and detecting the salesperson audio data according to the preset second keyword set, the method further comprises:
    when the number of occurrences in the customer audio data of the mandatory keywords of the first keyword set reaches the count threshold, script keywords of the script keyword set are present in the salesperson audio data, and no prohibited keyword of the prohibited keyword set is present in the salesperson audio data, determining that the detection result of the audio to be detected is a passed detection.
  6. The method according to claim 1, wherein segmenting the audio to be detected into multiple audio segments according to the preset voice segmentation algorithm, and merging the audio segments belonging to the same speaker among the multiple audio segments according to the preset voice clustering algorithm, to obtain salesperson audio data and customer audio data, comprises:
    filtering the audio to be detected to remove the noise and environmental sounds in the audio to be detected;
    segmenting the filtered audio to be detected into multiple audio segments according to the preset voice segmentation algorithm;
    merging the audio segments belonging to the same speaker among the multiple audio segments according to the preset voice clustering algorithm, to obtain salesperson audio data and customer audio data.
  7. An apparatus for voice quality inspection, wherein the apparatus comprises:
    an acquisition module, configured to acquire, in real time, the video to be detected of each recording node in a video recording process and a count threshold corresponding to the video to be detected, and extract the audio to be detected of each recording node from the video to be detected;
    an extraction module, configured to segment the audio to be detected into multiple audio segments according to a preset voice segmentation algorithm, and merge the audio segments belonging to the same speaker among the multiple audio segments according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data;
    a detection module, configured to detect the customer audio data according to a preset first keyword set, and detect the salesperson audio data according to a preset second keyword set, the second keyword set comprising a script keyword set and a prohibited keyword set;
    a processing module, configured to determine that the detection result of the audio to be detected is a failed detection and generate a supplementary recording prompt when the number of occurrences in the customer audio data of a mandatory keyword of the first keyword set is not equal to the count threshold, or no script keyword of the script keyword set is present in the salesperson audio data, or a prohibited keyword of the prohibited keyword set is present in the salesperson audio data.
  8. The apparatus according to claim 7, wherein the detection module is further configured to obtain multiple mandatory keywords from the preset first keyword set, convert the customer audio data into customer text data, traverse the customer text data according to each of the mandatory keywords, count the number of times each mandatory keyword appears in the customer text data, and determine the number of occurrences of each mandatory keyword in the customer audio data from the number of occurrences of each mandatory keyword in the customer text data.
  9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the following steps:
    acquiring, in real time, the video to be detected of each recording node in a video recording process and a count threshold corresponding to the video to be detected, and extracting the audio to be detected of each recording node from the video to be detected;
    segmenting the audio to be detected into multiple audio segments according to a preset voice segmentation algorithm, and merging the audio segments belonging to the same speaker among the multiple audio segments according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data;
    detecting the customer audio data according to a preset first keyword set, and detecting the salesperson audio data according to a preset second keyword set, the second keyword set comprising a script keyword set and a prohibited keyword set;
    when the number of occurrences in the customer audio data of a mandatory keyword of the first keyword set is not equal to the count threshold, or no script keyword of the script keyword set is present in the salesperson audio data, or a prohibited keyword of the prohibited keyword set is present in the salesperson audio data, determining that the detection result of the audio to be detected is a failed detection, and generating a supplementary recording prompt.
  10. The computer device according to claim 9, wherein detecting the customer audio data according to the preset first keyword set comprises: obtaining multiple mandatory keywords from the preset first keyword set; converting the customer audio data into customer text data; traversing the customer text data according to each of the mandatory keywords, and counting the number of times each mandatory keyword appears in the customer text data; and obtaining the number of occurrences of each mandatory keyword in the customer audio data from the number of occurrences of each mandatory keyword in the customer text data.
  11. The computer device according to claim 9, wherein acquiring, in real time, the count threshold corresponding to the video to be detected comprises:
    acquiring, in real time, the dialogue template of the recording node corresponding to the video to be detected;
    counting, according to the first keyword set, the number of times each mandatory keyword appears in the dialogue template;
    obtaining the count threshold from the number of times each mandatory keyword appears in the dialogue template.
  12. The computer device according to claim 9, wherein detecting the salesperson audio data according to the preset second keyword set, the second keyword set comprising a script keyword set and a prohibited keyword set, comprises:
    converting the salesperson audio data into salesperson text data;
    obtaining the script template of the recording node corresponding to the video to be detected, and extracting the corresponding script information from the salesperson text data according to the script template;
    obtaining the script keywords from the second keyword set, and matching the script information against the script keywords;
    obtaining the prohibited keywords from the second keyword set, and traversing the salesperson text data according to the prohibited keywords.
  13. The computer device according to claim 9, wherein after detecting the customer audio data according to the preset first keyword set and detecting the salesperson audio data according to the preset second keyword set, the following is further included:
    when the number of occurrences in the customer audio data of the mandatory keywords of the first keyword set reaches the count threshold, script keywords of the script keyword set are present in the salesperson audio data, and no prohibited keyword of the prohibited keyword set is present in the salesperson audio data, determining that the detection result of the audio to be detected is a passed detection.
  14. The computer device according to claim 9, wherein segmenting the audio to be detected into multiple audio segments according to the preset voice segmentation algorithm, and merging the audio segments belonging to the same speaker among the multiple audio segments according to the preset voice clustering algorithm, to obtain salesperson audio data and customer audio data, comprises:
    filtering the audio to be detected to remove the noise and environmental sounds in the audio to be detected;
    segmenting the filtered audio to be detected into multiple audio segments according to the preset voice segmentation algorithm;
    merging the audio segments belonging to the same speaker among the multiple audio segments according to the preset voice clustering algorithm, to obtain salesperson audio data and customer audio data.
  15. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the following steps:
    acquiring, in real time, the video to be detected of each recording node in a video recording process and a count threshold corresponding to the video to be detected, and extracting the audio to be detected of each recording node from the video to be detected;
    segmenting the audio to be detected into multiple audio segments according to a preset voice segmentation algorithm, and merging the audio segments belonging to the same speaker among the multiple audio segments according to a preset voice clustering algorithm, to obtain salesperson audio data and customer audio data;
    detecting the customer audio data according to a preset first keyword set, and detecting the salesperson audio data according to a preset second keyword set, the second keyword set comprising a script keyword set and a prohibited keyword set;
    when the number of occurrences in the customer audio data of a mandatory keyword of the first keyword set is not equal to the count threshold, or no script keyword of the script keyword set is present in the salesperson audio data, or a prohibited keyword of the prohibited keyword set is present in the salesperson audio data, determining that the detection result of the audio to be detected is a failed detection, and generating a supplementary recording prompt. With the above method, apparatus, computer device and storage medium for voice quality inspection, the customer audio data is detected according to the preset first keyword set and the salesperson audio data is detected according to the preset second keyword set, so that the customer audio data and the salesperson audio data are detected separately, the detection result of the audio to be detected is determined from those detection results, and a supplementary recording prompt is generated when the detection result of the audio to be detected is a failed detection.
  16. The computer-readable storage medium according to claim 15, wherein detecting the customer audio data according to the preset first keyword set comprises:
    obtaining multiple mandatory keywords from the preset first keyword set; converting the customer audio data into customer text data; traversing the customer text data according to each of the mandatory keywords, and counting the number of times each mandatory keyword appears in the customer text data; and obtaining the number of occurrences of each mandatory keyword in the customer audio data from the number of occurrences of each mandatory keyword in the customer text data.
  17. The computer-readable storage medium according to claim 15, wherein acquiring, in real time, the count threshold corresponding to the video to be detected comprises: acquiring, in real time, the dialogue template of the recording node corresponding to the video to be detected; counting, according to the first keyword set, the number of times each mandatory keyword appears in the dialogue template; and obtaining the count threshold from the number of times each mandatory keyword appears in the dialogue template.
  18. The computer-readable storage medium according to claim 15, wherein detecting the salesperson audio data according to the preset second keyword set, the second keyword set comprising a script keyword set and a prohibited keyword set, comprises:
    converting the salesperson audio data into salesperson text data;
    obtaining the script template of the recording node corresponding to the video to be detected, and extracting the corresponding script information from the salesperson text data according to the script template;
    obtaining the script keywords from the second keyword set, and matching the script information against the script keywords;
    obtaining the prohibited keywords from the second keyword set, and traversing the salesperson text data according to the prohibited keywords.
  19. The computer-readable storage medium according to claim 15, wherein after detecting the customer audio data according to the preset first keyword set and detecting the salesperson audio data according to the preset second keyword set, the following is further included:
    when the number of occurrences in the customer audio data of the mandatory keywords of the first keyword set reaches the count threshold, script keywords of the script keyword set are present in the salesperson audio data, and no prohibited keyword of the prohibited keyword set is present in the salesperson audio data, determining that the detection result of the audio to be detected is a passed detection.
  20. The computer-readable storage medium according to claim 15, wherein segmenting the audio to be detected into multiple audio segments according to the preset voice segmentation algorithm, and merging the audio segments belonging to the same speaker among the multiple audio segments according to the preset voice clustering algorithm, to obtain salesperson audio data and customer audio data, comprises: filtering the audio to be detected to remove the noise and environmental sounds in the audio to be detected; segmenting the filtered audio to be detected into multiple audio segments according to the preset voice segmentation algorithm; and merging the audio segments belonging to the same speaker among the multiple audio segments according to the preset voice clustering algorithm, to obtain salesperson audio data and customer audio data.
PCT/CN2020/086625 2019-07-09 2020-04-24 语音质检的方法、装置、计算机设备和存储介质 WO2021004128A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910616721.8A CN110364183A (zh) 2019-07-09 2019-07-09 语音质检的方法、装置、计算机设备和存储介质
CN201910616721.8 2019-07-09

Publications (1)

Publication Number Publication Date
WO2021004128A1 true WO2021004128A1 (zh) 2021-01-14

Family

ID=68218251

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/086625 WO2021004128A1 (zh) 2019-07-09 2020-04-24 语音质检的方法、装置、计算机设备和存储介质

Country Status (2)

Country Link
CN (1) CN110364183A (zh)
WO (1) WO2021004128A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112911180A (zh) * 2021-01-28 2021-06-04 中国建设银行股份有限公司 一种视频录制方法、装置、电子设备及可读存储介质
CN113035201A (zh) * 2021-03-16 2021-06-25 广州佰锐网络科技有限公司 一种金融业务质检方法和系统
CN113240436A (zh) * 2021-04-22 2021-08-10 北京沃东天骏信息技术有限公司 在线客服话术质检的方法和装置
CN113506585A (zh) * 2021-09-09 2021-10-15 深圳市一号互联科技有限公司 一种语音通话的质量评估方法及系统
CN113571048A (zh) * 2021-07-21 2021-10-29 腾讯科技(深圳)有限公司 一种音频数据检测方法、装置、设备及可读存储介质

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110364183A (zh) * 2019-07-09 2019-10-22 深圳壹账通智能科技有限公司 语音质检的方法、装置、计算机设备和存储介质
CN111639529A (zh) * 2020-04-24 2020-09-08 深圳壹账通智能科技有限公司 基于多层次逻辑的语音话术检测方法、装置及计算机设备
CN111696527B (zh) * 2020-06-15 2020-12-22 龙马智芯(珠海横琴)科技有限公司 语音质检区域的定位方法、装置、定位设备及存储介质
CN111723204B (zh) * 2020-06-15 2021-04-02 龙马智芯(珠海横琴)科技有限公司 语音质检区域的校正方法、装置、校正设备及存储介质
CN111883139A (zh) * 2020-07-24 2020-11-03 北京字节跳动网络技术有限公司 用于筛选目标语音的方法、装置、设备和介质
CN113158662A (zh) * 2021-04-27 2021-07-23 中国工商银行股份有限公司 音频数据的实时监测方法及装置
CN113641795A (zh) * 2021-08-20 2021-11-12 上海明略人工智能(集团)有限公司 用于话术统计的方法及装置、电子设备、存储介质
CN115883760A (zh) * 2022-01-11 2023-03-31 北京中关村科金技术有限公司 音视频的实时质检方法、装置及存储介质

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102543063A (zh) * 2011-12-07 2012-07-04 华南理工大学 基于说话人分割与聚类的多说话人语速估计方法
CN105261362A (zh) * 2015-09-07 2016-01-20 科大讯飞股份有限公司 一种通话语音监测方法及系统
JP2018120640A (ja) * 2018-05-09 2018-08-02 株式会社野村総合研究所 コンプライアンスチェックシステムおよびコンプライアンスチェックプログラム
CN108737667A (zh) * 2018-05-03 2018-11-02 平安科技(深圳)有限公司 语音质检方法、装置、计算机设备及存储介质
CN109660744A (zh) * 2018-10-19 2019-04-19 深圳壹账通智能科技有限公司 基于大数据的智能双录方法、设备、存储介质及装置
CN109729383A (zh) * 2019-01-04 2019-05-07 深圳壹账通智能科技有限公司 双录视频质量检测方法、装置、计算机设备和存储介质
CN109767335A (zh) * 2018-12-15 2019-05-17 深圳壹账通智能科技有限公司 双录质检方法、装置、计算机设备及存储介质
CN109783338A (zh) * 2019-01-02 2019-05-21 深圳壹账通智能科技有限公司 基于业务信息的录制处理方法、装置和计算机设备
CN110364183A (zh) * 2019-07-09 2019-10-22 深圳壹账通智能科技有限公司 语音质检的方法、装置、计算机设备和存储介质

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108091328B (zh) * 2017-11-20 2021-04-16 北京百度网讯科技有限公司 基于人工智能的语音识别纠错方法、装置及可读介质
CN108962282B (zh) * 2018-06-19 2021-07-13 京北方信息技术股份有限公司 语音检测分析方法、装置、计算机设备及存储介质
CN109711996A (zh) * 2018-08-17 2019-05-03 深圳壹账通智能科技有限公司 保单双录文件质检方法、装置、设备及可读存储介质
CN109599093B (zh) * 2018-10-26 2021-11-26 北京中关村科金技术有限公司 智能质检的关键词检测方法、装置、设备及可读存储介质
CN109587360B (zh) * 2018-11-12 2021-07-13 平安科技(深圳)有限公司 电子装置、应对话术推荐方法和计算机可读存储介质
CN109327632A (zh) * 2018-11-23 2019-02-12 深圳前海微众银行股份有限公司 客服录音的智能质检系统、方法及计算机可读存储介质
CN109767765A (zh) * 2019-01-17 2019-05-17 平安科技(深圳)有限公司 话术匹配方法及装置、存储介质、计算机设备
CN109819128A (zh) * 2019-01-23 2019-05-28 平安科技(深圳)有限公司 一种电话录音的质检方法和装置
CN109830246B (zh) * 2019-01-25 2019-10-29 北京海天瑞声科技股份有限公司 音频质量评估方法、装置、电子设备及存储介质

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102543063A (zh) * 2011-12-07 2012-07-04 华南理工大学 基于说话人分割与聚类的多说话人语速估计方法
CN105261362A (zh) * 2015-09-07 2016-01-20 科大讯飞股份有限公司 一种通话语音监测方法及系统
CN108737667A (zh) * 2018-05-03 2018-11-02 平安科技(深圳)有限公司 语音质检方法、装置、计算机设备及存储介质
JP2018120640A (ja) * 2018-05-09 2018-08-02 株式会社野村総合研究所 コンプライアンスチェックシステムおよびコンプライアンスチェックプログラム
CN109660744A (zh) * 2018-10-19 2019-04-19 深圳壹账通智能科技有限公司 基于大数据的智能双录方法、设备、存储介质及装置
CN109767335A (zh) * 2018-12-15 2019-05-17 深圳壹账通智能科技有限公司 双录质检方法、装置、计算机设备及存储介质
CN109783338A (zh) * 2019-01-02 2019-05-21 深圳壹账通智能科技有限公司 基于业务信息的录制处理方法、装置和计算机设备
CN109729383A (zh) * 2019-01-04 2019-05-07 深圳壹账通智能科技有限公司 双录视频质量检测方法、装置、计算机设备和存储介质
CN110364183A (zh) * 2019-07-09 2019-10-22 深圳壹账通智能科技有限公司 语音质检的方法、装置、计算机设备和存储介质

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112911180A (zh) * 2021-01-28 2021-06-04 中国建设银行股份有限公司 一种视频录制方法、装置、电子设备及可读存储介质
CN113035201A (zh) * 2021-03-16 2021-06-25 广州佰锐网络科技有限公司 一种金融业务质检方法和系统
CN113240436A (zh) * 2021-04-22 2021-08-10 北京沃东天骏信息技术有限公司 在线客服话术质检的方法和装置
CN113571048A (zh) * 2021-07-21 2021-10-29 腾讯科技(深圳)有限公司 一种音频数据检测方法、装置、设备及可读存储介质
CN113571048B (zh) * 2021-07-21 2023-06-23 腾讯科技(深圳)有限公司 一种音频数据检测方法、装置、设备及可读存储介质
CN113506585A (zh) * 2021-09-09 2021-10-15 深圳市一号互联科技有限公司 一种语音通话的质量评估方法及系统

Also Published As

Publication number Publication date
CN110364183A (zh) 2019-10-22

Similar Documents

Publication Publication Date Title
WO2021004128A1 (zh) 语音质检的方法、装置、计算机设备和存储介质
CN112804400B (zh) 客服呼叫语音质检方法、装置、电子设备及存储介质
US8078463B2 (en) Method and apparatus for speaker spotting
US10049661B2 (en) System and method for analyzing and classifying calls without transcription via keyword spotting
US11769014B2 (en) Classifying digital documents in multi-document transactions based on signatory role analysis
US7912714B2 (en) Method for segmenting communication transcripts using unsupervised and semi-supervised techniques
US9472195B2 (en) Systems and methods for detecting fraud in spoken tests using voice biometrics
CN110533288A (zh) 业务办理流程检测方法、装置、计算机设备和存储介质
US11503158B2 (en) Method and system for fraud clustering by content and biometrics analysis
CN107798047B (zh) 重复工单检测方法、装置、服务器和介质
CN105187674B (zh) 服务录音的合规检查方法及装置
CN110598008B (zh) 录制数据的数据质检方法及装置、存储介质
CN109766474A (zh) 审讯信息审核方法、装置、计算机设备和存储介质
CN111696528B (zh) 一种语音质检方法、装置、质检设备及可读存储介质
CN109831677B (zh) 视频脱敏方法、装置、计算机设备和存储介质
CN110310127B (zh) 录音获取方法、装置、计算机设备及存储介质
CN110713088B (zh) 电梯投诉的预警方法、装置、设备及介质
CN113095204B (zh) 双录数据质检方法、装置及系统
CN110378587A (zh) 智能质检方法、系统、介质以及设备
CN110457394A (zh) 车辆信息管理方法、装置、计算机设备和存储介质
CN116864050A (zh) 一种方案偏离半定量评估的临床试验质量控制方法和设备
US8407177B2 (en) System and associated method for determining and applying sociocultural characteristics
US12020711B2 (en) System and method for detecting fraudsters
Ghate et al. Optimized intelligent speech signal verification system for identifying authorized users.
Ali et al. A distance metric based outliers detection for robust automatic speaker recognition applications

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20836952

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 17/05/2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20836952

Country of ref document: EP

Kind code of ref document: A1