CN110364183A

CN110364183A - Method, apparatus, computer device and storage medium for voice quality inspection

Info

Publication number: CN110364183A
Application number: CN201910616721.8A
Authority: CN
Inventors: 熊玮
Original assignee: OneConnect Smart Technology Co Ltd
Current assignee: OneConnect Smart Technology Co Ltd
Priority date: 2019-07-09
Filing date: 2019-07-09
Publication date: 2019-10-22
Also published as: WO2021004128A1

Abstract

This application relates to the technical field of business process optimization and provides a method, apparatus, computer device and storage medium for voice quality inspection. The method includes: detecting client audio data according to a preset first keyword set, and detecting business personnel audio data according to a preset second keyword set, where the second keyword set includes a script keyword set and a violation keyword set; and, when the number of times a must-read keyword in the first keyword set occurs in the client audio data is not equal to the count threshold, or no script keyword in the script keyword set is present in the business personnel audio data, or a violation keyword in the violation keyword set is present in the business personnel audio data, determining that the detection result of the audio to be detected is a failure and generating a re-recording prompt. With this method, the audio to be detected at each recording node can be inspected in real time, which improves the efficiency of monitoring the business service process.

Description

Method, apparatus, computer device and storage medium for voice quality inspection

Technical field

This application relates to the technical field of business process optimization, and in particular to a method, apparatus, computer device and storage medium for voice quality inspection.

Background

With the development of the service industry, more and more enterprises need to monitor the business service process when serving clients. Traditionally, monitoring the business service process involves recording audio and video of the service process synchronously, obtaining the business service video after the service ends, and having staff in the back office listen repeatedly to the conversation in the video and inspect its quality. When quality inspection finds a problem in some segment of the conversation, the business personnel and the client are notified to re-record it.

However, with this traditional approach, conversation problems in each stage are found and re-recorded only in the final quality inspection process, so monitoring efficiency is low.

Summary of the invention

In view of the above technical problem, it is necessary to provide a method, apparatus, computer device and storage medium for voice quality inspection that can improve monitoring efficiency.

A method of voice quality inspection, the method comprising:

during video recording, obtaining in real time the video to be detected of each recording node and the count threshold corresponding to the video to be detected, and extracting the audio to be detected of each recording node from the video to be detected;

splitting the audio to be detected into multiple audio segments according to a preset speech segmentation algorithm, and merging the audio segments that belong to the same speaker according to a preset speech clustering algorithm, to obtain business personnel audio data and client audio data;

detecting the client audio data according to a preset first keyword set, and detecting the business personnel audio data according to a preset second keyword set, where the second keyword set includes a script keyword set and a violation keyword set;

when the number of times a must-read keyword in the first keyword set occurs in the client audio data is not equal to the count threshold, or no script keyword in the script keyword set is present in the business personnel audio data, or a violation keyword in the violation keyword set is present in the business personnel audio data, determining that the detection result of the audio to be detected is a failure and generating a re-recording prompt.

In one embodiment, detecting the client audio data according to the preset first keyword set includes:

obtaining multiple must-read keywords from the preset first keyword set;

converting the client audio data into client text data;

traversing the client text data with each must-read keyword, and counting the number of times each must-read keyword occurs in the client text data;

obtaining, from the number of times each must-read keyword occurs in the client text data, the number of times each must-read keyword occurs in the client audio data.

In one embodiment, obtaining in real time the count threshold corresponding to the video to be detected includes:

obtaining in real time the dialog template of the recording node corresponding to the video to be detected;

counting, according to the first keyword set, the number of times each must-read keyword occurs in the dialog template;

obtaining the count threshold from the number of times each must-read keyword occurs in the dialog template.

In one embodiment, detecting the business personnel audio data according to the preset second keyword set, where the second keyword set includes a script keyword set and a violation keyword set, includes:

converting the business personnel audio data into business personnel text data;

obtaining the script template of the recording node corresponding to the video to be detected, and extracting the corresponding script information from the business personnel text data according to the script template;

obtaining the script keywords from the second keyword set, and matching the script information against the script keywords;

obtaining the violation keywords from the second keyword set, and traversing the business personnel text data with the violation keywords.

In one embodiment, after detecting the client audio data according to the preset first keyword set and detecting the business personnel audio data according to the preset second keyword set, the method further includes:

when the number of times the must-read keywords in the first keyword set occur in the client audio data reaches the count threshold, a script keyword in the script keyword set is present in the business personnel audio data, and no violation keyword in the violation keyword set is present in the business personnel audio data, determining that the detection result of the audio to be detected is a pass.

In one embodiment, splitting the audio to be detected into multiple audio segments according to the preset speech segmentation algorithm, and merging the audio segments that belong to the same speaker according to the preset speech clustering algorithm to obtain the business personnel audio data and the client audio data, includes:

filtering the audio to be detected to remove noise and ambient sound;

splitting the filtered audio to be detected into multiple audio segments according to the preset speech segmentation algorithm;

merging the audio segments that belong to the same speaker according to the preset speech clustering algorithm, to obtain the business personnel audio data and the client audio data.

An apparatus for voice quality inspection, the apparatus comprising:

an obtaining module, configured to obtain in real time, during video recording, the video to be detected of each recording node and the count threshold corresponding to the video to be detected, and to extract the audio to be detected of each recording node from the video to be detected;

an extraction module, configured to split the audio to be detected into multiple audio segments according to a preset speech segmentation algorithm, and to merge the audio segments that belong to the same speaker according to a preset speech clustering algorithm, to obtain business personnel audio data and client audio data;

a detection module, configured to detect the client audio data according to a preset first keyword set and to detect the business personnel audio data according to a preset second keyword set, where the second keyword set includes a script keyword set and a violation keyword set;

a processing module, configured to determine that the detection result of the audio to be detected is a failure and to generate a re-recording prompt when the number of times a must-read keyword in the first keyword set occurs in the client audio data is not equal to the count threshold, or no script keyword in the script keyword set is present in the business personnel audio data, or a violation keyword in the violation keyword set is present in the business personnel audio data.

In one embodiment, the detection module is further configured to obtain multiple must-read keywords from the preset first keyword set, convert the client audio data into client text data, traverse the client text data with each must-read keyword, count the number of times each must-read keyword occurs in the client text data, and determine from those counts the number of times each must-read keyword occurs in the client audio data.

A computer device, including a memory and a processor, the memory storing a computer program, where the processor implements the following steps when executing the computer program:

when the number of times a must-read keyword in the first keyword set occurs in the client audio data is not equal to the count threshold, or no script keyword in the script keyword set is present in the business personnel audio data, or a violation keyword in the violation keyword set is present in the business personnel audio data, determining that the detection result of the audio to be detected is a failure and generating a re-recording prompt.

A computer-readable storage medium storing a computer program, where the computer program implements the following steps when executed by a processor:

when the number of times a must-read keyword in the first keyword set occurs in the client audio data is not equal to the count threshold, or no script keyword in the script keyword set is present in the business personnel audio data, or a violation keyword in the violation keyword set is present in the business personnel audio data, determining that the detection result of the audio to be detected is a failure and generating a re-recording prompt.

The above method, apparatus, computer device and storage medium for voice quality inspection detect the client audio data according to the preset first keyword set and the business personnel audio data according to the preset second keyword set, so that the two are detected separately. The detection result of the audio to be detected is determined from these results, and a re-recording prompt is generated when the audio to be detected fails detection. In this way, the audio to be detected at each recording node is inspected in real time during video recording, the conversation in each stage is monitored promptly, and the efficiency of monitoring the business service process is improved.

Brief description of the drawings

Fig. 1 is a schematic flowchart of the method of voice quality inspection in one embodiment;

Fig. 2 is a schematic sub-flowchart of step S106 in Fig. 1 in one embodiment;

Fig. 3 is a schematic sub-flowchart of step S102 in Fig. 1 in one embodiment;

Fig. 4 is a schematic sub-flowchart of step S106 in Fig. 1 in one embodiment;

Fig. 5 is a schematic flowchart of the method of voice quality inspection in another embodiment;

Fig. 6 is a schematic sub-flowchart of step S104 in Fig. 1 in one embodiment;

Fig. 7 is a structural block diagram of the apparatus for voice quality inspection in one embodiment;

Fig. 8 is an internal structure diagram of the computer device in one embodiment.

Detailed description

To make the objectives, technical solutions and advantages of the application clearer, the application is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and are not intended to limit it.

In one embodiment, as shown in Fig. 1, a method of voice quality inspection is provided, comprising the following steps:

S102: during video recording, obtain in real time the video to be detected of each recording node and the count threshold corresponding to the video to be detected, and extract the audio to be detected of each recording node from the video to be detected.

The video to be detected is the video data of each recording node that the terminal captures and sends to the server during video recording. Video recording comprises multiple recording stages, and each recording stage has a corresponding recording node. After obtaining the video to be detected, the server separates the audio from the images in the video and extracts the audio to be detected of each recording node. The count threshold corresponding to the video to be detected is the threshold for the number of times the must-read keywords must occur in that video. A must-read keyword is a word that the client must mention in the recording stage corresponding to the recording node, and it is used to detect the client audio data.

S104: split the audio to be detected into multiple audio segments according to a preset speech segmentation algorithm, and merge the audio segments that belong to the same speaker according to a preset speech clustering algorithm, to obtain business personnel audio data and client audio data.

Because the audio to be detected may contain noise and ambient sound, it is filtered first to remove them before it is analyzed. The audio to be detected contains both business personnel audio data and client audio data, so the server needs to separate the two before detection. The separation can be done with a speech segmentation algorithm and a speech clustering algorithm in a segment-then-cluster fashion: the speech segmentation algorithm first splits the audio to be detected into multiple audio segments, and the speech clustering algorithm then merges the audio segments that belong to the same speaker, which yields the business personnel audio data and the client audio data.

S106: detect the client audio data according to a preset first keyword set, and detect the business personnel audio data according to a preset second keyword set, where the second keyword set includes a script keyword set and a violation keyword set.

The preset first keyword set includes multiple must-read keywords. A must-read keyword is a word that the client must mention in the recording stage corresponding to the recording node; a violation keyword is a word that the business personnel must not mention in that stage; and a script keyword is a word that the business personnel must mention in that stage. The server detects the client audio data according to the preset first keyword set, counts the number of times the must-read keywords occur in the client audio data, and determines the detection result of the client audio data by comparing the counts with the count threshold corresponding to the video to be detected. By detecting the business personnel audio data, the server determines whether the business personnel mentioned the script keywords and whether they avoided the violation keywords, and determines the detection result of the business personnel audio data accordingly.

S108: when the number of times a must-read keyword in the first keyword set occurs in the client audio data is not equal to the count threshold, or no script keyword in the script keyword set is present in the business personnel audio data, or a violation keyword in the violation keyword set is present in the business personnel audio data, determine that the detection result of the audio to be detected is a failure and generate a re-recording prompt.

When the number of times a must-read keyword in the first keyword set occurs in the client audio data is not equal to the count threshold, the client audio data fails detection. When no script keyword in the script keyword set is present in the business personnel audio data, or a violation keyword in the violation keyword set is present in it, the business personnel audio data fails detection. If either the client audio data or the business personnel audio data fails detection, the audio to be detected fails detection and the server generates a re-recording prompt. The prompt tells the client and the business personnel why the recording failed, so that they can avoid repeating the same mistake when re-recording on site.

The above method of voice quality inspection detects the client audio data according to the preset first keyword set and the business personnel audio data according to the preset second keyword set, so that the two are detected separately. The detection result of the audio to be detected is determined from these results, and a re-recording prompt is generated when the audio to be detected fails detection. In this way, the audio to be detected at each recording node is inspected in real time during video recording, the conversation in each stage is monitored promptly, and the efficiency of monitoring the business service process is improved.
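The decision in S108 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `InspectionResult`, `inspect_node` and `re_recording_prompt` are hypothetical, the per-keyword count threshold is an assumption (the text speaks of a single count threshold), and the inputs are assumed to come from earlier detection steps.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InspectionResult:
    passed: bool
    reasons: List[str] = field(default_factory=list)


def inspect_node(
    keyword_counts: Dict[str, int],
    count_thresholds: Dict[str, int],
    script_keyword_found: bool,
    violations_found: List[str],
) -> InspectionResult:
    """Combine the client-side and business-personnel-side checks for one node."""
    reasons: List[str] = []
    # Client side: each must-read keyword must occur exactly as often as expected.
    for keyword, expected in count_thresholds.items():
        actual = keyword_counts.get(keyword, 0)
        if actual != expected:
            reasons.append(
                f"client said '{keyword}' {actual} time(s), expected {expected}"
            )
    # Business personnel side: a script keyword must be present...
    if not script_keyword_found:
        reasons.append("no script keyword was mentioned")
    # ...and no violation keyword may be present.
    for keyword in violations_found:
        reasons.append(f"violation keyword '{keyword}' was mentioned")
    return InspectionResult(passed=not reasons, reasons=reasons)


def re_recording_prompt(result: InspectionResult) -> str:
    """Return a prompt listing the failure reasons, or an empty string on a pass."""
    if result.passed:
        return ""
    return "Re-recording required: " + "; ".join(result.reasons)
```

Collecting every failure reason, rather than stopping at the first one, matches the stated purpose of the prompt: telling the client and the business personnel why the recording failed.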

In one embodiment, as shown in Fig. 2, S106 includes:

S202: obtain multiple must-read keywords from the preset first keyword set;

S204: convert the client audio data into client text data;

S206: traverse the client text data with each must-read keyword, and count the number of times each must-read keyword occurs in the client text data;

S208: obtain, from the number of times each must-read keyword occurs in the client text data, the number of times each must-read keyword occurs in the client audio data.

A must-read keyword is a word that the client must mention in the recording stage corresponding to the recording node. The preset first keyword set includes multiple must-read keywords, which the server obtains from it. To detect the client audio data with the must-read keywords, the server first converts the client audio data into client text data, then traverses the client text data with each must-read keyword and counts the number of times each one occurs. From these counts it obtains the number of times each must-read keyword occurs in the client audio data.

In the recording stage corresponding to each recording node, the business personnel asks the client questions and the client answers by mentioning must-read keywords, so the client audio data can be detected according to the must-read keywords. Because the number of questions differs from stage to stage, the number of times the client mentions must-read keywords also differs. The server therefore determines how many times the client should mention the must-read keywords in the recording stage corresponding to the recording node, that is, the count threshold corresponding to the video to be detected, and compares the count threshold with the number of times each must-read keyword occurs in the client audio data to determine the detection result. Only when the number of times each must-read keyword occurs in the client audio data equals the count threshold is the detection result of the client audio data considered a pass. The count threshold can be determined from the dialog template of the recording node.

In the above embodiment, the number of times each must-read keyword occurs in the client audio data is obtained from the number of times it occurs in the client text data, so that the server can determine the detection result of the client audio data from these counts, which realizes the detection of the client audio data.
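Steps S202 to S208 can be sketched as a simple substring count. The function name `count_must_read_keywords` is hypothetical, and the sketch assumes the speech-to-text conversion of S204 has already produced `client_text`; the source does not specify a speech recognition engine or a matching rule.

```python
from typing import Dict, List


def count_must_read_keywords(
    client_text: str, must_read_keywords: List[str]
) -> Dict[str, int]:
    """Count non-overlapping, case-insensitive occurrences of each keyword."""
    normalized = client_text.lower()
    return {
        keyword: normalized.count(keyword.lower())
        for keyword in must_read_keywords
    }


# Example: transcript of the client's side of one recording stage.
counts = count_must_read_keywords(
    "Yes, I agree. I understand the risk and I agree.",
    ["agree", "understand", "confirm"],
)
print(counts)
```

Plain substring counting also matches a keyword inside a longer word (for example "agree" inside "disagree"), so a production system would likely need tokenization or word-boundary matching suited to the language of the transcript.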

In one embodiment, as shown in Fig. 3, S102 includes:

S302: obtain in real time the dialog template of the recording node corresponding to the video to be detected;

S304: count, according to the first keyword set, the number of times each must-read keyword occurs in the dialog template;

S306: obtain the count threshold from the number of times each must-read keyword occurs in the dialog template.

Using the node identifier carried by the recording node, the server obtains the dialog template corresponding to the recording node in real time from a preset dialog template database, obtains the must-read keywords from the first keyword set, traverses the dialog template with each must-read keyword, and counts how many times each one occurs in the dialog template. That count is the number of times the client should mention the must-read keyword in the recording stage corresponding to the recording node, that is, the count threshold.

In the above embodiment, the dialog template of the recording node corresponding to the video to be detected is obtained in real time, the number of times each must-read keyword occurs in the dialog template is counted according to the first keyword set, and the count threshold is obtained from these counts, so that the server can detect the client audio data against the count threshold.
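Steps S302 to S306 can be sketched as a lookup followed by the same kind of count. Everything named here is an assumption for illustration: `DIALOG_TEMPLATES` is a hypothetical in-memory stand-in for the preset dialog template database, `"node-01"` is a made-up node identifier, and the template text is invented.

```python
from typing import Dict, List

# Hypothetical stand-in for the preset dialog template database,
# keyed by the node identifier carried by the recording node.
DIALOG_TEMPLATES: Dict[str, str] = {
    "node-01": (
        "Client: I agree. "
        "Client: I understand the risk. "
        "Client: I agree to sign."
    ),
}


def count_thresholds_for_node(
    node_id: str,
    must_read_keywords: List[str],
    templates: Dict[str, str] = DIALOG_TEMPLATES,
) -> Dict[str, int]:
    """Derive the expected count of each must-read keyword from the template."""
    template = templates[node_id].lower()
    return {
        keyword: template.count(keyword.lower())
        for keyword in must_read_keywords
    }


print(count_thresholds_for_node("node-01", ["agree", "understand"]))
```

Deriving the thresholds from the template rather than hard-coding them means a change to a stage's dialog template updates the expected counts without touching the detection logic.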

In one embodiment, as shown in Fig. 4, S106 includes:

S402: convert the business personnel audio data into business personnel text data;

S404: obtain the script template of the recording node corresponding to the video to be detected, and extract the corresponding script information from the business personnel text data according to the script template;

S406: obtain the script keywords from the second keyword set, and match the script information against the script keywords;

S408: obtain the violation keywords from the second keyword set, and traverse the business personnel text data with the violation keywords.

To detect the business personnel audio data, the server converts it into business personnel text data, obtains the script template of the recording node corresponding to the video to be detected, extracts the corresponding script information from the business personnel text data according to the script template, and obtains the script keywords from the second keyword set. A script keyword is a word that the business personnel must mention in the recording stage corresponding to the recording node. By detecting the business personnel audio data, the server determines whether the business personnel mentioned the script keywords; if so, the first detection result of the business personnel audio data is a pass.

Besides the script keyword check, the server also checks the business personnel audio data against violation keywords, which it obtains from the second keyword set. A violation keyword is a word that the business personnel must not mention in the recording stage corresponding to the recording node. By detecting the business personnel audio data, the server determines whether the business personnel avoided the violation keywords; if no violation keyword was mentioned, the second detection result of the business personnel audio data is a pass. The business personnel audio data passes detection only when both the first and the second detection results are a pass.

In the above embodiment, the business personnel audio data is detected according to the script keywords and the violation keywords, which realizes the detection of the business personnel audio data.
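The two checks on the business personnel text can be sketched as below. The function names are hypothetical. Two simplifications are assumptions of this sketch: the script keywords are matched against the whole transcript rather than against script information extracted with a script template (S404), and the script check passes when at least one script keyword is present, which is one reading of the failure condition "no script keyword is present".

```python
from typing import List, Tuple


def check_business_personnel_text(
    text: str,
    script_keywords: List[str],
    violation_keywords: List[str],
) -> Tuple[List[str], List[str]]:
    """Return (script keywords found, violation keywords found)."""
    normalized = text.lower()
    script_hits = [k for k in script_keywords if k.lower() in normalized]
    violation_hits = [k for k in violation_keywords if k.lower() in normalized]
    return script_hits, violation_hits


def business_personnel_passes(
    script_hits: List[str], violation_hits: List[str]
) -> bool:
    """First result: a script keyword was said. Second result: no violation."""
    return bool(script_hits) and not violation_hits


hits, violations = check_business_personnel_text(
    "This product carries investment risk. I guarantee returns.",
    ["investment risk", "cooling-off period"],
    ["guarantee returns"],
)
print(hits, violations, business_personnel_passes(hits, violations))
```

If every script keyword must be mentioned, the first condition would instead compare `script_hits` with the full `script_keywords` list.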

In one embodiment, as shown in Fig. 5, after S106 the method further includes:

S502: when the number of times the must-read keywords in the first keyword set occur in the client audio data reaches the count threshold, a script keyword in the script keyword set is present in the business personnel audio data, and no violation keyword in the violation keyword set is present in the business personnel audio data, determine that the detection result of the audio to be detected is a pass.

When the number of times the must-read keywords in the first keyword set occur in the client audio data reaches the count threshold, the server determines that the client audio data passes detection. When a script keyword in the script keyword set is present in the business personnel audio data and no violation keyword in the violation keyword set is present in it, the server determines that the business personnel audio data passes detection. When both the client audio data and the business personnel audio data pass detection, the server determines that the audio to be detected passes detection.

In the above embodiment, the detection result of the audio to be detected is determined from the detection results of the client audio data and the business personnel audio data.

In one embodiment, as shown in Fig. 6, S104 includes:

S602: filter the audio to be detected to remove noise and ambient sound;

S604: split the filtered audio to be detected into multiple audio segments according to the preset speech segmentation algorithm;

S606: merge the audio segments that belong to the same speaker according to the preset speech clustering algorithm, to obtain the business personnel audio data and the client audio data.

Because the audio to be detected may contain noise and ambient sound, the server first filters the audio to be detected to remove them, and then processes the filtered audio with the speech segmentation algorithm and the speech clustering algorithm to obtain the business personnel audio data and the client audio data. Here, the speech segmentation algorithm refers to speaker change-point detection, that is, locating the points in the speech data at which the speaker identity changes. A common speech segmentation algorithm is a sliding-window change-point detection algorithm based on a Gaussian model: it computes the distance between adjacent speech windows and decides, based on a threshold or a penalty factor, whether the two windows come from the same speaker. The threshold or penalty factor can be obtained from collected training data. The speech segmentation algorithm splits the audio to be detected into multiple audio segments, each containing the audio data of only one person.
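A sliding-window change-point detector of the kind described above can be sketched with a BIC-style distance between adjacent windows. This is an illustrative simplification under stated assumptions: each frame is a single scalar feature (real systems typically use multi-dimensional features such as MFCCs), each window is modeled by one univariate Gaussian, and the window size, step and penalty values are arbitrary. The names `delta_bic` and `find_change_points` are hypothetical.

```python
import math
from statistics import pvariance
from typing import List

EPS = 1e-8  # keeps log() defined when a window has zero variance


def delta_bic(left: List[float], right: List[float], penalty: float = 1.0) -> float:
    """BIC gain of modeling two windows separately instead of jointly.

    Positive values suggest the windows come from different speakers.
    """
    n1, n2 = len(left), len(right)
    n = n1 + n2
    gain = 0.5 * (
        n * math.log(pvariance(left + right) + EPS)
        - n1 * math.log(pvariance(left) + EPS)
        - n2 * math.log(pvariance(right) + EPS)
    )
    # A univariate Gaussian has 2 parameters (mean, variance): 0.5 * 2 * log(n).
    return gain - penalty * math.log(n)


def find_change_points(
    frames: List[float], window: int = 20, step: int = 5, penalty: float = 1.0
) -> List[int]:
    """Return frame indices where the score is positive and a local maximum."""
    scores = [
        (t, delta_bic(frames[t - window:t], frames[t:t + window], penalty))
        for t in range(window, len(frames) - window + 1, step)
    ]
    points = []
    for i, (t, score) in enumerate(scores):
        prev_score = scores[i - 1][1] if i > 0 else float("-inf")
        next_score = scores[i + 1][1] if i + 1 < len(scores) else float("-inf")
        if score > 0 and score >= prev_score and score > next_score:
            points.append(t)
    return points
```

The `penalty` argument plays the role of the penalty factor mentioned above: raising it makes the detector more reluctant to declare a speaker change.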

The speech clustering algorithm builds on the speech segmentation algorithm by merging the audio segments that belong to the same speaker. Common speech clustering algorithms fall into two classes: top-down clustering and bottom-up clustering. Each audio segment obtained after segmentation is treated as one class, and the two closest classes are then merged repeatedly according to the BIC (Bayesian Information Criterion) distance until merging speech segments no longer increases the BIC value, which yields two classes of audio data. After obtaining the two classes of audio data, the server analyzes them further: it extracts the voiceprint features of the two classes and matches them against the business personnel voiceprint features in a preset business personnel information database to determine which class is the business personnel audio data; the other class is the client audio data.
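The merging procedure can be sketched as a small agglomerative loop over segments. As before, this is a simplified illustration rather than the method of the application: frames are assumed to be scalar features, each cluster is modeled by one univariate Gaussian, the loop stops when no pair of clusters is better modeled jointly or when two clusters remain, and the names `delta_bic` and `cluster_segments` are hypothetical.

```python
import math
from statistics import pvariance
from typing import List

EPS = 1e-8  # keeps log() defined when a cluster has zero variance


def delta_bic(a: List[float], b: List[float], penalty: float = 1.0) -> float:
    """BIC distance: negative when one joint Gaussian models both clusters better."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    gain = 0.5 * (
        n * math.log(pvariance(a + b) + EPS)
        - n1 * math.log(pvariance(a) + EPS)
        - n2 * math.log(pvariance(b) + EPS)
    )
    return gain - penalty * math.log(n)


def cluster_segments(
    segments: List[List[float]], penalty: float = 1.0, min_clusters: int = 2
) -> List[List[int]]:
    """Merge the closest pair of clusters until no merge helps.

    Returns, for each cluster, the sorted indices of the segments it contains.
    """
    clusters = [([i], list(seg)) for i, seg in enumerate(segments)]
    while len(clusters) > min_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i][1], clusters[j][1], penalty)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:  # merging the closest pair would no longer help
            break
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(members) for members, _ in clusters]
```

With alternating speakers, for example segments 0 and 2 from one speaker and segments 1 and 3 from the other, the loop is expected to end with the two clusters `[0, 2]` and `[1, 3]`.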

In the above embodiment, the audio to be detected is filtered to remove noise and ambient sound, the filtered audio is split into multiple audio segments with the speech segmentation algorithm, and the audio segments are clustered into business personnel audio data and client audio data with the speech clustering algorithm, which realizes the extraction of the business personnel audio data and the client audio data.
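The final labeling step, deciding which of the two clusters belongs to the business personnel, can be sketched as a similarity comparison between voiceprint vectors. The source does not say how voiceprint features are extracted or compared, so cosine similarity, the vector values and the names `cosine_similarity` and `label_speakers` are all assumptions made for illustration.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_speakers(
    cluster_voiceprints: Dict[str, List[float]],
    enrolled_voiceprint: List[float],
) -> Dict[str, str]:
    """Label the cluster closest to the enrolled voiceprint as business personnel."""
    best_id = max(
        cluster_voiceprints,
        key=lambda cid: cosine_similarity(cluster_voiceprints[cid], enrolled_voiceprint),
    )
    return {
        cid: "business_personnel" if cid == best_id else "client"
        for cid in cluster_voiceprints
    }


labels = label_speakers(
    {"cluster_0": [0.9, 0.1, 0.0], "cluster_1": [0.1, 0.8, 0.3]},
    enrolled_voiceprint=[1.0, 0.0, 0.1],
)
print(labels)
```

Because only the business personnel's voiceprint is assumed to be enrolled in advance, the client is identified by elimination, as the text describes.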

It should be understood that, although the steps in the flowcharts of Figs. 1-6 are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated otherwise herein, there is no strict limit on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in Figs. 1-6 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times; nor are they necessarily executed one after another, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.

In one embodiment, as shown in Fig. 7, a voice quality inspection apparatus is provided, comprising an acquisition module 702, an extraction module 704, a detection module 706, and a processing module 708, wherein:

The acquisition module 702 is configured to obtain, in real time during video recording, the video to be detected of each recording node and the frequency threshold corresponding to the video to be detected, and to extract the audio to be detected of each recording node from the video to be detected;

The extraction module 704 is configured to split the audio to be detected into multiple audio fragments according to a preset voice segmentation algorithm, and to merge the audio fragments belonging to the same speaker among the multiple audio fragments according to a preset voice clustering algorithm, obtaining business personnel audio data and client audio data;

The detection module 706 is configured to detect the client audio data according to a preset first keyword set and to detect the business personnel audio data according to a preset second keyword set, the second keyword set comprising a script keyword set and a violation keyword set;

The processing module 708 is configured to determine that the detection result of the audio to be detected is a failure, and to generate a re-recording prompt, when the number of times a must-read keyword in the first keyword set occurs in the client audio data is not equal to the frequency threshold, or no script keyword in the script keyword set is present in the business personnel audio data, or a violation keyword in the violation keyword set is present in the business personnel audio data.

The above voice quality inspection apparatus detects the client audio data according to the preset first keyword set and detects the business personnel audio data according to the preset second keyword set, so that the two are inspected separately. The detection result of the audio to be detected is determined from these results, and a re-recording prompt is generated when the audio fails detection. In this way, the audio to be detected of each recording node is inspected in real time during video recording, the dialogue in each stage of the recording is monitored promptly, and the efficiency of monitoring the business service process is improved.

In one of the embodiments, the detection module is further configured to obtain multiple must-read keywords from the preset first keyword set, convert the client audio data into client text data, traverse the client text data according to each must-read keyword, count the number of times each must-read keyword occurs in the client text data, and from those counts obtain the number of times each must-read keyword occurs in the client audio data.
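The counting step can be illustrated with a short sketch. It assumes the speech-to-text conversion has already produced a transcript string; the function name `count_must_read_keywords`, the case-insensitive matching, and the English sample keywords are illustrative assumptions rather than part of this application.

```python
def count_must_read_keywords(client_text, must_read_keywords):
    """Map each must-read keyword to its number of occurrences in the transcript.

    Matching is a plain, case-insensitive, non-overlapping substring count.
    """
    text = client_text.lower()
    return {kw: text.count(kw.lower()) for kw in must_read_keywords}


if __name__ == "__main__":
    transcript = "I understand the risk. Yes, I agree. I agree to the terms."
    print(count_must_read_keywords(transcript, ["I agree", "understand the risk"]))
```

Because the transcript is derived directly from the client audio data, the count for each keyword in the text is taken as the count for that keyword in the audio.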

In one of the embodiments, the acquisition module is further configured to obtain, in real time, the dialog template of the recording node corresponding to the video to be detected, count the number of times each must-read keyword occurs in the dialog template according to the first keyword set, and obtain the frequency threshold from those counts.
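One possible reading of this step is that the frequency threshold is kept per keyword and equals the number of times that keyword occurs in the dialog template. The sketch below follows that reading; the per-keyword dictionary, the helper names `thresholds_from_template` and `mismatched_keywords`, and the sample template are assumptions made for illustration.

```python
def thresholds_from_template(dialog_template, must_read_keywords):
    """Derive a per-keyword frequency threshold from the dialog template text."""
    template = dialog_template.lower()
    return {kw: template.count(kw.lower()) for kw in must_read_keywords}


def mismatched_keywords(observed_counts, thresholds):
    """List the keywords whose observed count differs from the threshold."""
    return [kw for kw, required in thresholds.items()
            if observed_counts.get(kw, 0) != required]


if __name__ == "__main__":
    template = "I understand the risk. I agree. I agree to record this call."
    thresholds = thresholds_from_template(
        template, ["I agree", "understand the risk"])
    observed = {"I agree": 1, "understand the risk": 1}
    print(thresholds, mismatched_keywords(observed, thresholds))
```

A non-empty result from `mismatched_keywords` corresponds to the case where the number of occurrences is not equal to the frequency threshold.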

In one of the embodiments, the detection module is further configured to convert the business personnel audio data into business personnel text data, obtain the script template of the recording node corresponding to the video to be detected, extract the corresponding script information from the business personnel text data according to the script template, obtain script keywords from the second keyword set and match them against the script information, and obtain violation keywords from the second keyword set and traverse the business personnel text data according to the violation keywords.
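The text does not specify how script information is extracted from the transcript. The sketch below assumes, purely for illustration, that the script template is a set of sentences and that a transcript sentence counts as script information when it closely resembles one of them, using `difflib.SequenceMatcher` from the Python standard library. The function names, the `min_ratio` value and the sample sentences are hypothetical.

```python
import difflib
import re


def split_sentences(text):
    """Split text into trimmed, non-empty sentences on common punctuation."""
    return [s.strip() for s in re.split(r"[.!?;]+", text) if s.strip()]


def extract_script_info(agent_text, script_template, min_ratio=0.6):
    """Keep transcript sentences that resemble any sentence of the template."""
    template_sentences = [s.lower() for s in split_sentences(script_template)]
    kept = []
    for sentence in split_sentences(agent_text):
        best = max((difflib.SequenceMatcher(None, sentence.lower(), t).ratio()
                    for t in template_sentences), default=0.0)
        if best >= min_ratio:
            kept.append(sentence)
    return kept


def check_agent_text(agent_text, script_template, script_keywords,
                     violation_keywords):
    """Return (script keywords found in script info, violations found anywhere)."""
    script_info = " ".join(
        extract_script_info(agent_text, script_template)).lower()
    full_text = agent_text.lower()
    found_script = [kw for kw in script_keywords if kw.lower() in script_info]
    found_violations = [kw for kw in violation_keywords
                        if kw.lower() in full_text]
    return found_script, found_violations
```

Script keywords are matched only against the extracted script information, while violation keywords are searched across the whole business personnel transcript, mirroring the two separate checks described above.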

In one of the embodiments, the detection module is further configured to determine that the detection result of the audio to be detected is a pass when the number of times the must-read keyword in the first keyword set occurs in the client audio data reaches the frequency threshold, a script keyword in the script keyword set is present in the business personnel audio data, and no violation keyword in the violation keyword set is present in the business personnel audio data.
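The pass and fail conditions can be combined into a single decision, sketched below. "Reaches the frequency threshold" is interpreted here as equality, so that the pass rule is the exact complement of the failure rule stated for the processing module; that interpretation, the function names, and the wording of the prompt are assumptions.

```python
def passes_inspection(must_read_counts, thresholds, script_hits, violation_hits):
    """Return True only if all three pass conditions hold."""
    # Every must-read keyword must occur exactly as often as its threshold.
    counts_ok = all(must_read_counts.get(kw, 0) == required
                    for kw, required in thresholds.items())
    # At least one script keyword must be present and no violation keyword.
    return counts_ok and bool(script_hits) and not violation_hits


def inspect_node(node_id, must_read_counts, thresholds, script_hits,
                 violation_hits):
    """Build a detection result; a re-recording prompt is set on failure."""
    passed = passes_inspection(
        must_read_counts, thresholds, script_hits, violation_hits)
    prompt = None if passed else (
        f"Recording node {node_id} failed quality inspection; please re-record.")
    return {"node": node_id, "passed": passed, "rerecord_prompt": prompt}
```

Because the decision is made per recording node, a failed node can be re-recorded immediately instead of after the whole video has been reviewed.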

In one of the embodiments, the extraction module is further configured to filter the audio to be detected to remove noise and ambient sound, split the filtered audio into multiple audio fragments according to the preset voice segmentation algorithm, and merge the audio fragments belonging to the same speaker among the multiple audio fragments according to the preset voice clustering algorithm, obtaining business personnel audio data and client audio data.

For specific limitations of the voice quality inspection apparatus, refer to the limitations of the voice quality inspection method above, which are not repeated here. Each module of the above apparatus can be implemented wholly or partly in software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database stores must-read keyword data, violation keyword data, and dialog template data. The network interface is used to communicate with an external terminal over a network connection. When executed by the processor, the computer program implements a voice quality inspection method.

Those skilled in the art will understand that the structure shown in Fig. 8 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.

In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor implements the following steps when executing the computer program:

When the number of times a must-read keyword in the first keyword set occurs in the client audio data is not equal to the frequency threshold, or no script keyword in the script keyword set is present in the business personnel audio data, or a violation keyword in the violation keyword set is present in the business personnel audio data, determining that the detection result of the audio to be detected is a failure, and generating a re-recording prompt.

The above computer device for voice quality inspection detects the client audio data according to the preset first keyword set and detects the business personnel audio data according to the preset second keyword set, so that the two are inspected separately. The detection result of the audio to be detected is determined from these results, and a re-recording prompt is generated when the audio fails detection. In this way, the audio to be detected of each recording node is inspected in real time during video recording, the dialogue in each stage of the recording is monitored promptly, and the efficiency of monitoring the business service process is improved.

In one embodiment, the processor further implements the following steps when executing the computer program:

Obtaining multiple must-read keywords from the preset first keyword set;

Converting the client audio data into client text data;

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps:

When the number of times a must-read keyword in the first keyword set occurs in the client audio data is not equal to the frequency threshold, or no script keyword in the script keyword set is present in the business personnel audio data, or a violation keyword in the violation keyword set is present in the business personnel audio data, determining that the detection result of the audio to be detected is a failure, and generating a re-recording prompt.

The above storage medium for voice quality inspection detects the client audio data according to the preset first keyword set and detects the business personnel audio data according to the preset second keyword set, so that the two are inspected separately. The detection result of the audio to be detected is determined from these results, and a re-recording prompt is generated when the audio fails detection. In this way, the audio to be detected of each recording node is inspected in real time during video recording, the dialogue in each stage of the recording is monitored promptly, and the efficiency of monitoring the business service process is improved.

In one embodiment, the computer program further implements the following steps when executed by the processor:

Obtaining multiple must-read keywords from the preset first keyword set;

Converting the client audio data into client text data;

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.

The above embodiments express only several implementations of this application. Their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims

1. A voice quality inspection method, the method comprising:

Obtaining, in real time during video recording, the video to be detected of each recording node and the frequency threshold corresponding to the video to be detected, and extracting the audio to be detected of each recording node from the video to be detected;

Splitting the audio to be detected into multiple audio fragments according to a preset voice segmentation algorithm, and merging the audio fragments belonging to the same speaker among the multiple audio fragments according to a preset voice clustering algorithm, to obtain business personnel audio data and client audio data;

Detecting the client audio data according to a preset first keyword set, and detecting the business personnel audio data according to a preset second keyword set, the second keyword set comprising a script keyword set and a violation keyword set;

When the number of times a must-read keyword in the first keyword set occurs in the client audio data is not equal to the frequency threshold, or no script keyword in the script keyword set is present in the business personnel audio data, or a violation keyword in the violation keyword set is present in the business personnel audio data, determining that the detection result of the audio to be detected is a failure, and generating a re-recording prompt.

2. The method according to claim 1, characterized in that detecting the client audio data according to the preset first keyword set comprises:

Obtaining multiple must-read keywords from the preset first keyword set;

Converting the client audio data into client text data;

Traversing the client text data according to each must-read keyword, and counting the number of times each must-read keyword occurs in the client text data;

Obtaining, from the number of times each must-read keyword occurs in the client text data, the number of times each must-read keyword occurs in the client audio data.

3. The method according to claim 1, characterized in that obtaining in real time the frequency threshold corresponding to the video to be detected comprises:

Obtaining, in real time, the dialog template of the recording node corresponding to the video to be detected;

Counting, according to the first keyword set, the number of times each must-read keyword occurs in the dialog template;

Obtaining the frequency threshold from the number of times each must-read keyword occurs in the dialog template.

4. The method according to claim 1, characterized in that detecting the business personnel audio data according to the preset second keyword set, the second keyword set comprising a script keyword set and a violation keyword set, comprises:

Obtaining the script template of the recording node corresponding to the video to be detected, and extracting the corresponding script information from the business personnel text data according to the script template;

Obtaining script keywords from the second keyword set, and matching the script keywords against the script information;

Obtaining violation keywords from the second keyword set, and traversing the business personnel text data according to the violation keywords.

5. The method according to claim 1, characterized in that, after detecting the client audio data according to the preset first keyword set and detecting the business personnel audio data according to the preset second keyword set, the method further comprises:

When the number of times the must-read keyword in the first keyword set occurs in the client audio data reaches the frequency threshold, a script keyword in the script keyword set is present in the business personnel audio data, and no violation keyword in the violation keyword set is present in the business personnel audio data, determining that the detection result of the audio to be detected is a pass.

6. The method according to claim 1, characterized in that splitting the audio to be detected into multiple audio fragments according to the preset voice segmentation algorithm, and merging the audio fragments belonging to the same speaker among the multiple audio fragments according to the preset voice clustering algorithm to obtain business personnel audio data and client audio data, comprises:

Filtering the audio to be detected to remove the noise and ambient sound in the audio to be detected;

Merging the audio fragments belonging to the same speaker among the multiple audio fragments according to the preset voice clustering algorithm, to obtain business personnel audio data and client audio data.

7. A voice quality inspection apparatus, characterized in that the apparatus comprises:

An acquisition module, configured to obtain, in real time during video recording, the video to be detected of each recording node and the frequency threshold corresponding to the video to be detected, and to extract the audio to be detected of each recording node from the video to be detected;

An extraction module, configured to split the audio to be detected into multiple audio fragments according to a preset voice segmentation algorithm, and to merge the audio fragments belonging to the same speaker among the multiple audio fragments according to a preset voice clustering algorithm, obtaining business personnel audio data and client audio data;

A detection module, configured to detect the client audio data according to a preset first keyword set and to detect the business personnel audio data according to a preset second keyword set, the second keyword set comprising a script keyword set and a violation keyword set;

A processing module, configured to determine that the detection result of the audio to be detected is a failure, and to generate a re-recording prompt, when the number of times a must-read keyword in the first keyword set occurs in the client audio data is not equal to the frequency threshold, or no script keyword in the script keyword set is present in the business personnel audio data, or a violation keyword in the violation keyword set is present in the business personnel audio data.

8. The apparatus according to claim 7, characterized in that the detection module is further configured to obtain multiple must-read keywords from the preset first keyword set, convert the client audio data into client text data, traverse the client text data according to each must-read keyword, count the number of times each must-read keyword occurs in the client text data, and determine, from those counts, the number of times each must-read keyword occurs in the client audio data.

9. A computer device, including a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 6 when executing the computer program.

10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program implements the steps of the method according to any one of claims 1 to 6 when executed by a processor.