CN114882913A - Call voice quality inspection method, device, equipment and storage medium

Info

Publication number: CN114882913A
Application number: CN202210517255.XA
Authority: CN
Inventors: 于凤英; 王健宗; 程宁
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2022-05-13
Filing date: 2022-05-13
Publication date: 2022-08-09

Abstract

The invention relates to the technical field of artificial intelligence and discloses a call voice quality inspection method, device, equipment and storage medium. The method comprises the following steps: acquiring call voice and converting the call voice into a call text, wherein the call text comprises a customer call text and an agent reply text; inputting the call text into a pre-trained text detection model to obtain a target text segment comprising sensitive content; performing detail verification on the target text segment based on a preset violation detection mode; if the target text segment contains a violation, confirming the violation type corresponding to the target text segment as the target violation type; and generating quality inspection result information according to the target text segment and the target violation type. The model first screens out the text segments in the conversation between the agent and the customer that may contain violations, and detail verification is then performed on those segments based on a preset detection mode, which improves the accuracy of the quality inspection result.

Description

Call voice quality inspection method, device, equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a storage medium for quality inspection of call voice.

Background

At present, call centers have become an important channel through which enterprises provide comprehensive online services. Call center agents handle outbound or inbound calls according to the needs of the enterprise, receiving customer feedback, inquiries and suggestions, or carrying out market research, telephone sales, after-sales tracking and similar services for the enterprise's products. While the call center provides these services, the call content between the agent and the customer must remain standardized and professional so that the quality and efficiency of the service provided by the agent can be ensured; quality inspection and control of the call content are therefore particularly important. In the prior art, quality inspection is usually performed by manually listening to call recordings, which requires a corresponding number of quality inspectors to be assigned to the call center. This approach has a high labor cost, a heavy quality inspection workload, low efficiency and low coverage, and makes it difficult to evaluate overall service quality effectively.

Disclosure of Invention

The application provides a call voice quality inspection method, a call voice quality inspection device, a call voice quality inspection equipment and a storage medium, and aims to solve the problems of high labor cost and low efficiency of the existing call voice quality inspection.

In order to solve the above technical problem, the application adopts the following technical solution: a call voice quality inspection method is provided, which comprises the following steps: acquiring call voice and converting the call voice into a call text, wherein the call text comprises a customer call text and an agent reply text; inputting the call text into a pre-trained text detection model to obtain a target text segment comprising sensitive content; performing detail verification on the target text segment based on a preset violation detection mode; if the target text segment contains a violation, confirming the violation type corresponding to the target text segment as the target violation type; and generating quality inspection result information according to the target text segment and the target violation type.

As a further improvement of the application, performing detail verification on the target text segment based on the preset violation detection mode comprises the following steps: acquiring the preset violation detection modes corresponding to multiple preset violation types; and performing detail verification on the target text segment using the preset violation detection mode corresponding to each violation type.

As a further improvement of the present application, the text detection model includes a first coding layer, a second coding layer and a softmax layer; inputting the call text into the pre-trained text detection model to obtain a target text segment including sensitive content comprises the following steps: randomly initializing a position vector for the call text based on a normal distribution; inputting the call text into the first coding layer for coding to obtain a sentence vector; splicing the sentence vector and the position vector to obtain a spliced vector; encoding the spliced vector using the second coding layer to obtain a comprehensive vector; and inputting the comprehensive vector into the softmax layer for prediction to obtain the target text segment.

As a further improvement of the present application, acquiring call voice and converting the call voice into a call text, wherein the call text includes a customer call text and an agent reply text, comprises: acquiring call voice; performing voiceprint feature recognition on each voice segment in the call voice to obtain customer voice segments and agent voice segments; and converting the customer voice segments into the customer call text and the agent voice segments into the agent reply text.

As a further improvement of the present application, the violation types include an information-not-notified violation; performing detail verification on the target text segment based on the preset violation detection mode comprises the following steps: detecting whether the target text segment matches a preset feature; if it matches, detecting whether text content related to the preset feature exists in the target text segment; and if no related text content exists, determining that the target text segment triggers the information-not-notified violation.

As a further improvement of the present application, the violation types include an incomplete-or-incorrect information notification violation; performing detail verification on the target text segment based on the preset violation detection mode comprises the following steps: acquiring the previous text segment and the next text segment of the target text segment, combining the previous text segment and the target text segment into a first text segment, combining the target text segment and the next text segment into a second text segment, and combining the previous text segment, the target text segment and the next text segment into a third text segment; vectorizing the target text segment, the first text segment, the second text segment and the third text segment respectively to obtain a target vector, a first vector, a second vector and a third vector; inputting the target vector, the first vector, the second vector and the third vector into a BERT model for coding, and then matching each encoded result one by one against the preset violation vectors in a preset violation vector library to obtain the matching degree corresponding to each of the target vector, the first vector, the second vector and the third vector; and selecting the final text content corresponding to the vector with the highest matching degree, and outputting that the final text content triggers the incomplete-or-incorrect information notification violation.

As a further improvement of the present application, the violation types include a super-class information notification violation; performing detail verification on the target text segment based on the preset violation detection mode comprises the following steps: matching preset rules against the target text segment; and when a match succeeds, determining that the target text segment triggers the super-class information notification violation.

In order to solve the above technical problem, another technical solution adopted by the present application is: a call voice quality inspection device is provided, including: an acquisition module, used for acquiring call voice and converting the call voice into a call text, wherein the call text comprises a customer call text and an agent reply text; a screening module, used for inputting the call text into a pre-trained text detection model to obtain a target text segment comprising sensitive content; an inspection module, used for performing detail verification on the target text segment based on a preset violation detection mode; a confirming module, used for confirming the violation type corresponding to the target text segment as the target violation type when the target text segment contains a violation; and an output module, used for generating quality inspection result information according to the target text segment and the target violation type.

In order to solve the above technical problem, the present application adopts another technical solution that: there is provided a computer device comprising a processor, a memory coupled to the processor, the memory having stored therein program instructions which, when executed by the processor, cause the processor to perform the steps of the call voice quality testing method as claimed in any one of the preceding claims.

In order to solve the above technical problem, the present application adopts another technical solution that: a storage medium is provided, which stores program instructions for implementing the call voice quality inspection method.

The beneficial effects of this application are as follows. The call voice quality inspection method converts the voice of the conversation between the agent and the customer into a call text and inputs the call text into a pre-trained text detection model, which screens out the target text segments that include sensitive content. The screening is not limited to the content of the agent's replies but also takes the customer's side of the conversation into account, so the target text segments obtained are more accurate. Violation detection is then performed on each target text segment based on a preset detection mode to confirm whether the segment really contains a violation and, if so, to identify its violation type. The final call voice quality inspection result is therefore more intuitive and helps quality inspectors quickly assess the quality and effect of the agent's replies.

Drawings

FIG. 1 is a flow chart of a call voice quality inspection method according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a text detection model according to an embodiment of the present invention;

FIG. 3 is a specific flowchart of step S102 in FIG. 1;

FIG. 4 is a specific flowchart of step S103 in FIG. 1;

FIG. 5 is another specific flowchart of step S103 in FIG. 1;

FIG. 6 is another specific flowchart of step S103 in FIG. 1;

FIG. 7 is a functional block diagram of a call voice quality inspection apparatus according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a computer device according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of a storage medium according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms "first", "second" and "third" in this application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any indication of the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise. All directional indications (such as up, down, left, right, front, and rear … …) in the embodiments of the present application are only used to explain the relative positional relationship between the components, the movement, and the like in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indication is changed accordingly. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

Fig. 1 is a flowchart illustrating a call voice quality inspection method according to an embodiment of the present invention. It should be noted that the method of the present invention is not limited to the flow sequence shown in fig. 1 if the results are substantially the same. As shown in fig. 1, the method comprises the steps of:

Step S101: acquire call voice and convert the call voice into a call text, wherein the call text comprises a customer call text and an agent reply text.

Specifically, when a call center agent answers or places a call, once the call connection between the agent and the customer is established, the two usually talk about a certain subject, such as product sales, after-sales support, product research, product consultation, answering user questions, or complaints and suggestions. In step S101, recording starts as soon as the call connection between the agent and the customer is successfully established, so as to obtain the call voice between them, and the call voice is then converted into a call text. The conversion from speech to text is implemented with an Automatic Speech Recognition (ASR) model.

Further, after the call voice is acquired, the client voice and the agent reply voice in the call voice need to be recognized, so the step S101 specifically includes:

1. Acquire call voice.

2. Perform voiceprint feature recognition on each voice segment in the call voice to obtain customer voice segments and agent voice segments.

3. Convert the customer voice segments into the customer call text, and convert the agent voice segments into the agent reply text.

Before the speech recognition model is applied, the voiceprint features in the call voice are recognized first. A voice call usually takes place between two people and therefore contains two sets of voiceprint features. The call voice is split at the points in time where the customer and the agent take turns speaking, and the resulting voice segments are grouped by voiceprint feature into segments belonging to the customer and segments belonging to the agent. Each voice segment can be given a position label according to its position in the call. The customer voice segments and agent voice segments, with their position labels, are then input into the speech recognition model to obtain customer text segments and agent text segments. These text segments are spliced in order of their position labels to obtain the complete call text, in which each customer text segment is marked with a customer label and each agent text segment with an agent label.
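The splicing of transcribed segments by position label can be sketched as follows. This is a minimal illustration that assumes voiceprint recognition and ASR have already run; the `Segment` class, its field names and the `build_call_text` helper are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    position: int  # position label assigned when the call voice was split
    speaker: str   # "customer" or "agent", decided by voiceprint features
    text: str      # transcript of this segment returned by the ASR model


def build_call_text(segments):
    """Order segments by position label and tag each one with its speaker."""
    ordered = sorted(segments, key=lambda s: s.position)
    return [f"[{s.speaker}] {s.text}" for s in ordered]


segments = [
    Segment(1, "agent", "The expected annual return is 3%."),
    Segment(0, "customer", "What return can I expect?"),
]
call_text = build_call_text(segments)
```

Here `call_text` lists the customer's question first and the agent's reply second, regardless of the order in which the segments were transcribed.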

It should be noted that the collected call voice may include sounds other than the voices of the agent and the customer, for example, sounds in the environment where the terminal device is located. To avoid interference from such sounds, speech enhancement may be applied to the call voice to eliminate the noise. A suitable speech enhancement algorithm can be selected according to requirements so as to eliminate noise and keep the human voice clear. Since only the human voice is generally converted into text, all sounds other than the human voice may be treated as noise, and all such noise can be eliminated during speech enhancement.

Step S102: input the call text into a pre-trained text detection model to obtain a target text segment comprising sensitive content.

It should be noted that the text detection model is obtained by training based on a sample acquired in advance, where the sample includes a text with sensitive content and a text without sensitive content, and the sensitive content is set in advance.

Specifically, after the call text is obtained, it is input into the text detection model, which examines the content of each text segment and screens out, from all the text segments, the target text segments that include sensitive content. Sensitive content is content in which violations are likely to occur. For example, a text segment in which the agent tells the customer about the returns of a financial product contains sensitive content: if the returns are stated in line with the product's expected returns, the statement is considered correct, but if the stated returns far exceed the product's expected returns, the agent is considered to have exaggerated the returns, which is a violation.

Further, in this embodiment, referring to fig. 2, the text detection model includes a first coding layer, a second coding layer and a softmax layer. Both coding layers are implemented on the basis of the Encoder layer of a Transformer model. The first coding layer performs sentence coding on the call text to obtain sentence vectors, and the second coding layer encodes the whole spliced vector formed from each sentence vector and the position vector obtained by initializing the call text. In other words, sentence vectors of the call text are obtained through the first coding layer, each sentence vector is spliced with the position vector generated for the call text, and the spliced vector is then encoded as a whole, so that the text detection model makes a comprehensive judgment based on both the sentence vector and the position vector. This improves the model's ability to recognize target text segments with sensitive content.

Specifically, referring to fig. 3, in step S102, inputting a call text into a pre-trained text detection model to obtain a target text segment including sensitive content, specifically including:

Step S201: randomly initialize a position vector for the call text based on a normal distribution.

Step S202: input the call text into the first coding layer for coding to obtain a sentence vector.

Step S203: splice the sentence vector and the position vector to obtain a spliced vector.

Step S204: encode the spliced vector using the second coding layer to obtain a comprehensive vector.

Step S205: input the comprehensive vector into the softmax layer for prediction to obtain the target text segment.
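The data flow of steps S201 to S205 can be sketched in plain Python as below. This is only an illustration of the order of operations: `first_encoder` and `second_encoder` are toy stand-ins, not Transformer encoders, and the vector width `DIM`, the two-class output and the random weights are assumptions rather than details from the application. With untrained weights the probabilities carry no meaning.

```python
import math
import random

DIM = 4  # hypothetical vector width; the application does not give one


def position_vector(rng, dim=DIM):
    """S201: position vector randomly initialized from a normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def first_encoder(sentence, dim=DIM):
    """S202: toy stand-in for the first coding layer (not a Transformer)."""
    vec = [0.0] * dim
    for i, ch in enumerate(sentence):
        vec[i % dim] += (ord(ch) % 97) / 97.0
    return vec


def second_encoder(spliced):
    """S204: toy stand-in for the second coding layer."""
    return [math.tanh(x) for x in spliced]


def softmax_layer(vector, weights):
    """S205: linear projection to two classes followed by softmax."""
    logits = [sum(w * x for w, x in zip(row, vector)) for row in weights]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def detect(sentences, seed=0):
    """Return (sentence, [p_normal, p_sensitive]) for every sentence."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.1) for _ in range(2 * DIM)] for _ in range(2)]
    results = []
    for sentence in sentences:
        spliced = first_encoder(sentence) + position_vector(rng)  # S203
        probs = softmax_layer(second_encoder(spliced), weights)
        results.append((sentence, probs))
    return results


results = detect(["What return can I expect?", "It pays 30% a year."])
```

In a trained model, sentences whose second probability exceeds a chosen cut-off would be kept as target text segments; that cut-off is likewise an assumption.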

Further, the training process of the text detection model comprises:

1. Obtain training samples and randomly initialize a sample position vector for each training sample based on a normal distribution, wherein texts with sensitive content in the training samples are marked with a sensitive label and texts without sensitive content are marked with a normal label.

2. Input the training sample into the first coding layer of the text detection model for coding to obtain a sample sentence vector.

3. Splice the sample sentence vector and the sample position vector to obtain a sample spliced vector.

4. Encode the sample spliced vector using the second coding layer of the text detection model to obtain a sample comprehensive vector.

5. Input the sample comprehensive vector into the softmax layer of the text detection model for prediction to obtain a sample prediction result.

6. Update the parameters of the text detection model by backpropagation according to the sample prediction result, the label of the training sample and a preset loss function, until the accuracy of the text detection model reaches a preset requirement or the number of training iterations reaches a preset limit.
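The update-and-stop logic of item 6 can be illustrated with a deliberately small example. The sketch below trains only a softmax classifier on ready-made vectors, uses cross-entropy as the loss, and stops when either an accuracy target or an epoch limit is reached. The loss choice, learning rate, thresholds and function names are all assumptions; the application says only that a preset loss function is used.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, x):
    return softmax([sum(w * v for w, v in zip(row, x)) for row in weights])


def train(samples, labels, lr=0.5, max_epochs=200, target_accuracy=1.0):
    """Gradient descent on cross-entropy; label 0 = normal, 1 = sensitive."""
    dim = len(samples[0])
    weights = [[0.0] * dim for _ in range(2)]
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        correct = 0
        for x, y in zip(samples, labels):
            probs = forward(weights, x)
            correct += int(probs.index(max(probs)) == y)
            for k in range(2):
                # Gradient of cross-entropy w.r.t. logit k is p_k - onehot_k.
                grad = probs[k] - (1.0 if k == y else 0.0)
                for j in range(dim):
                    weights[k][j] -= lr * grad * x[j]
        if correct / len(samples) >= target_accuracy:
            break  # accuracy requirement met before the epoch limit
    return weights, epoch


def predict(weights, x):
    probs = forward(weights, x)
    return probs.index(max(probs))


weights, epochs_used = train([[1.0, 0.0], [0.0, 1.0]], [0, 1])
```

In the full model the gradients would also flow back through both coding layers; this sketch updates only the final projection.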

Step S103: perform detail verification on the target text segment based on a preset violation detection mode.

In step S103, after the target text segment is obtained, violation detection is performed on the target text segment based on a preset violation detection method, so as to determine whether a violation exists in the target text segment. The violation detection mode is designed according to violation types appearing in historical call records of clients and seats, and has the function of performing fine-grained detection on the text segment to confirm whether the text segment has a violation or not.

It should be understood that, in this embodiment, the violation types may be obtained according to violation types appearing in the historical call records of the customer and the seat, and therefore, the number of the violation types is more than one, and in order to reasonably detect each violation type, the detection mode of each violation type is separately designed according to the appearance mode of the violation type, so that a preset violation detection mode corresponding to each violation type is obtained.

Therefore, in some embodiments, the step S103 specifically includes:

1. Acquire the preset violation detection modes corresponding to multiple preset violation types.

2. Perform detail verification on the target text segment using the preset violation detection mode corresponding to each violation type.
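Running every violation type's detection mode against one target text segment can be sketched as a simple dispatch table. The type names and the two lambda detectors below are hypothetical placeholders chosen for illustration only.

```python
def verify_details(segment, detectors):
    """Apply each violation type's detector and return the types that fire."""
    return [vtype for vtype, detect in detectors.items() if detect(segment)]


# Hypothetical detectors: each maps a text segment to True (violation) or False.
detectors = {
    "not_notified": lambda seg: "premium" not in seg,
    "super_class": lambda seg: "false identity" in seg,
}

hits = verify_details("just register under a false identity", detectors)
```

Because every detector runs, a single segment can be reported under more than one violation type.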

It should be understood that content belonging to different violation types may exist in a target text segment. Therefore, after the target text segments are obtained, each target text segment is verified with each preset violation detection mode to determine whether it contains a violation.

Step S104: if the target text segment contains a violation, confirm the violation type corresponding to the target text segment as the target violation type.

Specifically, the violation types include information-not-notified violations, incomplete-or-incorrect information notification violations, and super-class information notification violations.

An information-not-notified violation arises when the call touches on a topic for which the agent must inform the customer of relevant information, and the agent fails to do so in time. For example, when the agent discusses an insurance product with the customer, the agent is required to inform the customer of the name, content, premium and other information of the product; if this information is not detected in the relevant text segment of the call text, that segment is considered to contain an information-not-notified violation.

An incomplete-or-incorrect information notification violation arises when the call touches on a topic for which the agent must inform the customer of relevant information, and the information given by the agent is incomplete or wrong. For example, when an insurance product is discussed with the customer, if the agent tells the customer only the name and content of the product but not the premium, the notification is considered incomplete; if the return stated by the agent exceeds the product's expected return, the notification is considered incorrect.

A super-class information notification violation arises when the agent uses prohibited words or sentences in communication with the customer, or gives the customer improper advice in violation of the rules, for example, prompting the customer to falsify their own information when applying for insurance.

Further, referring to fig. 4, for the information-not-notified violation, performing detail verification on the target text segment based on a preset violation detection mode in step S103 specifically includes:

Step S301: detect whether the target text segment matches the preset feature. If it matches the preset feature, execute step S302; if it does not, proceed to detect the other violation types.

Specifically, the preset feature is set in advance according to actual conditions. For example, during product telemarketing, when a customer asks about the usage restrictions or contraindications of a product, the agent needs to inform the customer of the people the product is suitable for, the precautions, the usage scenarios and so on; here "usage restrictions or contraindications of the product" is the preset feature.

Step S302: detect whether text content related to the preset feature exists in the target text segment. If no related text content exists, execute step S303.

Specifically, when the target text segment is confirmed to accord with the preset features, preset text content corresponding to the preset features is obtained, and the preset text content is matched with text content in the target text segment, so that whether text content related to the preset features exists in the target text segment is confirmed.

Step S303: the target text segment triggers the information-not-notified violation.

Specifically, when no text content related to the preset feature exists in the target text segment, the target text segment is considered to trigger the information-not-notified violation.
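Steps S301 to S303 amount to a two-stage check, sketched below with plain substring matching. The keyword lists and the function name are hypothetical, and treating any one required phrase as sufficient is an assumption; the application does not say how the preset text content is matched.

```python
def check_not_notified(segment, feature_keywords, required_keywords):
    """Return True if the violation is triggered, False if not,
    or None if the preset feature does not apply to this segment."""
    text = segment.lower()
    # S301: does the segment match the preset feature at all?
    if not any(k in text for k in feature_keywords):
        return None  # not applicable; other violation types are checked instead
    # S302: is any text content related to the preset feature present?
    if any(k in text for k in required_keywords):
        return False
    # S303: feature matched but nothing related was said.
    return True


feature = ["contraindication", "usage restriction"]
required = ["suitable for", "precaution", "do not use"]
```

For example, `check_not_notified("Any contraindications? No, it is fine for everyone.", feature, required)` returns `True`, because the feature is matched but none of the required phrases appears.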

Further, referring to fig. 5, for the incomplete-or-incorrect information notification violation, performing detail verification on the target text segment based on a preset violation detection mode in step S103 specifically includes:

Step S401: acquire the previous text segment and the next text segment of the target text segment, combine the previous text segment and the target text segment to obtain a first text segment, combine the target text segment and the next text segment to obtain a second text segment, and combine the previous text segment, the target text segment and the next text segment to obtain a third text segment.

Specifically, in this embodiment, a sliding window of size 3 is used to associate the target text segment with its context, so as to obtain the previous text segment and the next text segment associated with the target text segment.
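Building the four candidate segments from a window of size 3 can be sketched as follows. The function and key names are hypothetical, and the handling of a target at the very start or end of the call (where one neighbour is missing) is an assumption, since the application does not cover that case.

```python
def join_parts(*parts):
    """Concatenate the non-empty parts with a space."""
    return " ".join(p for p in parts if p)


def context_segments(segments, index):
    """S401: combine the target segment with its previous and next neighbours."""
    target = segments[index]
    previous = segments[index - 1] if index > 0 else ""
    following = segments[index + 1] if index + 1 < len(segments) else ""
    return {
        "target": target,
        "first": join_parts(previous, target),
        "second": join_parts(target, following),
        "third": join_parts(previous, target, following),
    }


call = ["What does it cost?", "It is called SafeLife.", "OK, thanks."]
window = context_segments(call, 1)
```

Here `window["third"]` is the whole three-segment context, while `window["first"]` and `window["second"]` each pair the target with one neighbour.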

Step S402: vectorize the target text segment, the first text segment, the second text segment and the third text segment respectively to obtain a target vector, a first vector, a second vector and a third vector.

Specifically, the target text segment, the first text segment, the second text segment, and the third text segment are respectively subjected to vectorization operation to obtain a target vector corresponding to the target text segment, a first vector corresponding to the first text segment, a second vector corresponding to the second text segment, and a third vector corresponding to the third text segment. Text vectorization is to represent text into a series of vectors capable of expressing text semantics. In this embodiment, text vectorization is implemented based on a word2vec model.

Step S403: input the target vector, the first vector, the second vector and the third vector into a BERT model for coding, and then match each encoded result one by one against the preset violation vectors in a preset violation vector library to obtain the matching degrees corresponding to the target vector, the first vector, the second vector and the third vector respectively.

Specifically, a target vector, a first vector, a second vector and a third vector are respectively input into the BERT model for encoding, and then the encoding results are respectively matched with the preset violation vectors in the preset violation vector library one by one to obtain a target matching degree corresponding to the target vector, a first matching degree corresponding to the first vector, a second matching degree corresponding to the second vector and a third matching degree corresponding to the third vector. Wherein the predetermined violation vector library is pre-established.

Step S404: selecting the final text content corresponding to the vector whose matching degree exceeds a preset matching degree threshold and is the highest, and outputting that the final text content triggers the incomplete or incorrect information notification class violation.

Specifically, it is judged whether each of the target matching degree, the first matching degree, the second matching degree and the third matching degree exceeds a preset matching degree threshold. If exactly one matching degree exceeds the threshold, the vector corresponding to that matching degree is identified, the final text content corresponding to that vector is confirmed, and it is output that the final text content triggers the incomplete or incorrect information notification class violation. If two or more matching degrees exceed the threshold, the final text content corresponding to the vector with the highest matching degree is selected, and it is output that this final text content triggers the incomplete or incorrect information notification class violation. If no matching degree exceeds the threshold, none of the target text segment, the first text segment, the second text segment and the third text segment triggers the incomplete or incorrect information notification class violation.
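The three cases of step S404 reduce to a single selection rule, sketched below. The function name, the dictionary keyed by segment name, and the example threshold are assumptions made for illustration; the text only requires that a matching degree "exceeds" the threshold, which is read here as a strict comparison.

```python
from typing import Dict, Optional


def select_violating_segment(degrees: Dict[str, float],
                             threshold: float) -> Optional[str]:
    """Return the name of the segment that triggers the violation, or None.

    Only matching degrees strictly above the threshold are candidates; if
    several qualify, the one with the highest matching degree is selected.
    """
    above = {name: degree for name, degree in degrees.items() if degree > threshold}
    if not above:
        return None  # no segment triggers the violation
    return max(above, key=lambda name: above[name])
```

For example, with degrees `{"target": 0.6, "first": 0.9, "second": 0.85, "third": 0.4}` and a threshold of 0.8, two candidates qualify and `"first"` is selected.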

In this embodiment, violation points are detected using a sliding window strategy that fuses contextual feature information, so the context of the conversation between the agent and the customer is fully utilized and the accuracy of the violation detection result is improved.

Further, referring to fig. 6, for the superclass information notification class violation, performing detail verification on the target text segment based on a preset violation detection mode in step S103 specifically includes:

Step S501: matching the target text segment against each of the preset rules.

Specifically, the preset rules include at least one of regular expression matching, keyword matching, and the like. For regular expression matching, a regular template covering content of the superclass information notification violation class is constructed in advance; after the target text segment is obtained, it is converted into a regular expression, which is then matched against the regular template. For keyword matching, keywords belonging to violating content, such as 'price counterfeiting' and 'false identity', are specified in advance, and the words in the target text segment are then matched against each of the keywords.

Step S502: when the matching is successful, determining that the target text segment triggers the superclass information notification class violation.

Step S105: generating quality inspection result information according to the target text segment and the target violation type.

Specifically, after the violating target text segment and its target violation type are obtained, quality inspection result information is generated from them. The quality inspection result information includes the content of the target text segment and the target violation type. To make the result easy to view, the violating segment can be highlighted in the call text: for example, it is displayed in a color different from the other text segments and annotated with a note whose content is the target violation type.
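Steps S501 and S502 can be sketched as follows. This is a simplification under stated assumptions: the regular template is applied directly to the segment text rather than first converting the segment into a regular expression, the template pattern is hypothetical, and the two keywords are the examples named in the text.

```python
import re
from typing import List, Pattern

# Example keywords taken from the text; a real deployment would define its own list.
KEYWORDS: List[str] = ["price counterfeiting", "false identity"]

# Hypothetical regular template, invented for illustration only.
TEMPLATES: List[Pattern[str]] = [
    re.compile(r"guaranteed\s+returns?", re.IGNORECASE),
]


def triggers_superclass_violation(segment: str,
                                  templates: List[Pattern[str]] = TEMPLATES,
                                  keywords: List[str] = KEYWORDS) -> bool:
    """Return True if any regular template or any keyword matches the segment."""
    if any(template.search(segment) for template in templates):
        return True
    lowered = segment.lower()
    return any(keyword.lower() in lowered for keyword in keywords)
```

A segment is reported as violating as soon as one rule matches, which mirrors "when the matching is successful" in step S502.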
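As a rough illustration of the result generation in step S105, the sketch below marks violating segments in the call text and attaches the violation type as a note. The data class, the function name, and the plain-text `>>> <<<` markers (standing in for a different display color) are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QualityResult:
    """One violating target text segment and its target violation type."""
    segment: str
    violation_type: str


def annotate_call_text(segments: List[str], results: List[QualityResult]) -> str:
    """Render the call text with violating segments marked and annotated."""
    flagged = {result.segment: result.violation_type for result in results}
    lines = []
    for segment in segments:
        if segment in flagged:
            # Plain-text stand-in for highlighting in a different color.
            lines.append(f">>> {segment} <<< [violation: {flagged[segment]}]")
        else:
            lines.append(segment)
    return "\n".join(lines)
```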

Further, in order to evaluate the call quality of the agent, in this embodiment, after the violating target text segment and the violation type are obtained, the method further includes:

obtaining a deduction value corresponding to each violation type;

confirming the number of occurrences of each violation type in the violating target text segments;

calculating a total deduction value from the number of occurrences and the deduction value of each violation type, and subtracting the total deduction value from a preset full score to obtain a scoring record for the current agent's call;

collecting all scoring records of the current agent within a period of time and averaging them to obtain a scoring mean;

and confirming the call quality of the current agent according to the scoring mean. Specifically, score ranges are divided in advance, each score range corresponds to one grade, and the call quality of the current agent is confirmed according to the grade of the range into which the scoring mean falls.
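The scoring steps above amount to simple arithmetic, sketched below. The deduction values, the full score of 100, and the grade boundaries are placeholder assumptions; the text does not specify any of them.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def score_call(violation_types: Sequence[str],
               deductions: Dict[str, float],
               full_score: float = 100.0) -> float:
    """Scoring record for one call: full score minus the total deduction value."""
    counts = Counter(violation_types)  # number of occurrences per violation type
    total_deduction = sum(deductions[vtype] * n for vtype, n in counts.items())
    return full_score - total_deduction


def grade_agent(scores: Sequence[float],
                grade_ranges: List[Tuple[float, str]]) -> Tuple[float, str]:
    """Average the scoring records and map the mean to a grade.

    grade_ranges holds (lower bound, grade) pairs sorted from highest to lowest.
    """
    average = mean(scores)
    for lower_bound, grade in grade_ranges:
        if average >= lower_bound:
            return average, grade
    raise ValueError("scoring mean falls below every configured range")
```

For instance, with deductions of 5 for one violation type and 10 for another, a call containing two violations of the first type and one of the second scores 100 - (2 x 5 + 10) = 80.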

In the call voice quality inspection method of the embodiment of the invention, the voice of the conversation between the agent and the client is converted into a call text, and the call text is input into a pre-trained text detection model to screen out the target text segment comprising sensitive content. The screening is not limited to the content replied by the agent but also takes the client's conversation content into account, so that the target text segment obtained by screening is more accurate. Violation detection is then performed on the target text segment based on a preset detection mode to further confirm whether violating content really exists in the target text segment, and the violation type of a target text segment with violating content can be identified. The finally output call voice quality inspection result is therefore more intuitive and helps quality inspection personnel quickly confirm the quality and effect of the agent's replies.

Fig. 7 is a functional block diagram of a call voice quality inspection apparatus according to an embodiment of the present invention. As shown in fig. 7, the call voice quality inspection apparatus 60 includes an acquisition module 61, a screening module 62, a checking module 63, a confirmation module 64, and an output module 65.

The acquisition module 61 is configured to acquire a call voice and convert the call voice into a call text, where the call text includes a client call text and an agent reply text;

the screening module 62 is configured to input the call text into a pre-trained text detection model to obtain a target text segment including sensitive content;

the checking module 63 is configured to perform detail verification on the target text segment based on a preset violation detection mode;

the confirmation module 64 is configured to, when the target text segment violates a rule, confirm the violation type corresponding to the target text segment as the target violation type;

and the output module 65 is configured to generate quality inspection result information according to the target text segment and the target violation type.
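One way to picture how the five modules cooperate is the sketch below, in which each module is an injected callable. The class name, the callable signatures, and the merging of modules 64 and 65 into the loop body are illustrative assumptions, not the structure of the apparatus itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class CallQualityInspector:
    """Hypothetical wiring of modules 61 to 65 as plain callables."""
    acquire: Callable[[bytes], List[str]]     # module 61: call voice -> call text segments
    screen: Callable[[List[str]], List[str]]  # module 62: call text -> target text segments
    check: Callable[[str], Optional[str]]     # module 63: triggered violation type, or None

    def inspect(self, call_voice: bytes) -> List[Dict[str, str]]:
        call_text = self.acquire(call_voice)
        results: List[Dict[str, str]] = []
        for segment in self.screen(call_text):
            violation_type = self.check(segment)
            if violation_type is not None:  # module 64: confirm the target violation type
                # module 65: emit one quality inspection result entry
                results.append({"segment": segment, "violation_type": violation_type})
        return results
```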

Optionally, the operation of the checking module 63 performing detail verification on the target text segment based on a preset violation detection mode specifically includes: acquiring preset violation detection modes corresponding to multiple preset violation types; and respectively carrying out detail verification on the target text segment in a preset violation detection mode corresponding to each violation type.

Optionally, the text detection model includes a first encoding layer, a second encoding layer, and a softmax layer; the operation of the screening module 62 inputting the call text into the pre-trained text detection model to obtain the target text segment including the sensitive content specifically includes: randomly initializing the call text based on a normal distribution to obtain a position vector; inputting the call text into the first encoding layer for encoding to obtain a sentence vector; splicing the sentence vector and the position vector to obtain a spliced vector; encoding the spliced vector by using the second encoding layer to obtain a comprehensive vector; and inputting the comprehensive vector into the softmax layer for prediction to obtain the target text segment.

Optionally, the operation of the acquisition module 61 acquiring the call voice and converting the call voice into a call text, where the call text includes a client call text and an agent reply text, specifically includes: acquiring the call voice; performing voiceprint feature recognition on each voice segment in the call voice to obtain a client voice segment and an agent voice segment; and converting the client voice segment into the client call text and the agent voice segment into the agent reply text.
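The data flow through the text detection model can be sketched as below. This is only a shape-level illustration: the two encoding layers are passed in as placeholder callables (in practice they would be trained neural layers), the random initialization is assumed to draw from a normal (Gaussian) distribution, and all names are hypothetical.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def init_position_vector(dim: int, rng: random.Random,
                         mu: float = 0.0, sigma: float = 1.0) -> Vector:
    """Randomly initialize a position vector from a normal distribution."""
    return [rng.gauss(mu, sigma) for _ in range(dim)]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_segment(text: str,
                     first_encoder: Callable[[str], Vector],
                     second_encoder: Callable[[Vector], Vector],
                     position_dim: int,
                     rng: random.Random) -> Vector:
    """Position vector -> sentence vector -> splice -> second encoding -> softmax."""
    position_vector = init_position_vector(position_dim, rng)
    sentence_vector = first_encoder(text)        # first encoding layer
    spliced = sentence_vector + position_vector  # concatenation of the two vectors
    comprehensive = second_encoder(spliced)      # second encoding layer (logits here)
    return softmax(comprehensive)
```

The returned probabilities could then be thresholded to decide whether a segment is kept as a target text segment; that decision rule is not specified in the text.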

Optionally, the violation type includes an information-not-informed class violation; the operation of the checking module 63 performing detail verification on the target text segment based on the preset violation detection mode specifically includes: detecting whether the target text segment meets a preset characteristic; if the preset characteristic is met, detecting whether text content related to the preset characteristic exists in the target text segment; and if no related text content exists, determining that the target text segment triggers the information-not-informed class violation.
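The two-stage check for this violation class can be sketched as follows. The text does not define what a "preset characteristic" or its "related text content" looks like; the sketch assumes both are lists of phrases matched by case-insensitive substring search, and the function and parameter names are hypothetical.

```python
from typing import Sequence


def triggers_not_informed(segment: str,
                          feature_terms: Sequence[str],
                          required_terms: Sequence[str]) -> bool:
    """Return True if the segment meets the characteristic but lacks related content.

    feature_terms: phrases whose presence means the preset characteristic is met.
    required_terms: phrases counted as the related content that must be present.
    """
    lowered = segment.lower()
    if not any(term.lower() in lowered for term in feature_terms):
        return False  # characteristic not met, so this check does not apply
    return not any(term.lower() in lowered for term in required_terms)
```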

Optionally, the violation type includes an incomplete or incorrect information notification class violation; the operation of the checking module 63 performing detail verification on the target text segment based on the preset violation detection mode specifically includes: acquiring a previous text segment and a next text segment of the target text segment, forming the previous text segment and the target text segment into a first text segment, forming the target text segment and the next text segment into a second text segment, and forming the previous text segment, the target text segment and the next text segment into a third text segment; vectorizing the target text segment, the first text segment, the second text segment and the third text segment respectively to obtain a target vector, a first vector, a second vector and a third vector; inputting the target vector, the first vector, the second vector and the third vector into a BERT model respectively for encoding, and then matching each encoding result one by one against the preset violation vectors in a preset violation vector library to obtain the matching degrees respectively corresponding to the target vector, the first vector, the second vector and the third vector; and selecting the final text content corresponding to the vector with the highest matching degree, and outputting that the final text content triggers the incomplete or incorrect information notification class violation.

Optionally, the violation type includes a superclass information notification class violation; the operation of the checking module 63 performing detail verification on the target text segment based on the preset violation detection mode specifically includes: matching the target text segment against preset rules; and when the matching is successful, determining that the target text segment triggers the superclass information notification class violation.
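The construction of the first, second and third text segments is a sliding window of one segment on each side of the target, sketched below. The function name, the space used to join segments, and the handling of a target at the very start or end of the call (a missing neighbor is simply omitted) are assumptions not stated in the text.

```python
from typing import Dict, List


def _join(*parts: str) -> str:
    """Join the non-empty parts with a single space."""
    return " ".join(part for part in parts if part)


def build_context_windows(segments: List[str], index: int) -> Dict[str, str]:
    """Build the four candidate texts around the target segment at `index`."""
    target = segments[index]
    previous = segments[index - 1] if index > 0 else ""
    following = segments[index + 1] if index + 1 < len(segments) else ""
    return {
        "target": target,
        "first": _join(previous, target),             # previous + target
        "second": _join(target, following),           # target + next
        "third": _join(previous, target, following),  # previous + target + next
    }
```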

For other details of the technical solution implemented by each module in the call voice quality inspection apparatus in the above embodiment, reference may be made to the description of the call voice quality inspection method in the above embodiment, and details are not repeated here.

It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the apparatus embodiment is basically similar to the method embodiment, its description is relatively brief, and for the relevant points reference may be made to the corresponding description of the method embodiment.

Referring to fig. 8, fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown in fig. 8, the computer device 70 includes a processor 71 and a memory 72 coupled to the processor 71, wherein the memory 72 stores program instructions, and the program instructions, when executed by the processor 71, cause the processor 71 to execute the steps of the call voice quality inspection method according to any of the embodiments.

The processor 71 may also be referred to as a CPU (Central Processing Unit). The processor 71 may be an integrated circuit chip having signal processing capabilities. The processor 71 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

Referring to fig. 9, fig. 9 is a schematic structural diagram of a storage medium according to an embodiment of the invention. The storage medium of the embodiment of the present invention stores program instructions 81 capable of implementing all the methods described above, where the program instructions 81 may be stored in the storage medium in the form of a software product and include several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, or a computer device such as a computer, a server, a mobile phone, or a tablet.

In the several embodiments provided in the present application, it should be understood that the disclosed computer device, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions are possible in an actual implementation; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in the form of hardware or in the form of a software functional unit. The above embodiments are merely examples and are not intended to limit the scope of the present disclosure; all equivalent structures or equivalent process transformations made using the contents of the specification and drawings of the present disclosure, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of the present disclosure.

Claims

1. A call voice quality inspection method is characterized by comprising the following steps:

acquiring call voice, and converting the call voice into a call text, wherein the call text comprises a client call text and an agent reply text;

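inputting the call text into a pre-trained text detection model to obtain a target text segment comprising sensitive content;

performing detail verification on the target text segment based on a preset violation detection mode;

if the target text segment violates a rule, confirming the violation type corresponding to the target text segment as a target violation type;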

and generating quality inspection result information according to the target text segment and the target violation type.

2. The call voice quality inspection method according to claim 1, wherein the detail verification of the target text segment based on a preset violation detection mode includes:

acquiring preset violation detection modes corresponding to multiple preset violation types;

and respectively carrying out detail verification on the target text segment in a preset violation detection mode corresponding to each violation type.

3. The call voice quality inspection method according to claim 1, wherein the text detection model includes a first coding layer, a second coding layer and a softmax layer; inputting the call text into a pre-trained text detection model to obtain a target text segment including sensitive content, wherein the method comprises the following steps:

randomly initializing the call text based on a normal distribution to obtain a position vector;

inputting the call text into the first coding layer for coding to obtain a sentence vector;

splicing the sentence vector and the position vector to obtain a spliced vector;

encoding the splicing vector by using the second encoding layer to obtain a comprehensive vector;

and inputting the comprehensive vector into the softmax layer for prediction to obtain the target text segment.

4. The call voice quality inspection method according to claim 1, wherein the acquiring call voice and converting the call voice into a call text, the call text comprising a client call text and an agent reply text, comprises:

acquiring the call voice;

performing voiceprint feature recognition on each voice segment in the call voice to obtain a client voice segment and an agent voice segment;

and converting the client voice segment into the client call text, and converting the agent voice segment into the agent reply text.

5. The call voice quality inspection method according to claim 1, wherein the violation type includes an information-not-informed class violation; the detail verification of the target text segment based on the preset violation detection mode comprises the following steps:

detecting whether the target text segment meets a preset characteristic;

if the target text segment meets the preset characteristic, detecting whether text content related to the preset characteristic exists in the target text segment;

and if no related text content exists, determining that the target text segment triggers the information-not-informed class violation.

6. The call voice quality inspection method according to claim 1, wherein the violation type includes an incomplete or incorrect information notification class violation; the detail verification of the target text segment based on the preset violation detection mode comprises the following steps:

acquiring a previous text segment and a next text segment of the target text segment, forming the previous text segment and the target text segment into a first text segment, forming the target text segment and the next text segment into a second text segment, and forming the previous text segment, the target text segment and the next text segment into a third text segment;

vectorizing the target text segment, the first text segment, the second text segment and the third text segment respectively to obtain a target vector, a first vector, a second vector and a third vector;

inputting the target vector, the first vector, the second vector and the third vector into a BERT model respectively for encoding, and then matching the target vector, the first vector, the second vector and the third vector with preset violation vectors in a preset violation vector library one by one to obtain matching degrees corresponding to the target vector, the first vector, the second vector and the third vector respectively;

and selecting the final text content corresponding to the vector with the highest matching degree, and outputting that the final text content triggers the incomplete or incorrect information notification class violation.

7. The call voice quality inspection method according to claim 1, wherein the violation type includes a superclass information notification class violation; the detail verification of the target text segment based on the preset violation detection mode comprises the following steps:

matching the target text segment against preset rules;

and when the matching is successful, determining that the target text segment triggers the superclass information notification class violation.

8. A call voice quality inspection device, comprising:

the acquisition module is used for acquiring call voice and converting the call voice into a call text, wherein the call text comprises a client call text and an agent reply text;

the screening module is used for inputting the call text into a pre-trained text detection model to obtain a target text segment comprising sensitive content;

the checking module is used for performing detail verification on the target text segment based on a preset violation detection mode;

the confirming module is used for confirming the violation type corresponding to the target text segment as the target violation type when the target text segment violates the rule;

and the output module is used for generating quality inspection result information according to the target text segment and the target violation type.

9. A computer device comprising a processor and a memory coupled to the processor, the memory having stored therein program instructions that, when executed by the processor, cause the processor to perform the steps of the call voice quality inspection method according to any one of claims 1-7.

10. A storage medium storing program instructions capable of implementing the call voice quality inspection method according to any one of claims 1 to 7.