CN113270114A

CN113270114A - Voice quality inspection method and system

Info

Publication number: CN113270114A
Application number: CN202110810862.0A
Authority: CN
Inventors: 张�杰; 于皓; 王展; 吴信东
Original assignee: Beijing Mininglamp Software System Co ltd
Current assignee: Beijing Mininglamp Software System Co ltd
Priority date: 2021-07-19
Filing date: 2021-07-19
Publication date: 2021-08-17

Abstract

The application provides a voice quality inspection method and system, and relates to the technical field of computers. The method performs voice recognition on a voice file to obtain a voice text; generates topic labels sentence by sentence according to the text content of the voice text; draws an actual conversation process between the user and the customer service according to the topic labels generated sentence by sentence; calculates the similarity between the actual conversation process and the customer service standard conversation process; and acquires a quality inspection result of the voice file based on the similarity. Before the similarity is calculated, redundancy is removed from the actual conversation process, and the customer service standard conversation process is pruned and serialized, so that a more accurate similarity is obtained. The voice quality inspection method can greatly reduce the cost of voice quality inspection of customer service personnel by a customer service department and improve the accuracy of voice quality inspection.

Description

Voice quality inspection method and system

Technical Field

The application belongs to the technical field of computers, and particularly relates to a voice quality inspection method and system.

Background

In order to improve market competitiveness and increase user satisfaction, enterprises often set up dedicated customer service departments, but customer service personnel are difficult to manage. A large amount of voice data is generated in daily customer service work. If the voice data of every customer service person were inspected manually to judge whether that person communicates with users according to a standard conversation process, a large amount of manpower would be consumed, so usually only spot checks are performed.

It can be seen that the management of customer service personnel involves a trade-off between quality inspection cost and quality inspection coverage. If the quality inspection coverage is high, the labor cost increases correspondingly; if the quality inspection coverage is low, the service quality and the standardization of customer service operations are difficult to guarantee.

With the development of voice recognition technology, automatic voice quality inspection has become possible. At present, some customer service departments preset a keyword list, such as 'complaint', 'refund', and 'cheat', detect whether a keyword appears in the call records between customers and customer service, and forward the records that trigger a keyword to quality inspection personnel for focused screening. This method can cover all customer service personnel, but it is not accurate enough and cannot evaluate pre-sale or post-sale service flows.

Disclosure of Invention

In view of the above, the present application provides a voice quality inspection method and system, so as to reduce the cost of voice quality inspection performed on the customer service personnel by the customer service department and improve the accuracy of voice quality inspection.

In a first aspect, an embodiment of the present application provides a voice quality inspection method, including the following steps:

carrying out voice recognition on the voice file to obtain a voice text; wherein the voice file comprises a voice conversation between a user and a customer service;

generating topic labels by sentence according to the character content of the voice text;

drawing an actual conversation process between the user and the customer service according to the topic labels generated sentence by sentence;

calculating the similarity between the actual conversation process and the customer service standard conversation process;

and acquiring a quality inspection result of the voice file based on the similarity.

In one possible embodiment, performing speech recognition on a speech file to obtain a speech text includes the following steps:

determining the language type of the voice file based on the telephone number attribution or the service hall location;

acquiring a pronunciation dictionary and a language model corresponding to the language type;

performing feature extraction on the voice file based on the pronunciation dictionary and the language model to obtain voice features;

inputting the voice features into an encoder of a voice recognition model for encoding to obtain an initial voice text;

performing text post-processing on the initial voice text to obtain a voice text; wherein the text post-processing includes spoken language correction and punctuation prediction.

In one possible implementation, generating topic labels from the text content of the voice text sentence by sentence, includes the following steps:

generating topic labels for the text content of the voice text sentence by sentence by a dictionary-based method, a regular-expression-based method, or a statistical-learning-model-based method; wherein the categories of the topic labels include condition labels, attribute labels, and flow labels; the condition labels include age, region, and occupation; the flow labels include question, product introduction, chatting, next question, and track.

In a possible implementation, the actual dialog flow between the user and the customer service is drawn according to the topic labels generated sentence by sentence, and the method comprises the following steps:

rejecting worthless topic labels in the topic labels; wherein the worthless topic tag comprises a chatting flow tag with the frequency not exceeding a preset threshold value;

merging adjacent and identical topic tags;

drawing an actual conversation process between the user and the customer service according to the removed and combined topic labels;

segmenting the actual conversation process at the position corresponding to a label indicating a new process; wherein the label indicating a new process comprises the next question flow label.

In a possible implementation manner, the similarity between the actual conversation process and the customer service standard conversation process is calculated, and the method comprises the following steps:

determining a process node corresponding to the actual conversation process and the customer service standard conversation process;

judging whether the flow node in the customer service standard conversation process is a multi-branch flow node, and if so, eliminating the redundant branch flows of the customer service standard conversation process at the multi-branch flow node; wherein the redundant branch flows are the branch flows of the customer service standard conversation process other than the branch flow that corresponds to the actual conversation process;

expanding parallel flows in the customer service standard conversation flows to generate a plurality of serialized customer service standard conversation flows;

and respectively calculating the similarity between a plurality of serialized customer service standard conversation processes and the actual conversation process.

In a possible implementation manner, the obtaining of the quality inspection result of the voice file based on the similarity includes the following steps:

judging whether the highest value of the similarity is lower than a similarity threshold value or not;

if so, judging that the voice quality inspection is unqualified;

if not, the voice quality inspection is judged to be qualified.

In a second aspect, an embodiment of the present application further provides a voice quality inspection system, including:

the voice recognition module is used for carrying out voice recognition on the voice file to obtain a voice text; wherein the voice file comprises a voice conversation between a user and a customer service;

the topic label generating module is used for generating topic labels by sentence according to the text content of the voice text;

the flow drawing module is used for drawing an actual conversation flow between the user and the customer service according to the topic labels generated sentence by sentence;

and the calculation module is used for calculating the similarity between the actual conversation process and the customer service standard conversation process and acquiring the quality inspection result of the voice file based on the similarity.

In one possible implementation, the flow rendering module includes:

the redundancy removing unit is used for removing the worthless topic labels in the topic labels, merging adjacent and identical topic labels, and segmenting the actual conversation process at the position corresponding to a label indicating a new process; wherein the worthless topic labels comprise a chatting flow label whose frequency does not exceed a preset threshold value; the label indicating a new process comprises the next question flow label.

the graph matching unit is used for eliminating the redundant branch flows of the customer service standard conversation process at the multi-branch flow node and expanding the parallel flows in the customer service standard conversation process to generate a plurality of serialized customer service standard conversation processes; wherein the redundant branch flows are the branch flows of the customer service standard conversation process other than the branch flow that corresponds to the actual conversation process.

In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the first aspect described above, or any possible implementation of the first aspect.

In a fourth aspect, this embodiment of the present application further provides a storage medium storing program instructions executable by a processor, where the program instructions are configured to perform the steps in the first aspect described above or any one of the possible implementation manners of the first aspect.

The voice quality inspection method provided by the embodiment of the application performs voice recognition on a voice file to obtain a voice text; generates topic labels sentence by sentence according to the text content of the voice text; draws an actual conversation process between the user and the customer service according to the topic labels generated sentence by sentence; calculates the similarity between the actual conversation process and the customer service standard conversation process; and acquires a quality inspection result of the voice file based on the similarity. Before the similarity is calculated, redundancy is removed from the actual conversation process, and the customer service standard conversation process is pruned and serialized, so that a more accurate similarity is obtained. The voice quality inspection method can greatly reduce the cost of voice quality inspection of customer service personnel by a customer service department and improve the accuracy of voice quality inspection.

In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.

FIG. 1 is a flow chart illustrating a voice quality inspection method provided herein;

FIG. 2 is a flow chart illustrating speech recognition of a speech file to obtain a speech text according to an embodiment of the present application;

FIG. 3 is a diagram illustrating a structure of a speech model provided by an embodiment of the present application;

FIG. 4 is a flow chart illustrating a process of drawing an actual dialog between a user and a customer service according to a topic tag provided by an embodiment of the present application;

FIG. 5 is a schematic diagram illustrating the culling of worthless topic labels provided by an embodiment of the present application;

FIG. 6 is a flow chart illustrating a process of calculating similarity between an actual conversation process and a standard conversation process;

FIG. 7 is a schematic diagram illustrating serialization of a customer service standard dialog flow provided by an embodiment of the present application;

FIG. 8 shows a block diagram of a voice quality inspection system provided in the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.

In the prior art, a customer service department usually supervises customer service personnel by playing back the conversation voice between the customer service personnel and users, to judge whether the customer service personnel communicated with the user according to the specified conversation process. However, manual playback is not only very labor-intensive but also subjective, which leads to inaccurate voice quality inspection results. The embodiment of the application provides a voice quality inspection method and a voice quality inspection system, so as to reduce the labor cost of voice quality inspection of customer service personnel by a customer service department and improve the accuracy of voice quality inspection results.

Referring to FIG. 1, an embodiment of the present application provides a voice quality inspection method, which may include the following steps:

S1, carrying out voice recognition on the voice file to obtain a voice text; wherein the voice file comprises a voice conversation between a user and a customer service;

In this embodiment, in order to better recognize the voice file of the conversation between the customer service and the user, the region served by the customer service department needs to be confirmed, so as to handle regional differences in the pronunciation of users and customer service, such as front nasal and back nasal sounds, or flat-tongue and curled-tongue sounds. The speech recognition model used in the present application is a regional speech model, which integrates a regional acoustic model, a dictionary, and a language model, as shown in FIG. 3. In the present application, a plurality of regional voice databases are established, and a plurality of regional acoustic models are obtained by training on them; a text database is also established, and the language model is obtained by training on the text database. The training steps of the regional acoustic models and of the language model are not described here.

In one embodiment, as shown in FIG. 2, the method for performing speech recognition on a speech file to obtain a speech text comprises the following steps:

S101, determining the language type of the voice file based on the telephone number attribution or the service hall location;

S102, acquiring a pronunciation dictionary and a language model corresponding to the language type;

S103, performing feature extraction on the voice file based on the pronunciation dictionary and the language model to obtain voice features;

S104, inputting the voice features into an encoder of a voice recognition model for encoding to obtain an initial voice text;

S105, performing text post-processing on the initial voice text to obtain a voice text; wherein the text post-processing includes spoken language correction and punctuation prediction.

In application, when a voice file is input, the voice recognition model of the corresponding region is selected according to the telephone number attribution or the service hall location. Voice features are then extracted from the voice file based on the pronunciation dictionary and the language model, and finally the voice features are encoded by the encoder of the voice recognition model to obtain an initial voice text. It is called the initial voice text because the output of the voice recognition model is often colloquial, that is, it contains a large number of filler words, so text post-processing is needed to obtain the voice text. Specifically, the text post-processing includes spoken language correction and punctuation prediction. In the spoken language correction stage, filler words such as "the", "that", "kayi", "o", and "ya" are removed; in the punctuation prediction stage, commas, periods, exclamation marks, alphabets and the like are predicted according to the features of the preceding and following words, the pause duration features, and the volume features. In this way, the dialogue between the customer service and the user in the voice text is processed into grammatically standard sentences, which provides a basis for subsequently generating topic labels.
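The region selection of step S101 and the spoken language correction of step S105 can be sketched as follows. This is a minimal illustration only, not the application's implementation: the area-code table, the filler-word list, and the function names `select_region` and `remove_fillers` are all assumptions introduced here, and punctuation prediction is omitted because it would require a trained model.

```python
# Hypothetical mapping from a phone-number area code to a regional model key.
REGION_BY_AREA_CODE = {"010": "beijing", "021": "shanghai", "020": "guangzhou"}

# Illustrative filler words; a real list would be language- and region-specific.
FILLERS = {"um", "uh", "er", "ya"}


def select_region(phone_number: str, default: str = "generic") -> str:
    """Pick a regional model key from the leading area code (step S101)."""
    for code, region in REGION_BY_AREA_CODE.items():
        if phone_number.startswith(code):
            return region
    return default


def remove_fillers(text: str) -> str:
    """Spoken language correction: drop filler tokens (part of step S105)."""
    kept = [
        word
        for word in text.split()
        # Compare without surrounding punctuation and ignoring case.
        if word.strip(",.!?").lower() not in FILLERS
    ]
    return " ".join(kept)
```

Under these assumptions, `select_region("01012345678")` selects the "beijing" model key, and `remove_fillers("Um, I want uh a refund")` keeps only the content words.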

S2, generating topic labels by sentence according to the character contents of the voice text;

Specifically, the topic labels are divided by type into three categories: condition labels, attribute labels, and flow labels. The condition labels include age, occupation, region, etc.; the attribute labels include 37 years old, Beijing, driver, etc.; the flow labels include question, product introduction, preferential introduction, competitive product comparison, chatting, track, next question, and the like.

The dictionary-based method is used to mark condition labels on the conversation content between the customer service and the user: a dictionary is set for each category of condition label, and if the conversation content hits a dictionary, the corresponding category label is applied. The regular-expression-based method is used to mark attribute labels on the conversation content, such as numeric labels for age, price, and income. The statistical-learning-model-based method is used to mark flow labels on the conversation content: samples are labeled manually, the weights of the corresponding features are obtained by training on the word features, context features, and grammatical features in the samples, and labels are then predicted for the sentences to be processed.

Based on the voice text generated in step S1, topic labels are generated sentence by sentence from the conversation content between the customer service and the user. The topic labels are a condensed representation of that content and form the basis for subsequently generating the actual conversation process between the customer service and the user, because representing each flow node by a topic label is more intuitive.
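The dictionary-based and regular-expression-based labeling described above can be illustrated with a short sketch. The dictionary entries, the age pattern, and the function names are assumptions made for illustration and do not come from the application; the statistical learning model for flow labels is not shown because it depends on manually labeled training data.

```python
import re

# Hypothetical dictionaries, one per condition-label category.
CONDITION_DICT = {
    "age": {"years old", "age"},
    "region": {"live in", "city", "region"},
    "occupation": {"work as", "job", "occupation"},
}

# Hypothetical pattern for a numeric age attribute such as "37 years old".
AGE_PATTERN = re.compile(r"\b(\d{1,3})\s*years?\s*old\b")


def condition_labels(sentence: str) -> list[str]:
    """Return every condition category whose dictionary the sentence hits."""
    text = sentence.lower()
    return sorted(
        label
        for label, terms in CONDITION_DICT.items()
        if any(term in text for term in terms)
    )


def attribute_labels(sentence: str) -> list[str]:
    """Extract numeric age attributes with a regular expression."""
    return [f"{m.group(1)} years old" for m in AGE_PATTERN.finditer(sentence.lower())]
```

Plain substring matching is used here for brevity; it can over-match (for example "age" inside "message"), so a practical dictionary matcher would likely work on tokenized text.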

S3, drawing an actual conversation process between the user and the customer service according to the topic labels generated sentence by sentence;

In order to accurately generate the actual conversation process between the customer service and the user from the topic labels generated in step S2, redundancy must be removed from those labels. For example, in step S2, if a sentence spoken by the customer service or the user is too short, a topic label cannot be generated effectively and the sentence is labeled "unrecognizable"; or the customer service, to be friendly, exchanges a few sentences of "chatting" with the user. The "unrecognizable" and "chatting" topic labels are both classed as worthless topic labels. The "unrecognizable" label can be deleted from the actual conversation process directly. For the "chatting" label, a threshold is set, and if the number of occurrences of the "chatting" label does not exceed the threshold, the label can be deleted from the actual conversation process.

As shown in FIG. 5, 5a is the initial actual conversation process generated from the topic labels: in the actual conversation with the user, the customer service performs the "welcome", "after-sale question answering", "asking for a question", "new product promotion", and "guest delivery" flows in sequence. During the "after-sale question answering" flow, the customer service also "chats" with the user, for example about the weather, but the "chatting" flow is not a required flow in the standard conversation process and only reflects the friendly attitude of the customer service. Therefore, to better compare the actual conversation process with the customer service standard conversation process, the worthless "chatting" flow is deleted, giving the actual conversation process after culling shown in 5b.
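The culling rule can be sketched as below, using the flow names of FIG. 5. The function name `cull_worthless` and the default threshold of 2 are assumptions for illustration; the application only states that a preset threshold exists.

```python
from collections import Counter


def cull_worthless(labels: list[str], chat_threshold: int = 2) -> list[str]:
    """Drop 'unrecognizable' labels, and 'chatting' labels when they are rare."""
    counts = Counter(labels)
    # 'chatting' is worthless only if it occurs no more than the threshold.
    drop_chat = counts["chatting"] <= chat_threshold
    return [
        label
        for label in labels
        if label != "unrecognizable" and not (label == "chatting" and drop_chat)
    ]


# The initial flow of FIG. 5a, with one 'chatting' inside the after-sale step.
flow_5a = [
    "welcome",
    "after-sale question answering",
    "chatting",
    "after-sale question answering",
    "asking for a question",
    "new product promotion",
    "guest delivery",
]
flow_5b = cull_worthless(flow_5a)
```

With a single "chatting" label and a threshold of 2, `flow_5b` no longer contains "chatting"; if "chatting" occurred more often than the threshold, it would be kept.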

Further, after the worthless topic labels are removed, the actual conversation process between the customer service and the user is merged. Specifically, if adjacent sentences of the conversation content between the customer service and the user are given the same topic label, they are merged. For example, if the customer service or the user emphatically confirms a certain question, that is, repeats a response, the repeated sentences can be grouped under one topic label, which makes the actual conversation process between the customer service and the user more concise.

In addition, to give the actual conversation process between the customer service and the user a clearer hierarchy, segmentation is required. For example, a sentence indicating a new flow, such as "how may I help you", is given the "next question" topic label, and the actual conversation process between the customer service and the user is split at that label.
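Merging and segmentation can be sketched as follows. The function names are hypothetical, and dropping the "next question" label itself after splitting is an assumption; the application does not say whether the boundary label is kept in either segment.

```python
from itertools import groupby


def merge_adjacent(labels: list[str]) -> list[str]:
    """Collapse each run of identical adjacent labels into a single label."""
    return [key for key, _ in groupby(labels)]


def split_segments(labels: list[str], boundary: str = "next question") -> list[list[str]]:
    """Split the flow at every boundary label; the boundary itself is dropped."""
    segments: list[list[str]] = []
    current: list[str] = []
    for label in labels:
        if label == boundary:
            if current:
                segments.append(current)
            current = []
        else:
            current.append(label)
    if current:
        segments.append(current)
    return segments
```

Only adjacent duplicates are merged, so a label that reappears later in the conversation remains a separate flow node.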

In one embodiment, as shown in fig. 4, the actual conversation process between the user and the customer service is drawn according to the topic labels generated sentence by sentence, and the method comprises the following steps:

S301, removing worthless topic labels in the topic labels; wherein the worthless topic labels comprise a chatting flow label whose frequency does not exceed a preset threshold value;

S302, merging adjacent and identical topic labels;

S303, drawing an actual conversation process between the user and the customer service according to the culled and merged topic labels;

S304, segmenting the actual conversation process at the position corresponding to a label indicating a new process; wherein the label indicating a new process comprises the next question flow label.

Removing redundancy from the actual conversation process between the user and the customer service through the above steps yields a more accurate actual conversation process, which is the basis for an accurate voice quality inspection result.

S4, calculating the similarity between the actual conversation process and the customer service standard conversation process;

In the present application, the similarity between the actual conversation process and the customer service standard conversation process is calculated by comparing the two, so as to judge whether the customer service has conducted the conversation with the user according to the specified customer service standard conversation process. In step S3, a concise and accurate actual conversation process is obtained by removing redundancy from the actual conversation process between the user and the customer service. To better compare it with the customer service standard conversation process, the customer service standard conversation process also needs to be processed, specifically by pruning and serialization.

In one embodiment, as shown in FIG. 6, calculating the similarity between the actual conversation process and the customer service standard conversation process comprises the following steps:

S401, determining a flow node corresponding to the actual conversation process and the customer service standard conversation process;

S402, judging whether the flow node in the customer service standard conversation process is a multi-branch flow node, and if so, eliminating the redundant branch flows of the customer service standard conversation process at the multi-branch flow node; wherein the redundant branch flows are the branch flows of the customer service standard conversation process other than the branch flow that corresponds to the actual conversation process;

S403, expanding parallel flows in the customer service standard conversation process to generate a plurality of serialized customer service standard conversation processes;

S404, respectively calculating the similarity between each of the plurality of serialized customer service standard conversation processes and the actual conversation process.

The pruning processing corresponds to steps S401 and S402: when the actual conversation process and the customer service standard conversation process have corresponding flow nodes, and such a flow node in the customer service standard conversation process is a multi-branch flow node, the redundant branch flows of the customer service standard conversation process at that node are eliminated, and only the branch flow corresponding to the actual conversation process is retained. For example, if an attribute label in the actual conversation process (for example, an age of 37) is mapped onto the customer service standard conversation process, the redundant branches of the customer service standard conversation process at that attribute are pruned (for example, the branch flows for 0 < age ≤ 25, 25 < age ≤ 35, 45 < age ≤ 60, and age > 60 are eliminated, leaving only the 35 < age ≤ 45 branch), so that the customer service standard conversation process can be better compared with the actual conversation process.
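The age example can be sketched as a pruning function over a multi-branch node. The data layout (a list of half-open ranges paired with a branch flow), the branch flow names, and the function name `prune_branches` are all hypothetical; only the age ranges come from the text above.

```python
import math

# Hypothetical multi-branch node: ((low, high], branch flow), with low < age <= high.
AGE_BRANCHES = [
    ((0, 25), ["plan for ages up to 25"]),
    ((25, 35), ["plan for ages 26 to 35"]),
    ((35, 45), ["plan for ages 36 to 45"]),
    ((45, 60), ["plan for ages 46 to 60"]),
    ((60, math.inf), ["plan for ages over 60"]),
]


def prune_branches(branches, value):
    """Keep only the branch whose range contains the attribute value."""
    return [(rng, flow) for rng, flow in branches if rng[0] < value <= rng[1]]


# An attribute label of age 37 keeps only the 35 < age <= 45 branch.
kept = prune_branches(AGE_BRANCHES, 37)
```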

The serialization processing corresponds to step S403: when the customer service standard conversation process has parallel flows at a certain flow node, the parallel flows at that node are expanded serially according to their permutations and combinations. Referring to FIG. 7, 7a shows a customer service standard conversation process, and 7b and 7c show the two serialized customer service standard conversation processes obtained by splitting it. Specifically, in the customer service standard conversation process shown in 7a, after the "welcome" flow, the customer service may perform either the "ask for a demand" flow or the "new product recommendation" flow, and then performs the final "guest sending" flow. In 7b, the customer service performs the "welcome" flow, then the "ask for a demand" flow, and then goes directly to the "guest sending" flow; in 7c, the customer service performs the "welcome" flow, then the "new product recommendation" flow, and then goes directly to the "guest sending" flow.

It can be seen that in an actual conversation between the customer service and the user, after the "welcome" flow the customer service performs only one of the "ask for a demand" and "new product recommendation" flows, not both. Therefore, to better compare the customer service standard conversation process with the actual conversation process, the customer service standard conversation process shown in 7a is split into the two serialized customer service standard conversation processes 7b and 7c.
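The split of 7a into 7b and 7c can be sketched as an expansion of alternatives. The representation is an assumption: a standard flow is written as a list whose items are either a single flow name or a list of parallel alternatives, and, following the FIG. 7 example, each serialized flow picks exactly one alternative per parallel node. The function name `serialize` is hypothetical.

```python
from itertools import product


def serialize(flow):
    """Expand every parallel node into one serialized flow per alternative."""
    options = [step if isinstance(step, list) else [step] for step in flow]
    return [list(path) for path in product(*options)]


# The standard flow of FIG. 7a: the middle node has two parallel flows.
flow_7a = ["welcome", ["ask for a demand", "new product recommendation"], "guest sending"]
serialized = serialize(flow_7a)
# serialized[0] corresponds to 7b and serialized[1] to 7c.
```

If a standard flow has several parallel nodes, this expansion yields one serialized flow per combination of choices, so the number of serialized flows grows multiplicatively.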

Then, according to step S404, each of the two serialized customer service standard conversation processes is compared with the actual conversation process to determine its similarity. In one embodiment, the similarity may be calculated according to the number of flow nodes at which the actual conversation process coincides with the customer service standard conversation process.
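The application does not give a formula for the similarity, so the following is only one possible reading: the number of coinciding flow nodes is taken as the length of the longest common subsequence of the two flows (so that order matters), and it is normalized by the length of the serialized standard flow. Both choices, and all function names, are assumptions.

```python
def matched_nodes(actual: list[str], standard: list[str]) -> int:
    """Count coinciding nodes as the longest common subsequence length."""
    prev = [0] * (len(standard) + 1)
    for a in actual:
        curr = [0]
        for j, s in enumerate(standard, start=1):
            if a == s:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(actual: list[str], standard: list[str]) -> float:
    """Fraction of the standard flow's nodes that the actual flow covers in order."""
    if not standard:
        return 0.0
    return matched_nodes(actual, standard) / len(standard)


def best_similarity(actual: list[str], standards: list[list[str]]) -> float:
    """Highest similarity over all serialized standard flows (step S404)."""
    return max((similarity(actual, s) for s in standards), default=0.0)
```

For an actual flow of "welcome", "ask for a demand", "guest sending", this measure gives 1.0 against the 7b flow and 2/3 against the 7c flow, so the highest value is 1.0.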

S5, acquiring a quality inspection result of the voice file based on the similarity.

In step S5, a similarity threshold may be set, for example 60%. When the maximum similarity obtained in step S4 is lower than 60%, the quality inspection result is unqualified, that is, the customer service corresponding to the voice file did not converse with the user according to the customer service standard conversation process; when the maximum similarity obtained in step S4 is not lower than 60%, the quality inspection result is qualified, that is, the customer service corresponding to the voice file conversed with the user according to the customer service standard conversation process.

The voice quality inspection method first determines the voice recognition model of the corresponding region according to the telephone number attribution of the voice file or the service hall location, and converts the voice file into a voice text through the voice recognition model. It then generates topic labels sentence by sentence according to the text content of the voice text, generates an actual conversation process between the user and the customer service according to the topic labels, removes redundancy from the actual conversation process, and prunes and serializes the customer service standard conversation process. Finally, it calculates the similarity according to the number of flow nodes at which the actual conversation process coincides with the customer service standard conversation process, and determines the quality inspection result of the voice file from the similarity, so as to judge whether the customer service corresponding to the voice file conversed with the user according to the customer service standard conversation process. Compared with manual quality inspection in the prior art, the method saves labor and improves quality inspection accuracy.
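The decision rule of step S5 can be sketched as follows, using the 60% example threshold. The function name `quality_result` and the result strings are hypothetical, and treating an empty list of similarities as unqualified is an assumption.

```python
def quality_result(similarities: list[float], threshold: float = 0.6) -> str:
    """Return 'unqualified' when the highest similarity is below the threshold."""
    best = max(similarities, default=0.0)
    return "unqualified" if best < threshold else "qualified"
```

A highest similarity exactly equal to the threshold is judged qualified, matching the rule that only a value lower than the threshold fails.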

Based on the same inventive concept, an embodiment of the present application further provides a voice quality inspection system, as shown in fig. 8 in the specification, including:

Wherein the flow drawing module comprises:

Based on the same inventive concept, an embodiment of the present application further provides an electronic device, which includes a processor, a memory and a bus, where the memory stores machine-readable instructions executable by the processor, and when the electronic device runs, the processor and the memory communicate with each other through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the voice quality inspection method.

Based on the same inventive concept, embodiments of the present application further provide a computer-readable storage medium storing instructions which, when executed on a computer, cause the computer to perform the above voice quality inspection method. The storage medium includes one or more computer programs. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer program is loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable system. The computer program can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that incorporates one or more of the available media. The storage medium includes, but is not limited to, non-volatile and/or volatile memory. Non-volatile memory includes, but is not limited to, read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory includes, but is not limited to, random access memory (RAM) or external cache memory.

Finally, it should be noted that the above examples are only specific embodiments of the present application and are not intended to limit its technical solutions, and the scope of protection of the present application is not limited thereto. Although the present application is described in detail with reference to the foregoing examples, those skilled in the art should understand that any person skilled in the art can, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be covered by the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A voice quality inspection method is characterized by comprising the following steps:

2. The voice quality inspection method according to claim 1, wherein performing voice recognition on the voice file to obtain the voice text comprises the following steps:

3. The voice quality inspection method as claimed in claim 2, wherein generating topic labels for the text contents of the voice text sentence by sentence comprises the following steps:

4. The voice quality inspection method as claimed in claim 3, wherein drawing the actual conversation process between the user and the customer service according to the topic labels generated sentence by sentence comprises the following steps:

merging adjacent and identical topic tags;

5. The voice quality inspection method according to claim 4, wherein calculating the similarity between the actual conversation process and the customer service standard conversation process comprises the following steps:

6. The voice quality inspection method according to claim 5, wherein obtaining the quality inspection result of the voice file based on the similarity comprises the following steps:

if so, judging that the voice quality inspection is unqualified;

if not, the voice quality inspection is judged to be qualified.

7. A voice quality inspection system, comprising:

8. The voice quality inspection system of claim 7, wherein the flow rendering module comprises:

the redundancy removing unit is used for removing the worthless topic labels from the topic labels, merging adjacent and identical topic labels, and segmenting the actual conversation process at the position corresponding to a label indicating a new process; wherein the worthless topic labels comprise chatting process labels whose frequency does not exceed a preset threshold value; and the label indicating a new process comprises a next-question process label;

9. An electronic device comprising a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate via the bus when the electronic device is running, and the machine-readable instructions, when executed by the processor, perform the steps of the voice quality inspection method according to any one of claims 1 to 6.

10. A storage medium having stored thereon program instructions executable by a processor to perform the steps of the voice quality inspection method according to any one of claims 1 to 6.