CN111314566A

CN111314566A - Voice quality inspection method, device and system

Info

Publication number: CN111314566A
Application number: CN202010067212.7A
Authority: CN
Inventors: 姜秋宇; 李晓宇; 李明
Original assignee: Beijing Ultrapower Intelligent Data Technology Co ltd
Current assignee: Beijing Ultrapower Intelligent Data Technology Co ltd
Priority date: 2020-01-20
Filing date: 2020-01-20
Publication date: 2020-06-19

Abstract

The invention discloses a voice quality inspection method, device and system. The method comprises: obtaining conversation voices of both parties of a call and converting the conversation voices into text data; performing keyword matching processing on the text data to obtain a keyword matching result; performing text similarity calculation on the text data to obtain a similarity calculation result; and obtaining a quality inspection report of the conversation voice according to the keyword matching result and the similarity calculation result. By combining keyword retrieval with text similarity calculation, the method reduces the errors of voice quality inspection compared with inspection that relies on keyword retrieval alone, and thereby reduces the labor cost and time cost of an enterprise.

Description

Voice quality inspection method, device and system

Technical Field

The invention relates to the technical field of data analysis, in particular to a voice quality inspection method, device and system.

Background

At present, in the telemarketing outbound service, manual sampling inspection is the most common quality inspection means. Manual spot inspection takes the qualification rate of a subset of samples as the delivery standard. This traditional method is simple and intuitive and can serve as a measure of service quality in most cases, but when the total population is large, the recall ratio cannot be guaranteed and considerable labor cost and time cost are incurred; and when the variation within the population is large, the qualification rate obtained by sampling becomes correspondingly less representative.

As shown in FIG. 1, in the telemarketing outbound service, an outbound team first carries out telemarketing according to a standard script. After the telemarketing is finished, the stored call data are provided to a quality inspection team, which performs spot inspection on the data. Once the spot inspection qualification rate reaches a certain standard, the qualified data are delivered to the client, while the unqualified data are fed back to the outbound team. The outbound team records these data and, according to the reasons for failure, either takes corrective action within the team or optimizes the standard script.

With the rise of the big data concept, the manual sampling inspection method has exposed problems such as high labor cost and time cost and a low recall ratio, and using call records only for quality inspection is clearly a waste of data.

Disclosure of Invention

The invention aims to provide a voice quality inspection method, a voice quality inspection device and a voice quality inspection system.

In a first aspect, an embodiment of the present invention provides a voice quality inspection method, including:

obtaining conversation voices of both parties of a conversation and converting the conversation voices into text data; performing keyword matching processing on the text data to obtain a keyword matching result; performing text similarity calculation on the text data to obtain a similarity calculation result; and obtaining a quality inspection report of the dialogue voice according to the keyword matching result and the similarity calculation result.

In a second aspect, an embodiment of the present invention provides a voice quality inspection apparatus, including:

the processing unit is used for acquiring conversation voices of both parties of a conversation and converting the conversation voices into text data; the retrieval unit is used for performing keyword matching processing on the text data to obtain a keyword matching result; the calculation unit is used for performing text similarity calculation on the text data to obtain a similarity calculation result; and the quality inspection unit is used for obtaining a quality inspection report of the conversation voice according to the keyword matching result and the similarity calculation result.

In a third aspect, an embodiment of the present invention provides a voice quality inspection system, including a memory and a processor; the memory stores computer-executable instructions which, when executed, cause the processor to perform the voice quality inspection method.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which one or more computer programs are stored, where the one or more computer programs, when executed, implement a voice quality inspection method.

The invention achieves at least the following technical effects: after the degree of matching between the text data of the conversation voice and the keywords is calculated, the similarity between the text data and the standard script is also calculated. The similarity is used to judge whether the customer service agent's speech is approximately consistent with the standard script, instead of judging whether the agent spoke the sentences required by the script solely from the keywords appearing in the text. This avoids the quality inspection errors that the language habits of customer service agents and customers cause when inspection depends only on keyword retrieval. By combining keyword retrieval with text similarity calculation, the method reduces the errors of voice quality inspection compared with inspection that relies on keyword retrieval alone, and thereby reduces the labor cost and time cost of an enterprise.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments will be briefly described below. It is appreciated that the following drawings depict only certain embodiments of the invention and are therefore not to be considered limiting of its scope. For a person skilled in the art, it is possible to derive other relevant figures from these figures without inventive effort.

FIG. 1 is a flow chart of a manual spot check method in the prior art;

FIG. 2 is a block diagram showing a hardware configuration of a voice quality inspection system according to an embodiment of the present invention;

FIG. 3 is a flowchart of a voice quality inspection method according to an embodiment of the present invention;

FIG. 4 is a flow chart illustrating a voice quality inspection according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a quality inspection report according to an embodiment of the present invention;

FIG. 6 is a block diagram illustrating a voice quality inspection apparatus according to an embodiment of the present invention;

FIG. 7 is a block diagram illustrating a voice quality inspection system according to an embodiment of the present invention.

Detailed Description

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.

In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

< Embodiment One >

Fig. 2 is a block diagram of a hardware configuration of the voice quality inspection system 100 according to an embodiment of the present invention.

As shown in fig. 2, the voice quality inspection system 100 includes a data acquisition device 1000 and a voice quality inspection device 2000.

The data collection device 1000 is configured to collect conversation voices of both parties of a conversation and provide the collected conversation voices to the voice quality inspection device 2000.

The voice quality inspection apparatus 2000 may be any electronic device, such as a PC, a notebook computer, a server, or the like.

In this embodiment, referring to fig. 2, the voice quality inspection apparatus 2000 may include a processor 2100, a memory 2200, an interface apparatus 2300, a communication apparatus 2400, a display apparatus 2500, an input apparatus 2600, a speaker 2700, a microphone 2800, and the like.

The processor 2100 may be a mobile version processor. The memory 2200 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 2300 includes, for example, a USB interface, a headphone interface, and the like. The communication device 2400 can perform wired or wireless communication, for example, the communication device 2400 may include a short-range communication device, such as any device that performs short-range wireless communication based on a short-range wireless communication protocol, such as a Hilink protocol, WiFi (IEEE 802.11 protocol), Mesh, bluetooth, ZigBee, Thread, Z-Wave, NFC, UWB, LiFi, and the like, and the communication device 2400 may also include a remote communication device, such as any device that performs WLAN, GPRS, 2G/3G/4G/5G remote communication. The display device 2500 is, for example, a liquid crystal display panel, a touch panel, or the like. The input device 2600 may include, for example, a touch screen, a keyboard, and the like. A user can input/output voice information through the speaker 2700 and the microphone 2800.

In this embodiment, the memory 2200 of the voice quality inspection apparatus 2000 is configured to store instructions for controlling the processor 2100 to operate at least to perform the voice quality inspection method according to any embodiment of the present invention. The skilled person can design the instructions according to the disclosed solution. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.

Although a plurality of devices of the voice quality inspection device 2000 are illustrated in fig. 2, the present invention may only relate to some of the devices, for example, the voice quality inspection device 2000 only relates to the memory 2200 and the processor 2100.

In this embodiment, the data collection device 1000 is configured to collect conversation voices of both parties of a call and provide the collected conversation voices to the voice quality inspection device 2000, and the voice quality inspection device 2000 implements the voice quality inspection method according to any embodiment of the present invention based on the conversation voices.

It should be understood that although fig. 2 only shows one data collection device 1000 and one voice quality inspection device 2000, the number of each is not meant to be limited, and a plurality of data collection devices 1000 and/or voice quality inspection devices 2000 may be included in the voice quality inspection system 100.

< Embodiment Two >

Fig. 3 is a flowchart of a voice quality inspection method according to an embodiment of the present invention, and as shown in fig. 3, the method according to the embodiment includes:

S3100, obtaining conversation voices of both parties of a call, and converting the conversation voices into text data.

In this embodiment, the two parties to the call include a first party and a second party; the first party sends access voice to the second party, and the second party returns feedback voice to the first party in response. For example, the first party is a service agent, such as an outbound caller in an outbound service, and the second party is the service recipient of the outbound service.

Accordingly, the conversation voice includes the access voice sent by the first party to the second party and the feedback voice returned by the second party to the first party in response to that access voice.

Speech-to-text conversion techniques may be employed to convert the conversation voice into text data.

S3200, performing keyword matching processing on the text data to obtain a keyword matching result; and performing text similarity calculation on the text data to obtain a similarity calculation result.

S3300, obtaining a quality inspection report of the conversation voice according to the keyword matching result and the similarity calculation result.

Specifically, weight assignment processing is performed on the keyword matching result and the similarity calculation result to obtain a quality inspection model, and the output result of the quality inspection model is calculated. If the output result is greater than a preset threshold, the quality inspection is output as qualified; if the output result is not greater than the preset threshold, the quality inspection is output as unqualified. The preset threshold may be set empirically. The quality inspection model is a linear model of the keyword matching result and the similarity calculation result; it comprises a first weight corresponding to the keyword matching result and a second weight corresponding to the similarity calculation result, and both weights may be set according to business requirements.
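The linear quality inspection model and threshold test described above can be sketched as follows. This is a minimal illustration only: the class name `QualityModel`, the assumption that both results are scores in the range 0 to 1, and the example weights and threshold are hypothetical and are not specified in this disclosure.

```python
from dataclasses import dataclass


@dataclass
class QualityModel:
    """Hypothetical linear model over the two inspection results."""
    keyword_weight: float      # first weight (keyword matching result)
    similarity_weight: float   # second weight (similarity calculation result)
    threshold: float           # preset threshold, set empirically

    def score(self, keyword_score: float, similarity_score: float) -> float:
        # Weighted sum of the two results (the "output result" of the model).
        return (self.keyword_weight * keyword_score
                + self.similarity_weight * similarity_score)

    def inspect(self, keyword_score: float, similarity_score: float) -> str:
        # Qualified only when the output is strictly greater than the threshold.
        if self.score(keyword_score, similarity_score) > self.threshold:
            return "qualified"
        return "unqualified"


# Example values are illustrative assumptions.
model = QualityModel(keyword_weight=0.5, similarity_weight=0.5, threshold=0.5)
print(model.inspect(1.0, 0.5))  # weighted sum 0.75, above the threshold
print(model.inspect(0.5, 0.5))  # weighted sum 0.5, not greater than the threshold
```

A call whose output equals the threshold exactly is treated as unqualified, matching the "not greater than" wording of the text.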

The similarity calculation result is combined with the keyword matching result, for example by assigning a weight to each; the weights can be adjusted according to the actual business situation, and quality inspection of the conversation voice is based on the final weighted result. Thus, conversation voice with high similarity and a high keyword matching score can be marked as qualified data, conversation voice with low similarity and a low keyword matching score can be marked as unqualified data, and conversation voice with high similarity but a low keyword matching score can be marked as data needing further judgment, which can then be judged according to business requirements. For example, when the content of a customer service agent's telemarketing call basically meets the requirements of the standard script but lacks certain key script items, such as forgetting to ask the client about the subscription time or the acceptable price range, this embodiment may determine that the call needs further judgment.

In this embodiment, after the degree of matching between the text data of the conversation voice and the keywords is calculated, the similarity between the text data and the standard script is also calculated. The similarity is used to judge whether the customer service agent's speech is approximately consistent with the standard script, instead of judging whether the agent spoke the sentences required by the script solely from the keywords appearing in the text. This avoids the quality inspection errors that the language habits of customer service agents and customers cause when inspection depends only on keyword retrieval. By combining keyword retrieval with text similarity calculation, this embodiment reduces the errors of voice quality inspection compared with inspection that relies on keyword retrieval alone, and thereby reduces the labor cost and time cost of an enterprise.
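The three-way marking can be sketched as below. Several points are assumptions made for illustration: the function name `triage`, the cut-off values that separate "high" from "low", reading the low-similarity, low-keyword case as unqualified, and routing the case the text does not mention (low similarity with a high keyword score) to further judgment.

```python
def triage(keyword_score: float, similarity_score: float,
           keyword_cutoff: float = 0.5, similarity_cutoff: float = 0.5) -> str:
    """Mark a call from its two scores; the cut-offs are hypothetical."""
    high_keyword = keyword_score >= keyword_cutoff
    high_similarity = similarity_score >= similarity_cutoff
    if high_similarity and high_keyword:
        return "qualified"
    if not high_similarity and not high_keyword:
        return "unqualified"
    # Mixed cases, e.g. the script is broadly followed but key items are missing.
    return "needs further judgment"


print(triage(keyword_score=0.9, similarity_score=0.9))
print(triage(keyword_score=0.1, similarity_score=0.1))
print(triage(keyword_score=0.2, similarity_score=0.9))
```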

< Embodiment Three >

The embodiment also provides a voice quality inspection method. With reference to fig. 4, the performing text similarity calculation on the text data in step S3200 to obtain a similarity calculation result further includes:

First, word segmentation processing is performed on the text data to obtain the segmented words of the text data. For example, segmenting "here is the outbound team" yields three segments: "here / is / the outbound team". Script detection, that is, checking whether the keywords required by the script were spoken during the marketing call, is completed on the basis of the segmented words. Then, the frequency of occurrence of each segmented word in the text data is calculated, together with the number of texts in a preset corpus that contain each segmented word. Finally, the similarity between the text vector of the text data and the text vector of the standard script text is calculated from the frequency and the number of texts of each segmented word, and the calculated similarity value is taken as the similarity calculation result.

In one example, TF-IDF (Term Frequency-Inverse Document Frequency) may be used for the text similarity calculation. TF is the frequency with which a word occurs in a text, generally TF = (number of times the word occurs in the text) / (total number of words in the text). IDF is the inverse document frequency: the total number of texts in the corpus divided by the number of texts in which the word appears, with the logarithm taken, for example IDF = log(total number of texts in the corpus / number of texts in the corpus in which the word appears).

The principle of TF-IDF is as follows: the more frequently a word appears in a text, the more important it is to that text; the more texts a word appears in, the less it distinguishes any one of them and the smaller its weight should be, which is why the inverse of the frequency is generally used. A further phenomenon must also be considered: the number of occurrences of some common words may be tens or hundreds of times that of low-frequency words, so a simple inversion would make the weights of common words very small and the weights of rare words excessively large. To balance the weights of common and rare words, the logarithm of the inverse is taken.

The text data is compared with the standard script by the TF-IDF similarity calculation method to obtain the vector similarity between them; the larger the similarity value, the more closely the content of the call matches the standard script, and vice versa.

For example, based on the TF-IDF algorithm, this embodiment constructs the bag-of-words model $\text{TF-IDF}(t, d) = \text{TF}(t, d) \times \text{IDF}(t)$ for calculating the similarity.
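The script detection step, which checks whether the required keywords appear among the segmented words, might look like the sketch below. The function name `keyword_match`, the example keywords and the scoring rule (share of required keywords that were spoken) are assumptions; the text does not define how the keyword matching result is scored. The input is taken to be already segmented, since the segmentation tool is not named in the text.

```python
def keyword_match(tokens, required_keywords):
    """Report which required script keywords occur among the segmented words."""
    spoken = set(tokens)
    matched = [k for k in required_keywords if k in spoken]
    missing = [k for k in required_keywords if k not in spoken]
    # Assumed score: fraction of required keywords found (1.0 if none required).
    score = len(matched) / len(required_keywords) if required_keywords else 1.0
    return {"matched": matched, "missing": missing, "score": score}


# Hypothetical segmented transcript and required keywords.
tokens = ["hello", "here", "is", "outbound team", "subscription", "time"]
result = keyword_match(tokens, ["outbound team", "subscription", "price"])
print(result["matched"], result["missing"], round(result["score"], 3))
```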
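The TF and IDF definitions given above translate directly into a few lines of code. The sketch below assumes pre-segmented texts, the natural logarithm (the base is not stated in the text) and an IDF of 0 for words that appear in no corpus text; the function names and the tiny corpus are hypothetical.

```python
import math


def term_frequency(term, tokens):
    """Occurrences of the word in the text / total number of words in the text."""
    if not tokens:
        return 0.0
    return tokens.count(term) / len(tokens)


def inverse_document_frequency(term, corpus):
    """log(total number of texts / number of texts that contain the word)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # assumption: unseen words carry no weight
    return math.log(len(corpus) / containing)


def tf_idf(term, tokens, corpus):
    """Product of the two quantities for one word in one text."""
    return term_frequency(term, tokens) * inverse_document_frequency(term, corpus)


# Hypothetical corpus of three segmented texts.
corpus = [
    ["I", "like", "watch", "TV"],
    ["I", "like", "watch", "movie"],
    ["hello", "here", "is", "outbound team"],
]
tokens = ["I", "like", "watch", "TV", "not", "like", "watch", "movie"]
for word in ("like", "TV"):
    print(word, round(tf_idf(word, tokens, corpus), 4))
```

In this toy corpus "like" occurs twice in the text but appears in two of the three corpus texts, while "TV" occurs once and appears in only one, so "TV" ends up with the larger weight. This is the balancing effect the paragraph above describes.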

The similarity between the text vector of the text data and the text vector of the standard script text is then calculated based on the bag-of-words model. Take speech A, "I like watching TV and do not like watching movies", and speech B, "I do not like watching TV and do not like watching movies either", as an example.

Segmenting speech A gives: I / like / watch / TV, not / like / watch / movie. Segmenting speech B gives: I / not / like / watch / TV, also / not / like / watch / movie.

The two segmented speeches together span the following dimensions: I, like, watch, TV, movie, not, also.

The word counts of each segmented word in speech A and speech B are therefore:

Speech A: I 1, like 2, watch 2, TV 1, movie 1, not 1, also 0.

Speech B: I 1, like 2, watch 2, TV 1, movie 1, not 2, also 1.

The corresponding vector data are:

Speech A: [1, 2, 2, 1, 1, 1, 0]

Speech B: [1, 2, 2, 1, 1, 2, 1]

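After the vector data corresponding to speech A and speech B are obtained, the TF(t, d) value of each segmented word in speech A can be calculated. The IDF(t) value of each segmented word in speech A is then calculated based on the IDF formula, which, consistent with the definition of IDF given earlier in this embodiment, can be written as

$$\mathrm{IDF}(t) = \log \frac{N}{n_t}$$

where $N$ is the total number of texts in the corpus and $n_t$ is the number of texts in the corpus in which segmented word $t$ appears. The calculated TF(t, d) and IDF(t) values are substituted into the bag-of-words model to calculate the similarity between the text vector of speech A and the text vector of the standard script text.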
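The speech A and speech B example can be reproduced with a short script. The text speaks only of "vector similarity"; cosine similarity is used here as an assumption, and raw word counts stand in for TF-IDF weights to keep the sketch small. The function names are hypothetical.

```python
import math
from collections import Counter

# Segmented speeches from the example, in the order the words are spoken.
speech_a = ["I", "like", "watch", "TV", "not", "like", "watch", "movie"]
speech_b = ["I", "not", "like", "watch", "TV",
            "also", "not", "like", "watch", "movie"]
vocabulary = ["I", "like", "watch", "TV", "movie", "not", "also"]


def count_vector(tokens, vocabulary):
    """Word counts laid out along the shared vocabulary dimensions."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vector_a = count_vector(speech_a, vocabulary)   # [1, 2, 2, 1, 1, 1, 0]
vector_b = count_vector(speech_b, vocabulary)   # [1, 2, 2, 1, 1, 2, 1]
similarity = cosine_similarity(vector_a, vector_b)
print(vector_a, vector_b, round(similarity, 3))  # similarity is about 0.938
```

With TF-IDF weighting, each count would first be replaced by its TF(t, d) × IDF(t) value before the same similarity is taken.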

In this embodiment, the two parties of the call include a first party and a second party. In voice quality inspection, determining whether the keywords required by the script were spoken during the marketing call requires examining the conversation voice of the first party. Thus, the access voice of the first party in the conversation voice can be acquired and converted into first text data. Correspondingly, keyword matching processing and text similarity calculation are each performed on the first text data to obtain the keyword matching result and the similarity calculation result.

Taking a call between outbound caller A and client B in the telemarketing outbound service as an example, the conversation voice between outbound caller A and client B can be acquired from the recording file and converted into text data, which then comprises first text data corresponding to outbound caller A and second text data corresponding to client B. Keyword matching processing and text similarity calculation are performed on the first text data to obtain the keyword matching result and the similarity calculation result, both results are input into the quality inspection model, and the output result of the quality inspection model is used to judge whether the voice call is qualified.

In some embodiments, after the access voice is converted into the first text data, word segmentation processing may further be performed on the first text data to obtain the segmented words of the first text data; sensitive word retrieval is then performed on the segmented words, and the quality inspection is output as unqualified when a sensitive word is found among them.

Referring to FIG. 4, after word segmentation processing is performed on the first text data corresponding to the first party, keyword retrieval is performed on the segmented words. If a sensitive word, such as an abusive word, is retrieved, the quality inspection is output as unqualified and the reason for failure is marked in the quality inspection report as "sensitive words occurred", so that alarm pushing can be completed based on the unqualified mark in the quality inspection report.
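A minimal sketch of the sensitive word check follows. The function name, the report fields and the placeholder word list are hypothetical; only the behaviour (fail the inspection and record the reason when a sensitive word is found) comes from the text.

```python
def check_sensitive_words(tokens, sensitive_words):
    """Return a report fragment for the first party's segmented words."""
    hits = [word for word in tokens if word in sensitive_words]
    if hits:
        return {
            "result": "unqualified",
            "reason": "sensitive words occurred",
            "hits": hits,
        }
    # No sensitive word found: this check alone does not decide the outcome.
    return {"result": "passed", "reason": None, "hits": []}


# Placeholder vocabulary; a real list would come from the business side.
sensitive_words = {"idiot", "stupid"}
report = check_sensitive_words(["you", "are", "stupid"], sensitive_words)
print(report["result"], report["reason"], report["hits"])
```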

In some embodiments, feedback speech of a second party in the conversation speech may be obtained, the feedback speech is converted into second text data, user portrait collection is performed on the second text data, and user portrait data is obtained, wherein the user portrait data includes age, gender, location and specific needs of a client, so that potential needs of the client are mined according to the user portrait data.

With continued reference to FIG. 4, tag keywords may be obtained through keyword retrieval or through contextual semantic association. The tag keywords may be clustered into categories such as purchasing preferences, interests, potential needs and economic strength; the customer tags are then analyzed and the customer information is labeled, which completes the collection of user portrait data.

In some embodiments, after converting the feedback voice into the second text data, the emotional state of the second party during the call is analyzed based on the second text data, an emotion analysis index of the second party is obtained, and the success or failure of the sale is predicted based on the emotion analysis index.

With continued reference to FIG. 4, analyzing the emotional state of the second party during the call yields an emotion analysis index such as optimism, anger or hesitation; when the emotion analysis index obtained for the second party is optimism, the sale can be predicted to succeed.

In some embodiments, the obtaining a quality inspection report of the conversational speech according to the keyword matching result and the similarity calculation result in step S3300 further includes:

With continued reference to FIG. 4, a corresponding assignment method is selected according to business requirements to perform a weighted assignment calculation on the keyword matching result and the similarity calculation result, and quality inspection of the conversation voice is performed according to the calculation result. For example, the entropy weight method, the standard deviation method, or the CRITIC method (CRiteria Importance Through Intercriteria Correlation) may be selected for the weighted assignment calculation.

Entropy weight method: it is generally considered that the smaller the entropy of an index, the more information the index contains and the greater its proportion in the overall evaluation should be. Following this principle, the weights of the indexes in this embodiment are calculated as follows.

First, the information entropy of the two indexes (i.e., the keyword matching result and the similarity calculation result) is calculated according to the formula, which in the usual form of the entropy weight method reads

$$E_j = -\frac{1}{\ln n} \sum_{i=1}^{n} p_{ij} \ln p_{ij}, \qquad p_{ij} = \frac{d_{ij}}{\sum_{i=1}^{n} d_{ij}}$$

where $E_j$ is the information entropy of the $j$-th index, $n$ is the number of evaluation objects, $p_{ij}$ is the proportion value, and $d_{ij}$ is the normalized data. If $p_{ij} = 0$, then $p_{ij} \ln p_{ij}$ is taken to be 0.

After the information entropies of the two indexes are calculated, the weight coefficient of each index is constructed from them:

$$w_j = \frac{1 - E_j}{\sum_{k=1}^{m} (1 - E_k)}$$

where $w_j$ is the weight coefficient of the $j$-th index and $m$ is the number of indexes; in this embodiment $m = 2$.
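The entropy weight calculation can be sketched as below, assuming the usual formulation of the method: proportions $p_{ij} = d_{ij} / \sum_i d_{ij}$, entropy $E_j = -\frac{1}{\ln n}\sum_i p_{ij} \ln p_{ij}$, and weights proportional to $1 - E_j$. The function name and the sample scores are hypothetical; rows are evaluation objects (calls) and columns are the two indexes.

```python
import math


def entropy_weights(data):
    """Entropy-based weights for the columns of a normalized score table."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two evaluation objects")
    m = len(data[0])
    entropies = []
    for j in range(m):
        column = [row[j] for row in data]
        total = sum(column)
        if total <= 0:
            raise ValueError("each index needs a positive column sum")
        entropy = 0.0
        for d in column:
            p = d / total
            if p > 0:  # p * ln(p) is taken as 0 when p == 0
                entropy -= p * math.log(p)
        entropies.append(entropy / math.log(n))
    divergence = [1.0 - e for e in entropies]
    total_divergence = sum(divergence)
    if total_divergence < 1e-12:
        # All indexes are uninformative; fall back to equal weights (assumption).
        return [1.0 / m] * m
    return [d / total_divergence for d in divergence]


# Columns: keyword matching score, similarity score (illustrative values).
scores = [[0.9, 0.5], [0.1, 0.5], [0.5, 0.5]]
weights = entropy_weights(scores)
print([round(w, 3) for w in weights])
```

In this sample the similarity column is identical for every call, so its entropy is maximal and nearly all of the weight goes to the keyword matching score.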

Standard deviation method:

The idea of the standard deviation method is quite similar to that of the entropy weight method, but it is based on the standard deviation rather than the information entropy. It is generally considered that the larger the standard deviation of an index, the larger its variation, the more information it contains, and the larger its weight should be. Based on this idea, with $\sigma_j$ denoting the standard deviation of the $j$-th index, the weight coefficient can be calculated by the formula, which in its usual form reads

$$w_j = \frac{\sigma_j}{\sum_{k=1}^{m} \sigma_k}$$
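A sketch of the standard deviation method follows, assuming the common form $w_j = \sigma_j / \sum_k \sigma_k$. The population standard deviation is used; because every column has the same number of rows, the sample standard deviation would give the same weights. The function name and data are hypothetical.

```python
from statistics import pstdev


def standard_deviation_weights(data):
    """Weights proportional to the standard deviation of each column."""
    m = len(data[0])
    sigmas = [pstdev([row[j] for row in data]) for j in range(m)]
    total = sum(sigmas)
    if total == 0:
        # No index varies at all; fall back to equal weights (assumption).
        return [1.0 / m] * m
    return [sigma / total for sigma in sigmas]


# Columns: keyword matching score, similarity score (illustrative values).
scores = [[0.0, 0.0], [1.0, 3.0]]
weights = standard_deviation_weights(scores)
print(weights)  # the second column varies three times as much as the first
```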

CRITIC method:

The basic idea of the CRITIC method is to construct the weights on the basis of contrast intensity and conflict. Contrast intensity is expressed by the standard deviation: in general, the larger the standard deviation of an index, the larger the difference between the evaluation objects. Conflict is expressed by the correlation coefficients between indexes: if the correlation between indexes is strong, the conflict is weak. Based on this idea, this embodiment constructs an index $c_j$ that contains both kinds of information, which in the usual form of the CRITIC method is calculated as

$$c_j = \sigma_j \sum_{i=1}^{m} (1 - r_{ij})$$

where $r_{ij}$ is the correlation coefficient between the $i$-th and $j$-th indexes.

As can be seen from this index, the larger $c_j$ is, the more information the index contains and the larger its weight coefficient should be. The weight coefficient of the $j$-th index can therefore be calculated by the formula

$$w_j = \frac{c_j}{\sum_{k=1}^{m} c_k}$$
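The CRITIC weighting can be sketched as below, assuming the usual formulation $c_j = \sigma_j \sum_i (1 - r_{ij})$ and $w_j = c_j / \sum_k c_k$ with $r_{ij}$ the Pearson correlation between indexes. The helper names, the treatment of constant columns (correlation taken as 0) and the sample data are assumptions.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    mu = _mean(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))


def _correlation(u, v):
    """Pearson correlation; taken as 0.0 if either column is constant."""
    std_u, std_v = _std(u), _std(v)
    if std_u == 0 or std_v == 0:
        return 0.0
    mu_u, mu_v = _mean(u), _mean(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / len(u)
    return cov / (std_u * std_v)


def critic_weights(data):
    """Weights from contrast intensity (std) times conflict (1 - correlation)."""
    m = len(data[0])
    columns = [[row[j] for row in data] for j in range(m)]
    information = []
    for j in range(m):
        conflict = sum(1.0 - _correlation(columns[i], columns[j])
                       for i in range(m))
        information.append(_std(columns[j]) * conflict)
    total = sum(information)
    if total == 0:
        return [1.0 / m] * m  # assumption: equal weights when nothing varies
    return [c / total for c in information]


# Columns: keyword matching score, similarity score (illustrative values).
scores = [[0.0, 2.0], [1.0, 0.0], [2.0, 2.0]]
weights = critic_weights(scores)
print([round(w, 4) for w in weights])
```

With only two indexes, as in this embodiment, the conflict term is the same for both, so the CRITIC weights reduce to the ratio of the standard deviations.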

Fig. 5 shows a quality inspection report obtained by using the voice quality inspection method of the present embodiment, and in combination with fig. 5, the voice quality inspection method of the present embodiment can implement the following points:

1. Marketing script quality inspection and evaluation: the quality inspection process marks keywords such as required phrases and prohibited phrases, scores calls according to the quality inspection rules, and supports manual spot checks that re-examine texts and recordings and flag inconsistent conclusions. This embodiment outputs a detailed quality inspection table, the qualified or unqualified status, a spot check difference table and the like. Detailed scoring rules and the corresponding added and deducted points can also be user-defined, so that service quality is quantified and scoring results are presented more intuitively. According to the scoring rules, the traffic inspection results, content inspection results, details of hit recordings, traffic sub-groups and agent risk details can be consulted.

2. Mining of potential customer needs: customer information is labeled by collecting user portrait data, and newly added labels are synchronized to the customer management system, providing a basis for accurate service at a later stage.

3. Sales behavior prediction: whether a sale will succeed is predicted from the collected emotion analysis indexes.

4. Marketing strategy optimization: data analysis is performed on the quality-inspected conversation voice, with visual display of traffic statistics, hot spot division and anomaly analysis, providing a basis for management decisions. The analysis scenarios include overlong silence analysis, overlong call analysis, speech rate analysis, script comparison analysis and the like; through this analysis, quality inspection conclusions and improvement suggestions are provided for on-site operation, process and skills.

5. Alarm pushing: public opinion risks such as "customer complaints" and "customer exposure" in high-risk emergency conversation scenarios are monitored in real time.

< Embodiment Four >

Fig. 6 is a block diagram of a voice quality inspection apparatus according to an embodiment of the present invention, and as shown in fig. 6, the apparatus according to the embodiment includes:

a processing unit 6100, configured to obtain conversation voices of both parties of a call, and convert the conversation voices into text data;

a retrieving unit 6200, configured to perform keyword matching processing on the text data to obtain a keyword matching result;

a calculating unit 6300, configured to perform text similarity calculation on the text data to obtain a similarity calculation result;

and the quality inspection unit 6400 is configured to obtain a quality inspection report of the conversational speech according to the keyword matching result and the similarity calculation result.

In some embodiments, the quality inspection unit 6400 is configured to perform weight assignment processing on the keyword matching result and the similarity calculation result to obtain a quality inspection model; calculating an output result of the quality inspection model; if the output result is larger than a preset threshold value, outputting that the quality inspection is qualified; and if the output result is not greater than the preset threshold value, outputting that the quality inspection is unqualified.

In some embodiments, the calculating unit 6300 is configured to perform word segmentation processing on the text data to obtain segmented words of the text data; calculate the frequency with which each segmented word occurs in the text data and the number of texts in a preset corpus that contain the segmented word; and calculate, according to the frequency and the number of texts, the similarity between the text vector of the text data and the text vector of the standard script text, taking the calculated similarity value as the similarity calculation result.
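The frequency of a word in the text and the number of corpus texts containing it are the two ingredients of a TF-IDF weighting, so the calculation can be sketched as below. The sketch makes several assumptions: the input has already been segmented into words, a common smoothed form of the inverse document frequency is used, and cosine similarity is used to compare the vector of the text data with that of the standard reference text. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def tfidf_vector(tokens: Sequence[str], corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Build a sparse TF-IDF vector for one segmented text."""
    counts = Counter(tokens)
    n_docs = len(corpus)
    vector: Dict[str, float] = {}
    for term, count in counts.items():
        tf = count / len(tokens)                         # frequency in this text
        df = sum(1 for doc in corpus if term in doc)     # corpus texts containing the term
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0    # smoothed IDF (assumed variant)
        vector[term] = tf * idf
    return vector


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_similarity(
    text_tokens: Sequence[str],
    standard_tokens: Sequence[str],
    corpus: Sequence[Sequence[str]],
) -> float:
    """Similarity between the text data and the standard reference text."""
    return cosine_similarity(
        tfidf_vector(text_tokens, corpus),
        tfidf_vector(standard_tokens, corpus),
    )


if __name__ == "__main__":
    corpus = [["hello", "sir"], ["hello", "plan"], ["price", "plan"]]
    spoken = "hello sir here is the plan".split()
    standard = "hello sir let me introduce the plan".split()
    print(text_similarity(spoken, standard, corpus))
```

For Chinese call transcripts, the whitespace split in the usage example would be replaced by a dedicated word segmentation step.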

In some embodiments, the conversational speech includes access speech sent by the first party to the second party, and feedback speech returned by the second party to the first party based on the access speech sent by the first party.

The processing unit 6100 is configured to obtain the access voice of the first party in the conversation voice and convert the access voice into first text data; the retrieving unit 6200 is configured to perform keyword matching processing on the first text data to obtain a keyword matching result; and the calculating unit 6300 is configured to perform text similarity calculation on the first text data to obtain a similarity calculation result.

The retrieving unit 6200 is further configured to perform word segmentation processing on the first text data to obtain segmented words of the first text data, and to perform sensitive word retrieval on the segmented words; the quality inspection unit 6400 is configured to output that the quality inspection is unqualified when a sensitive word exists among the segmented words.
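The sensitive word check can be sketched as a set lookup over the segmented words. This is an illustration under stated assumptions: the sensitive word list is a hypothetical example, the segmentation is assumed to have been done already, and the function names are not taken from the embodiment.

```python
from typing import List, Optional, Sequence, Set


def find_sensitive_words(tokens: Sequence[str], sensitive_words: Set[str]) -> List[str]:
    """Return the segmented words that appear in the sensitive word set, in order."""
    return [token for token in tokens if token in sensitive_words]


def sensitive_word_verdict(tokens: Sequence[str], sensitive_words: Set[str]) -> Optional[str]:
    """Return 'unqualified' if any sensitive word is present, otherwise None.

    None means this check raised no objection; the other checks still decide
    the final quality inspection result.
    """
    return "unqualified" if find_sensitive_words(tokens, sensitive_words) else None


if __name__ == "__main__":
    sensitive = {"guaranteed", "idiot"}   # hypothetical sensitive word list
    tokens = "this return is guaranteed".split()
    print(find_sensitive_words(tokens, sensitive))
    print(sensitive_word_verdict(tokens, sensitive))
```

Using a set for the sensitive word list keeps each lookup at constant cost, which matters when every call transcript is checked.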

The processing unit 6100 is further configured to obtain the feedback voice of the second party in the conversation voice and convert the feedback voice into second text data; the retrieving unit 6200 is further configured to perform user portrait collection on the second text data to obtain user portrait data.

The voice quality inspection device further comprises an emotion analysis unit, and the emotion analysis unit is used for analyzing the emotion state of the second party in the call process based on the second text data to obtain the emotion analysis index of the second party.

The specific implementation manner of each module in the apparatus embodiment of the present invention may refer to the related content in the method embodiment of the present invention, and is not described herein again.

< example five >

Fig. 7 is a block diagram of a voice quality inspection system according to an embodiment of the present invention. As shown in fig. 7, at the hardware level, the voice quality inspection system includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include internal memory, such as a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory.

The processor, the network interface, and the memory may be connected to each other via the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in fig. 7, but this does not mean that there is only one bus or only one type of bus.

The memory is used for storing programs. In particular, a program may comprise program code, and the program code comprises computer-executable instructions. The memory may include both internal memory and non-volatile storage, and provides instructions and data to the processor.

The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the voice quality inspection device at the logical level. The processor executes the program stored in the memory to implement the voice quality inspection method.

The method performed by the voice quality inspection apparatus according to the embodiment shown in fig. 7 of the present specification can be applied to, or implemented by, a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the voice quality inspection method described above may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the various methods, steps and logic blocks disclosed in the embodiments of the present specification. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present specification may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the voice quality inspection method in combination with its hardware.

The invention also provides a computer readable storage medium.

The computer readable storage medium stores one or more computer programs, the one or more computer programs comprising instructions, which when executed by a processor, are capable of implementing the voice quality inspection method described above.

For the convenience of clearly describing the technical solutions of the embodiments of the present invention, in the embodiments of the present invention, the words "first", "second", and the like are used to distinguish the same items or similar items with basically the same functions and actions, and those skilled in the art can understand that the words "first", "second", and the like do not limit the quantity and execution order.

While the foregoing is directed to embodiments of the present invention, other modifications and variations of the present invention may be devised by those skilled in the art in light of the above teachings. It should be understood by those skilled in the art that the foregoing detailed description is for the purpose of better explaining the present invention, and the scope of the present invention should be determined by the scope of the appended claims.

Claims

1. A voice quality inspection method is characterized by comprising the following steps:

obtaining conversation voices of both parties of a conversation and converting the conversation voices into text data;

performing keyword matching processing on the text data to obtain a keyword matching result; performing text similarity calculation on the text data to obtain a similarity calculation result;

and obtaining a quality inspection report of the dialogue voice according to the keyword matching result and the similarity calculation result.

2. The method of claim 1, wherein obtaining a quality inspection report of the conversational speech according to the keyword matching result and the similarity calculation result comprises:

carrying out weight assignment processing on the keyword matching result and the similarity calculation result to obtain a quality inspection model;

calculating an output result of the quality inspection model;

if the output result is larger than a preset threshold value, outputting that the quality inspection is qualified; and if the output result is not greater than the preset threshold value, outputting that the quality inspection is unqualified.

3. The method of claim 1, wherein performing text similarity calculation on the text data to obtain a similarity calculation result comprises:

performing word segmentation processing on the text data to obtain word segmentation of the text data;

calculating the frequency with which each segmented word occurs in the text data, and calculating the number of texts in a preset corpus that contain the segmented word;

and calculating, according to the frequency and the number of texts, the similarity between the text vector of the text data and the text vector of the standard script text, and taking the calculated similarity value as the similarity calculation result.

4. The method of claim 1, wherein, if the conversation voice includes an access voice sent by a first party to a second party, obtaining conversation voices of both parties of the conversation and converting the conversation voices into text data comprises:

and acquiring the access voice of the first party in the conversation voice, and converting the access voice into first text data.

5. The method according to claim 4, wherein performing keyword matching processing on the text data to obtain a keyword matching result, and performing text similarity calculation on the text data to obtain a similarity calculation result, comprises:

and respectively carrying out keyword matching processing and text similarity calculation on the first text data to obtain a keyword matching result and a similarity calculation result.

6. The method of claim 4, further comprising, after converting the access speech into first text data:

performing word segmentation processing on the first text data to obtain word segmentation of the first text data;

and performing sensitive word retrieval on the segmented words, and outputting that the quality inspection is unqualified when a sensitive word exists among the segmented words.

7. The method of claim 4, wherein the conversation voice further includes a feedback voice returned by the second party to the first party according to the access voice sent by the first party, and obtaining conversation voices of both parties of the conversation and converting the conversation voices into text data further comprises:

acquiring feedback voice of the second party in the conversation voice, and converting the feedback voice into second text data;

and performing user portrait collection on the second text data to obtain user portrait data.

8. The method of claim 7, further comprising, after converting the feedback speech into second text data:

and analyzing the emotional state of the second party in the call process based on the second text data to obtain an emotional analysis index of the second party.

9. A voice quality inspection apparatus comprising:

the processing unit is used for acquiring conversation voices of both parties of a conversation and converting the conversation voices into text data;

the retrieval unit is used for performing keyword matching processing on the text data to obtain a keyword matching result;

the calculation unit is used for performing text similarity calculation on the text data to obtain a similarity calculation result;

and the quality inspection unit is used for obtaining a quality inspection report of the conversation voice according to the keyword matching result and the similarity calculation result.

10. A voice quality inspection system comprising: a memory and a processor;

the memory storing computer-executable instructions;

the processor, wherein the computer-executable instructions, when executed, cause the processor to perform the voice quality inspection method of any of claims 1-9.