CN111723204B - Method and device for correcting voice quality inspection area, correction equipment and storage medium

Info

Publication number: CN111723204B
Application number: CN202010543859.2A
Authority: CN
Inventors: 聂镭; 邹茂泰; 聂颖
Original assignee: Longma Zhixin Zhuhai Hengqin Technology Co ltd
Current assignee: Longma Zhixin Zhuhai Hengqin Technology Co ltd
Priority date: 2020-06-15
Filing date: 2020-06-15
Publication date: 2021-04-02
Anticipated expiration: 2040-06-15
Also published as: CN111723204A

Abstract

The application is applicable to the technical field of voice processing and provides a method, a device, correction equipment and a storage medium for correcting a voice quality inspection area. The method comprises: obtaining a voice text to be processed; determining a quality inspection area of the voice text to be processed according to the quality inspection link corresponding to each voice text segment of the voice text to be processed; determining a quality inspection area to be corrected among the quality inspection areas according to preset keywords; and correcting the voice text segments of the quality inspection area to be corrected according to the complete text segment corresponding to the preset keywords. In this way, after the voice text to be processed is obtained and its quality inspection area is determined, the quality inspection area can be corrected according to the preset keywords, which facilitates subsequent quality inspection of that area, addresses the low accuracy with which the prior art converts voice calls between customer service staff and customers into voice text, and improves voice recognition accuracy.

Description

Method and device for correcting voice quality inspection area, correction equipment and storage medium

Technical Field

The present application belongs to the field of speech processing technologies, and in particular, to a method and an apparatus for correcting a speech quality inspection region, a correction device, and a storage medium.

Background

For the purposes of improving customer satisfaction, improving customer service, evaluating the work of customer service staff, and the like, quality inspection of voice calls between the customer service staff and the customer is generally required, for example, the voice calls between the customer service staff and the customer in the insurance industry are inspected to find out violation points.

The traditional quality inspection mode is manual quality inspection, but because the voice call quantity of customer service personnel and customers is too large, the traditional manual quality inspection efficiency is low, and manpower and material resources are greatly consumed.

The current quality inspection mode is intelligent quality inspection, whose principle is as follows: first, voice calls between customer service personnel and customers are converted into voice texts through a regular expression technology or a natural language processing technology; then a quality inspection area of the voice texts is determined; finally, the quality inspection area is checked to find the violation points in it. However, in the prior art, converting the voice calls between customer service personnel and customers into voice texts in this way first requires training on a large number of training samples. Training samples are often insufficient, and the voice environment during training often differs considerably from the voice environment during application. As a result, the accuracy of converting voice calls between customer service personnel and customers into voice texts is low in the prior art, and so is the efficiency of the subsequent quality inspection performed on those voice texts.

Disclosure of Invention

The embodiments of the application provide a method and a device for correcting a voice quality inspection area, correction equipment and a storage medium, which can solve the problem that, in the prior art, the accuracy of converting voice calls between customer service personnel and clients into voice texts is low.

In a first aspect, an embodiment of the present application provides a method for correcting a voice quality inspection region, including:

acquiring a voice text to be processed, wherein the voice text to be processed comprises at least one voice text fragment;

determining a quality inspection area of the voice text to be processed according to a quality inspection link corresponding to the voice text segment, wherein the quality inspection area comprises at least one voice text segment;

determining a quality inspection area to be corrected in the quality inspection area according to a preset keyword;

and correcting the voice text segment of the quality inspection area to be corrected according to the complete text segment corresponding to the preset keyword.

In a possible implementation manner of the first aspect, correcting the voice text segment of the quality inspection region to be corrected according to a complete text segment corresponding to a preset keyword includes:

screening out a correctable quality inspection area in the quality inspection area to be corrected according to the complete text segment corresponding to the preset keyword;

and identifying the keywords to be corrected in the correctable quality inspection area, and correcting the keywords to be corrected.

In a possible implementation manner of the first aspect, screening out a correctable quality inspection region in a quality inspection region to be corrected according to a complete text segment corresponding to a preset keyword includes:

constructing a window according to the length of the complete text segment;

performing sliding window matching on the voice text segment of the quality inspection area to be corrected according to the window by using a preset step length;

and determining a correctable quality inspection area in the quality inspection area to be corrected according to the similarity between the voice text fragment of the quality inspection area to be corrected corresponding to each window and the complete text fragment.
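The three sliding-window steps above can be sketched as follows. This is a minimal illustration and not the claimed implementation: the window is assumed to be measured in characters, the similarity measure is assumed to be the ratio from Python's `difflib.SequenceMatcher`, and the function names, the threshold of 0.6 and the step length of 1 are all hypothetical.

```python
from difflib import SequenceMatcher


def best_window_match(segment, complete_text, step=1):
    """Slide a window of len(complete_text) characters over segment.

    Returns (best_similarity, start_index, window_text).
    """
    window = len(complete_text)
    if window == 0:
        raise ValueError("complete_text must not be empty")
    best = (0.0, -1, "")
    # Score at least one window even when the segment is shorter than the window.
    last_start = max(len(segment) - window, 0)
    for start in range(0, last_start + 1, step):
        candidate = segment[start:start + window]
        # Assumed similarity measure: difflib's matching ratio in [0, 1].
        score = SequenceMatcher(None, candidate, complete_text).ratio()
        if score > best[0]:
            best = (score, start, candidate)
    return best


def is_correctable(segment, complete_text, threshold=0.6, step=1):
    """Treat the segment as correctable if some window is similar enough."""
    score, _, _ = best_window_match(segment, complete_text, step)
    return score >= threshold
```

For instance, with the hypothetical complete text segment "hesitation period", a misrecognized segment containing "hesitashun period" yields a window that is close to the complete text segment, so the area would be treated as correctable, while an unrelated segment would not.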

In a possible implementation manner of the first aspect, before the obtaining the to-be-processed speech text, the method further includes:

acquiring voice audio to be processed;

and converting the voice audio to be processed into a voice text to be processed.

In a possible implementation manner of the first aspect, the converting the to-be-processed speech audio into to-be-processed speech text includes:

separating target voice audio from the voice audio to be processed;

performing role object segmentation on the target voice audio to obtain voice audio segments, wherein each voice audio segment corresponds to one role object;

and converting the voice audio fragment into a voice text fragment, and forming the voice text to be processed according to the voice text fragment.

In a possible implementation manner of the first aspect, the determining, according to a quality inspection link corresponding to the speech text segment, a quality inspection area of the speech text to be processed includes:

inputting the voice text segment of the voice text to be processed to a preset CNN model to obtain a quality inspection link corresponding to the voice text segment;

and determining the quality inspection links of the same type, and taking the areas of the voice text segments corresponding to the quality inspection links of the same type as a quality inspection area.

In a possible implementation manner of the first aspect, the determining the quality inspection links of the same category, and using, as a quality inspection area, areas where the voice text segments respectively correspond to the quality inspection links of the same category includes:

performing cluster analysis on the quality inspection links of the same category to obtain a quality inspection link set;

determining a starting quality inspection link and an ending quality inspection link in the quality inspection link set;

and taking the area in which the voice text segments respectively corresponding to the starting quality inspection link, the intermediate quality inspection links between the starting quality inspection link and the ending quality inspection link, and the ending quality inspection link are located as a quality inspection area.

In a second aspect, an embodiment of the present application provides a device for correcting a voice quality inspection area, including:

the acquisition module is used for acquiring a voice text to be processed; wherein the voice text to be processed comprises at least one voice text segment;

the positioning module is used for determining a quality inspection area of the voice text to be processed according to a quality inspection link corresponding to the voice text fragment;

the dividing module is used for determining a quality inspection area to be corrected in the quality inspection area according to a preset keyword;

and the correction module is used for correcting the voice text segment of the quality inspection area to be corrected according to the complete text segment corresponding to the preset keyword.

In a possible implementation manner of the second aspect, the correction module further includes:

the screening submodule is used for screening out a correctable quality inspection area in the quality inspection area to be corrected according to the complete text segment corresponding to the preset keyword;

and the correction submodule is used for identifying the keywords to be corrected in the correctable quality inspection area and correcting the keywords to be corrected.

In a possible implementation manner of the second aspect, the screening submodule further includes:

the construction unit is used for constructing a window according to the length of the complete text segment;

the matching unit is used for performing sliding window matching on the voice text segment of the quality inspection area to be corrected according to the window by using a preset step length;

and the determining unit is used for determining the correctable quality inspection area in the quality inspection area to be corrected according to the similarity between the voice text segment of the quality inspection area to be corrected corresponding to each window and the complete text segment.

In a possible implementation manner of the second aspect, the device further includes:

the audio acquisition module is used for acquiring voice audio to be processed;

and the conversion module is used for converting the voice audio to be processed into a voice text to be processed.

In one possible implementation manner of the second aspect, the conversion module includes:

the separation submodule is used for separating a target voice audio from the voice audio to be processed;

the segmentation submodule is used for performing role object segmentation on the target voice audio to obtain voice audio segments, wherein each voice audio segment corresponds to one role object;

and the conversion sub-module is used for converting the voice audio fragment into a voice text fragment and forming the voice text to be processed according to the voice text fragment.

In one possible implementation manner of the second aspect, the positioning module includes:

the classification submodule is used for inputting the voice text segment of the voice text to be processed into a preset CNN model to obtain a quality inspection link corresponding to the voice text segment;

and the determining submodule is used for determining the quality inspection links of the same category and taking the areas of the voice text segments respectively corresponding to the quality inspection links of the same category as a quality inspection area.

In a possible implementation manner of the second aspect, the determining submodule further includes:

the cluster analysis unit is used for carrying out cluster analysis on the quality inspection links of the same category to obtain a quality inspection link set;

the determining unit is used for determining a starting quality inspection link and an ending quality inspection link in the quality inspection link set;

and the dividing unit is used for taking the area in which the voice text segments respectively corresponding to the starting quality inspection link, the intermediate quality inspection links between the starting quality inspection link and the ending quality inspection link, and the ending quality inspection link are located as a quality inspection area.

In a third aspect, an embodiment of the present application provides a correction device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the method according to the first aspect when executing the computer program.

In a fourth aspect, an embodiment of the present application provides a storage medium storing a computer program which, when executed by a processor, implements the method according to the first aspect.

Compared with the prior art, the embodiment of the application has the advantages that:

After the voice text to be processed is obtained and its quality inspection area is determined, the quality inspection area can be corrected according to the preset keywords, which facilitates subsequent quality inspection of that area, addresses the low accuracy with which the prior art converts voice calls between customer service staff and clients into voice text, and improves the voice recognition accuracy.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or of the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a method for correcting a voice quality inspection area according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of the steps performed before step S101 in Fig. 1 in the method for correcting a voice quality inspection area according to an embodiment of the present application;

Fig. 3 is a schematic flowchart of a specific process of step S202 in Fig. 2 in the method for correcting a voice quality inspection area according to an embodiment of the present application;

Fig. 4 is a schematic flowchart of a specific process of step S102 in Fig. 1 in the method for correcting a voice quality inspection area according to an embodiment of the present application;

Fig. 5 is a schematic flowchart of a specific process of step S402 in Fig. 4 in the method for correcting a voice quality inspection area according to an embodiment of the present application;

Fig. 6 is a schematic flowchart of a specific process of step S104 in Fig. 1 in the method for correcting a voice quality inspection area according to an embodiment of the present application;

Fig. 7 is a schematic structural diagram of a device for correcting a voice quality inspection area according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of a correction device provided in an embodiment of the present application;

Fig. 9 is an example diagram of a first correspondence between voice text segments and quality inspection links in the method for correcting a voice quality inspection area according to an embodiment of the present application;

Fig. 10 is an example diagram of a second correspondence between quality inspection links and voice text segments in the method for correcting a voice quality inspection area according to an embodiment of the present application.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".

Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.

Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.

The application scenario of the present application may be any scenario that requires quality inspection of voice, for example, a typical application scenario may be a scenario that performs quality inspection of voice calls between customer service personnel and customers in the insurance industry.

The technical solutions provided in the embodiments of the present application will be described below with specific embodiments.

Example one

Referring to Fig. 1, a schematic flowchart of a method for correcting a voice quality inspection area provided in an embodiment of the present application is shown. By way of example and not limitation, the method may be applied to a correction device, where the correction device includes a terminal device or a server, and the method may include the following steps:

Step S101, acquiring a voice text to be processed.

The voice text to be processed comprises at least one voice text segment.

In a possible implementation, referring to Fig. 2, which is a schematic flowchart of the steps performed before step S101 in Fig. 1 in the method for correcting a voice quality inspection area provided in an embodiment of the present application, before the voice text to be processed is obtained, the method further includes:

Step S201, obtaining the voice audio to be processed.

In a specific application, the voice audio to be processed may be obtained directly from a call center or indirectly from a relay server; that is, the embodiment of the present application does not limit the source of the voice audio to be processed. Nor does it limit the quantity of voice audio to be processed; for example, the voice audio to be processed may be 500 voice calls between customer service personnel and customers.

Step S202, converting the voice audio to be processed into a voice text to be processed.

By way of example and not limitation, referring to Fig. 3, which is a schematic flowchart of a specific process of step S202 in Fig. 2 in the method for correcting a voice quality inspection area provided in an embodiment of the present application, converting the voice audio to be processed into the voice text to be processed includes:

Step S301, separating target voice audio from the voice audio to be processed.

It can be understood that the background noise and the target voice audio exist in the voice audio to be processed, and the background noise and the target voice audio need to be separated.

By way of example and not limitation, the specific process of separating out the target speech audio from the to-be-processed speech audio may be:

Firstly, framing the voice audio to be processed to obtain audio frames.

Secondly, calculating the energy of each audio frame according to the following formula:

[formula not reproduced in the source text]

wherein E_n is the energy of an audio frame, n is the time instant, x is a frame sample value, m is the average sound amplitude, and N is the window length.

And thirdly, screening out the audio frames with the energy larger than the energy threshold.

And fourthly, forming target voice audio according to the audio frames with the energy larger than the energy threshold value.

It can be understood that, in the embodiment of the present application, the target speech audio is separated by using the difference between the energy of the background noise and the target speech audio.
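The four steps above can be sketched as follows. Because the energy formula itself is not available in the text, this sketch assumes one plausible reading of the listed symbols: the energy of a frame is the sum over its N samples of the squared deviation of each sample x from the mean amplitude m of that frame. The function names, frame length, hop size and energy threshold are hypothetical.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into frames of frame_len samples, hop samples apart."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def frame_energy(frame):
    """Assumed energy: sum of squared deviations from the frame's mean amplitude."""
    m = sum(frame) / len(frame)
    return sum((x - m) ** 2 for x in frame)


def select_speech_frames(samples, frame_len=4, hop=4, energy_threshold=1.0):
    """Keep only the frames whose energy exceeds the energy threshold."""
    frames = frame_signal(samples, frame_len, hop)
    return [f for f in frames if frame_energy(f) > energy_threshold]
```

With a low-amplitude stretch, a loud stretch and another low-amplitude stretch, only the loud frame would pass the threshold and contribute to the target voice audio.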

Step S302, performing role object segmentation on the target voice audio to obtain voice audio segments.

Each voice audio segment corresponds to one role object.

In a specific application, in the scenario of quality inspection of voice calls between customer service personnel and customers in the insurance industry, the role objects may include the customer service personnel and the customer.

By way of example and not limitation, the specific process of performing role object segmentation on the target voice audio to obtain the voice audio segments may be:

Firstly, all role objects corresponding to the target voice audio are determined.

For example, the role object of the embodiment of the present application may be a customer service person and a client in the insurance industry.

And secondly, searching a preset voice characteristic model corresponding to each role object.

The preset voice characteristic model is set in advance according to the voice characteristics of the role object.

For example, the preset speech feature model of the customer service personnel can be obtained by extracting speech feature values of the customer service personnel as Mel frequency cepstral coefficients (MFCC) and inputting them into a speech feature model, such as a Gaussian mixture model, for training. Correspondingly, the preset speech feature model of the customer can be obtained by extracting speech feature values of the customer as Mel frequency cepstral coefficients (MFCC) and inputting them into a speech feature model, such as a Gaussian mixture model, for training.

And thirdly, substituting the preset voice characteristic model corresponding to each role object into a preset function to calculate a jump prediction value.

For example, the predetermined function may be a likelihood function plus a penalty term.

And fourthly, taking each moment at which the jump prediction value is larger than the jump prediction threshold as a jump point, and segmenting the target voice audio at the jump points to obtain the voice audio segments.

It can be understood that, in the embodiment of the present application, by predicting the jump points of the voice audio, the voice audio segments corresponding to different role objects in the voice audio are separated from one another.
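The fourth step can be sketched as follows. The sketch assumes the jump prediction values (the likelihood function plus penalty term evaluated on the preset voice characteristic models) have already been computed for a series of time stamps; how they are computed is not shown. The function names and the threshold value used in the example are hypothetical.

```python
def find_jump_points(jump_scores, times, threshold):
    """Return the time stamps whose jump prediction value exceeds the threshold."""
    return [t for score, t in zip(jump_scores, times) if score > threshold]


def split_by_jump_points(start, end, jump_points):
    """Cut the interval [start, end] at each jump point.

    Returns a list of (segment_start, segment_end) pairs.
    """
    bounds = [start] + sorted(p for p in jump_points if start < p < end) + [end]
    return list(zip(bounds[:-1], bounds[1:]))
```

For example, jump prediction values of 0.1, 0.9, 0.2, 0.8 and 0.3 at seconds 1 to 5 with a threshold of 0.5 give jump points at seconds 2 and 4, so a 6-second recording would be cut into three voice audio segments.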

Step S303, converting the voice audio segment into a voice text segment, and forming a voice text to be processed according to the voice text segment.

And the voice audio segments correspond to the voice text segments one by one.

By way of example and not limitation, converting a speech audio segment to a speech text segment and forming a to-be-processed speech text from the speech text segment may be:

Firstly, extracting a characteristic value of a voice audio segment.

And secondly, inputting the characteristic value into a preset acoustic model to obtain a voice characteristic vector sequence.

The preset acoustic model is obtained by training according to acoustic data and a voice feature vector sequence in advance.

And thirdly, inputting the voice feature vector sequence into a preset voice model to obtain a character sequence.

The preset voice model is obtained by training in advance according to the character sequence and the voice feature vector sequence.

And fourthly, forming a voice text to be processed according to the character sequence.

Step S102, determining a quality inspection area of the voice text to be processed according to a quality inspection link corresponding to the voice text segment.

The quality inspection area comprises at least one voice text segment.

By way of example and not limitation, referring to Fig. 4, which is a schematic flowchart of a specific process of step S102 in Fig. 1 in the method for correcting a voice quality inspection area provided in an embodiment of the present application, determining the quality inspection area of the voice text to be processed according to the quality inspection link corresponding to the voice text segment includes:

Step S401, inputting the voice text segment of the voice text to be processed into a preset CNN model to obtain a quality inspection link corresponding to the voice text segment.

The preset CNN model is a neural network model trained in advance on voice text segments and their corresponding quality inspection links. It should be noted that a quality inspection link refers to the stage of the conversation between role objects. For example, in the scenario of quality inspection of a voice call between customer service personnel and a client in the insurance industry, the quality inspection link represents the stage of the conversation between the customer service personnel and the client. In the embodiment of the present application, the quality inspection area is determined based on the quality inspection links, so that the voice call between the customer service personnel and the client can subsequently be inspected area by area.

For example, Fig. 9 is an example diagram of the first correspondence between voice text segments and quality inspection links in the method for correcting a voice quality inspection area according to the embodiment of the present application. The first column is the serial number of the voice text segment; because voice text segments correspond one-to-one to voice audio segments, it also represents the serial number of the voice audio segment. The second column is the quality inspection link. The third column is the start and end time of the voice text segment which, for the same reason, also represents the start and end time of the voice audio segment. The fourth column is the voice text segment. In this way, the preset CNN model classifies the voice text segments by quality inspection link, yielding the quality inspection link corresponding to each voice text segment.

Step S402, determining quality inspection links of the same category, and taking the areas of the voice text segments respectively corresponding to the quality inspection links of the same category as a quality inspection area.

The categories of quality inspection links include, but are not limited to, product introduction, information verification, health notification, disclaimer, opening, warranty confirmation, hesitation, association one, association two, association three, and the like.

It can be understood that the same quality inspection link can appear within a single quality inspection process, that is, within a single call between customer service personnel and a customer. In the embodiment of the present application, the area in which the voice text segments respectively corresponding to the quality inspection links of the same category are located is taken as one quality inspection area, so that the quality inspection area is located accurately and quickly, which facilitates the subsequent quality inspection performed on that area.

Exemplarily, referring to Fig. 5, which is a schematic flowchart of a specific process of step S402 in Fig. 4 in the method for correcting a voice quality inspection area provided in the embodiment of the present application, determining the quality inspection links of the same category and taking the areas of the voice text segments respectively corresponding to the quality inspection links of the same category as a quality inspection area specifically includes:

Step S501, performing cluster analysis on quality inspection links of the same category to obtain a quality inspection link set.

The cluster analysis refers to an analysis process of grouping a set of physical or abstract objects into a plurality of classes composed of similar objects, that is, collecting data for classification on the basis of similarity of information.

Specifically, the quality inspection links whose relevance is greater than a preset relevance threshold are selected, and the quality inspection link set is constructed from them.

It can be understood that the relevance refers to the similarity between quality inspection links; selecting the links whose relevance is greater than the preset relevance threshold is equivalent to finding the quality inspection links of the same category.
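Step S501 can be sketched as follows under a simplifying assumption: since the text states that the relevance test is equivalent to finding links of the same category, the sketch groups voice text segments by their predicted category label instead of computing a numeric relevance. The function name and the input layout, a list of (serial number, category) pairs, are hypothetical.

```python
from collections import defaultdict


def build_link_sets(segments):
    """Group segment serial numbers by predicted quality inspection link category.

    segments: iterable of (serial_number, category) pairs.
    Returns a dict mapping each category to its sorted serial numbers.
    """
    link_sets = defaultdict(list)
    for serial, category in segments:
        link_sets[category].append(serial)
    return {category: sorted(serials) for category, serials in link_sets.items()}
```

A different relevance measure with an explicit threshold could be substituted for the equality test without changing the shape of the result.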

Step S502, determining a starting quality inspection link and an ending quality inspection link in the quality inspection link set.

It should be noted that, in the prior art, the quality inspection link with the earliest serial number, that is, the first-appearing quality inspection link, is generally used as the starting quality inspection link of a quality inspection link set. However, because the preset CNN model can make classification errors, taking the first-appearing quality inspection link as the starting quality inspection link can introduce deviation.

Exemplarily, the specific process of determining the starting quality inspection link and the ending quality inspection link in the quality inspection link set may be:

Firstly, calculating the differences between the serial numbers of the voice text segments corresponding to the quality inspection links in the quality inspection link set.

Referring to fig. 10, in the method for correcting a voice quality inspection area according to the embodiment of the present application, an example diagram of a second corresponding relationship between a quality inspection link and a voice text segment is selected, and a second association quality inspection link set is selected, where the second association quality inspection link set includes a second association link with a serial number 131, a second association link with a serial number 136, a second association link with a serial number 137, and a second association link with a serial number 138.

And secondly, selecting a quality inspection link with a sequence number in the front sequence in two quality inspection links with a sequence number difference value smaller than a preset sequence number difference value threshold as an initial quality inspection link in a quality inspection link set.

Taking fig. 10 as an example, the preset serial number difference threshold is 2, the serial number difference between the association two link with serial number 131 and the association two link with serial number 136 is 5, which is greater than the preset serial number difference threshold, so that the association two link with serial number 131 cannot be used as the initial quality inspection link of the quality inspection link set, the serial number difference between the association two link with serial number 136 and the association two link with serial number 137 is 1, which is less than the preset serial number difference threshold, so that the association two link with serial number 136 can be used as the initial quality inspection link of the quality inspection link set.

And thirdly, determining a next initial quality inspection link of the next quality inspection link set according to the first step and the second step, and taking a quality inspection link before the next initial quality inspection link as an ending quality inspection link in the quality inspection link set in the second step.

Taking fig. 10 as an example, the association link with reference number 138 is the quality inspection end link.

It is understood that the ending quality inspection link of each quality inspection link set can be determined by the starting quality inspection link of the next quality inspection link set.
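As a non-limiting illustration, the first to third steps above may be sketched in Python as follows. The function names `find_start_link` and `find_end_link` and the parameter `max_gap` are hypothetical and do not appear in the embodiment. The sketch also assumes that, when no next starting link is known, the last link of the set is taken as the ending link.

```python
from typing import List, Optional


def find_start_link(serials: List[int], max_gap: int) -> Optional[int]:
    """Return the earlier serial number of the first adjacent pair of links
    whose serial number difference is below max_gap (steps one and two)."""
    ordered = sorted(serials)
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier < max_gap:
            return earlier
    return None  # no pair of links is close enough


def find_end_link(serials: List[int], start: int,
                  next_start: Optional[int] = None) -> Optional[int]:
    """Return the last link at or after `start` and before `next_start`
    (step three). Assumption: with no next set, the last link is used."""
    candidates = [
        s for s in sorted(serials)
        if s >= start and (next_start is None or s < next_start)
    ]
    return candidates[-1] if candidates else None


# Serial numbers and threshold taken from the fig. 10 example.
fig10_serials = [131, 136, 137, 138]
start = find_start_link(fig10_serials, max_gap=2)
end = find_end_link(fig10_serials, start)
print(start, end)
```

With the fig. 10 values, the sketch selects serial number 136 as the starting link and serial number 138 as the ending link, matching the example in the text.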

Step S503, taking the areas where the voice text segments respectively corresponding to the starting quality inspection link, the intermediate quality inspection links between the starting quality inspection link and the ending quality inspection link, and the ending quality inspection link are located as the quality inspection area.

It can be understood that the voice text segments covered by the starting quality inspection link, the intermediate quality inspection links between the starting quality inspection link and the ending quality inspection link, and the ending quality inspection link are the voice text segments covered by the quality inspection area.

Taking fig. 10 as an example, if the Association Two link with serial number 136 is the starting quality inspection link, the Association Two link with serial number 137 is the intermediate quality inspection link, and the Association Two link with serial number 138 is the ending quality inspection link, then the voice text segments corresponding to the starting quality inspection link, the intermediate quality inspection link, and the ending quality inspection link are respectively "this indicates that you are going to worship for time and need to sign … …", "the courier will sign the exclusive insurance … …", and "let you sign the insurance policy on three to four working days, and will take effect for liu yuan one number … … two-one-nine years".
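Step S503 may be illustrated by the following minimal sketch, which gathers the voice text segments from the starting link through the ending link. The function name `quality_inspection_area` and the placeholder segment texts are hypothetical. The sketch assumes that every segment whose serial number lies between the starting and ending serial numbers belongs to the area.

```python
from typing import Dict, List


def quality_inspection_area(segments: Dict[int, str],
                            start: int, end: int) -> List[str]:
    """Collect, in serial number order, the voice text segments covered by
    the starting, intermediate and ending quality inspection links."""
    return [segments[n] for n in sorted(segments) if start <= n <= end]


# Placeholder texts keyed by the serial numbers of the fig. 10 example.
segments = {
    131: "segment 131",
    136: "segment 136",
    137: "segment 137",
    138: "segment 138",
}
area = quality_inspection_area(segments, start=136, end=138)
print(area)
```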

Step S103, determining a quality inspection area to be corrected in the quality inspection area according to the preset keywords.

The preset keywords may be keywords extracted from a standard conversational script, and generally, the keywords extracted from the standard conversational script correspond to quality inspection items. Illustratively, the preset keywords of the embodiment of the present application include a product name, customer information, polite voice, customer response words, and the like.

Specifically, a preset keyword is searched in the quality inspection area, and the quality inspection area without the preset keyword is used as the quality inspection area to be corrected.
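The keyword search described above may be sketched as follows. This is an illustration only: the function name `areas_to_correct`, the area identifiers and the sample texts are hypothetical, and the sketch assumes that an area is flagged when none of the preset keywords occurs in it as an exact substring.

```python
from typing import Dict, List


def areas_to_correct(areas: Dict[str, str], keywords: List[str]) -> List[str]:
    """Return the identifiers of quality inspection areas in which no
    preset keyword is found by exact substring search."""
    return [
        area_id for area_id, text in areas.items()
        if not any(keyword in text for keyword in keywords)
    ]


areas = {
    "area_1": "the insurance policy will take effect next month",
    "area_2": "the courier will bring the insurence polisy for signing",
}
flagged = areas_to_correct(areas, keywords=["insurance", "policy"])
print(flagged)
```

In this sketch `area_2` is flagged because the keywords appear only in misrecognized form, which is the situation the subsequent correction step is intended to handle.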

Step S104, correcting the voice text segment of the quality inspection area to be corrected according to the complete text segment corresponding to the preset keyword.

The complete text segment refers to the complete text segment corresponding to a preset keyword in the standard conversational script.

It can be understood that, when the preset keywords are searched for in the quality inspection area, a search error, or a conversion error introduced when the voice text segment was converted from audio to text, may cause some quality inspection areas that substantially contain the preset keywords to be determined as quality inspection areas to be corrected.

Exemplarily, referring to fig. 6, which shows a specific flowchart of step S104 in fig. 1 of the method for correcting a voice quality inspection region provided in the embodiment of the present application, correcting the voice text segment of the quality inspection region to be corrected according to the complete text segment corresponding to the preset keyword includes:

Step S601, screening out a correctable quality inspection area in the quality inspection area to be corrected according to the complete text segment corresponding to the preset keyword.

Specifically, the specific process of screening out the correctable quality inspection area in the quality inspection area to be corrected according to the complete text segment corresponding to the preset keyword may be:

First, a window is constructed according to the length of the complete text segment.

Second, sliding window matching is performed on the voice text segment of the quality inspection area to be corrected, using the window and a preset step length.

Third, a correctable quality inspection area in the quality inspection area to be corrected is determined according to the similarity between the portion of the voice text segment covered by the window and the complete text segment.

It can be understood that, if the similarity between the portion of the voice text segment of the quality inspection area to be corrected covered by the window and the complete text segment reaches a similarity threshold, for example 90 percent, it may be determined that the preset keyword substantially exists in that portion of the voice text segment.
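The three steps above may be sketched as follows. The embodiment does not fix a particular similarity measure at this point, so this sketch substitutes the ratio returned by `difflib.SequenceMatcher` from the Python standard library as a stand-in. The function names `sliding_window_matches` and `is_correctable` and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def sliding_window_matches(segment: str, complete: str, step: int = 1,
                           threshold: float = 0.9
                           ) -> List[Tuple[int, str, float]]:
    """Slide a window as long as the complete text segment over the voice
    text segment and keep windows whose similarity reaches the threshold."""
    width = len(complete)                      # step one: window length
    last_start = max(len(segment) - width, 0)
    hits = []
    for start in range(0, last_start + 1, step):   # step two: sliding match
        window = segment[start:start + width]
        # Stand-in similarity in [0, 1]; not the measure of the embodiment.
        score = SequenceMatcher(None, window, complete).ratio()
        if score >= threshold:                     # step three: screening
            hits.append((start, window, score))
    return hits


def is_correctable(segment: str, complete: str, step: int = 1,
                   threshold: float = 0.9) -> bool:
    """An area is treated as correctable if at least one window matches."""
    return bool(sliding_window_matches(segment, complete, step, threshold))


complete_text = "this policy takes effect"
recognized = "well this polisy takes effect on monday"
print(is_correctable(recognized, complete_text))
```

A step length of 1 examines every window position; a larger step length reduces the number of comparisons but may skip the best-aligned window.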

Illustratively, a first edit distance value is calculated for the portion of the voice text segment of the quality inspection area to be corrected covered by the window, a second edit distance value is calculated for the complete text segment, and the difference between the first edit distance value and the second edit distance value is used as the similarity between that portion of the voice text segment and the complete text segment. Alternatively, a first character string arrangement value is calculated for the portion of the voice text segment covered by the window, a second character string arrangement value is calculated for the complete text segment, and the difference between the two values is used as the similarity. Alternatively, a first word vector matrix value is calculated for the portion of the voice text segment covered by the window, a second word vector matrix value is calculated for the complete text segment, and the difference between the two values is used as the similarity.

Step S602, identifying the keywords to be corrected in the correctable quality inspection area, and correcting the keywords to be corrected.

Specifically, the keywords to be corrected, which are substantially the same as the preset keywords, can be found by means of the edit distance and are then corrected to the preset keywords.
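One possible way to locate and correct a keyword by edit distance is sketched below, using the Levenshtein distance. The function names `edit_distance` and `correct_keyword` and the parameter `max_distance` are hypothetical. The sketch assumes a fixed window equal to the keyword length, so it mainly handles substituted characters rather than inserted or deleted ones.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def correct_keyword(text: str, keyword: str, max_distance: int = 2) -> str:
    """Replace the window closest to the preset keyword, provided its edit
    distance does not exceed max_distance (hypothetical tolerance)."""
    width = len(keyword)
    best_start, best_dist = None, max_distance + 1
    for start in range(0, max(len(text) - width, 0) + 1):
        dist = edit_distance(text[start:start + width], keyword)
        if dist < best_dist:
            best_start, best_dist = start, dist
    if best_start is None:
        return text  # nothing close enough to the preset keyword
    return text[:best_start] + keyword + text[best_start + width:]


print(correct_keyword("please sign the insurence policy today", "insurance"))
```

The tolerance `max_distance` trades recall against the risk of overwriting an unrelated word; a small value is the conservative choice.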

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

Fig. 7 is a block diagram of a structure of a device for correcting a voice quality inspection region according to an embodiment of the present application, which corresponds to the method for correcting a voice quality inspection region according to the foregoing embodiment.

Referring to fig. 7, the correction device includes:

an obtaining module 71, configured to obtain a to-be-processed speech text; wherein the voice text to be processed comprises at least one voice text segment;

the positioning module 72 is configured to determine a quality inspection area of the to-be-processed voice text according to a quality inspection link corresponding to the voice text segment;

the dividing module 73 is used for determining a quality inspection area to be corrected in the quality inspection area according to a preset keyword;

and the correcting module 74 is configured to correct the voice text segment of the quality inspection region to be corrected according to the complete text segment corresponding to the preset keyword.

In one possible implementation, the correction module further includes:

In one possible implementation, the screening submodule further includes:

In one possible implementation, the correction module further includes:

the audio acquisition module is used for acquiring voice audio to be processed;

In one possible implementation, the conversion module includes:

In one possible implementation, the positioning module includes:

In one possible implementation, the determining submodule further includes:

It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.

Fig. 8 is a schematic structural diagram of a correction device according to an embodiment of the present application. As shown in fig. 8, the correction device 8 of this embodiment includes: at least one processor 80, a memory 81, and a computer program 82 stored in the memory 81 and executable on the at least one processor 80, the processor 80 implementing the steps in the above-mentioned embodiment of the method for correcting the voice quality inspection area when executing the computer program 82.

The correction device 8 may be a terminal device or a computing device such as a server.

The processor 80 may be a Central Processing Unit (CPU), or may be another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory 81 may in some embodiments be an internal storage unit of the correction device 8, such as a hard disk or a memory of the correction device 8.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims

1. A method for correcting a voice quality inspection area is characterized by comprising the following steps:

determining a quality inspection area of the voice text to be processed according to a quality inspection link corresponding to the voice text fragment, wherein the quality inspection area comprises: inputting the voice text segment of the voice text to be processed into a preset cnn model to obtain a quality inspection link corresponding to the voice text segment, determining the quality inspection links of the same type, and taking the areas of the voice text segment corresponding to the quality inspection links of the same type as a quality inspection area;

2. The method for correcting the voice quality inspection region according to claim 1, wherein the step of correcting the voice text segment of the quality inspection region to be corrected according to the complete text segment corresponding to the preset keyword comprises the steps of:

screening a correctable quality inspection area in the quality inspection area to be corrected according to the similarity between the complete text segment and the voice text segment of the quality inspection area to be corrected, wherein the complete text segment is the complete text segment corresponding to a preset keyword in a standard conversational script;

3. The method for correcting the voice quality inspection area according to claim 2, wherein the step of screening out the correctable quality inspection area in the quality inspection area to be corrected according to the similarity between the complete text segment and the voice text segment of the quality inspection area to be corrected comprises:

constructing a window according to the length of the complete text segment;

4. The method for correcting the voice quality inspection area according to claim 1, wherein before the obtaining the voice text to be processed, the method further comprises:

acquiring voice audio to be processed;

5. The method for correcting the voice quality inspection area according to claim 4, wherein the converting the voice audio to be processed into the voice text to be processed comprises:

6. The method for correcting the voice quality inspection area according to claim 1, wherein the determining the quality inspection links of the same category, and using the areas where the voice text segments respectively corresponding to the quality inspection links of the same category are located as a quality inspection area, comprises:

7. A device for correcting a voice quality inspection region, comprising:

the correction module is used for correcting the voice text segment of the quality inspection area to be corrected according to the complete text segment corresponding to the preset keyword;

the positioning module includes:

and the determining submodule is used for determining the quality inspection links of the same category and taking the areas where the voice text segments respectively corresponding to the quality inspection links of the same category are located as a quality inspection area.

8. A correction device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 6 when executing the computer program.

9. A storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 6.