CN112468658A

CN112468658A - Voice quality detection method and device, computer equipment and storage medium

Info

Publication number: CN112468658A
Application number: CN202011310497.9A
Authority: CN
Inventors: 汪淼
Original assignee: Ping An Puhui Enterprise Management Co Ltd
Current assignee: Ping An Puhui Enterprise Management Co Ltd
Priority date: 2020-11-20
Filing date: 2020-11-20
Publication date: 2021-03-09
Anticipated expiration: 2040-11-20
Also published as: CN112468658B

Abstract

The embodiment of the application belongs to the field of artificial intelligence and relates to a voice quality detection method which comprises the steps of obtaining text data of a voice file to be detected and preset violation corpora, and calculating semantic similarity between the text data and the violation corpora based on a preset discrimination model; taking the text data with the semantic similarity larger than or equal to a preset threshold as suspicious data, acquiring a node information field corresponding to the current suspicious data, and determining a quality inspection point corresponding to the suspicious data according to the node information field; and acquiring a configuration dictionary corresponding to the quality inspection point, and detecting the suspicious data according to key values configured in the configuration dictionary to obtain a quality inspection result. The application also provides a voice quality detection device, computer equipment and a storage medium. In addition, the application also relates to a block chain technology, and the quality detection result can be stored in the block chain. The application realizes the high-efficiency detection of the voice quality.

Description

Voice quality detection method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for detecting speech quality, a computer device, and a storage medium.

Background

The voice quality detection is an important supervision system applied to each data transmission platform, and transmission of illegal data can be reduced by performing quality detection on voice, so that the voice data is safer and more reliable in the transmission process.

Traditional voice quality detection often requires substantial manpower, and whether voice data violates the rules is judged mainly by manually listening to recordings. However, there are many quality inspection points, each of which must be inspected, and quality inspection items of the same type often carry complex and detailed determination rules. As a result, detecting voice quality usually requires a large amount of manpower and material resources to memorize the rules and analyze the voice text, which ultimately makes voice quality detection inefficient.

Disclosure of Invention

An embodiment of the present application provides a method and an apparatus for detecting voice quality, a computer device, and a storage medium, so as to solve the technical problem of low efficiency of voice quality detection.

In order to solve the above technical problem, an embodiment of the present application provides a voice quality detection method, which adopts the following technical solutions:

acquiring text data of a voice file to be detected and a preset violation corpus, and calculating to obtain semantic similarity between the text data and the violation corpus based on a preset discrimination model;

taking the text data with the semantic similarity larger than or equal to a preset threshold as suspicious data, acquiring a node information field corresponding to the current suspicious data, and determining a quality inspection point corresponding to the suspicious data according to the node information field;

and acquiring a configuration dictionary corresponding to the quality inspection point, and detecting the suspicious data according to key values configured in the configuration dictionary to obtain a quality inspection result.

Further, the preset discriminant model includes a coding feature layer, a bidirectional long short-term memory network layer, a pooling layer, a dropout layer and a fully connected layer, and the step of calculating the semantic similarity between the text data and the violation corpus based on the preset discriminant model specifically includes:

performing mapping character coding, attention mask coding and position coding on the text data and the violation corpus to obtain a corresponding first vector, second vector and third vector;

and superposing the first vector, the second vector and the third vector to obtain an input vector, inputting the input vector to the coding feature layer, and outputting the semantic similarity between the text data and the violation corpus through the bidirectional long short-term memory network layer, the pooling layer, the dropout layer and the fully connected layer.

Further, the key values include a first category key value and a second category key value, and the step of detecting the suspicious data according to the key values configured in the configuration dictionary to obtain a quality inspection result specifically includes:

acquiring field information of the suspicious data, and determining that the suspicious data is compliance information when the field information successfully matches a detection field under the first category key value and matches none of the detection fields under the second category key value;

and determining that the suspicious data is violation information when the field information successfully matches a detection field under the second category key value.

Further, after the step of detecting the suspicious data according to the key values configured in the configuration dictionary to obtain a quality inspection result, the method further includes:

when the suspicious data has a plurality of quality inspection points of different types, acquiring the quality inspection results of all the quality inspection points;

and calculating the quality inspection score of the suspicious data according to the quality inspection result, and generating a quality inspection evaluation table of the suspicious data according to the quality inspection score and the quality inspection result.

Further, before the step of calculating the semantic similarity between the text data and the violation corpus based on a preset discriminant model, the method further includes:

acquiring a public text data set and a corpus set from a violation scene corpus;

and calculating a loss function of the basic discriminant model according to the public text data set and the corpus set, and determining the basic discriminant model as the preset discriminant model when the loss function is converged.

Further, the step of calculating the loss function of the basic discriminant model according to the public text data set and the corpus set specifically includes:

inputting the public text data set and the corpus set into the basic discrimination model, and calculating to obtain training semantic similarity;

and obtaining the standard semantic similarity of the public text data set and the corpus, and calculating to obtain a loss function of the basic discriminant model according to the training semantic similarity and the standard semantic similarity.

Further, before the step of training the basic discriminant model according to the public text data set and the corpus set, the method further includes:

acquiring a preset migration model, and taking a sequence output layer of the preset migration model as an initial coding feature layer of the basic discrimination model;

and connecting the initial coding feature layer, the initial bidirectional long short-term memory network layer, the initial pooling layer, the initial dropout layer and the initial fully connected layer to obtain a basic network of the basic discriminant model.

In order to solve the above technical problem, an embodiment of the present application further provides a voice quality detection apparatus, which adopts the following technical solutions:

the calculation module is used for acquiring text data of a voice file to be detected and a preset violation corpus, and calculating semantic similarity between the text data and the violation corpus based on a preset discrimination model;

the confirming module is used for taking the text data with the semantic similarity larger than or equal to a preset threshold as suspicious data, acquiring a node information field corresponding to the current suspicious data, and determining a quality inspection point corresponding to the suspicious data according to the node information field;

and the detection module is used for acquiring the configuration dictionary corresponding to the quality inspection point and detecting the suspicious data according to the key values configured in the configuration dictionary to obtain a quality inspection result.

In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which includes a memory and a processor, and computer readable instructions stored in the memory and executable on the processor, where the processor implements the steps of the voice quality detection method when executing the computer readable instructions.

In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, where computer-readable instructions are stored, and when executed by a processor, the computer-readable instructions implement the steps of the voice quality detection method described above.

According to the voice quality detection method, text data of a voice file to be detected and a preset violation corpus are obtained, and the semantic similarity between the text data and the violation corpus is calculated based on a preset discrimination model, so that the semantic similarity of the text data can be judged automatically. Text data whose semantic similarity is greater than or equal to a preset threshold is then taken as suspicious data, a node information field corresponding to the current suspicious data is acquired, and a quality inspection point corresponding to the suspicious data is determined according to the node information field; the suspicious data is further detected through the quality inspection point, so that errors in data quality detection are avoided and accurate detection of voice quality is ensured. Finally, a configuration dictionary corresponding to the quality inspection point is acquired, and the suspicious data is detected according to the key values configured in the configuration dictionary to obtain a quality inspection result. This realizes automatic, intelligent detection of voice data quality, improves the efficiency of voice data quality inspection and the accuracy of voice data discrimination, avoids violating voice data, and standardizes the use of voice data in intelligent customer service.

Drawings

In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.

FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 is a flow diagram of one embodiment of a voice quality detection method according to the present application;

FIG. 3 is a schematic block diagram of one embodiment of a speech quality detection apparatus according to the present application;

FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.

Reference numerals: voice quality detection apparatus 300, calculation module 301, confirmation module 302, detection module 303.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III, MPEG compression standard audio layer 3), MP4 players (Moving Picture Experts Group Audio Layer IV, MPEG compression standard audio layer 4), laptop portable computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.

It should be noted that the voice quality detection method provided by the embodiment of the present application is generally executed by a server/terminal device, and accordingly, the voice quality detection apparatus is generally disposed in the server/terminal device.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continuing reference to FIG. 2, a flow diagram of one embodiment of a method of speech quality detection according to the present application is shown. The voice quality detection method comprises the following steps:

Step S201, acquiring text data of a voice file to be detected and a preset violation corpus, and calculating to obtain semantic similarity between the text data and the violation corpus based on a preset discrimination model;

In this embodiment, the text data of the voice file to be detected and the violation corpus are obtained; that is, after the voice file to be detected is obtained, it is converted into text data, and the prestored violation corpus is obtained at the same time. The semantic similarity between the text data and the violation corpus is determined based on a preset discrimination model, which is a preset semantic similarity discrimination model. The preset discrimination model adopts the BERT-base model as a preset migration model and uses the sequence output layer of the preset migration model as the coding feature layer of the current preset discrimination model, where BERT-base is the base model of BERT (Bidirectional Encoder Representations from Transformers). When the text data and the violation corpus are obtained, they are encoded to obtain an input vector; the input vector is input to the coding feature layer of the preset discrimination model, and the semantic similarity between the current text data and the violation corpus is then output through the bidirectional long short-term memory network layer connected to the coding feature layer, the average pooling layer, the maximum pooling layer, the dropout layer and the fully connected layer.

Step S202, taking the text data with the semantic similarity larger than or equal to a preset threshold as suspicious data, acquiring a node information field corresponding to the current suspicious data, and determining a quality inspection point corresponding to the suspicious data according to the node information field;

In this embodiment, once the semantic similarity between the current text data and the violation corpus is obtained, the text data whose semantic similarity is greater than or equal to the preset threshold is taken as suspicious data. A node information field of the suspicious data is then acquired, where the node information field is the field information of the process node at which the current suspicious data is located, and the quality inspection point associated with the current suspicious data is determined according to the node information field. Different node information fields may correspond to different quality inspection points, and different quality inspection points correspond to different quality inspection conditions. When the node information field corresponding to the suspicious data is obtained, the quality inspection point associated with that node information field is obtained, and the suspicious data can be detected according to the quality inspection point.
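Step S202 can be illustrated with a minimal sketch. The threshold value, the node information field names and the quality inspection point identifiers below are hypothetical placeholders; the application does not specify them.

```python
from typing import Dict, List, Optional

# Hypothetical threshold and node-field-to-inspection-point table (not from the application).
SIMILARITY_THRESHOLD = 0.8
NODE_TO_INSPECTION_POINT: Dict[str, str] = {
    "identity_check": "qp_identity",
    "fee_disclosure": "qp_fees",
}


def select_suspicious(segments: List[dict], threshold: float = SIMILARITY_THRESHOLD) -> List[dict]:
    """Keep text segments whose similarity is greater than or equal to the threshold."""
    return [s for s in segments if s["similarity"] >= threshold]


def inspection_point_for(segment: dict) -> Optional[str]:
    """Look up the quality inspection point associated with the segment's node information field."""
    return NODE_TO_INSPECTION_POINT.get(segment["node_field"])


segments = [
    {"text": "sample utterance A", "similarity": 0.91, "node_field": "fee_disclosure"},
    {"text": "sample utterance B", "similarity": 0.42, "node_field": "identity_check"},
    {"text": "sample utterance C", "similarity": 0.80, "node_field": "identity_check"},
]
suspicious = select_suspicious(segments)
points = [inspection_point_for(s) for s in suspicious]
```

Segment C sits exactly on the threshold and is retained, which reflects the "greater than or equal to" condition in the step.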

Step S203, obtaining a configuration dictionary corresponding to the quality inspection point, and detecting the suspicious data according to the key values configured in the configuration dictionary to obtain a quality inspection result.

In this embodiment, when the quality inspection point corresponding to the suspicious data is obtained, the configuration dictionary corresponding to that quality inspection point is acquired; each quality inspection point is configured with its own configuration dictionary. The suspicious data is detected according to the configuration dictionary to obtain a quality inspection result. Specifically, each quality inspection point is configured with key values, such as "must_no" and "must_have", and key values of different categories contain different detection fields. The detection fields under each key value are matched against the field information of the suspicious data to obtain the quality inspection result of the current suspicious data under the quality inspection point, where the quality inspection result reflects whether the current suspicious data is violation information. When the field information of the suspicious data successfully matches a detection field under a key value of the forbidden-match category, the suspicious data is determined to be violation information; and if the quality inspection result of the text data of the current voice file to be detected is violation information, the voice file to be detected is determined to be violation information.

It is emphasized that the quality inspection result can also be stored in a node of a blockchain in order to further ensure the privacy and security of the quality inspection result.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains information on a batch of network transactions that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
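The idea of data blocks linked by a cryptographic method can be sketched as a simple hash chain. This is an illustration only, assuming SHA-256 and a JSON payload; it has no consensus or networking, and the record fields are hypothetical rather than taken from the application.

```python
import hashlib
import json
from typing import List

GENESIS_HASH = "0" * 64  # placeholder hash that precedes the first block


def make_block(payload: dict, prev_hash: str) -> dict:
    """Create a block whose hash covers both its payload and the previous block's hash."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash}, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {"payload": payload, "prev_hash": prev_hash, "hash": digest}


def verify_chain(chain: List[dict]) -> bool:
    """Recompute every hash and check that each block points at its predecessor."""
    prev = GENESIS_HASH
    for block in chain:
        expected = make_block(block["payload"], block["prev_hash"])["hash"]
        if block["prev_hash"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True


# Hypothetical quality inspection results stored as block payloads.
first = make_block({"file": "call_001", "result": "violation"}, GENESIS_HASH)
second = make_block({"file": "call_002", "result": "compliance"}, first["hash"])
chain = [first, second]
```

Changing a stored result after the fact alters the recomputed hash, so `verify_chain` returns `False` for a tampered chain.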

This embodiment realizes automatic, intelligent detection of voice data quality, improves the efficiency of voice data quality inspection and the accuracy of voice data discrimination, avoids violating voice data, and standardizes the use of voice data in intelligent customer service.

In some embodiments of the present application, the preset discriminant model includes a coding feature layer, a bidirectional long short-term memory network layer, a pooling layer, a dropout layer, and a fully connected layer, and the calculating to obtain the semantic similarity between the text data and the violation corpus based on the preset discriminant model includes:

In this embodiment, when the text data is obtained, mapping character coding, attention mask coding and position coding are performed on both the text data and the violation corpus to obtain the corresponding first, second and third vectors. Mapping character coding converts the words in the text data and the violation corpus into fixed-dimension vector representations, yielding the first vector. Attention mask coding encodes the text data and the violation corpus using a vocabulary of only two entries, so as to distinguish words in the text data from words in the violation corpus, yielding the second vector. Position coding encodes the positions of the words in the text data and the violation corpus, yielding the third vector. The first vector, the second vector and the third vector are superposed to obtain an input vector. The input vector is input to the coding feature layer of the preset discrimination model; with the output of each layer serving as the input of the next, it passes in sequence through the coding feature layer, the bidirectional long short-term memory network layer, the pooling layer, the dropout layer and the fully connected layer, which outputs the semantic similarity between the text data and the violation corpus.
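The superposition of the three vectors can be sketched as follows, under the assumption that "superposing" means element-wise addition of equally shaped per-token vectors, as in BERT-style input embeddings. The toy values and dimensions are made up for illustration.

```python
from typing import List

Matrix = List[List[float]]  # shape: [tokens][dimensions]


def superpose(first: Matrix, second: Matrix, third: Matrix) -> Matrix:
    """Element-wise sum of three equally shaped per-token vector sequences."""
    return [
        [a + b + c for a, b, c in zip(row1, row2, row3)]
        for row1, row2, row3 in zip(first, second, third)
    ]


# Toy example: two tokens, three dimensions each.
first_vector = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]        # mapping character coding
second_vector = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]       # two-valued coding: text vs. violation corpus
third_vector = [[0.5, 0.5, 0.5], [0.25, 0.25, 0.25]]     # position coding
input_vector = superpose(first_vector, second_vector, third_vector)
```

In this sketch the second vector takes one of only two values per token, which is how it separates tokens of the text data from tokens of the violation corpus.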

According to the embodiment, the semantic similarity between the text data and the illegal corpus is judged through the preset judgment model, so that the text data is accurately judged, and the efficiency and the accuracy of voice data quality inspection are further improved.

In some embodiments of the present application, the key values include a first category key value and a second category key value, and detecting the suspicious data according to the key values configured in the configuration dictionary to obtain a quality inspection result includes:

In this embodiment, the key values in the configuration dictionary are divided into a first category key value and a second category key value: the detection fields under the first category key value are compliant fields, and the detection fields under the second category key value are violation fields. The field information of the suspicious data is matched against the detection fields under both key values. If the field information successfully matches any detection field under the first category key value and matches none of the detection fields under the second category key value, the suspicious data is determined to be compliance information. If the field information successfully matches any detection field under the second category key value, the suspicious data is determined to be violation information. If the field information matches no detection field under either the first category key value or the second category key value, the suspicious data is determined to be invalid information.
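The three-way decision described above can be sketched as follows. The key names "must_have" and "must_no" appear in the description; treating "must_have" as the first category and "must_no" as the second, matching by substring, and the example detection fields are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical configuration dictionary for a single quality inspection point.
CONFIG: Dict[str, List[str]] = {
    "must_have": ["annual interest rate", "repayment date"],  # assumed first category: compliant fields
    "must_no": ["guaranteed approval", "no risk"],            # assumed second category: violation fields
}


def classify(field_info: str, config: Dict[str, List[str]]) -> str:
    """Return 'violation', 'compliance' or 'invalid' for one piece of suspicious data."""
    # Any hit under the second category decides the outcome on its own.
    if any(field in field_info for field in config.get("must_no", [])):
        return "violation"
    # A first-category hit counts only when no second-category field matched.
    if any(field in field_info for field in config.get("must_have", [])):
        return "compliance"
    # Nothing matched under either key value.
    return "invalid"


compliant = classify("the annual interest rate is stated in the contract", CONFIG)
violating = classify("guaranteed approval, and the annual interest rate is low", CONFIG)
unmatched = classify("good morning", CONFIG)
```

Checking the second category first keeps the rule simple: a text that hits both categories is still a violation, which matches the condition that compliance requires no second-category match.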

According to the embodiment, by performing matching judgment on the suspicious data, whether the suspicious data is violation information or not is further accurately judged, the accuracy of voice data quality inspection is ensured, and possible errors of voice data quality inspection results are avoided.

In some embodiments of the present application, after detecting the suspicious data according to the key values configured in the configuration dictionary to obtain a quality inspection result, the method further includes:

In this embodiment, each piece of suspicious data may have a plurality of quality inspection points of different types at the same time. When it does, the quality inspection results of all the quality inspection points and the rating scores corresponding to those results are obtained. The rating scores and the quality inspection results are related by a mapping relation table, from which the rating score corresponding to the current quality inspection result can be obtained; different quality inspection results may correspond to different rating scores. Once the rating scores are obtained, the rating score of each quality inspection point is weighted by the weight of the corresponding quality inspection point category, and the weighted scores are summed to obtain the quality inspection score of the suspicious data. A quality inspection evaluation table of the suspicious data is then generated according to the quality inspection score and the quality inspection results.
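The weighted scoring can be illustrated with a short sketch. The rating scores, the weights and the quality inspection point names are hypothetical, since the application does not give the contents of the mapping relation table or the weights.

```python
from typing import Dict

# Hypothetical mapping relation table: quality inspection result -> rating score.
RATING_SCORES: Dict[str, float] = {"compliance": 100.0, "invalid": 50.0, "violation": 0.0}
# Hypothetical weights per quality inspection point category.
POINT_WEIGHTS: Dict[str, float] = {"qp_identity": 0.25, "qp_fees": 0.75}


def quality_score(results: Dict[str, str]) -> float:
    """Weighted sum of the rating scores over all quality inspection points."""
    return sum(POINT_WEIGHTS[point] * RATING_SCORES[result] for point, result in results.items())


def evaluation_table(results: Dict[str, str]) -> dict:
    """Collect per-point rows and the overall score into one evaluation table."""
    rows = [
        {"point": point, "result": result,
         "rating": RATING_SCORES[result], "weight": POINT_WEIGHTS[point]}
        for point, result in results.items()
    ]
    return {"rows": rows, "score": quality_score(results)}


table = evaluation_table({"qp_identity": "compliance", "qp_fees": "violation"})
```

With these placeholder values the score is 0.25 × 100 + 0.75 × 0 = 25.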

In this embodiment, a quality inspection evaluation table is generated from the quality inspection results, so that when voice data has a plurality of quality inspection points its overall quality inspection status can be obtained accurately, and the quality inspection status of the current voice data can be understood quickly from the quality inspection evaluation table.

In some embodiments of the present application, before the calculating the semantic similarity between the text data and the violation corpus based on the preset discriminant model, the method further includes:

acquiring a public text data set and a corpus set from a violation scene corpus;

In this embodiment, before the text data is detected through the preset discriminant model, a public text data set and a corpus set from the violation scene corpus need to be obtained, and a basic discriminant model is trained on the public text data set and the corpus set to obtain the preset discriminant model. Specifically, the public text data set is a collected voice text data set, and the corpus set from the violation scene corpus is a collected violation corpus data set. The semantic similarity between the data in the public text data set and the data in the corpus set is calculated as a cosine similarity and taken as the standard semantic similarity. The basic discrimination model is then trained against the standard semantic similarity, that is, its parameters are adjusted; when the loss function calculated for the basic discrimination model after parameter adjustment converges, training is determined to be complete, and the trained basic discrimination model is the preset discrimination model.

In the embodiment, the basic discrimination model is trained in advance, so that the preset discrimination model obtained through training can quickly and accurately judge the text data of the voice data, and further, the automatic detection of the voice data quality is realized.

In some embodiments of the present application, the calculating the loss function of the basic discriminant model according to the public text data set and the corpus set includes:

In this embodiment, when the public text data set and the corpus set are obtained, the standard semantic similarity between the data in the public text data set and the data in the corpus set is calculated; it can be obtained as the cosine similarity between the word vectors of the data in the public text data set and the word vectors of the data in the corpus set. Meanwhile, the training semantic similarity between the data in the public text data set and the data in the corpus set is calculated based on the basic discrimination model, and the loss function of the basic discrimination model is calculated from the training semantic similarity and the standard semantic similarity. When the loss function converges, training of the basic discrimination model is determined to be complete, and the trained basic discrimination model is taken as the preset discrimination model.
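A minimal sketch of the two quantities involved is given below. The cosine similarity follows the description; the application does not name the form of the loss function, so mean squared error is used here purely as an assumption, and the word vectors and model outputs are toy values.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two word vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mse_loss(training: Sequence[float], standard: Sequence[float]) -> float:
    """Mean squared error between training and standard similarities (assumed loss form)."""
    return sum((t - s) ** 2 for t, s in zip(training, standard)) / len(training)


# Standard semantic similarity for two toy (public text, violation corpus) vector pairs.
standard_similarity: List[float] = [
    cosine_similarity([1.0, 0.0], [1.0, 0.0]),  # identical direction
    cosine_similarity([1.0, 0.0], [0.0, 1.0]),  # orthogonal
]
# Hypothetical similarities produced by an untrained basic discrimination model.
training_similarity = [0.5, 0.5]
loss = mse_loss(training_similarity, standard_similarity)
```

Training would repeat this calculation after each parameter update and stop once the loss no longer decreases meaningfully.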

According to the embodiment, the processing precision of the preset discrimination model is improved by training the basic discrimination model, so that the text data can be accurately judged through the preset discrimination model.

In some embodiments of the present application, before the training of the basic discriminant model according to the public text data set and the corpus, the method further includes:

In this embodiment, the basic network of the basic discriminant model is composed of an initial coding feature layer, an initial bidirectional long short-term memory network layer, an initial pooling layer, an initial dropout layer, and an initial fully connected layer. A preset migration model is acquired, where the preset migration model adopts the open-source BERT-base model, and BERT-base is a network structure with 12 attention heads and 12 hidden layers. The sequence output layer of the preset migration model is taken as the initial coding feature layer of the basic discrimination model, and when text data is obtained, the initial coding feature layer serves as the input layer for the text data. The initial coding feature layer, the initial bidirectional long short-term memory network layer, the initial pooling layer, the initial dropout layer and the initial fully connected layer are connected in sequence to obtain the basic network of the basic discrimination model. The initial pooling layer comprises an average pooling layer and a maximum pooling layer, whose outputs are concatenated to form the initial pooling layer; the initial dropout layer adopts a dropout algorithm, which can prevent the model from overfitting. In addition, the initial coding feature layer, the initial bidirectional long short-term memory network layer, the initial pooling layer, the initial dropout layer and the initial fully connected layer of the basic discriminant model have the same structure as the coding feature layer, the bidirectional long short-term memory network layer, the pooling layer, the dropout layer and the fully connected layer in the preset discriminant model obtained after training, but have different parameters.
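The pooling and dropout stages can be sketched in plain Python. The concatenation of average and maximum pooling follows the description; the inverted-dropout scaling, the dropout rate and the toy sequence are assumptions, and the BERT and bidirectional LSTM layers that would produce the sequence are omitted.

```python
import random
from typing import List

Sequence2D = List[List[float]]  # shape: [tokens][hidden dimensions]


def mean_pool(seq: Sequence2D) -> List[float]:
    """Average each hidden dimension over all tokens."""
    return [sum(tok[d] for tok in seq) / len(seq) for d in range(len(seq[0]))]


def max_pool(seq: Sequence2D) -> List[float]:
    """Take the maximum of each hidden dimension over all tokens."""
    return [max(tok[d] for tok in seq) for d in range(len(seq[0]))]


def pooled_features(seq: Sequence2D) -> List[float]:
    """Concatenate the average-pooled and max-pooled vectors."""
    return mean_pool(seq) + max_pool(seq)


def dropout(vec: List[float], rate: float, rng: random.Random, training: bool = True) -> List[float]:
    """Inverted dropout: zero each element with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(vec)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


sequence = [[1.0, 4.0], [3.0, 2.0]]  # toy stand-in for the BiLSTM output
features = pooled_features(sequence)
dropped = dropout(features, rate=0.5, rng=random.Random(0))
```

Dropout is active only during training; at inference time the features pass through unchanged, which is why the function takes a `training` flag.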

In this embodiment, the sequence output layer of the preset migration model is extracted and used as the coding feature layer of the basic discriminant model, which completes the structure of the model and improves its generalization capability and data processing capability.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by computer readable instructions that direct the relevant hardware. The instructions can be stored in a computer readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily performed at the same time and may be performed at different times; nor are they necessarily performed in sequence, as they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a speech quality detection apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.

As shown in fig. 3, the voice quality detection apparatus 300 according to the present embodiment includes: a calculation module 301, a confirmation module 302 and a detection module 303. Wherein:

the calculation module 301 is configured to obtain text data of a voice file to be detected and a preset violation corpus, and calculate a semantic similarity between the text data and the violation corpus based on a preset discrimination model;

wherein, the preset discriminant model includes a coding feature layer, a bidirectional long-short term memory network layer, a pooling layer, a descent fit layer and a full connection layer, and the calculation module 301 includes:

the coding unit is used for performing character coding, attention mask coding and position coding mappings on the text data and the violation corpus to obtain a corresponding first vector, second vector and third vector;

and the judging unit is used for superposing the first vector, the second vector and the third vector to obtain an input vector, inputting the input vector to the coding feature layer, and passing it through the bidirectional long-short term memory network layer, the pooling layer, the descent fit layer and the full connection layer to output the semantic similarity between the text data and the violation corpus.

the confirmation module 302 is configured to take the text data with the semantic similarity greater than or equal to a preset threshold as suspicious data, obtain a node information field corresponding to the current suspicious data, and determine a quality inspection point corresponding to the suspicious data according to the node information field;

The detection module 303 is configured to obtain a configuration dictionary corresponding to the quality inspection point, and detect the suspicious data according to a key value configured in the configuration dictionary to obtain a quality inspection result.

Wherein the key values include a first category key value and a second category key value, and the detection module 303 includes:

the matching unit is used for acquiring field information of the suspicious data, and determining that the suspicious data is compliance information when the field information successfully matches a detection field under the first category key value and matches none of the detection fields under the second category key value;

and the first confirmation unit is used for determining that the suspicious data is violation information when the field information successfully matches a detection field under the second category key value.
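The threshold screening of the confirmation module and the key-value detection of the detection module can be sketched as follows. This is only an illustration under stated assumptions: "matching" is modeled as substring containment, the dictionary keys first_category and second_category and all phrases in CONFIG are invented for the example, and the "undetermined" result for text that matches neither category is an assumption, since this section does not describe that case.

```python
from typing import Dict, List

# Hypothetical configuration dictionary for one quality inspection point.
# Key names and detection fields are invented for illustration only.
CONFIG: Dict[str, List[str]] = {
    "first_category": ["recorded for quality", "confirm your identity"],
    "second_category": ["guaranteed approval", "no need to repay"],
}


def is_suspicious(similarity: float, threshold: float) -> bool:
    """Text is suspicious when its semantic similarity reaches the threshold."""
    return similarity >= threshold


def inspect(field_info: str, config: Dict[str, List[str]]) -> str:
    """Classify suspicious text against the two categories of key values."""
    hits_first = any(field in field_info for field in config["first_category"])
    hits_second = any(field in field_info for field in config["second_category"])
    if hits_second:
        # Any match under the second category key value marks a violation.
        return "violation"
    if hits_first:
        # Matches the first category and nothing under the second category.
        return "compliance"
    # Assumption: the no-match case is not specified in this section.
    return "undetermined"


if __name__ == "__main__":
    text = "this loan comes with guaranteed approval"
    if is_suspicious(0.91, threshold=0.8):
        print(inspect(text, CONFIG))
```

Checking the second category first means that text matching both categories is treated as a violation, which is consistent with the two conditions stated for the matching unit and the first confirmation unit.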

The voice quality detection apparatus proposed in this embodiment further includes:

the first acquisition module is used for acquiring the quality inspection results of all the quality inspection points when the suspicious data has a plurality of quality inspection points of different types;

and the generating module is used for calculating the quality inspection score of the suspicious data according to the quality inspection result and generating a quality inspection evaluation table of the suspicious data according to the quality inspection score and the quality inspection result.

the second acquisition module is used for acquiring a public text data set and a corpus set of violation scene corpora;

and the training module is used for calculating a loss function of the basic discriminant model according to the public text data set and the corpus set, and determining the basic discriminant model as the preset discriminant model when the loss function converges.

the third acquisition module is used for acquiring a preset migration model, and taking a sequence output layer of the preset migration model as an initial coding feature layer of the basic discriminant model;

and the connection module is used for connecting the initial coding feature layer, the initial bidirectional long-short term memory network layer, the initial pooling layer, the initial descending fit layer and the initial full connection layer to obtain a basic network of the basic discrimination model.

Wherein the training module comprises:

the first calculation unit is used for inputting the public text data set and the corpus set into the basic discrimination model and calculating to obtain training semantic similarity;

and the second calculation unit is used for acquiring the standard semantic similarity of the public text data set and the corpus, and calculating to obtain a loss function of the basic discriminant model according to the training semantic similarity and the standard semantic similarity.
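The work of the two calculation units and the convergence test of the training module can be sketched as follows. The form of the loss function is not specified in this section, so the mean squared error used here is an assumption, as is the convergence criterion (the loss changing by less than a tolerance over consecutive rounds). The names mse_loss, has_converged, tol and patience are hypothetical.

```python
from typing import Sequence


def mse_loss(training_sim: Sequence[float], standard_sim: Sequence[float]) -> float:
    """Mean squared error between training and standard semantic similarity.

    Assumption: the application does not name the loss; MSE is a stand-in.
    """
    if len(training_sim) != len(standard_sim) or not training_sim:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(training_sim, standard_sim)]
    return sum(squared) / len(squared)


def has_converged(loss_history: Sequence[float],
                  tol: float = 1e-4,
                  patience: int = 2) -> bool:
    """True when the last `patience` consecutive loss changes are below `tol`."""
    if len(loss_history) < patience + 1:
        return False
    recent = list(loss_history[-(patience + 1):])
    return all(abs(a - b) < tol for a, b in zip(recent, recent[1:]))


if __name__ == "__main__":
    print(mse_loss([0.5, 1.0], [0.0, 1.0]))
    print(has_converged([0.5, 0.2, 0.10001, 0.1, 0.1]))
```

Once has_converged returns True in such a loop, the basic discriminant model with its current parameters would be taken as the preset discriminant model.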

The voice quality detection apparatus provided by this embodiment realizes automatic, intelligent detection of voice data quality, improves the efficiency of voice data quality inspection and the accuracy of voice data discrimination, avoids the occurrence of violating voice data, and standardizes the use of voice data in intelligent customer service.

In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.

The computer device 6 comprises a memory 61, a processor 62 and a network interface 63 that are communicatively connected to each other via a system bus. It is noted that only a computer device 6 having components 61-63 is shown; it should be understood that not all of the shown components are required, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.

The memory 61 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or XD memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or an internal memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), and the like, provided on the computer device 6. Of course, the memory 61 may also comprise both an internal storage unit of the computer device 6 and an external storage device thereof. In this embodiment, the memory 61 is generally used for storing an operating system installed in the computer device 6 and various types of application software, such as the computer readable instructions of the voice quality detection method. Further, the memory 61 may also be used to temporarily store various types of data that have been output or are to be output.

In some embodiments, the processor 62 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 62 is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to execute the computer readable instructions stored in the memory 61 or to process data, for example to execute the computer readable instructions of the voice quality detection method.

The network interface 63 may comprise a wireless network interface or a wired network interface, and the network interface 63 is typically used for establishing a communication connection between the computer device 6 and other electronic devices.

The computer device provided by this embodiment realizes automatic, intelligent detection of voice data quality, improves the efficiency of voice data quality inspection and the accuracy of voice data discrimination, avoids the occurrence of violating voice data, and standardizes the use of voice data in intelligent customer service.

The present application further provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions that are executable by at least one processor to cause the at least one processor to perform the steps of the voice quality detection method described above.

The computer-readable storage medium provided by this embodiment realizes automatic, intelligent detection of voice data quality, improves the efficiency of voice data quality inspection and the accuracy of voice data discrimination, avoids the occurrence of violating voice data, and standardizes the use of voice data in intelligent customer service.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.

It is to be understood that the above-described embodiments are merely some, and not all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments of the application without limiting its scope. The present application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be understood more thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their technical features. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the protection scope of the present application.

Claims

1. A voice quality detection method is characterized by comprising the following steps:

2. The speech quality detection method according to claim 1, wherein the preset discriminant model includes a coding feature layer, a bidirectional long-short term memory network layer, a pooling layer, a fitting-down layer, and a full connection layer, and the step of calculating the semantic similarity between the text data and the violation corpus based on the preset discriminant model specifically includes:

3. The voice quality detection method according to claim 1, wherein the key values include a first category key value and a second category key value, and the step of detecting the suspicious data according to the key values configured in the configuration dictionary to obtain a quality inspection result specifically includes:

4. The method according to claim 1, wherein after the step of detecting the suspicious data according to the key values configured in the configuration dictionary to obtain a quality inspection result, the method further comprises:

5. The method according to claim 1, wherein before the step of calculating the semantic similarity between the text data and the violation corpus based on a preset discriminant model, the method further comprises:

acquiring a public text data set and a corpus set of violation scene corpora;

6. The method according to claim 5, wherein the step of calculating the loss function of the basic discriminant model according to the public text data set and the corpus specifically comprises:

7. The method according to claim 5, further comprising, before the step of training a basic discriminant model according to the public text data set and the corpus, the steps of:

8. A speech quality detection apparatus, comprising:

9. A computer device comprising a memory and a processor, the memory having computer readable instructions stored therein, wherein the processor, when executing the computer readable instructions, implements the steps of the speech quality detection method according to any one of claims 1 to 7.

10. A computer-readable storage medium, having computer-readable instructions stored thereon, which, when executed by a processor, implement the steps of the speech quality detection method according to any one of claims 1 to 7.