CN112669850A - Voice quality detection method and device, computer equipment and storage medium - Google Patents

Info

Publication number
CN112669850A
Authority
CN
China
Prior art keywords
text
violation
recognition model
interactive
suspected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011540509.7A
Other languages
Chinese (zh)
Inventor
邓真
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Puhui Enterprise Management Co Ltd
Original Assignee
Ping An Puhui Enterprise Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Puhui Enterprise Management Co Ltd
Priority to CN202011540509.7A
Publication of CN112669850A

Abstract

The embodiments of this application belong to the field of artificial intelligence and relate to a voice quality detection method. The method comprises: when a recording file is received, converting the recording file into text data; splitting the text data into a plurality of interactive texts; acquiring a preset intention recognition model comprising a first recognition model and a second recognition model, inputting each interactive text into the first recognition model, and determining from the first recognition model whether the interactive text is a suspected violation text; and, when the interactive text is a suspected violation text, determining whether it is a target violation text according to the second recognition model and the text data, the suspected violation text being determined to be the target violation text when the second recognition model detects a violation intention. The application also provides a voice quality detection apparatus, a computer device, and a storage medium. In addition, the application relates to blockchain technology, and the interactive texts may be stored in a blockchain. The application improves the accuracy of voice quality detection.

Description

Voice quality detection method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for detecting speech quality, a computer device, and a storage medium.
Background
Voice quality detection is an important supervisory mechanism on data transmission platforms. Performing quality detection on voice reduces the transmission of non-compliant data, making voice data safer and more reliable in transit.
Traditional voice quality detection usually requires a large investment of manpower: whether a call-center agent's script violates the rules is judged mainly by having staff listen to recordings. However, the volume of quality inspection data is huge, so the sampling rate reaches only 1% to 2%, processing is slow and lags behind the calls, and manual inspection often suffers from low detection accuracy.
Disclosure of Invention
An embodiment of the present application provides a method, an apparatus, a computer device, and a storage medium for voice quality detection, so as to solve the technical problem of low accuracy of voice quality detection.
In order to solve the above technical problem, an embodiment of the present application provides a voice quality detection method, which adopts the following technical solutions:
when receiving a recording file, converting the recording file into text data;
splitting the text data into a plurality of interactive texts;
acquiring a preset intention recognition model, wherein the intention recognition model comprises a first recognition model and a second recognition model, inputting the interactive text into the first recognition model, and determining whether the interactive text is a suspected violation text according to the first recognition model;
and when the interactive text is the suspected violation text, determining whether the suspected violation text is a target violation text according to the second recognition model and the text data, and when the second recognition model detects a violation intention, determining that the suspected violation text is the target violation text.
Further, the step of splitting the text data into a plurality of interactive texts specifically includes:
identifying role labels and dialogue periods in the text data;
classifying the text data according to the role labels and the dialogue periods, and determining that text data which falls within the same dialogue period and includes all of the role labels is an interactive text.
Further, the step of determining whether the interactive text is a suspected violation text according to the first recognition model specifically includes:
inputting the interactive text to the first recognition model to obtain a text label of the interactive text;
and matching the text label with a preset label, and determining the interactive text as the suspected violation text when the text label is successfully matched with the preset label.
Further, the step of determining whether the suspected violation text is a target violation text according to the second recognition model and the text data specifically includes:
inputting the suspected violation text into a preset binary classification recognition model, and determining the suspected violation text to be an intermediate violation text when the preset binary classification recognition model outputs a violation label;
and acquiring the text type of the intermediate violation text, and determining whether the intermediate violation text is a target violation text based on the second recognition model and the text type.
Further, the step of determining whether the intermediate violation text is a target violation text based on the second recognition model and the text type specifically includes:
when the text type is a suspected joint violation, acquiring, from the text data, a detection text within a preset time range adjacent to the intermediate violation text;
and inputting the detection text into the second recognition model, and determining the intermediate violation text to be the target violation text when a violation intention is detected in the detection text.
Further, the step of determining whether the intermediate violation text is a target violation text based on the second recognition model and the text type specifically includes:
when the text type is a suspected semantic violation, inputting the text data into the second recognition model;
and when a violation intention is detected in the text data, determining the intermediate violation text to be the target violation text.
Further, after the step of determining whether the interactive text is a suspected violation text, the method further includes:
and when the interactive text is not the suspected violation text, obtaining a quality inspection point of the interactive text, and detecting the interactive text based on the quality inspection point.
In order to solve the above technical problem, an embodiment of the present application further provides a voice quality detection apparatus, which adopts the following technical solutions:
the conversion module is used for converting the sound recording file into text data when the sound recording file is received;
the splitting module is used for splitting the text data into a plurality of interactive texts;
the first confirming module is used for acquiring a preset intention recognition model, the intention recognition model comprising a first recognition model and a second recognition model, inputting the interactive text into the first recognition model, and determining whether the interactive text is a suspected violation text according to the first recognition model;
and the second confirming module is used for determining whether the suspected violation text is a target violation text according to the second recognition model and the text data when the interactive text is the suspected violation text, and determining the suspected violation text as the target violation text when the second recognition model detects a violation intention.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which includes a memory and a processor, and computer readable instructions stored in the memory and executable on the processor, where the processor implements the steps of the voice quality detection method when executing the computer readable instructions.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, where computer-readable instructions are stored, and when executed by a processor, the computer-readable instructions implement the steps of the voice quality detection method described above.
According to the voice quality detection method, when a recording file is received, it is converted into text data. The text data is split into a plurality of interactive texts; detecting each interactive text individually improves both the accuracy and the efficiency of violation detection. A preset intention recognition model comprising a first recognition model and a second recognition model is then acquired, the interactive text is input into the first recognition model, and the first recognition model determines whether the interactive text is a suspected violation text; this model-based detection further improves accuracy. Finally, when the interactive text is a suspected violation text, whether it is a target violation text is determined according to the second recognition model and the text data, and the suspected violation text is determined to be the target violation text when the second recognition model detects a violation intention. Voice quality is thereby detected efficiently and accurately, the recognition accuracy for violating sentences is improved, and, because each sentence is checked by a model, the use of violating data is further avoided.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a voice quality detection method according to the present application;
FIG. 3 is a schematic block diagram of one embodiment of a speech quality detection apparatus according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.
Reference numerals: the voice quality detection device 300, a conversion module 301, a splitting module 302, a first confirmation module 303 and a second confirmation module 304.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (MPEG Audio Layer III) players, MP4 (MPEG-4 Part 14) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the voice quality detection method provided by the embodiment of the present application is generally executed by a server/terminal device, and accordingly, the voice quality detection apparatus is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to FIG. 2, a flow diagram of one embodiment of a method of speech quality detection according to the present application is shown. The voice quality detection method comprises the following steps:
Step S201, when receiving a recording file, converting the recording file into text data;
In this embodiment, the recording file is a recording of a dialogue interaction. When the recording file is received, it is converted into text data. The conversion may be performed by ASR (Automatic Speech Recognition). Specifically, when a recording file is obtained, it is preprocessed to extract speech features; a speech recognition model then compares the speech features against preset speech templates to obtain a recognition result, and the recognition result is the text data.
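The comparison of speech features against preset templates can be pictured with a toy example. The sketch below is only an illustration under strong assumptions: features are short numeric vectors, each hypothetical template maps a piece of text to one vector, and cosine similarity is used as the comparison. The source does not specify the feature type, the templates, or the similarity measure, and a production ASR system is far more involved.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_template(feature, templates):
    """Return the text whose template vector is most similar to the feature."""
    best_text, best_score = None, -1.0
    for text, template in templates.items():
        score = cosine_similarity(feature, template)
        if score > best_score:
            best_text, best_score = text, score
    return best_text


# Hypothetical templates: text -> feature vector (illustrative values only).
TEMPLATES = {
    "hello": [1.0, 0.0, 0.0],
    "goodbye": [0.0, 1.0, 0.0],
}
```

Calling `match_template(feature, TEMPLATES)` for each preprocessed segment and concatenating the results would give a crude stand-in for the recognition result.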
Step S202, splitting the text data into a plurality of interactive texts;
In this embodiment, when the text data is obtained, it is split into a plurality of interactive texts, where an interactive text is the text of one exchange between the parties to the dialogue. The text data is parsed and can be split into a plurality of interactive texts according to the chronological order of the dialogue. Once obtained, the interactive texts may be numbered in chronological order and then stored in a preset database.
It is emphasized that the interactive text may also be stored in a node of a blockchain in order to further ensure the privacy and security of the interactive text.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
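The idea of data blocks linked by cryptographic methods can be sketched as a simple hash chain. This is a minimal illustration, not the storage scheme of the application: the block layout, the use of SHA-256, and the function names are all assumptions, and a real blockchain adds distribution and consensus on top.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def block_hash(index, prev_hash, payload):
    """Hash the block contents together with the previous block's hash."""
    record = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def build_chain(interactive_texts):
    """Store each interactive text in a block linked to its predecessor."""
    chain = []
    prev_hash = GENESIS_HASH
    for index, text in enumerate(interactive_texts):
        digest = block_hash(index, prev_hash, text)
        chain.append(
            {"index": index, "prev_hash": prev_hash, "payload": text, "hash": digest}
        )
        prev_hash = digest
    return chain


def verify_chain(chain):
    """Return True only if every block's link and hash are intact."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["prev_hash"], block["payload"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

Because each hash covers the previous hash, altering any stored interactive text invalidates that block and every later link, which is what makes tampering detectable.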
Step S203, acquiring a preset intention recognition model, wherein the intention recognition model comprises a first recognition model and a second recognition model, inputting the interactive text into the first recognition model, and determining whether the interactive text is a suspected violation text according to the first recognition model;
In this embodiment, the intention recognition model is a preset text intention recognition model comprising a first recognition model and a second recognition model. The first recognition model is a multi-class classification model based on BERT (a pre-trained language model), through which the text label of every interactive text can be obtained; the second recognition model is a preset text detection model that performs text detection on its input. Specifically, when the interactive texts corresponding to the text data are obtained, all of them are input into the first recognition model, which outputs the text label of each interactive text. The text labels are classification labels from which the violation type related to the current interactive text can be determined. If the text label that the first recognition model assigns to the current interactive text is one of the preset labels, the interactive text is determined to be a suspected violation text; if the text label is not one of the preset labels, the interactive text is determined not to be a suspected violation text.
Step S204, when the interactive text is the suspected violation text, determining whether the suspected violation text is a target violation text according to the second recognition model and the text data, and when the second recognition model detects a violation intention, determining that the suspected violation text is the target violation text.
In this embodiment, when the interactive text is determined to be a suspected violation text, whether it is the target violation text is determined according to the second recognition model and the text data. Specifically, when the suspected violation text is obtained, its text type is obtained, where the text type is either a suspected joint violation or a suspected semantic violation. When the text type is a suspected joint violation, a detection text within an adjacent preset time range is obtained from the current text data and input into the second recognition model, which detects whether the detection text contains a violation intention; when the second recognition model detects a violation intention in the detection text, the suspected violation text is determined to be the target violation text. When the text type is a suspected semantic violation, the text data is input into the second recognition model, which detects whether the full text of the text data contains a violation intention; when the second recognition model detects a violation intention in the text data, the suspected violation text is determined to be the target violation text.
According to the embodiment, the voice quality is efficiently and accurately detected, the recognition accuracy of the violation statements is improved, whether the statements violate the rules or not is detected through the model, and the use of violation data is further avoided.
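The flow of steps S203 and S204 can be sketched as a small two-stage function. This is a sketch under stated assumptions rather than the applicant's implementation: both models are passed in as plain callables (`first_model` returns a label, `second_model` returns a boolean), the text type is derived from the label via a hypothetical `joint_labels` set, and the "adjacent preset time range" is approximated by a window of neighboring interactive texts counted by position.

```python
def detect_target_violations(interactive_texts, first_model, second_model,
                             preset_labels, joint_labels, window=1):
    """Return the interactive texts confirmed as target violation texts.

    first_model(text) -> label string (stage one, multi-class).
    second_model(text) -> bool, True if a violation intention is detected.
    """
    full_text = " ".join(interactive_texts)
    targets = []
    for i, text in enumerate(interactive_texts):
        label = first_model(text)
        if label not in preset_labels:
            continue  # not a suspected violation text
        if label in joint_labels:
            # Suspected joint violation: check the text with its neighbors.
            start = max(0, i - window)
            context = " ".join(interactive_texts[start:i + window + 1])
        else:
            # Suspected semantic violation: check the full text data.
            context = full_text
        if second_model(context):
            targets.append(text)
    return targets
```

Passing the models as callables keeps the sketch independent of any particular model library; in practice each callable would wrap a trained classifier.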
In some embodiments of the present application, the splitting the text data into a plurality of interactive texts includes:
identifying role labels and dialogue periods in the text data;
classifying the text data according to the role labels and the dialogue periods, and determining that text data which falls within the same dialogue period and includes all of the role labels is an interactive text.
In this embodiment, when the text data is obtained, it may be split into a plurality of interactive texts by identifying the role labels and dialogue periods in the text data and following the chronological order of the dialogue periods. Specifically, a role label identifies a speaker in the text data, and different speakers correspond to different role labels; a dialogue period is the time span in which one exchange of the dialogue takes place, and each exchange corresponds to one dialogue period. When the text data is obtained, the role labels and dialogue periods in it are identified, and the text data that falls within the same dialogue period and includes all of the role labels is taken as one interactive text.
According to the method and the device, the text data are split to obtain the multiple interactive texts, so that the precision of detecting the illegal voice texts is improved, and the illegal texts in the current voice texts can be more accurately detected through the interactive texts.
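The grouping rule above can be sketched as follows. The input format is an assumption: each utterance is a dict with hypothetical `role`, `period`, and `text` keys, already in chronological order. A dialogue period that does not contain every role label is simply skipped here, which is also an assumption, since the source does not say how such periods are handled.

```python
def split_into_interactive_texts(utterances):
    """Group utterances by dialogue period; keep periods with all role labels."""
    all_roles = {u["role"] for u in utterances}

    # Plain dicts preserve insertion order, so periods stay chronological.
    groups = {}
    for u in utterances:
        groups.setdefault(u["period"], []).append(u)

    interactive_texts = []
    for period, items in groups.items():
        roles = {u["role"] for u in items}
        if roles == all_roles:
            interactive_texts.append({
                "number": len(interactive_texts) + 1,  # sequential numbering
                "period": period,
                "text": " ".join(f'{u["role"]}: {u["text"]}' for u in items),
            })
    return interactive_texts
```

The `number` field mirrors the sequential numbering by time order that the description mentions before the interactive texts are stored.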
In some embodiments of the present application, the determining whether the interactive text is a suspected violation text according to the first recognition model includes:
inputting the interactive text to the first recognition model to obtain a text label of the interactive text;
and matching the text label with a preset label, and determining the interactive text as the suspected violation text when the text label is successfully matched with the preset label.
In this embodiment, the first recognition model is a multi-class classification model based on BERT (a pre-trained language model). The interactive text is input into the first recognition model, whose output is the text label of the current interactive text. The text label is the intention label of the current interactive text and falls into one of several categories, such as major violation and script violation. When the text label is obtained, the stored preset labels are retrieved; different preset labels may be configured for different scenarios. The text label is then matched against the preset labels: when it is identical to any one of the preset labels, the match succeeds and the current interactive text is determined to be a suspected violation text.
By first screening for suspected violation texts, this embodiment helps avoid erroneous detection of the speech text and thereby supports detection accuracy and precision.
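The label-matching step can be sketched as a set membership test. The classifier below is a keyword stub standing in for the BERT-based multi-class model, and the label names and trigger phrases are hypothetical; only the matching logic reflects the description.

```python
# Hypothetical preset labels; the source says these vary by scenario.
PRESET_LABELS = {"major_violation", "script_violation"}


def first_recognition_model(interactive_text):
    """Keyword stub standing in for the multi-class model (assumption)."""
    lowered = interactive_text.lower()
    if "guaranteed return" in lowered:
        return "major_violation"
    if "shut up" in lowered:
        return "script_violation"
    return "normal"


def is_suspected_violation(interactive_text, preset_labels=PRESET_LABELS):
    """Return (matched, label): matched is True if the label is preset."""
    label = first_recognition_model(interactive_text)
    return label in preset_labels, label
```

Keeping the preset labels in a configurable set makes it straightforward to swap in a different set per scenario without touching the model.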
In some embodiments of the present application, the determining whether the suspected violation text is a target violation text according to the second recognition model and the text data includes:
inputting the suspected violation text into a preset binary classification recognition model, and determining the suspected violation text to be an intermediate violation text when the preset binary classification recognition model outputs a violation label;
and acquiring the text type of the intermediate violation text, and determining whether the intermediate violation text is a target violation text based on the second recognition model and the text type.
In this embodiment, when determining whether the suspected violation text is a target violation text according to the second recognition model and the text data, the suspected violation text may first be further screened by a preset binary classification recognition model, which improves the accuracy of violation detection in the voice file. Specifically, when a suspected violation text is obtained, it is input into the preset binary classification recognition model, a binary classifier based on BERT (a pre-trained language model) that assigns each input exactly one of two classes. If the preset binary classification recognition model outputs a violation label, the suspected violation text is determined to be an intermediate violation text; if it outputs a non-violation label, the suspected violation text is determined not to be an intermediate violation text and is considered free of violations. The violation and non-violation labels may be represented by 1 and 0 respectively. When the suspected violation text is determined to be an intermediate violation text, its text type is obtained, and whether the intermediate violation text is the target violation text is determined according to the text type and the second recognition model.
By further screening the suspected violation text with the preset binary classification recognition model, this embodiment improves the detection precision for the speech text, avoids false detection of non-violating text, and improves the accuracy of violation detection in the speech text.
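The binary screening step can be sketched as a filter over the suspected violation texts. The scoring function is passed in as a callable and is assumed to return a violation probability; the 0.5 threshold is likewise an assumption, since the source only states that the model outputs a label of 1 or 0.

```python
VIOLATION = 1
NON_VIOLATION = 0


def to_label(probability, threshold=0.5):
    """Map an assumed violation probability to the 1/0 label."""
    return VIOLATION if probability >= threshold else NON_VIOLATION


def filter_intermediate(suspected_texts, score_fn, threshold=0.5):
    """Keep only the suspected texts that the binary model labels as 1."""
    return [
        text
        for text in suspected_texts
        if to_label(score_fn(text), threshold) == VIOLATION
    ]
```

Texts that survive this filter correspond to the intermediate violation texts; the rest are treated as non-violating and drop out of the second-stage check.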
In some embodiments of the present application, the determining whether the intermediate violation text is a target violation text based on the second recognition model and the text type includes:
when the text type is a suspected joint violation, acquiring, from the text data, a detection text within a preset time range adjacent to the intermediate violation text;
and inputting the detection text into the second recognition model, and determining the intermediate violation text to be the target violation text when a violation intention is detected in the detection text.
In this embodiment, the text type of the intermediate violation text is either a suspected joint violation or a suspected semantic violation. A suspected joint violation indicates that the current intermediate violation text may constitute a violation when taken together with its surrounding context; a suspected semantic violation indicates that the current intermediate violation text may contain a semantic violation. The text type is obtained by analyzing the text label of the intermediate violation text. When the text type is a suspected joint violation, a detection text within an adjacent preset time range is obtained from the text data in which the intermediate violation text is located, for example the interactive text immediately preceding the current interactive text. The detection text is then input into the second recognition model, and when the second recognition model detects a violation intention in the detection text, the intermediate violation text is determined to be the target violation text.
According to this embodiment, suspected joint violation texts are accurately detected, and the detection precision and detection efficiency for the speech text are improved.
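Selecting the detection text within an adjacent time range can be sketched as an interval overlap test. The assumptions here are that each interactive text carries hypothetical `start` and `end` timestamps in seconds, that the preset range is symmetric around the intermediate violation text, and that the intermediate violation text itself is included in the detection text; the source fixes none of these details.

```python
def build_detection_text(interactive_texts, target_index, window_seconds):
    """Join the texts that overlap the target's span widened by the window."""
    target = interactive_texts[target_index]
    lower = target["start"] - window_seconds
    upper = target["end"] + window_seconds
    neighbors = [
        item
        for item in interactive_texts
        if item["end"] >= lower and item["start"] <= upper
    ]
    return " ".join(item["text"] for item in neighbors)
```

The resulting string is what would be passed to the second recognition model to check for a violation intention across the joined context.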
In some embodiments of the present application, the determining whether the intermediate violation text is a target violation text based on the second recognition model and the text type includes:
when the text type is a suspected semantic violation, inputting the text data into the second recognition model;
and when a violation intention is detected in the text data, determining the intermediate violation text to be the target violation text.
In this embodiment, a suspected semantic violation indicates that the current intermediate violation text may contain a semantic violation. The text type is obtained by analyzing the text label of the intermediate violation text. When the text type is a suspected semantic violation, the text data is input into the second recognition model, and when a violation intention corresponding to the current intermediate violation text is detected in the text data, the intermediate violation text is determined to be the target violation text.
According to the embodiment, the suspected semantic violation text is accurately detected, and the detection precision and the detection efficiency of the voice text are improved.
In some embodiments of the present application, after the determining whether the interactive text is a suspected violation text, the method further includes:
and when the interactive text is not the suspected violation text, obtaining a quality inspection point of the interactive text, and detecting the interactive text based on the quality inspection point.
In this embodiment, when the interactive text is not a suspected violation text, a quality inspection point of the interactive text is obtained. Different interactive texts correspond to different quality inspection points. And determining a quality inspection point associated with the current interactive text according to the node information of the interactive text, wherein the node information is the flow node information of the interactive text. And when the quality inspection point corresponding to the current interactive text is determined, detecting the interactive text based on a preset quality inspection mode of the quality inspection point.
This embodiment extends quality detection to texts that are not suspected violation texts, which widens the detection coverage and helps avoid missed violations.
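The lookup of a quality inspection point by flow node can be sketched as a dictionary of check functions. The node names and the checks themselves are hypothetical examples; the source only says that the quality inspection point is determined from the flow node information and that a preset quality inspection mode is then applied.

```python
# Hypothetical quality inspection points keyed by flow node.
QUALITY_POINTS = {
    # Example check: the opening should disclose that the call is recorded.
    "opening": lambda text: "recorded" in text.lower(),
    # Example check: the closing should thank the customer.
    "closing": lambda text: "thank you" in text.lower(),
}


def inspect_interactive_text(interactive_text, node):
    """Apply the node's check; return None if the node has no check."""
    check = QUALITY_POINTS.get(node)
    if check is None:
        return None
    return check(interactive_text)
```

Returning `None` for an unknown node keeps "no inspection point configured" distinct from a failed inspection.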
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware associated with computer readable instructions, which can be stored in a computer readable storage medium, and when executed, the processes of the embodiments of the methods described above can be included. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily performed at the same time or one after another; they may be performed at different times, and in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a speech quality detection apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.
As shown in fig. 3, the voice quality detection apparatus 300 according to the present embodiment includes: a conversion module 301, a splitting module 302, a first confirming module 303, and a second confirming module 304. Wherein:
The conversion module 301 is configured to convert, when a recording file is received, the recording file into text data.
In this embodiment, the recording file is a recording of a dialogue interaction, and it is converted into text data when it is received. The conversion may be performed by ASR (Automatic Speech Recognition). Specifically, when a recording file is obtained, it is preprocessed to extract speech features; a speech recognition model then compares the speech features against preset speech templates to obtain a recognition result, and this recognition result is the text data.
A splitting module 302, configured to split the text data into multiple interactive texts;
wherein the splitting module 302 comprises:
a first identification unit, configured to identify the role labels and dialogue periods in the text data;
and a classification unit, configured to classify the text data according to the role labels and the dialogue periods, and to determine that text data which falls within the same dialogue period and includes all the role labels is one interactive text.
In this embodiment, once the text data is obtained, it is split into a plurality of interactive texts, where an interactive text is the text exchanged between the parties to the dialogue. The current text data is identified and can be split into interactive texts according to the time order of the dialogue in the text data. The resulting interactive texts can be numbered in time order and then stored in a preset database.
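The splitting rule, grouping by dialogue period and keeping groups that contain every role label, can be sketched as below. This is an illustrative sketch under stated assumptions: the `Utterance` type is hypothetical, each utterance is assumed to already carry its role label and a dialogue period index, and periods that lack one of the roles are simply skipped, which the application does not specify.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass
class Utterance:
    role: str    # role label, e.g. "agent" or "customer" (hypothetical values)
    period: int  # index of the dialogue period the utterance falls in
    text: str


def split_interactive_texts(utterances: List[Utterance]) -> List[List[Utterance]]:
    """Group utterances by dialogue period; keep groups containing every role."""
    all_roles = {u.role for u in utterances}
    # sorted() is stable, so the order of utterances inside a period is preserved.
    ordered = sorted(utterances, key=lambda u: u.period)
    interactive_texts = []
    for _, group in groupby(ordered, key=lambda u: u.period):
        turns = list(group)
        if {u.role for u in turns} == all_roles:
            interactive_texts.append(turns)
    return interactive_texts
```

The position of each group in the returned list can serve as the sequence number used when the interactive texts are stored.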
It is emphasized that the interactive text may also be stored in a node of a blockchain in order to further ensure the privacy and security of the interactive text.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
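The idea of data blocks linked by a cryptographic method can be illustrated with a small hash chain. This is only a toy sketch: it shows how linking each block to the hash of its predecessor makes tampering detectable, and it omits the distribution, consensus, and platform layers that an actual blockchain would need. The function names and the block layout are assumptions made for the example.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64  # placeholder predecessor hash for the first block


def make_block(payload: str, prev_hash: str) -> Dict[str, str]:
    """Build a block whose hash covers both its payload and its predecessor's hash."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash}, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {"payload": payload, "prev_hash": prev_hash, "hash": digest}


def build_chain(interactive_texts: List[str]) -> List[Dict[str, str]]:
    """Store each interactive text in its own block, linked in order."""
    chain: List[Dict[str, str]] = []
    prev_hash = GENESIS_HASH
    for text in interactive_texts:
        block = make_block(text, prev_hash)
        chain.append(block)
        prev_hash = block["hash"]
    return chain


def verify_chain(chain: List[Dict[str, str]]) -> bool:
    """Recompute every hash and check every link; any tampering returns False."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if make_block(block["payload"], block["prev_hash"])["hash"] != block["hash"]:
            return False
        prev_hash = block["hash"]
    return True
```

Changing the payload of any stored block causes `verify_chain` to fail, which is the anti-counterfeiting property the paragraph above refers to.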
The first confirming module 303 is configured to obtain a preset intention recognition model, where the intention recognition model includes a first recognition model and a second recognition model, input the interactive text to the first recognition model, and determine whether the interactive text is a suspected violation text according to the first recognition model;
wherein the first confirming module 303 includes:
a second identification unit, configured to input the interactive text to the first recognition model to obtain a text label of the interactive text;
and a matching unit, configured to match the text label against preset labels, and to determine that the interactive text is the suspected violation text when the text label matches a preset label.
In this embodiment, the intention recognition model is a preset text intention recognition model that includes a first recognition model and a second recognition model. The first recognition model is a multi-class classification model built on BERT (a pre-trained language model); it yields the text label for each interactive text. The second recognition model is a preset text detection model that performs text detection on its input. Specifically, once the interactive texts corresponding to the text data are obtained, all of them are input to the first recognition model, which outputs a text label for each. The text labels are classification labels, and the violation type that the current interactive text may involve can be determined from its label. If the text label obtained from the first recognition model is one of the preset labels, the interactive text is determined to be a suspected violation text; if the text label is not one of the preset labels, the interactive text is determined not to be a suspected violation text.
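The label matching step can be sketched as follows. The application's first recognition model is a BERT-based multi-class classifier; here a trivial keyword function stands in for it so that the example runs on its own. The label names, the keyword rules, and the function names are hypothetical and serve only to show how a predicted label is matched against the preset labels.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical preset labels that mark an interactive text as suspected violation.
PRESET_LABELS: Set[str] = {"threat", "false_promise"}


def keyword_classifier(text: str) -> str:
    """Stand-in for the first recognition model; returns one classification label."""
    lowered = text.lower()
    if "guarantee" in lowered:
        return "false_promise"
    if "or else" in lowered:
        return "threat"
    return "normal"


def find_suspected(
    interactive_texts: Iterable[str],
    classify: Callable[[str], str] = keyword_classifier,
    preset_labels: Set[str] = PRESET_LABELS,
) -> List[str]:
    """Return the interactive texts whose label matches one of the preset labels."""
    return [t for t in interactive_texts if classify(t) in preset_labels]
```

Passing the classifier as a parameter keeps the matching logic independent of whichever model actually produces the labels.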
A second confirming module 304, configured to determine, according to the second recognition model and the text data, whether the suspected violation text is a target violation text when the interactive text is the suspected violation text, and determine, when the second recognition model detects a violation intention, that the suspected violation text is the target violation text.
Wherein the second confirmation module 304 comprises:
a first confirming unit, configured to input the suspected violation text to a preset binary classification model and, when the preset binary classification model outputs a violation label, to determine that the suspected violation text is an intermediate violation text;
and a second confirming unit, configured to obtain the text type of the intermediate violation text and to determine, based on the second recognition model and the text type, whether the intermediate violation text is the target violation text.
Wherein the second confirming unit includes:
an obtaining subunit, configured to obtain from the text data, when the text type is a suspected joint violation, a detection text within a preset time range adjacent to the intermediate violation text;
a first confirming subunit, configured to input the detection text to the second recognition model and, when a violation intention is detected in the detection text, to determine that the intermediate violation text is the target violation text;
an input subunit, configured to input the text data to the second recognition model when the text type is a suspected semantic violation;
and a second confirming subunit, configured to determine that the intermediate violation text is the target violation text when a violation intention is detected in the text data.
In this embodiment, when the interactive text is determined to be the suspected violation text, whether the suspected violation text is the target violation text is determined according to the second recognition model and the text data. Specifically, when the suspected violation text is obtained, its text type is obtained; the text type is either a suspected joint violation or a suspected semantic violation. When the text type is a suspected joint violation, a detection text within an adjacent preset time range is obtained from the current text data and input to the second recognition model, which detects whether the detection text contains a violation intention; when it does, the interactive text is determined to be the target violation text. When the text type is a suspected semantic violation, the text data is input to the second recognition model, which detects whether the full text of the text data contains a violation intention; when it does, the interactive text is determined to be the target violation text.
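The routing by text type can be sketched as below. Several simplifications are assumed: the "adjacent preset time range" is approximated by a fixed number of neighbouring interactive texts (`window`), the second recognition model is represented by any callable that returns True when it finds a violation intention, and the binary classification step that precedes this stage is omitted. The function and parameter names are hypothetical.

```python
from typing import Callable, Sequence


def confirm_violation(
    interactive_texts: Sequence[str],
    index: int,
    text_type: str,
    has_violation_intent: Callable[[str], bool],
    window: int = 1,
) -> bool:
    """Decide whether the suspected text at `index` is a target violation text."""
    if text_type == "joint":
        # Suspected joint violation: inspect only the neighbouring context.
        start = max(0, index - window)
        end = min(len(interactive_texts), index + window + 1)
        detection_text = " ".join(interactive_texts[start:end])
    elif text_type == "semantic":
        # Suspected semantic violation: inspect the full text data.
        detection_text = " ".join(interactive_texts)
    else:
        raise ValueError(f"unknown text type: {text_type}")
    return has_violation_intent(detection_text)
```

The design point the sketch illustrates is that the same detector is reused for both text types; only the span of text handed to it changes.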
The voice quality detection apparatus proposed in this embodiment further includes:
a detection module, configured to obtain a quality inspection point of the interactive text when the interactive text is not the suspected violation text, and to detect the interactive text based on the quality inspection point.
In this embodiment, when the interactive text is not a suspected violation text, a quality inspection point of the interactive text is obtained. Different interactive texts correspond to different quality inspection points. The quality inspection point associated with the current interactive text is determined from the node information of the interactive text, where the node information is the flow node to which the interactive text belongs. Once the quality inspection point has been determined, the interactive text is detected using the quality inspection mode preset for that point.
The voice quality detection apparatus provided by this embodiment detects voice quality efficiently and accurately, improves the recognition accuracy for violating sentences, and, by using the models to detect whether sentences violate the rules, helps prevent violating data from being used.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 6 comprises a memory 61, a processor 62, and a network interface 63 communicatively connected to each other via a system bus. Note that only a computer device 6 having components 61-63 is shown; not all of the shown components are required, and more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 61 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 6. Of course, the memory 61 may also comprise both an internal storage unit of the computer device 6 and an external storage device thereof. In this embodiment, the memory 61 is generally used for storing an operating system installed in the computer device 6 and various types of application software, such as computer readable instructions of a voice quality detection method. Further, the memory 61 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 62 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 62 is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to execute the computer readable instructions stored in the memory 61 or to process data, for example to execute the computer readable instructions of the voice quality detection method.
The network interface 63 may comprise a wireless network interface or a wired network interface, and the network interface 63 is typically used for establishing a communication connection between the computer device 6 and other electronic devices.
The computer device provided by this embodiment detects voice quality efficiently and accurately, improves the recognition accuracy for violating sentences, and, by using the models to detect whether sentences violate the rules, helps prevent violating data from being used.
The present application further provides another embodiment: a computer-readable storage medium storing computer-readable instructions that are executable by at least one processor to cause the at least one processor to perform the steps of the voice quality detection method described above.
The computer-readable storage medium provided by this embodiment detects voice quality efficiently and accurately, improves the recognition accuracy for violating sentences, and, by using the models to detect whether sentences violate the rules, helps prevent violating data from being used.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the embodiments described above are only some, not all, of the embodiments of the present application, and that the appended drawings show preferred embodiments without limiting the scope of the application. The application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application is understood thoroughly and completely. Although the application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in them or replace some of their technical features with equivalents. All equivalent structures made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. A voice quality detection method is characterized by comprising the following steps:
when receiving a recording file, converting the recording file into text data;
splitting the text data into a plurality of interactive texts;
acquiring a preset intention recognition model, wherein the intention recognition model comprises a first recognition model and a second recognition model, inputting the interactive text to the first recognition model, and determining whether the interactive text is a suspected violation text according to the first recognition model;
and when the interactive text is the suspected violation text, determining whether the suspected violation text is a target violation text according to the second recognition model and the text data, and when the second recognition model detects a violation intention, determining that the suspected violation text is the target violation text.
2. The method according to claim 1, wherein the step of splitting the text data into a plurality of interactive texts specifically comprises:
identifying role labels and dialogue periods in the text data;
classifying the text data according to the role labels and the dialogue periods, and determining that the text data which is in the same dialogue period and comprises all the role labels is the interactive text.
3. The method according to claim 1, wherein the step of determining whether the interactive text is a suspected violation text according to the first recognition model specifically includes:
inputting the interactive text to the first recognition model to obtain a text label of the interactive text;
and matching the text label with a preset label, and determining the interactive text as the suspected violation text when the text label is successfully matched with the preset label.
4. The method according to claim 1, wherein the step of determining whether the suspected violation text is a target violation text according to the second recognition model and the text data specifically includes:
inputting the suspected violation text into a preset binary classification model, and determining the suspected violation text as an intermediate violation text when the preset binary classification model outputs a violation label;
and acquiring the text type of the intermediate violation text, and determining whether the intermediate violation text is a target violation text based on the second recognition model and the text type.
5. The method according to claim 4, wherein the step of determining whether the intermediate violation text is a target violation text based on the second recognition model and the text type specifically comprises:
when the text type is a suspected joint violation, acquiring from the text data a detection text within a preset time range adjacent to the intermediate violation text;
and inputting the detection text into the second recognition model, and determining the intermediate violation text as the target violation text when a violation intention is detected in the detection text.
6. The method according to claim 4, wherein the step of determining whether the intermediate violation text is a target violation text based on the second recognition model and the text type further comprises:
when the text type is a suspected semantic violation, inputting the text data to the second recognition model;
when a violation intention is detected in the text data, determining that the intermediate violation text is the target violation text.
7. The method according to claim 1, wherein after the step of determining whether the interactive text is a suspected violation text, the method further comprises:
and when the interactive text is not the suspected violation text, obtaining a quality inspection point of the interactive text, and detecting the interactive text based on the quality inspection point.
8. A voice quality detection apparatus, comprising:
the conversion module is used for converting the recording file into text data when the recording file is received;
the splitting module is used for splitting the text data into a plurality of interactive texts;
the first confirming module is used for acquiring a preset intention recognition model, the intention recognition model comprises a first recognition model and a second recognition model, the interactive text is input into the first recognition model, and whether the interactive text is a suspected violation text is determined according to the first recognition model;
and the second confirming module is used for determining whether the suspected violation text is a target violation text according to the second recognition model and the text data when the interactive text is the suspected violation text, and determining the suspected violation text as the target violation text when the second recognition model detects a violation intention.
9. A computer device comprising a memory having computer readable instructions stored therein and a processor which, when executing the computer readable instructions, implements the steps of the voice quality detection method according to any one of claims 1 to 7.
10. A computer-readable storage medium, having computer-readable instructions stored thereon, which, when executed by a processor, implement the steps of the voice quality detection method according to any one of claims 1 to 7.
CN202011540509.7A 2020-12-23 2020-12-23 Voice quality detection method and device, computer equipment and storage medium Pending CN112669850A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011540509.7A CN112669850A (en) 2020-12-23 2020-12-23 Voice quality detection method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011540509.7A CN112669850A (en) 2020-12-23 2020-12-23 Voice quality detection method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112669850A true CN112669850A (en) 2021-04-16

Family

ID=75409086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011540509.7A Pending CN112669850A (en) 2020-12-23 2020-12-23 Voice quality detection method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112669850A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113590825A (en) * 2021-07-30 2021-11-02 平安科技(深圳)有限公司 Text quality inspection method and device and related equipment
CN117079640A (en) * 2023-10-12 2023-11-17 深圳依时货拉拉科技有限公司 Voice monitoring method, device, computer equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
US10777207B2 (en) Method and apparatus for verifying information
CN111368043A (en) Event question-answering method, device, equipment and storage medium based on artificial intelligence
CN112328761B (en) Method and device for setting intention label, computer equipment and storage medium
CN112686022A (en) Method and device for detecting illegal corpus, computer equipment and storage medium
CN112836521A (en) Question-answer matching method and device, computer equipment and storage medium
CN112468658B (en) Voice quality detection method and device, computer equipment and storage medium
CN113314150A (en) Emotion recognition method and device based on voice data and storage medium
CN112084752A (en) Statement marking method, device, equipment and storage medium based on natural language
CN112669850A (en) Voice quality detection method and device, computer equipment and storage medium
CN112671985A (en) Agent quality inspection method, device, equipment and storage medium based on deep learning
CN112446209A (en) Method, equipment and device for setting intention label and storage medium
CN116912847A (en) Medical text recognition method and device, computer equipment and storage medium
CN114090792A (en) Document relation extraction method based on comparison learning and related equipment thereof
CN112182157A (en) Training method of online sequence labeling model, online labeling method and related equipment
CN115730603A (en) Information extraction method, device, equipment and storage medium based on artificial intelligence
CN115373634A (en) Random code generation method and device, computer equipment and storage medium
CN114637831A (en) Data query method based on semantic analysis and related equipment thereof
CN114724561A (en) Voice interruption method and device, computer equipment and storage medium
CN114528851A (en) Reply statement determination method and device, electronic equipment and storage medium
CN114398466A (en) Complaint analysis method and device based on semantic recognition, computer equipment and medium
CN112417886A (en) Intention entity information extraction method and device, computer equipment and storage medium
CN112396111A (en) Text intention classification method and device, computer equipment and storage medium
CN111369975A (en) University music scoring method, device, equipment and storage medium based on artificial intelligence
CN112395450A (en) Picture character detection method and device, computer equipment and storage medium
CN111899718A (en) Method, apparatus, device and medium for recognizing synthesized speech

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination