CN113051924A

CN113051924A - Method and system for segmented quality inspection of recorded data

Info

Publication number: CN113051924A
Application number: CN202110486357.5A
Authority: CN
Inventors: 沈超建; 魏薇郦; 刘金山; 江文乐
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2021-04-30
Filing date: 2021-04-30
Publication date: 2021-06-29

Abstract

The application provides a method and a system for segmented quality inspection of recorded data, which can be used in the technical field of cloud computing or in other fields. The method comprises: converting audio data into text data; performing similarity detection between the text data and a preset standard script template, and outputting a similarity; and comparing the output similarity with a set threshold, and if the similarity is greater than the set threshold, judging that the audio data passes quality inspection. Because the audio data is converted into text data and compared for similarity against a preset standard script template, no additional manpower is needed for manual review, which improves the service value of the quality inspection system.

Description

Method and system for segmented quality inspection of recorded data

Technical Field

The application relates to the technical field of cloud computing, in particular to a method and a system for segmented quality inspection of recorded data.

Background

In order to protect consumers, regulators such as the China Banking and Insurance Regulatory Commission and the China Securities Regulatory Commission require that, when a financial institution sells its own or third-party (agency) products such as wealth management products, funds and insurance, the sales process be recorded synchronously in audio and video ("double recording" for short), so as to prevent misleading sales, unauthorized private sales ("flying orders") and the like. According to the regulatory requirements and its own business processes, each financial institution formulates the double-recording steps for the sale of the relevant products and a standard script template for each step, and after double recording is finished checks one by one whether the double-recorded video meets the requirements.

In order to reduce the manpower spent on quality inspection and to unify the quality inspection standard, some institutions use artificial intelligence technology to automatically inspect the audio and video files after double recording is completed. Quality inspection points are designed according to the institution's own double-recording steps; the double-recording steps and standard script templates differ between institutions, but generally include several voice and video quality inspection points: (1) voice part: the customer manager confirms the customer's identity, the customer manager introduces himself or herself, and the customer manager introduces the product details one by one (such as issuer, principal-guarantee attribute, return level, risk profile, handling fee and the like); (2) video part: the customer manager and the customer appear in the same frame, the customer manager shows his or her certificate and the product materials, and the customer signs at key steps. Because double-recording systems were built relatively early, the double-recorded files and business data generally do not contain segmentation information for the quality inspection points; that is, it is not known in which time period of the double-recorded file a given quality inspection point should be inspected. The double-recorded file therefore needs to be segmented by technical means to determine the start and end time points of each quality inspection point in the file.

At present, the following method is generally adopted for segmentation:

(1) Voice part: segmentation by keywords. A unique start keyword and end keyword are formulated for each voice quality inspection point according to the standard script template, the audio is converted into text using speech recognition technology, and the text is segmented according to the keywords of each quality inspection point.

This method has two disadvantages. On the one hand, it depends on the uniqueness of the start and end keywords, so segmentation is poor when the scripts are short or when the keywords of different quality inspection points are similar. On the other hand, noise during double recording, a customer manager or customer who does not speak loudly enough, a heavy local accent, and the limits of speech recognition technology cause parts of the converted text to be missing or wrong; segmentation errors may then occur and lead to wrong quality inspection results, so additional manpower must be invested in manual review, which reduces the service value of the quality inspection system.

(2) Video part: either each video quality inspection point is checked on the basis of the voice segmentation result, in which case inaccurate segmentation again leads to quality inspection errors; or no segmentation is performed, and frames are extracted from the whole double-recorded video and checked one by one, which consumes large computing resources and takes a long time.

Disclosure of Invention

The application aims to segment the recorded data accurately so that the segments correspond one to one with the quality inspection points, and then to perform quality inspection only within the segment corresponding to each quality inspection point, thereby improving quality inspection accuracy and reducing the time and resources consumed by quality inspection.

In order to solve the technical problem, the application provides the following technical scheme:

In a first aspect, the present application provides a method for segmented quality inspection of recorded data, where the recorded data includes audio data, and the method includes:

converting the audio data into text data;

performing similarity detection between the text data and a preset standard script template, and outputting a similarity;

and comparing the similarity with a set threshold, and if the similarity is greater than the set threshold, judging that the audio data passes quality inspection.

Further, the method for segmented quality inspection of recorded data further comprises:

generating a plurality of segmented texts according to the text data and the byte length of the standard script template;

the performing similarity detection between the text data and a preset standard script template comprises:

performing similarity detection on each segmented text, wherein the output similarity is the maximum similarity among all the segmented texts.

Further, the recorded data further includes video data, and the method for segmented quality inspection of recorded data further includes:

performing frame extraction on the video data within a time period to obtain a plurality of pictures, wherein the time period is the time period in which the segmented text with the maximum similarity is located;

and detecting the pictures one by one, and judging whether the pictures meet the quality inspection requirements.

Further, the generating a plurality of segmented texts according to the text data and the byte length of the standard script template comprises:

determining at least one sliding window length according to the text data and the byte length of the standard script template;

and sliding over the text data with every sliding window length to obtain the plurality of segmented texts.

Further, the length of each of the sliding windows is different.

Further, the performing similarity detection on each segmented text includes:

performing similarity detection on each segmented text using a script classifier, wherein the script classifier outputs the maximum similarity and the start and end time information of the segmented text with the maximum similarity, and the start and end time information is used to anchor the corresponding segmented text.

Further, the method for segmented quality inspection of recorded data further comprises:

training a machine learning model according to the standard script template to obtain the script classifier.

In a second aspect, the present application provides a system for segmented quality inspection of recorded data, where the recorded data includes audio data, and the system includes:

a text conversion module: converting the audio data into text data;

a similarity comparison module: performing similarity detection between the text data and a preset standard script template, and outputting a similarity;

a similarity judging module: comparing the similarity with a set threshold, and if the similarity is greater than the set threshold, judging that the audio data passes quality inspection.

Further, the system for segmented quality inspection of recorded data further comprises:

a preprocessing module: generating a plurality of segmented texts according to the text data and the byte length of the standard script template;

the similarity comparison module comprises:

a similarity detection unit: performing similarity detection on each segmented text, wherein the output similarity is the maximum similarity among all the segmented texts.

Further, the recorded data further includes video data, and the system for segmented quality inspection of recorded data further includes:

a video processing module: performing frame extraction on the video data within a time period to obtain a plurality of pictures, wherein the time period is the time period in which the segmented text with the maximum similarity is located;

a picture quality inspection module: detecting the pictures one by one, and judging whether the pictures meet the quality inspection requirements.

Further, the preprocessing module comprises:

a sliding window unit: determining at least one sliding window length according to the text data and the byte length of the standard script template;

an audio segmentation unit: sliding over the text data with every sliding window length to obtain the plurality of segmented texts.

Further, the similarity detection unit includes:

a script classifier component: performing similarity detection on each segmented text using a script classifier, wherein the script classifier outputs the maximum similarity and the start and end time information of the segmented text with the maximum similarity, and the start and end time information is used to anchor the corresponding segmented text.

Further, the system for segmented quality inspection of recorded data further comprises:

a script classifier training module: training a machine learning model according to the standard script template to obtain the script classifier.

In a third aspect, the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the segmented quality inspection method for the recorded data when executing the computer program.

In a fourth aspect, the present application provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method for segment quality inspection of recorded data.

According to the above technical solution, the method and system for segmented quality inspection of recorded data provided by the application convert the audio data into text data; perform similarity detection between the text data and a preset standard script template and output a similarity; and compare the output similarity with a set threshold, judging that the audio data passes quality inspection if the similarity is greater than the set threshold. Because the audio data is converted into text data and compared for similarity against a preset standard script template, no additional manpower is needed for manual review, which improves the service value of the quality inspection system.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart illustrating a segmented quality inspection method for recorded data according to an embodiment of the present application.

Fig. 2 is a flowchart illustrating step S110 in the method for inspecting the segmented recorded data according to the embodiment of the present application.

Fig. 3 is a schematic structural diagram of a segmented quality inspection system for recorded data in an embodiment of the present application.

Fig. 4 is a schematic structural diagram of a preprocessing module of a segmented quality inspection system for recorded data in an embodiment of the present application.

Fig. 5 is a schematic structural diagram of a script classifier training module of a segmented quality inspection system for recorded data according to an embodiment of the present application.

Fig. 6 is a flowchart illustrating an embodiment of a segmented quality inspection system for recorded data according to the present application.

Fig. 7 is a schematic structural diagram of an electronic device in an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the method and system for testing the quality of the recorded data segments disclosed by the application can be used in the technical field of cloud computing, and can also be used in any field except the technical field of cloud computing.

In one or more embodiments of the present application, the recorded data may be audio data, video data, or mixed audio-video data. The quality inspection content for the audio and video data streams includes: the customer manager confirming the customer's identity and introducing himself or herself; the introduction of product details (such as issuer, principal-guarantee attribute, return level, risk profile, handling fee and the like); the customer manager and the customer appearing in the same frame; the customer signing at key steps; whether the customer manager uses prohibited language; and the like.

Considering that existing quality inspection methods for recorded data usually require additional manpower for manual review, which reduces the service value of the quality inspection system, the application provides a method and a system for segmented quality inspection of recorded data, an electronic device and a computer-readable storage medium. The audio data is converted into text data and compared for similarity against a preset standard script template, so that no additional manpower is needed for manual review and the service value of the quality inspection system is improved.

Based on the above, the present application further provides a segmented quality inspection system for recorded data, which is used to implement the segmented quality inspection method for recorded data provided in one or more embodiments of the present application, the segmented quality inspection system for recorded data may be in communication connection with a plurality of client terminal devices, and the segmented quality inspection system for recorded data may specifically access the client terminal devices through an application server.

The system for segmented quality inspection of recorded data can receive a recorded data quality inspection instruction from a client terminal device and obtain from the instruction the target audio data to be inspected; convert the audio data into text data and perform similarity detection between the text data and a preset standard script template to obtain a similarity; and compare the output similarity with a set threshold to obtain a quality inspection result for the audio data. The system then sends the quality inspection result to the client device for display, so that the user can obtain the quality inspection result of the recorded audio data through the client device.

It will be appreciated that the client devices may include smart phones, tablet electronic devices, portable computers, desktop computers, Personal Digital Assistants (PDAs), and the like.

In another practical application, the portion of performing the quality inspection of the recorded data segments may be performed in the classification processing center as described above, or all operations may be performed in the client device. The selection may be specifically performed according to the processing capability of the client device, the limitation of the user usage scenario, and the like. This is not a limitation of the present application. If all the operations are completed in the client device, the client device may further include a processor for performing specific processing of the segment quality inspection of the recorded data.

The client device may have a communication module (i.e., a communication unit), and may be communicatively connected to a remote server to implement data transmission with the server. For example, the communication unit may send the recording data segment quality inspection instruction to a server of the classification processing center, so that the server performs the recording data segment quality inspection according to the recording data segment quality inspection instruction. The communication unit may also receive the identification result returned by the server. The server may include a server on the task scheduling center side, and in other implementation scenarios, the server may also include a server on an intermediate platform, for example, a server on a third-party server platform that is communicatively linked to the task scheduling center server. The server may include a single computer device, or may include a server cluster formed by a plurality of servers, or a server structure of a distributed apparatus.

The server and the client device may communicate using any suitable network protocol, including network protocols not yet developed at the filing date of this application. The network protocol may include, for example, the TCP/IP protocol, the UDP/IP protocol, the HTTP protocol, the HTTPS protocol, or the like. Of course, the network protocol may also include, for example, RPC (Remote Procedure Call) or REST (Representational State Transfer) used on top of the above protocols.

With the method and system for segmented quality inspection of recorded data, the electronic device and the computer-readable storage medium provided by the application, the audio data is converted into text data and compared for similarity against a preset standard script template, so that no additional manpower is needed for manual review and the service value of the quality inspection system is improved.

Specific embodiments and application examples are described below.

In order to solve the problem that existing quality inspection methods for recorded data usually require additional manpower for manual review, which reduces the service value of the quality inspection system, the application provides an embodiment of a method for segmented quality inspection of recorded data. Referring to fig. 1, the method specifically comprises the following:

Step S100: converting the audio data into text data.

Step S200: performing similarity detection between the text data and a preset standard script template, and outputting a similarity.

Step S300: comparing the similarity with a set threshold, and if the similarity is greater than the set threshold, judging that the audio data passes quality inspection.

It can be understood that the audio data is converted into text data by speech recognition technology, the text data is compared with the preset standard script template to obtain the similarity between the two, and if the similarity is greater than the set threshold, the audio data is judged to pass quality inspection.
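As a non-limiting illustration, steps S200 and S300 can be sketched in Python as follows. The application does not specify the similarity measure, so `difflib.SequenceMatcher` is used here purely as a stand-in; the function names and the default threshold of 0.8 are assumptions made for this example, and the transcript is assumed to have already been produced by step S100.

```python
from difflib import SequenceMatcher


def similarity(transcript: str, template: str) -> float:
    """Stand-in similarity in [0, 1]; the actual measure is not specified."""
    return SequenceMatcher(None, transcript, template).ratio()


def audio_passes(transcript: str, template: str, threshold: float = 0.8) -> bool:
    """Step S300: pass only when the similarity is strictly greater
    than the set threshold (0.8 is an illustrative value)."""
    return similarity(transcript, template) > threshold
```

Because the comparison is strict, a similarity exactly equal to the threshold does not pass, which matches the wording "greater than the set threshold".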

As can be seen from the above description, the method for segmented quality inspection of recorded data provided in the embodiment of the present application converts audio data into text data; performs similarity detection between the text data and a preset standard script template and outputs a similarity; and compares the output similarity with a set threshold, judging that the audio data passes quality inspection if the similarity is greater than the set threshold. Because the audio data is converted into text data and compared for similarity against a preset standard script template, no additional manpower is needed for manual review, which improves the service value of the quality inspection system.

In order to further improve the accuracy of the quality inspection of the recorded data, an embodiment of the method provided by the present application offers a preferred manner of segmented quality inspection. Referring to fig. 1, the following is specifically included between step S100 and step S200:

Step S110: generating a plurality of segmented texts according to the text data and the byte length of the standard script template.

It can be understood that the recorded audio data is converted into text by speech recognition technology and the start and end time points of each sentence within the whole recording are obtained; the text data is segmented according to the byte length of the standard script template to extract a plurality of segmented texts; the similarity between each segmented text and the corresponding standard script is judged; and if the similarity is greater than the set threshold, the audio data is judged to pass quality inspection.

As can be seen from the above description, the method provided in the embodiment of the present application accurately segments the text data obtained by converting the audio data so that the segments correspond one to one with the standard scripts, and performs quality inspection only within the segment corresponding to each standard script, thereby improving quality inspection accuracy and reducing the time and resources consumed by quality inspection.

In order to further improve the accuracy of the quality inspection of the recorded data, an embodiment of the method provided by the present application offers a preferred manner of similarity detection, in which step S200 specifically includes the following:

Step S201: performing similarity detection on each segmented text, wherein the output similarity is the maximum similarity among all the segmented texts.

It can be understood that a plurality of segmented texts can be extracted from the text data corresponding to one piece of recorded audio data, and similarity detection between each segmented text and the corresponding standard script yields a plurality of similarities. Over the successive comparisons against the same standard script template, the similarity tends to rise gradually and then fall back; the text with the highest similarity before the fall is taken as the optimal segmentation result for that quality inspection point. The similarity of the optimal segmentation result is then judged, and if it is greater than the set threshold, the voice quality inspection point is judged to pass.

As can be seen from the above description, the method provided in the embodiment of the present application accurately segments the text data obtained by converting the audio data so that the segments correspond one to one with the standard scripts, performs similarity detection between each segmented text and the corresponding standard script template, and uses the highest of the resulting similarities for the quality inspection judgment, thereby improving quality inspection accuracy.
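The "rise, then fall back" selection described above can be sketched as follows. This is only one possible reading: the function name `best_before_fall` and the `patience` parameter (how many non-improving comparisons are tolerated before the scan stops) are assumptions introduced for this example and do not appear in the application.

```python
from typing import Sequence, Tuple


def best_before_fall(scores: Sequence[float], patience: int = 2) -> Tuple[int, float]:
    """Scan similarities in comparison order and return (index, value) of
    the highest score seen before the scores have fallen back for
    `patience` consecutive comparisons (hypothetical stopping rule)."""
    if not scores:
        raise ValueError("scores must not be empty")
    best_i = 0
    for i, s in enumerate(scores):
        if s > scores[best_i]:
            best_i = i                  # still rising: remember the new peak
        elif i - best_i >= patience:
            break                       # fallen back long enough: stop early
    return best_i, scores[best_i]
```

With `patience` set to at least the number of scores, the scan never stops early and the function reduces to taking the overall maximum, which is the behaviour stated in step S201.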

In order to further improve the comprehensiveness and accuracy of the quality inspection of the recorded data, an embodiment of the method provided by the present application offers a preferred manner of quality inspection of video data. Referring to fig. 1, the recorded data further includes video data, and the method further includes the following steps:

Step S400: performing frame extraction on the video data within a time period to obtain a plurality of pictures, wherein the time period is the time period in which the segmented text with the maximum similarity is located.

Step S500: detecting the pictures one by one, and judging whether the pictures meet the quality inspection requirements.

It can be understood that step S201 yields the highest similarity, and the segmented text with the highest similarity is the optimal segmentation result. The start and end time points of the optimal segmented text within the whole recording are obtained, frames are extracted from the video data in that time period and in a short period adjacent to it to form pictures, and quality inspection is then performed on the pictures. Different frame extraction frequencies can be used for different quality inspection points.

As can be seen from the above description, the method provided in the embodiment of the present application, having segmented the audio data to obtain the optimal segmented text, extracts frames only from the video data in the time period corresponding to that text and performs quality inspection on the resulting pictures. It is therefore unnecessary to extract frames from the entire recorded video and check them one by one, which reduces the time and resources consumed by quality inspection.
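As a non-limiting illustration of step S400, the sketch below computes only the timestamps at which frames would be extracted; decoding the video itself is left to whatever video library is used. The function name, the `margin_s` parameter (the "short period adjacent" to the segment) and the default values are assumptions made for this example.

```python
from typing import List, Optional


def frame_timestamps(start_s: float, end_s: float, fps: float = 1.0,
                     margin_s: float = 1.0,
                     video_len_s: Optional[float] = None) -> List[float]:
    """Timestamps (seconds) at which to extract frames for one quality
    inspection point, covering the optimal segment plus a small margin."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    lo = max(0.0, start_s - margin_s)           # do not go before the start
    hi = end_s + margin_s
    if video_len_s is not None:
        hi = min(hi, video_len_s)               # do not run past the end
    if hi < lo:
        return []
    step = 1.0 / fps                            # per-point extraction rate
    count = int((hi - lo) / step + 1e-9) + 1
    return [round(lo + k * step, 3) for k in range(count)]
```

Passing a different `fps` for each quality inspection point corresponds to using different frame extraction frequencies for different points.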

In order to further improve the accuracy of the segmentation of the recorded data, an embodiment of the method provided by the present application offers a preferred manner of data segmentation. Referring to fig. 2, step S110 specifically includes the following:

Step S111: determining at least one sliding window length according to the text data and the byte length of the standard script template.

It can be understood that parameter information is configured for each standard script, including the order of the standard script within the whole standard script template, the minimum length in sentences (X₁) and the maximum length in sentences (X₂), where the sentence lengths differ for each standard script.

Step S112: sliding over the text data with every sliding window length to obtain the plurality of segmented texts.

It can be understood that, when segmenting the text data, for the 1st standard script in order, following the sliding-window idea, X₁, X₁+1, ..., X₂ sentences are taken starting from the first sentence of the text to form a plurality of first segmented texts, and similarity detection is performed on them; then, starting from the second sentence of the text, X₁, X₁+1, ..., X₂ sentences are taken to form a plurality of second segmented texts, and similarity detection is performed on them; and so on, until the optimal segmentation result of the text data for the 1st standard script is obtained. For the 2nd standard script in order, segmentation proceeds in the same sliding-window manner starting from the sentence after the optimal segmentation result of the 1st standard script, and this continues until the last standard script has been processed.

As can be seen from the above description, the method provided in the embodiment of the present application obtains the optimal segmented text by means of sliding windows, so that the recorded data is accurately segmented and corresponds one to one with the standard scripts, thereby improving quality inspection accuracy and reducing the time and resources consumed by quality inspection.
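The sliding-window procedure of steps S111 and S112 can be sketched as follows. This is a minimal illustration under stated assumptions: the transcript is already split into sentences, each standard script is given as a hypothetical tuple (template text, X₁, X₂), `difflib.SequenceMatcher` stands in for the unspecified similarity measure, and the search is exhaustive rather than stopping once the similarity falls back.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence, Tuple


def text_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; the real measure is not specified."""
    return SequenceMatcher(None, a, b).ratio()


def match_in_order(sentences: List[str],
                   scripts: Sequence[Tuple[str, int, int]],
                   score: Callable[[str, str], float] = text_similarity
                   ) -> List[Tuple[int, int, float]]:
    """For each (template, x1, x2) in order, try windows of x1..x2 sentences
    from every start position and keep the most similar one. The search for
    the next script begins after the previous script's best window.
    Returns (begin, end_exclusive, similarity) per script."""
    spans: List[Tuple[int, int, float]] = []
    start = 0
    for template, x1, x2 in scripts:
        best = None
        for begin in range(start, len(sentences)):
            for length in range(x1, x2 + 1):
                end = begin + length
                if end > len(sentences):
                    break                       # window runs off the text
                s = score(" ".join(sentences[begin:end]), template)
                if best is None or s > best[2]:
                    best = (begin, end, s)
        if best is None:
            break                               # no sentences left to match
        spans.append(best)
        start = best[1]
    return spans
```

Starting each search after the previous best window keeps the segments in the same order as the standard scripts and prevents two scripts from claiming the same sentences.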

In order to further improve the accuracy of the quality inspection of the recorded data segments, in an embodiment of the quality inspection method for the recorded data segments provided in the present application, a preferred method for data segmentation is provided, and step S201 in the quality inspection method for the recorded data segments specifically includes the following steps:

step S2011: and adopting a conversational classifier to carry out similarity detection on each segmented text, wherein the conversational classifier outputs the similarity with the maximum value and the time start and end information of the segmented text corresponding to the similarity with the maximum value, and the time start and end information is used for anchoring the corresponding segmented text.

It can be understood that, the similarity of each segmented text and the specified standard dialect is respectively judged by adopting the dialect classifier, and the most similar text is taken as the optimal segmentation result, so as to obtain the starting and ending time points of the optimal segmentation result in the recorded data.

As can be seen from the above description, the recording data segment quality inspection method provided in the embodiment of the present application, on the basis of segmenting the audio data to obtain the optimal segmented text, obtains the video data within the time period that the optimal segmented text occupies in the whole recorded data, and thereby further improves the accuracy of video data segmentation.

In order to further improve the accuracy of detecting the similarity of the segmented text, in an embodiment of the method for detecting the quality of the segmented recorded data provided by the present application, a preferred method for training a conversational classifier is provided, and referring to fig. 1, the method for detecting the quality of the segmented recorded data specifically includes the following steps:

Step S001: training a machine learning model according to the standard dialect template to obtain a dialect classifier.

It can be understood that, according to the quality inspection requirements, the whole double-recording standard dialect template is segmented, so that each audio quality inspection point corresponds to one section of standard dialect in the template, and each video quality inspection point is likewise located in a specific section of standard dialect. For example, the standard dialect template corresponding to the self-introduction quality inspection point is 'I am XXX (name), a financial manager at the XXX outlet of the XXX institution, and this is my XXX certificate (such as a professional license or an identity card); please check it', and the video quality inspection point of whether the certificate is shown during the self-introduction corresponds to the same standard dialect template. Text labeling is performed on the standard dialect template corresponding to each quality inspection point, where the labeled content includes parts of speech, named entities (such as organization names, place names, and person names), replaceable words and sentences, deletable words and sentences, and the like. For the labeled dialect, a batch of text corpora is automatically generated while keeping the semantics unchanged; the generation methods include synonym replacement, replacement of organization, place, and person names, deletion of some non-key sentences, repetition or reordering of some sentences, and the like, with the aim of increasing the number of distinct samples and improving the subsequent model training effect. The generated text corpora are then used for training in a machine learning manner to generate a dialect classifier, which can judge the similarity between an input text and the standard dialect template corresponding to a given quality inspection point.
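The corpus generation step can be illustrated with a small sketch covering two of the listed methods, synonym replacement and deletion of non-key sentences. The function `augment`, the 50% probabilities, the synonym table, and the sample template are all assumptions made for illustration; the application does not specify how the labeled content is stored or sampled.

```python
import random
from typing import Dict, List


def augment(
    sentences: List[str],
    synonyms: Dict[str, List[str]],
    deletable: List[int],
    rng: random.Random,
    n: int,
) -> List[str]:
    """Generate n variants of a labeled template.

    `synonyms` maps a replaceable phrase to its alternatives, and
    `deletable` lists the indices of sentences labeled as non-key.
    """
    variants = []
    for _ in range(n):
        # Non-key sentences are dropped with an assumed probability of 0.5.
        kept = [
            s for i, s in enumerate(sentences)
            if i not in deletable or rng.random() < 0.5
        ]
        out = []
        for s in kept:
            for phrase, alternatives in synonyms.items():
                # Replaceable phrases are swapped with an assumed probability of 0.5.
                if phrase in s and rng.random() < 0.5:
                    s = s.replace(phrase, rng.choice(alternatives))
            out.append(s)
        variants.append(" ".join(out))
    return variants


# Hypothetical labeled template: the third sentence is marked deletable.
template = [
    "I am the financial manager of this outlet.",
    "This is my certificate.",
    "Please check it.",
]
synonyms = {
    "financial manager": ["wealth manager", "account manager"],
    "check": ["review"],
}
variants = augment(template, synonyms, [2], random.Random(0), 5)
```

Named-entity replacement and sentence repetition or reordering could be added in the same way, each driven by the corresponding labels.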

From the above description, the recorded data segment quality inspection method provided in the embodiment of the present application trains the standard speech template and the text corpus in a machine learning manner to generate the speech classifier, and performs similarity detection on the recorded data by using the speech classifier, so as to further improve the reliability of the recorded data quality inspection result.

In terms of software, in order to solve the problems that the existing recorded data quality inspection method often needs to be manually rechecked by extra manpower, the service value of a quality inspection system is reduced, and the like, the present application provides an embodiment of a recorded data segmented quality inspection system for executing all or part of the content in the recorded data segmented quality inspection method, and referring to fig. 3, the recorded data segmented quality inspection system specifically includes the following contents:

The text conversion module 10: converting the audio data into text data.

The similarity comparison module 20: performing similarity detection on the text data and a preset standard dialect template, and outputting a similarity.

The similarity determination module 30: comparing the similarity with a set threshold, and if the similarity is greater than the set threshold, judging that the audio data passes quality inspection.

It can be understood that the text data is the text information generated, by voice conversion technology, from the audio in the double-recording file, and it records the words spoken by the customer manager and the customer during the double-recording process. The conversion uses technologies such as voice recognition and can be implemented based on the prior art. The text conversion module 10 converts the audio data into text data through voice recognition technology, the similarity comparison module 20 compares the text data with a preset standard dialect template to obtain the similarity between the two, and the similarity determination module 30 judges whether the audio data passes quality inspection: if the similarity is greater than a set threshold, the audio data is judged to pass.
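The threshold decision made by the similarity determination module can be sketched as below. The names `char_ratio` and `inspect_audio` are hypothetical, and the `difflib` ratio is again only a placeholder for whatever similarity measure is actually used; the sketch makes the measure a parameter so that a trained classifier could be substituted.

```python
from difflib import SequenceMatcher
from typing import Callable


def char_ratio(text: str, template: str) -> float:
    # Placeholder similarity in [0, 1]; not the classifier of the application.
    return SequenceMatcher(None, text, template).ratio()


def inspect_audio(
    text: str,
    template: str,
    threshold: float,
    similarity: Callable[[str, str], float] = char_ratio,
) -> bool:
    """Return True when the similarity is strictly greater than the threshold."""
    return similarity(text, template) > threshold
```

Note that the comparison is strict, matching the wording "greater than the set threshold"; a similarity exactly equal to the threshold is therefore not judged as passing in this sketch.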

The embodiment of the segmented quality inspection system for recorded data provided by the present application may be specifically configured to execute the processing procedure of the embodiment of the segmented quality inspection method for recorded data described above; its functions are not described again here, and reference may be made to the detailed description of the method embodiment.

As can be seen from the above description, the recording data segment quality inspection system provided in the embodiment of the present application converts audio data into text data; carrying out similarity detection on the text data and a preset standard dialect template, and outputting a similarity; comparing the output similarity with a set threshold, and if the similarity is greater than the set threshold, judging that the quality of the audio data is qualified; the audio data is converted into text data to be subjected to similarity detection with a preset standard speech template, manual recheck is not needed to be additionally input, and the service value of the quality inspection system is improved.

In order to further improve the accuracy of the quality inspection of the recorded data, in an embodiment of the system for quality inspection of the recorded data segments provided in the present application, a preferred method for recording the data segments is provided, referring to fig. 3, the system for quality inspection of the recorded data segments further includes a preprocessing module 11, where the preprocessing module 11 is specifically configured to execute the following steps:

It can be understood that the recorded audio data is converted into text through voice recognition technology, and the start and stop time points of each sentence within the whole recorded audio are obtained. The preprocessing module 11 segments the text data according to the byte length of the standard dialect template and extracts a plurality of segmented texts; the similarity between each of these segmented texts and the corresponding standard dialect is then judged, and if the similarity is greater than a set threshold, the audio data is judged to pass quality inspection.

As can be seen from the above description, the segmented quality inspection system for recorded data provided in the embodiment of the present application accurately segments text data obtained by converting audio data and corresponds to a standard dialect one by one, and then performs quality inspection only in segmented voices corresponding to the standard dialect, thereby improving quality inspection accuracy and reducing time consumption and resource consumption for quality inspection.

In order to further improve the accuracy of the quality inspection of the recorded data, in an embodiment of the system for quality inspection of the recorded data segments provided in the present application, a preferred manner of the quality inspection of the recorded data is provided, referring to fig. 2, a similarity comparison module 20 in the system for quality inspection of the recorded data segments specifically includes a similarity detection unit 201, and the similarity detection unit 201 is configured to perform the following steps:

As can be seen from the above description, the recording data segmentation quality inspection system provided in the embodiment of the present application accurately segments text data obtained by converting audio data and corresponds to standard dialogues one to one, performs similarity detection on each segmented text and a corresponding standard dialogues template, and selects the highest similarity from a plurality of similarities to perform quality inspection judgment, thereby improving quality inspection accuracy.

In order to further improve the comprehensiveness and accuracy of the quality inspection of the recorded data, in an embodiment of the segmented quality inspection system for recorded data provided by the present application, a preferred mode of quality inspection of video data is provided, referring to fig. 3, where the recorded data further includes video data, and the segmented quality inspection system for recorded data further specifically includes the following contents:

The video processing module 40: performing frame extraction on the video data within a time period to obtain a plurality of pictures, wherein the time period is the time period of the segmented text corresponding to the maximum similarity;

The picture quality inspection module 50: detecting the pictures one by one and judging whether the pictures meet the quality inspection requirements.

It can be understood that the similarity detection unit obtains the similarity with the highest score, and the segmented text corresponding to that similarity is the optimal segmentation result. The start and stop time points of the optimal segmented text in the whole recorded data are obtained, the video processing module 40 performs frame extraction on the video data within that time period and the short adjacent time periods to form pictures, and the picture quality inspection module 50 performs quality inspection on the pictures. Different frame extraction frequencies can be adopted for different quality inspection points.
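How the frame extraction frequency translates into the instants at which pictures are taken can be illustrated with the following sketch. The function `frame_timestamps` and its parameters are assumptions for illustration; the actual decoding of video frames (for example with a video library) is outside the scope of the sketch.

```python
from typing import List


def frame_timestamps(start: float, end: float, frames_per_second: float) -> List[float]:
    """Return the instants (in seconds) at which frames would be extracted
    from the interval [start, end] at the given extraction frequency."""
    if frames_per_second <= 0 or end < start:
        raise ValueError("invalid interval or extraction frequency")
    step = 1.0 / frames_per_second
    # A small epsilon guards against floating-point rounding at the boundary.
    count = int((end - start) * frames_per_second + 1e-9) + 1
    return [round(start + i * step, 3) for i in range(count)]


# Two hypothetical quality inspection points over the same interval,
# sampled at different frequencies.
dense = frame_timestamps(10.0, 12.0, 2.0)
sparse = frame_timestamps(10.0, 12.0, 0.5)
```

A quality inspection point that checks a brief action, such as showing a certificate, might use the denser setting, while a point that checks a static scene might use the sparser one.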

As can be seen from the above description, the recording data segmentation quality inspection system provided in the embodiment of the present application performs frame extraction and picture formation on video data in a time period corresponding to an optimal segmentation text on the basis of segmenting audio data to obtain the optimal segmentation text, and performs quality inspection on the picture, so that it is not necessary to perform frame extraction and picture formation in the entire recorded video data and then perform one-by-one inspection, thereby reducing the time consumption and resource consumption for quality inspection.

In order to further improve the accuracy of the segmentation of the extracted recording data, in an embodiment of the segmented quality inspection system for the recorded data provided by the present application, a preferred manner of data segmentation is provided, referring to fig. 4, where the preprocessing module 11 in the segmented quality inspection system for the recorded data further includes the following contents:

The sliding window unit 101: determining at least one sliding window length based on the text data and the byte length of the standard dialect template.

It will be appreciated that the sliding window unit 101 is used to configure the parameter information for each standard dialect segment, including the order of the standard dialect segment in the entire standard dialect template, the shortest sentence length (X₁), and the longest sentence length (X₂), wherein the sentence lengths of different standard dialect segments are different.

The audio segmentation unit 102: performing sliding processing on the text data by adopting all of the sliding window lengths to obtain the plurality of segmented texts.

It can be understood that, when the audio segmentation unit 102 segments the text data, for the 1st standard dialect in sequence, X₁, X₁+1, ..., X₂ sentences are respectively taken starting from the first sentence of the text, according to the sliding-window idea, to form a plurality of first segmented texts, and similarity detection is carried out on the obtained first segmented texts; then, starting from the second sentence of the text, X₁, X₁+1, ..., X₂ sentences are respectively taken to form a plurality of second segmented texts, and similarity detection is carried out on the obtained second segmented texts; the operation for the 1st standard dialect continues until the optimal segmentation result of that standard dialect in the text data is obtained. For the 2nd standard dialect in sequence, segmentation starts from the sentence following the optimal segmentation result of the 1st standard dialect and proceeds according to the same sliding-window idea, and so on until the last standard dialect is completed.
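The sequential handling of several standard dialects, each starting after the optimal segment of the previous one, can be sketched as follows. The `WindowConfig` record and `segment_all` function are hypothetical names, the `difflib` ratio stands in for the dialect classifier, and the sketch assumes an exhaustive scan of all remaining sentences because the application does not state when the search for an optimal segment stops.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Tuple


@dataclass
class WindowConfig:
    template: str  # standard dialect for one quality inspection point
    x1: int        # shortest window, in sentences
    x2: int        # longest window, in sentences


def similarity(a: str, b: str) -> float:
    # Placeholder for the trained dialect classifier.
    return SequenceMatcher(None, a, b).ratio()


def segment_all(
    sentences: List[str], configs: List[WindowConfig]
) -> List[Tuple[int, int, float]]:
    """For each config, in order, return (start, end_exclusive, similarity)
    of the best window found after the previous config's best window."""
    results = []
    cursor = 0
    for cfg in configs:
        best = None
        for start in range(cursor, len(sentences)):
            for length in range(cfg.x1, cfg.x2 + 1):
                end = start + length
                if end > len(sentences):
                    break
                score = similarity(" ".join(sentences[start:end]), cfg.template)
                if best is None or score > best[2]:
                    best = (start, end, score)
        if best is None:
            break  # no sentences left for the remaining templates
        results.append(best)
        cursor = best[1]  # the next template starts after this segment
    return results


# Hypothetical transcript and two standard dialects in order.
sentences = [
    "good morning",
    "I am your financial manager",
    "this is my certificate",
    "the product carries risk",
    "please sign here",
]
configs = [
    WindowConfig("I am your financial manager this is my certificate", 1, 2),
    WindowConfig("the product carries risk", 1, 2),
]
result = segment_all(sentences, configs)
```

Restricting each search to the sentences after the previous optimal segment is what keeps the segments in one-to-one, in-order correspondence with the standard dialects.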

As can be seen from the above description, the recording data segmentation quality inspection system provided in the embodiment of the present application obtains an optimal segmentation text based on a sliding window mode, and realizes accurate segmentation of the recording data and one-to-one correspondence with a standard dialect, thereby improving quality inspection accuracy and reducing time consumption and resource consumption for quality inspection.

In order to further improve the accuracy of the quality inspection of the recorded data segments, in an embodiment of the system for quality inspection of the recorded data segments provided in the present application, a preferred manner of data segmentation is provided, in which the similarity detection unit 201 in the system for quality inspection of the recorded data segments specifically includes a conversational classifier component, and the conversational classifier component is configured to execute the following steps:

It can be understood that the required number of segments and the standard dialect template corresponding to each segment are determined according to the quality inspection points; the standard dialect template of each segment is subjected to text labeling and text enhancement, and the enhanced dialect segments are used for training in a machine learning manner to form a dialect classifier, which can judge the similarity between an input text and the standard dialect of a given segment. For the text data, a suitable length range is set according to the length of the standard dialect template corresponding to the quality inspection point; a window is slid from the beginning of the text to extract a plurality of segmented texts, the similarity of each to the standard dialect corresponding to the specified quality inspection point is judged, and the segmented text with the maximum similarity is taken as the optimal segmentation result, thereby obtaining the start and end time points of the quality inspection point in the double recording.

As can be seen from the above description, the recording data segmentation quality inspection system provided in the embodiment of the present application obtains video data in the time period according to the time period of the optimal segmentation text in the whole recording data on the basis of segmenting audio data to obtain the optimal segmentation text, and further improves the accuracy of video data segmentation.

In order to further improve the accuracy of detecting the similarity of the segmented text, in an embodiment of the segmented quality inspection system for recorded data provided by the present application, a preferred method for training a conversational classifier is provided, referring to fig. 3, in which the segmented quality inspection system for recorded data further specifically includes a conversational classifier training module 60, and the conversational classifier training module 60 is configured to execute the following steps:

It will be appreciated that the dialect classifier training module 60 generates the dialect classifier through machine learning. In the actual quality inspection task, the dialect classifier participates in the quality inspection process, while the training module does not. Referring to fig. 5, the dialect classifier training module 60 specifically includes the following: a dialect configuration unit 61, a dialect labeling unit 62, a dialect enhancement unit 63, and a dialect classification training unit 64. The dialect configuration unit 61 segments the whole double-recording standard dialect template according to the quality inspection requirements, so that each audio quality inspection point corresponds to one section of standard dialect in the template, and each video quality inspection point is likewise located in a specific section of standard dialect. For example, the standard dialect template corresponding to the self-introduction quality inspection point is 'I am XXX (name), a financial manager at the XXX outlet of the XXX institution, and this is my XXX certificate (such as a professional license or an identity card); please check it', and the video quality inspection point of whether the certificate is shown during the self-introduction corresponds to the same standard dialect template. The dialect labeling unit 62 performs text labeling on the standard dialect template corresponding to each quality inspection point, where the labeled content includes parts of speech, named entities (such as organization names, place names, and person names), replaceable words and sentences, deletable words and sentences, and the like. The dialect enhancement unit 63 automatically generates a batch of text corpora from the dialect labeled by the dialect labeling unit 62 while keeping the semantics unchanged; the generation methods include synonym replacement, replacement of organization, place, and person names, deletion of some non-key sentences, repetition or reordering of some sentences, and the like, with the aim of increasing the number of distinct samples and improving the subsequent model training effect. The dialect classification training unit 64 trains on the generated text corpora in a machine learning manner to generate a dialect classifier, which can judge the similarity between an input text and the standard dialect template corresponding to a given quality inspection point.

From the above description, the recorded data segmented quality inspection system provided in the embodiment of the present application trains the standard speech template and the text corpus in a machine learning manner to generate the speech classifier, and performs similarity detection on the recorded data by using the speech classifier, so as to further improve the reliability of the recorded data quality inspection result.

The recording data segment quality inspection system is described below with a specific embodiment, referring to fig. 6.

Step S6101: converting the double-recording audio information into characters by a voice recognition technology;

Step S6102: the preprocessing module acquires the relevant parameters from the sliding window unit and sequentially extracts the possible segmented texts of each quality inspection point in a sliding-window manner;

Step S6103: the similarity comparison module detects the similarity between the input segmented text and the standard dialect template corresponding to the quality inspection point;

Step S6104: for a given quality inspection point, the dialect classifier component takes the segmented text with the highest similarity as the optimal segmentation, and informs the preprocessing module to stop extracting possible segmented texts for this quality inspection point and to start the segmentation of the next quality inspection point. For a video quality inspection point, parameters such as the forward and backward extension durations are obtained from the audio segmentation unit and are passed, together with the optimal segmentation information of the quality inspection point, to the voice quality inspection and the video quality inspection.

Step S6105: the similarity determination module compares the similarity of the optimal segmentation of the quality inspection point with a preset threshold; if the similarity is greater than the preset threshold, the voice quality inspection point is considered qualified, and if the similarity is smaller than the preset threshold, the voice quality inspection point is considered unqualified.

Step S6106: the video processing module performs frame extraction on the video within the time period determined by the start and stop time points of the optimal segment in the double-recording video and the forward and backward extension durations, and the picture quality inspection module detects, one by one, whether the resulting pictures meet the quality inspection requirements. Setting the forward and backward extension durations prevents the video content from deviating from the audio start and stop times when the audio and video are not synchronized, which improves the quality inspection accuracy.
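The widening of the frame extraction period by the forward and backward extension durations can be sketched as below. The function `extended_window` is a hypothetical name, and clamping the result to the bounds of the video is an added assumption rather than something stated in the application.

```python
from typing import Tuple


def extended_window(
    start: float,
    end: float,
    extend_before: float,
    extend_after: float,
    video_duration: float,
) -> Tuple[float, float]:
    """Widen [start, end] by the extension durations (all in seconds),
    clamped to the bounds of the video (clamping is an assumption)."""
    return (
        max(0.0, start - extend_before),
        min(video_duration, end + extend_after),
    )


# Hypothetical optimal segment near the end of a 21-second video.
near_end = extended_window(12.0, 20.0, 1.5, 2.0, 21.0)
# Hypothetical optimal segment near the beginning of a 60-second video.
near_start = extended_window(0.5, 3.0, 1.0, 1.0, 60.0)
```

The extension gives the picture quality inspection a margin on both sides of the audio-derived interval, so that a certificate shown slightly before or after the spoken sentence still falls within the extracted frames.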

In order to solve the problems that the existing recorded data quality inspection method usually needs additional manpower input for manual review, reduces the service value of a quality inspection system, and the like, the application provides an embodiment of an electronic device for realizing all or part of the content in the recorded data segmented quality inspection method, and the electronic device specifically comprises the following contents:

fig. 7 is a schematic block diagram of a system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in fig. 7, the electronic device 9600 can include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Notably, this fig. 7 is exemplary; other types of structures may also be used in addition to or in place of the structure to implement telecommunications or other functions.

In one embodiment, the recorded data segment quality inspection function may be integrated into the central processor. Wherein the central processor may be configured to control:

step S100: the audio data is converted into text data.

As can be seen from the above description, the electronic device provided in the embodiment of the present application converts audio data into text data; carrying out similarity detection on the text data and a preset standard dialect template, and outputting a similarity; comparing the output similarity with a set threshold, and if the similarity is greater than the set threshold, judging that the quality of the audio data is qualified; the audio data is converted into text data to be subjected to similarity detection with a preset standard speech template, manual recheck is not needed to be additionally input, and the service value of the quality inspection system is improved. On the basis of segmenting audio data to obtain an optimal segmented text, frame extraction and picture forming are carried out on video data in a time period corresponding to the optimal segmented text, quality inspection is carried out on pictures, frame extraction and picture forming are not needed to be carried out on the video data recorded in the whole segment, and then the video data are inspected one by one, so that the time consumption and resource consumption of quality inspection are reduced.

In another embodiment, the segmented quality inspection system for the recorded data may be configured separately from the central processor 9100, for example, the segmented quality inspection system for the recorded data may be configured as a chip connected to the central processor 9100, and the segmented quality inspection function for the recorded data is realized by the control of the central processor.

As shown in fig. 7, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 also does not necessarily include all of the components shown in fig. 7; further, the electronic device 9600 may further include components not shown in fig. 7, which may be referred to in the art.

As shown in fig. 7, a central processor 9100, sometimes referred to as a controller or operational control, can include a microprocessor or other processor device and/or logic device, which central processor 9100 receives input and controls the operation of the various components of the electronic device 9600.

The memory 9140 can be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or another suitable device. The memory 9140 may store the information relating to the failure, and may further store a program for processing that information. The central processor 9100 can execute the program stored in the memory 9140 to realize information storage, processing, or the like.

The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. Power supply 9170 is used to provide power to electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, an LCD display, but is not limited thereto.

The memory 9140 can be a solid state memory, e.g., Read Only Memory (ROM), Random Access Memory (RAM), a SIM card, or the like. It may also be a memory that retains information even when powered off, can be selectively erased, and can be provided with more data; an example of such a memory is sometimes called an EPROM or the like. The memory 9140 could also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer). The memory 9140 may include an application/function storage portion 9142, which is used for storing application programs and function programs, or a flow for executing operations of the electronic device 9600 by the central processor 9100.

The memory 9140 can also include a data store 9143, the data store 9143 being used to store data, such as contacts, digital data, pictures, sounds, and/or any other data used by an electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers for the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, contact book applications, etc.).

The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. The communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, which may be the same as in the case of a conventional mobile communication terminal.

Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and receive audio input from the microphone 9132, thereby implementing ordinary telecommunications functions. The audio processor 9130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100, thereby enabling recording locally through the microphone 9132 and enabling locally stored sounds to be played through the speaker 9131.

An embodiment of the present application further provides a computer-readable storage medium capable of implementing all the steps in the method for quality inspection of the segmented recorded data in the foregoing embodiment, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the computer program implements all the steps of the method for quality inspection of the segmented recorded data, where the execution subject of the method is a server or a client, for example, when the processor executes the computer program, the processor implements the following steps:

step S100: the audio data is converted into text data.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The principle and the implementation mode of the invention are explained by applying specific embodiments in the invention, and the description of the embodiments is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A method for segmented quality inspection of recorded data, characterized in that the recorded data comprises audio data, and the method comprises the following steps:

converting the audio data into text data;

2. The method of claim 1, wherein the method further comprises:

3. The method of claim 2, wherein the recorded data further comprises video data, and the method further comprises:

4. The method of claim 2, wherein generating a plurality of segmented texts according to the text data and the byte length of the standard speech template comprises:

5. The method of claim 4, wherein each of the sliding windows has a different length.

6. The method for segment quality inspection of recorded data according to claim 2, wherein the performing similarity detection on each segmented text comprises:

7. The method of claim 1, wherein the method further comprises:

8. A system for segment quality inspection of recorded data, wherein the recorded data comprises audio data, the system comprising:

a text conversion module, configured to convert the audio data into text data;

9. The system for segment quality inspection of recorded data according to claim 8, further comprising:

the similarity contrast module comprises:

10. The segmented quality inspection system for recorded data as claimed in claim 9, wherein said recorded data further comprises video data, said segmented quality inspection system further comprising:

11. The system of claim 9, wherein the preprocessing module comprises:

12. The segmented quality inspection system for recorded data according to claim 9, wherein the similarity detection unit comprises:

13. The system for segment quality inspection of recorded data according to claim 9, further comprising:

14. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of segment quality inspection of recorded data according to any one of claims 1 to 7 when executing the program.

15. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of segment quality inspection of recorded data according to any one of claims 1 to 7.