CN117056453A - Data auditing method and device, electronic equipment and computer storage medium - Google Patents

Data auditing method and device, electronic equipment and computer storage medium

Info

Publication number
CN117056453A
Authority
CN
China
Prior art keywords
answer
data
text
candidate
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210469303.2A
Other languages
Chinese (zh)
Inventor
王博
岳烈骥
孙伟
朱世军
侯普
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Beijing Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Beijing Co Ltd
Priority to CN202210469303.2A
Publication of CN117056453A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to the technical field of data auditing and provides a data auditing method, a device, electronic equipment and a computer storage medium. The method comprises the following steps: performing voice transcription on a voice file to be processed through a neural network voice recognition framework of deep full-sequence convolution to obtain a transcribed text; processing the transcribed text by combining a Skip-Gram algorithm with text matching to obtain the question data to be analyzed and the question type of the transcribed text; determining a candidate answer set from the question data to be analyzed and the question type through a named entity recognition algorithm; and determining a final answer according to the metric value of each candidate answer in the candidate answer set, and auditing the final answer against the manual processing questionnaire answer. The data auditing method provided by the embodiment of the application accurately determines the final answer of the voice file to be processed and intelligently audits it against the manual processing questionnaire answer, which reduces the interference of human subjective factors and improves the accuracy of data auditing.

Description

Data auditing method and device, electronic equipment and computer storage medium
Technical Field
The present application relates to the field of data auditing technologies, and in particular, to a data auditing method, a data auditing device, an electronic device, and a computer storage medium.
Background
The existing data auditing method is as follows: customer complaints are received by the customer service centers of telecom operators, customer service staff follow a fixed questionnaire specification when questioning complaining customers, and the customers score the operator's services. The call recordings are then listened to manually and checked by hand against the corresponding questionnaires. The existing method therefore requires manual spot checks of the recordings: a person listens to each sampled recording, judges it, and manually compares the questionnaire answers extracted from the recording with the manually recorded answers. Because manual work is subjective, errors occur easily and the accuracy of data auditing is low.
Disclosure of Invention
The application provides a data auditing method, a device, electronic equipment and a computer storage medium, which aim to improve the accuracy of data auditing.
In a first aspect, the present application provides a data auditing method, including:
performing voice transcription on the voice file to be processed through a neural network voice recognition framework of deep full-sequence convolution to obtain a transcription text;
processing the transcribed text by combining a Skip-Gram algorithm with text matching to obtain to-be-analyzed problem data and problem types of the transcribed text;
combining the problem data to be analyzed and the problem types through a named entity recognition algorithm to determine a candidate answer set;
and determining a final answer according to the measurement value of each candidate answer in the candidate answer set, and carrying out answer auditing by combining the final answer with a manual processing questionnaire answer.
In one embodiment, the processing the transcribed text by the Skip-Gram algorithm in combination with text matching to obtain question data and question types to be analyzed of the transcribed text includes:
performing data fault tolerance processing, punctuation mark removal processing and data standardization processing on the transcribed text to obtain a voice text of the transcribed text;
learning the voice text and the template problem data thereof through a language model of a Skip-Gram algorithm to obtain template problem semantic feature information and source problem semantic feature information;
and obtaining to-be-analyzed problem data and problem types of the transcribed text based on the template problem semantic feature information and the source problem semantic feature information.
The obtaining the to-be-analyzed problem data and the problem type of the transcribed text based on the template problem semantic feature information and the source problem semantic feature information comprises the following steps:
calculating semantic similarity values of the voice text problem data and the template problem data in the transcribed text through the template problem semantic feature information and the source problem semantic feature information;
comparing the semantic similarity value with a similarity threshold value to obtain a comparison result;
and outputting the problem data to be analyzed and the problem type of the transcribed text according to the comparison result.
The determining a final answer according to the measurement value of each candidate answer in the candidate answer set comprises the following steps:
calculating the metric value of each candidate answer in the candidate answer set, and comparing the magnitude of the metric value of each candidate answer to obtain a comparison result;
and determining the measurement value with the largest value in the measurement values of the candidate answers according to the comparison result, and determining the candidate answer with the largest value as the final answer.
The step of determining a candidate answer set by combining the question data to be analyzed and the question type through a named entity recognition algorithm comprises the following steps:
determining each keyword information in the problem data to be analyzed, and determining query conditions according to each keyword information;
returning candidate segment texts of the problem data to be analyzed according to the query conditions, and performing text filtering on the candidate segment texts to obtain target problem texts;
and determining answer types according to the question types, and determining the candidate answer set by combining the target question text and the answer types through the named entity recognition algorithm.
The step of determining the candidate answer set by combining the target question text and the answer type through the named entity recognition algorithm comprises the following steps:
identifying the target question text through the named entity identification algorithm according to the answer type to obtain each candidate entity of the target question text;
and determining each candidate entity as each candidate answer, and collecting each candidate answer to obtain the candidate answer set.
And the step of carrying out answer auditing by combining the final answers with manual processing questionnaire answers comprises the following steps:
the final answer is subjected to answer consistency comparison with the manual processing questionnaire answer, and a comparison result is obtained;
if the comparison result is that the final answer is consistent with the answer of the manual processing questionnaire answer, determining that the final answer is correct;
and if the comparison result is that the answer of the final answer is inconsistent with the answer of the manual processing questionnaire answer, determining that the final answer is wrong.
In a second aspect, the present application provides a data auditing apparatus comprising:
the voice transcription module is used for carrying out voice transcription on the voice file to be processed through the neural network voice recognition framework of the deep full-sequence convolution to obtain a transcription text;
the processing module is used for processing the transcribed text through Skip-Gram algorithm and text matching, and obtaining to-be-analyzed problem data and problem types of the transcribed text;
the determining module is used for determining a candidate answer set by combining the to-be-analyzed problem data and the problem type through a named entity recognition algorithm;
and the determining auditing module is used for determining a final answer according to the measurement value of each candidate answer in the candidate answer set, and carrying out answer auditing by combining the final answer with a manual processing questionnaire answer.
In a third aspect, the present application also provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the data auditing method of the first aspect when executing the program.
In a fourth aspect, the present application also provides a non-transitory computer readable storage medium comprising a computer program which, when executed by the processor, implements the data auditing method of the first aspect.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by the processor, implements the data auditing method of the first aspect.
According to the data auditing method, the device, the electronic equipment and the computer storage medium provided by the application, in the process of data auditing, the final answer of the voice file to be processed is accurately determined through the neural network voice recognition framework of deep full-sequence convolution, the Skip-Gram algorithm, text matching, the named entity recognition algorithm and the metric values of the candidate answers, and the final answer is intelligently audited against the manual processing questionnaire answer, so that the interference of human subjective factors is reduced and the accuracy of data auditing is improved.
Drawings
In order to illustrate the technical solutions of the present application more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show some embodiments of the present application, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a data auditing method according to the present application;
FIG. 2 is a second flow chart of the data auditing method according to the present application;
FIG. 3 is a third flow chart of the data auditing method according to the present application;
FIG. 4 is a fourth flow chart of the data auditing method according to the present application;
FIG. 5 is a schematic diagram of a data auditing apparatus according to the present application;
FIG. 6 is a schematic structural diagram of an electronic device provided by the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The data auditing method, the device, the electronic equipment and the computer storage medium provided by the application are described with reference to fig. 1 to 6.
The embodiments of the present application provide embodiments of a data auditing method. It should be noted that although a logical sequence is shown in the flow charts, in some cases the steps shown or described may be performed in an order different from the one shown or described here.
The embodiments of the application take an electronic device as the execution subject, and take a data auditing system as one form of the electronic device, by way of example; the embodiments are not limited to this.
Referring to fig. 1, fig. 1 is a flow chart of a data auditing method according to the present application. The data auditing method provided by the embodiment of the application comprises the following steps:
Step S10, performing voice transcription on the voice file to be processed through a neural network voice recognition framework of deep full-sequence convolution to obtain a transcribed text.
When the user carries out service consultation or service complaints through the telephone, the data auditing system can generate a voice file to be processed of the service consultation or service complaints of the user.
Further, the data auditing system performs voice transcription on the voice file to be processed through a neural network voice recognition framework (Deep Fully Convolutional Neural Network, DFCNN) of deep full-sequence convolution to obtain a transcription text of the voice file to be processed. The neural network voice recognition framework based on deep full-sequence convolution directly models the whole sentence of voice signals by using a large number of convolution layers, and the voice transcription service consists of a voice transcription engine, a corresponding acoustic model and a language model.
Step S20, processing the transcribed text by combining a Skip-Gram algorithm with text matching to obtain to-be-analyzed problem data and problem types of the transcribed text.
Further, the data auditing system predicts the context of words in the transcribed text through the Skip-Gram algorithm. Because the neural network that trains the language model can only receive numerical input, a word string cannot be used as input directly. Specifically, a vocabulary is first constructed from a training document, and each word is then encoded to meet the input requirements of the neural network, so that an input word is represented as a one-hot vector. If the size of the vocabulary is N, the vector corresponding to each word has N elements: the position corresponding to the word is set to 1 and all other positions are set to 0. The output of the neural network is a single vector that also contains N elements, where the value of each element represents the probability that a randomly selected word near the input word is the corresponding word in the vocabulary. In this way the location of the question data to be analyzed in the transcribed text is determined.
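The encoding described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the window size and the sample sentence are assumptions, and no neural network is trained here. It only shows how a vocabulary of size N yields N-element one-hot vectors and which (input word, context word) pairs a Skip-Gram model would be trained on.

```python
from typing import Dict, List, Tuple


def build_vocab(tokens: List[str]) -> Dict[str, int]:
    """Assign each distinct word an index in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def one_hot(word: str, vocab: Dict[str, int]) -> List[int]:
    """Return an N-element vector with 1 at the word's index and 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


def skipgram_pairs(tokens: List[str], window: int = 1) -> List[Tuple[str, str]]:
    """List (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical sample sentence, already split into words.
tokens = ["please", "rate", "our", "broadband", "service"]
vocab = build_vocab(tokens)
print(one_hot("rate", vocab))
print(skipgram_pairs(tokens, window=1))
```

A trained model would take the one-hot vector of the center word as input and output an N-element probability vector over possible context words.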
Text matching is one of the core problems in natural language understanding. It means comparing two sentences and judging the relationship between them, and it can be applied to a large number of natural language processing tasks, such as information retrieval, question-answering systems, duplicate-question detection, dialogue systems and machine translation. These tasks can largely be abstracted into text matching problems: information retrieval can be reduced to matching search terms against document resources, a question-answering system to matching questions against candidate answers, duplicate-question detection to matching two synonymous sentences, a dialogue system to matching the previous utterance against a reply, and machine translation to matching sentences in two languages.
Therefore, the data auditing system first preprocesses the transcribed text and inputs the preprocessed transcribed text and the template identification ID into a language model based on the Skip-Gram algorithm. The data auditing system then performs text matching between the preprocessed transcribed text and the template question data corresponding to the template identification ID through the language model of the Skip-Gram algorithm, and outputs the question data to be analyzed and the question type of the transcribed text, as described in steps S201 to S203. The question types include, but are not limited to, entity type questions (Entity), description type questions (Description) and yes/no type questions (YesNo). For entity type questions, the answer is typically a single definite answer, for example: "How many points would you give the Chinese telecom on-site broadband installation service?" For yes/no type questions, the answer is simple, either yes or no, for example: "Have you consulted or complained to Chinese telecom about a home broadband issue in the last half year?" For description type questions, the answer is generally longer and may be a summary of several sentences, as is typical of how/why questions, for example: "What suggestions do you have for our service?"
Further, in the vertical domain of questionnaires, the wording of the questions is relatively fixed and standardized, so the question types are mapped to IDs according to a fixed template. Each question corresponds to a unique question ID, so the question type of a question can be obtained from its question ID.
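The ID-to-type mapping can be pictured as a simple lookup table. The sketch below is an assumption for illustration: the IDs "Q1" to "Q3", the table name and the question wording are hypothetical; only the three type names (Entity, Description, YesNo) come from the text.

```python
from enum import Enum


class QuestionType(Enum):
    ENTITY = "Entity"
    DESCRIPTION = "Description"
    YES_NO = "YesNo"


# Hypothetical template table: IDs and wording are illustrative only.
TEMPLATE_QUESTIONS = {
    "Q1": ("How many points would you give the installation service?",
           QuestionType.ENTITY),
    "Q2": ("Have you complained about home broadband in the last half year?",
           QuestionType.YES_NO),
    "Q3": ("What suggestions do you have for our service?",
           QuestionType.DESCRIPTION),
}


def question_type_for(question_id: str) -> QuestionType:
    """Look up the question type from the unique question ID."""
    return TEMPLATE_QUESTIONS[question_id][1]


print(question_type_for("Q2").value)
```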
Step S30, determining a candidate answer set by combining the to-be-analyzed question data and the question type through a named entity recognition algorithm.
Further, the data auditing system determines candidate segment texts for the question data to be analyzed according to the keyword information in the question data to be analyzed. The data auditing system then performs text filtering on the candidate segment texts to obtain a target question text. Finally, the data auditing system determines the answer type according to the question type and determines the candidate answer set by combining the target question text and the answer type through a named entity recognition algorithm, as described in steps S301 to S303.
Step S40, determining a final answer according to the measurement value of each candidate answer in the candidate answer set, and carrying out answer auditing by combining the final answer with a manual processing questionnaire answer.
Further, the data auditing system calculates the metric value of each candidate answer in the candidate answer set and compares the metric values to obtain a comparison result. The data auditing system then determines, according to the comparison result, the candidate answer with the largest metric value and takes it as the final answer. Finally, the data auditing system compares the final answer with the manual processing questionnaire answer, which completes the answer audit of the final answer, as described in steps S401 to S403.
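Selecting the final answer amounts to taking the candidate with the largest metric value. The sketch below assumes the metric values have already been computed and are passed in as a dictionary; the text does not specify how the metric is calculated, so the numbers are made up and the function name is hypothetical.

```python
from typing import Dict, Optional


def select_final_answer(metrics: Dict[str, float]) -> Optional[str]:
    """Return the candidate answer whose metric value is largest.

    Returns None when the candidate answer set is empty.
    """
    if not metrics:
        return None
    return max(metrics, key=metrics.get)


# Made-up metric values for three candidate answers.
candidate_metrics = {"8": 0.61, "10": 0.27, "3": 0.12}
print(select_final_answer(candidate_metrics))
```

If two candidates share the largest value, `max` returns the one that was inserted first; a real system may need an explicit tie-breaking rule.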
According to the data auditing method provided by the embodiment of the application, in the process of data auditing, the final answer of the voice file to be processed is accurately determined through the neural network voice recognition framework of deep full-sequence convolution, the Skip-Gram algorithm, text matching, the named entity recognition algorithm and the metric values of the candidate answers, and the final answer is intelligently audited against the manual processing questionnaire answer, so that the interference of human subjective factors is reduced and the accuracy of data auditing is improved.
Further, the descriptions of step S401 to step S403 are as follows:
step S401, carrying out answer consistency comparison on the final answer and the manual processing questionnaire answer to obtain a comparison result;
step S402, if the comparison result is that the final answer is consistent with the answer of the manual processing questionnaire answer, determining that the final answer is correct;
step S403, if the comparison result is that the answer of the final answer is inconsistent with the answer of the manual processing questionnaire answer, determining that the final answer is wrong.
Specifically, the data auditing system first standardizes the final answer according to the question type, that is, converts the final answer into a fixed labeling form for that question type. For entity type questions, namely scoring questions, the labeling form of the final answer is a numeric score; for example, a customer service satisfaction score of three points or two points is converted into "less than 5 points" through simple reasoning. For yes/no type questions, the labeling form of the final answer is "yes" or "no". The case where no answer is extracted is marked with a character that has a specific meaning.
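A rough sketch of this standardization step is shown below. It is an assumption-laden illustration: the sentinel character "#", the English yes/no word lists and the rule of taking the first number in the text are all hypothetical choices, and the "less than 5 points" style of bucketing is not implemented.

```python
import re
from typing import Optional

NO_ANSWER = "#"  # hypothetical marker for "no answer extracted"
YES_WORDS = {"yes", "yeah", "right"}  # illustrative word lists
NO_WORDS = {"no", "nope"}


def normalize_answer(raw: Optional[str], question_type: str) -> str:
    """Convert a raw answer into a fixed labeling form for its question type."""
    if raw is None or not raw.strip():
        return NO_ANSWER
    text = raw.strip().lower()
    if question_type == "Entity":
        # Scoring questions: keep the first number found in the answer.
        match = re.search(r"\d+", text)
        return match.group(0) if match else NO_ANSWER
    if question_type == "YesNo":
        # Decide from the first word only, so "not sure" is not read as "no".
        first_word = re.split(r"\W+", text, maxsplit=1)[0]
        if first_word in YES_WORDS:
            return "yes"
        if first_word in NO_WORDS:
            return "no"
        return NO_ANSWER
    # Description questions are kept as cleaned free text.
    return text


print(normalize_answer("I would give 8 points", "Entity"))
print(normalize_answer("Yes, I did", "YesNo"))
```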
It should be further noted that, the answers of the manual processing questionnaire are carried in the dat format questionnaire filled in manually, and the format of the dat format questionnaire is a standard structure of questions-answers. Therefore, the data auditing system needs to perform structural analysis on the dat format questionnaire to obtain manual processing questionnaire answers in the dat format questionnaire.
Further, the data auditing system compares the final answer with the manual processing questionnaire answer for consistency to obtain a comparison result. The comparison result may be that the final answer is consistent with the manual processing questionnaire answer, or that the final answer is inconsistent with the manual processing questionnaire answer. It should be noted that consistency does not require the final answer to be identical to the manual processing questionnaire answer: the two are determined to be consistent when the similarity between them reaches a preset similarity threshold. The preset similarity threshold is set manually, for example to 90% or 95%.
Further, if the comparison result is that the final answer is consistent with the manual processing questionnaire answer, the data auditing system determines that the final answer is correct. If the comparison result is that the final answer is inconsistent with the manual processing questionnaire answer, the data auditing system determines that the final answer is wrong and sends prompt information to the staff member's terminal. The staff member then audits the final answer again according to the prompt information.
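The consistency check of steps S401 to S403 can be sketched as below. The text does not say how similarity between two answers is measured, so this example substitutes the character-level ratio from Python's standard `difflib.SequenceMatcher` as an assumption; the function name, the returned fields and the default threshold of 0.9 are likewise illustrative.

```python
from difflib import SequenceMatcher
from typing import Dict, Union


def audit_answer(final_answer: str, manual_answer: str,
                 threshold: float = 0.9) -> Dict[str, Union[bool, float]]:
    """Compare the extracted final answer with the manual questionnaire answer."""
    a = final_answer.strip().lower()
    b = manual_answer.strip().lower()
    # Identical answers are trivially consistent; otherwise use a string ratio.
    similarity = 1.0 if a == b else SequenceMatcher(None, a, b).ratio()
    consistent = similarity >= threshold
    return {
        "consistent": consistent,
        "similarity": similarity,
        # An inconsistent pair is flagged so that staff can audit it again.
        "needs_manual_review": not consistent,
    }


print(audit_answer("8", "8"))
print(audit_answer("yes", "no"))
```

For short labels such as scores or yes/no, the check reduces to exact matching in practice; the threshold matters mainly for longer description type answers.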
According to the embodiment of the application, the final answers and the manual processing questionnaire answers are intelligently checked, the interference of manual subjective factors is reduced, and the accuracy of data checking is improved. Meanwhile, the final answer can be double-audited in a manual mode, so that the accuracy of data audit is improved by combining the intelligent and manual double-audit modes.
Referring to fig. 2, fig. 2 is a second flowchart of the data auditing method according to the present application. Step S20 includes steps S201 to S203:
step S201, performing data fault tolerance processing, punctuation mark removal processing and data standardization processing on the transcribed text to obtain a voice text of the transcribed text;
step S202, learning the voice text and the template problem data thereof through a language model of a Skip-Gram algorithm to obtain template problem semantic feature information and source problem semantic feature information;
step S203, obtaining to-be-analyzed problem data and problem types of the transcribed text based on the template problem semantic feature information and the source problem semantic feature information.
Specifically, the data auditing system preprocesses the transcribed text. The preprocessing includes, but is not limited to, data fault tolerance processing, punctuation mark removal processing and data standardization processing, and yields the voice text of the transcribed text.
Further, the data auditing system determines the template question data according to the input template identification ID and inputs the voice text and the template question data into the language model of the Skip-Gram algorithm. The language model learns from both inputs and extracts the semantic information they contain, namely the source question semantic feature information from the voice text and the template question semantic feature information from the template question data.
Further, the data auditing system calculates the semantic similarity value between the voice text question data in the transcribed text and the template question data from the source question semantic feature information and the template question semantic feature information, using a preset similarity calculation formula, which may be the cosine similarity formula. Finally, the data auditing system outputs an experimental result for the transcribed text according to the semantic similarity value. The experimental result includes, but is not limited to, the template question data, the question data to be analyzed, the semantic similarity value and the question type, so that the question data to be analyzed and the question type of the transcribed text are obtained, as described in steps S2031 to S2033.
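The three preprocessing operations can be illustrated with a short sketch. The details are assumptions: the text does not define what fault tolerance processing does, so it is modeled here as a small, hypothetical table of corrections for frequent speech-recognition slips, and standardization is modeled as Unicode NFKC normalization, lowercasing and whitespace collapsing.

```python
import unicodedata

# Hypothetical correction table for frequent speech-recognition slips.
CORRECTIONS = {"broad band": "broadband"}


def preprocess(transcript: str) -> str:
    """Turn a raw transcript into a cleaned voice text."""
    # Standardization: unify full-width/half-width forms and letter case.
    text = unicodedata.normalize("NFKC", transcript).lower()
    # Fault tolerance: replace known misrecognitions.
    for wrong, right in CORRECTIONS.items():
        text = text.replace(wrong, right)
    # Punctuation removal: drop every character in a Unicode P* category.
    text = "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )
    # Collapse runs of whitespace left behind by the removals.
    return " ".join(text.split())


print(preprocess("How would you rate our Broad Band service？"))
```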
The text matching method based on Skip-Gram algorithm is applied to data auditing, and can well identify and match questionnaire questions in template question data and question data to be analyzed of voice texts, so that judgment errors caused by wrong question matching are avoided, and auditing accuracy is improved.
Further, the descriptions of step S2031 to step S2033 are as follows:
step S2031, calculating semantic similarity values of the speech text question data and the template question data in the transcribed text through the template question semantic feature information and the source question semantic feature information;
step S2032, comparing the semantic similarity value with a similarity threshold to obtain a comparison result;
step S2033, outputting the question data to be analyzed and the question type of the transcribed text according to the comparison result.
Specifically, the data auditing system calculates semantic similarity values of the voice text problem data and the template problem data in the transcribed text through the template problem semantic feature information and the source problem semantic feature information. The data auditing system then compares the semantic similarity value numerically with a similarity threshold to obtain a comparison result. The comparison result may be that the semantic similarity value is greater than or equal to the similarity threshold, or that the semantic similarity value is smaller than the similarity threshold. The similarity threshold is set by a technician according to actual conditions.
Further, the data auditing system outputs, as the target experimental result, the experimental result whose comparison result is that the semantic similarity value is greater than or equal to the similarity threshold, and determines the source question data and the question type in the target experimental result as the question data to be analyzed and the question type of the transcribed text.
According to the embodiment of the application, the to-be-analyzed problem data and the problem type of the transcribed text are output according to the semantic similarity values of the semantic feature information of the template problem and the semantic feature information of the source problem and the similarity threshold value, so that the questionnaire problem in the template problem data and the to-be-analyzed problem data of the voice text can be well identified and matched, the condition of judgment errors caused by the problem matching errors is avoided, and the auditing accuracy is improved.
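Steps S2031 to S2033 can be illustrated with the following minimal sketch. It assumes that each question has already been reduced to a fixed-length feature vector (for example by averaging Skip-Gram word vectors); the toy vectors, the names `cosine_similarity`, `match_question` and `TEMPLATES`, and the choice to keep only the best template above the threshold are illustrative assumptions rather than details given in the application.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero vector matches nothing
    return dot / (norm_u * norm_v)

def match_question(source_vec, templates, threshold):
    """Return (template_text, question_type, similarity) for the most similar
    template whose similarity is >= threshold, or None if none qualifies."""
    best = None
    for text, qtype, vec in templates:
        sim = cosine_similarity(source_vec, vec)       # step S2031
        if sim >= threshold:                           # step S2032
            if best is None or sim > best[2]:
                best = (text, qtype, sim)
    return best                                        # step S2033

# Toy 3-dimensional vectors standing in for learned semantic features.
TEMPLATES = [
    ("How would you rate the network?", "entity", [1.0, 0.0, 0.0]),
    ("Why did you give that score?", "description", [0.0, 1.0, 0.0]),
]
result = match_question([0.9, 0.1, 0.0], TEMPLATES, threshold=0.8)
```

In this sketch a source question that matches no template above the threshold yields `None`, which corresponds to the case where the comparison result is that the semantic similarity value is smaller than the similarity threshold.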
Referring to fig. 3, fig. 3 is a third flowchart of the data auditing method according to the present application. Step S30 includes steps S301 to S303:
step S301, determining each keyword information in the problem data to be analyzed, and determining a query condition according to each keyword information;
step S302, returning candidate segment texts of the problem data to be analyzed according to the query conditions, and performing text filtering on the candidate segment texts to obtain target problem texts;
step S303, determining answer types according to the question types, and determining the candidate answer sets by combining the target question text and the answer types through the named entity recognition algorithm.
It should be noted that, the query condition is composed by using the keywords of the question, and is provided for the search engine to search and return the relevant documents or paragraphs; the information retrieval module gradually converts a large amount of text information into accurate information related to the questions, so that the workload of the answer extraction module is reduced, and more accurate answers can be extracted.
Therefore, the data auditing system determines the keyword information in the question data to be analyzed, and determines the query condition according to that keyword information. The data auditing system then inputs the query condition into a search engine for searching, and returns the candidate segment texts of the question data to be analyzed. Further, because the business processes in this field are relatively standardized, the answer to a question is generally given between the current question $q_{current}$ and the next question $q_{next}$. Thus, for chapter-level input text, the candidate segment text $p_{candidate}$ is selected using the current question $q_{current}$ and the next question $q_{next}$. For the last question, the part from the question to the end of the chapter is taken as the candidate segment text.
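The selection of the candidate segment text between the current question and the next question can be sketched as follows. This is only an illustration: it assumes that both questions can be located in the chapter text by exact substring search, whereas the application locates them by semantic matching, and the function name `select_candidate_segment` is hypothetical.

```python
def select_candidate_segment(chapter, q_current, q_next=None):
    """Return the text after q_current and before q_next.

    When q_next is None or cannot be found (e.g. for the last question),
    the segment runs from q_current to the end of the chapter.
    """
    start = chapter.find(q_current)
    if start == -1:
        return ""  # assumption: no segment if the question is not located
    start += len(q_current)
    end = chapter.find(q_next, start) if q_next else -1
    if end == -1:
        end = len(chapter)
    return chapter[start:end].strip()

chapter = "Q1? It is ten. Q2? Because it is fast."
first_segment = select_candidate_segment(chapter, "Q1?", "Q2?")
last_segment = select_candidate_segment(chapter, "Q2?")
```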
Further, the data auditing system carries out text filtering on the candidate segment text through a classifier, and filters out speech or noise irrelevant to the problem to obtain a target problem text.
It should be noted that the answer extraction module is the key module by which the question-answering system finally generates the correct answer. It is responsible for presenting the final answer to the user and is a core part of the question-answering system, and the quality of the answer extraction algorithm in the answer extraction module directly affects the performance of the question-answering system. The embodiment of the application therefore prefers the named entity recognition algorithm as the answer extraction algorithm in the answer extraction module.
Therefore, the data auditing system determines the answer type according to the question type. Question types and answer types correspond to each other: entity-type questions correspond to entity-type answers, description-type questions correspond to description-type answers, and non-type questions correspond to non-type answers.
Further, the data auditing system combines the target question text and the answer type through a named entity recognition algorithm to determine a candidate answer set, specifically, as shown in step S3031 to step S3032.
According to the embodiment of the application, the candidate answer set can be accurately determined by combining the problem data to be analyzed and the problem types through a named entity recognition algorithm, so that basic data is provided for improving the accuracy of data auditing.
Further, the descriptions of step S3031 to step S3032 are as follows:
step S3031, identifying the target question text by a named entity identification algorithm according to the answer type to obtain each candidate entity of the target question text;
step S3032, determining each candidate entity as each candidate answer, and collecting each candidate answer to obtain the candidate answer set.
Specifically, according to the answer type, the data auditing system identifies each candidate entity related to that answer type through a named entity recognition algorithm; for example, it identifies the candidate entities whose answer types are the score value type and the operator name type. The data auditing system then classifies and groups the candidate entities by answer type to obtain a candidate entity set. Further, the data auditing system determines each candidate entity as a candidate answer, so the candidate entity set is also the candidate answer set; in other words, collecting the candidate answers yields the candidate answer set.
According to the embodiment of the application, the candidate answer set can be accurately determined by combining the problem data to be analyzed and the problem types through a named entity recognition algorithm, so that basic data is provided for improving the accuracy of data auditing.
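Steps S3031 and S3032 can be illustrated by the sketch below. The application does not specify the named entity recognition model, so simple regular-expression recognizers are used here purely as a stand-in; the `RECOGNIZERS` table, the two answer type names and the operator list are hypothetical.

```python
import re

# Hypothetical rule-based recognizers standing in for a trained NER model.
RECOGNIZERS = {
    "score": re.compile(r"\b(?:10|[0-9])\b"),
    "operator": re.compile(r"China Mobile|China Unicom|China Telecom"),
}

def extract_candidates(target_text, answer_type):
    """Return the candidate answer set as a list of (entity, start_offset)."""
    pattern = RECOGNIZERS.get(answer_type)
    if pattern is None:
        return []  # unknown answer type: no candidate entities
    # Step S3031: recognize entities of the requested answer type.
    # Step S3032: every recognized entity becomes a candidate answer.
    return [(m.group(0), m.start()) for m in pattern.finditer(target_text)]

text = "I would give 8 points, maybe 9, I use China Mobile"
score_candidates = extract_candidates(text, "score")
operator_candidates = extract_candidates(text, "operator")
```

The start offset is kept with each entity because a later scoring step needs the distance between the entity and the current question.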
Referring to fig. 4, fig. 4 is a flowchart illustrating a data auditing method according to the present application. Step S40 includes steps S404 to S405:
step S404, calculating the metric value of each candidate answer in the candidate answer set, and comparing the magnitude of the metric value of each candidate answer to obtain a comparison result;
step S405, determining, according to the comparison result, a metric value with the largest value among the metrics of the candidate answers, and determining the candidate answer with the largest value as the final answer.
It should be noted that analysis of the data shows that, during the questionnaire process, the customer service agent's questions are relatively standardized and the interviewees cooperate. Some question-and-answer exchanges are brief, while in others the interviewee is unclear about the question or does not know the answer, so that there is a process of clarifying, guiding and recording for the interviewee; the data therefore has strong characteristics. The answer to the current question is typically closer to the next question, and is sometimes contained in the same paragraph as the next question. Therefore, the data auditing system needs to measure each candidate answer in the candidate answer set with a parameter s, that is, calculate a metric value for each candidate answer in the candidate answer set, where each candidate answer is denoted by e.
The metric value $s_i$ of each candidate answer $e_i$ is calculated as $s_i = \alpha \cdot (d_i / l + r + c + n_i / N)$, where $d_i$ is the distance from the current entity to the question $q_{current}$ in the candidate segment text; $l$ is the distance from the end of the candidate segment text to the question $q_{current}$; $r$ is the character information metric value; $c$ is the error correction entity metric value; $n_i$ is the number of occurrences of the current entity in the candidate segment text; $N$ is the total number of occurrences of entities of the same type in the candidate segment text; and $\alpha$ is the sentence pattern value of the sentence containing the entity, which is 0.1 when that sentence is a question and 1 when it is not.
Further, the data auditing system compares the metric values $s$ of the candidate answers $e$ to obtain a comparison result. Then, according to the comparison result, the data auditing system determines the largest metric value $s_{top}$ among the metric values $s_i$ of the candidate answers, i.e. $s_{top} = \max\{s_1, s_2, \ldots, s_N\}$. Finally, the data auditing system determines the candidate answer having the largest metric value $s_{top}$ as the final answer.
According to the embodiment of the application, the final answer is determined through the measurement value of each candidate answer in the candidate answer set, so that the accuracy of the determined final answer is ensured, and basic data is provided for improving the accuracy of data auditing.
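Steps S404 and S405 can be sketched as follows. The sketch reads the metric as s = alpha * (d / l + r + c + n / N); the fraction n / N and the name alpha for the sentence pattern value are assumptions about the formula, and the `Candidate` class, the sample values and the simplification that all candidates share one entity type are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    d: float           # distance from the entity to q_current in the segment
    r: float           # character information metric value
    c: float           # error correction entity metric value
    n: int             # occurrences of this entity in the segment
    in_question: bool  # True if the sentence containing the entity is a question

def metric(cand, l, total):
    """s = alpha * (d / l + r + c + n / N), under the assumed reading."""
    alpha = 0.1 if cand.in_question else 1.0
    return alpha * (cand.d / l + cand.r + cand.c + cand.n / total)

def final_answer(cands, l):
    """Return the candidate whose metric value is the largest."""
    # Assumption: all candidates are of the same entity type, so N is the
    # sum of their occurrence counts.
    total = sum(c.n for c in cands)
    return max(cands, key=lambda c: metric(c, l, total))

cands = [
    Candidate("8", d=40, r=1.0, c=0.0, n=1, in_question=False),
    Candidate("9", d=10, r=1.0, c=0.0, n=1, in_question=True),
]
best = final_answer(cands, l=50)
```

With these sample values the entity that lies farther from the current question, and therefore closer to the next question, and that does not appear in a question sentence receives the larger metric value.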
Further, the data auditing device provided by the application is described below, and the data auditing device and the data auditing method can be correspondingly referred to each other.
As shown in fig. 5, fig. 5 is a schematic structural diagram of a data auditing device provided by the present application, where the data auditing device includes:
the voice transcription module 501 is configured to perform voice transcription on a voice file to be processed through a neural network voice recognition framework of deep full-sequence convolution to obtain a transcription text;
the processing module 502 is configured to process the transcribed text by combining a Skip-Gram algorithm with text matching, so as to obtain to-be-analyzed problem data and problem types of the transcribed text;
a determining module 503, configured to determine a candidate answer set by combining the to-be-analyzed question data and the question type through a named entity recognition algorithm;
and the determining auditing module 504 is configured to determine a final answer according to the metric value of each candidate answer in the candidate answer set, and perform answer auditing by combining the final answer with a manual processing questionnaire answer.
Further, the processing module 502 is further configured to:
performing data fault tolerance processing, punctuation mark removal processing and data standardization processing on the transcribed text to obtain a voice text of the transcribed text;
learning the voice text and the template problem data thereof through a language model of a Skip-Gram algorithm to obtain template problem semantic feature information and source problem semantic feature information;
and obtaining to-be-analyzed problem data and problem types of the transcribed text based on the template problem semantic feature information and the source problem semantic feature information.
Further, the processing module 502 is further configured to:
calculating semantic similarity values of the voice text problem data and the template problem data in the transcribed text through the template problem semantic feature information and the source problem semantic feature information;
comparing the semantic similarity value with a similarity threshold value to obtain a comparison result;
and outputting the problem data to be analyzed and the problem type of the transcribed text according to the comparison result.
Further, the determining auditing module 504 is further configured to:
calculating the metric value of each candidate answer in the candidate answer set, and comparing the magnitude of the metric value of each candidate answer to obtain a comparison result;
and determining the measurement value with the largest value in the measurement values of the candidate answers according to the comparison result, and determining the candidate answer with the largest value as the final answer.
Further, the determining module 503 is further configured to:
determining each keyword information in the problem data to be analyzed, and determining query conditions according to each keyword information;
returning candidate segment texts of the problem data to be analyzed according to the query conditions, and performing text filtering on the candidate segment texts to obtain target problem texts;
and determining answer types according to the question types, and determining the candidate answer set by combining the target question text and the answer types through the named entity recognition algorithm.
Further, the determining module 503 is further configured to:
identifying the target question text through the named entity identification algorithm according to the answer type to obtain each candidate entity of the target question text;
and determining each candidate entity as each candidate answer, and collecting each candidate answer to obtain the candidate answer set.
Further, the determining auditing module 504 is further configured to:
the final answer is subjected to answer consistency comparison with the manual processing questionnaire answer, and a comparison result is obtained;
if the comparison result is that the final answer is consistent with the answer of the manual processing questionnaire answer, determining that the final answer is correct;
and if the comparison result is that the answer of the final answer is inconsistent with the answer of the manual processing questionnaire answer, determining that the final answer is wrong.
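The answer consistency comparison can be sketched as below. The application only states that the two answers are compared for consistency; the normalization applied before the comparison (Unicode NFKC folding, case folding and whitespace trimming) and the names `normalize` and `audit_answer` are illustrative assumptions.

```python
import unicodedata

def normalize(answer):
    """Fold full-width characters, case and surrounding whitespace."""
    return unicodedata.normalize("NFKC", answer).casefold().strip()

def audit_answer(final_answer, manual_answer):
    """Return 'correct' if the two answers are consistent, else 'wrong'."""
    if normalize(final_answer) == normalize(manual_answer):
        return "correct"
    return "wrong"

consistent = audit_answer("８", "8")    # full-width digit vs. ASCII digit
inconsistent = audit_answer("8", "9")
```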
The specific embodiment of the data auditing device provided by the application is basically the same as the embodiments of the data auditing method, and is not described herein.
Fig. 6 illustrates a physical schematic diagram of an electronic device, as shown in fig. 6, where the electronic device may include: processor 610, communication interface (Communications Interface) 620, memory 630, and communication bus 640, wherein processor 610, communication interface 620, and memory 630 communicate with each other via communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a data auditing method that includes:
performing voice transcription on the voice file to be processed through a neural network voice recognition framework of deep full-sequence convolution to obtain a transcription text;
processing the transcribed text by combining a Skip-Gram algorithm with text matching to obtain to-be-analyzed problem data and problem types of the transcribed text;
combining the problem data to be analyzed and the problem types through a named entity recognition algorithm to determine a candidate answer set;
and determining a final answer according to the measurement value of each candidate answer in the candidate answer set, and carrying out answer auditing by combining the final answer with a manual processing questionnaire answer.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present application also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform a method of auditing data provided by the methods described above, the method comprising:
performing voice transcription on the voice file to be processed through a neural network voice recognition framework of deep full-sequence convolution to obtain a transcription text;
processing the transcribed text by combining a Skip-Gram algorithm with text matching to obtain to-be-analyzed problem data and problem types of the transcribed text;
combining the problem data to be analyzed and the problem types through a named entity recognition algorithm to determine a candidate answer set;
and determining a final answer according to the measurement value of each candidate answer in the candidate answer set, and carrying out answer auditing by combining the final answer with a manual processing questionnaire answer.
In yet another aspect, the present application also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the data auditing methods provided above, the method comprising:
performing voice transcription on the voice file to be processed through a neural network voice recognition framework of deep full-sequence convolution to obtain a transcription text;
processing the transcribed text by combining a Skip-Gram algorithm with text matching to obtain to-be-analyzed problem data and problem types of the transcribed text;
combining the problem data to be analyzed and the problem types through a named entity recognition algorithm to determine a candidate answer set;
and determining a final answer according to the measurement value of each candidate answer in the candidate answer set, and carrying out answer auditing by combining the final answer with a manual processing questionnaire answer.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method of auditing data, comprising:
performing voice transcription on the voice file to be processed through a neural network voice recognition framework of deep full-sequence convolution to obtain a transcription text;
processing the transcribed text by combining a Skip-Gram algorithm with text matching to obtain to-be-analyzed problem data and problem types of the transcribed text;
combining the problem data to be analyzed and the problem types through a named entity recognition algorithm to determine a candidate answer set;
and determining a final answer according to the measurement value of each candidate answer in the candidate answer set, and carrying out answer auditing by combining the final answer with a manual processing questionnaire answer.
2. The data auditing method according to claim 1, wherein the processing the transcribed text by Skip-Gram algorithm in combination with text matching to obtain question data and question types to be analyzed of the transcribed text includes:
performing data fault tolerance processing, punctuation mark removal processing and data standardization processing on the transcribed text to obtain a voice text of the transcribed text;
learning the voice text and the template problem data thereof through a language model of a Skip-Gram algorithm to obtain template problem semantic feature information and source problem semantic feature information;
and obtaining to-be-analyzed problem data and problem types of the transcribed text based on the template problem semantic feature information and the source problem semantic feature information.
3. The method for auditing data according to claim 2, wherein the obtaining the question data to be analyzed and the question type of the transcribed text based on the template question semantic feature information and the source question semantic feature information includes:
calculating semantic similarity values of the voice text problem data and the template problem data in the transcribed text through the template problem semantic feature information and the source problem semantic feature information;
comparing the semantic similarity value with a similarity threshold value to obtain a comparison result;
and outputting the problem data to be analyzed and the problem type of the transcribed text according to the comparison result.
4. The method of claim 1, wherein determining a final answer based on the metric values for each candidate answer in the set of candidate answers comprises:
calculating the metric value of each candidate answer in the candidate answer set, and comparing the magnitude of the metric value of each candidate answer to obtain a comparison result;
and determining the measurement value with the largest value in the measurement values of the candidate answers according to the comparison result, and determining the candidate answer with the largest value as the final answer.
5. The data auditing method according to claim 1, wherein the determining a candidate answer set by a named entity recognition algorithm in combination with the question data to be analyzed and the question type comprises:
determining each keyword information in the problem data to be analyzed, and determining query conditions according to each keyword information;
returning candidate segment texts of the problem data to be analyzed according to the query conditions, and performing text filtering on the candidate segment texts to obtain target problem texts;
and determining answer types according to the question types, and determining the candidate answer set by combining the target question text and the answer types through the named entity recognition algorithm.
6. The data auditing method of claim 5, wherein said determining the candidate answer set by the named entity recognition algorithm in combination with the target question text and the answer type comprises:
identifying the target question text through the named entity identification algorithm according to the answer type to obtain each candidate entity of the target question text;
and determining each candidate entity as each candidate answer, and collecting each candidate answer to obtain the candidate answer set.
7. The method of claim 1, wherein said auditing the final answer in combination with manually processed questionnaire answers comprises:
the final answer is subjected to answer consistency comparison with the manual processing questionnaire answer, and a comparison result is obtained;
if the comparison result is that the final answer is consistent with the answer of the manual processing questionnaire answer, determining that the final answer is correct;
and if the comparison result is that the answer of the final answer is inconsistent with the answer of the manual processing questionnaire answer, determining that the final answer is wrong.
8. A data auditing apparatus, comprising:
the voice transcription module is used for carrying out voice transcription on the voice file to be processed through the neural network voice recognition framework of the deep full-sequence convolution to obtain a transcription text;
the processing module is used for processing the transcribed text through Skip-Gram algorithm and text matching, and obtaining to-be-analyzed problem data and problem types of the transcribed text;
the determining module is used for determining a candidate answer set by combining the to-be-analyzed problem data and the problem type through a named entity recognition algorithm;
and the determining auditing module is used for determining a final answer according to the measurement value of each candidate answer in the candidate answer set, and carrying out answer auditing by combining the final answer with a manual processing questionnaire answer.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the data auditing method of any of claims 1 to 7 when the computer program is executed by the processor.
10. A non-transitory computer readable storage medium comprising a computer program, wherein the computer program when executed by a processor implements the data auditing method of any of claims 1 to 7.
CN202210469303.2A 2022-04-28 2022-04-28 Data auditing method and device, electronic equipment and computer storage medium Pending CN117056453A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210469303.2A CN117056453A (en) 2022-04-28 2022-04-28 Data auditing method and device, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210469303.2A CN117056453A (en) 2022-04-28 2022-04-28 Data auditing method and device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN117056453A true CN117056453A (en) 2023-11-14

Family

ID=88666779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210469303.2A Pending CN117056453A (en) 2022-04-28 2022-04-28 Data auditing method and device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN117056453A (en)

Similar Documents

Publication Publication Date Title
CN110096570B (en) Intention identification method and device applied to intelligent customer service robot
CN108304372B (en) Entity extraction method and device, computer equipment and storage medium
WO2021000408A1 (en) Interview scoring method and apparatus, and device and storage medium
EP1787288B1 (en) Automated extraction of semantic content and generation of a structured document from speech
CN110717018A (en) Industrial equipment fault maintenance question-answering system based on knowledge graph
CN110717021B (en) Input text acquisition and related device in artificial intelligence interview
WO2021208444A1 (en) Method and apparatus for automatically generating electronic cases, a device, and a storage medium
CN111177307A (en) Test scheme and system based on semantic understanding similarity threshold configuration
CN116127056A (en) Medical dialogue abstracting method with multi-level characteristic enhancement
CN111523317B (en) Voice quality inspection method and device, electronic equipment and medium
US10224036B2 (en) Automated identification of verbal records using boosted classifiers to improve a textual transcript
CN112288584B (en) Insurance report processing method and device, computer readable medium and electronic equipment
CN113535925A (en) Voice broadcasting method, device, equipment and storage medium
CN115983285A (en) Questionnaire auditing method, device, electronic equipment and storage medium
CN117056453A (en) Data auditing method and device, electronic equipment and computer storage medium
CN115510213A (en) Question answering method and system for working machine and working machine
CN112071304B (en) Semantic analysis method and device
CN113987141A (en) Question-answering system answer reliability instant checking method based on recursive query
CN111341304A (en) Method, device and equipment for training speech characteristics of speaker based on GAN
KR20200072005A (en) Method for correcting speech recognized sentence
CN117113947B (en) Form filling system, method, electronic equipment and storage medium
CN117037795A (en) Response method and device for voice problem, electronic equipment and storage medium
CN116504391A (en) Intelligent follow-up visit quality control evaluation system, method and device
CN113744737A (en) Training of speech recognition model, man-machine interaction method, equipment and storage medium
CN117935783A (en) Follow-up voice recognition self-checking method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination