CN110909162B - Text quality inspection method, storage medium and electronic equipment - Google Patents


Info

Publication number
CN110909162B
Authority
CN
China
Prior art keywords
quality inspection
quality
text data
text
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911118009.1A
Other languages
Chinese (zh)
Other versions
CN110909162A (en)
Inventor
聂镭
李睿
聂颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Longma Zhixin Zhuhai Hengqin Technology Co ltd
Original Assignee
Longma Zhixin Zhuhai Hengqin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Longma Zhixin Zhuhai Hengqin Technology Co ltd
Priority to CN201911118009.1A
Publication of CN110909162A
Application granted
Publication of CN110909162B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Machine Translation (AREA)

Abstract

With the quality inspection model training method provided by the invention, a trained quality inspection model can be obtained. The model does not require a manually configured quality inspection expression: whether a text passes quality inspection can be determined simply by inputting the dialogue script and the text to be inspected into the model. This makes the quality inspection process simpler and more efficient, and makes a quality inspection system that uses the model more intelligent. The text quality inspection method enables fully automatic text quality inspection without human participation: it automatically extracts quality inspection standard words from the dialogue script and uses, as the inspection criterion, the relative distances of the quality inspection word pairs formed by pairing the standard words. As a result, no complex quality inspection expression needs to be formulated, and quality inspection becomes more intelligent, simple and efficient.

Description

Text quality inspection method, storage medium and electronic equipment
Technical Field
The invention relates to the field of natural language processing, in particular to a text quality inspection method, a storage medium and electronic equipment.
Background
In telephone sales, agents must pitch products in accordance with laws and regulations as well as company rules so that sales remain compliant. Industries with strict compliance requirements, such as insurance, even require agents to follow a standard dialogue script exactly. To ensure the quality of the agents' work, quality inspectors need to check the agents' call recordings. Traditionally, quality inspectors do this by listening to the recordings manually and checking them against the key points of the script. This approach is not only inefficient but also limited to spot checks, so most call recordings are never inspected.
To address the low efficiency of manual inspection, the prior art provides various intelligent quality inspection systems based on speech recognition, natural language analysis and similar technologies. These systems can inspect all recorded calls and greatly improve inspection efficiency. However, they require quality inspectors to first configure quality inspection expressions manually, which in turn requires the inspectors to fully understand the inspection rules and apply them flexibly; an incorrectly configured expression leads to erroneous inspection results. For example, the inspection rules in one company's quality inspection system are so complex that inspectors using it easily make mistakes. A quality inspection expression is built from rules, conditions, inspection scopes and operators, which can be combined in many complex ways, so inspectors must be trained before they can use the system, which raises training costs. Moreover, because the system is complicated to operate, operating errors by inspectors are inevitable, which makes the inspection results inaccurate.
Disclosure of Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, the present invention provides a quality inspection model training method, which includes:
acquiring an annotated dialogue script and extracting quality inspection standard words from it, the quality inspection standard words comprising a plurality of words;
acquiring training data for a quality inspection model, the training data comprising positive samples and negative samples, where a positive sample is text data that passed quality inspection and a negative sample is text data that failed quality inspection;
vectorizing the text data of the training data according to the quality inspection standard words to obtain vectors of the text data;
and training the quality inspection model on the vectors of the text data to obtain a trained quality inspection model.
Further, the annotations of the annotated dialogue script include highlighted text and/or shaded text.
Further, extracting the quality inspection standard words from the annotated dialogue script comprises:
extracting the annotated text from the annotated dialogue script and segmenting it into words to obtain a word segmentation result; converting the word segmentation result into numerical values with the IDF method to obtain the IDF value of each segmented word; and obtaining the quality inspection standard words according to the IDF values.
Further, vectorizing the text data of the training data according to the quality inspection standard words to obtain vectors of the text data includes:
locating and recording the absolute positions of all quality inspection standard words in the text data, where, if a quality inspection standard word does not occur in the text data, its absolute position is recorded as a specific value; combining the quality inspection standard words in pairs to obtain quality inspection word pairs and calculating the relative distance of each word pair; taking the relative distance values as the elements of the text-data vector; and obtaining the vector of the text data from these elements.
Further, training the quality inspection model on the vectors of the text data to obtain the trained quality inspection model includes:
inputting the vectors of the text data of the positive samples into the quality inspection model with a target output of 1;
inputting the vectors of the text data of the negative samples into the quality inspection model with a target output of 0;
and solving for the optimal parameters of the quality inspection model with the normal equation method to obtain the trained quality inspection model.
Further, the quality inspection model is a logistic regression model:
$a_1x_1 + a_2x_2 + a_3x_3 + \dots + a_nx_n = y$
where $n$ is the number of elements in the text-data vector, $a_1, a_2, a_3, \dots, a_n$ are the parameters of the quality inspection model, $x_1, x_2, x_3, \dots, x_n$ are the elements of the text-data vector, and $y$ is the output of the quality inspection model, taking the value 1 or 0: 1 means the text passes quality inspection and 0 means it fails.
The invention also provides a text quality inspection method, which comprises the following steps:
acquiring an annotated dialogue script and extracting quality inspection standard words from it, the quality inspection standard words comprising a plurality of words;
acquiring a text to be inspected, the text to be inspected being text data generated by a dialogue conducted according to the dialogue script;
vectorizing the text data of the text to be inspected according to the quality inspection standard words to obtain the vector of the text data of the text to be inspected;
inputting that vector into a quality inspection model trained with the training method of any one of claims 1 to 6 to obtain the output of the quality inspection model;
and obtaining a quality inspection result from the output.
Further, the annotations of the annotated dialogue script include highlighted text and/or shaded text;
extracting the quality inspection standard words from the annotated dialogue script comprises:
extracting the annotated text from the annotated dialogue script and segmenting it into words to obtain a word segmentation result; converting the word segmentation result into numerical values with the IDF method to obtain the IDF value of each segmented word; and obtaining the quality inspection standard words according to the IDF values;
vectorizing the text data of the text to be inspected according to the quality inspection standard words to obtain the vector of the text data of the text to be inspected comprises the following steps:
locating and recording the absolute positions of all quality inspection standard words in the text data of the text to be inspected, where, if a quality inspection standard word does not occur in that text data, its absolute position is recorded as a specific value; combining the quality inspection standard words in pairs to obtain quality inspection word pairs and calculating the relative distance of each word pair; taking the relative distance values as the elements of the vector of the text to be inspected; and obtaining the vector of the text data of the text to be inspected from these elements.
The invention also provides a storage medium in which a computer program is stored, wherein the computer program is configured to execute the text quality inspection method when running.
The invention also provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program to execute the text quality inspection method.
With the quality inspection model training method provided by the invention, a trained quality inspection model is obtained that does not require a manually configured quality inspection expression; whether a text passes quality inspection can be determined simply by inputting the dialogue script and the text to be inspected into the model. This makes the quality inspection process simpler and more efficient and makes a quality inspection system that uses the model more intelligent. The text quality inspection method enables fully automatic text quality inspection: it extracts the quality inspection standard words from the dialogue script automatically and uses the relative distances of the word pairs formed by pairing the standard words as the inspection criterion, so no complex quality inspection expression has to be formulated manually and quality inspection becomes more intelligent, simple and efficient.
The present invention will be described in further detail with reference to the following examples. This should not be understood as limiting the scope of the above-described subject matter of the present invention to the following examples. Various substitutions and alterations according to the general knowledge and conventional practice in the art are intended to be included within the scope of the present invention without departing from the technical spirit of the present invention as described above.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 is a flow diagram of a quality inspection model training method according to one embodiment of the invention;
FIG. 2 is a flowchart of a text quality inspection method according to another embodiment of the present invention.
Detailed Description
The present invention is described below based on embodiments, but it is not limited to these embodiments. The following detailed description sets forth certain specific details; well-known methods, procedures and components are not described in detail so as not to obscure the essence of the invention.
Unless the context clearly requires otherwise, throughout the description and the claims the words "comprise", "comprising" and the like are to be construed in an inclusive sense rather than an exclusive or exhaustive sense; that is, they mean "including, but not limited to".
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.
A quality inspection model training method, a text quality inspection method, a storage medium, and an electronic device according to embodiments of the present invention will be described below with reference to the accompanying drawings.
First, a quality inspection model training method according to an embodiment of the present invention will be described.
FIG. 1 is a flowchart of a quality inspection model training method according to an embodiment of the invention. As shown in FIG. 1, the method includes the following steps:
S100, acquiring an annotated dialogue script and extracting quality inspection standard words from it, the quality inspection standard words comprising a plurality of words;
S200, acquiring training data for the quality inspection model, the training data comprising positive samples and negative samples, where a positive sample is text data that passed quality inspection and a negative sample is text data that failed;
S300, vectorizing the text data of the training data according to the quality inspection standard words to obtain vectors of the text data;
and S400, training the quality inspection model on the vectors of the text data to obtain the trained quality inspection model.
Each step is described in detail below.
S100, acquiring an annotated dialogue script and extracting quality inspection standard words from it, the quality inspection standard words comprising a plurality of words;
It should be noted that the annotated dialogue script is what agents actually follow when communicating with customers. The script specifies the content of the conversation between agent and customer and the business flow of the call (including stages such as the opening, product introduction and intention confirmation). The annotated parts are usually content that requires special attention, and the highlighting reminds agents of that important content. In this embodiment, the annotated dialogue script can generally be used as-is, without any additional processing. During quality inspection, quality inspectors use the same annotated dialogue script as their standard, and the annotated parts are the key inspection points, that is, the requirements the inspected text must meet.
In some embodiments, the annotations of the annotated dialogue script in step S100 include highlighted text and/or shaded text.
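How highlighted or shaded text is read out of the script depends on the file format, which the description does not specify. The sketch below assumes, purely for illustration, that the script has been exported to plain text with each marked span wrapped in `<mark>…</mark>`; the tag convention, the sample script and the function name `extract_marked_text` are all hypothetical.

```python
import re

# Hypothetical export convention: every highlighted or shaded span of the
# dialogue script is wrapped in <mark>...</mark>.
MARK_PATTERN = re.compile(r"<mark>(.*?)</mark>", re.DOTALL)


def extract_marked_text(script: str) -> list[str]:
    """Return the marked (highlighted/shaded) spans of a script, in order."""
    return [span.strip() for span in MARK_PATTERN.findall(script)]


script = (
    "Opening: Hello, this is the insurance service line. "
    "<mark>First, common accidents causing death or disability: "
    "maximum compensation of 600,000</mark>. "
    "Intention check: <mark>Do you agree to the terms?</mark>"
)
print(extract_marked_text(script))
```

The non-greedy `.*?` keeps adjacent marked spans separate; a real script editor export (for example a Word document) would need its own parser in place of the regular expression.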
In some embodiments, extracting the quality inspection standard words from the annotated dialogue script in step S100 comprises:
extracting the annotated text from the annotated dialogue script and segmenting it into words to obtain a word segmentation result; converting the word segmentation result into numerical values with the IDF method to obtain the IDF value of each segmented word; and obtaining the quality inspection standard words according to the IDF values.
In this embodiment of the invention, the annotated text in the dialogue script is extracted and then segmented with a word segmentation tool (such as jieba). For example, take the following sentence in a dialogue script: "First, for common accidents in daily life resulting in death or disability, the maximum compensation is 600,000". First, the annotated part "First, common accidents causing death or disability, maximum compensation 600,000" is extracted; the annotated part is then segmented, giving "first point", "common", "accident", "cause", "death", "or", "disability", "highest", "compensation" and "600,000". The IDF method is then applied to the word segmentation result to obtain the IDF (inverse document frequency) value of each word, calculated as follows:
[Equation image: IDF formula, not reproduced]
An adaptive threshold is set according to the IDF values of the word segmentation result, and the words whose IDF value exceeds the threshold are selected as quality inspection standard words. In the example above, the quality inspection standard words obtained through these steps are "accident", "cause", "death", "disability", "highest", "compensation" and "600,000".
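The IDF formula itself survives only as an image. The sketch below assumes the common form IDF(w) = log(N / (1 + df(w))), where N is the number of documents in a segmented reference corpus and df(w) is the number of those documents containing w, and it assumes the "adaptive threshold" is the mean IDF of the candidate words. The toy corpus, the threshold rule and the function names are illustrative assumptions; tokens are given pre-segmented instead of calling jieba so the example runs with the standard library alone.

```python
import math


def idf_scores(candidates, corpus):
    """IDF(w) = log(N / (1 + df(w))) over a corpus of token lists (assumed form)."""
    n_docs = len(corpus)
    doc_sets = [set(doc) for doc in corpus]
    scores = {}
    for word in dict.fromkeys(candidates):  # de-duplicate, keep first-seen order
        df = sum(word in doc for doc in doc_sets)
        scores[word] = math.log(n_docs / (1 + df))
    return scores


def select_standard_words(candidates, corpus):
    """Keep words whose IDF exceeds an adaptive threshold (assumed: the mean IDF)."""
    scores = idf_scores(candidates, corpus)
    threshold = sum(scores.values()) / len(scores)
    return [w for w, s in scores.items() if s > threshold]


# Segmentation result of the annotated sentence in the example above.
candidates = ["first point", "common", "accident", "cause", "death", "or",
              "disability", "highest", "compensation", "600,000"]
# Hypothetical reference corpus in which filler words are frequent.
corpus = [
    ["first point", "common", "or", "accident", "cause", "death"],
    ["first point", "common", "or", "disability"],
    ["first point", "common", "or", "highest", "compensation"],
    ["first point", "common", "or", "600,000"],
    ["first point", "common", "or"],
    ["first point", "common", "or"],
]
print(select_standard_words(candidates, corpus))
# ['accident', 'cause', 'death', 'disability', 'highest', 'compensation', '600,000']
```

Words that occur in every document ("first point", "common", "or") get a negative IDF and fall below the mean, which is the filtering effect the description relies on.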
S200, acquiring training data for the quality inspection model, the training data comprising positive samples and negative samples, where a positive sample is text data that passed quality inspection and a negative sample is text data that failed;
In this embodiment of the invention, the training data of the quality inspection model are historical texts that quality inspectors have already inspected manually. The training data therefore reuse existing information and need no additional labeling, which saves labeling costs. The training data consist of manually inspected texts and include positive samples, which are text data that passed quality inspection, and negative samples, which are text data that failed.
S300, vectorizing the text data of the training data according to the quality inspection standard words to obtain vectors of the text data.
In some embodiments, step S300 of vectorizing the text data of the training data according to the quality inspection standard words to obtain vectors of the text data includes:
locating and recording the absolute positions of all quality inspection standard words in the text data, where, if a quality inspection standard word does not occur in the text data, its absolute position is recorded as a specific value; combining the quality inspection standard words in pairs to obtain quality inspection word pairs and calculating the relative distance of each word pair; taking the relative distance values as the elements of the text-data vector; and obtaining the vector of the text data from these elements.
In the above example, the quality inspection standard words "accident", "cause", "death", "disability", "highest", "compensation" and "600,000" are extracted from the annotated dialogue script, and their absolute positions are then found in each collected positive and negative sample by regular-expression matching. For example, the text data of positive sample 1 is 3000 words long, and "accident", "cause", "death", "disability", "highest", "compensation" and "600,000" are located at the 1000th, 1002nd, 1004th, 1006th, 1008th, 1010th and 1012th words respectively. If a quality inspection standard word does not occur in the text data, its absolute position is recorded as a specific value. For example, if "highest" did not occur in positive sample 1, its absolute position would be recorded as -10000, so that the distance between the missing word and the other standard words is large enough to make the feature clearly distinguishable and easy for the quality inspection model to learn. Next, the quality inspection standard words are combined in pairs to obtain quality inspection word pairs, and the relative distance of each pair is calculated. Taking positive sample 1 again as an example, combining the seven standard words in pairs gives
$C_7^2 = \frac{7 \times 6}{2} = 21$
quality inspection word pairs.
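Locating the absolute positions can be sketched with Python's `re` module. The sketch assumes that the first match of each word is the one that counts and that positions are character offsets (for Chinese text a character offset stands in for the word position used in the example); both are assumptions, as is the function name `absolute_positions`. The sentinel -10000 is the value given in the description.

```python
import re

MISSING = -10000  # position recorded for a standard word absent from the text


def absolute_positions(text, standard_words, missing=MISSING):
    """Map each standard word to the offset of its first match in text."""
    positions = {}
    for word in standard_words:
        match = re.search(re.escape(word), text)  # escape: treat word literally
        positions[word] = match.start() if match else missing
    return positions


print(absolute_positions("accident cause death",
                         ["accident", "cause", "death", "disability"]))
# {'accident': 0, 'cause': 9, 'death': 15, 'disability': -10000}
```

`re.escape` matters for standard words such as "600,000" or words containing regex metacharacters, which would otherwise be interpreted as patterns.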
The relative distance of a quality inspection word pair is calculated by subtracting the absolute position of the word that occurs earlier from that of the word that occurs later; the results are shown in Table 1 below:
Table 1: Relative distances of the quality inspection word pairs in positive sample 1
These steps are repeated to calculate the relative distances of the quality inspection word pairs for all positive and negative samples. Finally, the relative distance values are used as the elements of the text-data vector, from which the vector of the text data is obtained; for example, the vector of positive sample 1 is [2, 4, 6, …, 2].
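The pairing and subtraction can be sketched as follows, assuming the word pairs are enumerated in the order the standard words were extracted (the description does not fix an order). Because the later position minus the earlier position is never negative, it equals the absolute difference of the two positions. The function name is hypothetical; the positions are those of positive sample 1 in the example, and the sketch reproduces its vector.

```python
from itertools import combinations


def relative_distance_vector(positions, standard_words):
    """One element per word pair (i < j in extraction order): later minus earlier position."""
    return [abs(positions[a] - positions[b])
            for a, b in combinations(standard_words, 2)]


words = ["accident", "cause", "death", "disability",
         "highest", "compensation", "600,000"]
# Positive sample 1: the words sit at the 1000th, 1002nd, ..., 1012th word.
positions = dict(zip(words, range(1000, 1013, 2)))
vector = relative_distance_vector(positions, words)
print(len(vector), vector[:3], vector[-1])  # 21 [2, 4, 6] 2
```

If "highest" were missing and recorded as -10000, every pair involving it would get a distance of about 11000, far outside the range of genuine distances, which is the separation effect described above.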
Steps S100 to S300 complete the preprocessing of the training data. This preprocessing can be performed automatically by a computer without manual work; compared with typical model training, whose training data must be labeled manually, this saves time, reduces cost and improves efficiency.
S400, training the quality inspection model on the vectors of the text data to obtain the trained quality inspection model.
In some embodiments of the present invention, step S400 of training the quality inspection model on the vectors of the text data to obtain the trained quality inspection model includes:
S401, inputting the vectors of the text data of the positive samples into the quality inspection model with a target output of 1;
S402, inputting the vectors of the text data of the negative samples into the quality inspection model with a target output of 0;
and S403, solving for the optimal parameters of the quality inspection model with the normal equation method to obtain the trained quality inspection model.
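Assuming that the "normal equation method" of step S403 means solving the least-squares normal equation (XᵀX)a = Xᵀy for the parameters a₁…aₙ, step S403 can be sketched in plain Python with Gauss-Jordan elimination. The optional `ridge` term is not part of the description; it is added here only so the system stays solvable when XᵀX is singular, which happens easily when there are fewer samples than word pairs. The function name and the toy data are hypothetical.

```python
def fit_normal_equation(X, y, ridge=0.0):
    """Solve (X^T X + ridge * I) a = X^T y for the parameters a_1..a_n.

    X: training vectors, one row per sample; y: targets, 1 = pass, 0 = fail.
    """
    n = len(X[0])
    # Augmented matrix [X^T X + ridge*I | X^T y], one row per parameter.
    A = [
        [sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0) for j in range(n)]
        + [sum(row[i] * t for row, t in zip(X, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; try a small ridge > 0")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Toy data: two features; samples 1 and 3 pass (1), sample 2 fails (0).
X = [[1, 0], [0, 1], [1, 1]]
y = [1, 0, 1]
print(fit_normal_equation(X, y))  # [1.0, 0.0]
```

With the description's features, each row of `X` would be a 21-element relative-distance vector such as [2, 4, 6, …, 2]; the elimination itself does not depend on the feature count.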
In some embodiments of the invention, the quality inspection model is a logistic regression model:
$a_1x_1 + a_2x_2 + a_3x_3 + \dots + a_nx_n = y$
where $n$ is the number of elements in the text-data vector, $a_1, a_2, a_3, \dots, a_n$ are the parameters of the quality inspection model, $x_1, x_2, x_3, \dots, x_n$ are the elements of the text-data vector, and $y$ is the output of the quality inspection model, taking the value 1 or 0: 1 means the text passes quality inspection and 0 means it fails.
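Assuming that the normal equation method of step S403 means ordinary least squares on the formula above, the closed-form parameters follow from a short derivation. The formula as written is linear in the parameters, with no sigmoid, which is what makes a closed-form solution possible. Here $X$ stacks the $m$ training vectors as rows, $\mathbf{y}$ holds the targets (1 for positive samples, 0 for negative samples) and $\mathbf{a} = (a_1, \dots, a_n)^\top$; the last step assumes $X^\top X$ is invertible.

```latex
J(\mathbf{a}) = \lVert X\mathbf{a} - \mathbf{y} \rVert^2
             = (X\mathbf{a} - \mathbf{y})^\top (X\mathbf{a} - \mathbf{y})

\nabla_{\mathbf{a}} J = 2X^\top (X\mathbf{a} - \mathbf{y}) = \mathbf{0}
\;\Longrightarrow\; X^\top X\,\mathbf{a} = X^\top \mathbf{y}
\;\Longrightarrow\; \mathbf{a} = (X^\top X)^{-1} X^\top \mathbf{y}
```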
The vectors of the text data of the positive and negative samples obtained in step S300 go through training steps S401 to S403 to obtain the optimal values of the quality inspection model parameters $a_1, a_2, a_3, \dots, a_n$. It should be noted that, besides solving for the optimal parameter values with the normal equation method, training can also use gradient descent or other methods; these are prior art and are not described in detail here.
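As an illustration of the gradient descent alternative, the sketch below minimises the same squared error by batch gradient descent on the linear formula. The learning rate, epoch count, function name and toy data are assumptions chosen for the example, not values from the description.

```python
def fit_gradient_descent(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean squared error (1/m) * sum((a . x - y)^2)."""
    m, n = len(X), len(X[0])
    a = [0.0] * n
    for _ in range(epochs):
        # Residual a . x - y for every sample.
        residuals = [sum(ai * xi for ai, xi in zip(a, row)) - t for row, t in zip(X, y)]
        # Gradient of the mean squared error with respect to each a_j.
        grad = [2.0 / m * sum(r * row[j] for r, row in zip(residuals, X)) for j in range(n)]
        a = [ai - lr * g for ai, g in zip(a, grad)]
    return a


# Toy data: two features; samples 1 and 3 pass (1), sample 2 fails (0).
X = [[1, 0], [0, 1], [1, 1]]
y = [1, 0, 1]
print(fit_gradient_descent(X, y))
```

On this toy data the parameters converge to approximately [1, 0], the least-squares solution. Real relative-distance features are in the thousands (and the -10000 sentinel makes them larger still), so the features would need scaling, or the learning rate a drastic reduction, for gradient descent to converge; the normal equation has no such hyperparameter, which may be one reason the description prefers it.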
Through steps S100 to S400, a trained text quality inspection model is obtained. The quality inspection model training method has the following advantages:
1. It makes full use of existing data, namely the annotated dialogue script and the manually inspected texts used as training data, so no additional data preprocessing such as data labeling is required;
2. It uses the relative distances of quality inspection word pairs, formed from the quality inspection standard words extracted from the dialogue script, as its basic features and trains the model with the normal equation method, so training is fast and efficient and the trained quality inspection model performs well.
A text quality inspection method according to another embodiment of the present invention is described below with reference to the flowchart of FIG. 2. The method includes:
S10, acquiring an annotated dialogue script and extracting quality inspection standard words from it, the quality inspection standard words comprising a plurality of words;
S20, acquiring a text to be inspected, the text to be inspected being text data generated by a dialogue conducted according to the dialogue script;
S30, vectorizing the text data of the text to be inspected according to the quality inspection standard words to obtain the vector of the text data of the text to be inspected;
S40, inputting that vector into a quality inspection model trained with the training method of the above embodiments to obtain the output of the quality inspection model;
and S50, obtaining a quality inspection result from the output.
In some embodiments, the annotations of the annotated dialogue script in step S10 include highlighted text and/or shaded text;
in some embodiments, extracting the quality inspection standard words from the annotated dialogue script in step S10 comprises:
extracting the annotated text from the annotated dialogue script and segmenting it into words to obtain a word segmentation result; converting the word segmentation result into numerical values with the IDF method to obtain the IDF value of each segmented word; and obtaining the quality inspection standard words according to the IDF values;
It should be noted that, in step S20, the text to be inspected may be obtained by speech recognition of an agent's call recording, by real-time speech recognition of an agent's ongoing call, or from other texts produced when communicating with customers in written form. In each case, the text to be inspected is text data generated by a dialogue conducted according to the corresponding dialogue script, so the dialogue script serves as the basis for inspecting it.
In some embodiments, step S30 of vectorizing the text data of the text to be inspected according to the quality inspection standard words to obtain its vector includes:
locating and recording the absolute positions of all quality inspection standard words in the text data of the text to be inspected, where, if a quality inspection standard word does not occur in that text data, its absolute position is recorded as a specific value; combining the quality inspection standard words in pairs to obtain quality inspection word pairs and calculating the relative distance of each word pair; taking the relative distance values as the elements of the vector of the text to be inspected; and obtaining the vector of the text data of the text to be inspected from these elements.
Step S10 is identical to step S100. Step S30 is identical to step S300 except for the data it operates on: step S300 vectorizes the training data, whereas step S30 vectorizes the text to be inspected.
In steps S40 and S50, the vector of the text data of the text to be inspected is input into the quality inspection model trained by the training method described in the above embodiments to obtain the model's output, and the quality inspection result is then derived from that output.
For example, in some embodiments, the output of the quality inspection model takes one of two values: an output of 1 means the quality inspection result is passed, and an output of 0 means it is not passed.
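The mapping from model output to inspection result can be sketched with the logistic regression form given in claim 6. The bias term `theta[0]`, the 0.5 decision threshold, and the function names are assumptions; the patent only states that the output is 1 (passed) or 0 (not passed).

```python
import math


def sigmoid(z):
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def model_output(theta, x, threshold=0.5):
    """theta = [theta_0, theta_1, ..., theta_n]; x = [x_1, ..., x_n].

    Returns 1 if sigmoid(theta_0 + sum(theta_i * x_i)) >= threshold, else 0.
    """
    if len(theta) != len(x) + 1:
        raise ValueError("theta must have one more element than x (bias term)")
    z = theta[0] + sum(t * v for t, v in zip(theta[1:], x))
    return 1 if sigmoid(z) >= threshold else 0


def inspection_result(output):
    """Map the 1/0 model output to the quality inspection result."""
    return "passed" if output == 1 else "not passed"
```

For example, `inspection_result(model_output([0.0, 1.0], [2.0]))` evaluates sigmoid(2) ≈ 0.88 and returns `"passed"`.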
The text quality inspection method provided by the embodiments of the invention enables fully automatic text quality inspection: the quality inspection standard words are extracted automatically from the dialogue script, and the relative distances of the quality inspection word pairs, formed by pairing the standard words two at a time, serve as the basic features. As a result, no complex quality inspection expressions need to be written by hand, which makes inspection more intelligent, simpler, and more efficient. Unlike prior-art methods, which require inspectors to configure complex quality inspection expressions manually, this method is fully automatic and therefore avoids erroneous inspection results caused by human error.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, both connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by means of the computer program:
S10, acquiring a marked dialogue script, and extracting quality inspection standard words from the marked dialogue script, wherein the quality inspection standard words comprise a plurality of quality inspection words;
S20, acquiring a text to be inspected, wherein the text to be inspected is text data generated by a dialogue conducted according to the dialogue script;
S30, vectorizing the text data of the text to be inspected according to the quality inspection standard words to obtain a vector of that text data;
S40, inputting the vector of the text data of the text to be inspected into the quality inspection model trained by the training method of the above embodiments, and obtaining the output result of the quality inspection model;
S50, obtaining a quality inspection result according to the output result.
Optionally, for specific examples of this embodiment, refer to the examples described in the above embodiments; they are not repeated here.
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S10, acquiring a marked dialogue script, and extracting quality inspection standard words from the marked dialogue script, wherein the quality inspection standard words comprise a plurality of quality inspection words;
S20, acquiring a text to be inspected, wherein the text to be inspected is text data generated by a dialogue conducted according to the dialogue script;
S30, vectorizing the text data of the text to be inspected according to the quality inspection standard words to obtain a vector of that text data;
S40, inputting the vector of the text data of the text to be inspected into the quality inspection model trained by the training method of the above embodiments, and obtaining the output result of the quality inspection model;
S50, obtaining a quality inspection result according to the output result.
Optionally, the storage medium is further configured to store program code for executing the steps included in the methods of the foregoing embodiments; these are not described in detail in this embodiment.
Optionally, in this embodiment, the storage medium may include, but is not limited to, any medium capable of storing a computer program, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Optionally, for specific examples of this embodiment, refer to the examples and optional implementations described in the above embodiments; they are not repeated here.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device. In some cases, the steps shown or described may be performed in an order different from that described herein; alternatively, they may each be fabricated as an individual integrated circuit module, or several of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, or improvement made within the principle of the present invention shall fall within the protection scope of the present invention.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict one another.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having appropriate combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
It will be understood by those skilled in the art that all or some of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If implemented as a software functional module and sold or used as a stand-alone product, the integrated module may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that changes, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A method for training a quality inspection model, the method comprising:
acquiring a marked dialogue script, and extracting quality inspection standard words from the marked dialogue script, wherein the quality inspection standard words comprise a plurality of quality inspection words;
acquiring training data of a quality inspection model, wherein the training data comprises a positive sample and a negative sample, the positive sample is text data with a passing quality inspection result, and the negative sample is text data with a failing quality inspection result;
locating and marking absolute positions of all the quality inspection standard words in the text data; combining the quality inspection standard words pairwise to obtain quality inspection word pairs, and calculating the relative distance of the quality inspection word pairs; taking the value of the relative distance as a vector element of the text data; obtaining a vector of the text data according to the vector element;
and training the quality inspection model on the vectors of the text data to obtain the trained quality inspection model.
2. The training method of claim 1, wherein the marks in the marked dialogue script comprise highlighted text and/or shaded text.
3. The training method of claim 2, wherein extracting the quality inspection standard words from the marked dialogue script comprises:
extracting the marked characters from the marked dialogue script and performing word segmentation on them to obtain word segmentation results; converting the word segmentation results into numerical values with the IDF method to obtain an IDF value for each segmented word; and obtaining the quality inspection standard words according to the IDF values.
4. The training method of claim 1, wherein, if one of the quality inspection standard words does not appear in the text data, the absolute position of that quality inspection standard word in the text data is marked as a specific value.
5. The training method according to any one of claims 1 to 4, wherein training the quality inspection model according to the vector of the text data to obtain the quality inspection model after training comprises:
inputting the vector of the text data of each positive sample into the quality inspection model with a target output of 1;
inputting the vector of the text data of each negative sample into the quality inspection model with a target output of 0;
and obtaining the optimal parameters of the quality inspection model by the normal equation method, thereby obtaining the trained quality inspection model.
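Claim 5 names the normal equation method but gives no formula. The normal equation is the closed-form least-squares solution θ = (AᵀA)⁻¹Aᵀy for a linear model, where A is the matrix of feature vectors with a leading column of ones. The sketch below applies it to the 0/1 targets and thresholds the linear score at 0.5; how the patent reconciles this with the logistic form of claim 6 is not stated, so the bias column, the threshold, and the function names are assumptions. It uses only the standard library, solving the small linear system by Gauss-Jordan elimination.

```python
def _solve(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("A^T A is singular; the normal equation has no unique solution")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_normal_equation(X, y):
    """theta = (A^T A)^{-1} A^T y, where A is X with a leading column of ones."""
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    ata = [[sum(row[i] * row[j] for row in A) for j in range(k)] for i in range(k)]
    aty = [sum(row[i] * target for row, target in zip(A, y)) for i in range(k)]
    return _solve(ata, aty)


def predict(theta, x, threshold=0.5):
    """1 (passed) if the linear score reaches the threshold, else 0 (not passed)."""
    score = theta[0] + sum(t * v for t, v in zip(theta[1:], x))
    return 1 if score >= threshold else 0
```

For one feature with values 0, 1, 2, 3 and labels 0, 0, 1, 1, the fit gives θ ≈ (-0.1, 0.4), and thresholding the scores -0.1, 0.3, 0.7, 1.1 at 0.5 reproduces the labels.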
6. The training method according to any one of claims 1 to 4, wherein the quality inspection model is a logistic regression model:
$$y = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n)}}$$
wherein n is the number of elements of the vector of the text data, $\theta_0, \theta_1, \ldots, \theta_n$ are the parameters of the quality inspection model, $x_1, x_2, \ldots, x_n$ are the elements of the vector of the text data, and $y$ is the output of the quality inspection model, which takes the value 1 or 0, where 1 indicates that the quality inspection result is passed and 0 indicates that it is not passed.
7. A text quality inspection method is characterized by comprising the following steps:
acquiring a marked dialogue script, and extracting quality inspection standard words from the marked dialogue script, wherein the quality inspection standard words comprise a plurality of quality inspection words;
acquiring a text to be inspected, wherein the text to be inspected is text data generated by a dialogue conducted according to the dialogue script;
locating and marking absolute positions of all the quality inspection standard words in the text data; combining the quality inspection standard words pairwise to obtain quality inspection word pairs, and calculating the relative distance of the quality inspection word pairs; taking the value of the relative distance as a vector element of the text data; obtaining a vector of the text data according to the vector element;
inputting the vector of the text data of the text to be inspected into a quality inspection model trained by the training method of any one of claims 1 to 6 to obtain an output result of the quality inspection model;
and obtaining a quality inspection result according to the output result.
8. The method of claim 7, wherein:
the marks in the marked dialogue script comprise highlighted text and/or shaded text;
extracting the quality inspection standard words from the marked dialogue script comprises:
extracting the marked characters from the marked dialogue script and performing word segmentation on them to obtain word segmentation results; converting the word segmentation results into numerical values with the IDF method to obtain an IDF value for each segmented word; and obtaining the quality inspection standard words according to the IDF values;
and, if one of the quality inspection standard words does not appear in the text data of the text to be inspected, the absolute position of that quality inspection standard word in the text data of the text to be inspected is marked as a specific value.
9. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of claim 7 or 8 when executed.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of claim 7 or 8.
CN201911118009.1A 2019-11-15 2019-11-15 Text quality inspection method, storage medium and electronic equipment Active CN110909162B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911118009.1A CN110909162B (en) 2019-11-15 2019-11-15 Text quality inspection method, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110909162A CN110909162A (en) 2020-03-24
CN110909162B true CN110909162B (en) 2020-10-27

Family

ID=69816859

Country Status (1)

Country Link
CN (1) CN110909162B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 519031 office 1316, No. 1, lianao Road, Hengqin new area, Zhuhai, Guangdong

Patentee after: LONGMA ZHIXIN (ZHUHAI HENGQIN) TECHNOLOGY Co.,Ltd.

Address before: Room 417, 418, 419, building 20, creative Valley, 1889 Huandao East Road, Hengqin New District, Zhuhai City, Guangdong Province

Patentee before: LONGMA ZHIXIN (ZHUHAI HENGQIN) TECHNOLOGY Co.,Ltd.
