CN113361265A - Data quality inspection method, data quality inspection device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN113361265A
Authority
CN
China
Prior art keywords
text
test
determined
information
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110773499.XA
Other languages
Chinese (zh)
Other versions
CN113361265B (en)
Inventor
孙中科
徐健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Lexuebang Network Technology Co ltd
Original Assignee
Beijing Lexuebang Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Lexuebang Network Technology Co ltd
Priority to CN202110773499.XA
Publication of CN113361265A
Application granted
Publication of CN113361265B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/226 Validation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a data quality inspection method, a data quality inspection apparatus, an electronic device, and a storage medium. The method comprises: receiving a text to be audited, wherein the text to be audited is the proofreading result text of a test text, and the test text is provided with a preset number of pieces of information to be proofread; determining a first number of target positions in the text to be audited whose content differs from that of the test text; and, when the first number is determined to be greater than or equal to the preset number, determining that the text to be audited is qualified. The rationale is that a worker who proofreads the information to be proofread that was preset in the test text can also resolve other erroneous content in the test text: the larger the first number of target positions that differ from the content of the test text, the less erroneous content remains in the text to be audited. Determining that the text to be audited is qualified only when the first number is greater than or equal to the preset number therefore helps ensure that texts determined to be qualified are of higher quality. The process requires no manual auditing, so auditing efficiency is high and auditing cost is low.

Description

Data quality inspection method, data quality inspection device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data quality inspection method and apparatus, an electronic device, and a storage medium.
Background
In the related art, when a proofreader proofreads text content, missed detections and false detections easily occur; for example, omissions and errors occur when test questions are proofread. An auditor then has to audit the proofread texts one by one, which greatly increases the cost of manual auditing and is inefficient. If the auditor only spot-checks the texts, errors may go unaudited, and the text quality cannot be guaranteed.
Disclosure of Invention
In view of the above, an object of the embodiments of the present application is to provide a data quality inspection method, apparatus, electronic device and storage medium, so as to solve the above problems.
In a first aspect, an embodiment of the present application provides a data quality inspection method, where the method includes: receiving a text to be checked, wherein the text to be checked is a proofreading result text of a test text, and a preset number of pieces of information to be proofread are arranged in the test text; determining a first number of target positions in the text to be audited, wherein the target positions are different from the content of the test text; and when the first number is determined to be larger than or equal to the preset number, determining that the text to be audited is qualified.
In this implementation, it is taken into account that a worker who proofreads the information to be proofread that was preset in the test text can also resolve other erroneous content in the test text. It can be understood that the larger the first number of target positions in the text to be audited that differ from the content of the test text, the more erroneous content the worker has corrected and the less erroneous content remains in the text to be audited. Therefore, a preset number of pieces of information to be proofread are set in the test text, and the text to be audited is determined to be qualified when the first number of target positions that differ from the content of the test text is determined to be greater than or equal to the preset number. This helps ensure that texts determined to be qualified are of higher quality. The process requires no manual auditing, so auditing efficiency is high and no manual auditing cost is incurred.
Based on the first aspect, in a possible design, when it is determined that the first number is greater than or equal to the preset number, determining that the text to be audited is qualified includes: when the first number is determined to be larger than or equal to the preset number, determining a second number of the information to be collated, the position of which in the test text is the same as the target position; and when the ratio of the second quantity to the preset quantity is determined to be larger than or equal to a first preset threshold value, determining that the text to be audited is qualified.
In the implementation process, the larger the ratio of the second number to the preset number is, the smaller the number of the remaining pieces of information to be corrected in the text to be checked is, and then, the smaller the number of the remaining error contents in the text to be checked is, so that when the ratio of the second number to the preset number is determined to be greater than or equal to the first preset threshold value, the text to be checked is determined to be qualified, and the quality of the text determined to be qualified is ensured.
Based on the first aspect, in a possible design, when it is determined that the first number is greater than or equal to the preset number, determining that the text to be audited is qualified includes: when the first number is determined to be larger than or equal to the preset number, determining the residual number of the information to be corrected in the text to be checked; and when the ratio of the residual quantity to the preset quantity is determined to be smaller than or equal to a second preset threshold value, determining that the text to be audited is qualified.
In this implementation, the ratio of the remaining number to the preset number is determined. It can be understood that the smaller the ratio, the lower the probability that the text to be audited still contains erroneous content, and the larger the ratio, the higher that probability. Therefore, the text to be audited is determined to be qualified when the ratio is less than or equal to the second preset threshold, which helps ensure the quality of texts determined to be qualified.
Based on the first aspect, in a possible design, when it is determined that the first number is greater than or equal to the preset number, determining the remaining number of the to-be-collated information in the to-be-audited text includes: and when the first number is determined to be larger than or equal to the preset number, determining the residual number based on the information to be collated in the predetermined test text.
In this implementation, if the text to be audited contains content identical to a piece of information to be proofread in the test text, the text to be audited is considered to still contain that information to be proofread; the more such identical pieces there are, the larger the remaining number. The remaining number of pieces of information to be proofread in the text to be audited can therefore be determined accurately from the predetermined information to be proofread in the test text.
Based on the first aspect, in a possible design, when it is determined that the first number is greater than or equal to the preset number, determining the remaining number of the to-be-collated information in the to-be-audited text includes: when the first number is determined to be larger than or equal to the preset number, comparing the test text with the content corresponding to the position in the text to be checked based on the position of the information to be checked in the test text which is determined in advance; and determining the residual quantity according to the comparison result.
In this implementation, based on the predetermined position of each piece of information to be proofread in the test text, the content at that position in the test text is compared with the content at the same position in the text to be audited. It can be understood that if the comparison shows that the two contents are the same, the content at that position in the text to be audited is still information to be proofread; otherwise, it is no longer information to be proofread. The remaining number can therefore be determined accurately from the comparison result.
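The position-based comparison described above can be sketched as follows. This is only an illustrative sketch: the function name `count_remaining` and the representation of each piece of information to be proofread as a `(start, end)` character span are assumptions, and the sketch further assumes that the proofreader's corrections keep character offsets aligned (equal-length replacements). If corrections change the text length, the positions would first have to be mapped through a diff.

```python
def count_remaining(test_text: str, audited_text: str,
                    seeded_spans: list[tuple[int, int]]) -> int:
    """Count seeded spans whose content is unchanged in the audited text.

    seeded_spans holds the predetermined (start, end) offsets of the
    information to be proofread in the test text (hypothetical format).
    """
    return sum(
        1
        for start, end in seeded_spans
        if audited_text[start:end] == test_text[start:end]
    )


# Two seeded errors: "5" at offset 8 and "6" at offset 22.
test_text = "2 + 2 = 5 and 3 * 3 = 6"
audited_text = "2 + 2 = 4 and 3 * 3 = 6"  # only the first error was fixed
remaining_number = count_remaining(test_text, audited_text, [(8, 9), (22, 23)])
```

Here one seeded span is unchanged, so the remaining number is 1.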
In a possible design based on the first aspect, the method further includes: when the ratio of the remaining number to the preset number is determined to be less than or equal to the second preset threshold, if the remaining number is determined not to be zero, proofreading the information to be proofread that remains in the text to be audited.
In this implementation, a ratio of the remaining number to the preset number that is less than or equal to the second preset threshold indicates that the text to be audited is a qualified text. If the remaining number of pieces of information to be proofread in the text to be audited is then determined not to be zero, that remaining information is proofread, which ensures that no information to be proofread is left in the text to be audited and further ensures the quality of texts determined to be qualified.
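As a minimal sketch of this decision, assuming the first number has already been found to be greater than or equal to the preset number: the function name `audit_by_remaining` and the default threshold of 0.2 are illustrative assumptions, since no value for the second preset threshold is given here.

```python
def audit_by_remaining(remaining_number: int, preset_number: int,
                       second_threshold: float = 0.2) -> tuple[bool, bool]:
    """Return (qualified, needs_followup).

    qualified: the remaining ratio does not exceed the second preset threshold.
    needs_followup: the text is qualified but still contains seeded errors
    that should be proofread before the text is used.
    The 0.2 default is an assumed value, not one taken from the application.
    """
    if preset_number < 1:
        raise ValueError("preset_number must be a positive integer")
    qualified = remaining_number / preset_number <= second_threshold
    needs_followup = qualified and remaining_number > 0
    return qualified, needs_followup


# 1 of 10 seeded errors remains: qualified, but the leftover must be fixed.
result = audit_by_remaining(remaining_number=1, preset_number=10)
```

Returning the two flags separately keeps the qualification decision distinct from the clean-up step that removes any leftover information to be proofread.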
In a possible design based on the first aspect, the receiving the text to be audited includes: acquiring an initial text; performing interference processing on the initial text to obtain the test text; sending the test text to a proofreading device; and receiving the text to be audited returned by the proofreading equipment based on the test text.
In this implementation, interference processing is performed on the initial text to ensure that the test text includes information to be proofread, so that the quality of the text to be audited can be determined accurately from the text to be audited that the proofreading device returns based on the test text.
Based on the first aspect, in a possible design, the performing interference processing on the initial text to obtain the test text includes: setting predetermined interference information in the initial text to obtain the test text; and the interference information is information to be corrected in the test text.
In the implementation process, the interference information in the test text is predetermined, so that the residual quantity of the information to be corrected in the text to be checked can be accurately determined subsequently.
In a second aspect, an embodiment of the present application provides a data quality inspection apparatus, including: the text receiving unit is used for receiving a text to be checked, wherein the text to be checked is a proofreading result text of a test text, and information to be proofread is set in the test text; the quantity determining unit is used for determining a first quantity of target positions, which are different from the content of the test text, in the text to be audited; and the result determining unit is used for determining that the text to be audited is qualified when the first quantity is determined to be greater than or equal to the preset quantity.
Based on the second aspect, in a possible design, the result determining unit is specifically configured to determine, when it is determined that the first number is greater than or equal to the preset number, a second number that positions of the information to be collated in the test text are the same as the target positions; and when the ratio of the second quantity to the preset quantity is determined to be larger than or equal to a first preset threshold value, determining that the text to be audited is qualified.
In a possible design based on the second aspect, the result determination unit includes: a remaining number determining unit, configured to determine a remaining number of the to-be-collated information in the to-be-audited text when it is determined that the first number is greater than or equal to the preset number; and the result determining subunit is used for determining that the text to be audited is qualified when the ratio of the residual quantity to the preset quantity is determined to be smaller than or equal to a second preset threshold value.
Based on the second aspect, in a possible design, the remaining number determining unit is specifically configured to determine the remaining number based on the predetermined information to be collated in the test text when it is determined that the first number is greater than or equal to the preset number.
Based on the second aspect, in one possible design, the remaining number determination unit includes: the content comparison unit is used for comparing the test text with the content corresponding to the position in the text to be checked based on the position of the information to be checked in the test text which is determined in advance when the first number is determined to be larger than or equal to the preset number; and the residual quantity determining subunit is used for determining the residual quantity according to the comparison result.
Based on the second aspect, in one possible design, the apparatus further includes: and the checking unit is used for checking the information to be checked in the text to be checked if the residual quantity is not zero when the ratio of the residual quantity to the preset quantity is determined to be smaller than or equal to a second preset threshold value.
In a possible design based on the second aspect, the text receiving unit includes: an initial text acquisition unit for acquiring an initial text; the interference unit is used for carrying out interference processing on the initial text to obtain the test text; the sending unit is used for sending the test text to the proofreading equipment; and the receiving subunit is used for receiving the text to be audited returned by the proofreading device based on the test text.
Based on the second aspect, in a possible design, the interference unit is specifically configured to set predetermined interference information in the initial text to obtain the test text; and the interference information is the information to be corrected in the test text.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory connected to the processor, where a computer program is stored in the memory, and when the computer program is executed by the processor, the electronic device is caused to perform the method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the method of the first aspect.
Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of a data quality inspection method according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of a data quality inspection apparatus according to an embodiment of the present application.
Reference numerals: 100 - electronic device; 101 - processor; 102 - memory; 103 - communication interface; 300 - data quality inspection apparatus; 310 - text receiving unit; 320 - quantity determining unit; 330 - result determining unit.
Detailed Description
The technical solution in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
In order to facilitate understanding of the present application, a brief description is provided below for an application scenario of the embodiment of the present application, where the application scenario may be an online education scenario, an internet education scenario, or the like.
Because the test questions used to build a test question bank come from many sources, unifying the test question format and ensuring test question quality are the key points and difficulties in building the bank. Faced with a massive number of test questions, the proofreading and auditing of test questions to be added to the bank is at present usually done manually.
However, when a proofreader proofreads test question content, missed detections and false detections easily occur; for example, omissions and errors occur while the test questions are proofread. An auditor then has to audit the proofread test question texts one by one, which greatly increases the cost of manual auditing and is inefficient. If the auditor only spot-checks the texts, errors may go unaudited, and the text quality cannot be guaranteed.
It should be noted that the application scenario in the embodiment of the present application is not limited to the above application scenario, and may also be applicable to any field of text proofreading and auditing, which is not described herein again.
Referring to fig. 1, an embodiment of the present application provides a schematic structural diagram of an electronic device 100, where the electronic device 100 may be a personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), or the like.
The memory 102 is used for storing various data, such as the computer program instructions corresponding to the data quality inspection method and apparatus provided in the embodiments of the present application. The memory 102 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 101 is configured to read and run the computer program instructions, stored in the memory, that correspond to the data quality inspection method and apparatus, so as to: receive a text to be audited, where the text to be audited is the proofreading result text of a test text, and the test text is provided with a preset number of pieces of information to be proofread; determine a first number of target positions in the text to be audited whose content differs from that of the test text; and, when the first number is determined to be greater than or equal to the preset number, determine that the text to be audited is qualified.
The processor 101 may be an integrated circuit chip having signal processing capability. The processor 101 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Referring to fig. 2, fig. 2 is a flowchart of a data quality inspection method according to an embodiment of the present application. The method is applied to the electronic device 100 shown in fig. 1. The flow shown in fig. 2 is described in detail below; the method includes steps S21-S23.
S21: receiving a text to be audited, wherein the text to be audited is the proofreading result text of a test text, and the test text is provided with a preset number of pieces of information to be proofread.
S22: determining a first number of target positions in the text to be audited whose content differs from that of the test text.
S23: when the first number is determined to be greater than or equal to the preset number, determining that the text to be audited is qualified.
The above method is described in detail below.
S21: receiving a text to be audited, wherein the text to be audited is the proofreading result text of a test text, and the test text is provided with a preset number of pieces of information to be proofread.
The preset number is a positive integer greater than or equal to 1, the preset number is set according to actual requirements, and the larger the preset number is, the higher the quality of the text which is finally determined to be qualified is.
As an embodiment, S21 includes steps A1-A4.
A1: an initial text is obtained.
In this embodiment, the initial text may include one test question, two test questions, or multiple test questions. In other embodiments, program code, or other types of textual content, may be included in the initial text.
The initial text may contain some latent erroneous content. To prevent missed detections and false detections caused by a proofreader's carelessness, some known errors may be added to the initial text so that the proofreader can be supervised. That is, in the present application, interference processing is performed on the initial text (that is, "mines" are planted in the initial text, for example by inserting erroneous content) to obtain the test text, so that the test text includes information to be proofread. This makes it possible to subsequently check, accurately, the quality of the text to be audited that corresponds to the test text. It can be understood that the more target positions in the text to be audited differ from the content of the test text, the less erroneous content the text to be audited contains and the higher its quality. Therefore, after the initial text is obtained, step A2 is executed.
A2: performing interference processing on the initial text to obtain the test text.
As an embodiment, A2 may be implemented by adding a preset number of randomly generated interference contents to the initial text to obtain the test text.
As an embodiment, A2 may be implemented by adding randomly generated interference content at preset positions in the initial text to obtain the test text, where the number of preset positions is equal to the preset number.
When the initial text includes test questions, as an embodiment, A2 may be implemented by randomly adding a preset number of randomly generated interference contents to each test question in the initial text.
As an embodiment, A2 may be implemented by replacing the characters at a preset number of positions in the initial text with randomly generated interference content to obtain the test text; the preset number of positions may be randomly determined or predetermined.
As an embodiment, A2 may be implemented by replacing a preset number of preset characters in the initial text with randomly generated interference content to obtain the test text.
It should be noted that the interference content in the above embodiment is to-be-corrected information in the test text.
In this embodiment, one interference content is one character or one segment of characters.
As an embodiment, step A2 includes: setting predetermined interference information in the initial text to obtain the test text, where the interference information is the information to be proofread in the test text.
Specifically, interference information is added at a preset number of positions in the initial text to obtain the test text, where the content of the interference information added at each position may be the same or different. The preset number of positions may be predetermined or randomly determined.
When the test questions are included in the initial text, as an implementation manner, setting the predetermined interference information in the initial text may be implemented in a manner that the interference information is randomly added at a preset number of positions in each test question of the initial text, where the content of the interference information added in each test question may be the same or different.
As an embodiment, the setting of the predetermined interference information in the initial text may be implemented in such a way that characters at a preset number of positions in the initial text are randomly replaced with the interference information, wherein the content of the interference information at each position in the test text may be the same or different.
When the initial text includes the test questions, as an implementation manner, the setting of the predetermined interference information in the initial text may be implemented in a manner that randomly replaces the characters at the preset number of positions in each test question of the initial text with the interference information, where the number and/or content of the interference information in each test question may be the same or different.
As an implementation manner, the setting of the predetermined interference information in the initial text may be implemented in a manner that a preset number of preset characters in the initial text are replaced with respective corresponding interference information, where the interference information corresponding to each preset character may be the same or different.
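The last variant, replacing preset characters with their corresponding predetermined interference information, could look roughly like the sketch below. The function name `apply_predetermined_interference` and the `(original, interference)` rule format are hypothetical, and the sketch is restricted to equal-length replacements so that the recorded offsets stay valid in both texts.

```python
def apply_predetermined_interference(
    initial_text: str, rules: list[tuple[str, str]]
) -> tuple[str, list[tuple[int, int]]]:
    """Replace the first occurrence of each preset string with its interference.

    rules is a hypothetical list of (original, interference) pairs.
    Returns the test text and the (start, end) spans that were seeded.
    """
    chars = list(initial_text)
    seeded_spans: list[tuple[int, int]] = []
    for original, interference in rules:
        if len(original) != len(interference):
            raise ValueError("this sketch only supports equal-length replacements")
        start = initial_text.find(original)
        if start == -1:
            continue  # preset string not present in this text
        end = start + len(original)
        if any(start < e and s < end for s, e in seeded_spans):
            continue  # would overlap a span that is already seeded
        chars[start:end] = interference
        seeded_spans.append((start, end))
    return "".join(chars), sorted(seeded_spans)


initial_text = "Their answer is 42 because 6 * 7 = 42."
test_text, seeded_spans = apply_predetermined_interference(
    initial_text, [("Their", "There"), ("6 * 7", "6 + 7")]
)
```

Because the interference is predetermined, the same rule table can be reused across texts and the seeded spans are known exactly.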
After the test text is obtained, step A3 is performed.
A3: sending the test text to a proofreading device.
In practical implementation, A3 may be implemented by sending the test text to the proofreading device either in real time or not in real time.
As an embodiment, A3 may be implemented by determining the subject to which the content of the test text belongs and sending the test text to the proofreading device corresponding to that subject.
Through the implementation mode, the test text can be accurately sent to the proofreading equipment of the staff who can correctly proofread the test text.
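A rough sketch of subject-based routing is given below. How the subject is determined is not specified, so the keyword table, the device identifiers, and the function name `pick_proofreading_device` are all hypothetical; a real system might rely on metadata or a classifier instead.

```python
# Hypothetical keyword table and device registry, for illustration only.
SUBJECT_KEYWORDS = {
    "math": ("equation", "triangle", "sum"),
    "english": ("verb", "noun", "tense"),
}
DEVICE_BY_SUBJECT = {
    "math": "proofreader-math-01",
    "english": "proofreader-english-01",
}


def pick_proofreading_device(test_text: str) -> str:
    """Choose the proofreading device for the subject with most keyword hits."""
    lowered = test_text.lower()
    scores = {
        subject: sum(lowered.count(keyword) for keyword in keywords)
        for subject, keywords in SUBJECT_KEYWORDS.items()
    }
    subject = max(scores, key=scores.get)
    if scores[subject] == 0:
        raise LookupError("no subject matched the test text")
    return DEVICE_BY_SUBJECT[subject]


device = pick_proofreading_device("Solve the equation and find the sum.")
```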
A4: receiving the text to be audited returned by the proofreading device based on the test text.
In practical implementation, A4 may be implemented by receiving, either in real time or not in real time, the text to be audited returned by the proofreading device based on the test text. The text to be audited includes the proofreading traces made on the test text, so that the first number can be determined subsequently. The proofreading traces may include proofreading icons, proofreading labels, and the like, provided that they can be detected; details are not repeated here.
For example, if there are 10 errors in the test text and the proofreader revises all 10, the revisions may be made by directly modifying or replacing content, marking the errors, adding error labels, adding error icons, and so on; details are not repeated here.
After the text to be audited is received, S22 is executed.
S22: determining a first number of target positions in the text to be audited whose content differs from that of the test text.
In an actual implementation, S22 may be implemented by comparing the content of the text to be audited with the content at the corresponding positions in the test text and, according to the comparison result, determining from the text to be audited the first number of target positions whose content differs from that of the test text. A target position with different content may be, for example, a position where the characters differ or a position where a revision mark has been added; details are not repeated here.
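A minimal sketch of S22 follows, assuming both texts are plain strings and treating every contiguous non-matching region reported by Python's `difflib.SequenceMatcher` as one target position. The application does not prescribe a comparison algorithm, and the function name `find_target_positions` is an assumption.

```python
from difflib import SequenceMatcher


def find_target_positions(test_text: str,
                          audited_text: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of test_text whose content was changed."""
    matcher = SequenceMatcher(None, test_text, audited_text, autojunk=False)
    return [
        (i1, i2)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag != "equal"  # replace, delete, or insert
    ]


test_text = "The sun rizes in the eest."
audited_text = "The sun rises in the east."
target_positions = find_target_positions(test_text, audited_text)
first_number = len(target_positions)
```

In this example the proofreader changed two places, so the first number is 2; it is this count that S23 compares with the preset number.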
S23: and when the first number is determined to be larger than or equal to the preset number, determining that the text to be audited is qualified.
As an embodiment, S23 includes steps B1-B2.
B1: and when the first number is determined to be larger than or equal to the preset number, determining a second number of the positions of the information to be collated in the test text, which are the same as the target positions.
In practical implementation, B1 may be implemented in such a way that, when it is determined that the first number is equal to or greater than the preset number, for each target position, it is determined whether the target position is the same as the position of one piece of information to be collated in the test text, and then a second number of target positions that are the same as the position of the piece of information to be collated in the test text are determined from the first number of target positions.
B2: and when the ratio of the second quantity to the preset quantity is determined to be larger than or equal to a first preset threshold value, determining that the text to be audited is qualified.
In practice, B2 may be implemented by computing the ratio of the second number to the preset number, comparing the ratio with the first preset threshold, and determining that the text to be audited is qualified when the ratio is greater than or equal to the first preset threshold.
Wherein the value range of the first preset threshold is 0.5-1; the first preset threshold is set according to actual requirements, and the larger the first preset threshold is, the lower the probability that the text which is finally determined to be qualified has quality problems is; in this embodiment, the first preset threshold is 0.8, and in other embodiments, the first preset threshold may also be 0.5, 0.7, 1, and the like.
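Steps B1 and B2 can be sketched as follows, assuming positions are represented as integer indices and the positions of the information to be collated are already known. The function name is_qualified_by_hits and its parameter names are hypothetical; the default threshold of 0.8 is the value used in this embodiment.

```python
def is_qualified_by_hits(
    target_positions: list[int],
    planted_positions: list[int],
    preset_number: int,
    first_threshold: float = 0.8,
) -> bool:
    """Sketch of B1-B2: qualify when enough planted positions were revised."""
    if preset_number <= 0:
        raise ValueError("preset_number must be positive")
    first_number = len(target_positions)
    if first_number < preset_number:
        # Precondition of B1 is not met, so the text is not qualified.
        return False
    # B1: target positions that coincide with positions of planted information.
    second_number = len(set(target_positions) & set(planted_positions))
    # B2: compare the ratio with the first preset threshold.
    return second_number / preset_number >= first_threshold


planted = [4, 7, 12, 20, 31]
all_found = is_qualified_by_hits([4, 7, 12, 20, 31, 40], planted, preset_number=5)
few_found = is_qualified_by_hits([4, 7, 12, 50, 51], planted, preset_number=5)
```

In the first call all five planted positions were revised (ratio 1.0); in the second only three were (ratio 0.6), so the text fails even though the first number reaches the preset number.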
As an embodiment, S23 includes steps C1-C2.
C1: and when the first number is determined to be larger than or equal to the preset number, determining the residual number of the information to be collated in the text to be checked.
As an embodiment, C1 includes: and when the first number is determined to be larger than or equal to the preset number, determining the residual number based on the information to be collated in the predetermined test text.
Specifically, when the first number is determined to be greater than or equal to the preset number, each piece of information to be collated in the predetermined test text is compared with the content of the text to be audited. If the text to be audited contains content identical to that piece of information, the piece is determined to exist in the text to be audited; otherwise, it is determined not to exist there. The remaining number is then the number of pieces of information to be collated that exist in the text to be audited.
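A minimal sketch of this content-based variant of C1, assuming each piece of information to be collated is a short string and that "exists in the text" means a plain substring match. The name count_remaining_by_content is hypothetical.

```python
def count_remaining_by_content(audited_text: str, planted_items: list[str]) -> int:
    """Count planted items that still appear verbatim in the audited text."""
    return sum(1 for item in planted_items if item in audited_text)


audited_text = "The cat sat on the mat. It were happy."
remaining_number = count_remaining_by_content(audited_text, ["teh", "were"])
```

A substring match can over-count when a planted string also occurs legitimately elsewhere in the text, which is one reason the position-based variant may be preferred.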
As another embodiment, C1 includes steps C11-C12.
C11: when the first number is determined to be greater than or equal to the preset number, comparing, based on the predetermined positions of the information to be collated in the test text, the content at those positions in the test text and in the text to be audited.
Specifically, when the first number is determined to be greater than or equal to the preset number, for the position of each piece of information to be collated in the predetermined test text, a first content corresponding to the position is determined from the test text, a second content corresponding to the position is determined from the text to be audited, and the first content and the second content are compared to determine whether they are the same.
C12: and determining the residual quantity according to the comparison result.
Specifically, for each position of the information to be collated in the predetermined test text, when the comparison result indicates that the first content and the second content corresponding to the position are the same, it is determined that the information to be collated still exists at that position in the text to be audited; otherwise, it is determined that the information to be collated does not exist at that position in the text to be audited. The remaining number of pieces of information to be collated in the text to be audited is then determined.
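Steps C11 and C12 can be sketched as below. The sketch assumes that both texts are already split into aligned units (here, words) so that an index refers to the same place in each, and that a planted item counts as remaining when the content at its position is unchanged between the test text and the text to be audited. The name count_remaining_by_position is hypothetical.

```python
def count_remaining_by_position(
    test_units: list[str],
    audited_units: list[str],
    planted_positions: list[int],
) -> int:
    """Count planted positions whose content is unchanged in the audited text."""
    remaining = 0
    for pos in planted_positions:
        first_content = test_units[pos]
        # A position missing from the audited text is treated as changed.
        second_content = audited_units[pos] if pos < len(audited_units) else None
        if first_content == second_content:
            remaining += 1
    return remaining


test_units = "The cat sat on teh mat. It were happy.".split()
audited_units = "The cat sat on the mat. It were happy.".split()
remaining_number = count_remaining_by_position(test_units, audited_units, [4, 7])
```

Here "teh" at index 4 was corrected while "were" at index 7 was left untouched, so one planted item remains.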
C2: and when the ratio of the residual quantity to the preset quantity is determined to be smaller than or equal to a second preset threshold value, determining that the text to be audited is qualified.
Wherein the value range of the second preset threshold is 0-0.2; the second preset threshold is set according to actual requirements, and the smaller the second preset threshold is, the lower the probability that the text which is finally determined to be qualified has quality problems is; in this embodiment, the second preset threshold is 0.1, and in other embodiments, the second preset threshold may also be 0, 0.05, 0.2, and the like.
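The decision in C2 reduces to a ratio check. The sketch below assumes the first number and the remaining number have already been computed; the name is_qualified_by_remaining is hypothetical, and the default of 0.1 is the value used in this embodiment.

```python
def is_qualified_by_remaining(
    first_number: int,
    remaining_number: int,
    preset_number: int,
    second_threshold: float = 0.1,
) -> bool:
    """Sketch of C1-C2: qualify when few planted items remain uncorrected."""
    if preset_number <= 0:
        raise ValueError("preset_number must be positive")
    if first_number < preset_number:
        return False
    return remaining_number / preset_number <= second_threshold


ok = is_qualified_by_remaining(first_number=25, remaining_number=1, preset_number=20)
not_ok = is_qualified_by_remaining(first_number=25, remaining_number=3, preset_number=20)
```

With 20 planted items, one leftover gives a ratio of 0.05 and passes, while three leftovers give 0.15 and fail.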
As an embodiment, the method further comprises: when the ratio of the remaining number to the preset number is determined to be smaller than or equal to the second preset threshold, and the remaining number is determined not to be zero, correcting the information to be collated that remains in the text to be audited.
Specifically, when the ratio of the remaining number to the preset number is determined to be smaller than or equal to the second preset threshold, and the remaining number is determined not to be zero, the information to be collated is deleted at its position in the text to be audited.
As an implementation, correcting the information to be collated in the text to be audited may be implemented by replacing, based on a predetermined correspondence between the information to be collated and the correct information, the information to be collated in the text to be audited with the corresponding correct information.
As another implementation, correcting the information to be collated in the text to be audited may be implemented by replacing, based on the predetermined correct information corresponding to the information to be collated at each position in the text to be audited, the information to be collated at each position with the corresponding correct information.
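The position-based replacement can be sketched as follows, again assuming the texts are split into aligned units and that a mapping from position to correct content was recorded when the test text was built. The names repair_remaining and corrections are hypothetical.

```python
def repair_remaining(
    test_units: list[str],
    audited_units: list[str],
    corrections: dict[int, str],
) -> list[str]:
    """Replace planted items the proofreader missed with the correct content."""
    repaired = list(audited_units)  # leave the input untouched
    for pos, correct in corrections.items():
        # Only positions that still hold the planted content are rewritten.
        if pos < len(repaired) and repaired[pos] == test_units[pos]:
            repaired[pos] = correct
    return repaired


test_units = "The cat sat on teh mat. It were happy.".split()
audited_units = "The cat sat on the mat. It were happy.".split()
repaired = repair_remaining(test_units, audited_units, {4: "the", 7: "was"})
```

Positions the proofreader already changed are left as they are, so a proofreader's own wording is not overwritten.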
As an embodiment, the method further comprises: when the first number is smaller than the preset number, sending prompt information indicating that the audit has not been passed to the sending end of the text to be audited, where the prompt information is used to prompt that the text to be audited needs to be proofread again.
As an embodiment, after step A2, the method further comprises: determining the position of the information to be collated in the test text based on the initial text and the test text.
Specifically, the content at corresponding positions in the initial text and the test text is compared, the content in the test text that differs from the initial text is determined to be the information to be collated, and the position of the information to be collated in the test text is then determined.
As an embodiment, after step A2, the method further comprises: determining the information to be collated in the test text based on the initial text and the test text.
Specifically, the content at corresponding positions in the initial text and the test text is compared, and the content in the test text that differs from the initial text is determined to be the information to be collated.
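Both embodiments amount to a position-wise comparison of the initial text and the test text. A minimal sketch, assuming the two texts are split into the same number of aligned units (that is, the interference replaced content without inserting or deleting units); the name locate_planted is hypothetical.

```python
def locate_planted(initial_units: list[str], test_units: list[str]) -> dict[int, str]:
    """Map each differing position to the planted content found in the test text."""
    if len(initial_units) != len(test_units):
        raise ValueError("this sketch assumes aligned texts of equal length")
    return {
        pos: test_unit
        for pos, (initial_unit, test_unit) in enumerate(zip(initial_units, test_units))
        if initial_unit != test_unit
    }


initial_units = "The cat sat on the mat.".split()
test_units = "The cat sat on teh mat.".split()
planted = locate_planted(initial_units, test_units)
```

The keys of the returned mapping are the positions of the information to be collated, and the values are the information itself.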
Referring to Fig. 3, Fig. 3 is a block diagram of a data quality inspection apparatus 300 according to an embodiment of the present disclosure. The apparatus is stored in the electronic device 100 shown in Fig. 1. The block diagram shown in Fig. 3 is described below; the apparatus includes:
a text receiving unit 310, configured to receive a text to be audited, where the text to be audited is the proofreading result text of a test text, and information to be collated is set in the test text;
a number determining unit 320, configured to determine a first number of target positions in the text to be audited whose content differs from the test text; and
a result determining unit 330, configured to determine that the text to be audited is qualified when the first number is determined to be greater than or equal to the preset number.
As an embodiment, the result determining unit 330 is specifically configured to: when the first number is determined to be greater than or equal to the preset number, determine a second number of target positions that coincide with positions of the information to be collated in the test text; and when the ratio of the second number to the preset number is determined to be greater than or equal to a first preset threshold, determine that the text to be audited is qualified.
As an embodiment, the result determining unit 330 includes: a remaining number determining unit, configured to determine a remaining number of the to-be-collated information in the to-be-audited text when it is determined that the first number is greater than or equal to the preset number; and the result determining subunit is used for determining that the text to be audited is qualified when the ratio of the residual quantity to the preset quantity is determined to be smaller than or equal to a second preset threshold value.
As an embodiment, the remaining number determining unit is specifically configured to determine the remaining number based on the predetermined information to be collated in the test text when it is determined that the first number is greater than or equal to the preset number.
As an embodiment, the remaining number determining unit includes: a content comparison unit, configured to compare, when the first number is determined to be greater than or equal to the preset number and based on the predetermined positions of the information to be collated in the test text, the content at those positions in the test text and in the text to be audited; and a remaining number determining subunit, configured to determine the remaining number according to the comparison result.
As an embodiment, the apparatus further comprises: a correcting unit, configured to correct the information to be collated that remains in the text to be audited when the ratio of the remaining number to the preset number is determined to be smaller than or equal to a second preset threshold and the remaining number is determined not to be zero.
As an embodiment, the text receiving unit 310 includes: an initial text acquisition unit for acquiring an initial text; the interference unit is used for carrying out interference processing on the initial text to obtain the test text; the sending unit is used for sending the test text to the proofreading equipment; and the receiving subunit is used for receiving the text to be audited returned by the proofreading device based on the test text.
As an implementation manner, the interference unit is specifically configured to set predetermined interference information in the initial text to obtain the test text; and the interference information is the information to be corrected in the test text.
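The interference processing can be sketched as writing predetermined wrong content at chosen positions of the initial text. The sketch assumes the initial text is split into units and that the interference is given as a mapping from position to wrong content; the name plant_interference is hypothetical.

```python
def plant_interference(initial_units: list[str], interference: dict[int, str]) -> list[str]:
    """Return a test text in which the given positions hold interference content."""
    test_units = list(initial_units)  # the initial text itself is not modified
    for pos, wrong_content in interference.items():
        test_units[pos] = wrong_content
    return test_units


initial_units = "The cat sat on the mat. It was happy.".split()
test_units = plant_interference(initial_units, {4: "teh", 7: "were"})
preset_number = 2  # number of pieces of interference information planted
```

Keeping the interference mapping alongside the initial text gives both the positions of the information to be collated and the correct content needed for later comparison.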
Please refer to the content described in the embodiment shown in fig. 2 for the process of implementing each function of each functional unit in this embodiment, which is not described herein again.
In addition, a storage medium is provided in an embodiment of the present application, and a computer program is stored in the storage medium, and when the computer program runs on a computer, the computer is caused to execute the method provided in any embodiment of the present application.
In summary, the data quality inspection method, apparatus, electronic device, and storage medium provided in the embodiments of the present application take into account that, while proofreading the information to be collated that was preset in the test text, the staff member also corrects other erroneous content in the test text. It follows that the larger the first number of target positions in the text to be audited that differ from the content of the test text, the more erroneous content the staff member has corrected and the less erroneous content remains in the text to be audited. Therefore, a preset number of pieces of information to be collated are set in the test text, and the text to be audited is determined to be qualified when the first number of target positions that differ from the content of the test text is greater than or equal to the preset number. This ensures that a text determined to be qualified contains little erroneous content. The process requires no manual auditing, so auditing efficiency is high and no manual auditing cost is incurred.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based devices that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

Claims (11)

1. A method of data quality inspection, the method comprising:
receiving a text to be audited, wherein the text to be audited is a proofreading result text of a test text, and a preset number of pieces of information to be collated are set in the test text;
determining a first number of target positions in the text to be audited, wherein the target positions are different from the content of the test text;
and when the first number is determined to be larger than or equal to the preset number, determining that the text to be audited is qualified.
2. The method of claim 1, wherein determining that the text to be reviewed is qualified when it is determined that the first number is greater than or equal to the preset number comprises:
when the first number is determined to be larger than or equal to the preset number, determining a second number of the information to be collated, the position of which in the test text is the same as the target position;
and when the ratio of the second quantity to the preset quantity is determined to be larger than or equal to a first preset threshold value, determining that the text to be audited is qualified.
3. The method of claim 1, wherein determining that the text to be reviewed is qualified when it is determined that the first number is greater than or equal to the preset number comprises:
when the first number is determined to be larger than or equal to the preset number, determining the remaining number of the information to be collated in the text to be audited;
and when the ratio of the residual quantity to the preset quantity is determined to be smaller than or equal to a second preset threshold value, determining that the text to be audited is qualified.
4. The method according to claim 3, wherein determining the remaining number of the to-be-collated information in the to-be-audited text when determining that the first number is greater than or equal to the preset number comprises:
and when the first number is determined to be larger than or equal to the preset number, determining the residual number based on the information to be collated in the predetermined test text.
5. The method according to claim 3, wherein determining the remaining number of the to-be-collated information in the to-be-audited text when determining that the first number is greater than or equal to the preset number comprises:
when the first number is determined to be larger than or equal to the preset number, comparing, based on the predetermined position of the information to be collated in the test text, the content corresponding to the position in the test text and in the text to be audited;
and determining the residual quantity according to the comparison result.
6. The method of claim 3, further comprising:
and when the ratio of the remaining number to the preset number is determined to be smaller than or equal to the second preset threshold, and the remaining number is determined not to be zero, correcting the information to be collated in the text to be audited.
7. The method of claim 1, wherein the receiving text to be reviewed comprises:
acquiring an initial text;
performing interference processing on the initial text to obtain the test text;
sending the test text to a proofreading device;
and receiving the text to be audited returned by the proofreading equipment based on the test text.
8. The method of claim 7, wherein the performing the interference processing on the initial text to obtain the test text comprises:
setting predetermined interference information in the initial text to obtain the test text; and the interference information is the information to be corrected in the test text.
9. A data quality inspection apparatus, characterized in that the apparatus comprises:
the text receiving unit is used for receiving a text to be audited, wherein the text to be audited is a proofreading result text of a test text, and information to be collated is set in the test text;
the quantity determining unit is used for determining a first quantity of target positions, which are different from the content of the test text, in the text to be audited;
and the result determining unit is used for determining that the text to be audited is qualified when the first quantity is determined to be greater than or equal to the preset quantity.
10. An electronic device comprising a memory and a processor, the memory having stored therein computer program instructions that, when read and executed by the processor, perform the method of any of claims 1-8.
11. A storage medium having stored thereon computer program instructions which, when read and executed by a computer, perform the method of any one of claims 1-8.
CN202110773499.XA 2021-07-08 2021-07-08 Data quality inspection method, device, electronic equipment and storage medium Active CN113361265B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110773499.XA CN113361265B (en) 2021-07-08 2021-07-08 Data quality inspection method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110773499.XA CN113361265B (en) 2021-07-08 2021-07-08 Data quality inspection method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113361265A (en) 2021-09-07
CN113361265B (en) 2024-05-28

Family

ID=77538656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110773499.XA Active CN113361265B (en) 2021-07-08 2021-07-08 Data quality inspection method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113361265B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491536A (en) * 2017-08-22 2017-12-19 广东小天才科技有限公司 A kind of examination question method of calibration, examination question calibration equipment and electronic equipment
CN109062950A (en) * 2018-06-22 2018-12-21 北京奇艺世纪科技有限公司 A kind of method and device of text marking
CN109858014A (en) * 2018-12-10 2019-06-07 西南石油大学 Language message active critique system and its active proofreading method
CN110674633A (en) * 2019-09-18 2020-01-10 平安科技(深圳)有限公司 Document review proofreading method and device, storage medium and electronic equipment
CN110968730A (en) * 2019-12-16 2020-04-07 Oppo(重庆)智能科技有限公司 Audio mark processing method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN113361265B (en) 2024-05-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant