CN114792085B - Data processing system for error correction of label text - Google Patents


Info

Publication number
CN114792085B
Authority
CN
China
Prior art keywords
text
list
error correction
referred
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210710576.1A
Other languages
Chinese (zh)
Other versions
CN114792085A (en)
Inventor
张正义
林方
刘宸
傅晓航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Yuchen Technology Co Ltd
Original Assignee
Zhongke Yuchen Technology Co Ltd

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/10 - Text processing
    • G06F40/103 - Formatting, i.e. changing of presentation of documents
    • G06F40/117 - Tagging; Marking up; Designating a block; Setting of attributes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/10 - Text processing
    • G06F40/194 - Calculation of difference between files
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/232 - Orthographic correction, e.g. spell checking or vowelisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a data processing system for error correction of annotated text, comprising a database, a processor, and a memory storing a computer program which, when executed by the processor, performs the following steps: when the number of annotated texts is less than a text quantity threshold, taking any one annotated text as a test set and the text set corresponding to that annotated text as a training set; when the number of annotated texts is not less than the text quantity threshold, dividing the annotated text list into a plurality of intermediate annotated text lists and taking any one intermediate annotated text list as a test set and the text set corresponding to that list as a training set; and training a preset model on the training set, so that all abnormal annotations corresponding to the abnormal texts are determined based on the trained model and the test set. Abnormal texts can thus be determined quickly and accurately, and proofreaders only need to proofread the abnormal texts, which reduces the workload and improves the efficiency of text proofreading.

Description

Data processing system for error correction of label text
Technical Field
The invention relates to the technical field of text error correction, and in particular to a data processing system for error correction of annotated text.
Background
Currently, the process of annotating text involves annotators who annotate the text and proofreaders who proofread the annotated text. When the number of texts is large, both annotators and proofreaders face a heavy workload, so working efficiency is low and personnel costs are high.
In the prior art, a text error correction model is used to correct errors in annotated texts, but the accuracy of such a model is low and every annotated text has to be corrected, which results in low working efficiency.
Moreover, for errors that occur frequently in text, such as missing letters in English words or wrong characters in the names of people and places, the annotator is not made aware of the annotation error, which increases the proofreader's workload and lowers working efficiency.
Disclosure of Invention
In order to solve the above technical problems, the technical solution adopted by the present invention is a data processing system for error correction of annotated text, the system comprising: a database, a processor, and a memory storing a computer program, wherein the database comprises an annotated text list $A = \{A_1, \ldots, A_i, \ldots, A_m\}$, where $A_i$ is the $i$-th annotated text, $i = 1, \ldots, m$, and $m$ is the number of annotated texts. When the computer program is executed by the processor, the following steps are implemented:
S100, when $m$ is less than a preset text quantity threshold $m_0$, obtaining a first specified text set $G = \{G_1, \ldots, G_i, \ldots, G_m\}$ corresponding to $A$, where the $i$-th first specified text set is $G_i = \{A_i, B_i\}$, the first text list corresponding to $A_i$ is $B_i = \{B_{i1}, \ldots, B_{ir}, \ldots, B_{is}\}$, $B_{ir}$ is the $r$-th first text, $r = 1, \ldots, s$, and $s$ is the number of first texts; $A_i$ serves as the $i$-th first target test set in $G$ and $B_i$ serves as the $i$-th first target training set in $G$;
S200, when $m \ge m_0$, obtaining from $A$ an intermediate text set $D = \{D_1, \ldots, D_j, \ldots, D_n\}$, where $D_j = \{D_{j1}, \ldots, D_{jt}, \ldots, D_{jk}\}$, $D_{jt}$ is the $t$-th intermediate text in the $j$-th intermediate text list, $j = 1, \ldots, n$, $n$ is the number of intermediate text lists, $t = 1, \ldots, k$, and $k$ is the number of intermediate texts in each intermediate text list, wherein $n$ satisfies the following condition:
[Formula image 100002_DEST_PATH_IMAGE002: the condition on $n$ is not reproduced in this text]
S300, obtaining a second specified text set $G' = \{G'_1, \ldots, G'_j, \ldots, G'_n\}$ corresponding to $A$, where the $j$-th second specified text set is $G'_j = \{D_j, C_j\}$, the second text set corresponding to $D_j$ is $C_j = \{C_{j1}, \ldots, C_{jq}, \ldots, C_{jp}\}$, $C_{jq}$ is the $q$-th second text list, $q = 1, \ldots, p$, and $p$ is the number of second text lists; $D_j$ serves as the $j$-th second target test set in $G'$ and $C_j$ serves as the $j$-th second target training set in $G'$;
S400, obtaining a target training set, training a preset text error correction model on the target training set to obtain a target text error correction model, and inputting the target test set into the target text error correction model to obtain the abnormal texts corresponding to $A$, wherein the target training set is the first target training set or the second target training set, the target test set is the first target test set or the second target test set, and the target test set corresponds to the target training set;
S500, obtaining an abnormal text list $H = \{H_1, \ldots, H_g, \ldots, H_z\}$ corresponding to $A$, where $H_g$ is the $g$-th abnormal text, $g = 1, \ldots, z$, and $z$ is the number of abnormal texts, and performing text error correction on $H_g$ to obtain all abnormal annotations corresponding to $H_g$.
Compared with the prior art, the invention has clear advantages and beneficial effects. With the above technical solution, the data processing system for error correction of annotated text achieves considerable technical progress and practicability, has wide industrial applicability, and has at least the following advantages:
The data processing system for error correction of annotated text comprises a database, a processor, and a memory storing a computer program, wherein the database comprises an annotated text list, and the computer program, when executed by the processor, implements the following steps: when the number of annotated texts in the list is less than a preset text quantity threshold, taking any one annotated text as a test set and the text set corresponding to that annotated text as a training set; when the number of annotated texts in the list is not less than the preset text quantity threshold, dividing the list into a plurality of intermediate annotated text lists that each contain the same number of annotated texts, and taking any one intermediate annotated text list as a test set and the text set corresponding to that list as a training set; training the preset model on the training set; obtaining the abnormal texts based on the trained model and the test set; and performing text error correction on the abnormal texts to obtain all abnormal annotations corresponding to them. Abnormal texts can thus be determined quickly and accurately, and proofreaders only need to proofread the abnormal texts, which reduces the workload and improves the efficiency of text proofreading.
In addition, the similarity is computed by a different method depending on whether an entity in the abnormal text is a Chinese entity or an English entity, so that the similarity is determined accurately and the wrong annotations in the abnormal text can be flagged.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical means of the present invention more clearly understood, the present invention may be implemented in accordance with the content of the description, and in order to make the above and other objects, features, and advantages of the present invention more clearly understood, the following preferred embodiments are described in detail with reference to the accompanying drawings.
Drawings
FIG. 1 is a flowchart illustrating steps executed by a data processing system for error correction of annotated texts according to an embodiment of the present invention.
Detailed Description
To further illustrate the technical means adopted by the present invention to achieve its intended objects, and their effects, a data processing system for error correction of annotated text and its effects are described in detail below with reference to the accompanying drawings and preferred embodiments.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Examples
This embodiment provides a data processing system for error correction of annotated text, the system comprising: a database, a processor, and a memory storing a computer program, wherein the database comprises an annotated text list $A = \{A_1, \ldots, A_i, \ldots, A_m\}$, where $A_i$ is the $i$-th annotated text, $i = 1, \ldots, m$, and $m$ is the number of annotated texts. When the computer program is executed by the processor, the following steps are implemented, as shown in FIG. 1:
S100, when $m$ is less than a preset text quantity threshold $m_0$, obtaining a first specified text set $G = \{G_1, \ldots, G_i, \ldots, G_m\}$ corresponding to $A$, where the $i$-th first specified text set is $G_i = \{A_i, B_i\}$, the first text list corresponding to $A_i$ is $B_i = \{B_{i1}, \ldots, B_{ir}, \ldots, B_{is}\}$, $B_{ir}$ is the $r$-th first text, $r = 1, \ldots, s$, and $s$ is the number of first texts; $A_i$ serves as the $i$-th first target test set in $G$ and $B_i$ serves as the $i$-th first target training set in $G$.
Specifically, an annotated text is a text that has been annotated.
Specifically, $m_0$ is in the range of 10 to 50; preferably, $m_0$ is 30.
Specifically, a first text in $B_i$ is any annotated text in $A$ other than $A_i$.
Specifically, $s$ satisfies the following condition: $s = m - 1$.
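Step S100 amounts to a leave-one-out split: each annotated text in turn is the test set, and the remaining m - 1 texts are the training set. The following is a minimal sketch of that construction, not the patented implementation; the function name `leave_one_out_sets` and the sample texts are hypothetical.

```python
from typing import List, Tuple


def leave_one_out_sets(A: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Build G = [G_1, ..., G_m], where G_i pairs the test set [A_i] with B_i."""
    G = []
    for i, a_i in enumerate(A):
        # B_i holds every annotated text except A_i, so len(B_i) == m - 1.
        B_i = A[:i] + A[i + 1:]
        G.append(([a_i], B_i))
    return G


# Hypothetical annotated texts (m = 4, below the threshold m_0).
A = ["text 1", "text 2", "text 3", "text 4"]
G = leave_one_out_sets(A)
```

Each element of `G` is a `(test_set, training_set)` pair, so a model can be trained on the second component and evaluated on the first.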
S200, when $m \ge m_0$, obtaining from $A$ an intermediate text set $D = \{D_1, \ldots, D_j, \ldots, D_n\}$, where $D_j = \{D_{j1}, \ldots, D_{jt}, \ldots, D_{jk}\}$, $D_{jt}$ is the $t$-th intermediate text in the $j$-th intermediate text list, $j = 1, \ldots, n$, $n$ is the number of intermediate text lists, $t = 1, \ldots, k$, and $k$ is the number of intermediate texts in each intermediate text list, wherein $n$ satisfies the following condition:
[Formula image 55569DEST_PATH_IMAGE003: the condition on $n$ is not reproduced in this text]
Specifically, in step S200, dividing $A$ to generate the intermediate text lists can be understood as follows: $k$ annotated texts are randomly selected from $A$ to construct each intermediate text list, and no annotated text appears in more than one intermediate text list.
S300, obtaining a second specified text set $G' = \{G'_1, \ldots, G'_j, \ldots, G'_n\}$ corresponding to $A$, where the $j$-th second specified text set is $G'_j = \{D_j, C_j\}$, the second text set corresponding to $D_j$ is $C_j = \{C_{j1}, \ldots, C_{jq}, \ldots, C_{jp}\}$, $C_{jq}$ is the $q$-th second text list, $q = 1, \ldots, p$, and $p$ is the number of second text lists; $D_j$ serves as the $j$-th second target test set in $G'$ and $C_j$ serves as the $j$-th second target training set in $G'$.
Specifically, a second text list in $C_j$ is any intermediate text list in $D$ other than $D_j$.
Specifically, $p$ satisfies the following condition: $p = n - 1$.
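Steps S200 and S300 resemble a k-fold style partition: A is split at random into n disjoint lists of k texts, and each list is paired with the other n - 1 lists. The sketch below illustrates this under a stated assumption: because the formula for n is only an image reference in this text, the code simply assumes that m is an exact multiple of k, so that n = m / k. The names `split_into_lists` and `second_specified_sets` are hypothetical.

```python
import random
from typing import List, Tuple


def split_into_lists(A: List[str], k: int, seed: int = 0) -> List[List[str]]:
    """Randomly divide A into disjoint intermediate text lists of k texts each."""
    # Assumption of this sketch: m is a multiple of k, so n = m // k.
    if k <= 0 or len(A) % k != 0:
        raise ValueError("this sketch assumes m is a positive multiple of k")
    shuffled = A[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[j:j + k] for j in range(0, len(shuffled), k)]


def second_specified_sets(
    D: List[List[str]],
) -> List[Tuple[List[str], List[List[str]]]]:
    """Build G' = [G'_1, ..., G'_n], where G'_j pairs D_j with C_j."""
    G_prime = []
    for j, D_j in enumerate(D):
        # C_j holds every intermediate text list except D_j, so p = n - 1.
        C_j = D[:j] + D[j + 1:]
        G_prime.append((D_j, C_j))
    return G_prime


# Hypothetical annotated texts (m = 30, at the preferred threshold m_0 = 30).
A = [f"text {i}" for i in range(1, 31)]
D = split_into_lists(A, k=5)
G_prime = second_specified_sets(D)
```

The fixed `seed` only makes the random division reproducible; any source of randomness that yields disjoint lists would serve the same purpose.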
S400, obtaining a target training set, training a preset text error correction model on the target training set to obtain a target text error correction model, and inputting the target test set into the target text error correction model to obtain the abnormal texts corresponding to $A$, wherein the target training set is the first target training set or the second target training set, the target test set is the first target test set or the second target test set, and the target test set corresponds to the target training set. It can be understood that when training is based on the first target training set, only the first target test set is tested, and when training is based on the second target training set, only the second target test set is tested; this keeps the training set and the test set consistent and helps determine the abnormal texts accurately.
Specifically, a person skilled in the art may adopt any text error correction model as the preset text error correction model. Preferably, the preset text error correction model is a neural network model, which is any one of CNN, LSTM, AlexNet, ZFNet, VGGNet, GoogLeNet, ResNet, UNet, SRCNN, and BiLSTM-CRF.
Further, the training process of a neural network model is known to those skilled in the art and is not described here.
Specifically, step S400 further includes the following steps:
S401, when $m < m_0$, traversing $G$ and determining, according to $G_i$ and the target text error correction model, whether each $A_i$ satisfies a preset text anomaly condition; the condition can be set by a person skilled in the art as required and is not described here.
S403, when $A_i$ satisfies the preset text anomaly condition, determining that $A_i$ is an abnormal text.
S405, when $m \ge m_0$, traversing $G'$ and determining, according to $G'_j$ and the target text error correction model, whether each $D_{jt}$ in $D_j$ satisfies the preset text anomaly condition; the condition can be set by a person skilled in the art as required and is not described here.
S407, when $D_{jt}$ satisfies the preset text anomaly condition, determining that $D_{jt}$ is an abnormal text.
In this method, one annotated text or one group of annotated texts from the annotated text list is selected as the test set, the other annotated texts are used as the training set, and the text error correction model is trained on them. Abnormal texts can thus be determined quickly and accurately, and proofreaders only need to proofread the abnormal texts, which reduces the workload and improves the efficiency of text proofreading.
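The train-then-test loop of step S400 can be sketched as follows. This is an illustration under loud assumptions, not the patented method: the source leaves both the model and the text anomaly condition open, so the sketch substitutes a toy "majority label" model and treats a text as abnormal when its annotation disagrees with that model. The names `train`, `is_abnormal`, `find_abnormal`, and the `(text, label)` representation are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

# Simplified stand-in for an annotated text: (text, annotation label).
Annotated = Tuple[str, str]
Split = Tuple[List[Annotated], List[Annotated]]  # (test set, training set)


def train(training_set: List[Annotated]) -> Dict[str, str]:
    """Toy model (assumption): remember the most common label for each text."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for text, label in training_set:
        counts[text][label] += 1
    return {text: c.most_common(1)[0][0] for text, c in counts.items()}


def is_abnormal(model: Dict[str, str], item: Annotated) -> bool:
    """Toy anomaly condition (assumption): the label contradicts the model."""
    text, label = item
    return text in model and model[text] != label


def find_abnormal(
    splits: List[Split],
    train_fn: Callable[[List[Annotated]], Dict[str, str]],
    abnormal_fn: Callable[[Dict[str, str], Annotated], bool],
) -> List[Annotated]:
    """Train on each training set and collect abnormal items from its test set."""
    abnormal: List[Annotated] = []
    for test_set, training_set in splits:
        model = train_fn(training_set)
        abnormal.extend(item for item in test_set if abnormal_fn(model, item))
    return abnormal


A = [("Paris", "place")] * 3 + [("Paris", "person"), ("Lin", "person")]
# Leave-one-out splits, as in the case m < m_0.
splits = [([A[i]], A[:i] + A[i + 1:]) for i in range(len(A))]
H = find_abnormal(splits, train, is_abnormal)
```

Because the training and anomaly functions are passed in as parameters, the same loop works for the grouped splits of the case m ≥ m_0 once each group of training lists is flattened into one list.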
S500, obtaining an abnormal text list $H = \{H_1, \ldots, H_g, \ldots, H_z\}$ corresponding to $A$, where $H_g$ is the $g$-th abnormal text corresponding to $A$, $g = 1, \ldots, z$, and $z$ is the number of abnormal texts corresponding to $A$, and performing text error correction on $H_g$ to obtain all abnormal annotations corresponding to $H_g$.
Specifically, the database further includes an entity type set $L = \{L^1, \ldots, L^y, \ldots, L^w\}$, where $L^y$ is the entity list corresponding to the $y$-th entity type, $y = 1, \ldots, w$, and $w$ is the number of entity types corresponding to the texts. An entity type can be understood as an ontology, and there are multiple ontologies, such as persons, place names, and toys.
Specifically, step S500 further includes the following steps:
S501, obtaining the annotated entity list $U^g = \{U^g_1, \ldots, U^g_x, \ldots, U^g_{\beta_g}\}$ corresponding to $H_g$, where $U^g_x$ is the $x$-th annotated entity, $x = 1, \ldots, \beta_g$, and $\beta_g$ is the number of entities annotated in the $g$-th abnormal text.
S503, according to the entity type of $U^g_x$, obtaining from $L$ the entity list $L^y = \{L^y_1, \ldots, L^y_e, \ldots, L^y_{v_y}\}$ corresponding to $U^g_x$, where $L^y_e$ is the $e$-th entity in $L^y$, $e = 1, \ldots, v_y$, and $v_y$ is the number of entities in $L^y$.
S505, according to $U^g_x$ and $L^y_e$, obtaining the target similarity $F^g_x$ of $U^g_x$.
Specifically, step S505 further includes the following steps:
S5051, when $U^g_x$ is a Chinese entity, obtaining from $L^y$ the corresponding Chinese entity list $T^y = \{T^y_1, \ldots, T^y_a, \ldots, T^y_{b_y}\}$, where $T^y_a$ is the $a$-th Chinese entity in $L^y$, $a = 1, \ldots, b_y$, and $b_y$ is the number of Chinese entities in $L^y$.
S5053, according to $U^g_x$ and $T^y_a$, obtaining the similarity list $E^{gy}_x = \{E^{gy}_{x1}, \ldots, E^{gy}_{xa}, \ldots, E^{gy}_{xb_y}\}$ between $U^g_x$ and $T^y$, and taking the maximum similarity in $E^{gy}_x$ as $F^g_x$, where $E^{gy}_{xa}$ is the similarity between $U^g_x$ and $T^y_a$ and satisfies the following condition:
[Formula image DEST_PATH_IMAGE005: the expression for $E^{gy}_{xa}$ is not reproduced in this text]
where $MK^{gx}_{\gamma}$ is the value of the $\gamma$-th component of the vector $MK^{gx}$ corresponding to $U^g_x$, and $NK^{ya}_{\gamma}$ is the value of the $\gamma$-th component of the vector $NK^{ya}$ corresponding to $T^y_a$. Preferably, $MK^{gx}$ and $NK^{ya}$ are both 768-dimensional vectors, that is, $\Phi = 768$. Methods for obtaining the vector corresponding to an entity are known to those skilled in the art and are not described here.
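The formula for the Chinese-entity similarity is only an image reference in this text, so its exact form cannot be confirmed here. A common choice for comparing two entity vectors component by component is cosine similarity, and the sketch below assumes that choice; it should be read as an illustration, not as the patented formula. The function names, the three-dimensional toy vectors (standing in for the 768-dimensional ones), and the candidate entities are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length (assumed metric)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention of this sketch for zero vectors
    return dot / (norm_u * norm_v)


def best_match(
    entity_vec: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return the candidate with the maximum similarity and that similarity."""
    scored = ((name, cosine_similarity(entity_vec, vec))
              for name, vec in candidates.items())
    return max(scored, key=lambda pair: pair[1])


# Toy vectors; real entity vectors would have 768 components.
annotated_vec = [1.0, 0.0, 0.0]
chinese_entities = {"北京": [1.0, 0.0, 0.0], "南京": [0.0, 1.0, 0.0]}
name, score = best_match(annotated_vec, chinese_entities)
```

Taking the maximum over all candidates corresponds to selecting the largest entry of the similarity list as the target similarity.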
S5055, when $U^g_x$ is a non-Chinese entity, obtaining from $L^y$ the corresponding non-Chinese entity list $R^y = \{R^y_1, \ldots, R^y_c, \ldots, R^y_{d_y}\}$, where $R^y_c$ is the $c$-th non-Chinese entity in $L^y$, $c = 1, \ldots, d_y$, and $d_y$ is the number of non-Chinese entities in $L^y$.
S5057, according to $U^g_x$ and $R^y_c$, obtaining the similarity list $F^{gy}_x = \{F^{gy}_{x1}, \ldots, F^{gy}_{xc}, \ldots, F^{gy}_{xd_y}\}$ between $U^g_x$ and $R^y$, and taking the maximum similarity in $F^{gy}_x$ as $F^g_x$, where $F^{gy}_{xc}$ is the similarity between $U^g_x$ and $R^y_c$ and satisfies the following condition:
[Formula image DEST_PATH_IMAGE007: the expression for $F^{gy}_{xc}$ is not reproduced in this text]
where $\lambda^{gy}_{xc}$ is the edit distance between $U^g_x$ and $R^y_c$, and $\eta^{gy}_{xc}$ is the larger of the number of characters in $U^g_x$ and the number of characters in $R^y_c$.
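The text defines the two ingredients of the non-Chinese similarity, the edit distance λ and the larger character count η, but the formula combining them is only an image reference. A natural combination, assumed in the sketch below, is 1 - λ/η, which equals 1 exactly when the two strings are identical. The function names and sample strings are hypothetical, and Levenshtein distance is assumed as the edit distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Assumed form 1 - lambda / eta, where eta is the longer string length."""
    eta = max(len(a), len(b))
    if eta == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(a, b) / eta


# A misspelled annotation compared with a hypothetical entity list entry.
similarity = normalized_similarity("Londn", "London")
```

Under this assumed form, a single missing letter in a six-letter name lowers the similarity from 1 to 5/6, so the annotation no longer matches the threshold value of 1.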
Specifically, in step S505 the similarity is computed by a different method depending on whether the entity in the abnormal text is a Chinese entity or an English entity, so that the similarity is determined accurately and the wrong annotations in the abnormal text can be flagged.
S507, when $F^g_x = F_0$, determining that the annotation of $U^g_x$ is a non-abnormal annotation, where $F_0$ is a preset first similarity threshold and $F_0 = 1$.
Specifically, a non-abnormal annotation is a correct annotation in the text.
S509, when $F^g_x \ne F_0$, marking $U^g_x$ to indicate that the annotation of $U^g_x$ is an abnormal annotation.
Specifically, an abnormal annotation is an erroneous annotation in the text.
Specifically, step S509 further includes the following steps:
S5091, when $F^g_x > F'_0$, marking $U^g_x$ to indicate that its annotation is an abnormal annotation, and marking the entity in $L^y$ that corresponds to $F^g_x$ in the text as a reference entity, so that the abnormal annotation is flagged together with a relatively correct reference entity, which makes correction easier for the annotator.
S5093, when $F^g_x \le F'_0$, marking $U^g_x$ to indicate that its annotation is an abnormal annotation.
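The decision logic of steps S507 to S5093 can be sketched as a small function. The value of the first threshold (1) comes from the text; the value of the second threshold is not given, so the constant `F0_PRIME = 0.8` below is purely a placeholder assumption, as are the function name and the tuple it returns.

```python
import math
from typing import Optional, Tuple

F0 = 1.0        # first similarity threshold, fixed at 1 in the text
F0_PRIME = 0.8  # hypothetical value; the text gives no value for F'_0


def judge_annotation(
    similarity: float,
    best_entity: str,
    f0: float = F0,
    f0_prime: float = F0_PRIME,
) -> Tuple[str, Optional[str]]:
    """Return (status, reference entity or None) for one annotated entity."""
    # S507: a similarity equal to F_0 means the annotation is not abnormal.
    if math.isclose(similarity, f0):
        return ("non-abnormal", None)
    # S5091: abnormal, but close enough to offer the best match as a reference.
    if similarity > f0_prime:
        return ("abnormal", best_entity)
    # S5093: abnormal, with no reference entity offered.
    return ("abnormal", None)


exact = judge_annotation(1.0, "London")
near = judge_annotation(5 / 6, "London")
far = judge_annotation(0.1, "London")
```

`math.isclose` is used instead of `==` only to guard against floating-point rounding when the similarity is computed numerically.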
This embodiment provides a data processing system for error correction of annotated text, the system comprising a database, a processor, and a memory storing a computer program, wherein the database comprises an annotated text list, and the computer program, when executed by the processor, implements the following steps: when the number of annotated texts in the list is less than a preset text quantity threshold, taking any one annotated text as a test set and the text set corresponding to that annotated text as a training set; when the number of annotated texts in the list is not less than the preset text quantity threshold, dividing the list into a plurality of intermediate annotated text lists that each contain the same number of annotated texts, and taking any one intermediate annotated text list as a test set and the text set corresponding to that list as a training set; training the preset model on the training set; obtaining the abnormal texts based on the trained model and the test set; and performing text error correction on the abnormal texts to obtain all abnormal annotations corresponding to them. Abnormal texts can thus be determined quickly and accurately, and proofreaders only need to proofread the abnormal texts, which reduces the workload and improves the efficiency of text proofreading.
In addition, the similarity is computed by a different method depending on whether an entity in the abnormal text is a Chinese entity or an English entity, so that the similarity is determined accurately and the wrong annotations in the abnormal text can be flagged.
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A data processing system for error correction of annotated text, the system comprising: a database, a processor, and a memory storing a computer program, wherein the database comprises an annotated text list $A = \{A_1, \ldots, A_i, \ldots, A_m\}$, where $A_i$ is the $i$-th annotated text, $i = 1, \ldots, m$, and $m$ is the number of annotated texts, and when the computer program is executed by the processor, the following steps are implemented:
S100, when $m$ is less than a preset text quantity threshold $m_0$, obtaining a first specified text set $G = \{G_1, \ldots, G_i, \ldots, G_m\}$ corresponding to $A$, where the $i$-th first specified text set is $G_i = \{A_i, B_i\}$, the first text list corresponding to $A_i$ is $B_i = \{B_{i1}, \ldots, B_{ir}, \ldots, B_{is}\}$, $B_{ir}$ is the $r$-th first text, $r = 1, \ldots, s$, and $s$ is the number of first texts; $A_i$ serves as the $i$-th first target test set in $G$ and $B_i$ serves as the $i$-th first target training set in $G$;
S200, when $m \ge m_0$, obtaining from $A$ an intermediate text set $D = \{D_1, \ldots, D_j, \ldots, D_n\}$, where $D_j = \{D_{j1}, \ldots, D_{jt}, \ldots, D_{jk}\}$, $D_{jt}$ is the $t$-th intermediate text in the $j$-th intermediate text list, $j = 1, \ldots, n$, $n$ is the number of intermediate text lists, $t = 1, \ldots, k$, and $k$ is the number of intermediate texts in each intermediate text list, wherein $n$ satisfies the following condition:
[Formula image DEST_PATH_IMAGE002: the condition on $n$ is not reproduced in this text]
S300, obtaining a second specified text set $G' = \{G'_1, \ldots, G'_j, \ldots, G'_n\}$ corresponding to $A$, where the $j$-th second specified text set is $G'_j = \{D_j, C_j\}$, the second text set corresponding to $D_j$ is $C_j = \{C_{j1}, \ldots, C_{jq}, \ldots, C_{jp}\}$, $C_{jq}$ is the $q$-th second text list, $q = 1, \ldots, p$, and $p$ is the number of second text lists; $D_j$ serves as the $j$-th second target test set in $G'$ and $C_j$ serves as the $j$-th second target training set in $G'$;
S400, obtaining a target training set, training a preset text error correction model on the target training set to obtain a target text error correction model, and inputting the target test set into the target text error correction model to obtain the abnormal texts corresponding to $A$, wherein the target training set is the first target training set or the second target training set, the target test set is the first target test set or the second target test set, and the target test set corresponds to the target training set;
S500, obtaining an abnormal text list $H = \{H_1, \ldots, H_g, \ldots, H_z\}$ corresponding to $A$, where $H_g$ is the $g$-th abnormal text, $g = 1, \ldots, z$, and $z$ is the number of abnormal texts, and performing text error correction on $H_g$ to obtain all abnormal annotations corresponding to $H_g$;
wherein the database further comprises an entity type set $L = \{L^1, \ldots, L^y, \ldots, L^w\}$, where $L^y$ is the entity list corresponding to the $y$-th entity type, $y = 1, \ldots, w$, and $w$ is the number of entity types corresponding to the texts, and when the computer program is executed by the processor, step S500 further comprises the following steps:
S501, obtaining the annotated entity list $U^g = \{U^g_1, \ldots, U^g_x, \ldots, U^g_{\beta_g}\}$ corresponding to $H_g$, where $U^g_x$ is the $x$-th annotated entity, $x = 1, \ldots, \beta_g$, and $\beta_g$ is the number of entities annotated in the $g$-th abnormal text;
S503, according to the entity type of $U^g_x$, obtaining from $L$ the entity list $L^y = \{L^y_1, \ldots, L^y_e, \ldots, L^y_{v_y}\}$ corresponding to $U^g_x$, where $L^y_e$ is the $e$-th entity in $L^y$, $e = 1, \ldots, v_y$, and $v_y$ is the number of entities in $L^y$;
S505, according to $U^g_x$ and $L^y_e$, obtaining the target similarity $F^g_x$ of $U^g_x$; wherein step S505 further comprises the following steps:
S5051, when $U^g_x$ is a Chinese entity, obtaining from $L^y$ the corresponding Chinese entity list $T^y = \{T^y_1, \ldots, T^y_a, \ldots, T^y_{b_y}\}$, where $T^y_a$ is the $a$-th Chinese entity in $L^y$, $a = 1, \ldots, b_y$, and $b_y$ is the number of Chinese entities in $L^y$;
S5053, according to $U^g_x$ and $T^y_a$, obtaining the similarity list $E^{gy}_x = \{E^{gy}_{x1}, \ldots, E^{gy}_{xa}, \ldots, E^{gy}_{xb_y}\}$ between $U^g_x$ and $T^y$, and taking the maximum similarity in $E^{gy}_x$ as $F^g_x$, where $E^{gy}_{xa}$ is the similarity between $U^g_x$ and $T^y_a$ and satisfies the following condition:
[Formula image DEST_PATH_IMAGE004: the expression for $E^{gy}_{xa}$ is not reproduced in this text]
where $MK^{gx}_{\gamma}$ is the value of the $\gamma$-th component of the vector $MK^{gx}$ corresponding to $U^g_x$, and $NK^{ya}_{\gamma}$ is the value of the $\gamma$-th component of the vector $NK^{ya}$ corresponding to $T^y_a$;
S5055, when $U^g_x$ is a non-Chinese entity, obtaining from $L^y$ the corresponding non-Chinese entity list $R^y = \{R^y_1, \ldots, R^y_c, \ldots, R^y_{d_y}\}$, where $R^y_c$ is the $c$-th non-Chinese entity in $L^y$, $c = 1, \ldots, d_y$, and $d_y$ is the number of non-Chinese entities in $L^y$;
S5057, according to $U^g_x$ and $R^y_c$, obtaining the similarity list $F^{gy}_x = \{F^{gy}_{x1}, \ldots, F^{gy}_{xc}, \ldots, F^{gy}_{xd_y}\}$ between $U^g_x$ and $R^y$, and taking the maximum similarity in $F^{gy}_x$ as $F^g_x$, where $F^{gy}_{xc}$ is the similarity between $U^g_x$ and $R^y_c$ and satisfies the following condition:
[Formula image DEST_PATH_IMAGE006: the expression for $F^{gy}_{xc}$ is not reproduced in this text]
where $\lambda^{gy}_{xc}$ is the edit distance between $U^g_x$ and $R^y_c$, and $\eta^{gy}_{xc}$ is the larger of the number of characters in $U^g_x$ and the number of characters in $R^y_c$;
S507, when $F^g_x = F_0$, determining that the annotation of $U^g_x$ is a non-abnormal annotation, where $F_0$ is a preset first similarity threshold and $F_0 = 1$;
S509, when $F^g_x \ne F_0$, marking $U^g_x$ to indicate that the annotation of $U^g_x$ is an abnormal annotation.
2. The data processing system for error correction of annotated text as claimed in claim 1, wherein an annotated text is a text that has been annotated.
3. The data processing system for error correction of annotated text as claimed in claim 1, wherein $m_0$ is in the range of 10 to 50.
4. The data processing system for error correction of annotated text as claimed in claim 1, wherein a first text in $B_i$ is any annotated text in $A$ other than $A_i$.
5. The data processing system for error correction of annotated text as claimed in claim 1, wherein an intermediate text is any annotated text in an intermediate text list obtained by dividing $A$.
6. The data processing system for error correction of annotated text as claimed in claim 1, wherein a second text list in $C_j$ is any intermediate text list in $D$ other than $D_j$.
7. The data processing system for error correction of annotated text as claimed in claim 1, wherein $s$ satisfies the following condition: $s = m - 1$.
8. The data processing system for error correction of annotated text as claimed in claim 1, wherein $p$ satisfies the following condition: $p = n - 1$.
CN202210710576.1A 2022-06-22 2022-06-22 Data processing system for error correction of label text Active CN114792085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210710576.1A CN114792085B (en) 2022-06-22 2022-06-22 Data processing system for error correction of label text

Publications (2)

Publication Number Publication Date
CN114792085A CN114792085A (en) 2022-07-26
CN114792085B CN114792085B (en) 2022-09-16

Family

ID=82463241

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210710576.1A Active CN114792085B (en) 2022-06-22 2022-06-22 Data processing system for error correction of label text

Country Status (1)

Country Link
CN (1) CN114792085B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102662930A (en) * 2012-04-16 2012-09-12 乐山师范学院 Corpus tagging method and corpus tagging device
CN112712118A (en) * 2020-12-29 2021-04-27 银江股份有限公司 Medical text data oriented filtering method and system
WO2021212612A1 (en) * 2020-04-23 2021-10-28 平安科技(深圳)有限公司 Intelligent text error correction method and apparatus, electronic device and readable storage medium
CN113806565A (en) * 2021-11-18 2021-12-17 中科雨辰科技有限公司 Data processing system for text labeling
CN114579675A (en) * 2022-05-05 2022-06-03 中科雨辰科技有限公司 Data processing system for determining common finger event

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
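基于条件随机场模型和文本纠错的微博新词词性识别研究 [Research on part-of-speech recognition of new microblog words based on a conditional random field model and text error correction]; 韩彦昭 et al.; 《南京大学学报(自然科学)》 [Journal of Nanjing University (Natural Science)]; 2016-03-31; Vol. 52, No. 2 *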

Also Published As

Publication number Publication date
CN114792085A (en) 2022-07-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant