CN117396899A - System and method for extracting fields from unlabeled data - Google Patents

System and method for extracting fields from unlabeled data

Info

Publication number
CN117396899A
CN117396899A
Authority
CN
China
Prior art keywords
field
ple
tag
words
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280036060.1A
Other languages
Chinese (zh)
Inventor
M. Gao
Z. Chen
R. Xu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shuo Power Co
Original Assignee
Shuo Power Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/484,623 external-priority patent/US20220366317A1/en
Application filed by Shuo Power Co filed Critical Shuo Power Co
Priority claimed from PCT/US2022/014013 external-priority patent/WO2022245407A1/en
Publication of CN117396899A publication Critical patent/CN117396899A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/18Extraction of features or characteristics of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2178Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor
    • G06F18/2185Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor the supervisor being an automated module, e.g. intelligent oracle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/414Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments describe a field extraction system that does not require field-level annotations for training. Specifically, the training process is bootstrapped by mining pseudo tags from unlabeled forms using simple rules. A transformer-based structure is then used to model interactions between text tokens in the input form and to predict a field tag for each token accordingly. The pseudo tags are used to supervise the transformer training. Since the pseudo tags are noisy, a refinement module containing a sequence of branches is used to refine them. Each refinement branch performs field tagging and generates refined tags. At each stage, a branch is optimized with the ensemble of tags from all previous branches to reduce tag noise.

Description

System and method for extracting fields from unlabeled data
Cross reference
The instant application claims priority under 35 U.S.C. § 119 to U.S. nonprovisional application Ser. Nos. 17/484,618 and 17/484,623, both filed in September 2021, and to U.S. provisional application Ser. No. 63/189,579, filed in May 2021.
All of the above applications are expressly incorporated herein by reference in their entirety.
Technical Field
Embodiments relate generally to machine learning systems and computer vision, and more particularly, to mechanisms for extracting fields from forms having unlabeled data.
Background
Form-like documents, such as bills, payrolls and patient referral forms, are commonly used in daily business workflows. Extracting field values from various forms is often a challenging task. For example, even for the same form type, if the forms are issued by different suppliers, the document layout and text representations may differ: bills from different companies may have significantly different designs, payrolls from different systems (e.g., ADP and Workday) may have different text representations for similar information, and/or the like. Traditionally, extracting information from such form documents has required significant human effort. For example, staff members are typically given a list of intended form fields, such as purchase order, bill number, and amount, and extract the corresponding values based on an understanding of the form.
Thus, there is a need for an efficient system for extracting information from a form document.
Drawings
Fig. 1 is a simplified diagram illustrating an example of extracting fields from a bill according to one embodiment described herein.
Fig. 2 is a simplified diagram illustrating an overall self-supervised training framework of a field extraction system, according to embodiments described herein.
Fig. 3 is a block diagram illustrating an example framework for refining the field extraction framework depicted in fig. 2 with a pseudo tag set (PLE) according to embodiments described herein.
Fig. 4 is a simplified diagram of a computing device implementing a field extraction framework in accordance with some embodiments described herein.
FIG. 5 is a simplified diagram of a method for field extraction from a form with unlabeled data via a field extraction model, in accordance with some embodiments.
FIG. 6 is a simplified diagram of a method for tag refinement in field extraction from forms with unlabeled data through a field extraction model, according to some embodiments.
FIG. 7 is a data table of exemplary key lists and data types of a training dataset that provides unlabeled form data, in accordance with some embodiments.
Fig. 8A-8B are diagrams illustrating exemplary unlabeled forms according to some embodiments.
Fig. 9-16 provide exemplary results of data experiments of the field extraction model described in fig. 1-6, according to some embodiments.
In the drawings, elements having the same name have the same or similar functions.
Detailed Description
Machine learning systems have been widely used in computer vision, e.g., pattern recognition, object localization, etc. Some recent machine learning methods formulate form field extraction as field-value pairing or field tagging. For example, some existing systems employ a representation learning method that takes field and value candidates as inputs and utilizes metric learning techniques to produce high pairing scores for positive field-value pairs and low scores for negative field-value pairs. Another system uses a pre-trained transformer that takes text and its location as input. However, these existing methods typically require a large number of field-level annotations for training. Acquiring field-level annotations of forms can be quite expensive and labor intensive, and sometimes even impossible, because (1) forms often contain sensitive information, so limited public data is available for training purposes; and (2) the use of external annotators is also not feasible due to the risk of exposing private information.
In view of the need for an efficient system for extracting information from form documents, embodiments describe a field extraction system that does not require field-level annotations for training. Specifically, the training process is bootstrapped by mining pseudo tags from unlabeled forms using simple rules. A transformer-based structure is then used to model interactions between text tokens in the input form and to predict a field tag for each token accordingly. The pseudo tags are used to supervise the transformer training. Since the pseudo tags are noisy, a refinement module containing a sequence of branches is used to refine them. Each refinement branch performs field tagging and generates refined tags. At each stage, a branch is optimized with the ensemble of tags from all previous branches to reduce tag noise.
For example, the field extraction system trains on self-supervised pseudo tags mined from unlabeled data. Specifically, the field extraction system detects a set of words and their locations in the form and identifies field values based on geometric rules between words, e.g., a field key and its value are generally horizontally aligned and may be separated by a colon. The identified field values may then be used as pseudo tags to train a transformer network that encodes the detected words and locations for classification.
In some embodiments, multiple pseudo tag set (PLE) branches may be used to refine the pseudo tags used for training. Specifically, the PLE branches operate in parallel to generate predicted classifications from the encoded representations of the detected words and locations. At each branch, a loss component is calculated by comparing the predictions at that branch with the refined tags generated by the preceding PLE branches, which serve as pseudo tags. The loss components over the PLE branches are then summed to jointly update the branches.
As used herein, the term "network" may include any hardware or software based framework including any artificial intelligence network or system, neural network or system, and/or any training or learning model implemented thereon or therewith.
As used herein, the term "module" may include a hardware or software based framework that performs one or more functions. In some embodiments, the module may be implemented on one or more neural networks.
Fig. 1 is a simplified diagram 100 illustrating an example of extracting fields from a bill according to one embodiment described herein. Traditionally, in form processing, a staff member is given a list of intended form fields, such as purchase order, bill number, and amount, and the goal is to extract their corresponding values based on an understanding of the form. Keys, such as bill #, PO number, and total, refer to the specific text representations of fields in a form and are the most important indicators of value location. Thus, the field extraction system aims to automatically extract field values from among the irrelevant information in a form, which is important for improving processing efficiency and reducing manual labor.
As shown in diagram 100, the form contains various phrases such as "bill #," "1234," "PO number," "0000001," and the like. The field extraction system may identify that "PO number" 102 is a located key and then determine whether any of the values "1234" 104, "0000001" 103, or "100.00" 105 matches the located key. Such a match may be determined based on the geometric relationship between the located key 102 and the values 103 to 105. For example, a rule-based algorithm may be applied to determine the match: the value "0000001" 103 is most likely the value corresponding to the located key 102, because the position of value 103 is vertically aligned with the position of the located key 102.
Unlike previous approaches that assume access to large-scale labeled forms, a rule-based approach can be used to generate noisy pseudo tags (e.g., fields and values) from unlabeled data. The rule-based algorithm is constructed based on the following observations: (1) a field value (e.g., 103 in FIG. 1) is typically displayed in a form together with some key (e.g., 102 in FIG. 1), and the key is a specific textual representation of the field; (2) keys and their corresponding values have strong geometric relationships (as shown in fig. 1, keys are mostly next to their values, vertically or horizontally); (3) although form layouts are very diverse, some key texts are commonly used across different form instances (e.g., the key text of the field purchase order may be "PO number," "PO#," etc.); and (4) a field value is always associated with a certain data type (e.g., the data type of the value of "billing date" is date, and the data type of the value of "total" is amount or number).
Thus, rule-based methods can be used to generate useful pseudo tags for each field of interest from large-scale forms. As shown in fig. 1, key location 102 is first performed based on string matching between text phrases in the form and possible key strings of the fields. The values 103 to 105 are then estimated based on their text data types and their geometric relationships to the located key 102.
Fig. 2 is a simplified diagram illustrating an overall self-supervised training framework 200 of a field extraction system according to embodiments described herein. The framework 200 includes an optical character recognition (OCR) module 205, a transformer network 210, and a classifier 220. An unlabeled form 202, e.g., a check, bill, payroll, and/or the like, may contain field information from a predefined list {fd_1, fd_2, ..., fd_N}. Given a form as input, a generic OCR detection and recognition module 205 is applied to the unlabeled form 202 to obtain a set of words {w_1, w_2, ..., w_M} and their locations, represented as bounding boxes {b_1, b_2, ..., b_M}. The field extraction method then aims to automatically extract, from the large pool of word candidates {w_1, w_2, ..., w_M}, the target value v_i matching each field fd_i, if that field exists in the input form.
The word and bounding-box position pairs {w_i, b_i} may then be input to the transformer encoder 210 for feature encoding. The pairs {w_i, b_i} may also be passed to the pseudo tag inference module 215, which performs key location and value estimation: key location identifies the location of the key corresponding to each predefined field, and value estimation determines the corresponding field value for each located key.
For example, since a key or a value may contain multiple words, upon receipt of the word and bounding-box position pairs {w_i, b_i}, the pseudo tag inference module 215 may use the DBSCAN algorithm (Ester et al., 1996) to group nearby recognized words based on their locations, obtaining phrase candidates [ph_i^1, ph_i^2, ..., ph_i^T] and their positions [B_i^1, B_i^2, ..., B_i^T].
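The phrase-grouping step above can be sketched as follows. This is a simplified, stdlib-only stand-in for the DBSCAN clustering named in the description: a greedy single-linkage grouping by box-center distance, where the example words, boxes, and `eps` threshold are illustrative assumptions rather than the embodiment's exact parameters:

```python
from math import hypot

def group_words(words, boxes, eps=40.0):
    """Greedy single-linkage grouping of words whose box centers lie within
    `eps` pixels of each other -- a simplified stand-in for DBSCAN."""
    centers = [((x0 + x1) / 2, (y0 + y1) / 2) for (x0, y0, x1, y1) in boxes]
    labels = [-1] * len(words)
    cluster = 0
    for i in range(len(words)):
        if labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:  # expand the cluster transitively
            p = stack.pop()
            for q in range(len(words)):
                if labels[q] == -1 and hypot(
                        centers[p][0] - centers[q][0],
                        centers[p][1] - centers[q][1]) <= eps:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    grouped = {}
    for word, label in zip(words, labels):
        grouped.setdefault(label, []).append(word)
    return [" ".join(ws) for ws in grouped.values()]

# "PO" and "Number:" sit side by side and merge into one phrase candidate;
# the value below them and the distant "Total" remain separate phrases.
words = ["PO", "Number:", "0000001", "Total"]
boxes = [(10, 10, 30, 20), (35, 10, 80, 20), (10, 60, 70, 70), (300, 10, 340, 20)]
print(group_words(words, boxes))  # ['PO Number:', '0000001', 'Total']
```

Real DBSCAN additionally distinguishes core points via a `min_samples` parameter; for word boxes on a form, the single-linkage expansion above captures the same "group nearby words into phrases" behavior.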
For each field of interest fd_i, a list of common keys [k_i^1, k_i^2, ..., k_i^L] is determined based on domain knowledge. For example, the field name may be used as the only key in the list. Module 215 may then measure the string distance d(ph_i^j, k_i^r) between each phrase candidate ph_i^j and each designed key k_i^r. The module 215 may calculate a key score for each phrase candidate, indicating the likelihood that the candidate is a key of the field, using the following equation:

key_score_i^j = max_r (1 - d(ph_i^j, k_i^r)), (1)

where d(·,·) is a normalized string distance. The key is then located by finding the candidate with the largest key score:

j* = argmax_j key_score_i^j. (2)
the pseudo tag inference module 215 may then determine the value (or values, if applicable) of the location key. Specifically, the values are estimated according to two criteria. First, their data type should be consistent with their field. Second, their positions should be very consistent with the location keys. For each field, a list of qualified data types may be predetermined. For example, for the data field "bill number," the data type may include a string or an integer. The data type of each phrase candidate may be predicted using a pre-trained BERT-based model, and only candidates ph with the correct data type are retained i j
In one embodiment, for each qualified candidate ph_i^j, the value score is determined as follows:

value_score_i^j = key_score_i^{j*} · g(ph_i^j, ph_i^{j*}), (3)

where key_score_i^{j*} indicates the key score of the located key and g(ph_i^j, ph_i^{j*}) indicates a geometric score between the candidate and the located key. A key (e.g., 102 in fig. 1) and its value (e.g., 103 in fig. 1) are typically close to each other, and the value is usually directly below the key or directly to its right. Thus, geometric relationships, such as distance and angle, are used to measure the key-value relationship:

g(ph_i^j, ph_i^{j*}) = φ(D(ph_i^j, ph_i^{j*}) | μ_d, σ_d) · ang_score_i^j, (4)

where D(ph_i^j, ph_i^{j*}) indicates the distance between the two phrases, A(ph_i^j, ph_i^{j*}) indicates the angle from ph_i^j to ph_i^{j*}, and φ(· | μ, σ) indicates a Gaussian function with mean μ and standard deviation σ. Here, μ_d is set to 0, and σ_d and σ_a are fixed at 0.5. To reward candidates whose angle relative to the key is close to 0 or π/2, the angle score takes the maximum toward these two options:

ang_score_i^j = max( φ(A(ph_i^j, ph_i^{j*}) | 0, σ_a), φ(A(ph_i^j, ph_i^{j*}) | π/2, σ_a) ).

Thus, the candidate is determined as the predicted value of the field if its value score is the largest among all candidates and exceeds a threshold θ_v:

v_i = ph_i^{ĵ}, where ĵ = argmax_j value_score_i^j and value_score_i^{ĵ} > θ_v, (5)

with θ_v = 0.1 in one embodiment.
In one embodiment, the output of the pseudo tag inference module 215, i.e., the estimated fields serving as pseudo tags, may be used directly as the field extraction output. In another embodiment, the estimated fields may be used as pseudo tags for guided training to further improve field extraction performance. Specifically, to predict the target tag of a word, the model must learn the meaning of the word and its interactions with the surrounding context. A transformer-based architecture (e.g., LayoutLM, described in Xu et al., 2020) may be used to learn word representations because of its strong ability to model context information. In addition to semantic representation, the locations of the words and the overall layout of the input form are also important and can be used to capture distinguishing features of the words. The transformer encoder 210 may extract features from the input pairs {w_i, b_i}:
[f_1, f_2, ..., f_M] = T([(w_1, b_1), (w_2, b_2), ..., (w_M, b_M)]), (6)

where T(·) represents the transformer-based feature extractor and f_i indicates the feature of word i.
The classifier 220 for token classification receives the encoded feature representations from the transformer encoder 210 and generates a field prediction for each token from the original unlabeled form 202. Specifically, classifier 220 projects the features into the field space ({background, fd_1, fd_2, ..., fd_N}) through a fully connected (FC) layer, generating field prediction scores s_k. The predicted field scores from classifier 220 and the generated pseudo tags from the pseudo tag inference module 215 may then be compared at the loss module 230 to compute the training objective. The transformer 210 and classifier 220 may be further updated with the training objective via a backpropagation path (shown by dashed lines).
In one embodiment, as further described in fig. 3, multiple progressive pseudo tag sets (PLE) may be used for guided training.
Fig. 3 is a block diagram illustrating an example framework 300 for refining the field extraction framework described in fig. 2 with PLE according to embodiments described herein. As depicted in fig. 2, the transformer 210 receives input 302 of words extracted from the unlabeled form 202 and the locations of the bounding boxes surrounding the words, (w_1, b_1), (w_2, b_2), ..., (w_M, b_M). Initial word-level field tags (also referred to as guide tags), denoted ỹ^0, are obtained as the estimated pseudo tags from the pseudo tag inference module 215. A cross-entropy loss L(s_k, ỹ^0), computed based on the field prediction scores from the classifier 220 and the generated guide tags, may be used to optimize the transformer network 210.
However, using only the noisy guide tags as ground truth in training may degrade model performance. A refinement module 304 comprising a plurality of PLEs is therefore employed after the transformer 210, each PLE serving as a classification branch. Specifically, at each branch k, the PLE independently performs field classification and refines the pseudo tags based on its predictions, denoted ỹ^k. Later-stage branches are optimized using the refined tags obtained from previous branches.
For example, at branch k, refined tags are generated according to the following steps: (1) finding the predicted field tag of each word by argmax_c(s_kc); and (2) for each field, retaining a word only if its prediction score is the highest among all words and greater than a threshold (fixed at 0.1). For example, suppose PLE module 304 includes branches 304a through 304n. The first PLE branch 304a receives the guide tags ỹ^0 generated by the pseudo tag inference module 215; its FC layer generates field classification scores s_1, which are then converted into refined tags ỹ^1. The guide tags ỹ^0 and the refined tags ỹ^1 are then fed to the second PLE branch 304b, whose FC layer generates field classification scores s_2, which are converted into refined tags ỹ^2. In a similar way, the k-th PLE branch receives the guide tags ỹ^0 and all previously generated refined tags ỹ^1, ..., ỹ^{k-1}; its FC layer generates field classification scores s_k, which are converted into refined tags ỹ^k.
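The two-step refined-tag generation at a single branch can be sketched as follows; the score layout (`scores[w][f]` as the branch's per-word, per-field prediction score) and the field names are illustrative assumptions:

```python
def refine_labels(scores, fields, threshold=0.1):
    """Refined-tag generation at one PLE branch: each word takes its argmax
    field; for each field, only the single highest-scoring word is kept,
    and only if its score clears the threshold."""
    n_words = len(scores)
    # Step (1): predicted field index for each word via argmax
    pred = [max(range(len(fields)), key=lambda f: scores[w][f])
            for w in range(n_words)]
    refined = ["background"] * n_words
    # Step (2): per field, keep only the top-scoring word above threshold
    for f, name in enumerate(fields):
        cands = [w for w in range(n_words) if pred[w] == f]
        if not cands:
            continue
        best = max(cands, key=lambda w: scores[w][f])
        if scores[best][f] > threshold:
            refined[best] = name
    return refined

fields = ["bill_number", "total"]
scores = [[0.70, 0.10],   # word 0: strong bill_number prediction, kept
          [0.30, 0.05],   # word 1: weaker bill_number, dropped (not the max)
          [0.05, 0.08]]   # word 2: total below threshold, dropped
print(refine_labels(scores, fields))
# ['bill_number', 'background', 'background']
```

Dropping all but the single best word per field is what makes the refined tags more precise than the raw pseudo tags, at the cost of recall, which the ensemble across branches then compensates for.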
Thus, the final loss aggregates the losses over all branches, calculated as follows:

L_total = Σ_k ( β · L(s_k, ỹ^0) + Σ_{j=1}^{k-1} L(s_k, ỹ^j) ), (7)

where β is a hyperparameter that controls the contribution of the original pseudo tags ỹ^0.
In this way, progressive refinement of the tags reduces tag noise. However, using only the refined tags at each stage yields limited performance improvement: although the tags become more accurate after refinement, some low-confidence values are filtered out, which lowers recall. To alleviate this problem, each branch is trained using the ensemble of tags from all previous stages. The ensembled tags not only maintain a better balance between precision and recall, but are also more diverse and can serve as a regularization for model optimization. During inference, the average of the scores predicted by all branches may be used. A procedure similar to refined-tag generation may then be applied to obtain the final field values.
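The aggregated objective described above, where each branch is supervised by the β-weighted guide tags plus the refined tags of all previous branches, can be sketched numerically with a pure-Python cross-entropy over toy softmax scores; the β value and the toy numbers are illustrative assumptions:

```python
from math import log

def cross_entropy(scores, labels):
    """Mean CE over words; `scores[w]` are softmax probabilities per field,
    `labels[w]` is the index of the (pseudo) tag class for word w."""
    return -sum(log(s[l]) for s, l in zip(scores, labels)) / len(labels)

def ple_loss(branch_scores, guide_labels, refined_labels, beta=0.5):
    """Every branch k is supervised by the original guide tags (weighted by
    beta) plus the refined tags produced by all previous branches."""
    total = 0.0
    for k, scores in enumerate(branch_scores):
        total += beta * cross_entropy(scores, guide_labels)
        for j in range(k):  # ensemble of all earlier branches' refined tags
            total += cross_entropy(scores, refined_labels[j])
    return total

# Two branches, two words, two classes; branch 2 also learns from the
# refined tags of branch 1.
branch_scores = [[[0.8, 0.2], [0.3, 0.7]],   # branch 1 softmax outputs
                 [[0.9, 0.1], [0.2, 0.8]]]   # branch 2 softmax outputs
guide = [0, 1]                 # noisy guide tags from the rule-based miner
refined = [[0, 1]]             # refined tags produced by branch 1
print(round(ple_loss(branch_scores, guide, refined), 4))  # 0.3913
```

Summing rather than replacing the supervision signals is the design choice the text motivates: the ensemble keeps the recall of the noisy guide tags while benefiting from the precision of the refined ones.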
Computer environment
Fig. 4 is a simplified diagram of a computing device 400 implementing a field extraction framework in accordance with some embodiments described herein. As shown in fig. 4, computing device 400 includes a processor 410 coupled to a memory 420. The operation of computing device 400 is controlled by processor 410. Although computing device 400 is shown with only one processor 410, it should be appreciated that processor 410 may represent one or more central processing units, multi-core processors, microprocessors, microcontrollers, digital signal processors, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), graphics processing units (GPUs), and/or the like in computing device 400. Computing device 400 may be implemented as a stand-alone subsystem, a board added to a computing device, and/or a virtual machine.
Memory 420 may be used to store software executed by computing device 400 and/or one or more data structures used during operation of computing device 400. Memory 420 may include one or more types of machine-readable media. Some common forms of machine-readable media may include floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, flash EPROM, any other memory chip or cartridge, and/or any other medium from which a processor or computer can read.
The processor 410 and/or the memory 420 may be arranged in any suitable physical arrangement. In some embodiments, processor 410 and/or memory 420 may be implemented on the same board, the same package (e.g., a system in a package), the same chip (e.g., a system on a chip), etc. In some embodiments, processor 410 and/or memory 420 may include distributed, virtualized, and/or containerized computing resources. Consistent with these embodiments, processor 410 and/or memory 420 may be located in one or more data centers and/or cloud computing facilities.
In some embodiments, memory 420 may include a non-transitory, tangible, machine-readable medium comprising executable code that, when executed by one or more processors (e.g., processor 410), may cause the one or more processors to perform the methods described in further detail herein. For example, as shown, memory 420 includes instructions for field extraction module 430, which may be used to implement and/or emulate the systems and models, and/or to implement any of the methods described further herein. In some embodiments, field extraction module 430 may receive input 440, e.g., an unlabeled image instance such as a form, via data interface 415. The data interface 415 may be a user interface that receives a form image instance uploaded by a user, or a communication interface that receives or retrieves a previously stored form image instance from a database. The field extraction module 430 may generate an output 450, such as the extracted fields of the input 440.
In some embodiments, the field extraction module 430 may further include a pseudo tag inference module 431 and a PLE module 432. The pseudo tag inference module 431 uses a rule-based approach to mine noisy pseudo tags from forms, e.g., as described in fig. 2. The PLE module 432 (similar to refinement module 304 in fig. 3) may learn a data-driven model during training, implemented as a token classification task whose input is a set of tokens extracted from a form and whose output is a predicted field for each token in context, using the estimated fields as pseudo tags. Further details of PLE module 432 are discussed in conjunction with fig. 3.
Field extraction workflow
Fig. 5 is a simplified diagram of a method 500 for field extraction from a form with unlabeled data via a field extraction model, in accordance with some embodiments. One or more of the processes of method 500 may be implemented at least in part in the form of executable code stored on a non-transitory, tangible, machine-readable medium, which when executed by one or more processors, may cause the one or more processors to perform one or more of the processes. In some embodiments, method 500 corresponds to the operation of field extraction module 430 (fig. 4) to perform a method of field extraction or training a field extraction model. As shown, the method 500 includes a plurality of enumerated steps, but aspects of the method 500 may include additional steps before, after, and between the enumerated steps. In certain aspects, one or more of the enumerated steps may be omitted or performed in a different order.
At step 502, an unlabeled form including a plurality of fields and a plurality of field values is received via a data interface (e.g., 415 in FIG. 4). For example, an unlabeled form may take a form similar to the form shown in fig. 8A-8B.
At step 504, a set of words and a set of locations are detected within an unlabeled form of the set of words. For example, words and locations may be detected by OCR module 205 in FIG. 2.
At step 506, field values of the fields are identified from the word set and the location set based at least in part on geometric relationships among the words. For example, a field value may be identified by applying a first rule, i.e., that one or more words in the form serve as a key associated with the field name of a field. In another embodiment, a field value may be identified by applying a second rule, i.e., that a key and its field value form a horizontally or vertically aligned word pair. In another embodiment, a field value may be identified by applying a third rule, i.e., that a word from the word set matching predefined key text is a key of a field.
In one implementation, key locations corresponding to fields are determined. For example, by grouping nearby recognized words, a set of phrase candidates is determined from the set of words, and a corresponding set of phrase positions is determined from the set of positions. A key score for each phrase candidate is calculated, the key score indicating a likelihood that the corresponding phrase candidate is a key of a field. The key score is calculated based on the string distance between the respective phrase candidate and the predefined key, see, for example, equation (1). The key of the field is then determined based on the maximum key score in the phrase candidate set, see, for example, equation (2).
In particular, to estimate the value, a neural model may be used to predict the respective data type of each phrase candidate. A subset of phrase candidates having data types that match the predefined data types of the field is then determined. A value score is calculated for each phrase candidate in the subset, indicating the likelihood that the corresponding phrase candidate is the field value of the field. The value score, e.g., equation (3), is calculated based on the key score of the located key corresponding to the field and a geometric relationship metric between the respective phrase candidate and the located key. The geometric relationship metric, e.g., equation (4), is calculated based on the distance and angle between the respective phrase candidate and the located key. The field value is then determined based on the maximum value score in the phrase candidate subset.
At step 508, an encoder (e.g., transformer encoder 210 in fig. 2) may encode the pair of the first word and the first position corresponding to the field value into a first representation.
At step 510, a classifier (e.g., classifier 220 in fig. 2) may generate a field classification distribution from the first representation.
At step 512, a first loss objective is calculated by comparing the field classification distribution with the field value serving as a pseudo label.
At step 514, the encoder is updated via back propagation based on the first loss objective.
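Steps 508-514 amount to training a classifier against rule-derived pseudo labels. The sketch below abstracts the encoder and classifier away (raw logits stand in for their output); the cross-entropy form of the loss is an assumption, since the patent leaves the loss to standard practice.

```python
import math

def softmax(logits):
    # Turn classifier logits into a field classification distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def pseudo_label_loss(logits_per_word, pseudo_labels):
    # First loss objective (step 512), sketched as the mean negative
    # log-likelihood of the rule-derived pseudo labels under the
    # field classification distributions.
    total = 0.0
    for logits, y in zip(logits_per_word, pseudo_labels):
        dist = softmax(logits)
        total += -math.log(dist[y])
    return total / len(pseudo_labels)
```

In the full model, this scalar would drive the back-propagation of step 514 through the classifier and encoder.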
Fig. 6 is a simplified diagram of a method 600 for label refinement in field extraction from forms with unlabeled data by a field extraction model, in accordance with some embodiments. One or more of the processes of method 600 may be implemented, at least in part, in the form of executable code stored on a non-transitory, tangible, machine-readable medium that, when executed by one or more processors, may cause the one or more processors to perform one or more of the processes. In some embodiments, method 600 corresponds to the operation of field extraction module 430 (fig. 4) to perform field extraction or to train a field extraction model. As shown, method 600 includes a number of enumerated steps, but aspects of method 600 may include additional steps before, after, and between the enumerated steps. In certain aspects, one or more of the enumerated steps may be omitted or performed in a different order.
At step 602, an unlabeled form including a plurality of fields and a plurality of field values is received via a data interface (e.g., 415 in fig. 4). For example, an unlabeled form may be similar to the forms shown in FIGS. 8A-8B.
At step 604, a first word and a first location of the first word are detected within the unlabeled form. For example, words and locations may be detected by OCR module 205 in FIG. 2.
At step 606, an encoder (e.g., the transformer encoder 210 of fig. 2) encodes the pair of first words and the first position into a first representation, e.g., equation (6).
At step 608, a plurality of progressive label ensemble (PLE) branches (see, e.g., 304a-304n in fig. 3) generate a plurality of predicted labels in parallel, each based on the first representation. Each of the plurality of PLE branches includes a respective classifier that generates a respective predicted label from the first representation. The predicted label at one PLE branch is generated by projecting the first representation into a set of field prediction scores via one or more fully connected layers, and generating the predicted label based on the maximum field prediction score over the word set. When the maximum field prediction score is greater than a predefined threshold, the word corresponding to the maximum field prediction score is selected from the word set for the respective field of the plurality of fields.
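The per-branch prediction rule of step 608 can be sketched as follows; the dictionary return format and the name `predict_branch_labels` are illustrative assumptions, not taken from the patent.

```python
def predict_branch_labels(word_scores, threshold=0.5):
    # word_scores[i][f] is word i's prediction score for field f.
    # For each field, take the best-scoring word, but only keep it
    # when its score clears the predefined threshold (step 608).
    num_fields = len(word_scores[0])
    labels = {}
    for f in range(num_fields):
        best_word = max(range(len(word_scores)),
                        key=lambda i: word_scores[i][f])
        if word_scores[best_word][f] > threshold:
            labels[f] = best_word
        # fields whose best score stays below threshold get no label
    return labels
```

Lowering the threshold admits more (but less confident) field assignments, which mirrors the precision/recall trade-off discussed in the experiments.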
At step 610, one PLE branch calculates a loss component by comparing the predicted label at that PLE branch with the predicted label from the previous PLE branch, which serves as a pseudo label.
At step 612, the loss objective is calculated as the sum of the loss components over the plurality of PLE branches, e.g., equation (7).
At step 614, the plurality of PLE branches are updated via back propagation based on the loss objective. In one embodiment, a first PLE branch of the plurality of PLE branches uses the field value identified at step 506 in fig. 5 as a first pseudo label. A joint loss objective is calculated by adding the loss objective to the first loss objective calculated at step 512 in fig. 5. The encoder and the plurality of PLE branches are then jointly updated based on the joint loss objective.
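Steps 610-614 can be sketched as a chain of branches, each supervised by its predecessor's predictions. The exact form of equation (7) is not reproduced here; the β weighting of the bootstrap-label term on the first branch is an assumption consistent with the experiments (β=1.0 by default).

```python
import math

def branch_loss(pred_dists, prev_labels):
    # Step 610: one branch's loss component, comparing its predicted
    # distributions against pseudo labels (negative log-likelihood).
    return sum(-math.log(d[y])
               for d, y in zip(pred_dists, prev_labels)) / len(prev_labels)

def refinement_loss(branch_dists, initial_pseudo_labels, beta=1.0):
    # Steps 610-612 sketched: branch 0 learns from the rule-derived
    # pseudo labels (weighted by beta); each later branch learns from
    # the previous branch's argmax predictions; the loss objective is
    # the sum of the per-branch components (equation (7) analogue).
    total = 0.0
    labels = initial_pseudo_labels
    for k, dists in enumerate(branch_dists):
        weight = beta if k == 0 else 1.0
        total += weight * branch_loss(dists, labels)
        # this branch's argmax predictions become the next pseudo labels
        labels = [max(range(len(d)), key=d.__getitem__) for d in dists]
    return total
```

With k branches, each refinement stage thus trains on progressively more confident labels, which is the behavior the ablation studies examine.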
Exemplary Performance
An example training data set may include real bills collected from different suppliers. For example, the training set contains 7,664 unlabeled bill forms from 2,711 templates. The validation set contains 348 labeled bills from 222 templates. The test set contains 339 labeled bills from 222 templates. Each set has at most 5 images per template. Seven common fields are considered, including the invoice number, purchase order, invoice date, due date, amount due, total amount, and total tax.
For the tobacco test set, 350 bills were collected from the tobacco collection of the publicly available Industry Documents Library. The validation and test sets of the in-house bill data set have similar field statistical distributions, while the public tobacco test set differs. For example, bills in the tobacco set (as shown in fig. 8A) may have lower resolution and more cluttered backgrounds than the bills in the training data set (as shown in fig. 8B).
The end-to-end macro-average F1 score over the fields is used as the evaluation metric. In particular, exact string matches between the predicted values and the ground-truth values are used to count true positives, false positives, and false negatives. The precision, recall, and F1 score for each field are obtained accordingly. Reported scores are averaged over 5 runs to reduce the impact of randomness.
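The metric can be made concrete as follows; the per-document dictionary representation is an assumed format, while the exact-match counting and macro averaging follow the description above.

```python
def field_f1(preds, truths):
    # Per-field precision/recall/F1 from exact string matches.
    # preds and truths map a document id to the extracted string (or None).
    tp = sum(1 for d, p in preds.items() if p is not None and truths.get(d) == p)
    fp = sum(1 for d, p in preds.items() if p is not None and truths.get(d) != p)
    fn = sum(1 for d, t in truths.items() if t is not None and preds.get(d) != t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def macro_f1(per_field_preds, per_field_truths):
    # End-to-end macro-average F1: the unweighted mean of per-field F1.
    f1s = [field_f1(per_field_preds[f], per_field_truths[f])[2]
           for f in per_field_truths]
    return sum(f1s) / len(f1s)
```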
Since no existing method performs field extraction using only unlabeled data, the following baselines were constructed to validate the approach. Bootstrap labels (B-Labels): the initial pseudo labels inferred with the proposed simple rules can be used directly to perform field extraction without any training. Transformers trained with B-Labels: since a transformer is used as the backbone to extract word features, transformer models trained with the B-Labels serve as baselines to evaluate the performance gains from (1) the data-driven model in the pipeline and (2) the refinement module. Both the text content and its location are important for field prediction. An example of a transformer backbone is LayoutLM, which takes both text and location as input. In addition, two popular transformer models, BERT and RoBERTa, which accept only text as input, are used.
The OCR engine is used to detect words and their locations, and the words are then ordered in reading order. An example key list and data type for each field is shown in Table 1 of fig. 7. The key lists and data types are deliberately broad. Alpha is set to 4.0 in equation (4). To further remove false positives, a candidate is removed if its located key is not in its neighboring region. Specifically, the neighboring region around a candidate value extends all the way to the left edge of the image, four candidate heights above the candidate, and one candidate height below it. The number of refinement branches is k=3 for all experiments. When the number of stages is greater than 1, a hidden fully connected layer with 768 units is added before the classifier. For all bill experiments, β in equation (7) is set to 1.0, except β=5.0 for the BERT-based refinement in Table 4 of fig. 11, because of its better performance on the validation set. For the field extraction model and the baselines described herein, the model with the best F1 score on the validation set is selected. To prevent overfitting, a two-step training strategy is employed, in which the first branch of the model is trained using pseudo labels and is then fixed together with the feature extractor during refinement. The batch size is set to 8, and the Adam optimizer is used with a learning rate of 5e-5.
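The neighboring-region filter described above can be sketched as a simple geometric test. The `(x0, y0, x1, y1)` box format (y growing downward) and the use of the key's center point are assumptions not fixed by the text.

```python
def key_in_neighbor_region(cand_box, key_box):
    # A candidate value is kept only if its located key falls in the
    # region spanning from the left image border to the candidate's
    # right edge, reaching four candidate heights above the candidate
    # and one candidate height below it.
    cx0, cy0, cx1, cy1 = cand_box
    h = cy1 - cy0                      # candidate height
    region_top = cy0 - 4 * h
    region_bottom = cy1 + 1 * h
    kx0, ky0, kx1, ky1 = key_box
    # the key's center must lie inside the region
    kcx, kcy = (kx0 + kx1) / 2, (ky0 + ky1) / 2
    return kcx <= cx1 and region_top <= kcy <= region_bottom
```

A key to the left of the candidate passes the test, while a key far below or to the right of the candidate is rejected, removing the corresponding candidate as a likely false positive.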
The proposed model is then validated on the in-house bill data set, because it contains large-scale unlabeled training data and a sufficient amount of validation/test data, which suits the experimental setup. The proposed training method is first verified using LayoutLM as the backbone. The comparison results are shown in Table 2 of fig. 9 and Table 3 of fig. 10. The bootstrap label (B-Labels) baseline reaches F1 scores of 43.8% and 44.1% on the validation and test sets, respectively, indicating that the B-Labels are reasonably accurate but still noisy. When the B-Labels are used to train the LayoutLM transformer, a significant performance improvement is obtained: about 15% on the validation set and about 17% on the test set. Adding the PLE refinement module markedly improves model precision, by about 6% on the validation set and about 7% on the test set, while recall drops slightly, by about 2.5% on the validation set and about 3% on the test set. This is because the refined labels become more and more confident at later stages, resulting in higher model precision. However, the refinement stages also discard some low-confidence predictions, which lowers recall. Overall, the PLE refinement module further improves performance, with a 3% improvement in F1 score.
LayoutLM is then used as the default feature backbone, since both text and its location are important for the task. Furthermore, to study the effect of different transformer models as the backbone, two other models, BERT and RoBERTa, which use only text as input, are evaluated. The comparison results are shown in Table 4 of fig. 11 and Table 5 of fig. 12. It is observed that when BERT and RoBERTa are trained directly using the B-Labels and the PLE refinement module, significant improvements are achieved, consistently improving the baseline results for different transformer choices with different numbers of parameters (base or large). However, LayoutLM still produces better results than the other two backbones, indicating that text location is indeed very important for good task performance.
The proposed model was then tested on the tobacco test set, with results shown in Table 6 of fig. 13. The simple rule-based approach achieves an F1 score of 25.1%, which is reasonable but much lower than the results on the in-house bill data set. The reason is that the tobacco test set is visually noisy, which leads to more text recognition errors. When B-Labels are used, the LayoutLM baseline improves significantly. In addition, the PLE refinement module further increases the F1 score by about 2%. The results show that the proposed method adapts well to different scenarios. As shown in FIGS. 8A-8B, the proposed method achieves good performance even though the sample bills come from very diverse templates, with cluttered backgrounds and low resolution.
Ablation studies were further performed on the bill data set with the LayoutLM-based backbone. Effect of the number of stages: the proposed model is refined in k stages, with k=3 fixed in all other experiments; here it is evaluated with different numbers of stages. Fig. 15 shows that the model generally performs better on both the validation set and the test set as the number of stages k increases. Multi-stage performance is always higher than the single-stage model (the transformer baseline). Model performance is highest when k=3. As shown in fig. 16, during model refinement, precision improves while recall decreases. When k=3, an optimal balance between precision and recall is obtained. When k>3, recall drops more than precision improves, so a worse F1 score is observed.
Effect of refinement labels (R-Labels): to analyze the impact of this design, all refinement labels were removed from the final loss, the three branches were trained independently using only B-Labels, and their predictions were aggregated during inference. As shown in Table 7 of fig. 14, removing the refinement labels results in F1 drops of 2.2% and 2.6% on the validation and test sets, respectively.
Effect of B-Labels regularization: at each stage, B-Labels are used as a form of regularization to prevent the model from overfitting to over-confident refinement labels. Setting β=0 in equation (7) removes the B-Labels from the refinement stages. Without this regularization, model performance drops by about 2% in F1 score, as shown in Table 7 of fig. 14.
Effect of the two-step training strategy: to avoid overfitting to the noisy labels, a two-step training strategy was employed, in which the backbone together with the first branch is trained using B-Labels and then fixed during refinement. The effect is analyzed by comparing against a single-step training model. Single-step training results in F1 drops of 1.8% and 1.4% on the validation and test sets, respectively.
Some embodiments of a computing device, such as computing device 400, may include a non-transitory, tangible, machine-readable medium comprising executable code that, when executed by one or more processors (e.g., processor 410), may cause the one or more processors to perform the processes of methods 500 and/or 600. Some common forms of machine-readable media that may include these processes are, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, and/or any other medium from which a processor or computer is adapted to read.
The description and drawings that illustrate aspects, embodiments, implementations, or applications of the present invention are not to be considered limiting. Various mechanical, compositional, structural, electrical, and operational changes may be made without departing from the spirit and scope of the present description and claims. In some instances, well-known circuits, structures, or techniques have not been shown or described in detail in order not to obscure the embodiments of the disclosure. Like numbers in two or more figures represent the same or similar elements.
In this specification, specific details are set forth describing some embodiments consistent with the disclosure. Numerous specific details are set forth in order to provide a thorough understanding of the embodiments. It will be apparent, however, to one skilled in the art, that some embodiments may be practiced without some or all of these specific details. The specific embodiments disclosed herein are intended to be illustrative, but not limiting. Those skilled in the art may implement other elements, although not specifically described herein, within the scope and spirit of the disclosure. Furthermore, to avoid unnecessary repetition, one or more features shown and described in connection with one embodiment may be incorporated into other embodiments unless specifically described otherwise, or if one or more features would render the embodiment inoperative.
This application is further described with reference to Appendix I, entitled "Field Extraction from Forms with Unlabeled Data," 9 pages, which is considered part of this disclosure, the entire contents of which are incorporated by reference.
While illustrative embodiments have been shown and described, a wide range of modifications, changes, and substitutions are contemplated in the foregoing disclosure and, in some instances, some features of the embodiments may be employed without a corresponding use of the other features. Those of ordinary skill in the art will recognize many variations, alternatives, and modifications. Accordingly, the scope of the invention should be limited only by the attached claims and it is appropriate that the claims be construed broadly and in a manner consistent with the scope of the embodiments disclosed herein.

Claims (40)

1. A method for field extraction from a form having unlabeled data by a field extraction model, the method comprising:
receiving, via the data interface, an unlabeled form including a plurality of fields and a plurality of field values;
detecting, by a processor, a set of words and a set of locations of the set of words within the untagged form;
identifying field values for fields from the set of words and the set of locations based at least in part on geometric relationships between the set of words;
encoding, by an encoder, a pair of a first word and a first position corresponding to the field value into a first representation;
generating, by a classifier, a field classification distribution from the first representation;
calculating a first loss target by comparing the field classification distribution with the field value as a pseudo tag; and
the encoder is updated via back propagation based on the first loss target.
2. The method of claim 1, wherein identifying the field value of the field comprises applying a first rule: one or more words in the form are keys associated with the field name of the field.
3. The method of claim 2, wherein identifying the field value of the field comprises applying a second rule: pairs of words that are horizontally or vertically aligned are keys of the field and the field value.
4. The method of claim 3, wherein identifying the field value of the field comprises applying a third rule: words from the set of words that match the predefined key text are the keys of the field.
5. The method of claim 1, further comprising:
determining a set of phrase candidates from the set of words and a corresponding set of phrase positions from the set of positions by grouping nearby recognized words;
calculating a key score for each phrase candidate indicating a likelihood that the corresponding phrase candidate is a key of the field; and
the key of the field is determined based on a maximum key score in the phrase candidate set.
6. The method of claim 5, wherein the key score is calculated based on string distances between respective phrase candidates and predefined keys.
7. The method of claim 5, further comprising:
predicting, via a neural model, a respective data type for each phrase candidate;
determining a subset of phrase candidates having a data type that matches a predefined data type of the field;
calculating a value score for each phrase candidate in the subset that indicates a likelihood that the respective phrase candidate is the field value of the field; and
The field value is determined based on a maximum score in the subset of phrase candidates.
8. The method of claim 7, wherein the value score is calculated based on a key score of a location key corresponding to the field and a geometric relationship metric between a respective phrase candidate and the location key.
9. The method of claim 8, wherein the geometric relationship metric is calculated based on a distance and an angle between a respective phrase candidate and the location key.
10. The method of claim 1, further comprising:
generating a plurality of predictive labels based on the first representation, respectively, in parallel through a plurality of progressive label ensemble (PLE) branches; and
at one PLE branch, a loss component is calculated by comparing the predicted tag at the one PLE branch with the predicted tag from the previous PLE branch as a pseudo tag,
wherein a first PLE branch from the plurality of PLE branches receives an identified field value of the field as a first pseudo tag.
11. A system for field extraction from a form having unlabeled data via a field extraction model, the system comprising:
a data interface that receives an unlabeled form including a plurality of fields and a plurality of field values;
A memory storing a plurality of processor-executable instructions; and
a processor that executes the instructions to perform operations comprising:
detecting a set of words and a set of locations of the set of words within the untagged form;
identifying field values for fields from the set of words and the set of locations based at least in part on geometric relationships between the set of words;
encoding, by an encoder, a pair of a first word and a first position corresponding to the field value into a first representation;
generating, by a classifier, a field classification distribution from the first representation;
calculating a first loss target by comparing the field classification distribution with the field value as a pseudo tag; and
the encoder is updated via back propagation based on the first loss target.
12. The system of claim 11, wherein identifying the field value of the field comprises applying a first rule: one or more words in the form are keys associated with the field name of the field.
13. The system of claim 12, wherein identifying the field value of the field comprises applying a second rule: pairs of words that are horizontally or vertically aligned are keys of the field and the field value.
14. The system of claim 13, wherein identifying the field value of the field comprises applying a third rule: words from the set of words that match the predefined key text are the keys of the field.
15. The system of claim 11, wherein the operations further comprise:
determining a set of phrase candidates from the set of words and a corresponding set of phrase positions from the set of positions by grouping nearby recognized words;
calculating a key score for each phrase candidate indicating a likelihood that the corresponding phrase candidate is a key of the field; and
the key of the field is determined based on a maximum key score in the phrase candidate set.
16. The system of claim 15, wherein the key score is calculated based on string distances between respective phrase candidates and predefined keys.
17. The system of claim 15, wherein the operations further comprise:
predicting, via a neural model, a respective data type for each phrase candidate;
determining a subset of phrase candidates having a data type that matches a predefined data type of the field;
calculating a value score for each phrase candidate in the subset that indicates a likelihood that the respective phrase candidate is the field value of the field; and
The field value is determined based on a maximum score in the subset of phrase candidates.
18. The system of claim 17, wherein the value score is calculated based on a key score of a location key corresponding to the field and a geometric relationship metric between a respective phrase candidate and the location key.
19. The system of claim 18, wherein the geometric relationship metric is calculated based on a distance and an angle between a respective phrase candidate and the location key.
20. The system of claim 11, wherein the operations further comprise:
generating a plurality of predictive labels based on the first representation, respectively, in parallel through a plurality of progressive label ensemble (PLE) branches; and
at one PLE branch, a loss component is calculated by comparing the predicted tag at the one PLE branch with the predicted tag from the previous PLE branch as a pseudo tag,
wherein a first PLE branch from a plurality of PLE branches receives the identified field value of the field as a first pseudo tag.
21. A method for field extraction from a form having unlabeled data by a field extraction model, the method comprising:
receiving, via the data interface, an unlabeled form including a plurality of fields and a plurality of field values;
Detecting, by a processor, a first word and a first position of the first word within the unlabeled form;
encoding, by an encoder, the pair of the first word and the first position into a first representation;
generating a plurality of predictive labels based on the first representation, respectively, in parallel through a plurality of progressive label ensemble (PLE) branches;
at one PLE branch, calculating a loss component by comparing a prediction tag at the one PLE branch with a prediction tag from a previous PLE branch as a pseudo tag;
calculating a loss target as a sum of loss components on the plurality of PLE branches; and
the plurality of PLE branches is updated via back propagation based on the loss target.
22. The method of claim 21, wherein each PLE branch of the plurality of PLE branches includes a respective classifier that generates a respective prediction tag based on the first representation.
23. The method of claim 21, wherein the predictive tag at the one PLE branch is generated by:
projecting the first representation into a set of field prediction scores via one or more fully connected layers; and
the predictive tag is generated based on a maximum field predictive score in the word set.
24. The method of claim 23, further comprising:
when the maximum field prediction score is greater than a predefined threshold, for a field of the plurality of fields, a word corresponding to the maximum field prediction score is selected from the set of words.
25. The method of claim 21, further comprising:
detecting, by a processor, a set of words and a set of locations of the set of words within the untagged form;
identifying field values for fields from the set of words and the set of locations based at least in part on geometric relationships between the set of words;
generating, by a classifier, a field classification distribution from the first representation; and
a first loss target is calculated by comparing the field classification distribution with the field value as a pseudo tag.
26. The method of claim 25, wherein a first PLE branch from the plurality of PLE branches uses an identified field value of the field as a first pseudo tag.
27. The method of claim 25, further comprising:
calculating a joint loss target by adding the loss target to the first loss target; and
the encoder and the plurality of PLE branches are jointly updated via back propagation based on the joint loss objective.
28. The method of claim 25, further comprising:
the encoder is updated via back propagation based on the first loss target.
29. The method of claim 28, further comprising:
the plurality of PLE branches are updated via back propagation based on the loss target while parameters of the encoder are fixed after updating the encoder.
30. A system for field extraction from a form having unlabeled data via a field extraction model, the system comprising:
a data interface that receives an unlabeled form including a plurality of fields and a plurality of field values;
a memory storing a plurality of processor-executable instructions; and
a processor that executes the instructions to perform operations comprising:
detecting a first word and a first position of the first word within the unlabeled form;
encoding, by an encoder, the pair of the first word and the first position into a first representation;
generating a plurality of predictive labels based on the first representation, respectively, in parallel through a plurality of progressive label ensemble (PLE) branches;
at one PLE branch, calculating a loss component by comparing a prediction tag at the one PLE branch with a prediction tag from a previous PLE branch as a pseudo tag;
Calculating a loss target as a sum of loss components on a plurality of PLE branches; and
the plurality of PLE branches is updated via back propagation based on the loss target.
31. The system of claim 30, wherein each PLE branch of the plurality of PLE branches includes a respective classifier that generates a respective predictive label based on the first representation.
32. The system of claim 30, wherein the predictive tag at the one PLE branch is generated by:
projecting the first representation into a set of field prediction scores via one or more fully connected layers; and
the predictive tag is generated based on a maximum field predictive score in the word set.
33. The system of claim 32, wherein the operations further comprise:
when the maximum field prediction score is greater than a predefined threshold, for a field of the plurality of fields, a word corresponding to the maximum field prediction score is selected from the set of words.
34. The system of claim 30, wherein the operations further comprise:
detecting, by a processor, a set of words and a set of locations of the set of words within the untagged form;
identifying field values for fields from the set of words and the set of locations based at least in part on geometric relationships between the set of words;
Generating, by a classifier, a field classification distribution from the first representation; and
a first loss target is calculated by comparing the field classification distribution with the field value as a pseudo tag.
35. The system of claim 34, wherein a first PLE branch from the plurality of PLE branches uses an identified field value of the field as a first pseudo tag.
36. The system of claim 34, wherein the operations further comprise:
calculating a joint loss target by adding the loss target to the first loss target; and
the encoder and the plurality of PLE branches are jointly updated via back propagation based on the joint loss objective.
37. The system of claim 36, wherein the operations further comprise:
the encoder is updated via back propagation based on the first loss target.
38. The system of claim 37, wherein the operations further comprise:
the plurality of PLE branches are updated via back propagation based on the loss target while parameters of the encoder are fixed after updating the encoder.
39. A non-transitory storage processor-readable medium storing processor-executable instructions for field extraction from a form having unlabeled data by a field extraction model, the instructions being executable by a processor to perform operations comprising:
Receiving, via the data interface, an unlabeled form including a plurality of fields and a plurality of field values;
detecting, by a processor, a first word and a first position of the first word within the unlabeled form;
encoding, by an encoder, the pair of the first word and the first position into a first representation;
generating a plurality of predictive labels based on the first representation, respectively, in parallel through a plurality of progressive label ensemble (PLE) branches;
at one PLE branch, calculating a loss component by comparing a prediction tag at the one PLE branch with a prediction tag from a previous PLE branch as a pseudo tag;
calculating a loss target as a sum of loss components on a plurality of PLE branches; and
the plurality of PLE branches is updated via back propagation based on the loss target.
40. The non-transitory storage processor-readable medium of claim 39, wherein each PLE branch of the plurality of PLE branches includes a respective classifier that generates a respective prediction tag based on the first representation, and
wherein the predictive tag at the one PLE branch is generated by:
projecting the first representation into a set of field prediction scores via one or more fully connected layers; and
The predictive tag is generated based on a maximum field predictive score in the word set.
CN202280036060.1A 2021-05-17 2022-01-27 System and method for extracting fields from unlabeled data Pending CN117396899A (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US202163189579P 2021-05-17 2021-05-17
US63/189,579 2021-05-17
US17/484,618 2021-09-24
US17/484,623 2021-09-24
US17/484,623 US20220366317A1 (en) 2021-05-17 2021-09-24 Systems and methods for field extraction from unlabeled data
US17/484,618 US12086698B2 (en) 2021-05-17 2021-09-24 Systems and methods for field extraction from unlabeled data
PCT/US2022/014013 WO2022245407A1 (en) 2021-05-17 2022-01-27 Systems and methods for field extraction from unlabeled data

Publications (1)

Publication Number Publication Date
CN117396899A true CN117396899A (en) 2024-01-12

Family

ID=89473672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280036060.1A Pending CN117396899A (en) 2021-05-17 2022-01-27 System and method for extracting fields from unlabeled data

Country Status (3)

Country Link
EP (1) EP4341872A1 (en)
JP (1) JP2024522063A (en)
CN (1) CN117396899A (en)

Also Published As

Publication number Publication date
JP2024522063A (en) 2024-06-11
EP4341872A1 (en) 2024-03-27

Jony et al. Domain specific fine tuning of pre-trained language model in NLP

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination