CN116152843A - Category identification method, device and storage medium for contract template to be filled-in content - Google Patents


Info

Publication number
CN116152843A
Authority
CN
China
Prior art keywords
filled
contract
content
category
behavior
Prior art date
Legal status
Granted
Application number
CN202211464102.XA
Other languages
Chinese (zh)
Other versions
CN116152843B (en)
Inventor
顾敏
杜向阳
王丽颖
Current Assignee
Nanjing Aegis Information Technology Co ltd
Original Assignee
Nanjing Aegis Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Aegis Information Technology Co ltd
Priority to CN202211464102.XA
Publication of CN116152843A
Application granted
Publication of CN116152843B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Computer Graphics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of data processing, and in particular to a category identification method, device and storage medium for contents to be filled in a contract template. The method comprises: acquiring massive contract data; determining, according to a contract behavior knowledge graph, labeling rules for contract behavior labels and for category labels of contents to be filled, and labeling a data set of the contents to be filled in contracts according to the labeling rules to obtain a labeled data set; training with the contract behavior knowledge graph, the labeled data set and a preset neural network model to obtain a category identification model for contents to be filled; and acquiring a target contract template, identifying it with the category identification model, and outputting an identification result for each content to be filled in the target contract template, the identification result comprising a category label and a position. This technical solution improves contract generation efficiency and relieves the workload of contract editors.

Description

Category identification method, device and storage medium for contract template to be filled-in content
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a method, device and storage medium for identifying the category of contents to be filled in a contract template.
Background
With continued economic growth, the number of enterprise contracts keeps increasing, and the demand for efficient and reliable contract parsing, generation and auditing is becoming urgent. For contract generation, because the business involved is relatively fixed, a new contract can be generated from a fixed contract template to meet the requirements of different business scenarios. Contract editors in an enterprise fill in content at designated positions of the contract template through various terminal devices to generate a complete contract. For example, in the sales contract template "Buyer: _____ Legal representative: _____", the first "_____" is filled with the buyer's name and the second "_____" with the buyer's legal representative. When a contract is drafted, the content at the designated positions changes with the actual situation, so one template can serve different scenarios. Automatically identifying the designated positions of the contract template and the attribute labels of the content to be filled during contract generation can support automatic contract analysis and archive management, and effectively improves contract generation efficiency.
Currently, the designated positions of a contract template are determined manually by contract editors. The editor has to find each designated position to be filled in the template, mark it with a predefined placeholder, and then enter the information at that position. This approach supports only a single placeholder form and cannot automatically indicate the type of content required at each designated position. The type of the content to be filled refers to the semantic attribute of the content to be filled at a designated position of the contract template; for example, in the sales contract template "Buyer: _____", the "_____" is of the "buyer name" type.
Type recognition of the contents to be filled in a contract template faces three problems: (1) manually editing template files is inefficient; (2) placeholders at designated positions take diverse forms, commonly "__", blank spaces and the like; (3) the attribute labels of the contents to be filled are easily confused with one another. For example, in "Party A employs Party B to work at ____ school as ____ (17_working post); the contract term is ____ years (16_contract validity period), from ____ year ____ month ____ day (14_contract start date) to ____ year ____ month ____ day (15_contract end date)", the positions of the contract start date and the contract end date are easily confused.
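Problem (2) can be made concrete with a small pre-processing sketch. The regular expression below is a hypothetical heuristic, not part of the claimed method (which locates the contents to be filled with the trained model); it flags runs of two or more underscores, and runs of two or more ASCII or full-width spaces between visible characters, as candidate placeholders.

```python
import re

# Hypothetical placeholder heuristic: "__"-style runs, or 2+ ASCII / full-width
# (U+3000) spaces sandwiched between non-space characters.
PLACEHOLDER_RE = re.compile(r"_{2,}|(?<=\S)[ \u3000]{2,}(?=\S)")


def find_placeholders(text):
    """Return (start, end) character offsets of candidate blanks in text."""
    return [m.span() for m in PLACEHOLDER_RE.finditer(text)]


print(find_placeholders("Buyer: _____ Legal representative: _____"))  # → [(7, 12), (35, 40)]
```

A single space between words is not matched, so ordinary prose is left alone; blanks drawn with other characters (dotted lines, brackets) would need further alternatives, which is exactly the diversity the text describes.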
Disclosure of Invention
To overcome these problems in the related art, the invention provides a category identification method, device and storage medium for contents to be filled in a contract template. The solution serves different contract types, automatically identifies the positions of the contents to be filled and the attribute labels of the text to be filled, supports legal text understanding and generation tasks such as contract review and automatic contract generation, improves contract generation efficiency, and relieves the workload of contract editors.
According to a first aspect of an embodiment of the present invention, there is provided a method for identifying a category of contents to be filled in a contract template, the method including:
acquiring massive contract data;
determining labeling rules of contract behavior labels and category labels of contents to be filled according to the contract behavior knowledge graph, and labeling data sets of the contents to be filled in the contract according to the labeling rules to obtain labeled data sets;
training with the contract behavior knowledge graph, the labeled data set and a preset neural network model to obtain a category identification model for the content to be filled;
and acquiring a target contract template, identifying the target contract template by using the category identification model of the content to be filled, and outputting an identification result corresponding to each content to be filled in the target contract template, wherein the identification result comprises a category label and a position.
In one embodiment, preferably, determining the labeling rules of the contract behavior labels and the category labels of the contents to be filled according to the contract behavior knowledge graph includes:
determining, from the massive contract data, position criteria for a plurality of contents to be filled, addressing problems of association among the contents to be filled;
determining, for different contract types, the category labels corresponding to the contents to be filled according to the existing element tables and clause tables;
and determining a contract behavior graph according to the existing element tables and clause tables, and classifying the categories of the contents to be filled under the corresponding contract behavior labels, to obtain the labeling rules of the contract behavior labels and the category labels of the contents to be filled.
In one embodiment, preferably, identifying the target contract template with the category identification model for the content to be filled and outputting the category corresponding to each content to be filled in the target contract template includes:
dividing the target contract template into a plurality of paragraphs using paragraph identifiers, each paragraph being divided into a plurality of sentences;
for each sentence, determining a corresponding sentence vector representation using a pre-trained BERT model;
inputting each sentence vector representation into a sentence-level Bi-LSTM, and concatenating the outputs to obtain a sentence-level vector representation;
for each paragraph, determining a paragraph vector representation using a pre-trained BERT model;
inputting each paragraph vector representation into a paragraph-level Bi-LSTM, and concatenating the outputs to obtain a document-level vector representation;
for each character in each sentence, fusing the sentence-level vector representation and the document-level vector representation through a gating mechanism to obtain a final sentence vector representation;
performing contract behavior classification on the final sentence vector representation to determine the contract behavior label of each sentence;
and, taking the contract behavior label of each sentence as prior information, determining the position and category label of the content to be filled in each sentence with a reading comprehension model.
In one embodiment, preferably, determining the position and category label of the content to be filled in each sentence with a reading comprehension model includes:
forming a question from the contract behavior label and the category label of the content to be filled, using it as the question of the reading comprehension model, encoding it with a pre-trained BERT model, and inputting it into the reading comprehension model to output the position and category label of the content to be filled.
In one embodiment, preferably, the contract behaviors include collection behavior, payment behavior and the like.
In one embodiment, preferably, the method further comprises:
after the identification result corresponding to each content to be filled in the target contract template is determined, predicting the buyer and the seller, and correcting the identification result of each content to be filled according to the buyer-seller prediction.
According to a second aspect of the embodiment of the present invention, there is provided a category identifying device for contents to be filled in of a contract template, the device comprising:
the acquisition module is used for acquiring massive contract data;
the labeling module is used for determining labeling rules of the contract behavior labels and the category labels of the contents to be filled according to the contract behavior knowledge graph, and labeling the data set of the contents to be filled in contracts according to the labeling rules to obtain a labeled data set;
the training module is used for training with the contract behavior knowledge graph, the labeled data set and a preset neural network model to obtain a category identification model for the content to be filled;
the identification module is used for acquiring a target contract template, identifying the target contract template with the category identification model for the content to be filled, and outputting an identification result corresponding to each content to be filled in the target contract template, wherein the identification result comprises a category label and a position.
In one embodiment, preferably, the labeling module is configured for:
determining, from the massive contract data, position criteria for a plurality of contents to be filled, addressing problems of association among the contents to be filled;
determining, for different contract types, the category labels corresponding to the contents to be filled according to the existing element tables and clause tables;
and determining a contract behavior graph according to the existing element tables and clause tables, and classifying the categories of the contents to be filled under the corresponding contract behavior labels, to obtain the labeling rules of the contract behavior labels and the category labels of the contents to be filled.
In one embodiment, preferably, the identification module is configured for:
dividing the target contract template into a plurality of paragraphs using paragraph identifiers, each paragraph being divided into a plurality of sentences;
for each sentence, determining a corresponding sentence vector representation using a pre-trained BERT model;
inputting each sentence vector representation into a sentence-level Bi-LSTM, and concatenating the outputs to obtain a sentence-level vector representation;
for each paragraph, determining a corresponding paragraph vector representation using a pre-trained BERT model;
inputting each paragraph vector representation into a paragraph-level Bi-LSTM, and concatenating the outputs to obtain a document-level vector representation;
for each character in each sentence, fusing the sentence-level vector representation and the document-level vector representation through a gating mechanism to obtain a final sentence vector representation;
performing contract behavior classification on the final sentence vector representation to determine the contract behavior label of each sentence;
and, taking the contract behavior label of each sentence as prior information, determining the position and category label of the content to be filled in each sentence with a reading comprehension model.
In one embodiment, preferably, determining the position and category label of the content to be filled in each sentence with a reading comprehension model includes:
forming a question from the contract behavior label and the category label of the content to be filled, using it as the question of the reading comprehension model, encoding it with a pre-trained BERT model, and inputting it into the reading comprehension model to output the position and category label of the content to be filled.
In one embodiment, preferably, the contract behaviors include collection behavior, payment behavior and the like.
In one embodiment, preferably, the device further comprises:
the determining module is used for determining, after the identification result corresponding to each content to be filled in the target contract template is determined, whether the target contract template contains buyer and seller content;
and the prediction module is used for predicting the buyer and the seller when the target contract template contains buyer and seller content, and correcting the identification result of each content to be filled according to the buyer-seller prediction.
According to a third aspect of the embodiment of the present invention, there is provided a category identifying device for contents to be filled in of a contract template, the device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring massive contract data;
determining labeling rules of contract behavior labels and category labels of contents to be filled according to the contract behavior knowledge graph, and labeling data sets of the contents to be filled in the contract according to the labeling rules to obtain labeled data sets;
training with the contract behavior knowledge graph, the labeled data set and a preset neural network model to obtain a category identification model for the content to be filled;
and acquiring a target contract template, identifying the target contract template by using the category identification model of the content to be filled, and outputting an identification result corresponding to each content to be filled in the target contract template, wherein the identification result comprises a category label and a position.
According to a fourth aspect of the embodiments of the present invention, there is provided a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method in any embodiment of the first aspect.
The technical solution provided by the embodiments of the invention may have the following beneficial effects:
the invention provides a category identification method for contents to be filled in a contract template, which automatically identifies the categories of the contents to be filled through a sequence labeling model and can recognize contents to be filled in various forms. To address ambiguity between template labels, it incorporates expert knowledge from the contract behavior knowledge graph and proposes a category identification method constrained by contract behavior knowledge, in which the contract behavior category serves as external knowledge that guides the model in predicting category labels. In addition, to exploit more document-level context, a context encoding method is introduced to capture contextual information and improve behavior recognition. The invention can therefore support automatic contract analysis and archive management and effectively improve contract generation efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a flowchart illustrating a method for identifying the categories of contents to be filled in a contract template, according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating step S102 of the method for identifying the categories of contents to be filled in a contract template, according to an exemplary embodiment.
Fig. 3 is a diagram showing the relationship between contract behaviors and the category labels of contents to be filled, according to an exemplary embodiment.
Fig. 4 is a flowchart illustrating step S104 of the method for identifying the categories of contents to be filled in a contract template, according to an exemplary embodiment.
Fig. 5 is a detailed flowchart illustrating the method for identifying the categories of contents to be filled in a contract template, according to an exemplary embodiment.
Fig. 6 is a block diagram illustrating a category identification device for contents to be filled in a contract template, according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.
The BERT model is a pre-trained language model that captures character-level and sentence-level features. To capture context, BERT uses a bidirectional Transformer as its encoder and models text through the attention mechanism. The input to BERT combines token (word) embeddings, position embeddings and segment (sentence) embeddings; in standard BERT these three embeddings are summed element-wise. The resulting sequence is passed through the stacked Transformer layers, and the output sequence vectors are used as character embeddings. The BERT model structure is shown in Fig. 5.
Here, [CLS] and [SEP] are BERT's special marker tokens: [CLS] marks the start of the sequence and [SEP] marks the boundary between sentences; $x_i$ denotes each character.
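A toy sketch of how the input sequence and input embeddings are assembled, assuming character-level tokens (as in Chinese BERT) and the standard BERT convention of summing token, segment and position embeddings. The randomly initialized tables and the helper names `build_sequence`, `random_table` and `input_embeddings` are hypothetical; a real system would load these tables from a pre-trained checkpoint.

```python
import random


def build_sequence(sent_a, sent_b=None):
    """Wrap characters as [CLS] a [SEP] (b [SEP]) with segment ids 0 / 1."""
    tokens = ["[CLS]"] + list(sent_a) + ["[SEP]"]
    segments = [0] * len(tokens)
    if sent_b is not None:
        tokens += list(sent_b) + ["[SEP]"]
        segments += [1] * (len(sent_b) + 1)
    return tokens, segments


def random_table(rows, dim, seed):
    """Hypothetical stand-in for a learned embedding table."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def input_embeddings(tokens, segments, vocab, tok_table, seg_table, pos_table):
    """Element-wise sum of token, segment and position embeddings per position."""
    out = []
    for pos, (tok, seg) in enumerate(zip(tokens, segments)):
        t = tok_table[vocab[tok]]
        out.append([a + b + c for a, b, c in zip(t, seg_table[seg], pos_table[pos])])
    return out


tokens, segments = build_sequence("买方", "卖方")
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}
dim = 8
tok_table = random_table(len(vocab), dim, 0)
seg_table = random_table(2, dim, 1)
pos_table = random_table(len(tokens), dim, 2)
emb = input_embeddings(tokens, segments, vocab, tok_table, seg_table, pos_table)
```

Each of the seven positions gets one `dim`-sized vector; the Transformer layers would then consume `emb`.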
Fig. 1 is a flowchart illustrating a method for identifying the categories of contents to be filled in a contract template, according to an exemplary embodiment.
As shown in fig. 1, according to a first aspect of an embodiment of the present invention, there is provided a category identification method of contents to be filled in a contract template, the method including:
step S101, acquiring massive contract data;
step S102, determining labeling rules of the contract behavior labels and the category labels of the contents to be filled according to the contract behavior knowledge graph, and labeling the data set of the contents to be filled in contracts according to the labeling rules to obtain a labeled data set;
step S103, training with the contract behavior knowledge graph, the labeled data set and a preset neural network model to obtain a category identification model for the content to be filled;
step S104, a target contract template is obtained, the target contract template is identified by using the category identification model of the content to be filled, and an identification result corresponding to each content to be filled in the target contract template is output, wherein the identification result comprises a category label and a position.
Recognizing the content to be filled means identifying the span of each content to be filled and its attribute category. The task is similar to named entity recognition (NER), but in NER the label prediction can use the information carried by the entity phrase itself, whereas a content to be filled is a blank slot that carries no such information, so the prediction must rely more heavily on context. To address the ambiguity that easily arises between the attribute labels of contents to be filled, a contract template slot recognition method constrained by contract behavior knowledge is proposed, in which the contract behavior type serves as external knowledge that guides the model in predicting the category labels of the contents to be filled.
Fig. 2 is a flowchart illustrating step S102 of the method for identifying the categories of contents to be filled in a contract template, according to an exemplary embodiment.
As shown in fig. 2, in one embodiment, preferably, step S102 includes:
step S201, determining, from the massive contract data, position criteria for a plurality of contents to be filled, addressing problems of association among the contents to be filled;
step S202, determining, for different contract types, the category labels corresponding to the various contents to be filled according to the existing element tables and clause tables;
step S203, determining a contract behavior graph according to the existing element tables and clause tables, and classifying the categories of the contents to be filled under the corresponding contract behavior labels, to obtain the labeling rules of the contract behavior labels and the category labels of the contents to be filled.
Fig. 3 shows the relationship between contract behaviors and the category labels of contents to be filled. A contract behavior is a general term for the rights and obligations of both parties in a contract; for example, "principal" and "one-time payment date" belong to payment behavior, while "account name" and "account opening bank" belong to collection behavior. Because payment behavior may give rise to a breach of contract, breach-related categories are also filed under payment behavior.
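The labeling rule of Fig. 3 is essentially a map from each contract behavior to the category labels filed under it. The sketch below encodes the examples from the text as such a map; the dictionary layout, the function names and the label "breach penalty" (standing in for an unnamed breach-related category) are hypothetical. `allowed_categories` shows how a predicted behavior can narrow the label set, which is the knowledge constraint described above.

```python
# Hypothetical excerpt of the behavior -> category-label map sketched in Fig. 3.
# "breach penalty" is an illustrative breach-related label filed under payment.
LABEL_RULES = {
    "payment behavior": {"principal", "one-time payment date", "breach penalty"},
    "collection behavior": {"account name", "account opening bank"},
}


def behavior_of(category):
    """Return the contract behavior that a category label is filed under."""
    for behavior, categories in LABEL_RULES.items():
        if category in categories:
            return behavior
    raise KeyError(f"unknown category label: {category}")


def allowed_categories(behavior):
    """Category labels the slot stage may output once the behavior is known."""
    return sorted(LABEL_RULES.get(behavior, set()))


print(behavior_of("account name"))  # → collection behavior
```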
As shown in fig. 4 and 5, in one embodiment, preferably, step S104 includes:
step S401, dividing the target contract template into a plurality of paragraphs using paragraph identifiers, each paragraph being divided into a plurality of sentences;
step S402, for each sentence, determining a corresponding sentence vector representation using a pre-trained BERT model;
step S403, inputting each sentence vector representation into the sentence-level Bi-LSTM, and concatenating the outputs to obtain the sentence-level vector representation;
step S404, for each paragraph, determining a paragraph vector representation using a pre-trained BERT model;
step S405, inputting each paragraph vector representation into the paragraph-level Bi-LSTM, and concatenating the outputs to obtain the document-level vector representation;
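Step S401 can be sketched as follows, assuming (hypothetically) that paragraph identifiers are line breaks and that sentences end with Chinese or Western sentence-final punctuation. A production splitter would also have to handle decimals, numbered clause headings and similar cases that this heuristic would split incorrectly.

```python
import re

# Zero-width split right after sentence-final punctuation (Python 3.7+).
SENT_END = re.compile(r"(?<=[。！？；.!?;])")


def split_contract(text):
    """Return [(paragraph_id, [sentence, ...]), ...] for a contract template."""
    result = []
    paragraphs = [p.strip() for p in text.split("\n") if p.strip()]
    for pid, para in enumerate(paragraphs):
        sentences = [s.strip() for s in SENT_END.split(para) if s.strip()]
        result.append((pid, sentences))
    return result


print(split_contract("买方：____。卖方：____。\n价款：____元；"))
# → [(0, ['买方：____。', '卖方：____。']), (1, ['价款：____元；'])]
```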
The character sequences are encoded with the BERT (Bidirectional Encoder Representations from Transformers) model to generate high-dimensional dense vector representations.
For a sentence
$X_k = \{x_1, x_2, \ldots, x_n\}$
the vector representation is obtained with the pre-trained BERT model:
$\{e_1, e_2, \ldots, e_n\} = \mathrm{BERT}(X_k)$
The contract is divided into paragraphs by paragraph identifiers, and the sentences are encoded one by one; each encoded sentence is input into the sentence-level Bi-LSTM, and the resulting codes are concatenated to obtain the sentence-level representations of the k sentences:
[Equations SMS_1 and SMS_2 (sentence-level representation) are available only as images and are not reproduced here.]
The k sentences are then encoded together and input simultaneously into the paragraph-level Bi-LSTM to obtain the document-level representations of the k sentences:
[Equation SMS_3 (document-level representation) is available only as an image and is not reproduced here.]
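Because equations SMS_1 to SMS_3 are not reproduced, the following is only a generic sketch of a bidirectional LSTM in plain Python: at each position the forward and backward hidden states are concatenated, which is one common reading of concatenating the Bi-LSTM outputs. Weights are random, all names are hypothetical, and the mean-pooling used to turn a sentence into one vector for the document level is an assumption of this sketch.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Minimal LSTM cell with small random weights (illustrative only)."""

    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)
        self.hid = hid_dim
        # 4 * hid_dim rows (input, forget, output, candidate), each over [x; h].
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hid_dim)]
                  for _ in range(4 * hid_dim)]
        self.b = [0.0] * (4 * hid_dim)

    def step(self, x, h, c):
        xh = list(x) + list(h)
        z = [sum(w * v for w, v in zip(row, xh)) + bias
             for row, bias in zip(self.W, self.b)]
        H = self.hid
        i = [sigmoid(v) for v in z[:H]]
        f = [sigmoid(v) for v in z[H:2 * H]]
        o = [sigmoid(v) for v in z[2 * H:3 * H]]
        g = [math.tanh(v) for v in z[3 * H:]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c


def run(cell, seq):
    """Run the cell left to right and collect the hidden state at every step."""
    h, c = [0.0] * cell.hid, [0.0] * cell.hid
    outputs = []
    for x in seq:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs


def bilstm(seq, fwd, bwd):
    """Concatenate forward and backward hidden states position by position."""
    forward = run(fwd, seq)
    backward = run(bwd, seq[::-1])[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


dim, hid = 4, 3
chars = [[0.1 * k] * dim for k in range(5)]  # stand-ins for BERT character vectors
sentence_level = bilstm(chars, LSTMCell(dim, hid, 1), LSTMCell(dim, hid, 2))
sent_vec = [sum(col) / len(col) for col in zip(*sentence_level)]  # 2 * hid dims
document_level = bilstm([sent_vec] * 3, LSTMCell(2 * hid, hid, 3), LSTMCell(2 * hid, hid, 4))
```

The same `bilstm` is used at both levels; only the input sequence changes (characters of one sentence, then sentence vectors of a paragraph).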
Step S406, for each character in each sentence, fusing the sentence-level vector representation and the document-level vector representation through a gating mechanism to obtain the final sentence vector representation;
the i-th character in the j-th sentence is fused through the gating mechanism to obtain the final sentence representation, which is then classified:
[Equations SMS_4, SMS_5 and SMS_6 (gated fusion and classification) are available only as images and are not reproduced here.]
where SMS_7 denotes the output of the gating unit, sigmoid is the chosen activation function, SMS_8 denotes the sentence-level representation, and SMS_9 denotes the document-level representation.
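The fusion equations (SMS_4 to SMS_6) survive only as images. A widely used form of sigmoid gating that matches the description, a gate computed from both representations that mixes the sentence-level and document-level vectors, is g = sigmoid(W[s; d] + b) and output = g * s + (1 - g) * d (element-wise). The sketch below implements that form as an assumption, not as the patent's exact formula; `W` and `b` are hypothetical learned parameters.

```python
import math


def gated_fusion(s, d, W, b):
    """Fuse one character's sentence-level vector s with its document-level vector d.

    Assumed form: g = sigmoid(W [s; d] + b); out = g * s + (1 - g) * d (element-wise).
    W has len(s) rows of length len(s) + len(d); b has len(s) entries.
    """
    sd = list(s) + list(d)
    g = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, sd)) + bias)))
         for row, bias in zip(W, b)]
    return [gk * sk + (1.0 - gk) * dk for gk, sk, dk in zip(g, s, d)]


W0 = [[0.0] * 4 for _ in range(2)]
print(gated_fusion([1.0, 2.0], [3.0, 4.0], W0, [0.0, 0.0]))  # → [2.0, 3.0]
```

With zero weights the gate is exactly 0.5 and the output is the average of the two views; a large positive bias drives the gate toward 1, so the output falls back to the sentence-level vector.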
Step S407, performing contract behavior classification according to the final sentence vector representation to determine a contract behavior label corresponding to each sentence;
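Step S407 is not spelled out further in the text. One minimal reading, sketched here under stated assumptions, mean-pools the fused character vectors of a sentence, applies a linear layer and takes the softmax over the contract behavior labels; the pooling choice, `W`, `b` and the function names are hypothetical.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify_behavior(char_vectors, W, b, labels):
    """Mean-pool a sentence's fused character vectors, then linear + softmax."""
    n = len(char_vectors)
    pooled = [sum(col) / n for col in zip(*char_vectors)]
    logits = [sum(w * p for w, p in zip(row, pooled)) + bias
              for row, bias in zip(W, b)]
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda k: probs[k])
    return labels[best], probs


label, probs = classify_behavior([[1.0, 0.0], [1.0, 0.0]],
                                 [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0],
                                 ["payment behavior", "collection behavior"])
print(label)  # → payment behavior
```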
Step S408, taking the contract behavior label of each sentence as prior information, determining the position and category label of the content to be filled in each sentence with a machine reading comprehension (MRC) model.
As shown in Fig. 5, in one embodiment, preferably, determining the position and category label of the content to be filled in each sentence with the reading comprehension model includes:
forming a question from the contract behavior label and the category label of the content to be filled, using it as the question of the reading comprehension model, encoding it with a pre-trained BERT model, and inputting it into the reading comprehension model to output the position and category label of the content to be filled.
For example, for a sentence
$X_k = \{x_1, x_2, \ldots, x_n\}$
assume that the behavior label it belongs to is [SMS_10, available only as an image]. The behavior label and the slot label are combined into a query, and the query is concatenated with the sentence to obtain $X'_k$, which is input into the BERT model for encoding.
For example, consider the sentence "Party B shall pay the contract price to Party A on ____ (year) ____ (month) ____ (day) after signing the contract, totaling RMB (in words) ____ (¥____ yuan)." Its behavior label is "payment behavior", and the corresponding query is "What is the total contract price (in figures) under payment behavior?"
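The query construction above can be sketched as follows, assuming character-level tokens and an English query template of our own; the patent gives the template only through the example above. The returned offset maps predicted token positions back to character positions in the original sentence. All names are hypothetical.

```python
def build_mrc_input(behavior, category, sentence):
    """Build [CLS] query [SEP] sentence [SEP] with a behavior-constrained query."""
    query = f"Under {behavior}, what is the {category}?"  # hypothetical template
    tokens = ["[CLS]"] + list(query) + ["[SEP]"] + list(sentence) + ["[SEP]"]
    offset = len(query) + 2  # token index of the first sentence character
    return tokens, offset


def to_sentence_span(start_tok, end_tok, offset):
    """Convert inclusive token indices back to sentence character indices."""
    return start_tok - offset, end_tok - offset


tokens, off = build_mrc_input("payment behavior",
                              "total contract price (in figures)",
                              "pay ____ yuan")
print(to_sentence_span(off + 4, off + 7, off))  # → (4, 7), the "____" blank
```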
The vector representation is obtained from the pre-trained BERT model:
$$\{e'_1, e'_2, \ldots, e'_n\} = \mathrm{BERT}(X'_k)$$
The probability of each word being the start position and the end position is then calculated:
$$P_s = \mathrm{softmax}(W_s X' + b_s)$$
$$P_e = \mathrm{softmax}(W_e X' + b_e)$$
Training is performed with a cross-entropy loss:
Figure SMS_11
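The start/end probabilities and the cross-entropy training can be illustrated without a deep-learning framework. The exact loss in Figure SMS_11 cannot be recovered from the text, so the sketch below assumes it is the sum of the start and end cross-entropies. It also adds a simple decoding step that is not described in the patent: choosing the span that maximizes P_s[start] * P_e[end] subject to start <= end. The `max_len` limit is a hypothetical parameter.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over token positions."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def span_loss(start_logits: List[float], end_logits: List[float],
              gold_start: int, gold_end: int) -> float:
    """Cross-entropy of the gold start and end positions, summed (assumed combination)."""
    p_s = softmax(start_logits)
    p_e = softmax(end_logits)
    return -math.log(p_s[gold_start]) - math.log(p_e[gold_end])


def decode_span(start_logits: List[float], end_logits: List[float],
                max_len: int = 30) -> Tuple[int, int]:
    """Return (start, end) maximizing P_s[start] * P_e[end], start <= end < start + max_len."""
    p_s = softmax(start_logits)
    p_e = softmax(end_logits)
    best, best_score = (0, 0), -1.0
    for i, ps in enumerate(p_s):
        for j in range(i, min(i + max_len, len(p_e))):
            score = ps * p_e[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best
```

In practice, the query and special-token positions would typically be masked out before decoding, so that only sentence tokens can be selected as a slot.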
In one embodiment, preferably, the method further comprises:
after the identification result for each content to be filled in the target contract template has been determined, predicting the buyer and the seller, and correcting the identification result for each content to be filled in according to the buyer-seller prediction.
Category labels such as "buyer name" and "seller name" are easily confused. For this reason, a separate buyer-seller identification model predicts the buyer and the seller, and its prediction is used to check the identification result.
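The correction step can be sketched as a post-processing pass. In this sketch the separate buyer-seller model is abstracted as a dictionary that maps a slot's start offset to a predicted role. The names `Slot`, `ROLE_FIELDS`, and `correct_roles` are hypothetical. Only the labels "buyer name" and "seller name" come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical: role-dependent category labels mapped to their role-neutral field.
ROLE_FIELDS: Dict[str, str] = {"buyer name": "name", "seller name": "name"}


@dataclass(frozen=True)
class Slot:
    start: int   # character offset of the blank in the template
    end: int
    label: str   # category label produced by the MRC model


def correct_roles(slots: List[Slot], predicted_roles: Dict[int, str]) -> List[Slot]:
    """Override confusable labels with the role predicted by the buyer-seller model."""
    corrected = []
    for slot in slots:
        role = predicted_roles.get(slot.start)
        label = slot.label
        # Only role-dependent labels are overridden, and only when the
        # role model produced a buyer/seller prediction for this slot.
        if label in ROLE_FIELDS and role in ("buyer", "seller"):
            label = f"{role} {ROLE_FIELDS[label]}"
        corrected.append(Slot(slot.start, slot.end, label))
    return corrected
```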
Fig. 6 is a block diagram of a category identification device for the content to be filled in of a contract template according to an exemplary embodiment.
As shown in Fig. 6, according to a second aspect of the embodiments of the present invention, there is provided a category identification device for the content to be filled in of a contract template, the device comprising:
an acquisition module 61, configured to acquire massive contract data;
the labeling module 62 is configured to determine labeling rules of the contract behavior label and the category label of the content to be filled according to the contract behavior knowledge graph, so as to label the data set of the content to be filled in the contract according to the labeling rules, and obtain a labeled data set;
the training module 63 is configured to train by using the contract behavior knowledge graph, the labeled data set and a preset neural network model to obtain a category identification model of the content to be filled;
the identifying module 64 is configured to obtain a target contract template, identify the target contract template using the category identifying model of the content to be filled, and output an identifying result corresponding to each content to be filled in the target contract template, where the identifying result includes a category tag and a location.
In one embodiment, the labeling module 62 is preferably configured to:
determining, based on the massive contract data and with respect to the issues associated with the contents to be filled in, position criteria for a plurality of contents to be filled in;
for different contract types, determining the category labels of the contents to be filled in according to the existing element tables and clause tables;
and determining a contract behavior graph according to the existing element tables and clause tables, and assigning each category of content to be filled in to its corresponding contract behavior label, thereby obtaining the labeling rules for the contract behavior labels and the category labels of the content to be filled in.
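The sketch below shows one way such labeling rules might be represented and enforced during annotation. Each contract behavior label maps to the set of category labels allowed under it. The graph fragment and the category names are illustrative assumptions, not the patent's actual element or clause tables.

```python
from typing import Dict, List, Set

# Hypothetical fragment of a contract behavior knowledge graph: each behavior
# label maps to the category labels permitted under it.
BEHAVIOR_GRAPH: Dict[str, Set[str]] = {
    "payment behavior": {"payment amount (in words)", "payment amount (in figures)", "payment date"},
    "collection behavior": {"payee account", "payee bank"},
}


def validate_annotation(behavior: str, categories: List[str]) -> List[str]:
    """Return annotated category labels that the labeling rules do not allow under `behavior`."""
    allowed = BEHAVIOR_GRAPH.get(behavior, set())
    return [c for c in categories if c not in allowed]
```

The same mapping could also be used to decide which (behavior, slot) queries to generate for the reading comprehension step.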
In one embodiment, the identification module 64 is preferably configured to:
dividing the target contract template into a plurality of paragraphs, each with a paragraph identifier, and dividing each paragraph into a plurality of sentences;
for each sentence, determining a corresponding sentence vector representation using a pre-trained BERT model;
inputting each sentence vector representation into a sentence-level Bi-LSTM, and concatenating the output sentence vector representations to obtain a sentence-level vector representation;
for each paragraph, determining a paragraph vector representation using a pre-trained BERT model;
inputting each paragraph vector representation into a paragraph-level Bi-LSTM, and concatenating the output paragraph vector representations to obtain a document-level vector representation;
for each character in each sentence, fusing the sentence-level vector representation and the document-level vector representation through a gating mechanism to obtain a final sentence vector representation;
performing contract behavior classification according to the final sentence vector representation to determine the contract behavior label of each sentence;
and taking the contract behavior label of each sentence as prior information, determining the position and category label of the content to be filled in for each sentence with a reading comprehension model.
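The first of these steps, splitting a template into identified paragraphs and then into sentences, can be sketched with the Python (3.7+) standard library. The sketch makes several assumptions, none of which are specified in the patent:
- paragraphs are separated by line breaks;
- the paragraph index serves as the paragraph identifier;
- full-width and half-width semicolons, question marks, and exclamation marks, together with the Chinese full stop, are treated as sentence boundaries;
- the ASCII period is excluded, so that amounts such as "1.5" are not split.

```python
import re
from typing import List, Tuple

# Assumed sentence boundaries: full-width 。；！？ and half-width ; ! ?
# The ASCII period is excluded so amounts such as "1.5" are not split.
_SENTENCE_END = re.compile(r"(?<=[。；;！？!?])")


def split_template(text: str) -> List[Tuple[int, List[str]]]:
    """Split a contract template into (paragraph_id, sentences) pairs.

    Paragraphs are assumed to be separated by line breaks; the index of
    each non-empty paragraph serves as its identifier.
    """
    paragraphs = [p.strip() for p in text.splitlines() if p.strip()]
    result = []
    for pid, para in enumerate(paragraphs):
        # Zero-width split after each delimiter keeps the delimiter in the sentence.
        sentences = [s.strip() for s in _SENTENCE_END.split(para) if s.strip()]
        result.append((pid, sentences))
    return result
```

The resulting sentences and paragraphs would then be encoded by BERT and the two Bi-LSTMs as described above.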
In one embodiment, preferably, determining the position and category label of the content to be filled in for each sentence with a reading comprehension model includes:
forming a question from the contract behavior label and the category label of the content to be filled in, encoding it with a pre-trained BERT model, and inputting it into the reading comprehension model, which outputs the position and category label of the content to be filled in.
In one embodiment, preferably, the contract behaviors include collection behavior, payment behavior, and the like.
In one embodiment, preferably, the apparatus further comprises:
a determining module, configured to determine whether the target contract template contains buyer and seller content, after the identification result for each content to be filled in the target contract template has been determined;
and a prediction module, configured to predict the buyer and the seller when the target contract template contains buyer and seller content, and to correct the identification result for each content to be filled in according to the buyer-seller prediction.
According to a third aspect of the embodiments of the present invention, there is provided a category identification device for the content to be filled in of a contract template, the device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring massive contract data;
determining labeling rules of contract behavior labels and category labels of contents to be filled according to the contract behavior knowledge graph, and labeling data sets of the contents to be filled in the contract according to the labeling rules to obtain labeled data sets;
training by using the contract behavior knowledge graph, the marked data set and a preset neural network model to obtain a category identification model of the content to be filled;
and acquiring a target contract template, identifying the target contract template by using the category identification model of the content to be filled, and outputting an identification result corresponding to each content to be filled in the target contract template, wherein the identification result comprises a category label and a position.
According to a fourth aspect of the embodiments of the present invention, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method in any of the embodiments of the first aspect.
It is further understood that the term "plurality" in this disclosure means two or more, and other quantifiers are to be understood similarly. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate that A exists alone, that A and B exist together, or that B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it. The singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It is further understood that the terms "first," "second," and the like are used to describe various information, but such information should not be limited to these terms. These terms are only used to distinguish one type of information from another and do not denote a particular order or importance. Indeed, the expressions "first", "second", etc. may be used entirely interchangeably. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the invention.
It will further be appreciated that although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (9)

1. A method for identifying categories of contents to be filled in of a contract template, the method comprising:
acquiring massive contract data;
determining labeling rules of contract behavior labels and category labels of contents to be filled according to the contract behavior knowledge graph, and labeling data sets of the contents to be filled in the contract according to the labeling rules to obtain labeled data sets;
training by using the contract behavior knowledge graph, the marked data set and a preset neural network model to obtain a category identification model of the content to be filled;
and acquiring a target contract template, identifying the target contract template by using the category identification model of the content to be filled, and outputting an identification result corresponding to each content to be filled in the target contract template, wherein the identification result comprises a category label and a position.
2. The method of claim 1, wherein determining the labeling rules for the contract behavior labels and the category labels of the content to be filled in according to the contract behavior knowledge graph comprises:
determining, based on the massive contract data and with respect to the issues associated with the contents to be filled in, position criteria for a plurality of contents to be filled in;
for different contract types, determining the category labels of the contents to be filled in according to the existing element tables and clause tables;
and determining a contract behavior graph according to the existing element tables and clause tables, and assigning each category of content to be filled in to its corresponding contract behavior label, thereby obtaining the labeling rules for the contract behavior labels and the category labels of the content to be filled in.
3. The method of claim 1, wherein identifying the target contract template using the category identification model of the content to be filled in and outputting the category of each content to be filled in the target contract template comprises:
dividing the target contract template into a plurality of paragraphs, each with a paragraph identifier, and dividing each paragraph into a plurality of sentences;
for each sentence, determining a corresponding sentence vector representation using a pre-trained BERT model;
inputting each sentence vector representation into a sentence-level Bi-LSTM, and concatenating the output sentence vector representations to obtain a sentence-level vector representation;
for each paragraph, determining a corresponding paragraph vector representation using a pre-trained BERT model;
inputting each paragraph vector representation into a paragraph-level Bi-LSTM, and concatenating the output paragraph vector representations to obtain a document-level vector representation;
for each character in each sentence, fusing the sentence-level vector representation and the document-level vector representation through a gating mechanism to obtain a final sentence vector representation;
performing contract behavior classification according to the final sentence vector representation to determine the contract behavior label of each sentence;
and taking the contract behavior label of each sentence as prior information, determining the position and category label of the content to be filled in for each sentence with a reading comprehension model.
4. The method according to claim 3, wherein determining the position and category label of the content to be filled in for each sentence with a reading comprehension model comprises:
forming a question from the contract behavior label and the category label of the content to be filled in, encoding it with a pre-trained BERT model, and inputting it into the reading comprehension model, which outputs the position and category label of the content to be filled in.
5. The method of claim 1, wherein the contract behavior comprises collection behavior and payment behavior.
6. The method according to claim 1, wherein the method further comprises:
predicting the buyer and seller roles after the identification result for each content to be filled in the target contract template has been determined, and correcting the identification result for each content to be filled in according to the predicted buyer and seller roles.
7. A category identification device for content to be filled in of a contract template, the device comprising:
the acquisition module is used for acquiring massive contract data;
the labeling module is used for determining labeling rules of contract behavior labels and category labels of the contents to be filled according to the contract behavior knowledge graph so as to label the data sets of the contents to be filled in the contract according to the labeling rules and obtain labeled data sets;
the training module is used for training by utilizing the contract behavior knowledge graph, the marked data set and the preset neural network model to obtain a category identification model of the content to be filled;
the identification module is used for acquiring a target contract template, identifying the target contract template by using the category identification model of the content to be filled, and outputting an identification result corresponding to each content to be filled in the target contract template, wherein the identification result comprises a category label and a position.
8. A category identification device for content to be filled in of a contract template, the device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring massive contract data;
determining labeling rules of contract behavior labels and category labels of contents to be filled according to the contract behavior knowledge graph, and labeling data sets of the contents to be filled in the contract according to the labeling rules to obtain labeled data sets;
training by using the contract behavior knowledge graph, the marked data set and a preset neural network model to obtain a category identification model of the content to be filled;
and acquiring a target contract template, identifying the target contract template by using the category identification model of the content to be filled, and outputting an identification result corresponding to each content to be filled in the target contract template, wherein the identification result comprises a category label and a position.
9. A computer readable storage medium having stored thereon computer instructions, which when executed by a processor, implement the steps of the method of any of claims 1-6.
CN202211464102.XA 2022-11-22 2022-11-22 Category identification method, device and storage medium for contract template to be filled-in content Active CN116152843B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211464102.XA CN116152843B (en) 2022-11-22 2022-11-22 Category identification method, device and storage medium for contract template to be filled-in content

Publications (2)

Publication Number Publication Date
CN116152843A true CN116152843A (en) 2023-05-23
CN116152843B CN116152843B (en) 2024-01-12

Family

ID=86360831

Country Status (1)

Country Link
CN (1) CN116152843B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753099A (en) * 2020-06-28 2020-10-09 中国农业科学院农业信息研究所 Method and system for enhancing file entity association degree based on knowledge graph
CN112101028A (en) * 2020-08-17 2020-12-18 淮阴工学院 Multi-feature bidirectional gating field expert entity extraction method and system
CN113673943A (en) * 2021-07-19 2021-11-19 清华大学深圳国际研究生院 Personnel exemption aided decision making method and system based on historical big data
WO2022111548A1 (en) * 2020-11-26 2022-06-02 杭州睿胜软件有限公司 Contract review method and apparatus, and readable storage medium

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN116933757A (en) * 2023-09-15 2023-10-24 京华信息科技股份有限公司 Document generation method and system applying language artificial intelligence
CN116933757B (en) * 2023-09-15 2023-12-29 京华信息科技股份有限公司 Document generation method and system applying language artificial intelligence
CN117057325A (en) * 2023-10-13 2023-11-14 湖北华中电力科技开发有限责任公司 Form filling method and system applied to power grid field and electronic equipment
CN117057325B (en) * 2023-10-13 2024-01-05 湖北华中电力科技开发有限责任公司 Form filling method and system applied to power grid field and electronic equipment
CN117172232A (en) * 2023-11-02 2023-12-05 深圳市迪博企业风险管理技术有限公司 Audit report generation method, audit report generation device, audit report generation equipment and audit report storage medium
CN117172232B (en) * 2023-11-02 2024-01-26 深圳市迪博企业风险管理技术有限公司 Audit report generation method, audit report generation device, audit report generation equipment and audit report storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant