CN115630843A - Contract clause automatic checking method and system - Google Patents

Publication number
CN115630843A
Authority
CN
China
Prior art keywords
paragraphs
clauses
risk
clause
contract
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211357562.2A
Other languages
Chinese (zh)
Inventor
张森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aspire Information Technologies Beijing Ltd
Original Assignee
Aspire Information Technologies Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aspire Information Technologies Beijing Ltd
Priority to CN202211357562.2A
Publication of CN115630843A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services; Handling legal documents

Abstract

The invention discloses a method and a system for automatically checking contract clauses, wherein the method comprises the following steps: acquiring a contract text; splitting the contract text to obtain all clauses and/or paragraphs; classifying the clauses and/or paragraphs to obtain risk category labels corresponding to the clauses and/or paragraphs and the probability of the risk category labels; and if the probability of the risk category label is greater than a set probability threshold, determining that the clauses and/or the paragraphs are at risk, and taking the clauses and/or the paragraphs at risk as classified risk clauses. By using the scheme of the invention, the efficiency and the accuracy of contract clause examination can be effectively improved.

Description

Contract clause automatic checking method and system
Technical Field
The invention relates to the technical field of information processing, in particular to a method and a system for automatically auditing contract terms.
Background
In modern enterprise management, business activities are numerous, and for them to proceed smoothly the two parties to a transaction need to conclude corresponding contracts, that is, contract documents. The quality of a contract, whether the rights and obligations agreed between the two parties are reasonable, and whether the contract clauses carry legal compliance risks directly affect the success or failure of business activities and even of the enterprise. However, an enterprise often has only a small number of legal staff, and their professional levels differ, so manual review of a large number of contracts is time-consuming, inefficient, cumbersome, and unreliable. With the development of search engine technology and the improvement of enterprise contract management, some enterprises have built risk clause libraries manually. The contract text to be reviewed is matched against the risk library by information retrieval, and if a corresponding risk clause is matched, the contract text is considered to carry the corresponding risk. Compared with purely manual review, information retrieval greatly improves efficiency. The current mainstream information retrieval technology uses the BM25 algorithm, which takes the contract clause to be retrieved and the risk clause library as input and outputs a ranking of the risk clauses in the library by relevance to the input clause. The top-ranked risk clause is the best match, and if its matching degree exceeds a certain threshold, the contract clause to be retrieved is considered to carry that risk.
The existing risk clause retrieval method mainly comprises the following steps: first, extracting the feature word set of the contract clause to be retrieved by using a word segmentation algorithm; second, determining the set of risk clauses to be ranked according to whether clauses in the risk library contain feature words from the feature word set; third, calculating the relevance score between the clause to be queried and each clause in the risk clause set; and fourth, outputting the matched risk clauses according to the score ranking and a defined threshold condition. This retrieval-based approach derives the final score mainly from the keywords in the text, the statistical weights of those keywords, and the text length, and works well in scenarios with low accuracy requirements. Contract clause review, however, demands high accuracy, which the existing information retrieval approach cannot meet.
Disclosure of Invention
The invention provides a method and a system for automatically checking contract clauses, which are used for improving the efficiency and the accuracy of checking the contract clauses.
Therefore, the invention provides the following technical scheme:
a method for automatically reviewing contract terms, the method comprising:
acquiring a contract text;
splitting the contract text to obtain all clauses and/or paragraphs;
classifying the clauses and/or paragraphs by using a clause classification model to obtain risk category labels corresponding to the clauses and/or paragraphs and the probability of the risk category labels;
and if the probability of the risk category label is greater than a set probability threshold, determining that the clauses and/or the paragraphs are at risk, and taking the clauses and/or the paragraphs at risk as classified risk clauses.
Optionally, the splitting the contract text to obtain all terms and/or paragraphs includes:
splitting the contract text according to the internal logic structure of the contract text to obtain all clauses and/or paragraphs; or
splitting the contract text according to the style structure of the contract text to obtain all clauses and/or paragraphs.
Optionally, the classifying the clauses and/or paragraphs by using the clause classification model, and obtaining the risk category labels corresponding to the clauses and/or paragraphs and the probabilities of the risk category labels includes:
classifying the clauses and/or paragraphs by using the clause classification model to obtain probability distribution of risk category labels corresponding to the clauses and/or paragraphs;
and selecting the risk category label with the highest probability in the probability distribution as the risk category label of the clause and/or paragraph.
Optionally, the method further comprises constructing the clause classification model in the following manner:
collecting a risk clause text and a risk category label corresponding to the risk clause text;
processing the risk clause text and the risk category label corresponding to the risk clause text to obtain a training data set;
building a clause classification model structure, wherein the clause classification model adopts a BERT classification model; the first layer of the model is a BERT pre-trained language model layer, the second layer is a fully connected layer, and the third layer is a softmax layer;
and training the BERT classification model by using the training data set to obtain the clause classification model.
Optionally, the method further comprises: if the probability of the risk category label is less than or equal to the threshold corresponding to that label, determining by retrieval whether the clause and/or paragraph is a retrieval risk clause.
Optionally, the determining by retrieval whether the clause and/or paragraph is a retrieval risk clause includes:
taking the content of the clause and/or paragraph as input, and calling a retrieval engine to obtain a plurality of retrieval results and the scores corresponding to the retrieval results;
and if the highest score among the retrieval results is greater than a set score threshold, determining the clause and/or paragraph to be a retrieval risk clause, and otherwise determining the clause and/or paragraph to be a risk-free clause.
Optionally, the score is calculated using the BM25 algorithm.
Optionally, the method further comprises: aggregating the classified risk clauses, the retrieval risk clauses and the risk-free clauses to obtain a clause review result.
An automatic review system for contract terms, the system comprising:
the contract text acquisition module is used for acquiring a contract text;
the splitting module is used for splitting the contract text to obtain all clauses and/or paragraphs;
the classification module is used for classifying the clauses and/or the paragraphs by utilizing a clause classification model to obtain risk category labels corresponding to the clauses and/or the paragraphs and the probability of the risk category labels;
and the first judging module is used for determining that the clauses and/or the paragraphs have risks under the condition that the probability of the risk category label is greater than a set probability threshold, and taking the clauses and/or the paragraphs with risks as classified risk clauses.
Optionally, the splitting module is specifically configured to split the contract text according to an internal logic structure of the contract text to obtain all terms and/or paragraphs; or splitting the contract text according to the style structure of the contract text to obtain all clauses and/or paragraphs.
Optionally, the classification module comprises:
the classification unit is used for classifying the clauses and/or the paragraphs by using the clause classification model to obtain the probability distribution of the risk category labels corresponding to the clauses and/or the paragraphs;
and the selecting unit is used for selecting the risk category label with the highest probability in the probability distribution as the risk category label of the clause and/or paragraph.
Optionally, the system further comprises: a retrieval module; the retrieval module is used for determining by retrieval whether the clause and/or paragraph is a retrieval risk clause when the probability of the risk category label is less than or equal to the threshold corresponding to that label.
Optionally, the retrieval module comprises:
the calling module is used for taking the content of the clause and/or paragraph as input and calling a clause retrieval engine to obtain a plurality of retrieval results and the scores corresponding to the retrieval results;
and the second judging module is used for determining the clause and/or paragraph to be a retrieval risk clause when the highest score among the retrieval results is greater than a set score threshold, and otherwise determining the clause and/or paragraph to be a risk-free clause.
Optionally, the system further comprises: and the aggregation module is used for aggregating the classified risk clauses, the retrieval risk clauses and the risk-free clauses to obtain a clause examination result.
The automatic contract clause review method and system classify the contract clauses and/or paragraphs by using a clause classification model to obtain the risk category labels corresponding to the clauses and/or paragraphs and the probabilities of those labels, and judge from the probabilities whether the clauses and/or paragraphs are at risk. Compared with the existing clause retrieval method, the model-based approach uses massive training data as its modeling corpus, so clauses are well represented as semantic vectors and the accuracy of the clause review result is greatly improved.
Further, considering that the contract text may relate to different industries and different fields, and is essentially a long tail distribution data set, there is a problem that the training data distribution is extremely uneven, so that the examination result of the individual types of terms and/or paragraphs is not ideal. Therefore, after the clauses and/or paragraphs in the contract text are examined in a model-based mode, the retrieval mode is used as a supplement of the clause classification model, the problem of long-tail clauses in the classification model is better solved, and the adaptability of the scheme of the invention to different industries and fields is greatly improved.
Drawings
FIG. 1 is a flowchart of a method for automatically reviewing contract terms according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a training process of a clause classification model according to an embodiment of the present invention;
FIG. 3 is another flowchart of a method for automatically reviewing contract terms according to an embodiment of the present invention;
FIG. 4 is a flow chart of clause review by retrieval in accordance with an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an automatic contract term review system provided by an embodiment of the present invention;
FIG. 6 is another schematic structural diagram of an automatic contract term auditing system according to an embodiment of the present invention.
Detailed Description
To help those skilled in the art better understand the scheme of the embodiments of the invention, the embodiments are described in further detail below with reference to the drawings and implementations.
Contract clauses are the expression and fixation of the contract conditions and are the basis for determining the rights and obligations of the contracting parties. That is, from the perspective of the legal instrument, the content of a contract is its clauses. The clauses of a contract should therefore be clear, definite, complete, and free of contradiction. Otherwise, the formation, validity and performance of the contract and the purpose for which it was concluded are affected, so accurately understanding the meaning of the clauses is important. The purpose of contract clause review is to find the risks in the contract clauses. The clause retrieval approach improves efficiency and saves legal experts' time, and manually compiling the risk clauses also resolves the problem of inconsistent standards. However, many legal experts are still needed to build the risk clause library, and the accuracy of the review result still needs improvement.
Therefore, the embodiment of the invention provides a method and a system for automatically auditing contract clauses, which classify the contract clauses and/or paragraphs by using a clause classification model to obtain risk category labels corresponding to the clauses and/or paragraphs and the probability of the risk category labels; and judging whether the clauses and/or paragraphs are at risk according to the probability.
Fig. 1 is a flowchart of an automatic review method for contract terms according to an embodiment of the present invention, where the embodiment includes the following steps:
step 101, acquiring a contract text.
In practical application, the contract text may be obtained from a contract file to be reviewed that is uploaded by a user. The contract file may be a Word document (.doc or .docx) or a PDF (text PDF or scanned PDF), which is not limited in this embodiment of the present invention.
The contract file is parsed to obtain the contract text. Specifically, if the contract file is in Word format, an open-source Word parsing tool can be called to obtain the corresponding text. If the contract file is a text PDF, an open-source PDF parsing tool can be called. If it is a scanned PDF, each page is an image, so a recognition tool with OCR (Optical Character Recognition) capability can be called to recognize the text in each image and finally obtain the entire contract text.
Step 102, splitting the contract text to obtain all clauses and/or paragraphs.
A contract text can be divided into clauses according to its internal logic structure. For example, the first part of a contract text often identifies the two parties to the contract, the second part states the subject matter of the contract, and the text also contains clauses on liability for breach, dispute resolution and the like.
In the embodiment of the invention, one splitting mode can be splitting according to the internal logic structure of the contract text to obtain the clause of the contract; another way of splitting may be to split the contract text into individual paragraphs according to the style structure of the contract text, such as line breaks. Based on the two splitting logics, clause and paragraph sets of the contract text can be obtained.
It should be noted that, in practical application, the contract text may be split in any one of the above manners, or the contract may be split in the above two manners at the same time, and the terms and paragraphs obtained by splitting are respectively examined.
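The two splitting modes described above can be sketched as follows. This is a minimal illustration and not the implementation of the embodiment: the heading pattern `CLAUSE_HEADING` and the function names are assumptions introduced here, and a real contract would need a pattern matched to its own numbering style.

```python
import re

# Hypothetical heading pattern (an assumption): "Article 1", "Clause 2",
# or "3. " at the start of a line marks the beginning of a clause.
CLAUSE_HEADING = re.compile(r"^(?:Article|Clause)\s+\d+\b|^\d+\.\s", re.MULTILINE)


def split_by_paragraph(text: str) -> list[str]:
    """Style-based split: every non-empty line becomes one paragraph."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_by_clause(text: str) -> list[str]:
    """Logic-based split: a new clause starts at every heading match."""
    starts = [m.start() for m in CLAUSE_HEADING.finditer(text)]
    if not starts:
        return [text.strip()] if text.strip() else []
    # Text before the first heading (for example the parties block)
    # is kept as its own unit.
    bounds = starts if starts[0] == 0 else [0] + starts
    bounds = bounds + [len(text)]
    pieces = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [p for p in pieces if p]
```

Both functions can be applied to the same contract text, in which case the resulting clauses and paragraphs are reviewed separately.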
Step 103, classifying the clauses and/or paragraphs by using a clause classification model to obtain risk category labels corresponding to the clauses and/or paragraphs and the probability of the risk category labels.
The clause classification model may be obtained by acquiring a large amount of training data in advance, and a specific training process will be described in detail later.
When the clauses and/or paragraphs are classified by using the clause classification model, every clause and/or paragraph is passed through the model in turn for classification label calculation. The model outputs the probability distribution over risk category labels for each clause and/or paragraph, and the label with the highest probability in that distribution is selected as the risk category label of the clause and/or paragraph.
Step 104, if the probability of the risk category label is greater than a set probability threshold, determining that the clauses and/or paragraphs are at risk, and taking the clauses and/or paragraphs at risk as classified risk clauses.
The probability threshold may be set according to the requirement of the accuracy of the examination result, for example, the probability threshold is set to 0.5. A clause or paragraph is considered to be at risk if it corresponds to a category label with a probability greater than 0.5.
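The label selection of step 103 and the threshold test of step 104 can be sketched as follows, assuming the model output is available as a mapping from label to probability. The function names and the example labels are hypothetical.

```python
def pick_label(distribution: dict[str, float]) -> tuple[str, float]:
    """Return the risk category label with the highest probability."""
    label = max(distribution, key=distribution.get)
    return label, distribution[label]


def is_classified_risk(distribution: dict[str, float], threshold: float = 0.5) -> bool:
    """A clause is a classified risk clause only if the probability of its
    selected label is strictly greater than the threshold (0.5 is the
    example value given in the text)."""
    _, probability = pick_label(distribution)
    return probability > threshold
```

Because the comparison is strict, a probability exactly equal to the threshold does not mark the clause as a classified risk clause.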
In the embodiment of the invention, the clause classification model is built on a BERT pre-trained language model. Because the pre-trained language model uses massive training data as its modeling corpus, the clause classification model can represent text well as semantic vectors.
As shown in fig. 2, it is a schematic diagram of a training process of a clause classification model in an embodiment of the present invention, including the following steps:
step 201, collecting a risk clause text and a risk category label corresponding to the risk clause text.
Step 202, processing the risk clause text and the risk category label corresponding to the risk clause text to obtain a training data set.
Specifically, the collected risk clause texts and their risk category labels are preprocessed and converted into input that the BERT classification model can accept, namely the training set of the model.
Step 203, building a clause classification model structure, wherein the clause classification model adopts a BERT classification model.
The first layer of the model is a BERT pre-trained language model layer, namely a semantic vector encoding layer. The output of the encoding layer contains a CLS vector, which is a semantic representation of the entire text input.
The second layer of the model is a fully connected layer, which receives the CLS vector as input and outputs a vector whose dimension equals the number of classification categories.
The third layer of the model is a softmax layer, which converts the values of the second layer's output vector into values between 0 and 1 that sum to 1, namely the classification probability distribution of the clause.
After the model structure is built, a loss function is specified according to the structure of the model. The loss function selected by the clause classification model is cross entropy loss.
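The second and third layers and the cross-entropy loss can be illustrated with a small numeric sketch. It assumes the CLS vector has already been produced by the encoding layer and uses plain Python lists in place of tensors; the function names are hypothetical and the BERT layer itself is not reproduced here.

```python
import math


def fully_connected(cls_vec, weights, bias):
    """Second layer: one output value (logit) per classification category.
    `weights` holds one row per category."""
    return [sum(w * x for w, x in zip(row, cls_vec)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Third layer: map logits to probabilities in (0, 1) that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Loss for one example: negative log-probability of the true category."""
    return -math.log(probs[true_index])
```

In a full model these three functions would be applied to every clause in a batch, and the mean loss would be passed to the optimizer.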
Step 204, training the BERT classification model by using the training data set to obtain a clause classification model.
Before the model is trained, an optimization algorithm is specified for it; the AdamW optimization algorithm can be selected for the clause classification model. During training, the parameters of the model are first initialized. The loss function value is then calculated on data input in batches, and the parameters are updated using the AdamW optimization algorithm. Calculating the loss and updating the parameters are repeated iteratively until the loss function reaches a set value, that is, until the model achieves a good classification effect, at which point training stops.
Through the steps, the clause classification model can be obtained.
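The training loop of step 204 (compute the loss, update the parameters with AdamW, stop once the loss reaches a set value) can be sketched on a toy problem. The sketch below is an assumption-laden stand-in: it trains a one-feature, two-class classifier rather than a BERT model, uses the whole data set as a single batch, and the hyperparameter values and function names are illustrative choices, not values from the embodiment.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grads(params, data):
    """Mean cross-entropy of the toy classifier p = sigmoid(w*x + b)
    and its gradients with respect to w and b."""
    w, b = params
    loss = grad_w = grad_b = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
        grad_w += (p - y) * x
        grad_b += p - y
    n = len(data)
    return loss / n, [grad_w / n, grad_b / n]


def train_adamw(data, target_loss=0.1, max_steps=5000, lr=0.1,
                beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
    params = [0.0, 0.0]            # parameter initialization
    m = [0.0, 0.0]                 # first-moment estimates
    v = [0.0, 0.0]                 # second-moment estimates
    loss, grads = loss_and_grads(params, data)
    step = 0
    while loss > target_loss and step < max_steps:
        step += 1
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** step)   # bias correction
            v_hat = v[i] / (1 - beta2 ** step)
            params[i] -= lr * weight_decay * params[i]          # decoupled weight decay
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)  # Adam step
        loss, grads = loss_and_grads(params, data)
    return params, loss, step
```

The decoupled weight decay line is what distinguishes AdamW from Adam with L2 regularization: the decay is applied directly to the parameter rather than being added to the gradient.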
The automatic contract clause review method provided by the invention classifies contract clauses and/or paragraphs by using a clause classification model to obtain the risk category labels corresponding to the clauses and/or paragraphs and the probabilities of those labels, and judges from the probabilities whether the clauses and/or paragraphs are at risk. Compared with the existing clause retrieval method, the model-based approach uses massive training data as its modeling corpus, so clauses are well represented as semantic vectors and the accuracy of the clause review result is greatly improved.
Further, contract texts may relate to different industries and fields and are essentially a long-tailed data set: some types of clause text account for more than twenty percent of actual business data, while others account for less than one thousandth. The training data distribution is therefore extremely uneven, and even when data augmentation methods are used to mitigate this, the review results for individual types of clauses and/or paragraphs are not ideal. For this reason, in another non-limiting embodiment of the present invention, after the clauses and/or paragraphs in the contract text are reviewed by the model, retrieval is used as a supplement to the clause classification model, so as to better handle long-tail clauses and improve the adaptability of the scheme to different industries and fields.
As shown in fig. 3, another flow chart of the method for automatically auditing contract terms according to the embodiment of the present invention includes the following steps:
step 301, acquiring a contract text.
Step 302, splitting the contract text to obtain all clauses and/or paragraphs.
Step 303, classifying the clauses and/or paragraphs by using a clause classification model to obtain risk category labels corresponding to the clauses and/or paragraphs and the probability of the risk category labels.
Steps 301 to 303 are the same as steps 101 to 103 in the embodiment shown in fig. 1, and will not be described in detail here.
Step 304, judging whether the probability of the risk category label is greater than a set probability threshold; if yes, go to step 305; otherwise, go to step 306.
Step 305, determining that the clauses and/or paragraphs are at risk, and using the at-risk clauses and/or paragraphs as classified risk clauses. Step 307 is then performed.
Step 306, determining by retrieval whether the clause and/or paragraph is a retrieval risk clause.
Step 307, aggregating the classified risk clauses, the search risk clauses and the risk-free clauses to obtain a clause review result.
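The decision flow of the classification, retrieval and aggregation steps can be sketched as follows. The classifier and the retrieval engine are passed in as plain callables so that the sketch stays self-contained; `review_clauses`, its parameter names and the default score threshold of 10.0 are assumptions made for illustration only.

```python
def review_clauses(clauses, classify, top_score,
                   prob_threshold=0.5, score_threshold=10.0):
    """Route each clause through classification first and retrieval second.

    classify(clause)  -> (risk category label, probability)
    top_score(clause) -> highest retrieval score for the clause
    """
    result = {"classified_risk": [], "retrieval_risk": [], "risk_free": []}
    for clause in clauses:
        label, prob = classify(clause)
        if prob > prob_threshold:
            # Probability above the threshold: classified risk clause.
            result["classified_risk"].append((clause, label))
        elif top_score(clause) > score_threshold:
            # Classification was not decisive: fall back to retrieval.
            result["retrieval_risk"].append(clause)
        else:
            result["risk_free"].append(clause)
    return result
```

The returned dictionary corresponds to the aggregated clause review result: every input clause appears in exactly one of the three groups.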
In this embodiment, for clauses and/or paragraphs that the classification model cannot judge to be at risk, clause retrieval logic further judges whether they are at risk; a clause retrieval engine may be used to complete the corresponding retrieval. Finally, the retrieval results are aggregated with the risk clause results given by the classification model, so that the final clause review result is more accurate.
It should be noted that the clause retrieval engine may be an existing search engine with the corresponding functions, and the embodiment of the present invention is not limited thereto.
In addition, an embodiment of the present invention also provides a clause retrieval engine, which may include the following modules: an information storage module, a text indexing module, a retrieval condition parsing module, a ranking algorithm module, a vocabulary parser module and the like. Wherein:
the information storage module is mainly used for efficiently and reliably accessing information.
The text indexing module is used for building an inverted index for the text. An inverted index is an index mapping from vocabulary to text objects; given a search term, the texts containing it can be quickly located through the inverted index.
The retrieval condition parsing module is used for parsing the input query condition. The query condition is often a combination of AND, OR and NOT Boolean logic, so the parsing module is needed to parse it, find the specific search terms, and link them to the pre-built inverted index.
The ranking algorithm module mainly calculates the relevance score between each search term and the texts, obtains the relevance score between the query condition and each document by summation, and outputs the ranking result in order of relevance score.
The vocabulary parser module is mainly used for extracting the characteristics of the text, and a word segmentation parser is commonly used for segmenting the text.
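The inverted index and the Boolean combination of search terms described above can be sketched as follows. The regular-expression tokenizer is an assumption that suits English text only; for Chinese contracts a word segmentation parser would take its place. All function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Stand-in for the vocabulary parser: lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every term to the set of document identifiers that contain it."""
    index = defaultdict(set)
    for doc_id, content in docs.items():
        for term in tokenize(content):
            index[term].add(doc_id)
    return dict(index)


def search_and(index, terms):
    """Documents containing every term."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_or(index, terms):
    """Documents containing at least one term."""
    return set().union(*(index.get(t, set()) for t in terms))


def search_not(index, all_ids, term):
    """Documents that do not contain the term."""
    return set(all_ids) - index.get(term, set())
```

Looking up a term is a single dictionary access, which is why the index allows a search term to be matched to clause content quickly.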
In addition, according to the clause retrieval requirement, clause data required to be input into the clause search engine is collected, and the collected clause data is preprocessed and converted into a format acceptable by the clause retrieval engine. And calling an information storage function of the retrieval engine to store the clause data into the corresponding storage table. The search engine can establish an inverted index for the clause data while storing the data, and provides possibility for quickly matching the vocabulary to the clause content subsequently.
In this embodiment of the present invention, the structure of the storage table may include three fields, which are respectively: the unique identification of the clause text, the content of the clause text and the label corresponding to the clause text. The unique identification of the clause text is a character string type, the content of the clause text is the character string type and a corresponding vocabulary parser needs to be configured, and the tag field corresponding to the clause text is the character string type.
Fig. 4 is a flowchart of clause review by retrieval in the embodiment of the present invention, which includes the following steps:
Step 401, using the content of the clause and/or paragraph as input, and invoking the retrieval engine to obtain a plurality of retrieval results and the score corresponding to each retrieval result.
For example, for each input clause or paragraph, the 10 clauses with the highest similarity scores are retrieved from the clause library (i.e., the storage table). The similarity score here may be a BM25 score.
Step 402, judging whether the highest score corresponding to the retrieval result is greater than a set score threshold value; if yes, go to step 403; otherwise, step 404 is performed.
The scoring threshold may be determined by a business expert based on actual business experience.
Step 403, determining the clauses and/or paragraphs as search risk clauses.
Step 404, determining the clauses and/or paragraphs as risk-free clauses.
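Steps 401 to 404 can be sketched as a single function. The `search` callable is a hypothetical stand-in for the clause retrieval engine and is assumed to return `(clause_id, score)` pairs; the result strings and the threshold value in the example are likewise illustrative.

```python
from typing import Callable, List, Tuple

# Hypothetical engine interface: (query text, top_k) -> list of (clause_id, score).
SearchFn = Callable[[str, int], List[Tuple[str, float]]]


def review_by_retrieval(text: str, search: SearchFn,
                        score_threshold: float, top_k: int = 10) -> str:
    """Classify one clause or paragraph as 'search_risk' or 'risk_free'."""
    # Step 401: query the retrieval engine with the clause content.
    results = search(text, top_k)
    # Step 402: compare the highest score with the set score threshold.
    best = max((score for _, score in results), default=float("-inf"))
    if best > score_threshold:
        return "search_risk"  # Step 403
    return "risk_free"        # Step 404


def fake_search(text: str, top_k: int) -> List[Tuple[str, float]]:
    # Toy engine used only to exercise the flow above.
    hits = [("c1", 7.2), ("c2", 3.1)] if "penalty" in text else []
    return hits[:top_k]


verdict = review_by_retrieval("late payment penalty", fake_search, score_threshold=5.0)
```

A clause with no retrieval results is treated as risk-free here, which is one reasonable reading of step 404 rather than something the source states explicitly.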
The automatic contract clause review method provided by the embodiment of the invention combines classification and retrieval to achieve high-precision clause risk identification at the semantic level. The retrieval mode improves the recognition of rare clauses, which raises the overall accuracy of the clause review result. Threshold judgment mechanisms are introduced into both the classification model and the retrieval mode, so that both can be adapted more flexibly to different business scenarios.
Correspondingly, an embodiment of the present invention further provides an automatic contract clause review system. Fig. 5 is a schematic structural diagram of the system provided in the embodiment of the present invention.
In this embodiment, the system includes the following modules:
a contract text obtaining module 501, configured to obtain a contract text;
a splitting module 502, configured to split the contract text to obtain all terms and/or paragraphs;
a classification module 503, configured to classify the clauses and/or paragraphs by using a clause classification model, so as to obtain risk category labels corresponding to the clauses and/or paragraphs and probabilities of the risk category labels;
a first determining module 504, configured to determine that the clause and/or the paragraph are/is at risk when the probability of the risk category label is greater than a set probability threshold, and use the clause and/or the paragraph at risk as a classified risk clause.
The splitting module 502 may specifically split the contract text according to the inherent logic structure of the contract text to obtain all terms and/or paragraphs; or splitting the contract text according to the style structure of the contract text to obtain all clauses and/or paragraphs.
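One way to split by style structure is to start a new clause at every line that opens with a numbering marker. The sketch below assumes English-style markers such as `1.`, `2.1)` or `Article 3`; the pattern and the function name `split_clauses` are illustrative assumptions, since the description does not define the splitting rules.

```python
import re
from typing import List

# Assumed clause markers: "Article 3", "1.", "2.1." or "4)" at the start of a line.
CLAUSE_START = re.compile(r"^\s*(?:Article\s+\d+|\d+(?:\.\d+)*[.)])\s+", re.IGNORECASE)


def split_clauses(contract_text: str) -> List[str]:
    """Split contract text into clauses, merging continuation lines into their clause."""
    clauses: List[str] = []
    current: List[str] = []
    for line in contract_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if CLAUSE_START.match(line) and current:
            # A new numbered clause begins: close the previous one.
            clauses.append(" ".join(current))
            current = []
        current.append(line.strip())
    if current:
        clauses.append(" ".join(current))
    return clauses


sample_contract = (
    "Purchase Agreement\n"
    "1. Payment is due within 30 days.\n"
    "Late payment incurs a penalty.\n"
    "\n"
    "2. Delivery occurs at the buyer's site.\n"
    "Article 3 Either party may terminate.\n"
)
parts = split_clauses(sample_contract)
```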
The classification module 503 may specifically include a classification unit and a selection unit. The classification unit classifies the clauses and/or paragraphs by using the clause classification model to obtain the probability distribution over risk category labels for each clause and/or paragraph; the selection unit selects the risk category label with the highest probability in that distribution as the risk category label of the clause and/or paragraph.
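The selection unit and the subsequent threshold check can be sketched as follows. The softmax over raw model outputs, the label names, and the function names are assumptions for illustration; the actual clause classification model is not reproduced here.

```python
import math
from typing import Dict, List, Tuple


def softmax_distribution(labels: List[str], logits: List[float]) -> Dict[str, float]:
    """Turn raw model outputs into a probability distribution over labels."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


def select_label(distribution: Dict[str, float]) -> Tuple[str, float]:
    """Selection unit: pick the risk category label with the highest probability."""
    label = max(distribution, key=distribution.get)
    return label, distribution[label]


def is_classified_risk(distribution: Dict[str, float], probability_threshold: float) -> bool:
    """First determining module: at risk only if the top probability exceeds the threshold."""
    _, probability = select_label(distribution)
    return probability > probability_threshold


# Hypothetical labels and model outputs for one clause.
dist = softmax_distribution(["penalty", "liability", "confidentiality"], [2.0, 0.5, 0.1])
top_label, top_probability = select_label(dist)
```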
The term classification model may be pre-established by a corresponding term classification model building module, and the training process of the model may refer to the description in the foregoing embodiment of the method of the present invention, which is not described herein again. The clause classification model building module may be a part of the system of the present invention, or may be independent of the system, and the embodiment of the present invention is not limited thereto.
Fig. 6 is a schematic structural diagram of an automatic contract term review system according to an embodiment of the present invention.
Unlike the embodiment shown in fig. 5, in this embodiment, the system further includes: a retrieval module 505 and an aggregation module 506. Wherein:
the retrieval module 505 is configured to determine, by retrieval, whether the clause and/or paragraph is a search risk clause if the probability of the category label is less than or equal to the threshold corresponding to the category label.
The aggregation module 506 is configured to aggregate the classified risk clauses, the search risk clauses determined by the retrieval module 505, and the risk-free clauses to obtain the clause review result.
In practical applications, the search module 505 may utilize an existing term search engine with corresponding functionality to determine whether the term and/or paragraph is a search risk term.
In addition, in another non-limiting embodiment of the present invention, the search module 505 may further utilize the clause search engine established in the previous embodiment of the method of the present invention to determine whether the clause and/or paragraph is a search risk clause.
Accordingly, the retrieval module 505 may include a calling module and a second judging module. The calling module takes the content of the clauses and/or paragraphs as input and calls the clause retrieval engine to obtain a plurality of retrieval results and the score corresponding to each retrieval result; the second judging module determines the clauses and/or paragraphs as search risk clauses when the highest score among the retrieval results is greater than a set score threshold, and otherwise determines them as risk-free clauses.
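The interplay of the classification, retrieval, and aggregation modules of fig. 6 can be sketched as one loop. Both `classify` and `search` are hypothetical callables standing in for the clause classification model and the clause retrieval engine, and a single probability threshold is assumed for all labels.

```python
from typing import Callable, Dict, List, Tuple

ClassifyFn = Callable[[str], Tuple[str, float]]       # clause -> (label, probability)
SearchFn = Callable[[str], List[Tuple[str, float]]]   # clause -> [(clause_id, score)]


def review_contract(clauses: List[str], classify: ClassifyFn, search: SearchFn,
                    prob_threshold: float, score_threshold: float) -> Dict[str, list]:
    """Aggregate classified risk, search risk, and risk-free clauses."""
    result: Dict[str, list] = {"classified_risk": [], "search_risk": [], "risk_free": []}
    for clause in clauses:
        label, probability = classify(clause)
        if probability > prob_threshold:
            result["classified_risk"].append((clause, label))
            continue
        # Fall back to retrieval when the classifier is not confident enough.
        hits = search(clause)
        best = max((score for _, score in hits), default=float("-inf"))
        if best > score_threshold:
            result["search_risk"].append(clause)
        else:
            result["risk_free"].append(clause)
    return result


def fake_classify(clause: str) -> Tuple[str, float]:
    # Toy model: confident only about clauses that mention "penalty".
    return ("penalty", 0.95) if "penalty" in clause else ("other", 0.30)


def fake_search(clause: str) -> List[Tuple[str, float]]:
    # Toy engine: finds a close match only for clauses that mention "indemnify".
    return [("c9", 8.0)] if "indemnify" in clause else [("c1", 1.0)]


report = review_contract(
    ["late payment penalty", "seller shall indemnify buyer", "delivery schedule"],
    fake_classify, fake_search, prob_threshold=0.8, score_threshold=5.0,
)
```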
The automatic contract clause review system provided by the invention classifies contract clauses and/or paragraphs by using a clause classification model to obtain the risk category labels corresponding to the clauses and/or paragraphs and the probabilities of those labels, and judges from the probabilities whether the clauses and/or paragraphs are at risk. Compared with the existing clause retrieval method, the model-based mode uses a large volume of training data as its modeling corpus, so that clauses can be represented well as semantic vectors and the accuracy of the clause review result is greatly improved. Furthermore, after the clauses and/or paragraphs in the contract text are reviewed in the model-based mode, the retrieval mode supplements the clause classification model, which better handles the long-tail clauses that the classification model covers poorly and greatly improves the adaptability of the system to different industries and fields.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of the present invention and the above-described drawings, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments in the present specification are described in a progressive manner; for parts that are the same or similar, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. Furthermore, the above-described system embodiments are merely illustrative, wherein modules and units illustrated as separate components may or may not be physically separate, i.e., may be located on one network element or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement this without inventive effort.
The foregoing detailed description of the embodiments of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or to limit the invention to the precise forms disclosed. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without making any creative effort shall fall within the protection scope of the present invention, and the content of the present specification shall not be construed as limiting the present invention. Therefore, any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An automatic review method for contract terms, characterized in that the method comprises:
acquiring a contract text;
splitting the contract text to obtain all clauses and/or paragraphs;
classifying the clauses and/or paragraphs by using a clause classification model to obtain risk category labels corresponding to the clauses and/or paragraphs and the probability of the risk category labels;
and if the probability of the risk category label is greater than a set probability threshold, determining that the clauses and/or the paragraphs are at risk, and taking the clauses and/or the paragraphs at risk as classified risk clauses.
2. The method of claim 1, wherein the splitting the contract text to obtain all terms and/or paragraphs comprises:
splitting the contract text according to the internal logic structure of the contract text to obtain all clauses and/or paragraphs; or
splitting the contract text according to the style structure of the contract text to obtain all clauses and/or paragraphs.
3. The method of claim 1, wherein the classifying the terms and/or paragraphs using a term classification model to obtain risk category labels and probabilities of the risk category labels corresponding to the terms and/or paragraphs comprises:
classifying the clauses and/or paragraphs by using the clause classification model to obtain probability distribution of risk category labels corresponding to the clauses and/or paragraphs;
and selecting the maximum risk category label in the probability distribution of the risk labels as the risk category label of the clause and/or paragraph.
4. The method according to any one of claims 1 to 3, further comprising:
and if the probability of the classification label is less than or equal to the threshold corresponding to the classification label, determining whether the clause and/or the paragraph is a search risk clause or not through a search mode.
5. An automatic contract term review system, comprising:
the contract text acquisition module is used for acquiring a contract text;
the splitting module is used for splitting the contract text to obtain all clauses and/or paragraphs;
the classification module is used for classifying the clauses and/or paragraphs by utilizing a clause classification model to obtain risk category labels corresponding to the clauses and/or paragraphs and the probability of the risk category labels;
and the first judging module is used for determining that the clauses and/or the paragraphs have risks under the condition that the probability of the risk category label is greater than a set probability threshold, and taking the clauses and/or the paragraphs with risks as classified risk clauses.
6. The system of claim 5,
the splitting module is specifically configured to split the contract text according to the inherent logic structure of the contract text to obtain all terms and/or paragraphs; or splitting the contract text according to the style structure of the contract text to obtain all clauses and/or paragraphs.
7. The system of claim 5, wherein the classification module comprises:
the classification unit is used for classifying the clauses and/or the paragraphs by using the clause classification model to obtain the probability distribution of the risk category labels corresponding to the clauses and/or the paragraphs;
and the selecting unit is used for selecting the maximum risk category label in the probability distribution of the risk labels as the risk category label of the clause and/or the paragraph.
8. The system of any one of claims 5 to 7, further comprising: a retrieval module;
and the retrieval module is used for determining whether the clause and/or the paragraph is a retrieval risk clause or not through a retrieval mode under the condition that the probability of the classification label is less than or equal to the threshold corresponding to the classification label.
9. The system of claim 8, wherein the retrieval module comprises:
the calling module is used for taking the content of the clauses and/or the paragraphs as input and calling a clause retrieval engine to obtain a plurality of retrieval results and scores corresponding to the retrieval results;
and the second judging module is used for determining the terms and/or the paragraphs as search risk terms under the condition that the highest score corresponding to the search result is greater than a set score threshold, and otherwise determining the terms and/or the paragraphs as risk-free terms.
10. The system of claim 9, further comprising:
and the aggregation module is used for aggregating the classified risk terms, the retrieval risk terms and the risk-free terms to obtain term examination results.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211357562.2A CN115630843A (en) 2022-11-01 2022-11-01 Contract clause automatic checking method and system

Publications (1)

Publication Number Publication Date
CN115630843A true CN115630843A (en) 2023-01-20

Family

ID=84907851

Country Status (1)

Country Link
CN (1) CN115630843A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116663525A (en) * 2023-07-21 2023-08-29 科大讯飞股份有限公司 Document auditing method, device, equipment and storage medium
CN116663525B (en) * 2023-07-21 2023-12-01 科大讯飞股份有限公司 Document auditing method, device, equipment and storage medium
CN117151096A (en) * 2023-09-05 2023-12-01 江苏群杰物联科技有限公司 Intelligent contract checking method and device, electronic equipment and storage medium
CN116976683A (en) * 2023-09-25 2023-10-31 江铃汽车股份有限公司 Automatic auditing method, system, storage medium and device for contract clauses
CN116976683B (en) * 2023-09-25 2024-02-27 江铃汽车股份有限公司 Automatic auditing method, system, storage medium and device for contract clauses
CN117132244A (en) * 2023-10-26 2023-11-28 国网浙江省电力有限公司 Classification processing method, device and storage medium for intelligent compliance management system
CN117132244B (en) * 2023-10-26 2024-01-09 国网浙江省电力有限公司 Classification processing method, device and storage medium for intelligent compliance management system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination