CN110705265A - Contract clause risk identification method and device - Google Patents


Publication number
CN110705265A
Authority
CN
China
Prior art keywords
risk
clause
contract
historical
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910797847.XA
Other languages
Chinese (zh)
Inventor
余红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910797847.XA
Publication of CN110705265A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services; Handling legal documents

Abstract

The application provides a contract clause risk identification method and device. The method comprises: acquiring a contract text to be identified; splitting the contract clauses contained in the contract text to be identified to obtain clause information of each contract clause; inputting the clause information into a pre-trained clause risk identification model for risk identification, and acquiring a clause risk identification result output by the clause risk identification model; and, if the clause risk identification result contains risk clauses, highlighting the risk clauses in the contract text to be identified in a preset highlighting manner.

Description

Contract clause risk identification method and device
Technical Field
The application relates to the technical field of text recognition, in particular to a contract clause risk recognition method. The application also relates to a contract clause risk identification device, an electronic device and a computer readable storage medium.
Background
A contract matters whether or not the two parties trust each other. Because the signing of a contract has a legal basis, the written commitments of both parties are lawful and can be consulted while the cooperation is carried out, so business partners can standardize how they make and perform their commitments. This makes the outcome of cooperation orderly and lawful, contributes greatly to a harmonious and peaceful society, and has therefore become an important way of building a society governed by law.
However, because contract performance is specific, long-term, diverse, and complex, neither party can fully avoid the risks and liabilities embedded in contract clauses. Risk identification therefore needs to be carried out on the contract clauses before the contract is signed, so that the risks can be avoided.
At present, judging the risk of contract clauses depends mainly on legal professionals, who combine their professional knowledge and practical experience with the requirements of the contracting parties and the laws and regulations currently in force to decide whether a clause carries risk. This process is time-consuming and labor-intensive: it creates a heavy workload for legal staff and lowers the efficiency of the whole process. People without legal training, moreover, find it difficult to determine whether a contract clause is risky. A simple and efficient contract risk identification method is therefore needed.
Disclosure of Invention
In view of this, the embodiment of the present application provides a contract term risk identification method. The application also relates to a contract clause risk identification device, an electronic device and a computer readable storage medium, which are used for solving the technical defects in the prior art.
According to a first aspect of the embodiments of the present application, there is provided a contract term risk identification method, including:
acquiring a contract text to be identified;
splitting contract clauses contained in the contract text to be identified to obtain clause information of each contract clause;
inputting the clause information into a pre-trained clause risk identification model for risk identification, and acquiring a clause risk identification result output by the clause risk identification model;
and if the clause risk identification result contains risk clauses, highlighting the risk clauses in the contract text to be identified in a preset highlighting manner.
Optionally, the clause risk identification model is trained by:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
carrying out risk labeling processing on the historical clause information, wherein the risk label indicates whether the historical clause information is a risk clause;
and inputting the historical clause information as training samples, with the corresponding risk labeling results as training labels, into a clause risk identification pre-training model for model training.
Optionally, the term risk identification pre-training model is pre-trained in the following manner:
constructing a pre-training model based on the association between the historical clause information and the risk labeling result;
configuring parameters of the pre-training model, wherein the pre-training model comprises an input layer and an embedding layer;
inputting a training text into the pre-training model to perform model pre-training, wherein the training text is an unlabeled text;
adjusting parameters of the pre-training model to obtain the clause risk identification pre-training model; wherein the parameters of the clause risk identification pre-trained model represent weights of a neural network.
Optionally, the inputting of the training text into the pre-training model for pre-training of the model includes:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
and determining a target word vector of the historical clause information through the embedding layer according to the word vector dictionary, and inputting the target word vector into the pre-training model for pre-training of the model.
Optionally, the determining, by the embedding layer, a target word vector of the historical term information according to a word vector dictionary includes:
acquiring a pre-established word vector dictionary;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
searching word vectors corresponding to each target word unit in the historical clause information in the word vector dictionary, and combining the word vectors to generate word vectors of the historical clause information;
pre-embedding the historical clause information to obtain a sentence vector and a position vector of the historical clause information;
and performing summation operation on the word vector, the sentence vector and the position vector of the historical clause information to obtain a target word vector of the historical clause information.
Optionally, the word vector dictionary is constructed by:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
and calculating a word vector corresponding to each target word unit, and constructing the word vector dictionary according to the word vector corresponding to each target word unit.
Optionally, after the step of acquiring the clause risk identification result output by the clause risk identification model and before the step of highlighting the risk clauses in the contract text to be identified in a preset highlighting manner, the method further includes:
judging whether the risk value of the clause information contained in the clause risk identification result is greater than a preset risk threshold value or not;
if so, determining the clause information as risk clauses, inquiring specific positions of the risk clauses in the contract text to be identified, and performing the step of highlighting the risk clauses in the contract text to be identified in a preset highlighting manner.
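The threshold check and position lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the function name find_risk_clauses, the dictionary of risk values, and the threshold of 0.5 are hypothetical and do not come from the application, which does not specify how the risk value is represented.

```python
from typing import Dict, List, Tuple


def find_risk_clauses(
    contract_text: str,
    risk_values: Dict[str, float],
    risk_threshold: float,
) -> List[Tuple[str, int, int]]:
    """Return (clause, start, end) for every clause whose risk value exceeds the threshold."""
    located = []
    for clause, value in risk_values.items():
        if value > risk_threshold:
            # Query the position of the risk clause in the contract text.
            start = contract_text.find(clause)
            if start != -1:
                located.append((clause, start, start + len(clause)))
    return located


# Made-up example data; the risk values stand in for model output.
contract = "Party A pays on delivery. Party B may terminate at any time without notice."
scores = {
    "Party A pays on delivery.": 0.12,
    "Party B may terminate at any time without notice.": 0.91,
}
risky = find_risk_clauses(contract, scores, risk_threshold=0.5)
```

The returned character offsets are what a later highlighting step would need in order to mark the clause in place.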
Optionally, after the step of highlighting the risk clause in the contract text to be identified in a preset highlighting manner is executed, the method further includes:
dividing the risk clauses into a first risk level and a second risk level according to the magnitude of the risk values of the risk clauses;
deleting the risk terms in the contract text to generate a new contract text if the risk terms belong to a first risk level;
and under the condition that the risk clauses belong to a second risk level, performing semantic analysis on the risk clauses to generate a semantic analysis result, generating risk-free clauses related to the risk clauses according to the semantic analysis result, and replacing the risk clauses in the contract text with the risk-free clauses to generate a new contract text.
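A minimal sketch of the two-level handling above follows. It rests on assumptions that the application leaves open: that the first risk level corresponds to the higher risk values, that a single boundary value separates the two levels, and that the semantic analysis and generation of a risk-free clause can be represented by a caller-supplied function (here the hypothetical rewrite argument).

```python
from typing import Callable


def revise_contract(
    contract_text: str,
    risk_clause: str,
    risk_value: float,
    level_boundary: float,
    rewrite: Callable[[str], str],
) -> str:
    """Delete a first-level risk clause, or replace a second-level one with a rewritten clause."""
    if risk_value >= level_boundary:
        # Assumed first risk level: remove the clause from the contract text.
        return contract_text.replace(risk_clause, "")
    # Assumed second risk level: substitute a risk-free clause produced by `rewrite`.
    return contract_text.replace(risk_clause, rewrite(risk_clause))


def soften(clause: str) -> str:
    # Hypothetical stand-in for semantic analysis plus risk-free clause generation.
    return clause.replace("without notice", "with 30 days' written notice")


contract = "Party A pays on delivery. Party B may terminate without notice."
clause = "Party B may terminate without notice."
deleted = revise_contract(contract, clause, risk_value=0.95, level_boundary=0.8, rewrite=soften)
replaced = revise_contract(contract, clause, risk_value=0.60, level_boundary=0.8, rewrite=soften)
```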
Optionally, after the step of highlighting the risk clause in the contract text to be identified in a preset highlighting manner is executed, the method further includes:
deleting the risk terms in the contract text to generate a new contract text; and/or
Semantic analysis is carried out on the risk clauses to generate a semantic analysis result, risk-free clauses related to the risk clauses are generated according to the semantic analysis result, and the risk-free clauses are used for replacing the risk clauses in the contract text to generate a new contract text.
Optionally, the acquiring the contract text to be recognized includes:
receiving a contract text risk identification instruction;
acquiring a contract text image of a paper contract text;
and recognizing the text content in the contract text image by adopting an optical character recognition technology, and taking a recognition result as the contract text to be recognized.
Optionally, the presentation effect of the preset highlighting manner comprises at least one of:
bold, highlighting, enlarged font, changed font, underline, and marking in a special color.
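As an illustration of such presentation effects, the sketch below wraps a risk clause in an HTML span with an inline style. Rendering the contract as HTML is an assumption made for the example; the application does not name an output format, and the function name and style table are hypothetical.

```python
import html

# Hypothetical mapping from a presentation effect to an inline CSS style.
STYLES = {
    "bold": "font-weight: bold",
    "underline": "text-decoration: underline",
    "color": "color: red",
    "enlarge": "font-size: 120%",
}


def highlight_risk_clause(contract_text: str, risk_clause: str, style: str = "bold") -> str:
    """Return the contract as HTML with every occurrence of the risk clause wrapped in a styled span."""
    escaped_text = html.escape(contract_text)
    escaped_clause = html.escape(risk_clause)
    marked = '<span style="{}">{}</span>'.format(STYLES[style], escaped_clause)
    return escaped_text.replace(escaped_clause, marked)


page = highlight_risk_clause(
    "Party A pays on delivery. Party B may terminate.",
    "Party B may terminate.",
    style="bold",
)
```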
According to another aspect of the embodiments of the present application, there is provided a contract term risk identification apparatus, including:
the text to be recognized acquisition module is configured to acquire a contract text to be recognized;
the contract clause splitting module is configured to split contract clauses contained in the contract text to be identified to obtain clause information of each contract clause;
the clause risk identification module is configured to input the clause information into a pre-trained clause risk identification model for risk identification, and acquire a clause risk identification result output by the clause risk identification model;
and the risk clause highlighting module is configured to highlight and display the risk clause in the contract text to be identified in a preset highlighting mode if the clause risk identification result contains risk clauses.
Optionally, the contract term risk identification apparatus further includes:
a first historical clause information acquisition module configured to: acquiring historical clause information of a plurality of contract clauses in a historical contract text;
the risk labeling processing module is configured to carry out risk labeling processing on the historical clause information, wherein the risk label indicates whether the historical clause information is a risk clause;
and the model training module is configured to input the historical clause information as training samples, with the corresponding risk labeling results as training labels, into a clause risk identification pre-training model for model training.
Optionally, the contract term risk identification apparatus further includes:
a pre-training model building module configured to construct a pre-training model based on the association between the historical clause information and the risk labeling result;
a model parameter configuration module configured to configure pre-training model parameters, the pre-training model comprising an input layer, an embedding layer;
the model pre-training module is configured to input a training text into the pre-training model for model pre-training, wherein the training text is an unlabeled text;
a model parameter adjusting module configured to adjust parameters of the pre-training model to obtain the clause risk identification pre-training model; wherein the parameters of the clause risk identification pre-trained model represent weights of a neural network.
Optionally, the model pre-training module includes:
a second historical clause information obtaining sub-module configured to obtain historical clause information of a plurality of contract clauses in the historical contract text;
and the model pre-training sub-module is configured to determine a target word vector of the historical clause information through the embedding layer according to a word vector dictionary, and input the target word vector into the pre-training model for pre-training of the model.
Optionally, the model pre-training sub-module is further configured to:
acquiring a pre-established word vector dictionary;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
searching word vectors corresponding to each target word unit in the historical clause information in the word vector dictionary, and combining the word vectors to generate word vectors of the historical clause information;
pre-embedding the historical clause information to obtain a sentence vector and a position vector of the historical clause information;
and performing summation operation on the word vector, the sentence vector and the position vector of the historical clause information to obtain a target word vector of the historical clause information.
According to another aspect of the embodiments of the present application, there is provided an electronic device, including a memory, a processor, and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the contract term risk identification method when executing the instructions.
According to another aspect of embodiments of the present application, there is provided a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the contract term risk identification method.
In the embodiments of the application, the clause information obtained by splitting the contract to be identified is input into the clause risk identification model, so that clause risk identification is carried out without manual intervention. This helps ensure the accuracy of clause risk identification while increasing its speed and improving working efficiency. The risk clauses are highlighted in the contract text in a preset highlighting manner, which makes it convenient for staff to review and handle them and to carry out risk management and control in time.
Drawings
FIG. 1 is a flow chart of a contract term risk identification method provided by an embodiment of the present application;
FIG. 2(a) is a schematic diagram of the generation of an input set of a pre-training model provided in an embodiment of the present application;
FIG. 2(b) is a schematic diagram of a specific generation of an input set of a pre-training model provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a BERT model provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a contract term risk identification method provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a contract term risk identification apparatus provided in an embodiment of the present application;
FIG. 6 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The application can, however, be implemented in many ways other than those described here, and those skilled in the art can make similar extensions without departing from its spirit; the application is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the present application. As used in one or more embodiments of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used in one or more embodiments of the present application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first aspect may be termed a second aspect and, similarly, a second aspect may be termed a first aspect, without departing from the scope of one or more embodiments of the present application. The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination", depending on the context.
First, the terms used in one or more embodiments of the present application are explained.
BERT model: short for Bidirectional Encoder Representations from Transformers, a bidirectional attention neural network model.
Optical Character Recognition technology (OCR): refers to a process in which an electronic device (e.g., a scanner or a digital camera) checks a character printed on paper, determines its shape by detecting dark and light patterns, and then translates the shape into a computer text by a character recognition method; namely, the process of scanning the text data, then analyzing and processing the image file and obtaining the character and layout information.
Word unit (token): before any actual processing, the input text needs to be segmented into language units such as words, punctuation marks, numbers, or purely alphanumeric strings; these units are called word units. For an English text, a word unit may be a word, a punctuation mark, a number, etc.; for a Chinese text, the smallest word unit may be a single character, a punctuation mark, a number, etc.
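To make the notion of a word unit concrete, the following sketch segments text with a regular expression. It is an illustrative assumption, not the word segmentation algorithm of the application (which is not specified): each CJK character becomes one unit, runs of Latin letters or digits stay together, and every other non-space symbol is its own unit. The function name is hypothetical.

```python
import re
from typing import List

# One CJK character, or a run of Latin letters, or a run of digits,
# or any single remaining non-space symbol (for example punctuation).
_WORD_UNIT = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+|\d+|[^\sA-Za-z\d\u4e00-\u9fff]")


def split_into_word_units(text: str) -> List[str]:
    """Segment text into word units (tokens)."""
    return _WORD_UNIT.findall(text)


english_units = split_into_word_units("Pay 30 days later.")
chinese_units = split_into_word_units("合同条款")
```

With this pattern, an English sentence is split into words, numbers, and punctuation marks, while a Chinese string is split character by character.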
The embodiment of the application provides a contract clause risk identification method. The present application also relates to a contract term risk identification apparatus, an electronic device, and a computer-readable storage medium, which are described in detail in the following embodiments one by one.
FIG. 1 shows a flow diagram of a contract term risk identification method according to an embodiment of the present application, including steps 102-108.
Step 102: acquiring a contract text to be identified.
In an embodiment provided by this specification, the contract text to be identified may be an electronic contract text or a paper contract text, and if the contract text to be identified is a paper contract text, the contract clause information in the paper contract text needs to be identified, which may specifically be implemented by the following steps:
receiving a contract text risk identification instruction;
acquiring a contract text image of a paper contract text;
and recognizing the text content in the contract text image by adopting an optical character recognition technology, and taking a recognition result as the contract text to be recognized.
Specifically, the optical character recognition technology, i.e., the OCR technology, scans text data by using an electronic device, and then analyzes and processes an image file to obtain text and layout information. The OCR technology is utilized to identify the character content in the paper contract text image, so that the accuracy of the identification result can be ensured, and the processing efficiency of the contract text information can be improved.
In practical application, in addition to using the OCR technology to perform character recognition, other character recognition technologies may also be used to perform character recognition, and the application is not limited herein.
Step 104: splitting contract clauses contained in the contract text to be identified to obtain clause information of each contract clause.
In a specific implementation, the information in the contract text is split according to a preset contract clause splitting method or a preset rule. For example, the preset rule may be to split the contract text by sentence or by paragraph.
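Splitting by sentence or by paragraph can be sketched as below. The punctuation set, the blank-line paragraph convention, and the function names are assumptions chosen for the example; a real contract would also need handling for cases such as decimal numbers or abbreviations, which this sketch would split incorrectly.

```python
import re
from typing import List


def split_by_paragraph(contract_text: str) -> List[str]:
    """Split on blank lines and drop empty pieces."""
    parts = re.split(r"\n\s*\n", contract_text)
    return [p.strip() for p in parts if p.strip()]


def split_by_sentence(contract_text: str) -> List[str]:
    """Split after sentence-ending punctuation (Chinese or English), keeping the punctuation."""
    parts = re.split(r"(?<=[。！？；.!?;])\s*", contract_text)
    return [p.strip() for p in parts if p.strip()]


sentences = split_by_sentence("Party A pays. Party B delivers.")
paragraphs = split_by_paragraph("Clause 1: payment.\n\nClause 2: delivery.")
```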
Step 106: inputting the clause information into a pre-trained clause risk identification model for risk identification, and acquiring a clause risk identification result output by the clause risk identification model.
After the contract clauses contained in the text to be identified have been split, the resulting clause information is input into the pre-trained clause risk identification model to obtain a clause risk identification result.
In specific implementation, before model pre-training, a word vector dictionary needs to be constructed in advance, and in the embodiment of the application, the word vector dictionary can be constructed in the following way:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
and calculating a word vector corresponding to each target word unit, and constructing the word vector dictionary according to the word vector corresponding to each target word unit.
Specifically, a historical contract text is a contract text that has already been generated. Taking the financial industry as an example, a financial institution collects 20,000 historical contract texts generated by its financial services and splits them by sentence to obtain multiple pieces of historical clause information. It then performs word segmentation on the historical clause information, converts the target word units produced by the segmentation into word vectors, and constructs a word vector dictionary from the conversion results. Take as an example one piece of historical clause information, "协助对国外用户进行资信调查" ("assist in credit investigation of foreign users"). Word segmentation of this clause yields the target word units, here individual characters: "协", "助", "对", "国", "外", "用", "户", "进", "行", "资", "信", "调", "查". These target characters are converted to character (word) vectors, and the conversion results are assumed to be E_协, E_助, E_对, E_国, E_外, E_用, E_户, E_进, E_行, E_资, E_信, E_调, E_查, where E with a subscript denotes the word vector corresponding to the target character in the subscript.
After the word vector conversion is completed, the target word units, i.e. the target characters, and their corresponding character (word) vectors are stored as pairs of the form "协 - E_协", "助 - E_助", and so on, to construct the word vector dictionary.
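The dictionary construction can be sketched as a mapping from each distinct word unit to a vector. Two assumptions are made loudly here. First, the example clause is taken to read "协助对国外用户进行资信调查" in Chinese and is segmented character by character. Second, the application does not say how each word vector is calculated, so the sketch assigns seeded pseudo-random placeholder vectors; a real system would use trained embeddings instead. The function name and the dimension of 4 are hypothetical.

```python
import random
from typing import Dict, List


def build_word_vector_dictionary(
    clauses: List[str], dim: int = 4, seed: int = 0
) -> Dict[str, List[float]]:
    """Map every distinct character in the clauses to a placeholder vector."""
    rng = random.Random(seed)
    dictionary: Dict[str, List[float]] = {}
    for clause in clauses:
        for unit in clause:  # character-level segmentation (assumption)
            if unit.isspace():
                continue
            if unit not in dictionary:
                # Placeholder values only; not a trained embedding.
                dictionary[unit] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return dictionary


historical_clauses = ["协助对国外用户进行资信调查"]
word_vectors = build_word_vector_dictionary(historical_clauses)
```

Looking up word_vectors["协"] then plays the role of retrieving the vector written E with the subscript of that character in the text.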
After the word vector dictionary is built, model pre-training can be performed, and in one embodiment provided by the application, the clause risk identification pre-training model is pre-trained in the following manner:
constructing a pre-training model based on the association between the historical clause information and the risk labeling result;
configuring parameters of the pre-training model, wherein the pre-training model comprises an input layer and an embedding layer;
inputting a training text into the pre-training model to perform model pre-training, wherein the training text is an unlabeled text;
adjusting parameters of the pre-training model to obtain the clause risk identification pre-training model; wherein the parameters of the clause risk identification pre-trained model represent weights of a neural network.
Specifically, a training text is input into the pre-training model to perform pre-training of the model, that is, historical clause information of a plurality of contract clauses in a historical contract text is obtained; and determining a target word vector of the historical clause information through the embedding layer according to the word vector dictionary, and inputting the target word vector into the pre-training model for pre-training of the model.
Before model pre-training, a pre-training model needs to be constructed. In an embodiment provided by the application, the pre-training model may be a BERT model. During model pre-training, after a training text is input into the pre-training model, each input sentence is pre-embedded by the embedding layer of the model to obtain the target word vector corresponding to that sentence. Specifically, the target word vector may be generated through the following steps:
acquiring a pre-established word vector dictionary;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
searching word vectors corresponding to each target word unit in the historical clause information in the word vector dictionary, and combining the word vectors to generate word vectors of the historical clause information;
pre-embedding the historical clause information to obtain a sentence vector and a position vector of the historical clause information;
and performing summation operation on the word vector, the sentence vector and the position vector of the historical clause information to obtain a target word vector of the historical clause information.
Specifically, a schematic diagram of generating the pre-training model input set is shown in FIG. 2(a), where the pre-training model input set is the target word vector corresponding to the historical clause information input into the pre-training model. Assuming that the input historical clause information is "agreed by both parties, no other payment is accessed", pre-embedding it gives the specific generation diagram of the pre-training model input set shown in FIG. 2(b). The first token is the special classification token [CLS], which can be used for subsequent classification tasks. The special token [SEP] separates two sentences so that they can be distinguished. Word vectors are written E_(target character), where the target character is each target word or target symbol in the input historical clause information and E_(target character) is the word vector corresponding to it; in a specific implementation, the word vector of a target character can be obtained by querying the word vector dictionary. The sentence vectors E_A and E_B are used to distinguish the two sentences; in practice, E_A and E_B may be represented by 0 and 1, respectively. For the position vectors E_1, E_2, E_3, etc., the Arabic numeral subscript indicates the position of the target word unit within the whole historical clause information.
As shown in FIG. 2(b), the input historical clause information is "agreed by both parties, no other payment is accessed". The corresponding word vector is [0 2 3 4 5 1 6 7 8 9 10 11 12 1], the corresponding sentence vector is [0 0 0 0 0 0 1 1 1 1 1 1 1 1], and the corresponding position vector is [0 1 2 3 4 5 6 7 8 9 10 11 12 13]. Summing the word vector, the sentence vector, and the position vector of the historical clause information element by element gives the target word vector of the historical clause information, [0 3 5 7 9 6 13 15 17 19 21 23 25 15]. The target word vector corresponding to the historical clause information is then input into the pre-training model for model pre-training; the training samples used in the pre-training process are unlabeled samples.
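The summation in this example can be checked with a few lines of code. The sketch assumes that the digit strings in the example are read as three sequences of 14 entries each, with 0 and 1 in the word vector standing for the [CLS] and [SEP] tokens; the function name is hypothetical.

```python
from typing import List


def sum_embeddings(
    word_ids: List[int], sentence_ids: List[int], position_ids: List[int]
) -> List[int]:
    """Element-wise sum of the word, sentence, and position vectors."""
    if not (len(word_ids) == len(sentence_ids) == len(position_ids)):
        raise ValueError("all three sequences must have the same length")
    return [w + s + p for w, s, p in zip(word_ids, sentence_ids, position_ids)]


word_vector = [0, 2, 3, 4, 5, 1, 6, 7, 8, 9, 10, 11, 12, 1]
sentence_vector = [0] * 6 + [1] * 8   # first sentence 0, second sentence 1
position_vector = list(range(14))     # positions 0 to 13
target_word_vector = sum_embeddings(word_vector, sentence_vector, position_vector)
# target_word_vector == [0, 3, 5, 7, 9, 6, 13, 15, 17, 19, 21, 23, 25, 15]
```

In an actual BERT-style embedding layer each of the three inputs is a dense vector per token rather than a single integer, but the summation is performed position by position in the same way.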
Specifically, as shown in FIG. 3, the BERT model may include n stack layers connected in sequence. Each stack layer further comprises a self-attention layer, a first normalization layer, a feed-forward layer, and a second normalization layer. The target word vector composed of the word vector, the sentence vector, and the position vector is input to the 1st stack layer to obtain the output vector of the 1st stack layer; the output vector of the 1st stack layer is input to the 2nd stack layer; and so on, until the output vector of the last stack layer is obtained. The output of the last stack layer is used as the new feature representation of each piece of historical clause information.
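The data flow through the stack layers can be sketched in plain Python as follows. This is a heavily simplified illustration, not the model of the application: it assumes single-head attention with no learned projections, a feed-forward step reduced to an element-wise ReLU, residual connections before each normalization (as in the standard Transformer encoder, which the text does not state), and no trainable weights at all. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Matrix:
    # Assumption: queries, keys, and values are the inputs themselves.
    dim = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(dim)])
    return out


def add_and_normalize(x: Matrix, sublayer_out: Matrix, eps: float = 1e-6) -> Matrix:
    # Residual connection followed by per-token normalization.
    result = []
    for row, sub in zip(x, sublayer_out):
        summed = [a + b for a, b in zip(row, sub)]
        mean = sum(summed) / len(summed)
        var = sum((v - mean) ** 2 for v in summed) / len(summed)
        result.append([(v - mean) / math.sqrt(var + eps) for v in summed])
    return result


def feed_forward(x: Matrix) -> Matrix:
    # Placeholder for the feed-forward layer: element-wise ReLU only.
    return [[max(0.0, v) for v in row] for row in x]


def stack_layer(x: Matrix) -> Matrix:
    x = add_and_normalize(x, self_attention(x))   # self-attention + first normalization
    return add_and_normalize(x, feed_forward(x))  # feed-forward + second normalization


def encode(x: Matrix, n_layers: int) -> Matrix:
    # The output of each stack layer is the input of the next one.
    for _ in range(n_layers):
        x = stack_layer(x)
    return x


inputs = [[1.0, 0.0, 2.0, -1.0], [0.5, 1.5, -0.5, 0.0], [2.0, 2.0, 0.0, 1.0]]
features = encode(inputs, n_layers=2)
```

The output keeps one vector per input token, which corresponds to using the result of the last stack layer as the new feature representation.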
An unlabeled training sample is input into the BERT model for model pre-training, and the parameters of the pre-training model are adjusted according to the pre-training result to obtain the clause risk identification pre-training model, wherein the parameters of the clause risk identification pre-training model represent the weights of a neural network. Further, after the pre-training is completed, labeled training samples are input into the clause risk identification pre-training model for model training to obtain the clause risk identification model. In an embodiment provided by the present application, the clause risk identification model is trained in the following manner:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
carrying out risk marking processing on the historical clause information; wherein the risk label is used for indicating whether the historical clause information is a risk clause;
and inputting the historical clause information as a training sample, with the risk marking result corresponding to the historical clause information as the training label, into the clause risk identification pre-training model for model training.
In specific implementation, the number of historical contract texts acquired in the model training stage, in which the clause risk identification pre-training model is trained, is less than or equal to the number of historical contract texts acquired in the model pre-training stage. In the model training stage, the clause information in the historical contract texts needs to be labeled, that is, marked as risky or not risky.
Specifically, after the clause risk identification model is obtained through training, a model application stage can be entered, and whether the contract clause to be identified has a risk can be judged by inputting the contract clause to be identified into the clause risk identification model.
In an embodiment provided by the present application, after obtaining a clause risk identification result output by the clause risk identification model, it is required to determine whether a risk value of the clause information included in the clause risk identification result is greater than a preset risk threshold value;
if so, determining the clause information as risk clauses, inquiring specific positions of the risk clauses in the contract text to be identified, and performing the step of highlighting the risk clauses in the contract text to be identified in a preset highlighting manner.
Specifically, assume that the clause information input into the clause risk identification model is "The contract takes effect from the date of signing and is valid for 2 years; after the cooperation period, if neither party objects, the contract may be extended, the number of extensions is not limited, and a contract may be signed separately." If the clause risk identification result output by the clause risk identification model shows that the risk value of the clause information is 79% and the preset risk threshold is 75%, the risk value of the clause information is greater than the preset risk threshold, which indicates that the clause information is likely to be risky, and step 108 is executed. If the risk value of the clause information is smaller than the preset risk threshold, the clause information is unlikely to be risky and is not processed.
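The threshold comparison in this example amounts to a single check. The sketch below is a minimal illustration; the function name is hypothetical, and treating a risk value exactly equal to the threshold as not risky is an assumption, since the text only covers the "greater than" and "smaller than" cases.

```python
RISK_THRESHOLD = 0.75  # preset risk threshold from the example (75%)

def is_risk_clause(risk_value, threshold=RISK_THRESHOLD):
    # Only values strictly greater than the threshold are flagged as risk clauses.
    return risk_value > threshold

print(is_risk_clause(0.79))  # True: 79% exceeds 75%, so step 108 would be executed
print(is_risk_clause(0.40))  # False: the clause is not processed
```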
Step 108: if the clause risk identification result contains risk clauses, highlighting the risk clauses in the contract text to be identified in a preset highlighting manner.
The preset highlighting manner may be bolding, highlighting, enlarging the font, changing the font, underlining, displaying by a special color mark, and the like.
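As a rough illustration of locating a risk clause and highlighting it, the sketch below finds each risk clause in the contract text and wraps it in marker strings. The function name and the `**` markers are hypothetical stand-ins for whichever highlighting manner is chosen, and the sketch assumes the clause appears verbatim in the text.

```python
def highlight_risk_clauses(contract_text, risk_clauses, start="**", end="**"):
    # Query the position of each risk clause and wrap it in highlight markers.
    highlighted = contract_text
    for clause in risk_clauses:
        pos = highlighted.find(clause)
        if pos == -1:
            continue  # clause not found verbatim; leave the text unchanged
        highlighted = (
            highlighted[:pos] + start + clause + end + highlighted[pos + len(clause):]
        )
    return highlighted

contract = "1. Payment is due in 30 days. 2. The contract renews automatically."
result = highlight_risk_clauses(contract, ["The contract renews automatically."])
print(result)
# 1. Payment is due in 30 days. 2. **The contract renews automatically.**
```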
In an embodiment provided by the present specification, if the contract text to be identified includes a risk clause, the risk clause may be modified or deleted, which may specifically be implemented as follows:
deleting the risk terms in the contract text to generate a new contract text; and/or
Semantic analysis is carried out on the risk clauses to generate a semantic analysis result, risk-free clauses related to the risk clauses are generated according to the semantic analysis result, and the risk-free clauses are used for replacing the risk clauses in the contract text to generate a new contract text.
Specifically, assume that the clause risk identification result output by the clause risk identification model is as follows: the risk value of the clause information "The contract takes effect from the date of signing and is valid for 2 years; after the cooperation period, if neither party objects, the contract may be extended, the number of extensions is not limited, and a contract may be signed separately." is 79%, which is greater than the preset risk threshold of 75%, so the clause is a risk clause. Semantic analysis is then performed on the clause information, and the semantic analysis result shows that the clause is an automatic renewal clause and is therefore risky. In a specific implementation, the risk clause may be deleted or modified into risk-free clause information; for example, the clause information may be modified to "The contract takes effect from the date of signing and is valid for 2 years; after the cooperation period, the two parties may sign a contract separately.", and the risk clause in the contract text is replaced with the risk-free clause to generate a new contract text.
In addition, in an embodiment provided by the present application, if the contract text to be identified includes risk clauses, the risk clauses may be graded and then processed as follows:
dividing the risk clauses into a first risk level and a second risk level according to the magnitude of the risk values of the risk clauses;
deleting the risk terms in the contract text to generate a new contract text if the risk terms belong to a first risk level;
and under the condition that the risk clauses belong to a second risk level, performing semantic analysis on the risk clauses to generate a semantic analysis result, generating risk-free clauses related to the risk clauses according to the semantic analysis result, and replacing the risk clauses in the contract text with the risk-free clauses to generate a new contract text.
Specifically, the risk clauses are divided into a first risk level and a second risk level according to the magnitude of their risk values. Risk clauses in the first risk level have larger risk values and are deleted directly; risk clauses in the second risk level have smaller risk values than those in the first risk level and can therefore be modified instead. In specific implementation, semantic analysis is performed on a risk clause in the second risk level to generate a replacement clause with consistent semantics, and the replacement clause is input into the clause risk identification model for risk identification; if the clause risk identification result output by the model shows that the risk value of the replacement clause is lower than the preset risk threshold, the replacement clause is used to replace the risk clause to generate a new contract text.
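The grading and re-check logic above can be sketched as follows. This is a hedged illustration: the text does not state where the boundary between the two risk levels lies, so `first_level_cutoff=0.9` is an assumed value, and `rewrite` and `score` are hypothetical callables standing in for the semantic analysis step and the clause risk identification model.

```python
def process_risk_clause(contract_text, clause, risk_value, rewrite, score,
                        first_level_cutoff=0.9, risk_threshold=0.75):
    # First risk level (larger risk values): delete the clause directly.
    if risk_value >= first_level_cutoff:
        return contract_text.replace(clause, "")
    # Second risk level: generate a replacement and re-score it with the model.
    candidate = rewrite(clause)
    if score(candidate) < risk_threshold:
        return contract_text.replace(clause, candidate)
    return contract_text  # replacement still risky; leave the text unchanged

text = "A. Fees are fixed. B. The term renews without limit."
clause = "B. The term renews without limit."
new_text = process_risk_clause(
    text, clause, 0.79,
    rewrite=lambda c: "B. The parties may sign a new contract.",
    score=lambda c: 0.10,
)
print(new_text)
# A. Fees are fixed. B. The parties may sign a new contract.
```

Re-scoring the replacement before using it keeps the second-level path from swapping one risky clause for another.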
In the method embodiment provided by the application, OCR technology is used to recognize the character content in the paper contract text image, so that the accuracy of the recognition result can be ensured and the processing efficiency of the contract text information can be improved. Clause risk identification is performed by inputting the clause information obtained by splitting the contract to be identified into the clause risk identification model, without manual intervention, so that the accuracy of clause risk identification can be ensured, the speed of clause risk identification can be increased, and working efficiency can be improved. In addition, training the model with labeled text helps improve the accuracy with which the clause risk identification model recognizes the clause information. The risk clauses are highlighted in the contract text in a preset highlighting manner, so that staff can conveniently review and process them, and deleting or modifying the risk clauses ensures that risks are managed and controlled in a timely manner.
In the embodiment of the present application, fig. 4 shows a contract clause risk identification method according to an embodiment of the present application, which is described by taking the contract as a financing contract as an example, and includes steps 402 to 414.
Step 402: receiving a financing contract text risk identification instruction.
Step 404: acquiring a contract text image of the paper financing contract text.
Step 406: recognizing the character content in the contract text image by using OCR technology, and taking the recognition result as the financing contract text to be identified.
Step 408: splitting the contract terms contained in the financing contract text to be identified to obtain financing term information of each contract term.
Step 410: inputting the financing term information into a pre-trained financing term risk identification model for risk identification, and acquiring the financing term risk identification result output by the financing term risk identification model.
Specifically, the training process of the model may be implemented by steps including step S1 to step S7.
Step S1: constructing a pre-training model based on the association relation between the historical financing term information in the historical contract text and the risk marking result.
Step S2: configuring parameters of the pre-training model, wherein the pre-training model comprises an input layer and an embedding layer.
Step S3: inputting a training text into the pre-training model for model pre-training, wherein the training text is an unlabeled text.
Step S4: adjusting parameters of the pre-training model to obtain the financing term risk identification pre-training model, wherein the parameters of the financing term risk identification pre-training model represent weights of a neural network.
Step S5: acquiring historical financing term information of some of the contract terms in the historical contract text.
Step S6: carrying out risk marking processing on the historical financing term information, wherein the risk label is used for indicating whether the historical financing term information is a risk term.
Step S7: inputting the historical financing term information as a training sample, with the risk marking result corresponding to the historical financing term information as the training label, into the financing term risk identification pre-training model for model training.
Step 412: judging whether the risk value of the financing term information contained in the financing term risk identification result is greater than a preset risk threshold; if so, executing step 414; if not, no processing is required.
Step 414: highlighting the risk financing terms in the financing contract text to be identified in a preset highlighting manner.
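Steps 408 to 412 can be sketched end to end as below. The sketch makes several assumptions that are not in the description: the OCR stage (steps 404 to 406) is omitted and the text is taken as already recognized, clauses are split one per line, and `toy_score` is a keyword-based stand-in for the trained financing term risk identification model. All names are hypothetical.

```python
def split_clauses(contract_text):
    # Assumed splitting rule: one clause per non-empty line.
    return [line.strip() for line in contract_text.splitlines() if line.strip()]

def identify_risk_clauses(contract_text, score, threshold=0.75):
    # Score each clause and keep those whose risk value exceeds the threshold.
    flagged = []
    for clause in split_clauses(contract_text):
        risk_value = score(clause)
        if risk_value > threshold:
            flagged.append((clause, risk_value))
    return flagged

def toy_score(clause):
    # Stand-in for the trained model: flags clauses allowing unlimited extension.
    return 0.79 if "not limited" in clause else 0.10

financing_contract = """
1. The loan amount is fixed at signing.
2. The number of extensions is not limited.
"""
flagged = identify_risk_clauses(financing_contract, toy_score)
print(flagged)
# [('2. The number of extensions is not limited.', 0.79)]
```

The flagged clauses would then be passed to the highlighting step 414.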
In one embodiment provided by the application, OCR technology is used to recognize the character content in the paper financing contract text image, so that the accuracy of the recognition result can be ensured and the processing efficiency of the financing contract text information can be improved. Financing term risk identification is performed by inputting the term information obtained by splitting the financing contract to be identified into the financing term risk identification model, without manual intervention, so that the accuracy of financing term risk identification can be ensured, the speed of financing term risk identification can be increased, and working efficiency can be improved. In addition, training the model with labeled text helps improve the accuracy with which the financing term risk identification model recognizes the financing term information. The risk financing terms are highlighted in the contract text in a preset highlighting manner, so that staff can conveniently review and process them, and deleting or modifying the risk financing terms ensures that financing risks are managed and controlled in a timely manner.
Corresponding to the above method embodiment, the present application also provides an embodiment of a contract term risk identification apparatus, and fig. 5 shows a schematic structural diagram of the contract term risk identification apparatus according to an embodiment of the present application. As shown in fig. 5, the apparatus includes:
a text to be recognized acquisition module 502 configured to acquire a contract text to be recognized;
a contract clause splitting module 504 configured to split contract clauses included in the contract text to be identified, and obtain clause information of each contract clause;
a clause risk identification module 506 configured to input the clause information into a pre-trained clause risk identification model for risk identification, and obtain a clause risk identification result output by the clause risk identification model;
a risk clause highlighting module 508 configured to highlight the risk clause in the contract text to be identified in a preset highlighting manner if the clause risk identification result includes a risk clause.
Optionally, the contract term risk identification apparatus further includes:
a first historical clause information acquisition module configured to: acquiring historical clause information of a plurality of contract clauses in a historical contract text;
a risk marking processing module configured to carry out risk marking processing on the historical clause information; wherein the risk label is used for indicating whether the historical clause information is a risk clause;
and a model training module configured to input the historical clause information as a training sample, with the risk marking result corresponding to the historical clause information as the training label, into the clause risk identification pre-training model for model training.
Optionally, the contract term risk identification apparatus further includes:
a pre-training model building module configured to construct a pre-training model based on the association relation between the historical clause information and the risk labeling result;
a model parameter configuration module configured to configure pre-training model parameters, the pre-training model comprising an input layer and an embedding layer;
the model pre-training module is configured to input a training text into the pre-training model for model pre-training, wherein the training text is an unlabeled text;
a model parameter adjusting module configured to adjust parameters of the pre-training model to obtain the clause risk identification pre-training model; wherein the parameters of the clause risk identification pre-trained model represent weights of a neural network.
Optionally, the model pre-training module includes:
a second historical clause information obtaining sub-module configured to obtain historical clause information of a plurality of contract clauses in the historical contract text;
and the model pre-training sub-module is configured to determine a target word vector of the historical clause information through the embedding layer according to a word vector dictionary, and input the target word vector into the pre-training model for pre-training of the model.
Optionally, the model pre-training sub-module is further configured to:
acquiring a pre-established word vector dictionary;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
searching word vectors corresponding to each target word unit in the historical clause information in the word vector dictionary, and combining the word vectors to generate word vectors of the historical clause information;
pre-embedding the historical clause information to obtain a sentence vector and a position vector of the historical clause information;
and performing summation operation on the word vector, the sentence vector and the position vector of the historical clause information to obtain a target word vector of the historical clause information.
Optionally, the contract term risk identification apparatus further includes:
a third history clause information acquisition module configured to acquire history clause information of a plurality of contract clauses in the history contract text;
the word segmentation processing module is configured to perform word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
and the word vector dictionary obtaining module is configured to calculate a word vector corresponding to each target word unit and construct the word vector dictionary according to the word vector corresponding to each target word unit.
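A word vector dictionary of this kind and the lookup against it can be sketched as below. The description does not specify the word segmentation algorithm or how each word vector is calculated, so whitespace splitting and sequential integer ids are assumptions made only for illustration; reserving 0 for CLS and 1 for SEP follows the reading of the fig. 2(b) example, and all function names are hypothetical.

```python
def segment(clause):
    # Placeholder word segmentation: lowercase and split on whitespace.
    return clause.lower().split()

def build_word_vector_dictionary(historical_clauses):
    # Map each target word unit to an id in order of first appearance.
    dictionary = {"[CLS]": 0, "[SEP]": 1}
    for clause in historical_clauses:
        for unit in segment(clause):
            dictionary.setdefault(unit, len(dictionary))
    return dictionary

def lookup_word_vector(clause, dictionary):
    # Look up every target word unit and frame the clause with CLS and SEP.
    units = segment(clause)
    return [dictionary["[CLS]"]] + [dictionary[u] for u in units] + [dictionary["[SEP]"]]

history = ["payment is due", "payment is late"]
dictionary = build_word_vector_dictionary(history)
print(lookup_word_vector("payment is late", dictionary))
# [0, 2, 3, 5, 1]
```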
Optionally, the contract term risk identification apparatus further includes:
the judgment module is configured to judge whether the risk value of the clause information contained in the clause risk identification result is greater than a preset risk threshold value;
if yes, operating the position query module;
the position query module is configured to determine the clause information as risk clauses and query specific positions of the risk clauses in the contract text to be identified.
Optionally, the contract term risk identification apparatus further includes:
a grading module configured to grade the risk terms into a first risk grade and a second risk grade according to the magnitude of the risk values of the risk terms;
running a first risk term deletion module if the risk terms belong to a first risk level;
the first risk clause deleting module is configured to delete the risk clauses in the contract text to generate a new contract text;
running a first semantic analysis module if the risk clause belongs to a second risk level;
the first semantic analysis module is configured to perform semantic analysis on the risk terms, generate a semantic analysis result, generate risk-free terms related to the risk terms according to the semantic analysis result, and replace the risk terms in the contract text with the risk-free terms to generate a new contract text.
Optionally, the contract term risk identification apparatus further includes:
a second risk term deletion module configured to delete the risk terms in the contract text to generate a new contract text; and/or
A second semantic analysis module configured to perform semantic analysis on the risk terms, generate a semantic analysis result, generate risk-free terms related to the risk terms according to the semantic analysis result, and replace the risk terms in the contract text with the risk-free terms to generate a new contract text.
Optionally, the to-be-recognized text obtaining module includes:
the instruction receiving submodule is configured to receive a contract text risk identification instruction;
the image acquisition sub-module is configured to acquire a contract text image of the paper contract text;
and the contract text to be recognized acquisition submodule is configured to recognize the text content in the contract text image by adopting an optical character recognition technology, and the recognition result is used as the contract text to be recognized.
In the device embodiment provided by the application, OCR technology is used to recognize the character content in the paper contract text image, so that the accuracy of the recognition result can be ensured and the processing efficiency of the contract text information can be improved. Clause risk identification is performed by inputting the clause information obtained by splitting the contract to be identified into the clause risk identification model, without manual intervention, so that the accuracy of clause risk identification can be ensured, the speed of clause risk identification can be increased, and working efficiency can be improved. In addition, training the model with labeled text helps improve the accuracy with which the clause risk identification model recognizes the clause information. The risk clauses are highlighted in the contract text in a preset highlighting manner, so that staff can conveniently review and process them, and deleting or modifying the risk clauses ensures that risks are managed and controlled in a timely manner.
Fig. 6 shows a block diagram of an electronic device 600 according to an embodiment of the present application. The components of the electronic device 600 include, but are not limited to, a memory 610 and a processor 620. The processor 620 is coupled to the memory 610 via a bus 630 and a database 650 is used to store data.
The electronic device 600 also includes an access device 640 that enables the electronic device 600 to communicate via one or more networks 660. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. Access device 640 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)) whether wired or wireless, such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In an embodiment of the present application, the above-mentioned components of the electronic device 600 and other components not shown in fig. 6 may also be connected to each other, for example by a bus. It should be understood that the block diagram of the electronic device shown in fig. 6 is for exemplary purposes only and is not intended to limit the scope of the present application. Those skilled in the art may add or replace other components as desired.
The electronic device 600 may be any type of stationary or mobile electronic device, including a mobile computer or mobile electronic device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable electronic device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary electronic device such as a desktop computer or PC. The electronic device 600 may also be a mobile or stationary server.
Wherein processor 620 is configured to execute the following computer-executable instructions:
acquiring a contract text to be identified;
splitting contract clauses contained in the contract text to be identified to obtain clause information of each contract clause;
inputting the clause information into a pre-trained clause risk identification model for risk identification, and acquiring a clause risk identification result output by the clause risk identification model;
and if the clause risk identification result contains risk clauses, highlighting the risk clauses in the contract text to be identified in a preset highlighting manner.
Optionally, the clause risk identification model is trained by:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
carrying out risk marking processing on the historical clause information; wherein the risk label is used for indicating whether the historical clause information is a risk clause;
and inputting the historical clause information as a training sample, with the risk marking result corresponding to the historical clause information as the training label, into the clause risk identification pre-training model for model training.
Optionally, the term risk identification pre-training model is pre-trained in the following manner:
constructing a pre-training model based on the association relation between the historical clause information and the risk labeling result;
configuring parameters of the pre-training model, wherein the pre-training model comprises an input layer and an embedding layer;
inputting a training text into the pre-training model to perform model pre-training, wherein the training text is a label-free text;
adjusting parameters of the pre-training model to obtain the clause risk identification pre-training model; wherein the parameters of the clause risk identification pre-trained model represent weights of a neural network.
Optionally, the inputting of the training text into the pre-training model for pre-training of the model includes:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
and determining a target word vector of the historical clause information through the embedding layer according to the word vector dictionary, and inputting the target word vector into the pre-training model for pre-training of the model.
Optionally, the determining, by the embedding layer, a target word vector of the historical term information according to a word vector dictionary includes:
acquiring a pre-established word vector dictionary;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
searching word vectors corresponding to each target word unit in the historical clause information in the word vector dictionary, and combining the word vectors to generate word vectors of the historical clause information;
pre-embedding the historical clause information to obtain a sentence vector and a position vector of the historical clause information;
and performing summation operation on the word vector, the sentence vector and the position vector of the historical clause information to obtain a target word vector of the historical clause information.
Optionally, the word vector dictionary is constructed by:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
and calculating a word vector corresponding to each target word unit, and constructing the word vector dictionary according to the word vector corresponding to each target word unit.
Optionally, after the step of obtaining the clause risk identification result output by the clause risk identification model is executed, and before the step of highlighting the risk clause in the contract text to be identified in a preset highlighting manner is executed, the method further includes:
judging whether the risk value of the clause information contained in the clause risk identification result is greater than a preset risk threshold value or not;
if so, determining the clause information as risk clauses, inquiring specific positions of the risk clauses in the contract text to be identified, and performing the step of highlighting the risk clauses in the contract text to be identified in a preset highlighting manner.
Optionally, after the step of highlighting the risk clause in the contract text to be identified in a preset highlighting manner is executed, the method further includes:
dividing the risk clauses into a first risk level and a second risk level according to the magnitude of the risk values of the risk clauses;
deleting the risk terms in the contract text to generate a new contract text if the risk terms belong to a first risk level;
and under the condition that the risk clauses belong to a second risk level, performing semantic analysis on the risk clauses to generate a semantic analysis result, generating risk-free clauses related to the risk clauses according to the semantic analysis result, and replacing the risk clauses in the contract text with the risk-free clauses to generate a new contract text.
Optionally, after the step of highlighting the risk clause in the contract text to be identified in a preset highlighting manner is executed, the method further includes:
deleting the risk terms in the contract text to generate a new contract text; and/or
Semantic analysis is carried out on the risk clauses to generate a semantic analysis result, risk-free clauses related to the risk clauses are generated according to the semantic analysis result, and the risk-free clauses are used for replacing the risk clauses in the contract text to generate a new contract text.
Optionally, the acquiring the contract text to be recognized includes:
receiving a contract text risk identification instruction;
acquiring a contract text image of a paper contract text;
and recognizing the text content in the contract text image by adopting an optical character recognition technology, and taking a recognition result as the contract text to be recognized.
Optionally, the preset highlighting manner comprises at least one of:
bold, highlight, enlarge font, change font, underline, show by special color marking.
The above is a schematic scheme of an electronic device of the present embodiment. It should be noted that the technical solution of the electronic device and the technical solution of the contract term risk identification method described above belong to the same concept, and details that are not described in detail in the technical solution of the electronic device can be referred to the description of the technical solution of the contract term risk identification method described above.
An embodiment of the present application also provides a computer readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the contract term risk identification method as described above.
Wherein the computer readable storage medium stores computer instructions for:
acquiring a contract text to be identified;
splitting contract clauses contained in the contract text to be identified to obtain clause information of each contract clause;
inputting the clause information into a pre-trained clause risk identification model for risk identification, and acquiring a clause risk identification result output by the clause risk identification model;
and if the clause risk identification result contains risk clauses, highlighting the risk clauses in the contract text to be identified in a preset highlighting manner.
Optionally, the clause risk identification model is trained by:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
carrying out risk marking processing on the historical clause information; wherein the risk label is used for indicating whether the historical clause information is a risk clause;
and inputting the historical clause information as a training sample, with the risk marking result corresponding to the historical clause information as the training label, into the clause risk identification pre-training model for model training.
Optionally, the term risk identification pre-training model is pre-trained in the following manner:
constructing a pre-training model based on the association relation between the historical clause information and the risk labeling result;
configuring parameters of the pre-training model, wherein the pre-training model comprises an input layer and an embedding layer;
inputting a training text into the pre-training model to perform model pre-training, wherein the training text is a label-free text;
adjusting parameters of the pre-training model to obtain the clause risk identification pre-training model; wherein the parameters of the clause risk identification pre-trained model represent weights of a neural network.
Optionally, the inputting of the training text into the pre-training model for pre-training of the model includes:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
and determining a target word vector of the historical clause information through the embedding layer according to the word vector dictionary, and inputting the target word vector into the pre-training model for pre-training of the model.
Optionally, the determining, by the embedding layer, a target word vector of the historical clause information according to a word vector dictionary includes:
acquiring a pre-established word vector dictionary;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
looking up, in the word vector dictionary, the word vector corresponding to each target word unit in the historical clause information, and combining these word vectors to generate the word vector of the historical clause information;
pre-embedding the historical clause information to obtain a sentence vector and a position vector of the historical clause information;
and performing summation operation on the word vector, the sentence vector and the position vector of the historical clause information to obtain a target word vector of the historical clause information.
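The summation in the last step can be illustrated with a few lines of code. This is a minimal sketch, assuming the clause has already been segmented into tokens, that every token is present in the dictionary, and that one position vector exists per token position; the function name `embed_clause` and the toy vectors are hypothetical and not taken from the application.

```python
from typing import Dict, List

Vector = List[float]


def embed_clause(tokens: List[str],
                 word_vectors: Dict[str, Vector],
                 sentence_vector: Vector,
                 position_vectors: List[Vector]) -> List[Vector]:
    """Return one target vector per token: word + sentence + position, element-wise."""
    target = []
    for position, token in enumerate(tokens):
        word = word_vectors[token]            # lookup in the word vector dictionary
        pos = position_vectors[position]      # vector for this token position
        target.append([w + s + p for w, s, p in zip(word, sentence_vector, pos)])
    return target


# Toy two-dimensional vectors, chosen only to make the arithmetic easy to follow.
word_vectors = {"late": [1.0, 0.0], "fee": [0.0, 1.0]}
sentence_vector = [0.5, 0.5]
position_vectors = [[0.0, 0.0], [1.0, 1.0]]

target = embed_clause(["late", "fee"], word_vectors, sentence_vector, position_vectors)
print(target)  # [[1.5, 0.5], [1.5, 2.5]]
```

An unknown token raises `KeyError` here; how out-of-dictionary word units are handled is not stated in the text, so the sketch leaves that open.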
Optionally, the word vector dictionary is constructed by:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
and calculating a word vector corresponding to each target word unit, and constructing the word vector dictionary according to the word vector corresponding to each target word unit.
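A possible shape for the dictionary construction is sketched below. The text does not say how each word vector is calculated, so the sketch substitutes a simple clause-level co-occurrence count purely as a placeholder; whitespace splitting likewise stands in for the unspecified word segmentation algorithm, and the name `build_word_vector_dictionary` is hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, List


def build_word_vector_dictionary(
        clauses: List[str],
        segment: Callable[[str], List[str]] = str.split) -> Dict[str, List[int]]:
    """Map each word unit to a vector of co-occurrence counts (placeholder technique)."""
    tokenized = [segment(clause.lower()) for clause in clauses]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    index = {token: i for i, token in enumerate(vocabulary)}
    vectors = {token: [0] * len(vocabulary) for token in vocabulary}
    for tokens in tokenized:
        # Count each ordered pair of distinct word units once per clause.
        for a, b in permutations(set(tokens), 2):
            vectors[a][index[b]] += 1
    return vectors


dictionary = build_word_vector_dictionary(["late fee applies", "late delivery"])
# Vocabulary order is alphabetical: applies, delivery, fee, late
print(dictionary["late"])  # [1, 1, 1, 0]
print(dictionary["fee"])   # [1, 0, 0, 1]
```

A trained embedding method could replace the counting loop without changing the dictionary interface, which is a plain mapping from word unit to vector.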
Optionally, after the step of acquiring the clause risk identification result output by the clause risk identification model is executed, and before the step of highlighting the risk clauses in the contract text to be identified in a preset highlighting manner is executed, the method further includes:
judging whether the risk value of the clause information contained in the clause risk identification result is greater than a preset risk threshold;
if so, determining the clause information to be a risk clause, locating the specific position of the risk clause in the contract text to be identified, and performing the step of highlighting the risk clauses in the contract text to be identified in a preset highlighting manner.
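The threshold check and position lookup can be sketched as follows. This is an illustrative sketch only: the threshold value `0.5`, the `(clause, risk value)` pair layout, and the function name `locate_risk_clauses` are assumptions, and positions are reported as character offsets because the text does not define what form a "specific position" takes.

```python
from typing import List, Tuple


def locate_risk_clauses(contract_text: str,
                        scored_clauses: List[Tuple[str, float]],
                        threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (start, end, risk value) for every clause whose risk value exceeds the threshold."""
    hits = []
    for clause, risk_value in scored_clauses:
        if risk_value > threshold:
            start = contract_text.find(clause)   # first occurrence only
            if start != -1:
                hits.append((start, start + len(clause), risk_value))
    return hits


contract = "A pays B. B waives all claims."
scored = [("A pays B.", 0.1), ("B waives all claims.", 0.9)]
hits = locate_risk_clauses(contract, scored)
print(hits)  # [(10, 30, 0.9)]
```

The returned offsets can be handed directly to whatever component applies the highlighting.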
Optionally, after the step of highlighting the risk clause in the contract text to be identified in a preset highlighting manner is executed, the method further includes:
dividing the risk clauses into a first risk level and a second risk level according to the magnitude of the risk values of the risk clauses;
deleting the risk clauses from the contract text to generate a new contract text if the risk clauses belong to the first risk level;
and if the risk clauses belong to the second risk level, performing semantic analysis on the risk clauses to generate a semantic analysis result, generating risk-free clauses related to the risk clauses according to the semantic analysis result, and replacing the risk clauses in the contract text with the risk-free clauses to generate a new contract text.
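The two-level handling can be sketched as below. Several things here are assumptions rather than statements from the text: the first risk level is taken to be the higher risk values (at or above `level_boundary`), the contract is handled as a list of clauses, and `rewrite` is a hypothetical callable standing in for the semantic analysis and risk-free clause generation, which the text does not detail.

```python
from typing import Callable, Dict, List


def remediate(clauses: List[str],
              risk_by_clause: Dict[str, float],
              level_boundary: float,
              rewrite: Callable[[str], str]) -> str:
    """Delete first-level risk clauses, replace second-level ones, keep the rest."""
    new_clauses = []
    for clause in clauses:
        risk_value = risk_by_clause.get(clause)
        if risk_value is None:
            new_clauses.append(clause)            # not a risk clause: keep unchanged
        elif risk_value >= level_boundary:
            continue                              # first risk level (assumed higher): delete
        else:
            new_clauses.append(rewrite(clause))   # second risk level: replace
    return "\n".join(new_clauses)


clauses = [
    "Party A delivers the goods.",
    "Party B bears unlimited liability.",
    "Late payment incurs a 50% daily penalty.",
]
risks = {clauses[1]: 0.95, clauses[2]: 0.7}


def toy_rewrite(clause: str) -> str:
    # Hypothetical stand-in for semantic analysis plus risk-free clause generation.
    return "Late payment incurs a penalty agreed by both parties."


new_contract = remediate(clauses, risks, level_boundary=0.9, rewrite=toy_rewrite)
print(new_contract)
```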
Optionally, after the step of highlighting the risk clause in the contract text to be identified in a preset highlighting manner is executed, the method further includes:
deleting the risk clauses from the contract text to generate a new contract text; and/or
performing semantic analysis on the risk clauses to generate a semantic analysis result, generating risk-free clauses related to the risk clauses according to the semantic analysis result, and replacing the risk clauses in the contract text with the risk-free clauses to generate a new contract text.
Optionally, the acquiring the contract text to be identified includes:
receiving a contract text risk identification instruction;
acquiring a contract text image of a paper contract text;
and recognizing the text content in the contract text image by using optical character recognition, and taking the recognition result as the contract text to be identified.
Optionally, the display effect of the preset highlighting manner comprises at least one of:
bold, highlight, enlarged font, changed font, underline, and marking in a special color.
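One way to picture these display effects is as a mapping from effect name to markup. The sketch below assumes the contract is rendered as HTML, which the text does not state; the `EFFECT_TAGS` table, the chosen tags, and the function name `render_risk_clause` are all hypothetical.

```python
import html
from typing import Dict, List, Tuple

# Hypothetical mapping from a display effect to an HTML open/close tag pair.
EFFECT_TAGS: Dict[str, Tuple[str, str]] = {
    "bold": ("<b>", "</b>"),
    "underline": ("<u>", "</u>"),
    "highlight": ("<mark>", "</mark>"),
    "special color": ('<span style="color:red">', "</span>"),
}


def render_risk_clause(clause: str, effects: List[str]) -> str:
    """Escape the clause text, then wrap it in the tags for each requested effect."""
    text = html.escape(clause)
    for effect in effects:
        start_tag, end_tag = EFFECT_TAGS[effect]
        text = f"{start_tag}{text}{end_tag}"
    return text


rendered = render_risk_clause("A & B waive all claims.", ["bold", "underline"])
print(rendered)  # <u><b>A &amp; B waive all claims.</b></u>
```

Effects are applied in list order, so the last effect named ends up as the outermost tag.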
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the contract term risk identification method described above belong to the same concept, and for details that are not described in detail in the technical solution of the storage medium, reference may be made to the description of the technical solution of the contract term risk identification method described above.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals under such legislation and patent practice.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of combined acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, since some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules involved are not necessarily required by this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.

Claims (18)

1. A contract term risk identification method, comprising:
acquiring a contract text to be identified;
splitting contract clauses contained in the contract text to be identified to obtain clause information of each contract clause;
inputting the clause information into a pre-trained clause risk identification model for risk identification, and acquiring a clause risk identification result output by the clause risk identification model;
and if the clause risk identification result contains risk clauses, highlighting the risk clauses in the contract text to be identified in a preset highlighting manner.
2. The contract term risk identification method of claim 1, wherein the term risk identification model is trained by:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
performing risk labeling on the historical clause information, wherein the risk label indicates whether the historical clause information is a risk clause;
and performing model training by inputting the historical clause information as a training sample, with the risk labeling result corresponding to the historical clause information as a training label, into a clause risk identification pre-training model.
3. The contract term risk identification method of claim 2, wherein the clause risk identification pre-training model is pre-trained in the following manner:
constructing a pre-training model based on the association between the historical clause information and the risk labeling result;
configuring parameters of the pre-training model, wherein the pre-training model comprises an input layer and an embedding layer;
inputting a training text into the pre-training model to perform model pre-training, wherein the training text is unlabeled text;
adjusting parameters of the pre-training model to obtain the clause risk identification pre-training model; wherein the parameters of the clause risk identification pre-training model represent weights of a neural network.
4. The contract term risk identification method of claim 3, wherein the inputting of the training text into the pre-training model for model pre-training comprises:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
and determining a target word vector of the historical clause information through the embedding layer according to the word vector dictionary, and inputting the target word vector into the pre-training model for pre-training of the model.
5. The contract term risk identification method according to claim 4, wherein the determining, by the embedding layer, a target word vector of the historical clause information according to a word vector dictionary comprises:
acquiring a pre-established word vector dictionary;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
looking up, in the word vector dictionary, the word vector corresponding to each target word unit in the historical clause information, and combining these word vectors to generate the word vector of the historical clause information;
pre-embedding the historical clause information to obtain a sentence vector and a position vector of the historical clause information;
and performing summation operation on the word vector, the sentence vector and the position vector of the historical clause information to obtain a target word vector of the historical clause information.
6. The contract term risk identification method of claim 5, wherein the word vector dictionary is constructed by:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
and calculating a word vector corresponding to each target word unit, and constructing the word vector dictionary according to the word vector corresponding to each target word unit.
7. The contract term risk identification method according to claim 1, wherein after the step of acquiring the clause risk identification result output by the clause risk identification model is executed, and before the step of highlighting the risk clauses in the contract text to be identified in a preset highlighting manner is executed, the method further comprises:
judging whether the risk value of the clause information contained in the clause risk identification result is greater than a preset risk threshold;
if so, determining the clause information to be a risk clause, locating the specific position of the risk clause in the contract text to be identified, and performing the step of highlighting the risk clauses in the contract text to be identified in a preset highlighting manner.
8. The contract term risk identification method according to claim 7, wherein after the step of highlighting the risk term in the contract text to be identified by a preset highlighting manner is performed, the method further comprises:
dividing the risk clauses into a first risk level and a second risk level according to the magnitude of the risk values of the risk clauses;
deleting the risk clauses from the contract text to generate a new contract text if the risk clauses belong to the first risk level;
and if the risk clauses belong to the second risk level, performing semantic analysis on the risk clauses to generate a semantic analysis result, generating risk-free clauses related to the risk clauses according to the semantic analysis result, and replacing the risk clauses in the contract text with the risk-free clauses to generate a new contract text.
9. The contract term risk identification method according to claim 1, wherein after the step of highlighting the risk term in the contract text to be identified by a preset highlighting manner is performed, the method further comprises:
deleting the risk clauses from the contract text to generate a new contract text; and/or
performing semantic analysis on the risk clauses to generate a semantic analysis result, generating risk-free clauses related to the risk clauses according to the semantic analysis result, and replacing the risk clauses in the contract text with the risk-free clauses to generate a new contract text.
10. The contract term risk identification method according to claim 1, wherein the acquiring the contract text to be identified comprises:
receiving a contract text risk identification instruction;
acquiring a contract text image of a paper contract text;
and recognizing the text content in the contract text image by using optical character recognition, and taking the recognition result as the contract text to be identified.
11. The contract term risk identification method according to claim 1, wherein the display effect of the preset highlighting manner includes at least one of:
bold, highlight, enlarged font, changed font, underline, and marking in a special color.
12. A contract term risk identification apparatus, comprising:
the text to be recognized acquisition module is configured to acquire a contract text to be recognized;
the contract clause splitting module is configured to split contract clauses contained in the contract text to be identified to obtain clause information of each contract clause;
the clause risk identification module is configured to input the clause information into a pre-trained clause risk identification model for risk identification, and acquire a clause risk identification result output by the clause risk identification model;
and the risk clause highlighting module is configured to highlight and display the risk clause in the contract text to be identified in a preset highlighting mode if the clause risk identification result contains risk clauses.
13. The contract term risk identification apparatus of claim 12, further comprising:
a first historical clause information acquisition module configured to: acquiring historical clause information of a plurality of contract clauses in a historical contract text;
a risk labeling module configured to perform risk labeling on the historical clause information, wherein the risk label indicates whether the historical clause information is a risk clause;
and a model training module configured to perform model training by inputting the historical clause information as a training sample, with the risk labeling result corresponding to the historical clause information as a training label, into a clause risk identification pre-training model.
14. The contract term risk identification apparatus of claim 13, further comprising:
a pre-training model building module configured to construct a pre-training model based on the association between the historical clause information and the risk labeling result;
a model parameter configuration module configured to configure pre-training model parameters, the pre-training model comprising an input layer, an embedding layer;
the model pre-training module is configured to input a training text into the pre-training model for model pre-training, wherein the training text is an unlabeled text;
a model parameter adjusting module configured to adjust parameters of the pre-training model to obtain the clause risk identification pre-training model; wherein the parameters of the clause risk identification pre-trained model represent weights of a neural network.
15. The contract term risk identification apparatus of claim 14, wherein the model pre-training module comprises:
a second historical clause information obtaining sub-module configured to obtain historical clause information of a plurality of contract clauses in the historical contract text;
and the model pre-training sub-module is configured to determine a target word vector of the historical clause information through the embedding layer according to a word vector dictionary, and input the target word vector into the pre-training model for pre-training of the model.
16. The contract term risk identification apparatus of claim 15, wherein the model pre-training sub-module is further configured to:
acquiring a pre-established word vector dictionary;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
looking up, in the word vector dictionary, the word vector corresponding to each target word unit in the historical clause information, and combining these word vectors to generate the word vector of the historical clause information;
pre-embedding the historical clause information to obtain a sentence vector and a position vector of the historical clause information;
and performing summation operation on the word vector, the sentence vector and the position vector of the historical clause information to obtain a target word vector of the historical clause information.
17. An electronic device, comprising:
a memory, a processor;
the memory is to store computer-executable instructions, and the processor is to execute the computer-executable instructions to:
acquiring a contract text to be identified;
splitting contract clauses contained in the contract text to be identified to obtain clause information of each contract clause;
inputting the clause information into a pre-trained clause risk identification model for risk identification, and acquiring a clause risk identification result output by the clause risk identification model;
and if the clause risk identification result contains risk clauses, highlighting the risk clauses in the contract text to be identified in a preset highlighting manner.
18. A computer readable storage medium storing computer instructions, which when executed by a processor, perform the steps of the contract term risk identification method of any one of claims 1-11.
CN201910797847.XA 2019-08-27 2019-08-27 Contract clause risk identification method and device Pending CN110705265A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910797847.XA CN110705265A (en) 2019-08-27 2019-08-27 Contract clause risk identification method and device

Publications (1)

Publication Number Publication Date
CN110705265A true CN110705265A (en) 2020-01-17

Family

ID=69193768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910797847.XA Pending CN110705265A (en) 2019-08-27 2019-08-27 Contract clause risk identification method and device

Country Status (1)

Country Link
CN (1) CN110705265A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447105A (en) * 2018-09-10 2019-03-08 平安科技(深圳)有限公司 Contract audit method, apparatus, computer equipment and storage medium
CN109918635A (en) * 2017-12-12 2019-06-21 中兴通讯股份有限公司 A kind of contract text risk checking method, device, equipment and storage medium
CN110059924A (en) * 2019-03-13 2019-07-26 平安城市建设科技(深圳)有限公司 Checking method, device, equipment and the computer readable storage medium of contract terms
CN110147981A (en) * 2019-04-12 2019-08-20 深圳壹账通智能科技有限公司 Contract Risk checking method, device and terminal device based on text analyzing
CN110163478A (en) * 2019-04-18 2019-08-23 平安科技(深圳)有限公司 A kind of the risk checking method and device of contract terms

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310437A (en) * 2020-02-29 2020-06-19 重庆百事得大牛机器人有限公司 Robot-based counselor link system
CN111310437B (en) * 2020-02-29 2020-12-15 重庆百事得大牛机器人有限公司 Robot-based counselor link system
CN113393201A (en) * 2020-03-12 2021-09-14 阿里巴巴集团控股有限公司 Contract processing system and method and electronic equipment
CN113469479A (en) * 2020-03-31 2021-10-01 阿里巴巴集团控股有限公司 Contract risk prediction method and device
CN111783781A (en) * 2020-05-22 2020-10-16 平安国际智慧城市科技股份有限公司 Malicious clause identification method, device and equipment based on product agreement character identification
CN111783781B (en) * 2020-05-22 2024-04-05 深圳赛安特技术服务有限公司 Malicious term recognition method, device and equipment based on product agreement character recognition
WO2021232593A1 (en) * 2020-05-22 2021-11-25 平安国际智慧城市科技股份有限公司 Product protocol character recognition-based method and apparatus for recognizing malicious terms, and device
CN111666408A (en) * 2020-05-26 2020-09-15 中国工商银行股份有限公司 Method and device for screening and displaying important clauses
CN111832300A (en) * 2020-07-24 2020-10-27 中国联合网络通信集团有限公司 Contract auditing method and device based on deep learning
CN111932412A (en) * 2020-09-04 2020-11-13 汪宏杰 Contract drafting and revising method, device, storage medium and equipment
CN112232088A (en) * 2020-11-19 2021-01-15 京北方信息技术股份有限公司 Contract clause risk intelligent identification method and device, electronic equipment and storage medium
CN112632989A (en) * 2020-12-29 2021-04-09 中国农业银行股份有限公司 Method, device and equipment for prompting risk information in contract text
CN112632989B (en) * 2020-12-29 2023-11-03 中国农业银行股份有限公司 Method, device and equipment for prompting risk information in contract text
CN112668899A (en) * 2020-12-31 2021-04-16 无锡软美信息科技有限公司 Contract risk identification method and device based on artificial intelligence
CN113779640A (en) * 2021-09-01 2021-12-10 北京橙色云科技有限公司 Contract signing method, contract signing device and storage medium
CN115080924A (en) * 2022-07-25 2022-09-20 南开大学 Software license clause extraction method based on natural language understanding
CN117151096A (en) * 2023-09-05 2023-12-01 江苏群杰物联科技有限公司 Intelligent contract checking method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110705265A (en) Contract clause risk identification method and device
CN109685056B (en) Method and device for acquiring document information
CN110781672B (en) Question bank production method and system based on machine intelligence
Alotaibi et al. Optical character recognition for quranic image similarity matching
CN110209802B (en) Method and device for extracting abstract text
CN110633577A (en) Text desensitization method and device
US10963647B2 (en) Predicting probability of occurrence of a string using sequence of vectors
CN113961685A (en) Information extraction method and device
CN111783471A (en) Semantic recognition method, device, equipment and storage medium of natural language
CN114090776A (en) Document analysis method, system and device
CN112860896A (en) Corpus generalization method and man-machine conversation emotion analysis method for industrial field
CN114297987B (en) Document information extraction method and system based on text classification and reading understanding
CN111462752A (en) Client intention identification method based on attention mechanism, feature embedding and BI-L STM
Tymoshenko et al. Real-Time Ukrainian Text Recognition and Voicing.
CN114780582A (en) Natural answer generating system and method based on form question and answer
CN114548072A (en) Automatic content analysis and information evaluation method and system for contract files
CN113642569A (en) Unstructured data document processing method and related equipment
CN114077655A (en) Method and device for training answer extraction model
CN115470790A (en) Method and device for identifying named entities in file
CN114139545A (en) Information extraction method and device
CN110555431B (en) Image recognition method and device
CN111046934B (en) SWIFT message soft clause recognition method and device
CN114398482A (en) Dictionary construction method and device, electronic equipment and storage medium
Thomas et al. Extracting Key-Value Pairs in Business Documents
Ngo et al. A Two-Phase Framework for Automated Information Extraction From Curriculum Vitae

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201012

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201012

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

TA01 Transfer of patent application right