Disclosure of Invention
In view of this, embodiments of the present application provide a contract term risk identification method. The application also relates to a contract term risk identification apparatus, an electronic device, and a computer-readable storage medium, which are intended to address the technical defects in the prior art.
According to a first aspect of the embodiments of the present application, there is provided a contract term risk identification method, including:
acquiring a contract text to be identified;
splitting contract clauses contained in the contract text to be identified to obtain clause information of each contract clause;
inputting the clause information into a pre-trained clause risk identification model for risk identification, and acquiring a clause risk identification result output by the clause risk identification model;
and if the clause risk identification result contains a risk clause, highlighting the risk clause in the contract text to be identified in a preset highlighting manner.
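The four steps above can be sketched as a small pipeline. This is a minimal illustration, not the claimed implementation: the one-clause-per-line splitting rule, the `**` marker used as the highlighting manner, and the keyword-based `toy_model` standing in for the trained clause risk identification model are all assumptions made for the example.

```python
from typing import Callable, List


def split_clauses(contract_text: str) -> List[str]:
    """Split the contract into clauses (assumption: one clause per non-empty line)."""
    return [line.strip() for line in contract_text.splitlines() if line.strip()]


def find_risk_clauses(clauses: List[str], is_risky: Callable[[str], bool]) -> List[str]:
    """Run every clause through the model and keep the clauses it flags as risky."""
    return [clause for clause in clauses if is_risky(clause)]


def highlight(contract_text: str, risk_clauses: List[str], marker: str = "**") -> str:
    """Wrap each risk clause in a marker (assumption: a text marker stands in for bolding)."""
    for clause in risk_clauses:
        contract_text = contract_text.replace(clause, f"{marker}{clause}{marker}")
    return contract_text


def toy_model(clause: str) -> bool:
    """Hypothetical stand-in for the pre-trained clause risk identification model."""
    return "automatically renew" in clause.lower()


contract = (
    "1. This contract takes effect on the date of signing.\n"
    "2. This contract shall automatically renew without limit.\n"
    "3. Either party may terminate with 30 days' notice."
)
clauses = split_clauses(contract)
risk_clauses = find_risk_clauses(clauses, toy_model)
highlighted = highlight(contract, risk_clauses)
print(highlighted)
```

Passing the model in as a callable keeps the splitting and highlighting steps independent of how the model is trained.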
Optionally, the clause risk identification model is trained by:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
performing risk labeling on the historical clause information, wherein the risk label indicates whether the historical clause information is a risk clause;
and inputting the historical clause information as training samples, with the corresponding risk labeling results as training labels, into a clause risk identification pre-training model for model training.
Optionally, the clause risk identification pre-training model is pre-trained in the following manner:
constructing a pre-training model based on the association between the historical clause information and the risk labeling result;
configuring parameters of the pre-training model, wherein the pre-training model comprises an input layer and an embedding layer;
inputting a training text into the pre-training model for model pre-training, wherein the training text is unlabeled text;
adjusting parameters of the pre-training model to obtain the clause risk identification pre-training model; wherein the parameters of the clause risk identification pre-trained model represent weights of a neural network.
Optionally, the inputting of the training text into the pre-training model for pre-training of the model includes:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
and determining, through the embedding layer and according to a word vector dictionary, a target word vector of the historical clause information, and inputting the target word vector into the pre-training model for model pre-training.
Optionally, determining, through the embedding layer and according to the word vector dictionary, the target word vector of the historical clause information includes:
acquiring a pre-established word vector dictionary;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
looking up, in the word vector dictionary, the word vector corresponding to each target word unit in the historical clause information, and combining these word vectors to generate the word vector of the historical clause information;
pre-embedding the historical clause information to obtain a sentence vector and a position vector of the historical clause information;
and performing summation operation on the word vector, the sentence vector and the position vector of the historical clause information to obtain a target word vector of the historical clause information.
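The lookup and summation steps above can be illustrated as follows. This is a sketch under stated assumptions: the two-dimensional toy vectors, the per-token position vectors, and the helper names `add_vectors` and `target_word_vectors` are hypothetical, and the source does not specify how the sentence and position vectors are produced by the pre-embedding step.

```python
from typing import Dict, List

Vector = List[float]


def add_vectors(*vectors: Vector) -> Vector:
    """Element-wise sum of vectors that share the same dimension."""
    return [sum(components) for components in zip(*vectors)]


def target_word_vectors(
    tokens: List[str],
    word_vector_dictionary: Dict[str, Vector],
    sentence_vector: Vector,
    position_vectors: List[Vector],
) -> List[Vector]:
    """For each word unit, sum its word vector, the sentence vector, and its position vector."""
    return [
        add_vectors(word_vector_dictionary[token], sentence_vector, position_vectors[i])
        for i, token in enumerate(tokens)
    ]


# Toy values chosen for illustration only.
word_vector_dictionary = {"contract": [1.0, 0.0], "renews": [0.0, 1.0]}
sentence_vector = [0.5, 0.5]
position_vectors = [[0.0, 0.0], [0.25, 0.25]]

tokens = ["contract", "renews"]
result = target_word_vectors(tokens, word_vector_dictionary, sentence_vector, position_vectors)
print(result)  # [[1.5, 0.5], [0.75, 1.75]]
```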
Optionally, the word vector dictionary is constructed by:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
and calculating a word vector corresponding to each target word unit, and constructing the word vector dictionary according to the word vector corresponding to each target word unit.
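A word vector dictionary of this kind could be built along the following lines. The source does not say which word segmentation algorithm is used or how each word vector is calculated, so both are placeholders here: `segment` uses a simple regular expression, and the vectors are seeded random initial values such as might be refined later during training.

```python
import random
import re
from typing import Dict, List


def segment(clause: str) -> List[str]:
    """Assumption: regex tokenization stands in for the unspecified word segmentation algorithm."""
    return re.findall(r"\w+", clause.lower())


def build_word_vector_dictionary(
    historical_clauses: List[str], dim: int = 4, seed: int = 0
) -> Dict[str, List[float]]:
    """Map each distinct word unit to a vector (placeholder: seeded random initialization)."""
    rng = random.Random(seed)
    dictionary: Dict[str, List[float]] = {}
    for clause in historical_clauses:
        for unit in segment(clause):
            if unit not in dictionary:
                dictionary[unit] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return dictionary


historical_clauses = [
    "The contract takes effect on signing",
    "The contract may be renewed",
]
dictionary = build_word_vector_dictionary(historical_clauses)
print(sorted(dictionary))
```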
Optionally, after the step of acquiring the clause risk identification result output by the clause risk identification model and before the step of highlighting the risk clause in the contract text to be identified in a preset highlighting manner, the method further includes:
determining whether the risk value of the clause information contained in the clause risk identification result is greater than a preset risk threshold;
if so, determining the clause information to be a risk clause, querying the position of the risk clause in the contract text to be identified, and performing the step of highlighting the risk clause in the contract text to be identified in a preset highlighting manner.
Optionally, after the step of highlighting the risk clause in the contract text to be identified in a preset highlighting manner is executed, the method further includes:
dividing the risk clauses into a first risk level and a second risk level according to the magnitude of the risk values of the risk clauses;
deleting the risk clauses from the contract text to generate a new contract text if the risk clauses belong to the first risk level;
and if the risk clauses belong to the second risk level, performing semantic analysis on the risk clauses to generate a semantic analysis result, generating risk-free clauses related to the risk clauses according to the semantic analysis result, and replacing the risk clauses in the contract text with the risk-free clauses to generate a new contract text.
Optionally, after the step of highlighting the risk clause in the contract text to be identified in a preset highlighting manner is executed, the method further includes:
deleting the risk clauses from the contract text to generate a new contract text; and/or
performing semantic analysis on the risk clauses to generate a semantic analysis result, generating risk-free clauses related to the risk clauses according to the semantic analysis result, and replacing the risk clauses in the contract text with the risk-free clauses to generate a new contract text.
Optionally, acquiring the contract text to be identified includes:
receiving a contract text risk identification instruction;
acquiring a contract text image of a paper contract text;
and recognizing the text content in the contract text image using optical character recognition (OCR), and taking the recognition result as the contract text to be identified.
Optionally, the preset highlighting manner comprises at least one of:
bolding, highlighting, enlarging the font, changing the font, underlining, and marking with a special color.
According to another aspect of the embodiments of the present application, there is provided a contract term risk identification apparatus, including:
the text to be recognized acquisition module is configured to acquire a contract text to be identified;
the contract clause splitting module is configured to split contract clauses contained in the contract text to be identified to obtain clause information of each contract clause;
the clause risk identification module is configured to input the clause information into a pre-trained clause risk identification model for risk identification, and acquire a clause risk identification result output by the clause risk identification model;
and the risk clause highlighting module is configured to highlight and display the risk clause in the contract text to be identified in a preset highlighting mode if the clause risk identification result contains risk clauses.
Optionally, the contract term risk identification apparatus further includes:
a first historical clause information acquisition module configured to: acquiring historical clause information of a plurality of contract clauses in a historical contract text;
a risk labeling module configured to perform risk labeling on the historical clause information, wherein the risk label indicates whether the historical clause information is a risk clause;
and a model training module configured to input the historical clause information as training samples, with the corresponding risk labeling results as training labels, into a clause risk identification pre-training model for model training.
Optionally, the contract term risk identification apparatus further includes:
a pre-training model building module configured to construct a pre-training model based on the association between the historical clause information and the risk labeling result;
a model parameter configuration module configured to configure parameters of the pre-training model, the pre-training model comprising an input layer and an embedding layer;
the model pre-training module is configured to input a training text into the pre-training model for model pre-training, wherein the training text is an unlabeled text;
a model parameter adjusting module configured to adjust parameters of the pre-training model to obtain the clause risk identification pre-training model; wherein the parameters of the clause risk identification pre-trained model represent weights of a neural network.
Optionally, the model pre-training module includes:
a second historical clause information obtaining sub-module configured to obtain historical clause information of a plurality of contract clauses in the historical contract text;
and the model pre-training sub-module is configured to determine a target word vector of the historical clause information through the embedding layer according to a word vector dictionary, and input the target word vector into the pre-training model for pre-training of the model.
Optionally, the model pre-training sub-module is further configured to:
acquire a pre-established word vector dictionary;
perform word segmentation on the historical clause information using a word segmentation algorithm to obtain a plurality of target word units;
look up, in the word vector dictionary, the word vector corresponding to each target word unit in the historical clause information, and combine these word vectors to generate the word vector of the historical clause information;
pre-embed the historical clause information to obtain a sentence vector and a position vector of the historical clause information;
and sum the word vector, the sentence vector, and the position vector of the historical clause information to obtain a target word vector of the historical clause information.
According to another aspect of the embodiments of the present application, there is provided an electronic device, including a memory, a processor, and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the contract term risk identification method when executing the instructions.
According to another aspect of embodiments of the present application, there is provided a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the contract term risk identification method.
In the embodiments of the application, clause risk identification is performed by inputting the clause information obtained by splitting the contract to be identified into the clause risk identification model, without manual intervention. This can ensure the accuracy of clause risk identification, increase its speed, and improve working efficiency. The risk clauses are highlighted in the contract text in a preset highlighting manner, so that staff can conveniently review and handle them and carry out risk management and control in a timely manner.
In a specific implementation, the number of historical contract texts acquired in the training stage of the clause risk identification pre-training model is less than or equal to the number of historical contract texts acquired in the model pre-training stage. In the model training stage, the clause information in the historical contract texts must be labeled, that is, marked as risky or not risky.
Specifically, after the clause risk identification model is obtained through training, a model application stage can be entered, and whether the contract clause to be identified has a risk can be judged by inputting the contract clause to be identified into the clause risk identification model.
In an embodiment provided by the present application, after the clause risk identification result output by the clause risk identification model is obtained, it is determined whether the risk value of the clause information contained in the clause risk identification result is greater than a preset risk threshold;
if so, the clause information is determined to be a risk clause, the position of the risk clause in the contract text to be identified is queried, and the step of highlighting the risk clause in the contract text to be identified in a preset highlighting manner is performed.
Specifically, assume that the clause information input into the clause risk identification model is "The contract takes effect from the date of signing and is valid for 2 years; after the cooperation period expires, the contract may be extended if neither party objects, the number of extensions is not limited, and a new contract may be signed separately.", and that the clause risk identification result output by the clause risk identification model shows a risk value of 79% for this clause information. If the preset risk threshold is 75%, the risk value of the clause information is greater than the preset risk threshold, which indicates a high probability that the clause information carries a risk, and step 108 is executed. If the risk value of the clause information is smaller than the preset risk threshold, the probability that the clause information carries a risk is low, and the clause information is not processed.
Step 108: if the clause risk identification result contains a risk clause, highlighting the risk clause in the contract text to be identified in a preset highlighting manner.
The preset highlighting manner may be bolding, highlighting, enlarging the font, changing the font, underlining, displaying by a special color mark, and the like.
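The threshold check and position query that precede step 108 can be sketched as below, using the 79% risk value and 75% threshold from the example. The function name `locate_risk_clause`, the use of character offsets as the "position", and the treatment of a risk value exactly equal to the threshold as not risky are assumptions; the source only states the greater-than and smaller-than cases.

```python
from typing import Optional, Tuple


def locate_risk_clause(
    contract_text: str, clause: str, risk_value: float, threshold: float
) -> Optional[Tuple[int, int]]:
    """Return the (start, end) character span of the clause if it exceeds the threshold.

    Returns None when the clause is not treated as risky or cannot be found.
    """
    if risk_value <= threshold:  # assumption: equality counts as "not risky"
        return None
    start = contract_text.find(clause)
    if start == -1:
        return None
    return start, start + len(clause)


contract_text = (
    "1. Payment is due within 30 days. "
    "2. The contract may be extended without limit."
)
clause = "2. The contract may be extended without limit."

span = locate_risk_clause(contract_text, clause, risk_value=0.79, threshold=0.75)
print(span)
```

The returned span can then be handed to whatever component applies the preset highlighting manner.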
In an embodiment provided by the present specification, if the contract text to be identified includes a risk clause, the risk clause may be modified or deleted. Specifically, this may be implemented by:
deleting the risk clauses from the contract text to generate a new contract text; and/or
performing semantic analysis on the risk clauses to generate a semantic analysis result, generating risk-free clauses related to the risk clauses according to the semantic analysis result, and replacing the risk clauses in the contract text with the risk-free clauses to generate a new contract text.
Specifically, assume that the clause risk identification result output by the clause risk identification model is as follows: the clause information "The contract takes effect from the date of signing and is valid for 2 years; after the cooperation period expires, the contract may be extended if neither party objects, the number of extensions is not limited, and a new contract may be signed separately." has a risk value of 79%, which is greater than the preset risk threshold of 75%, so the clause is a risk clause. Semantic analysis is then performed on the clause information, and the semantic analysis result shows that the clause is an automatic renewal clause and therefore carries a risk. In a specific implementation, the risk clause may be deleted or modified into risk-free clause information; for example, the clause information may be modified to "The contract takes effect from the date of signing and is valid for 2 years; after the cooperation period expires, the two parties may sign a new contract separately.", and the risk-free clause then replaces the risk clause in the contract text to generate a new contract text.
In addition, in an embodiment provided by the present application, if the contract text to be identified includes risk clauses, the risk clauses may be graded and then processed as follows:
dividing the risk clauses into a first risk level and a second risk level according to the magnitude of the risk values of the risk clauses;
deleting the risk clauses from the contract text to generate a new contract text if the risk clauses belong to the first risk level;
and if the risk clauses belong to the second risk level, performing semantic analysis on the risk clauses to generate a semantic analysis result, generating risk-free clauses related to the risk clauses according to the semantic analysis result, and replacing the risk clauses in the contract text with the risk-free clauses to generate a new contract text.
Specifically, the risk clauses are divided into a first risk level and a second risk level according to the magnitude of their risk values. Risk clauses in the first risk level have larger risk values and are deleted directly; risk clauses in the second risk level have smaller risk values than those in the first risk level and can therefore be modified. In a specific implementation, semantic analysis is performed on a risk clause in the second risk level to generate a replaceable clause with consistent semantics, and the replaceable clause is input into the clause risk identification model for risk identification. If the clause risk identification result output by the model shows that the risk value of the replaceable clause is lower than the preset risk threshold, the replaceable clause replaces the risk clause to generate a new contract text.
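The two-level handling described above can be sketched as follows. Several details are assumptions made for illustration: the `level_boundary` value that separates the two risk levels (the source gives no number), the `rewrite` and `score` callables standing in for the semantic analysis and the clause risk identification model, and the choice to leave the text unchanged when the replacement clause still scores at or above the threshold.

```python
from typing import Callable


def revise_contract(
    contract_text: str,
    clause: str,
    risk_value: float,
    level_boundary: float,
    threshold: float,
    rewrite: Callable[[str], str],
    score: Callable[[str], float],
) -> str:
    """Delete first-level risk clauses; replace second-level ones after re-scoring."""
    if risk_value >= level_boundary:
        # First risk level: the clause is deleted directly.
        return contract_text.replace(clause, "").strip()
    # Second risk level: generate a replacement and check it with the model again.
    candidate = rewrite(clause)
    if score(candidate) < threshold:
        return contract_text.replace(clause, candidate)
    # Assumption: keep the original text when no acceptable replacement is found.
    return contract_text


def toy_rewrite(clause: str) -> str:
    """Hypothetical stand-in for semantic analysis and risk-free clause generation."""
    return "B. The parties may sign a new contract after the term."


def toy_score(clause: str) -> float:
    """Hypothetical stand-in for the clause risk identification model."""
    return 0.9 if "automatically" in clause else 0.1


contract = "A. Fees are fixed. B. The term renews automatically without limit."
clause = "B. The term renews automatically without limit."

second_level = revise_contract(contract, clause, 0.79, 0.90, 0.75, toy_rewrite, toy_score)
first_level = revise_contract(contract, clause, 0.95, 0.90, 0.75, toy_rewrite, toy_score)
print(second_level)
print(first_level)
```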
In the method embodiment provided by the application, OCR is used to recognize the text content in the image of the paper contract text, which can ensure the accuracy of the recognition result and improve the efficiency of processing contract text information. Clause risk identification is performed by inputting the clause information obtained by splitting the contract to be identified into the clause risk identification model, without manual intervention, so that the accuracy of clause risk identification can be ensured, its speed increased, and working efficiency improved. In addition, training the model with labeled text helps improve the accuracy with which the clause risk identification model recognizes clause information. The risk clauses are highlighted in the contract text in a preset highlighting manner so that staff can conveniently review and handle them, and deleting or modifying the risk clauses allows risks to be managed and controlled in a timely manner.
Fig. 4 shows a contract term risk identification method according to an embodiment of the present application. The method is described by taking a financing contract as an example and includes steps 402 to 414.
Step 402: receiving a financing contract text risk identification instruction.
Step 404: acquiring a contract text image of the paper financing contract text.
Step 406: recognizing the text content in the contract text image using OCR, and taking the recognition result as the financing contract text to be identified.
Step 408: splitting the contract clauses contained in the financing contract text to be identified to obtain financing clause information of each contract clause.
Step 410: inputting the financing clause information into a pre-trained financing clause risk identification model for risk identification, and acquiring a financing clause risk identification result output by the financing clause risk identification model.
Specifically, the training process of the model may be implemented by steps S1 to S7.
Step S1: constructing a pre-training model based on the association between the historical financing clause information in the historical contract text and the risk labeling result.
Step S2: configuring parameters of the pre-training model, wherein the pre-training model comprises an input layer and an embedding layer.
Step S3: inputting a training text into the pre-training model for model pre-training, wherein the training text is unlabeled text.
Step S4: adjusting parameters of the pre-training model to obtain the financing clause risk identification pre-training model, wherein the parameters of the financing clause risk identification pre-training model represent weights of a neural network.
Step S5: acquiring historical financing clause information of some of the contract clauses in the historical contract text.
Step S6: performing risk labeling on the historical financing clause information, wherein the risk label indicates whether the historical financing clause information is a risk clause.
Step S7: inputting the historical financing clause information as training samples, with the corresponding risk labeling results as training labels, into the financing clause risk identification pre-training model for model training.
Step 412: determining whether the risk value of the financing clause information contained in the financing clause risk identification result is greater than a preset risk threshold; if so, executing step 414; if not, no processing is required.
Step 414: highlighting the risk financing clauses in the financing contract text to be identified in a preset highlighting manner.
In one embodiment provided by the application, OCR is used to recognize the text content in the image of the paper financing contract text, which can ensure the accuracy of the recognition result and improve the efficiency of processing financing contract text information. Financing clause risk identification is performed by inputting the clause information obtained by splitting the financing contract to be identified into the financing clause risk identification model, without manual intervention, so that the accuracy of financing clause risk identification can be ensured, its speed increased, and working efficiency improved. In addition, training the model with labeled text helps improve the accuracy with which the financing clause risk identification model recognizes financing clause information. The risk financing clauses are highlighted in the contract text in a preset highlighting manner so that staff can conveniently review and handle them, and deleting or modifying the risk financing clauses allows financing risks to be managed and controlled in a timely manner.
Corresponding to the above method embodiment, the present application also provides an embodiment of a contract term risk identification apparatus. Fig. 5 shows a schematic structural diagram of the contract term risk identification apparatus according to an embodiment of the present application. As shown in Fig. 5, the apparatus includes:
a text to be recognized acquisition module 502 configured to acquire a contract text to be identified;
a contract clause splitting module 504 configured to split contract clauses included in the contract text to be identified, and obtain clause information of each contract clause;
a clause risk identification module 506 configured to input the clause information into a pre-trained clause risk identification model for risk identification, and obtain a clause risk identification result output by the clause risk identification model;
a risk clause highlighting module 508 configured to highlight the risk clause in the contract text to be identified in a preset highlighting manner if the clause risk identification result includes a risk clause.
Optionally, the contract term risk identification apparatus further includes:
a first historical clause information acquisition module configured to: acquiring historical clause information of a plurality of contract clauses in a historical contract text;
a risk labeling module configured to perform risk labeling on the historical clause information, wherein the risk label indicates whether the historical clause information is a risk clause;
and a model training module configured to input the historical clause information as training samples, with the corresponding risk labeling results as training labels, into a clause risk identification pre-training model for model training.
Optionally, the contract term risk identification apparatus further includes:
a pre-training model building module configured to construct a pre-training model based on the association between the historical clause information and the risk labeling result;
a model parameter configuration module configured to configure parameters of the pre-training model, the pre-training model comprising an input layer and an embedding layer;
the model pre-training module is configured to input a training text into the pre-training model for model pre-training, wherein the training text is an unlabeled text;
a model parameter adjusting module configured to adjust parameters of the pre-training model to obtain the clause risk identification pre-training model; wherein the parameters of the clause risk identification pre-trained model represent weights of a neural network.
Optionally, the model pre-training module includes:
a second historical clause information obtaining sub-module configured to obtain historical clause information of a plurality of contract clauses in the historical contract text;
and the model pre-training sub-module is configured to determine a target word vector of the historical clause information through the embedding layer according to a word vector dictionary, and input the target word vector into the pre-training model for pre-training of the model.
Optionally, the model pre-training sub-module is further configured to:
acquire a pre-established word vector dictionary;
perform word segmentation on the historical clause information using a word segmentation algorithm to obtain a plurality of target word units;
look up, in the word vector dictionary, the word vector corresponding to each target word unit in the historical clause information, and combine these word vectors to generate the word vector of the historical clause information;
pre-embed the historical clause information to obtain a sentence vector and a position vector of the historical clause information;
and sum the word vector, the sentence vector, and the position vector of the historical clause information to obtain a target word vector of the historical clause information.
Optionally, the contract term risk identification apparatus further includes:
a third history clause information acquisition module configured to acquire history clause information of a plurality of contract clauses in the history contract text;
the word segmentation processing module is configured to perform word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
and the word vector dictionary obtaining module is configured to calculate a word vector corresponding to each target word unit and construct the word vector dictionary according to the word vector corresponding to each target word unit.
Optionally, the contract term risk identification apparatus further includes:
the judgment module is configured to judge whether the risk value of the clause information contained in the clause risk identification result is greater than a preset risk threshold value;
if yes, operating the position query module;
the position query module is configured to determine the clause information as risk clauses and query specific positions of the risk clauses in the contract text to be identified.
Optionally, the contract term risk identification apparatus further includes:
a grading module configured to divide the risk clauses into a first risk level and a second risk level according to the magnitude of the risk values of the risk clauses;
running a first risk clause deletion module if the risk clauses belong to the first risk level;
the first risk clause deletion module is configured to delete the risk clauses from the contract text to generate a new contract text;
running a first semantic analysis module if the risk clauses belong to the second risk level;
the first semantic analysis module is configured to perform semantic analysis on the risk clauses to generate a semantic analysis result, generate risk-free clauses related to the risk clauses according to the semantic analysis result, and replace the risk clauses in the contract text with the risk-free clauses to generate a new contract text.
Optionally, the contract term risk identification apparatus further includes:
a second risk clause deletion module configured to delete the risk clauses from the contract text to generate a new contract text; and/or
a second semantic analysis module configured to perform semantic analysis on the risk clauses to generate a semantic analysis result, generate risk-free clauses related to the risk clauses according to the semantic analysis result, and replace the risk clauses in the contract text with the risk-free clauses to generate a new contract text.
Optionally, the to-be-recognized text obtaining module includes:
the instruction receiving submodule is configured to receive a contract text risk identification instruction;
the image acquisition sub-module is configured to acquire a contract text image of the paper contract text;
and the contract text to be recognized acquisition submodule is configured to recognize the text content in the contract text image using optical character recognition (OCR) and take the recognition result as the contract text to be identified.
In the apparatus embodiment provided by the application, OCR is used to recognize the text content in the image of the paper contract text, which can ensure the accuracy of the recognition result and improve the efficiency of processing contract text information. Clause risk identification is performed by inputting the clause information obtained by splitting the contract to be identified into the clause risk identification model, without manual intervention, so that the accuracy of clause risk identification can be ensured, its speed increased, and working efficiency improved. In addition, training the model with labeled text helps improve the accuracy with which the clause risk identification model recognizes clause information. The risk clauses are highlighted in the contract text in a preset highlighting manner so that staff can conveniently review and handle them, and deleting or modifying the risk clauses allows risks to be managed and controlled in a timely manner.
Fig. 6 shows a block diagram of an electronic device 600 according to an embodiment of the present application. The components of the electronic device 600 include, but are not limited to, a memory 610 and a processor 620. The processor 620 is coupled to the memory 610 via a bus 630 and a database 650 is used to store data.
The electronic device 600 also includes an access device 640 that enables the electronic device 600 to communicate via one or more networks 660. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 640 may include one or more of any type of network interface, wired or wireless (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In an embodiment of the present application, the above-mentioned components of the electronic device 600 and other components not shown in fig. 6 may also be connected to each other, for example by a bus. It should be understood that the block diagram of the electronic device shown in fig. 6 is for exemplary purposes only and is not intended to limit the scope of the present application. Those skilled in the art may add or replace other components as desired.
The electronic device 600 may be any type of stationary or mobile electronic device, including a mobile computer or mobile electronic device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable electronic device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary electronic device such as a desktop computer or PC. The electronic device 600 may also be a mobile or stationary server.
The processor 620 is configured to execute the following computer-executable instructions:
acquiring a contract text to be identified;
splitting contract clauses contained in the contract text to be identified to obtain clause information of each contract clause;
inputting the clause information into a pre-trained clause risk identification model for risk identification, and acquiring a clause risk identification result output by the clause risk identification model;
and if the clause risk identification result contains risk clauses, highlighting the risk clauses in the contract text to be identified in a preset highlighting manner.
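The four steps above can be sketched as follows. This is a minimal illustration only: the one-clause-per-line splitting rule, the `**` marker, and the keyword-based `toy_model` are all assumptions standing in for the splitting logic, the preset highlighting manner, and the pre-trained clause risk identification model, none of which are specified at this level of detail in the present application.

```python
from typing import Callable, List


def split_clauses(contract_text: str) -> List[str]:
    """Split the contract into clauses (assumed here: one clause per non-empty line)."""
    return [line.strip() for line in contract_text.splitlines() if line.strip()]


def identify_risk_clauses(clauses: List[str], model: Callable[[str], bool]) -> List[str]:
    """Feed each clause to the model and keep the ones it flags as risky."""
    return [clause for clause in clauses if model(clause)]


def highlight(contract_text: str, risk_clauses: List[str], marker: str = "**") -> str:
    """Wrap every risk clause found in the contract text with a marker."""
    for clause in risk_clauses:
        contract_text = contract_text.replace(clause, f"{marker}{clause}{marker}")
    return contract_text


def toy_model(clause: str) -> bool:
    """Hypothetical stand-in for the trained model: a single keyword rule."""
    return "unlimited liability" in clause.lower()


contract = (
    "1. The supplier delivers goods monthly.\n"
    "2. The buyer accepts unlimited liability."
)
clauses = split_clauses(contract)
risky = identify_risk_clauses(clauses, toy_model)
marked = highlight(contract, risky)
print(marked)
```

Passing the model in as a callable keeps the splitting and highlighting steps independent of how the risk decision is actually made.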
Optionally, the clause risk identification model is trained by:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
performing risk labeling on the historical clause information, wherein the risk label indicates whether the historical clause information is a risk clause;
and inputting the historical clause information as training samples, with the corresponding risk labeling results as training labels, into a clause risk identification pre-training model for model training.
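The supervised training step above can be illustrated with a deliberately simple substitute. The sketch below trains a bag-of-words perceptron on labeled clauses; it is not the neural pre-training model of the present application, and the whitespace tokenizer, the two sample clauses, and the function names are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def tokens(clause: str) -> List[str]:
    # Whitespace tokenization stands in for the word segmentation algorithm.
    return clause.lower().split()


def train(samples: List[Tuple[str, int]], epochs: int = 5) -> Tuple[Dict[str, float], float]:
    """Perceptron training: samples are (clause, label) pairs, label 1 = risk clause."""
    weights: Dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for clause, label in samples:
            score = bias + sum(weights.get(t, 0.0) for t in tokens(clause))
            predicted = 1 if score > 0 else 0
            error = label - predicted  # 0 when correct, +1 or -1 otherwise
            if error:
                for t in tokens(clause):
                    weights[t] = weights.get(t, 0.0) + error
                bias += error
    return weights, bias


def is_risk_clause(clause: str, weights: Dict[str, float], bias: float) -> bool:
    """Apply the trained weights to a new clause; unseen words contribute nothing."""
    return bias + sum(weights.get(t, 0.0) for t in tokens(clause)) > 0


# Hypothetical labeled historical clauses (training sample, training label).
labeled = [("Buyer waives all claims", 1), ("Goods ship monthly", 0)]
weights, bias = train(labeled)
print(is_risk_clause("Seller waives claims", weights, bias))
print(is_risk_clause("Goods ship weekly", weights, bias))
```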
Optionally, the clause risk identification pre-training model is pre-trained in the following manner:
constructing a pre-training model based on the association between the historical clause information and the risk labeling result;
configuring parameters of the pre-training model, wherein the pre-training model comprises an input layer and an embedding layer;
inputting a training text into the pre-training model to perform model pre-training, wherein the training text is a label-free text;
adjusting parameters of the pre-training model to obtain the clause risk identification pre-training model; wherein the parameters of the clause risk identification pre-trained model represent weights of a neural network.
Optionally, the inputting of the training text into the pre-training model for pre-training of the model includes:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
and determining a target word vector of the historical clause information through the embedding layer according to the word vector dictionary, and inputting the target word vector into the pre-training model for pre-training of the model.
Optionally, the determining, by the embedding layer, a target word vector of the historical clause information according to a word vector dictionary includes:
acquiring a pre-established word vector dictionary;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
searching word vectors corresponding to each target word unit in the historical clause information in the word vector dictionary, and combining the word vectors to generate word vectors of the historical clause information;
pre-embedding the historical clause information to obtain a sentence vector and a position vector of the historical clause information;
and performing summation operation on the word vector, the sentence vector and the position vector of the historical clause information to obtain a target word vector of the historical clause information.
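The lookup and summation steps above can be sketched as follows. The sketch assumes that the summation is an element-wise sum performed per word unit, as in BERT-style input embeddings; the present application does not state this explicitly. The two-dimensional vectors, the zero vector used for words missing from the dictionary, and the function name are likewise illustrative assumptions.

```python
from typing import Dict, List

Vector = List[float]


def target_word_vectors(
    word_units: List[str],
    word_vector_dictionary: Dict[str, Vector],
    sentence_vector: Vector,
    position_vectors: List[Vector],
) -> List[Vector]:
    """Return word vector + sentence vector + position vector for each word unit."""
    unknown = [0.0] * len(sentence_vector)  # assumed fallback for out-of-dictionary words
    result = []
    for i, unit in enumerate(word_units):
        word_vec = word_vector_dictionary.get(unit, unknown)
        pos_vec = position_vectors[i]
        result.append([w + s + p for w, s, p in zip(word_vec, sentence_vector, pos_vec)])
    return result


# Hypothetical toy values.
dictionary = {"late": [1.0, 0.0], "fee": [0.0, 1.0]}
sentence = [0.5, 0.5]
positions = [[0.0, 0.0], [0.25, 0.25]]
vectors = target_word_vectors(["late", "fee"], dictionary, sentence, positions)
print(vectors)  # [[1.5, 0.5], [0.75, 1.75]]
```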
Optionally, the word vector dictionary is constructed by:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
and calculating a word vector corresponding to each target word unit, and constructing the word vector dictionary according to the word vector corresponding to each target word unit.
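The present application does not specify how the word vector of each target word unit is calculated. As one possible illustration, the sketch below builds count-based vectors in which each dimension records how often another word occurs in the same clause. The whitespace segmentation and the co-occurrence scheme are assumptions, not the method of the application; a learned embedding could be used instead.

```python
from typing import Dict, List


def segment(clause: str) -> List[str]:
    # Whitespace splitting stands in for the word segmentation algorithm.
    return clause.lower().split()


def build_word_vector_dictionary(clauses: List[str]) -> Dict[str, List[int]]:
    """Map each word unit to a vector of same-clause co-occurrence counts."""
    segmented = [segment(clause) for clause in clauses]
    vocabulary = sorted({unit for units in segmented for unit in units})
    index = {unit: i for i, unit in enumerate(vocabulary)}
    vectors = {unit: [0] * len(vocabulary) for unit in vocabulary}
    for units in segmented:
        for i, unit in enumerate(units):
            for j, other in enumerate(units):
                if i != j:
                    vectors[unit][index[other]] += 1
    return vectors


# Vocabulary order is alphabetical: applies, delivery, fee, late.
word_vector_dictionary = build_word_vector_dictionary(["late fee applies", "late delivery"])
print(word_vector_dictionary["late"])  # [1, 1, 1, 0]
```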
Optionally, after the step of acquiring the clause risk identification result output by the clause risk identification model is executed, and before the step of highlighting the risk clauses in the contract text to be identified in a preset highlighting manner is executed, the method further includes:
judging whether the risk value of the clause information contained in the clause risk identification result is greater than a preset risk threshold value or not;
if so, determining the clause information as risk clauses, inquiring specific positions of the risk clauses in the contract text to be identified, and performing the step of highlighting the risk clauses in the contract text to be identified in a preset highlighting manner.
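The threshold comparison and position query above can be sketched as follows. The sketch assumes that the clause risk identification result is available as a mapping from clause text to a risk value and that a position is a pair of character offsets; both representations, the threshold value, and the function name are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def locate_risk_clauses(
    contract_text: str, risk_values: Dict[str, float], threshold: float
) -> List[Tuple[str, int, int]]:
    """Return (clause, start, end) for every clause whose risk value exceeds the threshold."""
    located = []
    for clause, value in risk_values.items():
        if value > threshold:
            start = contract_text.find(clause)
            if start != -1:  # skip clauses that cannot be found in the text
                located.append((clause, start, start + len(clause)))
    return located


contract_text = "Payment is due in 30 days. Buyer waives all claims."
risk_result = {"Payment is due in 30 days.": 0.1, "Buyer waives all claims.": 0.9}
positions = locate_risk_clauses(contract_text, risk_result, threshold=0.5)
for clause, start, end in positions:
    print(start, end, contract_text[start:end])
```

The returned offsets are what a highlighting step would need in order to mark the clause in place.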
Optionally, after the step of highlighting the risk clause in the contract text to be identified in a preset highlighting manner is executed, the method further includes:
dividing the risk clauses into a first risk level and a second risk level according to the magnitude of the risk values of the risk clauses;
deleting the risk clauses in the contract text to generate a new contract text in a case where the risk clauses belong to the first risk level;
and under the condition that the risk clauses belong to a second risk level, performing semantic analysis on the risk clauses to generate a semantic analysis result, generating risk-free clauses related to the risk clauses according to the semantic analysis result, and replacing the risk clauses in the contract text with the risk-free clauses to generate a new contract text.
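The two-level handling above can be sketched as follows. The present application does not state which level corresponds to the higher risk values; the sketch assumes that the first risk level covers values at or above a boundary and the second level covers the remaining risk clauses. The `rewrite` callable is a hypothetical placeholder for the semantic analysis and risk-free clause generation, and all numeric values are illustrative.

```python
from typing import Callable, Dict, List


def remediate(
    clauses: List[str],
    risk_values: Dict[str, float],
    risk_threshold: float,
    level_boundary: float,
    rewrite: Callable[[str], str],
) -> List[str]:
    """Build the clauses of the new contract text from the graded risk clauses."""
    new_clauses = []
    for clause in clauses:
        value = risk_values.get(clause, 0.0)
        if value <= risk_threshold:
            new_clauses.append(clause)           # not a risk clause: keep unchanged
        elif value >= level_boundary:
            continue                             # first risk level (assumed higher): delete
        else:
            new_clauses.append(rewrite(clause))  # second risk level: replace
    return new_clauses


original = ["Goods ship monthly.", "Late fee is 50% per day.", "Buyer waives all claims."]
values = {
    "Goods ship monthly.": 0.1,
    "Late fee is 50% per day.": 0.6,
    "Buyer waives all claims.": 0.95,
}
new_contract = remediate(
    original,
    values,
    risk_threshold=0.5,
    level_boundary=0.9,
    rewrite=lambda clause: "Late fee is capped by applicable law.",
)
print(new_contract)
```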
Optionally, after the step of highlighting the risk clause in the contract text to be identified in a preset highlighting manner is executed, the method further includes:
deleting the risk clauses in the contract text to generate a new contract text; and/or
performing semantic analysis on the risk clauses to generate a semantic analysis result, generating risk-free clauses related to the risk clauses according to the semantic analysis result, and replacing the risk clauses in the contract text with the risk-free clauses to generate a new contract text.
Optionally, the acquiring the contract text to be recognized includes:
receiving a contract text risk identification instruction;
acquiring a contract text image of a paper contract text;
and recognizing the text content in the contract text image by adopting an optical character recognition technology, and taking a recognition result as the contract text to be recognized.
Optionally, the presentation effect of the preset highlighting manner comprises at least one of:
bolding, highlighting, enlarging the font, changing the font, underlining, and marking with a special color.
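Several of the presentation effects listed above can be combined when rendering a risk clause. The sketch below assumes that the contract text is displayed as HTML, which the present application does not specify; the effect names and the red color are illustrative choices.

```python
import html
from typing import Iterable

# Assumed mapping from presentation effect to an HTML wrapper.
EFFECTS = {
    "bold": "<b>{}</b>",
    "highlight": "<mark>{}</mark>",
    "underline": "<u>{}</u>",
    "color": '<span style="color:red">{}</span>',
}


def render_risk_clause(clause: str, effects: Iterable[str]) -> str:
    """Escape the clause text, then apply each effect from the inside out."""
    rendered = html.escape(clause)
    for effect in effects:
        rendered = EFFECTS[effect].format(rendered)
    return rendered


rendered = render_risk_clause("Buyer waives all claims.", ["bold", "underline"])
print(rendered)  # <u><b>Buyer waives all claims.</b></u>
```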
The above is a schematic scheme of an electronic device of the present embodiment. It should be noted that the technical solution of the electronic device and the technical solution of the contract term risk identification method described above belong to the same concept, and details that are not described in detail in the technical solution of the electronic device can be referred to the description of the technical solution of the contract term risk identification method described above.
An embodiment of the present application also provides a computer readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the contract term risk identification method as described above.
Wherein the computer readable storage medium stores computer instructions for:
acquiring a contract text to be identified;
splitting contract clauses contained in the contract text to be identified to obtain clause information of each contract clause;
inputting the clause information into a pre-trained clause risk identification model for risk identification, and acquiring a clause risk identification result output by the clause risk identification model;
and if the clause risk identification result contains risk clauses, highlighting the risk clauses in the contract text to be identified in a preset highlighting manner.
Optionally, the clause risk identification model is trained by:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
performing risk labeling on the historical clause information, wherein the risk label indicates whether the historical clause information is a risk clause;
and inputting the historical clause information as training samples, with the corresponding risk labeling results as training labels, into a clause risk identification pre-training model for model training.
Optionally, the clause risk identification pre-training model is pre-trained in the following manner:
constructing a pre-training model based on the association between the historical clause information and the risk labeling result;
configuring parameters of the pre-training model, wherein the pre-training model comprises an input layer and an embedding layer;
inputting a training text into the pre-training model to perform model pre-training, wherein the training text is a label-free text;
adjusting parameters of the pre-training model to obtain the clause risk identification pre-training model; wherein the parameters of the clause risk identification pre-trained model represent weights of a neural network.
Optionally, the inputting of the training text into the pre-training model for pre-training of the model includes:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
and determining a target word vector of the historical clause information through the embedding layer according to the word vector dictionary, and inputting the target word vector into the pre-training model for pre-training of the model.
Optionally, the determining, by the embedding layer, a target word vector of the historical clause information according to a word vector dictionary includes:
acquiring a pre-established word vector dictionary;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
searching word vectors corresponding to each target word unit in the historical clause information in the word vector dictionary, and combining the word vectors to generate word vectors of the historical clause information;
pre-embedding the historical clause information to obtain a sentence vector and a position vector of the historical clause information;
and performing summation operation on the word vector, the sentence vector and the position vector of the historical clause information to obtain a target word vector of the historical clause information.
Optionally, the word vector dictionary is constructed by:
acquiring historical clause information of a plurality of contract clauses in a historical contract text;
performing word segmentation processing on the historical clause information by using a word segmentation algorithm to obtain a plurality of target word units;
and calculating a word vector corresponding to each target word unit, and constructing the word vector dictionary according to the word vector corresponding to each target word unit.
Optionally, after the step of acquiring the clause risk identification result output by the clause risk identification model is executed, and before the step of highlighting the risk clauses in the contract text to be identified in a preset highlighting manner is executed, the method further includes:
judging whether the risk value of the clause information contained in the clause risk identification result is greater than a preset risk threshold value or not;
if so, determining the clause information as risk clauses, inquiring specific positions of the risk clauses in the contract text to be identified, and performing the step of highlighting the risk clauses in the contract text to be identified in a preset highlighting manner.
Optionally, after the step of highlighting the risk clause in the contract text to be identified in a preset highlighting manner is executed, the method further includes:
dividing the risk clauses into a first risk level and a second risk level according to the magnitude of the risk values of the risk clauses;
deleting the risk clauses in the contract text to generate a new contract text in a case where the risk clauses belong to the first risk level;
and under the condition that the risk clauses belong to a second risk level, performing semantic analysis on the risk clauses to generate a semantic analysis result, generating risk-free clauses related to the risk clauses according to the semantic analysis result, and replacing the risk clauses in the contract text with the risk-free clauses to generate a new contract text.
Optionally, after the step of highlighting the risk clause in the contract text to be identified in a preset highlighting manner is executed, the method further includes:
deleting the risk clauses in the contract text to generate a new contract text; and/or
performing semantic analysis on the risk clauses to generate a semantic analysis result, generating risk-free clauses related to the risk clauses according to the semantic analysis result, and replacing the risk clauses in the contract text with the risk-free clauses to generate a new contract text.
Optionally, the acquiring the contract text to be recognized includes:
receiving a contract text risk identification instruction;
acquiring a contract text image of a paper contract text;
and recognizing the text content in the contract text image by adopting an optical character recognition technology, and taking a recognition result as the contract text to be recognized.
Optionally, the presentation effect of the preset highlighting manner comprises at least one of:
bolding, highlighting, enlarging the font, changing the font, underlining, and marking with a special color.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the contract term risk identification method described above belong to the same concept, and for details that are not described in detail in the technical solution of the storage medium, reference may be made to the description of the technical solution of the contract term risk identification method described above.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of combined actions, but those skilled in the art should understand that the present application is not limited by the described order of actions, since some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to aid in explaining the application. The alternative embodiments do not describe all details exhaustively, nor do they limit the application to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and its practical use, thereby enabling others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.