CN110442842A - The extracting method and device of treaty content, computer equipment, storage medium - Google Patents

Method and device for extracting contract content, computer equipment, and storage medium

Info

Publication number
CN110442842A
Authority
CN
China
Prior art keywords
contract
text
participle
type
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910534911.5A
Other languages
Chinese (zh)
Inventor
张师琲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910534911.5A priority Critical patent/CN110442842A/en
Publication of CN110442842A publication Critical patent/CN110442842A/en
Priority to PCT/CN2020/093511 priority patent/WO2020253506A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a method and device for extracting contract content, computer equipment, and a storage medium. In one aspect, the method comprises: determining a target contract text to be identified; identifying the contract type of the target contract text using an identification model; and extracting the constructive clause content in the target contract text according to the contract type. The present invention solves the technical problem in the prior art of low efficiency when extracting contract content on a large scale.

Description

Method and device for extracting contract content, computer equipment, and storage medium
[technical field]
The present invention relates to the computer field, and in particular to a method and device for extracting contract content, computer equipment, and a storage medium.
[background technique]
Text recognition is a common operation in artificial intelligence; it can replace manual screening of document text and improve working efficiency.
In the prior art, there is as yet no product that automatically identifies and classifies contract clauses. Existing approaches handle only standard-form contract texts, classifying them by their fixed format, and intelligent contract classification products are rare. This requires every text to be identified to share a unified format, which is nearly impossible in complex big-data processing and analysis. For contract texts of different or unknown types, the text can only be divided manually into text blocks one by one, and content is then extracted from the known text blocks. This requires extensive manual intervention and seriously affects working efficiency.
No effective solution to the above problem in the related art has yet been found.
[summary of the invention]
In view of this, embodiments of the present invention provide a method and device for extracting contract content, computer equipment, and a storage medium.
In one aspect, an embodiment of the present invention provides a method for extracting contract content, the method comprising: determining a target contract text to be identified; identifying the contract type of the target contract text using an identification model; and extracting the constructive clause content in the target contract text according to the contract type.
Optionally, before identifying the contract type of the target contract text using the identification model, the method further comprises: performing word segmentation on each contract to be classified in a sample set, setting the type attribute of each token, and calculating the feature vector of each token; calculating the prior probability of each contract to be classified in the sample set; calculating the posterior probability of each contract to be classified using the prior probability; and establishing, in the identification model, the correspondence between each contract type and its posterior probability.
Optionally, after performing word segmentation on each contract to be classified in the sample set, the method further comprises: obtaining the usage frequency of each token in the contract domain; and selecting the tokens whose usage frequency is greater than a preset threshold and determining them to be qualifying tokens.
Optionally, before obtaining the usage frequency of each token in the contract domain, the method further comprises: removing the tokens whose part of speech is adjective, adverb, or modal particle.
Optionally, calculating the prior probability of each contract to be classified in the sample set comprises: searching for s_1, ..., s_n in the training text set D_i and calculating the set of occurrence counts N(y_1, ..., y_n) of P(w_1, ..., w_n) in D_i; dividing N(y_1, ..., y_n) by the total number of tokens in D_i to obtain the set of probabilities Q(w_1, ..., w_n) with which P(w_1, ..., w_n) occur in D_i; and determining Q(w_1, ..., w_n) to be the prior probability P(w | D_i) of occurrence of each token attribute w_n of P(w_1, ..., w_n) in D_i, where P(w_n) denotes the tokens in D_i whose attribute is w_n, N(y_n) is the number of times attribute w_n occurs in D_i, and Q(w_n) is the probability (relative frequency) with which attribute w_n occurs in D_i.
Optionally, calculating the posterior probability of each contract to be classified using the prior probability comprises: computing a weighted sum of the prior probabilities of all tokens to obtain the prior probability P(D_i) of all texts to be classified; and determining the P(w_1, ..., w_n) obtained from P(D_i) * P(x_i | D_i) to be the posterior probability P(D_i | w) in the training text set D_i, where P(x_i | D_i) is the probability that x_i occurs given D_i, and x_i is a contract text whose contract type is i.
Optionally, extracting the constructive clause content in the target contract text according to the contract type comprises: looking up, in a preset database, the text layout template corresponding to the contract type; and extracting clause content at the designated positions of the target contract text according to the layout pattern of the text layout template.
In another aspect, an embodiment of the present invention provides a device for extracting contract content, the device comprising: a determining module, configured to determine a target contract text to be identified; an identification module, configured to identify the contract type of the target contract text using an identification model; and an extraction module, configured to extract the constructive clause content in the target contract text according to the contract type.
Optionally, the device further comprises: a word segmentation module, configured to, before the identification module identifies the contract type of the target contract text using the identification model, perform word segmentation on each contract to be classified in a sample set, set the type attribute of each token, and calculate the feature vector of each token; a first computing module, configured to calculate the prior probability of each contract to be classified in the sample set; a second computing module, configured to calculate the posterior probability of each contract to be classified using the prior probability; and a construction module, configured to establish, in the identification model, the correspondence between each contract type and its posterior probability.
Optionally, the word segmentation module further comprises: an acquiring unit, configured to obtain the usage frequency of each token in the contract domain after word segmentation is performed on each contract to be classified in the sample set; and a determination unit, configured to select the tokens whose usage frequency is greater than a preset threshold and determine them to be qualifying tokens.
Optionally, the word segmentation module further comprises: a removal unit, configured to remove the tokens whose part of speech is adjective, adverb, or modal particle before the acquiring unit obtains the usage frequency of each token in the contract domain.
Optionally, the first computing module comprises: a first computing unit, configured to search for s_1, ..., s_n in the training text set D_i and calculate the set of occurrence counts N(y_1, ..., y_n) of P(w_1, ..., w_n) in D_i; a second computing unit, configured to divide N(y_1, ..., y_n) by the total number of tokens in D_i to obtain the set of probabilities Q(w_1, ..., w_n) with which P(w_1, ..., w_n) occur in D_i; and a determination unit, configured to determine Q(w_1, ..., w_n) to be the prior probability P(w | D_i) of occurrence of each token attribute w_n of P(w_1, ..., w_n) in D_i, where P(w_n) denotes the tokens in D_i whose attribute is w_n, N(y_n) is the number of times attribute w_n occurs in D_i, and Q(w_n) is the probability (relative frequency) with which attribute w_n occurs in D_i.
Optionally, the second computing module comprises: a computing unit, configured to compute a weighted sum of the prior probabilities of all tokens to obtain the prior probability P(D_i) of all texts to be classified; and a determination unit, configured to determine the P(w_1, ..., w_n) obtained from P(D_i) * P(x_i | D_i) to be the posterior probability P(D_i | w) in the training text set D_i, where P(x_i | D_i) is the probability that x_i occurs given D_i, and x_i is a contract text whose contract type is i.
Optionally, the extraction module comprises: a lookup unit, configured to look up, in a preset database, the text layout template corresponding to the contract type; and an extraction unit, configured to extract clause content at the designated positions of the target contract text according to the layout pattern of the text layout template.
According to yet another embodiment of the present invention, a storage medium is further provided. A computer program is stored in the storage medium, and the computer program is configured to execute, when run, the steps in any one of the above method embodiments.
According to yet another embodiment of the present invention, an electronic device is further provided, comprising a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
With the present invention, after the target contract text to be identified is determined, the contract type of the target contract text is identified using an identification model, and the constructive clause content in the target contract text is then extracted according to the contract type. This solves the technical problem in the prior art of low efficiency when extracting contract content on a large scale. The artificial-intelligence-based identification model can identify contracts of multiple types and can learn and adapt to contract texts of arbitrary format, which saves human resource costs and makes machine classification more efficient and more accurate.
[Detailed description of the invention]
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a hardware block diagram of a terminal for extracting contract content according to an embodiment of the present invention;
Fig. 2 is a flowchart of the method for extracting contract content according to an embodiment of the present invention;
Fig. 3 is a flowchart of training the identification model according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of the device for extracting contract content according to an embodiment of the present invention.
[specific embodiment]
Hereinafter, the present invention is described in detail with reference to the drawings and in combination with the embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in them may be combined with one another.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of the present application are used to distinguish similar objects and are not used to describe a particular order or sequence.
Embodiment 1
The method embodiment provided in Embodiment 1 of the present application may be executed on a mobile terminal, a server, a computer terminal, or a similar computing device. Taking execution on a computer terminal as an example, Fig. 1 is a hardware block diagram of a terminal for extracting contract content according to an embodiment of the present invention. As shown in Fig. 1, the computer terminal 10 may include one or more processors 102 (only one is shown in Fig. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. Optionally, the computer terminal may also include a transmission device 106 for communication functions and an input/output device 108. A person of ordinary skill in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the computer terminal. For example, the computer terminal 10 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
The memory 104 may be used to store computer programs, for example the software programs and modules of application software, such as the computer program corresponding to the method for extracting contract content in the embodiments of the present invention. By running the computer program stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the above method. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the computer terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the computer terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can connect to other network equipment through a base station and thereby communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (RF) module, which communicates with the Internet wirelessly.
This embodiment provides a method for extracting contract content. Fig. 2 is a flowchart of the method for extracting contract content according to an embodiment of the present invention. As shown in Fig. 2, the process includes the following steps:
Step S202: determine the target contract text to be identified;
A contract in this embodiment is an agreement between parties to establish, change, or terminate a civil relationship, and the contract text is the written or electronic text that records the agreement.
Step S204: identify the contract type of the target contract text using an identification model;
The contract type refers to the industry or legal provision to which the contract relates. Contracts of different types differ in content, agreement, and clauses, whereas contract texts of the same type share the same text format. The contract types in this embodiment include labor contracts, sales contracts, gift contracts, loan contracts, lease contracts, construction project contracts, and the like.
Step S206: extract the constructive clause content in the target contract text according to the contract type.
With the scheme of this embodiment, after the target contract text to be identified is determined, the contract type of the target contract text is identified using an identification model, and the constructive clause content in the target contract text is then extracted according to the contract type. This solves the technical problem in the prior art of low efficiency when extracting contract content on a large scale. The artificial-intelligence-based identification model can identify contracts of multiple types and can learn and adapt to contract texts of arbitrary format, which saves human resource costs and makes machine classification more efficient and more accurate.
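The flow of steps S202 to S206 can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `extract_contract_content`, `identify_type`, and `extract_by_type` are hypothetical, and the two toy callables merely stand in for the identification model and the type-specific extractor.

```python
def extract_contract_content(text, identify_type, extract_by_type):
    """Run S204 and S206 on a target contract text chosen in S202."""
    contract_type = identify_type(text)             # S204: classify the text
    clauses = extract_by_type(text, contract_type)  # S206: type-specific extraction
    return contract_type, clauses


# Stand-in callables, purely for illustration (hypothetical logic).
def toy_identifier(text):
    return "lease" if "rent" in text.lower() else "unknown"


def toy_extractor(text, contract_type):
    return {"type": contract_type, "length": len(text)}


result = extract_contract_content(
    "Monthly rent is due on day 1.", toy_identifier, toy_extractor
)
print(result)
```

Passing the two stages in as callables keeps the sketch independent of any particular model or template store.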
The identification model of this embodiment may be obtained by training or may be preset. In the sample set used for training, each sample is a contract text whose contract type is known and has been manually labeled in advance. During training, the input of the identification model is the target contract text, and the output is the contract type of the target contract text.
Before the contract type of the target contract text is identified using the identification model, the identification model also needs to be trained locally using samples. Fig. 3 is a flowchart of training the identification model according to an embodiment of the present invention. As shown in Fig. 3, the training includes:
S302: perform word segmentation on each contract to be classified in the sample set, set the type attribute of each token, and calculate the feature vector of each token;
Optionally, after word segmentation is performed on each contract to be classified in the sample set, the method further includes: obtaining the usage frequency of each token in the contract domain; and selecting the tokens whose usage frequency is greater than a preset threshold and determining them to be qualifying tokens. Usage frequency reflects how popular a token is in use: the more popular the token, the higher its usage frequency.
In a preferred implementation of this embodiment, meaningless tokens also need to be removed from the texts to be classified. These words are used frequently but carry no practical meaning and are common to contract texts of many types. Removing them does not affect the performance of the identification model, but it reduces the amount of sample data to process and improves training efficiency. Accordingly, before obtaining the usage frequency of each token in the contract domain, the method further includes: removing the tokens whose part of speech is adjective, adverb, or modal particle.
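The two filtering steps described above (part-of-speech removal, then a frequency threshold) might look like the following sketch. The tag names in `EXCLUDED_POS`, the function `select_tokens`, and the sample frequencies are all assumptions made for illustration; the text names the part-of-speech categories but no tag set, tokenizer, or threshold value.

```python
# Hypothetical part-of-speech labels; the source names the categories only.
EXCLUDED_POS = {"adjective", "adverb", "modal_particle"}


def select_tokens(tagged_tokens, domain_frequency, threshold):
    """Drop excluded parts of speech, then keep tokens above the threshold.

    tagged_tokens: list of (token, part_of_speech) pairs.
    domain_frequency: usage frequency of each token in the contract domain.
    """
    kept = [tok for tok, pos in tagged_tokens if pos not in EXCLUDED_POS]
    return [tok for tok in kept if domain_frequency.get(tok, 0.0) > threshold]


tagged = [
    ("lessor", "noun"),
    ("promptly", "adverb"),
    ("rent", "noun"),
    ("reasonable", "adjective"),
    ("widget", "noun"),
]
frequency = {"lessor": 0.8, "rent": 0.9, "promptly": 0.7, "widget": 0.01}
selected = select_tokens(tagged, frequency, threshold=0.1)
print(selected)
```

Here "promptly" is dropped by part of speech even though its frequency is high, and "widget" is dropped because it is rare in the contract domain.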
After the set of qualifying tokens is obtained, each token s_i (a character or word) occurring in the texts to be classified is classified according to its type attribute w; the tokens belonging to w_n are s_n, where w_n is the type attribute of the token. Specifically, information entropy is used to quantify each token as a feature vector.
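The text does not say how information entropy is turned into a feature value. One possible reading, shown here purely as an assumption, is the Shannon entropy of a token's occurrence distribution across contract types: a token spread evenly over types has high entropy and discriminates poorly, while a token concentrated in one type has entropy near zero. The function name `token_entropy` and the counts are hypothetical.

```python
import math


def token_entropy(counts_by_type):
    """Shannon entropy (in bits) of a token's distribution over contract types."""
    total = sum(counts_by_type.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts_by_type.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


evenly_spread = token_entropy({"lease": 5, "loan": 5})
concentrated = token_entropy({"lease": 10, "loan": 0})
print(evenly_spread, concentrated)
```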
S304: calculate the prior probability of each contract to be classified in the sample set;
In one implementation of this embodiment, calculating the prior probability of each contract to be classified in the sample set includes: searching for s_1, ..., s_n in the training text set D_i and calculating the set of occurrence counts N(y_1, ..., y_n) of P(w_1, ..., w_n) in D_i; dividing N(y_1, ..., y_n) by the total number of keywords in D_i that remain after the preprocessing that removes meaningless words, to obtain the set of probabilities Q(w_1, ..., w_n) with which P(w_1, ..., w_n) occur in D_i; and determining Q(w_1, ..., w_n) to be the prior probability P(w | D_i) of occurrence of each token attribute w_n of P(w_1, ..., w_n) in D_i, where P(w_n) denotes the tokens in D_i whose attribute is w_n, N(y_n) is the number of times attribute w_n occurs in D_i, and Q(w_n) is the probability (relative frequency) with which attribute w_n occurs in D_i.
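The count-then-divide computation above can be sketched as follows, assuming the training text set for one contract type has already been reduced to a flat list of type attributes, one per remaining keyword. The function name `attribute_priors` and the sample attributes are hypothetical.

```python
from collections import Counter


def attribute_priors(attributes):
    """Relative frequency of each type attribute within one training set.

    attributes: one type attribute per keyword left after preprocessing.
    Returns a dict mapping each attribute to count / total.
    """
    counts = Counter(attributes)  # occurrence count per attribute
    total = len(attributes)       # total number of remaining keywords
    return {attr: n / total for attr, n in counts.items()}


lease_attributes = ["party", "rent", "rent", "term"]
priors = attribute_priors(lease_attributes)
print(priors)
```

The resulting values sum to 1 over the attributes seen in the training set, which is what makes them usable as probabilities in the later steps.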
S306: calculate the posterior probability of each contract to be classified using the prior probability;
In one implementation of this embodiment, calculating the posterior probability of each contract to be classified using the prior probability includes: computing a weighted sum of the prior probabilities of all tokens to obtain the prior probability P(D_i) of all texts to be classified; and determining the P(w_1, ..., w_n) obtained from P(D_i) * P(x_i | D_i) to be the posterior probability P(D_i | w) in the training text set D_i, where P(x_i | D_i) is the probability that x_i occurs given D_i, and x_i is a contract text whose contract type is i.
When some feature item never occurs under some class, P(x | D_i) = 0, and this greatly degrades the quality of the classifier. To solve this problem, Laplace calibration (Laplace smoothing) is introduced: the count of items (contract texts) under every class is incremented by 1. If the training sample set is sufficiently large, this does not affect the result, and it avoids the case in which the above frequency is 0.
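The zero-frequency fix can be illustrated with the standard add-one form of Laplace smoothing. The formula below, which also adds the number of distinct features to the denominator so the probabilities still sum to 1, is the common textbook variant and is an assumption here; the text only says that each count is incremented by 1. The name `smoothed_likelihood` is hypothetical.

```python
def smoothed_likelihood(count, total, num_features, alpha=1.0):
    """Add-alpha estimate of P(feature | class).

    count: occurrences of the feature under the class.
    total: occurrences of all features under the class.
    num_features: number of distinct features (assumed known).
    """
    return (count + alpha) / (total + alpha * num_features)


unseen = smoothed_likelihood(count=0, total=4, num_features=3)
seen = smoothed_likelihood(count=2, total=4, num_features=3)
print(unseen, seen)
```

An unseen feature now receives a small positive probability instead of 0, so a single missing word no longer forces the whole product for that class to zero.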
The scheme of this embodiment is based on the naive Bayes principle: for a given item to be classified, compute the probability of each class given that the item occurs, and assign the item to the class with the highest probability. A popular illustration: suppose you see a Black person on the street and are asked to guess where that person comes from; you would most likely guess Africa. Why? Because Africans make up the highest proportion of Black people. The person could of course be American or Asian, but with no other information available we choose the class with the highest conditional probability. This is the basic idea of naive Bayes.
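The "pick the class with the highest probability" rule can be sketched as a tiny naive Bayes scorer. This is a generic textbook formulation under stated assumptions, not the patent's exact computation: the function `classify`, the `floor` value used for unseen tokens, and all probabilities below are made up for illustration. Log-probabilities are summed instead of multiplying raw probabilities to avoid numeric underflow on long texts.

```python
import math


def classify(tokens, class_priors, likelihoods, floor=1e-9):
    """Return the class maximizing log P(class) + sum of log P(token | class)."""
    scores = {}
    for label, prior in class_priors.items():
        score = math.log(prior)
        for token in tokens:
            # Unseen tokens fall back to a small floor instead of zero.
            score += math.log(likelihoods[label].get(token, floor))
        scores[label] = score
    return max(scores, key=scores.get)


class_priors = {"lease": 0.5, "loan": 0.5}
likelihoods = {
    "lease": {"rent": 0.5, "term": 0.3, "interest": 0.2},
    "loan": {"rent": 0.1, "term": 0.3, "interest": 0.6},
}
first = classify(["rent", "term"], class_priors, likelihoods)
second = classify(["interest"], class_priors, likelihoods)
print(first, second)
```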
S308: establish, in the identification model, the correspondence between each contract type and its posterior probability.
In this embodiment, identifying the contract type of the target contract text using the identification model includes classifying it automatically with the trained identification model. Each contract text is semantically segmented and converted into feature vectors, and the feature vectors are input to the identification model. The identification model evaluates them to obtain the probability that the contract text belongs to each class and outputs a type identifier; the type with the highest probability is selected as the final result.
In one example, the type identifiers of the sales contract, the gift contract, and the loan contract are 00, 01, and 02 respectively. If the probabilities output by the identification model are 45%, 47%, and 86% respectively, then 02 is output. The contract types are not limited to these and may also include lease contracts, construction project contracts, and the like.
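The worked example above reduces to taking the identifier with the largest probability. A minimal sketch using the identifiers and percentages quoted in the text (the variable names are hypothetical):

```python
# Type identifier -> probability, using the figures from the example.
probabilities = {"00": 0.45, "01": 0.47, "02": 0.86}

# Select the identifier whose probability is highest.
best_type = max(probabilities, key=probabilities.get)
print(best_type)
```

Note that the quoted figures do not sum to 1, so they read as independent per-type scores rather than a normalized distribution; the selection rule is the same either way.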
Optionally, extracting the constructive clause content in the target contract text according to the contract type includes: looking up, in a preset database, the text layout template corresponding to the contract type; and extracting clause content at the designated positions of the target contract text according to the layout pattern of the text layout template. That is, the clause content is extracted from the designated positions according to the class identifier. Contract texts of different types contain different clauses, and even when they contain the same clause, its position in the contract text differs.
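Template-driven extraction might be sketched as a lookup from contract type to per-clause location patterns. Everything here is an assumption for illustration: the text does not describe the template format, so this sketch models a "layout template" as regular expressions keyed by clause name, with an in-memory dict `TEMPLATES` standing in for the preset database and made-up "Article N" headings.

```python
import re

# Hypothetical template store: type identifier -> clause name -> pattern.
TEMPLATES = {
    "02": {  # loan contract, using the identifier from the example in the text
        "amount": r"Article 1[^\n]*\n(.*?)(?=\nArticle 2|\Z)",
        "interest": r"Article 2[^\n]*\n(.*?)(?=\nArticle 3|\Z)",
    },
}


def extract_clauses(text, contract_type):
    """Extract each clause body located by the template for this type."""
    template = TEMPLATES.get(contract_type, {})
    clauses = {}
    for name, pattern in template.items():
        match = re.search(pattern, text, flags=re.DOTALL)
        if match:
            clauses[name] = match.group(1).strip()
    return clauses


sample = (
    "Article 1 Loan amount\n"
    "The lender lends 10,000 yuan.\n"
    "Article 2 Interest\n"
    "The annual rate is 5%.\n"
)
clauses = extract_clauses(sample, "02")
print(clauses)
```

Because the template is selected by type identifier, two contract types can place the same clause at different positions without changing the extraction code.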
From the above description of the embodiments, those skilled in the art can clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
Embodiment 2
This embodiment further provides a device for extracting contract content. The device is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiment is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 4 is a structural block diagram of the device for extracting contract content according to an embodiment of the present invention. As shown in Fig. 4, the device includes:
a determining module 40, configured to determine the target contract text to be identified;
an identification module 42, configured to identify the contract type of the target contract text using an identification model;
an extraction module 44, configured to extract the constructive clause content in the target contract text according to the contract type.
Optionally, the device further includes: a word segmentation module, configured to, before the identification module identifies the contract type of the target contract text using the identification model, perform word segmentation on each contract to be classified in a sample set, set the type attribute of each token, and calculate the feature vector of each token; a first computing module, configured to calculate the prior probability of each contract to be classified in the sample set; a second computing module, configured to calculate the posterior probability of each contract to be classified using the prior probability; and a construction module, configured to establish, in the identification model, the correspondence between each contract type and its posterior probability.
Optionally, the word segmentation module further includes: an acquiring unit, configured to obtain the usage frequency of each token in the contract domain after word segmentation is performed on each contract to be classified in the sample set; and a determination unit, configured to select the tokens whose usage frequency is greater than a preset threshold and determine them to be qualifying tokens.
Optionally, the word segmentation module further includes: a removal unit, configured to remove the tokens whose part of speech is adjective, adverb, or modal particle before the acquiring unit obtains the usage frequency of each token in the contract domain.
Optionally, the first computing module includes: a first computing unit, configured to search for s_1, ..., s_n in the training text set D_i and calculate the set of occurrence counts N(y_1, ..., y_n) of P(w_1, ..., w_n) in D_i; a second computing unit, configured to divide N(y_1, ..., y_n) by the total number of tokens in D_i to obtain the set of probabilities Q(w_1, ..., w_n) with which P(w_1, ..., w_n) occur in D_i; and a determination unit, configured to determine Q(w_1, ..., w_n) to be the prior probability P(w | D_i) of occurrence of each token attribute w_n of P(w_1, ..., w_n) in D_i, where P(w_n) denotes the tokens in D_i whose attribute is w_n, N(y_n) is the number of times attribute w_n occurs in D_i, and Q(w_n) is the probability (relative frequency) with which attribute w_n occurs in D_i.
Optionally, the second computing module includes: a computing unit, configured to divide the number of documents in the training text set D_i by the total number of documents in the entire training text set to obtain the prior probability P(D_i); and a determination unit, configured to determine the P(w_1, ..., w_n) obtained from P(D_i) * P(x_i | D_i) to be the posterior probability P(D_i | w) in the training text set D_i, where P(x_i | D_i) is the probability that x_i occurs given D_i, and x_i is a contract text whose contract type is i.
Optionally, the extraction module includes: a lookup unit, configured to look up, in a preset database, the text layout template corresponding to the contract type; and an extraction unit, configured to extract clause content at the designated positions of the target contract text according to the layout pattern of the text layout template.
It should be noted that each of the above modules may be implemented by software or hardware. For the latter, this may be done in, but is not limited to, the following ways: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
Embodiment 3
In the several embodiments provided by the present invention, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods of the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present invention further provides a storage medium in which a computer program is stored, wherein the computer program is configured to execute, when run, the steps in any of the above method embodiments.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1: determine the target contract text to be identified;
S2: identify the contract type of the target contract text using an identification model;
S3: extract the constructive clause content in the target contract text according to the contract type.
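As a non-limiting illustration, steps S1 to S3 can be chained as in the sketch below. The function `extract_contract_content` and its two callable parameters are hypothetical names introduced here; any identification model and any type-specific extractor with these shapes could be plugged in.

```python
from typing import Callable, Dict, Optional, Tuple

def extract_contract_content(
    raw_text: str,
    identify_type: Callable[[str], str],
    extract_by_type: Callable[[str, str], Dict[str, Optional[str]]],
) -> Tuple[str, Dict[str, Optional[str]]]:
    """Run S1 (determine text), S2 (identify type), S3 (extract clauses)."""
    # S1: determine the target contract text to be identified.
    target_text = raw_text.strip()
    if not target_text:
        raise ValueError("target contract text is empty")
    # S2: identify the contract type with the supplied identification model.
    contract_type = identify_type(target_text)
    # S3: extract clause content according to the identified contract type.
    clauses = extract_by_type(contract_type, target_text)
    return contract_type, clauses

# Stub model and extractor, used only to show how the pieces connect.
def demo_identify(text: str) -> str:
    return "lease" if "rent" in text.lower() else "other"

def demo_extract(contract_type: str, text: str) -> Dict[str, Optional[str]]:
    return {"type": contract_type, "first_line": text.splitlines()[0]}

demo_result = extract_contract_content(
    "Rent: 1000 per month\nTerm: 12 months", demo_identify, demo_extract
)
```

Passing the model and the extractor as parameters keeps the three steps independent, so either component can be replaced without touching the pipeline.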
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media that can store a computer program, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
An embodiment of the present invention further provides an electronic device including a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by means of the computer program:
S1: determine the target contract text to be identified;
S2: identify the contract type of the target contract text using an identification model;
S3: extract the constructive clause content in the target contract text according to the contract type.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A method for extracting contract content, characterized in that the method comprises:
determining a target contract text to be identified;
identifying the contract type of the target contract text using an identification model;
extracting the constructive clause content in the target contract text according to the contract type.
2. The method according to claim 1, characterized in that, before identifying the contract type of the target contract text using the identification model, the method further comprises:
segmenting each contract to be classified in a sample set into words, setting a type attribute for each segmented word, and calculating a feature vector for each segmented word;
calculating the prior probability of each contract to be classified in the sample set;
calculating the posterior probability of each contract to be classified using the prior probability;
establishing, in the identification model, a correspondence between each contract type and the posterior probability.
3. The method according to claim 2, characterized in that, after segmenting each contract to be classified in the sample set into words, the method further comprises:
obtaining the frequency of use of each segmented word in the contract field;
selecting the segmented words whose frequency of use is greater than a preset threshold, and determining them to be qualified segmented words.
4. The method according to claim 2, characterized in that, before obtaining the frequency of use of each segmented word in the contract field, the method further comprises:
rejecting the segmented words whose part of speech is adjective, adverb, or modal particle.
5. The method according to claim 2, characterized in that calculating the prior probability of each contract to be classified in the sample set comprises:
searching training text set D_i for s_1, ..., s_n, and calculating the set of counts N(y_1, ..., y_n) of occurrences of P(w_1, ..., w_n) in training text set D_i; dividing N(y_1, ..., y_n) by the total number of segmented words in training text set D_i to obtain the set of probabilities Q(w_1, ..., w_n) with which P(w_1, ..., w_n) occurs in training text set D_i; and determining Q(w_1, ..., w_n) as the prior probability P(w | D_i) of occurrence of each segmented word w_n of P(w_1, ..., w_n) in training text set D_i, where P(w_n) is the segmented word whose attribute is w_n in training text set D_i; N(y_n) is the number of times attribute w_n occurs in training text set D_i; and Q(w_n) is the number of times attribute w_n occurs in training text set D_i.
6. The method according to claim 2, characterized in that calculating the posterior probability of each contract to be classified using the prior probability comprises:
performing a weighted summation of the prior probabilities of all segmented words to obtain the prior probability P(D_i) of all texts to be classified; and determining P(w_1, ..., w_n), obtained from P(D_i) * P(x_i | D_i), as the posterior probability P(D_i | w) in training text set D_i, where P(x_i | D_i) is the probability that x_i occurs when D_i occurs, and x_i is the contract text whose contract type is i.
7. The method according to claim 1, characterized in that extracting the constructive clause content in the target contract text according to the contract type comprises:
searching a preset database for the text layout template corresponding to the contract type;
extracting clause content from the designated positions of the target contract text according to the layout pattern of the text layout template.
8. A device for extracting contract content, characterized in that the device comprises:
a determining module, configured to determine a target contract text to be identified;
an identification module, configured to identify the contract type of the target contract text using an identification model;
an extraction module, configured to extract the constructive clause content in the target contract text according to the contract type.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910534911.5A CN110442842A (en) 2019-06-20 2019-06-20 The extracting method and device of treaty content, computer equipment, storage medium
PCT/CN2020/093511 WO2020253506A1 (en) 2019-06-20 2020-05-29 Contract content extraction method and apparatus, and computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910534911.5A CN110442842A (en) 2019-06-20 2019-06-20 The extracting method and device of treaty content, computer equipment, storage medium

Publications (1)

Publication Number Publication Date
CN110442842A true CN110442842A (en) 2019-11-12

Family

ID=68428235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910534911.5A Pending CN110442842A (en) 2019-06-20 2019-06-20 The extracting method and device of treaty content, computer equipment, storage medium

Country Status (2)

Country Link
CN (1) CN110442842A (en)
WO (1) WO2020253506A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046629A (en) * 2019-12-16 2020-04-21 北大方正集团有限公司 Outline display method, device and equipment
CN111078871A (en) * 2019-11-21 2020-04-28 深圳前海环融联易信息科技服务有限公司 Method and system for automatically classifying contracts based on artificial intelligence
CN111274782A (en) * 2020-02-25 2020-06-12 平安科技(深圳)有限公司 Text auditing method and device, computer equipment and readable storage medium
CN111814457A (en) * 2020-05-30 2020-10-23 国网上海市电力公司 Power grid engineering contract text generation method
WO2020253506A1 (en) * 2019-06-20 2020-12-24 平安科技(深圳)有限公司 Contract content extraction method and apparatus, and computer device and storage medium
CN116306573A (en) * 2023-03-15 2023-06-23 广联达科技股份有限公司 Intelligent analysis method, device and equipment for engineering practice and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107391772A (en) * 2017-09-15 2017-11-24 国网四川省电力公司眉山供电公司 A kind of file classification method based on naive Bayesian
CN108830443A (en) * 2018-04-19 2018-11-16 出门问问信息科技有限公司 A kind of contract review method and device
CN109190594A (en) * 2018-09-21 2019-01-11 广东蔚海数问大数据科技有限公司 Optical Character Recognition system and information extracting method
CN109739985A (en) * 2018-12-26 2019-05-10 斑马网络技术有限公司 Automatic document classification method, equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105045825B (en) * 2015-06-29 2018-05-01 中国地质大学(武汉) A kind of multinomial naive Bayesian file classification method of structure extension
JP6776805B2 (en) * 2016-10-24 2020-10-28 富士通株式会社 Character recognition device, character recognition method, character recognition program
CN110442842A (en) * 2019-06-20 2019-11-12 平安科技(深圳)有限公司 The extracting method and device of treaty content, computer equipment, storage medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020253506A1 (en) * 2019-06-20 2020-12-24 平安科技(深圳)有限公司 Contract content extraction method and apparatus, and computer device and storage medium
CN111078871A (en) * 2019-11-21 2020-04-28 深圳前海环融联易信息科技服务有限公司 Method and system for automatically classifying contracts based on artificial intelligence
CN111046629A (en) * 2019-12-16 2020-04-21 北大方正集团有限公司 Outline display method, device and equipment
CN111046629B (en) * 2019-12-16 2022-03-01 北大方正集团有限公司 Outline display method, device and equipment
CN111274782A (en) * 2020-02-25 2020-06-12 平安科技(深圳)有限公司 Text auditing method and device, computer equipment and readable storage medium
WO2021169208A1 (en) * 2020-02-25 2021-09-02 平安科技(深圳)有限公司 Text review method and apparatus, and computer device, and readable storage medium
CN111274782B (en) * 2020-02-25 2023-10-20 平安科技(深圳)有限公司 Text auditing method and device, computer equipment and readable storage medium
CN111814457A (en) * 2020-05-30 2020-10-23 国网上海市电力公司 Power grid engineering contract text generation method
CN116306573A (en) * 2023-03-15 2023-06-23 广联达科技股份有限公司 Intelligent analysis method, device and equipment for engineering practice and readable storage medium

Also Published As

Publication number Publication date
WO2020253506A1 (en) 2020-12-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination