CN110516036A - Legal documents information extracting method, device, computer equipment and storage medium - Google Patents

Info

Publication number
CN110516036A
Authority
CN
China
Prior art keywords
words
emphasis
legal documents
keyword
heavy duty
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910651409.2A
Other languages
Chinese (zh)
Inventor
戴广宇
周萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910651409.2A
Publication of CN110516036A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services; Handling legal documents

Abstract

The present invention relates to the field of data analysis and discloses a legal document information extraction method, apparatus, computer device, and storage medium. The method comprises: obtaining a legal text to be parsed; extracting emphasis words from the legal text using an emphasis word extraction rule, and forming an emphasis word set from the emphasis words; matching the emphasis word set against each of multiple preset keyword sets and calculating the matching degree between the emphasis word set and each keyword set, where each keyword set includes at least one keyword and is associated with one preset parsing rule; obtaining the preset parsing rule associated with the keyword set that has the highest matching degree; and extracting key information from the legal text according to that preset parsing rule. The method provided by the invention can effectively extract key information from legal documents and improves the efficiency with which users obtain information from them.

Description

Legal documents information extracting method, device, computer equipment and storage medium
Technical field
The present invention relates to the field of data analysis, and in particular to a legal document information extraction method, apparatus, computer device, and storage medium.
Background
Legal documents are the documents that judicial and administrative organs, parties, lawyers, and others use when resolving litigation and non-litigation matters; they also include the non-normative documents of judicial organs. Legal documents therefore include both normative and non-normative documents.
Existing legal documents are generally stored as files in a server database, and a user looking for legal documents on a given subject searches that database. However, the legal documents held in existing databases usually provide indexes only for a few simple search entries, such as appellant, appellee, and district court. Although this approach gives users some help in finding the legal documents they need, the number of legal documents returned by a search is normally very large, and the user must spend considerable effort extracting the required information from them, so extraction efficiency is very low.
Summary of the invention
In view of the above technical problem, it is necessary to provide a legal document information extraction method, apparatus, computer device, and storage medium that extract key information from legal documents and improve the efficiency with which users obtain information from them.
A legal document information extraction method, comprising:
obtaining a legal text to be parsed;
extracting emphasis words from the legal text using an emphasis word extraction rule, and forming an emphasis word set from the emphasis words;
matching the emphasis word set against each of multiple preset keyword sets, and calculating the matching degree between the emphasis word set and each keyword set, wherein each keyword set includes at least one keyword and each keyword set is associated with one preset parsing rule;
obtaining the preset parsing rule associated with the keyword set having the highest matching degree;
extracting key information from the legal text according to the preset parsing rule.
A legal document information extraction apparatus, comprising:
an obtaining module, configured to obtain a legal text to be parsed;
an emphasis word set module, configured to extract emphasis words from the legal text using an emphasis word extraction rule and to form an emphasis word set from the emphasis words;
a matching degree calculation module, configured to match the emphasis word set against each of multiple preset keyword sets and to calculate the matching degree between the emphasis word set and each keyword set, wherein each keyword set includes at least one keyword and each keyword set is associated with one preset parsing rule;
a parsing rule obtaining module, configured to obtain the preset parsing rule associated with the keyword set having the highest matching degree;
an information extraction module, configured to extract key information from the legal text according to the preset parsing rule.
A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above legal document information extraction method when executing the computer program.
A computer-readable storage medium storing a computer program, wherein the computer program implements the above legal document information extraction method when executed by a processor.
In the above legal document information extraction method, apparatus, computer device, and storage medium, a legal text to be parsed is obtained, which provides the text to be processed. Emphasis words are extracted from the legal text using an emphasis word extraction rule, and an emphasis word set is formed from them, so that a group of emphasis words (the emphasis word set) is extracted from the legal text. The emphasis word set is matched against each of multiple preset keyword sets, and the matching degree between the emphasis word set and each keyword set is calculated, where each keyword set includes at least one keyword and is associated with one preset parsing rule; this yields the matching degree between the legal text and each keyword set. Because the matching degree between a legal text and a keyword set cannot be calculated directly, the matching degree between the emphasis word set extracted from the legal text and the keyword set is used in its place. The preset parsing rule associated with the keyword set having the highest matching degree is obtained, which gives the parsing rule suited to the legal text. Key information is then extracted from the legal text according to that preset parsing rule. The legal document information extraction method provided by the present invention can effectively extract key information from legal documents and improves the efficiency with which users obtain information from them.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an application environment of the legal document information extraction method in one embodiment of the present invention;
Fig. 2 is a flowchart of the legal document information extraction method in one embodiment of the present invention;
Fig. 3 is a flowchart of the legal document information extraction method in one embodiment of the present invention;
Fig. 4 is a flowchart of the legal document information extraction method in one embodiment of the present invention;
Fig. 5 is a flowchart of the legal document information extraction method in one embodiment of the present invention;
Fig. 6 is a flowchart of the legal document information extraction method in one embodiment of the present invention;
Fig. 7 is a flowchart of the legal document information extraction method in one embodiment of the present invention;
Fig. 8 is a structural diagram of the legal document information extraction apparatus in one embodiment of the present invention;
Fig. 9 is a structural diagram of the legal document information extraction apparatus in one embodiment of the present invention;
Fig. 10 is a schematic diagram of the computer device in one embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The legal document information extraction method provided in this embodiment can be applied in the application environment shown in Fig. 1, in which a client communicates with a server over a network. The client includes, but is not limited to, personal computers, laptops, smartphones, tablets, and portable wearable devices. The server can be implemented as a standalone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, a legal document information extraction method is provided. Taking its application to the server in Fig. 1 as an example, the method includes the following steps:
S10: obtain a legal text to be parsed.
Here, the legal text may be the content of a single paragraph extracted from a legal document, or the content of multiple paragraphs of the document. Legal documents include, but are not limited to, report documents, notice documents, judgment documents, ruling documents, and decision documents. The legal text to be parsed may be the cause of action, the course of the trial, the facts of the case, the reasons (opinions) for the handling, the reasons (opinions) for a request, and so on. Descriptive entries in a legal document, such as the issuing organ, document title, document number, basic information about the parties, signature, and date, can be parsed with existing methods. The method provided in this embodiment can also parse these descriptive entries, but in practice their relationship to the legal document is already clear, so they do not need further parsing.
S20: extract emphasis words from the legal text using an emphasis word extraction rule, and form an emphasis word set from the emphasis words.
In this embodiment, the emphasis word extraction rule is used to extract common legal terms from a passage of text, such as: first instance, acceptance, performance, dismissal, ruling, compensation, provision, retrial, legal person, and so on.
In one example, the legal text is as follows:
Retrial applicant XXXX Limited Liability Company, in its marine hull insurance contract dispute with retrial respondents XX Management Company, XXX, and XXX, refuses to accept civil judgment (XXXX) X Min Jian Zi No. X made by the Higher People's Court of XX Province on XX/XX/XXXX, and applies to this court for retrial. This court formed a collegial panel in accordance with the law to examine the case, and the examination has now concluded.
The emphasis words that can be extracted from the above legal text include: retrial applicant, retrial respondent, contract dispute, higher people's court, civil judgment, application for retrial, collegial panel, examination. The extracted emphasis words then form the following emphasis word set:
{retrial applicant, retrial respondent, contract dispute, higher people's court, civil judgment, application for retrial, collegial panel, examination}.
S30: match the emphasis word set against each of multiple preset keyword sets, and calculate the matching degree between the emphasis word set and each keyword set, where each keyword set includes at least one keyword and each keyword set is associated with one preset parsing rule.
Multiple keyword sets can be provided in advance. Each keyword set includes at least one keyword and is associated with a legal text label. For example, the keyword set whose legal text label is "cause of action" can be expressed as {appeal, plaintiff, complaint, date of filing, ...}, and the keyword set whose legal text label is "judgment" can be expressed as {first-instance judgment, written judgment, result, ...}.
The matching degree between the emphasis word set and each keyword set is calculated separately. For example, if there are n keyword sets A1, A2, A3, ..., An, then n matching degrees can be calculated: the matching degree between the emphasis word set and keyword set A1 is λ1, the matching degree between the emphasis word set and keyword set A2 is λ2, ..., and the matching degree between the emphasis word set and keyword set An is λn.
The matching degree between the emphasis word set and a keyword set can be obtained by first calculating the association coefficient between each emphasis word in the emphasis word set and the keywords in the keyword set, and then calculating the matching degree of the two sets from those coefficients.
S40: obtain the preset parsing rule associated with the keyword set having the highest matching degree.
In this embodiment, each keyword set is associated with one preset parsing rule. For example, the preset parsing rule associated with the keyword set whose legal text label is "cause of action" is mainly used to parse text related to the cause of action, and the preset parsing rule associated with the keyword set whose legal text label is "judgment" is mainly used to parse text related to the judgment.
The keyword set with the highest matching degree can be selected according to the calculated matching degrees, and the preset parsing rule associated with that keyword set is obtained. The preset parsing rule obtained in this way is the parsing rule best suited to parsing the current legal text.
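The selection in step S40 amounts to taking the maximum over the calculated matching degrees and looking up the associated rule. The following is a minimal sketch, assuming the matching degrees and rules are held in dictionaries keyed by legal text label; the function name, labels, and rule names are illustrative and do not come from the patent.

```python
from typing import Dict

def select_parsing_rule(match_degrees: Dict[str, float],
                        rules_by_label: Dict[str, str]) -> str:
    """Return the rule associated with the keyword set that matched best."""
    # Pick the label whose keyword set has the highest matching degree.
    best_label = max(match_degrees, key=match_degrees.get)
    return rules_by_label[best_label]

# Hypothetical matching degrees and rule identifiers, for illustration only.
match_degrees = {"cause of action": 0.8, "judgment": 0.3}
rules_by_label = {"cause of action": "cause_of_action_rule",
                  "judgment": "judgment_rule"}
print(select_parsing_rule(match_degrees, rules_by_label))
```

If two keyword sets tie, `max` returns the first one encountered; the source does not say how ties should be broken.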
S50: extract key information from the legal text according to the preset parsing rule.
In this embodiment, a preset parsing rule can form a text parsing engine based on an NLP (natural language processing) model and can extract key information from the legal text. For example, the key information that a preset parsing rule dedicated to cause-of-action content can extract from the legal text includes, but is not limited to:
Cause of action 1;
Cause of action 2;
Cause of action 3.
In some cases, the key information that a preset parsing rule can extract from the legal text further includes cited legal provisions, judgment results, and information on changes made by a retrial result.
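One way to picture a parsing rule is as a named group of sub-rules, each locating one kind of key information. The sketch below swaps in regular expressions for the NLP model-based engine that the embodiment describes, so it illustrates only the shape of the step; the sub-rule names, patterns, and sample sentence are all assumptions.

```python
import re
from typing import Dict, List

# Hypothetical sub-rules for a "judgment" parsing rule; patterns are illustrative only.
JUDGMENT_SUB_RULES: Dict[str, str] = {
    "cited_provisions": r"Article \d+ of the [A-Z][a-z]+(?: [A-Z][a-z]+)* Law",
    "ordered_amounts": r"\d[\d,]* yuan",
}

def extract_key_information(legal_text: str,
                            sub_rules: Dict[str, str]) -> Dict[str, List[str]]:
    """Apply every sub-rule to the text and collect all of its matches."""
    return {name: re.findall(pattern, legal_text)
            for name, pattern in sub_rules.items()}

sample = ("Pursuant to Article 107 of the Contract Law, the defendant shall pay "
          "the plaintiff 50,000 yuan in compensation.")
print(extract_key_information(sample, JUDGMENT_SUB_RULES))
```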
In steps S10 to S50, a legal text to be parsed is obtained, which provides the text to be processed. Emphasis words are extracted from the legal text using an emphasis word extraction rule, and an emphasis word set is formed from them, so that a group of emphasis words (the emphasis word set) is extracted from the legal text. The emphasis word set is matched against each of multiple preset keyword sets, and the matching degree between the emphasis word set and each keyword set is calculated, where each keyword set includes at least one keyword and is associated with one preset parsing rule; this yields the matching degree between the legal text and each keyword set. Because the matching degree between a legal text and a keyword set cannot be calculated directly, the matching degree between the emphasis word set extracted from the legal text and the keyword set is used in its place. The preset parsing rule associated with the keyword set having the highest matching degree is obtained, which gives the parsing rule suited to the legal text. Key information is then extracted from the legal text according to that preset parsing rule.
Optionally, as shown in Fig. 3, step S10 includes:
S101: obtain a legal document;
S102: identify the document type of the legal document;
S103: obtain the division rule that matches the document type;
S104: divide the legal document into at least one legal text to be parsed according to the division rule.
In this embodiment, legal documents can be obtained from a local storage location or from a specified network storage location. Typically, the obtained legal document is a file stored in a text format, for example a file saved in doc, docx, txt, or rtf format.
Here, the document types of legal documents include, but are not limited to, judgments, rulings, decisions, notices, and indictments. The document type of a legal document can be identified from its title. For example, if a legal document is titled "Civil Judgment of the Supreme People's Court of the People's Republic of China", the title contains "judgment", so the document type can be identified as "judgment". In some cases, depending on actual needs, a broad document type can be subdivided into several narrower ones; for example, "judgment" can be subdivided into "civil judgment", "criminal judgment", and "administrative judgment". In other cases, document types can be divided into "first-instance judgment", "second-instance judgment", "Supreme Court judgment", and so on.
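Title-based identification can be sketched as a substring lookup in which narrower types are checked before broader ones. This is a minimal illustration assuming English type names; the list contents, its ordering, and the function name are assumptions, not part of the patent.

```python
from typing import Optional

# Hypothetical type names, ordered from most specific to most general so that
# "civil judgment" is matched before the broader "judgment".
DOCUMENT_TYPES = [
    "civil judgment", "criminal judgment", "administrative judgment",
    "judgment", "ruling", "decision", "notice", "indictment",
]

def identify_document_type(title: str) -> Optional[str]:
    """Return the first document type whose name appears in the title."""
    lowered = title.lower()
    for document_type in DOCUMENT_TYPES:
        if document_type in lowered:
            return document_type
    return None  # The title does not reveal a known document type.

print(identify_document_type(
    "Civil Judgment of the Supreme People's Court of the People's Republic of China"))
```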
Each document type matches at least one division rule. For example, the division rules for a judgment and a ruling differ considerably: a ruling generally has less content, so its division rule marks off fewer legal texts to be parsed, whereas a judgment has more content, so its division rule can mark off multiple legal texts to be parsed.
The legal document can be divided into at least one legal text to be parsed according to the division rule obtained. In general, the text of one paragraph of the legal document is divided into one legal text to be parsed. In other cases, if the division rule determines that the information in several paragraphs relates to the same legal text label, those paragraphs can be divided into a single legal text to be parsed.
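The paragraph-based division can be sketched as follows, assuming paragraphs are separated by blank lines and that the division rule supplies a function returning a legal text label (or `None`) for each paragraph. Both assumptions, and the names `divide_document` and `label_of`, are illustrative only.

```python
from typing import Callable, List, Optional

def divide_document(document: str,
                    label_of: Callable[[str], Optional[str]]) -> List[str]:
    """Split into paragraphs, merging neighbours that share the same label."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    legal_texts: List[str] = []
    previous_label: Optional[str] = None
    for paragraph in paragraphs:
        label = label_of(paragraph)
        if legal_texts and label is not None and label == previous_label:
            # Same label as the previous paragraph: extend that legal text.
            legal_texts[-1] += "\n" + paragraph
        else:
            # Otherwise the paragraph becomes its own legal text.
            legal_texts.append(paragraph)
        previous_label = label
    return legal_texts

# Hypothetical labelling function standing in for a real division rule.
def toy_label(paragraph: str) -> Optional[str]:
    return "cause of action" if "plaintiff" in paragraph else None

doc = "The plaintiff sued.\n\nThe plaintiff claims damages.\n\nThe court orders payment."
print(divide_document(doc, toy_label))
```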
In steps S101 to S104, a legal document is obtained so that legal texts to be parsed can be marked off from it. The document type of the legal document is identified; there are generally many types of legal documents and their division rules differ, so determining the document type makes it possible to mark off suitable legal texts more effectively. The division rule matching the document type is obtained, giving a division rule suited to the legal document. The legal document is divided into at least one legal text to be parsed according to the division rule, which breaks the document into legal texts that are convenient to parse and improves parsing accuracy.
Optionally, as shown in Fig. 4, step S20 includes:
S201: obtain a preset emphasis word set, the preset emphasis word set including at least one emphasis term;
S202: process the legal text into a vocabulary set according to a preset word segmentation rule, the vocabulary set including multiple words to be identified;
S203: determine whether the preset emphasis word set contains an emphasis term that matches a word to be identified;
S204: if such an emphasis term exists, mark the word to be identified that matches the emphasis term in the preset emphasis word set as an emphasis word;
S205: add the emphasis word to the emphasis word set.
In this embodiment, the preset emphasis word set includes common legal terms. For example, a sample of legal documents of a certain size can be parsed, the frequency of occurrence of legal terms counted, and a certain number of legal terms chosen as the emphasis terms of the preset emphasis word set. The preset emphasis word set can be stored in advance in the memory of the server.
The preset word segmentation rule cuts the sentences of the legal text into multiple words to be identified, based on the usage conventions of the particular language, and forms the vocabulary set. For example, suppose a legal text is in Chinese and contains the sentence "Zhang San refuses to accept the first-instance judgment and files an appeal to the Shenzhen Intermediate Court". The preset word segmentation rule can process this sentence into the following vocabulary set: {Zhang San, refuses to accept, first instance, judgment, to, Shenzhen Intermediate Court, files, appeal}. This vocabulary set contains 8 words to be identified. In general, the preset word segmentation rule processes one legal text into a vocabulary set containing multiple words to be identified. Because a legal text usually contains several sentences, the resulting vocabulary set may contain tens or hundreds of words to be identified.
It is determined whether the preset emphasis word set contains an emphasis term that matches a word to be identified. If it does, the word to be identified that matches the emphasis term is marked as an emphasis word and added to the emphasis word set. Here, a word to be identified is considered to match an emphasis term if the two are identical. In one example, the vocabulary set is {Zhang San, refuses to accept, first instance, judgment, to, Shenzhen Intermediate Court, files, appeal}, of which "first instance", "judgment", "Shenzhen Intermediate Court", and "appeal" are present in the preset emphasis word set. The emphasis word set obtained after applying the emphasis word extraction rule is therefore: {first instance, judgment, Shenzhen Intermediate Court, appeal}.
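Steps S203 to S205 reduce to an exact-match filter over the segmented words. The sketch below assumes the sentence has already been segmented (real Chinese text would need a word segmenter, which is not shown) and uses an English rendering of the example sentence; the function name and the sample preset terms are illustrative.

```python
from typing import Iterable, List, Set

def extract_emphasis_words(words: Iterable[str], preset_terms: Set[str]) -> List[str]:
    """Keep the words that exactly match a preset emphasis term (S203-S205)."""
    emphasis_words: List[str] = []
    for word in words:
        # S203/S204: an exact match marks the word as an emphasis word.
        if word in preset_terms and word not in emphasis_words:
            # S205: add it to the emphasis word set, keeping first-seen order.
            emphasis_words.append(word)
    return emphasis_words

# Hypothetical pre-segmented sentence and preset emphasis terms.
words = ["Zhang San", "refuses to accept", "first instance", "judgment",
         "to", "Shenzhen Intermediate Court", "files", "appeal"]
preset_terms = {"first instance", "judgment", "Shenzhen Intermediate Court",
                "appeal", "retrial"}
print(extract_emphasis_words(words, preset_terms))
```

A list is used instead of a set only so that the result has a stable order; the method itself treats the result as a set.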
In steps S201 to S205, a preset emphasis word set including at least one emphasis term is obtained, which loads the preset emphasis word set. The legal text is processed into a vocabulary set containing multiple words to be identified according to the preset word segmentation rule, which extracts the words to be identified from the legal text. It is determined whether the preset emphasis word set contains an emphasis term matching a word to be identified, which decides whether that word is an emphasis term. If such an emphasis term exists, the matching word to be identified is marked as an emphasis word, which yields the emphasis words of the legal text. The emphasis word is added to the emphasis word set, which yields the emphasis word set used to calculate the matching degree between the legal text and a keyword set.
Optionally, as shown in Fig. 5, step S30 includes:
S301: obtain the matching rule of a designated keyword set, the designated keyword set being one of the multiple keyword sets;
S302: determine, according to the matching rule, the association coefficient between each emphasis word in the emphasis word set and the keywords in the designated keyword set;
S303: calculate the matching degree between the emphasis word set and the designated keyword set according to the association coefficients.
This embodiment provides a method of calculating the matching degree between the emphasis word set and a keyword set. Each keyword set corresponds to one matching rule, and the matching rule can preset the association coefficients between emphasis words and keywords.
For ease of description, the calculation of the matching degree between the emphasis word set and one designated keyword set is set out here. For example, the matching rule corresponding to a designated keyword set is as follows:
Table 1. Matching rule corresponding to a designated keyword set in one embodiment (each cell is an association coefficient)

                   Keyword 1   Keyword 2   ...   Keyword y
Emphasis word 1    t11         t12         ...   t1y
Emphasis word 2    t21         t22         ...   t2y
...                ...         ...         ...   ...
Emphasis word x    tx1         tx2         ...   txy
Here, the emphasis words paired with keywords in the matching rule are the emphasis terms of the preset emphasis word set.
For each emphasis word contained in the emphasis word set, the corresponding group of association coefficients can be looked up in the matching rule of the designated keyword set. For example, if the emphasis word set contains emphasis word 2, the association coefficients corresponding to emphasis word 2 are {t21, t22, ..., t2y}. If the emphasis word set contains s emphasis words, s groups of association coefficients are obtained. The sum of all the association coefficients can be calculated and taken as the matching degree. In some cases, the matching degree can instead be the average of the association coefficients. The calculation method for the matching degree can be set according to actual needs.
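The sum and average variants can be sketched as below, assuming the matching rule is stored as a nested dictionary mapping each emphasis word to its per-keyword association coefficients. The data layout, function name, and sample coefficients are assumptions made for illustration.

```python
from typing import Dict, Iterable

MatchingRule = Dict[str, Dict[str, float]]  # emphasis word -> keyword -> coefficient

def matching_degree(emphasis_words: Iterable[str], rule: MatchingRule,
                    use_mean: bool = False) -> float:
    """Sum (or average) the association coefficients of the given emphasis words."""
    coefficients = [coefficient
                    for word in emphasis_words
                    # Words absent from the rule contribute no coefficients.
                    for coefficient in rule.get(word, {}).values()]
    if not coefficients:
        return 0.0
    total = sum(coefficients)
    return total / len(coefficients) if use_mean else total

# Hypothetical matching rule for one designated keyword set.
rule: MatchingRule = {
    "appeal":   {"appeal": 0.5,  "plaintiff": 0.25},
    "judgment": {"appeal": 0.25, "plaintiff": 0.0},
}
print(matching_degree(["appeal", "judgment"], rule))                 # sum variant
print(matching_degree(["appeal", "judgment"], rule, use_mean=True))  # average variant
```

Treating an emphasis word that is missing from the rule as contributing nothing is a design choice of this sketch; the source does not address that case.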
Note that different keyword sets may contain the same keyword; for example, keyword set A and keyword set B may both contain the keyword "appeal". The association coefficient between one emphasis word and the same keyword can differ from one keyword set to another. For example, the association coefficient between emphasis word a and the keyword "appeal" in keyword set A is 0.1, while the association coefficient between emphasis word a and the keyword "appeal" in keyword set B is 0.5.
In steps S301 to S303, the matching rule of a designated keyword set, which is one of the multiple keyword sets, is obtained; this is the rule used to calculate the matching degree between the emphasis word set and the designated keyword set. The association coefficient between each emphasis word in the emphasis word set and the keywords in the designated keyword set is determined according to the matching rule, which gives the association coefficients of every emphasis word in the emphasis word set. The matching degree between the emphasis word set and the designated keyword set is calculated from the association coefficients, which gives the matching degree between the legal text and the designated keyword set.
Optionally, as shown in Fig. 6, after step S20 the method further includes:
S21: input the emphasis word set into a preset parsing model in a specified format, and obtain the legal text label output by the preset parsing model;
S22: obtain the preset parsing rule that matches the legal text label;
S23: extract key information from the legal text according to the preset parsing rule.
In this embodiment, the specified format is the format that the preset parsing model can recognize. If the emphasis word set already conforms to the input format of the preset parsing model (that is, the specified format), it can be input into the preset parsing model directly. Each legal text label matches one preset parsing rule; for example, if the legal text label is "cause of action", the preset parsing rule that can parse the cause of action is obtained. Because legal documents are highly standardized texts that generally use a number of fixed sentence patterns, preset parsing rules can be set on this basis. A preset parsing rule includes multiple parsing sub-rules for extracting key information from the legal text.
The preset parsing model can be a classification model built on a machine learning algorithm, such as a decision tree algorithm or a k-nearest neighbor algorithm. Training samples with legal text labels can be prepared in advance and input into the parsing model for training. When the accuracy of the parsing model's output meets a preset requirement, training can be ended, and the parsing model obtained after the last round of training is used as the preset parsing model for processing emphasis word sets.
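As an illustration of how such a classifier could map an emphasis word set to a legal text label, here is a small k-nearest neighbor sketch written in plain Python. The use of Jaccard similarity as the distance measure, the value of k, and the sample data are all assumptions; the source names the algorithm family but gives no such details.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Tuple

Sample = Tuple[FrozenSet[str], str]  # (emphasis word set, legal text label)

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two word sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def predict_label(emphasis_words: Set[str], samples: List[Sample], k: int = 3) -> str:
    """Vote among the k labelled samples most similar to the input set."""
    ranked = sorted(samples, key=lambda s: jaccard(emphasis_words, s[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled training samples.
samples: List[Sample] = [
    (frozenset({"appeal", "plaintiff", "complaint"}), "cause of action"),
    (frozenset({"plaintiff", "date of filing"}), "cause of action"),
    (frozenset({"first-instance judgment", "written judgment", "result"}), "judgment"),
    (frozenset({"written judgment", "result"}), "judgment"),
]
print(predict_label({"plaintiff", "appeal"}, samples))
```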
In steps S21 to S23, the emphasis word set is input into the preset parsing model in the specified format and the legal text label output by the model is obtained, which gives the label of the legal text to be parsed. The preset parsing rule matching the legal text label is obtained, which gives a parsing rule suited to parsing the legal text. Key information is extracted from the legal text according to the preset parsing rule, which gives the key information of the legal text.
Optionally, before step S21, the method includes:
S211: obtaining training samples carrying legal text labels, each training sample including multiple emphasis words;
S212: inputting the training samples into a parsing model for training;
S213: after training ends, determining the trained parsing model as the preset parsing model.
In this embodiment, a training sample is an emphasis word set extracted from a legal text, and each training sample carries one legal text label. Training samples can be drawn from existing legal documents and prepared as follows: first, divide a legal document into legal texts according to the division rule for legal documents described above; next, process each legal text into its corresponding emphasis word set using the preset word segmentation described above; finally, add the corresponding legal text label to each emphasis word set (the labels can be added manually). An emphasis word set with its legal text label added can serve as a training sample. Each training sample includes multiple emphasis words.
A corresponding parsing model can be constructed based on a machine learning algorithm, and the training samples are input into it for iterative training. The parsing model can use a supervised learning algorithm such as a decision tree or k-nearest neighbors.
Training can be terminated when the number of training iterations reaches a preset value or when the accuracy of the model's predictions reaches a preset threshold. A prediction is marked correct if the legal text label output by the parsing model is the same as the legal text label of the sample, and wrong otherwise. The parsing model at the end of training is used as the preset parsing model.
In steps S211-S213, training samples carrying legal text labels are obtained, each including multiple emphasis words, which provides the samples needed to build the parsing model. The training samples are input into the parsing model for training, which improves the model's ability to distinguish the legal text labels of legal texts. After training ends, the trained parsing model is determined as the preset parsing model, which yields a model capable of outputting the legal text label of a legal text.
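The training procedure of steps S211-S213 could be sketched as below. This is a minimal sketch under stated assumptions: it uses a toy k-nearest-neighbor classifier with Jaccard similarity over emphasis word sets, measures accuracy on a held-out set, and tries a few hypothetical values of `k` in place of iterative training. The names `KNNLabelModel` and `train_parsing_model` and the 0.9 threshold are illustrative and do not come from the embodiment.

```python
from collections import Counter


def jaccard(a, b):
    """Similarity between two emphasis word sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


class KNNLabelModel:
    """Toy k-nearest-neighbor classifier over emphasis word sets."""

    def __init__(self, k=1):
        self.k = k
        self.samples = []

    def fit(self, samples):
        # samples: list of (emphasis_words, legal_text_label) pairs
        self.samples = list(samples)
        return self

    def predict(self, emphasis_words):
        ranked = sorted(self.samples,
                        key=lambda s: jaccard(emphasis_words, s[0]),
                        reverse=True)
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


def accuracy(model, samples):
    # A prediction counts as correct when it equals the sample's label.
    correct = sum(model.predict(words) == label for words, label in samples)
    return correct / len(samples)


def train_parsing_model(train, held_out, threshold=0.9, candidate_ks=(1, 3, 5)):
    """S211-S213: fit, check accuracy, keep the first model that is good enough."""
    for k in candidate_ks:
        model = KNNLabelModel(k).fit(train)
        if accuracy(model, held_out) >= threshold:
            return model
    return None  # no candidate met the preset accuracy requirement
```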
Optionally, as shown in FIG. 7, after step S50, the method further includes:
S51: judging whether the legal document still contains a legal text to be parsed from which key information has not been extracted;
S52: if the legal document still contains such a legal text, continuing to obtain the legal text to be parsed from which key information has not been extracted, and extracting key information from it;
S53: if the legal document contains no such legal text, storing the key information extracted from each legal text to be parsed in association with the identifier of the legal document.
In this embodiment, key information is extracted from every legal text to be parsed in a legal document, and the key information of all of these legal texts is then stored in association with the identifier of the legal document. In one example, the key information extracted from one legal document may include the content shown in Table 2:
Table 2: Key information of one legal document
Field name            Field content
Applicant
Respondent
Main claims
Cause of action 1
Cause of action 2
Judgment result 1
Judgment result 2
……
After extraction, the key information can be stored in a database in association with the identifier of the legal document. When a user needs information from the legal document, the user can quickly grasp its content from the key information without reading the full text, which greatly improves the efficiency of obtaining information from legal documents.
In steps S51-S53, judging whether the legal document still contains a legal text to be parsed from which key information has not been extracted ensures that key information is extracted from every legal text into which the legal document was divided. If such a legal text remains, it is obtained and key information is extracted from it. If none remains, the key information extracted from each legal text to be parsed is stored in association with the identifier of the legal document, so that the extracted key information is retained and is convenient for users to query.
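The loop of steps S51-S53 and the associated storage could look like the following sketch. The in-memory SQLite table `key_information`, its columns, and the JSON payload are assumptions chosen for this illustration; the embodiment only states that the key information is stored in a database in association with the identifier of the legal document.

```python
import json
import sqlite3


def open_store():
    """Create a hypothetical in-memory store keyed by legal document identifier."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE key_information (document_id TEXT PRIMARY KEY, payload TEXT)"
    )
    return connection


def extract_all(document_id, legal_texts, extract, connection):
    """Extract key information from every legal text, then store it by document ID."""
    pending = list(legal_texts)
    collected = []
    while pending:                    # S51: is any legal text still unprocessed?
        text = pending.pop(0)         # S52: take the next one and extract from it
        collected.append(extract(text))
    connection.execute(               # S53: store in association with the identifier
        "INSERT INTO key_information (document_id, payload) VALUES (?, ?)",
        (document_id, json.dumps(collected, ensure_ascii=False)),
    )
    connection.commit()
    return collected
```

A later query by `document_id` then returns the key information without touching the full text of the document.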
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
In one embodiment, a legal document information extraction apparatus is provided, which corresponds one-to-one with the legal document information extraction method of the above embodiments. As shown in FIG. 8, the apparatus includes an acquisition module 10, an emphasis word set module 20, a matching degree calculation module 30, a parsing rule acquisition module 40, and an information extraction module 50. The functional modules are described in detail as follows:
the acquisition module 10, configured to obtain a legal text to be parsed;
the emphasis word set module 20, configured to extract emphasis words from the legal text using an emphasis word extraction rule, and to form an emphasis word set based on the emphasis words;
the matching degree calculation module 30, configured to match the emphasis word set against multiple preset keyword sets respectively and to calculate the matching degree between the emphasis word set and each keyword set, where each keyword set includes at least one keyword and is associated with one preset parsing rule;
the parsing rule acquisition module 40, configured to obtain the preset parsing rule associated with the keyword set having the highest matching degree;
the information extraction module 50, configured to extract key information from the legal text according to the preset parsing rule.
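As a rough illustration of how modules 20 to 50 cooperate on a legal text supplied by module 10, the following sketch wires them together as plain callables. The class name `LegalInfoExtractor` and its field names are hypothetical, and the individual callables are left to the caller because the embodiment does not fix their implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass
class LegalInfoExtractor:
    """Wires together stand-ins for modules 20 to 50; module 10 supplies the text."""
    collect_emphasis_words: Callable[[str], Set[str]]        # module 20
    matching_degree: Callable[[Set[str], Set[str]], float]   # module 30
    keyword_sets: Dict[str, Set[str]]                        # rule name -> keyword set
    parsing_rules: Dict[str, Callable[[str], dict]]          # rule name -> parsing rule

    def run(self, legal_text: str) -> dict:
        words = self.collect_emphasis_words(legal_text)
        # Module 40: pick the rule whose keyword set has the highest matching degree.
        best = max(
            self.keyword_sets,
            key=lambda name: self.matching_degree(words, self.keyword_sets[name]),
        )
        # Module 50: apply the selected preset parsing rule to the legal text.
        return self.parsing_rules[best](legal_text)
```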
Optionally, as shown in FIG. 9, the acquisition module 10 includes:
a document acquisition unit 101, configured to obtain a legal document;
a document type identification unit 102, configured to identify the document type of the legal document;
a division rule acquisition unit 103, configured to obtain the division rule matching the document type;
a division unit 104, configured to divide the legal document into at least one legal text to be parsed according to the division rule.
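Units 101 to 104 could be approximated as in the sketch below. The two document types, the section headings in `DIVISION_RULES`, and the title-based type detection are assumptions made for this example only; real division rules would depend on the layout of each type of legal document.

```python
import re

# Hypothetical division rules: one splitting pattern per document type.
DIVISION_RULES = {
    "judgment": re.compile(r"\n(?=(?:Parties|Claims|Facts|Ruling):)"),
    "arbitration award": re.compile(r"\n(?=(?:Applicant|Respondent|Award):)"),
}


def identify_document_type(document):
    """Toy type detection from the title line (an assumption, not the patent's method)."""
    title = document.splitlines()[0].lower()
    return "arbitration award" if "arbitration" in title else "judgment"


def divide(document):
    """Split the body of a legal document into legal texts using its type's rule."""
    doc_type = identify_document_type(document)
    rule = DIVISION_RULES[doc_type]
    body = document.split("\n", 1)[1] if "\n" in document else ""
    return doc_type, [part.strip() for part in rule.split(body) if part.strip()]
```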
Optionally, the emphasis word set module 20 includes:
a preset emphasis word set acquisition unit, configured to obtain a preset emphasis word set that includes at least one emphasis vocabulary item;
a word segmentation unit, configured to process the legal text into a vocabulary set according to a preset word segmentation rule, the vocabulary set including multiple words to be identified;
a vocabulary judgment unit, configured to judge whether the preset emphasis word set contains an emphasis vocabulary item that matches a word to be identified;
a marking unit, configured to, if an emphasis vocabulary item matching the word to be identified exists, mark the word to be identified that matches the emphasis vocabulary item in the preset emphasis word set as an emphasis word;
a set forming unit, configured to add the emphasis word to the emphasis word set.
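The units of module 20 could be sketched as follows. The `segment` function is a placeholder that splits on non-alphanumeric characters; Chinese legal text would need a real word segmenter, so this stand-in and the example vocabulary are assumptions for illustration.

```python
def segment(legal_text):
    """Placeholder segmentation rule: lowercase and split on non-alphanumerics."""
    word = []
    for ch in legal_text.lower():
        if ch.isalnum():
            word.append(ch)
        elif word:
            yield "".join(word)
            word = []
    if word:
        yield "".join(word)


def collect_emphasis_words(legal_text, preset_emphasis_words):
    """Return the words of the legal text that appear in the preset emphasis word set."""
    emphasis_set = set()
    for candidate in segment(legal_text):        # words to be identified
        if candidate in preset_emphasis_words:   # match against the preset vocabulary
            emphasis_set.add(candidate)          # mark as emphasis word, add to set
    return emphasis_set
```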
Optionally, the matching degree calculation module 30 includes:
a matching rule acquisition unit, configured to obtain the matching rule of a designated keyword set, the designated keyword set being one of the multiple keyword sets;
an association coefficient determination unit, configured to determine, according to the matching rule, the association coefficient between each emphasis word in the emphasis word set and the keywords in the designated keyword set;
a matching degree calculation unit, configured to calculate the matching degree between the emphasis word set and the designated keyword set according to the association coefficients.
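One possible way to turn association coefficients into a matching degree is sketched below. The coefficient values (1.0 for identical words, 0.5 for listed synonyms), the synonym table, and the averaging formula are assumptions of this sketch; this section does not state which matching rule or formula the embodiment uses.

```python
def association_coefficient(word, keyword, synonyms):
    """Hypothetical matching rule: 1.0 for identical words, 0.5 for listed synonyms."""
    if word == keyword:
        return 1.0
    if keyword in synonyms.get(word, ()):
        return 0.5
    return 0.0


def matching_degree(emphasis_words, keyword_set, synonyms=None):
    """Average, over keywords, of the best coefficient any emphasis word achieves."""
    synonyms = synonyms or {}
    if not keyword_set:
        return 0.0
    total = 0.0
    for keyword in keyword_set:
        total += max(
            (association_coefficient(w, keyword, synonyms) for w in emphasis_words),
            default=0.0,
        )
    return total / len(keyword_set)


def best_keyword_set(emphasis_words, keyword_sets, synonyms=None):
    """Return the name of the keyword set with the highest matching degree."""
    return max(
        keyword_sets,
        key=lambda name: matching_degree(emphasis_words, keyword_sets[name], synonyms),
    )
```

Normalizing by the size of the keyword set keeps large keyword sets from winning merely because they contain more words; other normalizations would be equally compatible with the text.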
Optionally, the legal document information extraction apparatus further includes:
a label acquisition module, configured to input the emphasis word set into the preset parsing model in the specified format and to obtain the legal text label output by the preset parsing model;
a rule acquisition module, configured to obtain the preset parsing rule matching the legal text label;
an extraction module, configured to extract key information from the legal text according to the preset parsing rule.
Optionally, the legal document information extraction apparatus further includes:
an extraction completion judgment module, configured to judge whether the legal document still contains a legal text to be parsed from which key information has not been extracted;
a continued extraction module, configured to, if the legal document still contains such a legal text, continue to obtain the legal text to be parsed from which key information has not been extracted and extract key information from it;
a storage module, configured to, if the legal document contains no such legal text, store the key information extracted from each legal text to be parsed in association with the identifier of the legal document.
For specific limitations of the legal document information extraction apparatus, refer to the limitations of the legal document information extraction method above; they are not repeated here. Each module in the apparatus can be implemented in whole or in part by software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor of a computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database stores the data involved in the legal document information extraction method. The network interface is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a legal document information extraction method.
In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the following steps:
obtaining a legal text to be parsed;
extracting emphasis words from the legal text using an emphasis word extraction rule, and forming an emphasis word set based on the emphasis words;
matching the emphasis word set against multiple preset keyword sets respectively, and calculating the matching degree between the emphasis word set and each keyword set, where each keyword set includes at least one keyword and is associated with one preset parsing rule;
obtaining the preset parsing rule associated with the keyword set having the highest matching degree;
extracting key information from the legal text according to the preset parsing rule.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps:
obtaining a legal text to be parsed;
extracting emphasis words from the legal text using an emphasis word extraction rule, and forming an emphasis word set based on the emphasis words;
matching the emphasis word set against multiple preset keyword sets respectively, and calculating the matching degree between the emphasis word set and each keyword set, where each keyword set includes at least one keyword and is associated with one preset parsing rule;
obtaining the preset parsing rule associated with the keyword set having the highest matching degree;
extracting key information from the legal text according to the preset parsing rule.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be clear to those skilled in the art that, for convenience and brevity of description, only the above division into functional units and modules is used as an example. In practical applications, the above functions can be assigned to different functional units or modules as needed; that is, the internal structure of the apparatus can be divided into different functional units or modules to complete all or part of the functions described above.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some of the technical features. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and they shall all fall within the protection scope of the present invention.

Claims (10)

1. A legal document information extraction method, characterized by comprising:
obtaining a legal text to be parsed;
extracting emphasis words from the legal text using an emphasis word extraction rule, and forming an emphasis word set based on the emphasis words;
matching the emphasis word set against multiple preset keyword sets respectively, and calculating the matching degree between the emphasis word set and each keyword set, wherein each keyword set includes at least one keyword and is associated with one preset parsing rule;
obtaining the preset parsing rule associated with the keyword set having the highest matching degree;
extracting key information from the legal text according to the preset parsing rule.
2. The legal document information extraction method according to claim 1, characterized in that, before the obtaining of a legal text to be parsed, the method comprises:
obtaining a legal document;
identifying the document type of the legal document;
obtaining the division rule matching the document type;
dividing the legal document into at least one legal text to be parsed according to the division rule.
3. The legal document information extraction method according to claim 1, characterized in that the extracting of emphasis words from the legal text using an emphasis word extraction rule, and the forming of an emphasis word set based on the emphasis words, comprises:
obtaining a preset emphasis word set that includes at least one emphasis vocabulary item;
processing the legal text into a vocabulary set according to a preset word segmentation rule, the vocabulary set including multiple words to be identified;
judging whether the preset emphasis word set contains an emphasis vocabulary item that matches a word to be identified;
if an emphasis vocabulary item matching the word to be identified exists, marking the word to be identified that matches the emphasis vocabulary item in the preset emphasis word set as an emphasis word;
adding the emphasis word to the emphasis word set.
4. The legal document information extraction method according to claim 1, characterized in that the matching of the emphasis word set against multiple keyword sets respectively, and the calculating of the matching degree between the emphasis word set and each keyword set, each keyword set including at least one keyword, comprises:
obtaining the matching rule of a designated keyword set, the designated keyword set being one of the multiple keyword sets;
determining, according to the matching rule, the association coefficient between each emphasis word in the emphasis word set and the keywords in the designated keyword set;
calculating the matching degree between the emphasis word set and the designated keyword set according to the association coefficients.
5. The legal document information extraction method according to claim 1, characterized in that, after the extracting of emphasis words from the legal text using an emphasis word extraction rule and the forming of an emphasis word set based on the emphasis words, the method further comprises:
inputting the emphasis word set into a preset parsing model in a specified format, and obtaining the legal text label output by the preset parsing model;
obtaining the preset parsing rule matching the legal text label;
extracting key information from the legal text according to the preset parsing rule.
6. The legal document information extraction method according to claim 2, characterized in that, after the extracting of key information from the legal text according to the preset parsing rule, the method further comprises:
judging whether the legal document still contains a legal text to be parsed from which key information has not been extracted;
if the legal document still contains such a legal text, continuing to obtain the legal text to be parsed from which key information has not been extracted, and extracting key information from it;
if the legal document contains no such legal text, storing the key information extracted from each legal text to be parsed in association with the identifier of the legal document.
7. A legal document information extraction apparatus, characterized by comprising:
an acquisition module, configured to obtain a legal text to be parsed;
an emphasis word set module, configured to extract emphasis words from the legal text using an emphasis word extraction rule, and to form an emphasis word set based on the emphasis words;
a matching degree calculation module, configured to match the emphasis word set against multiple preset keyword sets respectively and to calculate the matching degree between the emphasis word set and each keyword set, wherein each keyword set includes at least one keyword and is associated with one preset parsing rule;
a parsing rule acquisition module, configured to obtain the preset parsing rule associated with the keyword set having the highest matching degree;
an information extraction module, configured to extract key information from the legal text according to the preset parsing rule.
8. The legal document information extraction apparatus according to claim 7, characterized in that the acquisition module comprises:
a document acquisition unit, configured to obtain a legal document;
a document type identification unit, configured to identify the document type of the legal document;
a division rule acquisition unit, configured to obtain the division rule matching the document type;
a division unit, configured to divide the legal document into at least one legal text to be parsed according to the division rule.
9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the legal document information extraction method according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the legal document information extraction method according to any one of claims 1 to 6.
CN201910651409.2A 2019-07-18 2019-07-18 Legal documents information extracting method, device, computer equipment and storage medium Pending CN110516036A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910651409.2A CN110516036A (en) 2019-07-18 2019-07-18 Legal documents information extracting method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110516036A true CN110516036A (en) 2019-11-29

Family

ID=68622831

Country Status (1)

Country Link
CN (1) CN110516036A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050210047A1 (en) * 2004-03-18 2005-09-22 Zenodata Corporation Posting data to a database from non-standard documents using document mapping to standard document types
CN106815208A (en) * 2015-12-01 2017-06-09 北京国双科技有限公司 The analysis method and device of law judgement document
CN107784041A (en) * 2016-08-31 2018-03-09 北京国双科技有限公司 Judgement document's case by acquisition methods and device
CN108874928A (en) * 2018-05-31 2018-11-23 平安科技(深圳)有限公司 Resume data information analyzing and processing method, device, equipment and storage medium
CN109271512A (en) * 2018-08-29 2019-01-25 中国平安保险(集团)股份有限公司 The sentiment analysis method, apparatus and storage medium of public sentiment comment information
WO2019034957A1 (en) * 2017-08-17 2019-02-21 International Business Machines Corporation Domain-specific lexically-driven pre-parser

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597307A (en) * 2020-05-18 2020-08-28 山西大学 Judicial judgment reasoning method based on interpretable causal model
CN111796830A (en) * 2020-06-08 2020-10-20 成都数之联科技有限公司 Protocol analysis processing method, device, equipment and medium
CN111796830B (en) * 2020-06-08 2023-09-19 成都数之联科技股份有限公司 Protocol analysis processing method, device, equipment and medium
CN111798344A (en) * 2020-07-01 2020-10-20 北京金堤科技有限公司 Method and device for determining subject name, electronic equipment and storage medium
CN111798344B (en) * 2020-07-01 2023-09-22 北京金堤科技有限公司 Principal name determining method and apparatus, electronic device, and storage medium
CN112199466A (en) * 2020-09-08 2021-01-08 深圳价值在线信息科技股份有限公司 Method and device for identifying associated regulation of mail
CN112199466B (en) * 2020-09-08 2024-04-12 深圳价值在线信息科技股份有限公司 Method and device for identifying associated rule of mail
CN113190667A (en) * 2021-05-12 2021-07-30 北京律联东方文化传播有限公司 Legal data query method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20191129