CN109992664A - Annotation and classification method, apparatus, computer device, and storage medium for central issues - Google Patents

Annotation and classification method, apparatus, computer device, and storage medium for central issues

Info

Publication number
CN109992664A
Authority
CN
China
Prior art keywords
focus
central issue
type
model
focus type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910185041.5A
Other languages
Chinese (zh)
Other versions
CN109992664B (en
Inventor
朱昱锦
徐国强
邱寒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910185041.5A
Publication of CN109992664A
Application granted
Publication of CN109992664B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Technology Law (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application relates to the field of artificial intelligence, is applied in the smart city industry, and in particular concerns a method, apparatus, computer device, and storage medium for annotating and classifying central issues. In one embodiment, the method includes: obtaining focus locating rules and judgment document samples; annotating focus positions in the judgment document samples according to the focus locating rules to obtain annotated judgment document samples; extracting the central issues from the annotated judgment document samples, and obtaining a classification model and a type list of central issues; and classifying the extracted central issues according to the classification model and the type list to obtain automatic classification results for the extracted central issues. In this way, central issues can be annotated and classified automatically, which can greatly reduce the workload of professional annotators. Because annotation and classification are performed by a model, the annotation and classification cycle can be shortened and working efficiency improved.

Description

Annotation and classification method, apparatus, computer device, and storage medium for central issues
Technical field
This application relates to the field of artificial intelligence, and in particular to a method, apparatus, computer device, and storage medium for annotating and classifying central issues.
Background
A judgment document records the process and outcome of a trial before a people's court. It is the carrier of the result of litigation and the instrument by which the people's court determines and allocates the substantive rights and obligations of the parties.
Traditionally, the central issues in judgment documents have to be located and classified manually. Because the work is highly specialized, annotators are hard to train, and the annotation volume is large, the task is often difficult to complete in a short time, so annotation and classification are inefficient.
Summary of the invention
In view of the above technical problem, it is necessary to provide a method, apparatus, computer device, and storage medium for annotating and classifying central issues that can improve working efficiency.
A method for annotating and classifying central issues, the method comprising:
Obtaining focus locating rules and judgment document samples, the focus locating rules being used to locate focus positions in judgment documents;
Annotating focus positions in the judgment document samples according to the focus locating rules to obtain annotated judgment document samples;
Extracting the central issues from the annotated judgment document samples, and obtaining a classification model and a type list of central issues;
Classifying the extracted central issues according to the classification model and the type list of central issues to obtain automatic classification results for the extracted central issues.
In one embodiment, the classification model includes a rule model and a similarity model, and classifying the extracted central issues according to the classification model and the type list of central issues to obtain automatic classification results for the extracted central issues comprises:
Obtaining the focus type candidate set corresponding to the rule model according to the type list of central issues, the rule model, and the extracted central issues;
Obtaining the focus type candidate set corresponding to the similarity model according to the type list of central issues, the similarity model, and the extracted central issues;
Obtaining the focus type corresponding to the extracted central issue according to the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model.
In one embodiment, obtaining the focus type candidate set corresponding to the rule model according to the type list of central issues, the rule model, and the extracted central issues comprises:
Removing redundant words and common words from the extracted central issues by means of the rule model, and performing synonym conversion;
Counting, according to the type list of central issues, the central issues of each type among the processed central issues, and generating a focus type keyword dictionary;
Obtaining a focus sentence from the judgment document samples, and performing redundant-word removal and synonym conversion on the focus sentence;
Obtaining the focus descriptors in the processed focus sentence, and comparing the focus descriptors one by one with the focus type keyword dictionary to obtain, for each focus descriptor, the probability value that it belongs to each central issue type in the focus type keyword dictionary;
Traversing the focus type keyword dictionary, and selecting the central issue types whose probability value is greater than a preset threshold as the focus type candidate set of the rule model.
In one embodiment, obtaining the focus type candidate set corresponding to the similarity model according to the type list of central issues, the similarity model, and the extracted central issues comprises:
Removing redundant words and common words from the extracted central issues by means of the similarity model, and performing synonym conversion;
Counting, according to the type list of central issues, the central issues of each type among the processed central issues, and generating focus keywords;
Converting each central issue in the focus keywords into a central issue word vector, and taking the mean of the central issue word vectors as the central issue sentence vector to obtain the focus type sentence vectors;
Obtaining a focus sentence from the judgment document samples, and converting the focus sentence into a focus description sentence vector;
Comparing the focus description sentence vector with each focus type sentence vector in turn, obtaining a distance value for each focus type by similarity calculation, and selecting the focus types whose distance value satisfies a preset condition as the focus type candidate set of the similarity model.
In one embodiment, obtaining the focus type corresponding to the extracted central issue according to the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model comprises:
Computing the intersection of the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model;
When the intersection contains exactly one focus type, taking that focus type as the focus type corresponding to the extracted central issue.
In one embodiment, after computing the intersection of the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model, the method further comprises:
When the intersection contains more than one focus type, obtaining the probability value and the corresponding distance value of each focus type in the intersection;
Calculating, for each focus type, the product of its probability value and the reciprocal of its distance value, and taking the focus type with the largest product as the focus type corresponding to the extracted central issue.
In one embodiment, after computing the intersection of the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model, the method further comprises:
When the intersection contains no focus type, obtaining the probability value of each focus type;
Taking the focus type with the largest probability value as the focus type corresponding to the extracted central issue.
An apparatus for annotating and classifying central issues, the apparatus comprising:
A rule acquisition module, configured to obtain focus locating rules and judgment document samples, the focus locating rules being used to locate focus positions in judgment documents;
A sample annotation module, configured to annotate focus positions in the judgment document samples according to the focus locating rules to obtain annotated judgment document samples;
A focus extraction module, configured to extract the central issues from the annotated judgment document samples, and to obtain a classification model and a type list of central issues;
A focus classification module, configured to classify the extracted central issues according to the classification model and the type list of central issues to obtain automatic classification results for the extracted central issues.
A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the following steps when executing the computer program:
Obtaining focus locating rules and judgment document samples, the focus locating rules being used to locate focus positions in judgment documents;
Annotating focus positions in the judgment document samples according to the focus locating rules to obtain annotated judgment document samples;
Extracting the central issues from the annotated judgment document samples, and obtaining a classification model and a type list of central issues;
Classifying the extracted central issues according to the classification model and the type list of central issues to obtain automatic classification results for the extracted central issues.
A computer-readable storage medium storing a computer program, wherein the following steps are implemented when the computer program is executed by a processor:
Obtaining focus locating rules and judgment document samples, the focus locating rules being used to locate focus positions in judgment documents;
Annotating focus positions in the judgment document samples according to the focus locating rules to obtain annotated judgment document samples;
Extracting the central issues from the annotated judgment document samples, and obtaining a classification model and a type list of central issues;
Classifying the extracted central issues according to the classification model and the type list of central issues to obtain automatic classification results for the extracted central issues.
The above method, apparatus, computer device, and storage medium for annotating and classifying central issues obtain focus locating rules and judgment document samples, annotate focus positions in the judgment document samples according to the focus locating rules to obtain annotated judgment document samples, extract the central issues from the annotated judgment document samples, obtain a classification model and a type list of central issues, and classify the extracted central issues according to the classification model and the type list to obtain automatic classification results for the extracted central issues. In this way, central issues can be annotated and classified automatically, which can greatly reduce the workload of professional annotators. Because annotation and classification are performed by a model, the annotation and classification cycle can be shortened and working efficiency improved.
Brief description of the drawings
Fig. 1 is a flow diagram of the method for annotating and classifying central issues in one embodiment;
Fig. 2 is a flow diagram of the focus classification step in one embodiment;
Fig. 3 is a flow diagram of the step of obtaining the focus type candidate set of the rule model in one embodiment;
Fig. 4 is a flow diagram of the step of obtaining the focus type candidate set of the similarity model in one embodiment;
Fig. 5 is a flow diagram of the focus type determination step in one embodiment;
Fig. 6 is a structural block diagram of the apparatus for annotating and classifying central issues in one embodiment;
Fig. 7 is an internal structure diagram of a computer device in one embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the application, not to limit it.
In one embodiment, as shown in Fig. 1, a method for annotating and classifying central issues is provided, including the following steps:
Step 102: obtain focus locating rules and judgment document samples; the focus locating rules are used to locate focus positions in judgment documents.
Focus locating rules are rules for locating focus positions in a judgment document and can be provided by experts. Examples include "the word 'focus' appears before the content of the central issue", "the central issue appears at the end of the judgment document", and "the central issue is followed by the court's opinion indicating whether it supports the focus". The focus locating rules need not be independent of one another, nor precise; a degree of fuzziness improves generalization.
Judgment document samples are the judgment documents used when training automatic annotation and classification of central issues, for example training samples numbering ten thousand or more. The central issues in the judgment document training samples have been neither located nor labeled with a class.
Step 104: annotate focus positions in the judgment document samples according to the focus locating rules to obtain annotated judgment document samples.
The focus locating rules are transcribed into labeling functions written in a programming language, and the annotated judgment document samples are obtained through the generative model of a third-party open-source library. The generative model evaluates the different labeling functions and outputs their weights and prediction results. For example, the Generative Model in the third-party open-source library Snorkel can be used to evaluate the different labeling functions. A labeling function can process an input string using methods such as regular expressions and keyword matching and return a result. For example, given a sentence from a judgment document as input, it outputs a binary value, where 1 indicates that the sentence describes a central issue and 0 indicates that it does not.
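A minimal sketch of what such labeling functions might look like. The function names, the English patterns, and the sample sentences are hypothetical stand-ins (real rules would target the wording of Chinese judgment documents), and the step in which a generative model weights and combines the votes is deliberately not shown.

```python
import re
from typing import Callable, List

# Hypothetical labeling functions: each takes one sentence and returns
# 1 (describes a central issue) or 0 (does not).

def lf_has_focus_word(sentence: str) -> int:
    """Rule: the word 'focus' appears in the sentence."""
    return 1 if re.search(r"\bfocus\b", sentence, re.IGNORECASE) else 0

def lf_dispute_keyword(sentence: str) -> int:
    """Rule: the sentence contains an assumed dispute-related keyword."""
    keywords = ("in dispute", "disputed issue")
    lowered = sentence.lower()
    return 1 if any(keyword in lowered for keyword in keywords) else 0

LABELING_FUNCTIONS: List[Callable[[str], int]] = [
    lf_has_focus_word,
    lf_dispute_keyword,
]

def label_matrix(sentences: List[str]) -> List[List[int]]:
    """One row per sentence, one column per labeling function."""
    return [[lf(s) for lf in LABELING_FUNCTIONS] for s in sentences]

sentences = [
    "The focus of the dispute is whether a loan relationship exists.",
    "The court accepted the case on 3 March.",
    "The disputed issue is the authenticity of the receipt.",
]
votes = label_matrix(sentences)
```

The resulting vote matrix is the kind of input a label-combining generative model would consume. Because the rules overlap and disagree, no single function has to be precise.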
Step 106: extract the central issues from the annotated judgment document samples, and obtain a classification model and a type list of central issues.
The type list of central issues is an inventory of the various central issue types and can be provided by experts. For example, the cause of action of the judgment document (that is, the document category) is determined first, and then the list of central issue types that occur in documents of that category is given. Entries in the type list can take the form of phrases, such as "XXX identification"; concrete examples are "legal relationship identification (class)" and "authenticity identification (class)". The classification model is a model that maps a focus description to the corresponding focus type.
In one embodiment, the central issues in the annotated judgment document samples can be extracted by a trained deep neural network model. For example, the embedding layer of the deep neural network model uses a Transformer to convert the input sentence into feature vectors, which are then encoded by a bidirectional LSTM (Long Short-Term Memory network), and a CRF (Conditional Random Field) algorithm makes the final decision. The deep neural network model is robust to label noise and is therefore suited to the annotated judgment document samples obtained by unsupervised prediction with the third-party open-source library Snorkel. Once trained, the deep neural network model can extract the central issues from the annotated judgment document samples.
Step 108: classify the extracted central issues according to the classification model and the type list of central issues to obtain automatic classification results for the extracted central issues.
Based on the focus types in the type list of central issues, the classification model classifies each extracted central issue and outputs its corresponding focus type. The automatic classification result is the automatically generated focus type corresponding to a central issue.
The above method for annotating and classifying central issues obtains focus locating rules and judgment document samples, annotates focus positions in the judgment document samples according to the focus locating rules to obtain annotated judgment document samples, extracts the central issues from the annotated judgment document samples, obtains a classification model and a type list of central issues, and classifies the extracted central issues according to the classification model and the type list to obtain automatic classification results for the extracted central issues. In this way, central issues can be annotated and classified automatically, which can greatly reduce the workload of professional annotators. Because annotation and classification are performed by a model, the annotation and classification cycle can be shortened and working efficiency improved.
In one embodiment, the classification model includes a rule model and a similarity model. As shown in Fig. 2, classifying the extracted central issues according to the classification model and the type list of central issues to obtain automatic classification results includes: step 202, obtaining the focus type candidate set corresponding to the rule model according to the type list of central issues, the rule model, and the extracted central issues; step 204, obtaining the focus type candidate set corresponding to the similarity model according to the type list of central issues, the similarity model, and the extracted central issues; and step 206, obtaining the focus type corresponding to the extracted central issue according to the two candidate sets. The rule model is a model that, based on preset rules, yields the probability value that a central issue belongs to a given focus type. The similarity model is a model that yields the distance value between a central issue and each focus type by similarity calculation.
In one embodiment, as shown in Fig. 3, obtaining the focus type candidate set corresponding to the rule model according to the type list of central issues, the rule model, and the extracted central issues includes: step 302, removing redundant words and common words from the extracted central issues by means of the rule model, and performing synonym conversion; step 304, counting, according to the type list of central issues, the central issues of each type among the processed central issues, and generating a focus type keyword dictionary; step 306, obtaining a focus sentence from the judgment document samples, and performing redundant-word removal and synonym conversion on it; step 308, obtaining the focus descriptors in the processed focus sentence and comparing them one by one with the focus type keyword dictionary to obtain, for each focus descriptor, the probability value that it belongs to each central issue type in the dictionary; and step 310, traversing the focus type keyword dictionary and selecting the central issue types whose probability value is greater than a preset threshold as the focus type candidate set of the rule model.
The rule model first removes redundant words and common words and performs synonym conversion, then counts the keywords contained in each focus type to generate the focus type keyword dictionary. Taking legal relationship identification as an example focus type, its keywords may include debtor-creditor relationship, investment relationship, cooperative relationship, unjust enrichment, debt, claims, and so on; the keywords of different focus types should preferably have little or no overlap. When a focus sentence is input, redundant-word removal and synonym conversion are applied to it. The focus descriptors in the focus sentence are then compared one by one with the candidate words listed in the focus type keyword dictionary: the ratio of the number of words the two have in common to the number of keywords in a given focus type's dictionary is the probability value that the focus descriptor is predicted to be that focus type. The focus type keyword dictionary is traversed, and the k focus types with the largest probability values are taken as the focus type candidate set of the rule model.
For example, suppose the focus descriptors in the focus sentence are a1 and b2, and the focus type keyword dictionary lists focus types A, B, and C, where focus type A contains the keywords a1, a2, and a3, focus type B contains b1 and b2, and focus type C contains a1, a2, b1, and b2. Focus descriptor a1 has 1 word in common with the candidate words of focus type A, whose keyword dictionary has 3 entries, so the probability value that a1 is predicted to be focus type A is 1/3. It has 0 words in common with focus type B, whose dictionary has 2 entries, so the probability value for focus type B is 0. It has 1 word in common with focus type C, whose dictionary has 4 entries, so the probability value for focus type C is 1/4. Focus descriptor b2 has 0 words in common with focus type A, so its probability value for focus type A is 0; it has 1 word in common with focus type B, giving 1/2; and it has 1 word in common with focus type C, giving 1/4. Assuming k = 2, the 2 focus types with the largest probability values are chosen as the focus type candidate set of the rule model, that is, the candidate set for this focus sentence contains focus types A and B.
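The ratio calculation and top-k selection can be sketched as follows, using the descriptors a1 and b2 and the three keyword sets from the example. The function names are hypothetical, and taking the maximum over descriptors as the per-type score is an assumption: the text gives one value per descriptor and does not state how they are combined.

```python
from fractions import Fraction
from typing import Dict, List, Set

def rule_model_probabilities(
    descriptors: List[str], keyword_dict: Dict[str, Set[str]]
) -> Dict[str, Fraction]:
    """Score each focus type as (shared words) / (keywords in that type).

    Assumption: the per-type score is the best score over all descriptors.
    Every keyword set is assumed to be non-empty.
    """
    probabilities: Dict[str, Fraction] = {}
    for focus_type, keywords in keyword_dict.items():
        per_descriptor = [
            Fraction(1 if word in keywords else 0, len(keywords))
            for word in descriptors
        ]
        probabilities[focus_type] = max(per_descriptor, default=Fraction(0))
    return probabilities

def top_k_types(probabilities: Dict[str, Fraction], k: int) -> List[str]:
    """Return the k focus types with the largest probability values."""
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return ranked[:k]

keyword_dict = {
    "A": {"a1", "a2", "a3"},
    "B": {"b1", "b2"},
    "C": {"a1", "a2", "b1", "b2"},
}
probabilities = rule_model_probabilities(["a1", "b2"], keyword_dict)
candidates = top_k_types(probabilities, k=2)
```

Exact fractions are used here only so that the values 1/3, 1/2, and 1/4 can be compared without floating-point rounding.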
In one embodiment, as shown in Fig. 4, obtaining the focus type candidate set corresponding to the similarity model according to the type list of central issues, the similarity model, and the extracted central issues includes: step 402, removing redundant words and common words from the extracted central issues by means of the similarity model, and performing synonym conversion; step 404, counting, according to the type list of central issues, the central issues of each type among the processed central issues, and generating focus keywords; step 406, converting each central issue in the focus keywords into a central issue word vector, and taking the mean of the central issue word vectors as the central issue sentence vector to obtain the focus type sentence vectors; step 408, obtaining a focus sentence from the judgment document samples and converting it into a focus description sentence vector; and step 410, comparing the focus description sentence vector with each focus type sentence vector in turn, obtaining a distance value for each focus type by similarity calculation, and selecting the focus types whose distance value satisfies a preset condition as the focus type candidate set of the similarity model.
The similarity model first removes redundant words and common words and performs synonym conversion to generate the focus keywords. The focus keywords are then converted into word vectors with the skip-gram algorithm, and the mean of the central issue word vectors of a given type is taken as the sentence vector of that focus type. When a focus sentence is input, it is processed by the same steps to obtain its focus description sentence vector. The focus description sentence vector is compared with each focus type sentence vector in turn, and a pairwise similarity calculation yields a distance value, which may specifically be the Euclidean distance. The k focus types with the smallest distance values are taken as the focus type candidate set of the similarity model.
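The averaging and nearest-type selection can be sketched as follows. The two-dimensional word vectors are made-up toy values standing in for trained skip-gram embeddings, and the function names are hypothetical; training the embeddings is not shown.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_types(
    sentence_vec: Vector, type_vecs: Dict[str, Vector], k: int
) -> Tuple[List[str], Dict[str, float]]:
    """Return the k focus types closest to the sentence vector, plus all distances."""
    distances = {t: euclidean(sentence_vec, v) for t, v in type_vecs.items()}
    ranked = sorted(distances, key=distances.get)
    return ranked[:k], distances

# Toy embeddings (assumed values, not trained vectors).
word_vectors = {
    "a1": [0.0, 0.0], "a2": [2.0, 0.0],
    "b1": [4.0, 4.0], "b2": [6.0, 4.0],
    "c1": [10.0, 0.0],
}
type_keywords = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1"]}

# Sentence vector of each focus type: mean of its keyword vectors.
type_vecs = {
    t: mean_vector([word_vectors[w] for w in words])
    for t, words in type_keywords.items()
}
# The focus sentence is vectorised by the same averaging step.
sentence_vec = mean_vector([word_vectors[w] for w in ["a1", "b1"]])
candidates, distances = nearest_types(sentence_vec, type_vecs, k=2)
```

Averaging word vectors is a simple way to obtain sentence vectors and ignores word order, which matches the keyword-oriented preprocessing described above.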
In one embodiment, as shown in Fig. 5, obtaining the focus type corresponding to the extracted central issue according to the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model includes: step 502, computing the intersection of the two candidate sets; and step 504, when the intersection contains exactly one focus type, taking that focus type as the focus type corresponding to the extracted central issue. When the intersection contains more than one focus type, the probability value and the corresponding distance value of each focus type are obtained, the product of each probability value and the reciprocal of the corresponding distance value is calculated, and the focus type with the largest product is taken as the focus type corresponding to the extracted central issue. When the intersection contains no focus type, the probability value of each focus type is obtained, and the focus type with the largest probability value is taken as the focus type corresponding to the extracted central issue.
The intersection of the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model is computed. For example, if the intersection contains only focus type A, focus type A is taken as the focus type corresponding to the extracted central issue. When the intersection contains more than one focus type, for example A and B, suppose the probability value of focus type A is 1/3 with a corresponding distance value of 1/2, and the probability value of focus type B is 1/4 with a corresponding distance value of 1/2. Calculating the product of each probability value and the reciprocal of the corresponding distance value gives 2/3 for focus type A and 1/2 for focus type B; therefore focus type A, which has the larger product, is taken as the focus type corresponding to the extracted central issue. When the intersection is empty, for example when the rule model's candidate set contains A and B, with probability values 1/3 and 1/4 respectively, and the similarity model's candidate set contains only C, focus type A is taken as the focus type corresponding to the extracted central issue.
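The three cases can be sketched as a single function. The name `decide_focus_type` and the dictionary-based inputs are hypothetical; distances are assumed to be strictly positive and the rule model's candidate set non-empty. With probabilities 1/3 and 1/4 and both distances equal to 1/2, the products are 2/3 and 1/2, so the rule selects type A.

```python
from typing import Dict

def decide_focus_type(
    rule_probs: Dict[str, float], sim_distances: Dict[str, float]
) -> str:
    """Combine the two candidate sets into one focus type.

    rule_probs: probability value of each rule-model candidate.
    sim_distances: distance value of each similarity-model candidate
    (assumed > 0, since the reciprocal is taken).
    """
    common = set(rule_probs) & set(sim_distances)
    if len(common) == 1:
        # Exactly one shared type: take it directly.
        return next(iter(common))
    if len(common) > 1:
        # Several shared types: maximise probability * (1 / distance).
        return max(common, key=lambda t: rule_probs[t] * (1.0 / sim_distances[t]))
    # No shared type: fall back to the rule model's most probable type.
    return max(rule_probs, key=rule_probs.get)

single = decide_focus_type({"A": 1 / 3, "B": 1 / 4}, {"A": 0.5, "C": 0.9})
several = decide_focus_type({"A": 1 / 3, "B": 1 / 4}, {"A": 0.5, "B": 0.5})
empty = decide_focus_type({"A": 1 / 3, "B": 1 / 4}, {"C": 0.7})
```

Multiplying by the reciprocal of the distance rewards types that are both probable under the rule model and close under the similarity model.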
In one embodiment, before annotating focus positions in the judgment document samples according to the focus locating rules to obtain annotated judgment document samples, the method further includes: obtaining the model generated by a third-party open-source library; and optimizing that model by Gibbs sampling and a pseudo-likelihood algorithm to obtain an optimized model. Annotating focus positions in the judgment document samples according to the focus locating rules then includes: annotating focus positions in the judgment document samples according to the optimized model and the focus locating rules to obtain annotated judgment document samples. The generative model of the third-party open-source library Snorkel is used: it treats the prediction results of the different labeling functions as factors in a factor graph, draws plausible values from them by Gibbs sampling, and optimizes the whole graphical model with a pseudo-likelihood algorithm. The process does not require ground-truth label values.
In one embodiment, after the extracted dispute focuses are classified according to the classification model and the dispute focus type list to obtain the automatic classification results, the method further includes: selecting, from the classified focuses, focuses to be manually verified and sending them to a terminal for verification; receiving manual classification results, which the terminal obtains by classifying the focuses to be manually verified; and comparing the manual classification results with the automatic classification results to obtain the accuracy of the automatic classification results. An expert samples 10 to 20 items from each class of focus in the automatic classification results and reviews them to produce the manual classification results. Based on these results, the focus types fall into three groups: focus types for which automatic classification already reaches high accuracy; focus types that automatic classification can match correctly once classification rules are added; and focus types that automatic classification cannot handle. The expert adds classification rules, for example by adding to or modifying the keyword dictionary or by providing simple rules that can be located with regular expressions, and the model is optimized according to the added rules. For the first two groups, the steps of model optimization, model classification, and expert verification are repeated; focuses in the third group are labeled by professional annotators. This continues until all focuses are classified correctly.
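Comparing the manual and automatic results reduces to a per-type accuracy calculation. The following is a minimal sketch; the function name `accuracy_by_type` and the pair-based input format are assumptions made for illustration, not part of the described method.

```python
from collections import defaultdict


def accuracy_by_type(samples):
    """Per-type accuracy of automatic classification.

    samples: iterable of (automatic_type, manual_type) pairs, one pair per
    reviewed dispute focus (for example 10 to 20 pairs per focus type).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for auto_type, manual_type in samples:
        totals[auto_type] += 1
        hits[auto_type] += int(auto_type == manual_type)
    return {focus_type: hits[focus_type] / totals[focus_type] for focus_type in totals}


reviewed = [("A", "A"), ("A", "A"), ("A", "B"), ("B", "B")]
result = accuracy_by_type(reviewed)
```

Grouping by the automatically assigned type mirrors the sampling described above, where the expert draws items from each class of the automatic results. Deciding which of the three groups a focus type belongs to remains an expert judgment and is not automated here.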
It should be understood that although the steps in the flowcharts of Figs. 1-5 are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless expressly stated herein, the execution of these steps is not strictly ordered, and the steps may be executed in other orders. Moreover, at least some of the steps in Figs. 1-5 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times. Nor are they necessarily executed one after another; they may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 6, a dispute focus label classification device is provided, comprising a rule acquisition module 602, a sample labeling module 604, a focus extraction module 606, and a focus classification module 608. The rule acquisition module is configured to obtain a focus positioning rule and judgment document samples, the focus positioning rule being used to locate the focus positions in a judgment document. The sample labeling module is configured to label focus positions in the judgment document samples according to the focus positioning rule to obtain labeled judgment document samples. The focus extraction module is configured to extract the dispute focuses from the labeled judgment document samples and to obtain a classification model and a dispute focus type list. The focus classification module is configured to classify the extracted dispute focuses according to the classification model and the dispute focus type list to obtain the automatic classification results of the extracted dispute focuses.
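How the four modules hand data to one another can be sketched as plain functions. Everything here is illustrative: the regular expression standing in for the focus positioning rule, the `LabeledDocument` container, and the English sample text are hypothetical, and `classify` is a placeholder for the classification model.

```python
import re
from dataclasses import dataclass


@dataclass
class LabeledDocument:
    text: str
    focus_spans: list  # (start, end) character offsets of located focus sentences


# Hypothetical positioning rule: a sentence introduced by a fixed cue phrase.
FOCUS_RULE = re.compile(r"The focus of the dispute is[^.]*\.")


def label_sample(text, rule=FOCUS_RULE):
    """Sample labeling module: mark the focus positions found by the rule."""
    return LabeledDocument(text, [m.span() for m in rule.finditer(text)])


def extract_focuses(doc):
    """Focus extraction module: pull the marked spans out of the document."""
    return [doc.text[start:end] for start, end in doc.focus_spans]


def classify_focuses(focuses, classify, type_list):
    """Focus classification module: `classify` stands in for the classification model."""
    return {focus: classify(focus, type_list) for focus in focuses}


text = ("The plaintiff sued. The focus of the dispute is whether the loan "
        "was repaid. Judgment follows.")
focuses = extract_focuses(label_sample(text))
```

The rule acquisition module corresponds to supplying `FOCUS_RULE` and the sample text; in practice both would come from configuration and a document store rather than constants.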
In one embodiment, the classification model includes a rule model and a similarity model, and the focus classification module includes: a rule model focus type candidate set acquisition unit, configured to obtain the focus type candidate set corresponding to the rule model according to the dispute focus type list, the rule model, and the extracted dispute focuses; a similarity model focus type candidate set acquisition unit, configured to obtain the focus type candidate set corresponding to the similarity model according to the dispute focus type list, the similarity model, and the extracted dispute focuses; and a focus type determination unit, configured to obtain the focus type of the extracted dispute focus according to the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model.
In one embodiment, the rule model focus type candidate set acquisition unit includes: a focus processing unit, configured to remove redundant words and shared words from the extracted dispute focuses by means of the rule model and to perform synonym conversion; a dictionary generation unit, configured to count the dispute focuses of each type among the processed dispute focuses according to the dispute focus type list and to generate a focus type keyword dictionary; a focus sentence processing unit, configured to obtain the focus sentences in the judgment document samples and to perform redundant word removal and synonym conversion on them; a probability acquisition unit, configured to obtain the focus descriptors in the processed focus sentences, compare the focus descriptors one by one with the focus type keyword dictionary, and obtain the probability value that each focus descriptor belongs to a dispute focus type in the focus type keyword dictionary; and a focus type generation unit, configured to traverse the focus type keyword dictionary and select the dispute focus types whose probability values exceed a preset threshold as the focus type candidate set of the rule model.
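The rule model units can be illustrated with a small keyword-based sketch. The text does not specify how the probability value is computed, so the share of a focus sentence's descriptors found in a type's keyword dictionary is used here purely as an assumption. The stop-word list, synonym map, and English sample data are likewise hypothetical.

```python
from collections import Counter

STOPWORDS = {"the", "of", "is", "was", "whether", "a", "to"}  # hypothetical redundant words
SYNONYMS = {"repaid": "repayment", "loaned": "loan"}          # hypothetical synonym map


def normalize(text):
    """Remove redundant words and map synonyms to one canonical form."""
    words = [w.strip(".,?").lower() for w in text.split()]
    return [SYNONYMS.get(w, w) for w in words if w and w not in STOPWORDS]


def build_keyword_dictionary(focuses_by_type):
    """{focus type: [dispute focus text, ...]} -> {focus type: Counter of keywords}."""
    counters = {t: Counter(w for text in texts for w in normalize(text))
                for t, texts in focuses_by_type.items()}
    # Shared words occur under every type and therefore cannot discriminate.
    shared = (set.intersection(*(set(c) for c in counters.values()))
              if len(counters) > 1 else set())
    return {t: Counter({w: n for w, n in c.items() if w not in shared})
            for t, c in counters.items()}


def rule_model_candidates(focus_sentence, dictionary, threshold):
    """Return {focus type: probability value} for types above the threshold."""
    descriptors = normalize(focus_sentence)
    if not descriptors:
        return {}
    probs = {t: sum(w in keywords for w in descriptors) / len(descriptors)
             for t, keywords in dictionary.items()}
    return {t: p for t, p in probs.items() if p > threshold}


dictionary = build_keyword_dictionary({
    "loan repayment": ["Whether the loan was repaid", "Loan interest rate"],
    "liability": ["Whether the defendant bears liability for the damage"],
})
candidates = rule_model_candidates("Whether the loan principal was repaid.", dictionary, 0.5)
```

For Chinese judgment documents, whitespace splitting would have to be replaced by a word segmenter; that step is omitted here.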
In one embodiment, the similarity model focus type candidate set acquisition unit includes: a preprocessing unit, configured to remove redundant words and shared words from the extracted dispute focuses by means of the similarity model and to perform synonym conversion; a keyword generation unit, configured to count the dispute focuses of each type among the processed dispute focuses according to the dispute focus type list and to generate focus keywords; a sentence vector generation unit, configured to convert each dispute focus in the focus keywords into dispute focus word vectors and to take the mean of the dispute focus word vectors as the dispute focus sentence vector, thereby obtaining the focus type sentence vectors; a focus sentence conversion unit, configured to obtain the focus sentences in the judgment document samples and to convert each focus sentence into a focus description sentence vector; and a similarity calculation unit, configured to compare the focus description sentence vector with each focus type sentence vector in turn, obtain the distance value of each focus type by similarity calculation, and select the focus types whose distance values satisfy a preset condition as the focus type candidate set of the similarity model.
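The sentence vector and distance steps can be sketched as follows. The three-dimensional word vectors are toy values, cosine distance is assumed as the similarity calculation, and "distance at most `max_distance`" is assumed as the preset condition; none of these choices is fixed by the text.

```python
import math

# Hypothetical toy word vectors; a real system would load trained embeddings.
WORD_VECTORS = {
    "loan": [1.0, 0.0, 0.0],
    "repayment": [0.8, 0.2, 0.0],
    "liability": [0.0, 1.0, 0.0],
    "damage": [0.0, 0.8, 0.2],
}


def sentence_vector(words):
    """Mean of the word vectors of the known words, or None if none is known."""
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_distance(u, v):
    # Assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def similarity_candidates(focus_words, keywords_by_type, max_distance):
    """Return {focus type: distance value} for types within max_distance."""
    query = sentence_vector(focus_words)
    candidates = {}
    if query is None:
        return candidates
    for focus_type, keywords in keywords_by_type.items():
        type_vector = sentence_vector(keywords)
        if type_vector is None:
            continue
        distance = cosine_distance(query, type_vector)
        if distance <= max_distance:
            candidates[focus_type] = distance
    return candidates


keywords_by_type = {"loan repayment": ["loan", "repayment"],
                    "liability": ["liability", "damage"]}
candidates = similarity_candidates(["loan"], keywords_by_type, 0.2)
```

The returned distance values are the ones that would later be combined with the rule model's probability values when the two candidate sets overlap in more than one focus type.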
In one embodiment, the focus type determination unit includes: an intersection acquisition unit, configured to compute the intersection of the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model; and a first judging unit, configured to take the focus type in the intersection as the focus type of the extracted dispute focus when the intersection contains exactly one focus type.
In one embodiment, a second judging unit is further included after the intersection acquisition unit, configured to: when the intersection contains more than one focus type, obtain the probability value and the corresponding distance value of each focus type; calculate the product of each focus type's probability value and the reciprocal of its distance value; and take the focus type with the largest product as the focus type of the extracted dispute focus.
In one embodiment, a third judging unit is further included after the intersection acquisition unit, configured to: when the intersection contains no focus type, obtain the probability value of each focus type; and take the focus type with the largest probability value as the focus type of the extracted dispute focus.
In one embodiment, an optimization module is further included before the sample labeling module, configured to obtain a generative model provided by a third-party open-source library and to optimize that model by Gibbs sampling and a pseudo-likelihood algorithm to obtain an optimized model. The sample labeling module is then configured to label focus positions in the judgment document samples according to the optimized model and the focus positioning rule to obtain the labeled judgment document samples.
In one embodiment, a verification module is further included after the focus classification module, configured to: select, from the classified focuses, focuses to be manually verified and send them to a terminal for verification; receive manual classification results, which the terminal obtains by classifying the focuses to be manually verified; and compare the manual classification results with the automatic classification results to obtain the accuracy of the automatic classification results.
For specific limitations on the dispute focus label classification device, refer to the limitations on the dispute focus label classification method above; they are not repeated here. Each module of the device may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in, or independent of, a processor of the computer equipment in hardware form, or stored in a memory of the computer equipment in software form, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, computer equipment is provided. The computer equipment may be a terminal, and its internal structure may be as shown in Fig. 7. The computer equipment includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface is used to communicate with an external terminal over a network connection. The computer program, when executed by the processor, implements a dispute focus label classification method. The display screen may be a liquid crystal display or an electronic ink display. The input device may be a touch layer covering the display screen; a key, trackball, or touchpad provided on the housing of the computer equipment; or an external keyboard, touchpad, or mouse.
Those skilled in the art will understand that the structure shown in Fig. 7 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer equipment to which the solution is applied. A specific item of computer equipment may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, computer equipment is provided, including a memory and a processor. The memory stores a computer program, and the processor, when executing the computer program, implements the steps of the dispute focus label classification method in any of the embodiments.
In one embodiment, a computer-readable storage medium is provided on which a computer program is stored. When executed by a processor, the computer program implements the steps of the dispute focus label classification method in any of the embodiments.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of this application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application is defined by the appended claims.

Claims (10)

1. A dispute focus label classification method, the method comprising:
obtaining a focus positioning rule and judgment document samples, the focus positioning rule being used to locate the focus positions in a judgment document;
labeling focus positions in the judgment document samples according to the focus positioning rule to obtain labeled judgment document samples;
extracting the dispute focuses from the labeled judgment document samples, and obtaining a classification model and a dispute focus type list;
classifying the extracted dispute focuses according to the classification model and the dispute focus type list to obtain automatic classification results of the extracted dispute focuses.
2. The method according to claim 1, characterized in that the classification model includes a rule model and a similarity model, and classifying the extracted dispute focuses according to the classification model and the dispute focus type list to obtain the automatic classification results of the extracted dispute focuses comprises:
obtaining the focus type candidate set corresponding to the rule model according to the dispute focus type list, the rule model, and the extracted dispute focuses;
obtaining the focus type candidate set corresponding to the similarity model according to the dispute focus type list, the similarity model, and the extracted dispute focuses;
obtaining the focus type of the extracted dispute focus according to the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model.
3. The method according to claim 2, characterized in that obtaining the focus type candidate set corresponding to the rule model according to the dispute focus type list, the rule model, and the extracted dispute focuses comprises:
removing redundant words and shared words from the extracted dispute focuses by means of the rule model, and performing synonym conversion;
counting the dispute focuses of each type among the processed dispute focuses according to the dispute focus type list, and generating a focus type keyword dictionary;
obtaining the focus sentences in the judgment document samples, and performing redundant word removal and synonym conversion on the focus sentences;
obtaining the focus descriptors in the processed focus sentences, comparing the focus descriptors one by one with the focus type keyword dictionary, and obtaining the probability value that each focus descriptor belongs to a dispute focus type in the focus type keyword dictionary;
traversing the focus type keyword dictionary, and selecting the dispute focus types whose probability values exceed a preset threshold as the focus type candidate set of the rule model.
4. The method according to claim 2, characterized in that obtaining the focus type candidate set corresponding to the similarity model according to the dispute focus type list, the similarity model, and the extracted dispute focuses comprises:
removing redundant words and shared words from the extracted dispute focuses by means of the similarity model, and performing synonym conversion;
counting the dispute focuses of each type among the processed dispute focuses according to the dispute focus type list, and generating focus keywords;
converting each dispute focus in the focus keywords into dispute focus word vectors, and taking the mean of the dispute focus word vectors as the dispute focus sentence vector to obtain the focus type sentence vectors;
obtaining the focus sentences in the judgment document samples, and converting each focus sentence into a focus description sentence vector;
comparing the focus description sentence vector with each focus type sentence vector in turn, obtaining the distance value of each focus type by similarity calculation, and selecting the focus types whose distance values satisfy a preset condition as the focus type candidate set of the similarity model.
5. The method according to claim 2, characterized in that obtaining the focus type of the extracted dispute focus according to the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model comprises:
computing the intersection of the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model;
when the intersection contains exactly one focus type, taking the focus type in the intersection as the focus type of the extracted dispute focus.
6. The method according to claim 5, characterized in that, after computing the intersection of the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model, the method further comprises:
when the intersection contains more than one focus type, obtaining the probability value and the corresponding distance value of each focus type;
calculating the product of each focus type's probability value and the reciprocal of its distance value, and taking the focus type with the largest product as the focus type of the extracted dispute focus.
7. The method according to claim 5, characterized in that, after computing the intersection of the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model, the method further comprises:
when the intersection contains no focus type, obtaining the probability value of each focus type;
taking the focus type with the largest probability value as the focus type of the extracted dispute focus.
8. A dispute focus label classification device, characterized in that the device comprises:
a rule acquisition module, configured to obtain a focus positioning rule and judgment document samples, the focus positioning rule being used to locate the focus positions in a judgment document;
a sample labeling module, configured to label focus positions in the judgment document samples according to the focus positioning rule to obtain labeled judgment document samples;
a focus extraction module, configured to extract the dispute focuses from the labeled judgment document samples and to obtain a classification model and a dispute focus type list;
a focus classification module, configured to classify the extracted dispute focuses according to the classification model and the dispute focus type list to obtain automatic classification results of the extracted dispute focuses.
9. Computer equipment, including a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program implements the steps of the method according to any one of claims 1 to 7 when executed by a processor.
CN201910185041.5A 2019-03-12 2019-03-12 Dispute focus label classification method and device, computer equipment and storage medium Active CN109992664B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910185041.5A CN109992664B (en) 2019-03-12 2019-03-12 Dispute focus label classification method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910185041.5A CN109992664B (en) 2019-03-12 2019-03-12 Dispute focus label classification method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109992664A true CN109992664A (en) 2019-07-09
CN109992664B CN109992664B (en) 2023-04-18

Family

ID=67130621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910185041.5A Active CN109992664B (en) 2019-03-12 2019-03-12 Dispute focus label classification method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109992664B (en)

Also Published As

Publication number Publication date
CN109992664B (en) 2023-04-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant