CN109992664B - Dispute focus label classification method and device, computer equipment and storage medium - Google Patents
- Publication number
- CN109992664B (application CN201910185041.5A)
- Authority
- CN
- China
- Prior art keywords
- focus
- dispute
- type
- model
- extracted
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Tourism & Hospitality (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Primary Health Care (AREA)
- Marketing (AREA)
- Human Resources & Organizations (AREA)
- General Business, Economics & Management (AREA)
- Economics (AREA)
- Technology Law (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application relates to the technical field of artificial intelligence, is applied to the smart city industry, and in particular relates to a dispute focus labeling and classification method and device, computer equipment, and a storage medium. In one embodiment, the method comprises: obtaining a focus positioning rule and a referee document sample; labeling the focus position in the referee document sample according to the focus positioning rule to obtain a labeled referee document sample; extracting the dispute focus from the labeled referee document sample; obtaining a classification model and a type list of dispute focuses; and classifying the extracted dispute focus according to the classification model and the type list to obtain an automatic classification result for the extracted dispute focus. Dispute focuses can thus be labeled and classified automatically, which greatly reduces the workload of professional annotators. Because labeling and classification are performed by the model, the labeling and classification cycle is shortened and work efficiency is improved.
Description
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular to a method and an apparatus for labeling and classifying dispute focuses, a computer device, and a storage medium.
Background
The referee document records the trial process and result of the people's court. It is the carrier of the result of the litigation activity and the documentary basis on which the people's court determines and allocates the substantive rights and obligations of the parties.
Traditionally, the dispute focus in a referee document has to be located and classified manually. Because the work is highly specialized, annotators are hard to train, and the volume to be annotated is large, the task is difficult to complete in a short time, so labeling and classification are inefficient.
Disclosure of Invention
In view of the above, there is a need to provide a method, an apparatus, a computer device and a storage medium for labeling and classifying dispute focuses that can improve work efficiency.
A method of classifying annotations of dispute focus, the method comprising:
acquiring a focus positioning rule and a referee document sample, wherein the focus positioning rule is used for positioning the focus position of the referee document;
performing focus position marking on the referee document sample according to the focus positioning rule to obtain a referee document sample with marks;
extracting a dispute focus in the referee document sample with the label, and acquiring a classification model and a type list of the dispute focus;
and classifying the extracted dispute focus according to the classification model and the type list of the dispute focus to obtain an automatic classification result of the extracted dispute focus.
In one embodiment, the classification model includes a rule model and a similarity model, and the classifying the extracted dispute focus according to the classification model and the type list of the dispute focus to obtain the automatic classification result of the extracted dispute focus includes:
obtaining a focus type candidate set corresponding to the rule model according to the type list of the dispute focus, the rule model and the extracted dispute focus;
obtaining a focus type candidate set corresponding to the similarity model according to the type list of the dispute focus, the similarity model and the extracted dispute focus;
and obtaining the focus type corresponding to the extracted dispute focus according to the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model.
In an embodiment, the obtaining a candidate set of focus types corresponding to the rule model according to the list of types of the dispute focus, the rule model, and the extracted dispute focus includes:
removing redundant words and common words from the extracted dispute focus through the rule model, and performing synonym conversion processing;
counting the dispute focuses of various types in the processed dispute focuses according to the type list of the dispute focuses to generate a focus type keyword dictionary;
acquiring a focus sentence in the referee document sample, and performing redundant word removal and synonym conversion processing on the focus sentence;
obtaining focus describing words in the processed focus sentences, and sequentially comparing the focus describing words with the focus type keyword dictionary to obtain probability values of dispute focus types of all the focus describing words in the focus type keyword dictionary;
and traversing the focus type keyword dictionary, and selecting a dispute focus type with the probability value larger than a preset threshold value as a focus type candidate set of the rule model.
In an embodiment, the obtaining a candidate set of focus types corresponding to the similarity model according to the list of types of the dispute focus, the similarity model, and the extracted dispute focus includes:
removing redundant words and common words from the extracted dispute focus through a similarity model, and performing synonym conversion processing;
counting the dispute focuses of various types in the processed dispute focuses according to the type list of the dispute focuses to generate focus keywords;
converting each dispute focus in the focus keywords into dispute focus word vectors, and taking the mean value of the dispute focus word vectors as dispute focus sentence vectors to obtain focus type sentence vectors;
acquiring a focus sentence in the referee document sample, and converting the focus sentence into a focus description sentence vector;
traversing the focus type sentence vectors with the focus description sentence vector, calculating a distance value for each focus type through similarity calculation, and selecting the focus types whose distance values meet a preset condition as the focus type candidate set of the similarity model.
In an embodiment, the obtaining a focus type corresponding to the extracted dispute focus according to the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model includes:
obtaining the intersection of the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model;
and when the number of the focus types in the intersection is 1, taking the focus types in the intersection as the focus types corresponding to the extracted dispute focuses.
In an embodiment, after the intersecting the candidate set of focus types corresponding to the rule model and the candidate set of focus types corresponding to the similarity model, the method further includes:
when the number of the focus types in the intersection is larger than 1, obtaining a probability value and a corresponding distance value of each focus type;
and calculating the product of the probability value of each focus type and the reciprocal of the corresponding distance value, and taking the focus type with the maximum product as the focus type corresponding to the extracted dispute focus.
In one embodiment, after the intersection between the candidate set of the focus type corresponding to the rule model and the candidate set of the focus type corresponding to the similarity model is obtained, the method further includes:
when the number of the focus types in the intersection is 0, acquiring the probability value of each focus type;
and taking the focus type with the maximum probability value as the focus type corresponding to the extracted dispute focus.
An annotation classification device of a dispute focus, the device comprising:
the rule acquisition module is used for acquiring a focus positioning rule and a referee document sample, wherein the focus positioning rule is used for positioning the focus position of the referee document;
the sample marking module is used for marking the focus position of the referee document sample according to the focus positioning rule to obtain a referee document sample with a mark;
the focus extraction module is used for extracting a dispute focus in the referee document sample with the label and acquiring a classification model and a type list of the dispute focus;
and the focus classification module is used for classifying the extracted dispute focus according to the classification model and the type list of the dispute focus to obtain an automatic classification result of the extracted dispute focus.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a focus positioning rule and a referee document sample, wherein the focus positioning rule is used for positioning the focus position of the referee document;
performing focus position marking on the referee document sample according to the focus positioning rule to obtain a referee document sample with marks;
extracting a dispute focus in the referee document sample with the label, and acquiring a classification model and a type list of the dispute focus;
and classifying the extracted dispute focus according to the classification model and the type list of the dispute focus to obtain an automatic classification result of the extracted dispute focus.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a focus positioning rule and a referee document sample, wherein the focus positioning rule is used for positioning the focus position of the referee document;
marking the focal position of the referee document sample according to the focal positioning rule to obtain a referee document sample with a mark;
extracting a dispute focus in the referee document sample with the label, and acquiring a classification model and a type list of the dispute focus;
and classifying the extracted dispute focus according to the classification model and the type list of the dispute focus to obtain an automatic classification result of the extracted dispute focus.
With the above method, apparatus, computer device and storage medium for labeling and classifying dispute focuses, a focus positioning rule and a referee document sample are obtained; the focus position in the referee document sample is labeled according to the focus positioning rule to obtain a labeled referee document sample; the dispute focus is extracted from the labeled referee document sample; a classification model and a type list of dispute focuses are obtained; and the extracted dispute focus is classified according to the classification model and the type list to obtain an automatic classification result. Dispute focuses can thus be labeled and classified automatically, which greatly reduces the workload of professional annotators. Because the focuses are classified by the model, the labeling and classification cycle is shortened and work efficiency is improved.
Drawings
FIG. 1 is a flowchart illustrating a method for labeling and classifying dispute focuses in one embodiment;
FIG. 2 is a flowchart illustrating the focus classification step in one embodiment;
FIG. 3 is a flowchart illustrating the step of obtaining a rule model focus type candidate set in one embodiment;
FIG. 4 is a flowchart illustrating a step of obtaining a candidate set of focus types of a similarity model in one embodiment;
FIG. 5 is a flowchart of the focus type determining step in one embodiment;
FIG. 6 is a block diagram of an apparatus for labeling and classifying dispute focuses in one embodiment;
FIG. 7 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the present application and are not intended to limit it.
In one embodiment, as shown in fig. 1, a method for labeling and classifying dispute focuses is provided, which comprises the following steps:
Step 102, a focus positioning rule and a referee document sample are obtained, wherein the focus positioning rule is used for positioning the focus position of the referee document.
The focus positioning rule is a rule for locating the focus position in the referee document and can be given by an expert. Examples of focus positioning rules are "the word 'focus' appears before the content of the dispute focus", "the dispute focus appears at the end of the referee document", "the dispute focus is followed by a court opinion that supports or does not support it", and so on. The rules need not be mutually independent or individually accurate; their fuzziness improves generalization.
The referee document sample refers to the referee documents used for training the automatic labeling and classification of dispute focuses, for example a training set on the order of ten thousand documents or more. The dispute focuses in these training samples have been neither located nor classified.
Step 104, the focus position in the referee document sample is labeled according to the focus positioning rule to obtain a labeled referee document sample.
The focus positioning rules are transcribed into labeling functions (label equations) written in a programming language, and the labeled referee document sample is obtained through a generative model from a third-party open source library. The generative model evaluates the different labeling functions and outputs their weights together with the prediction result. For example, the labeling functions are evaluated by the Generative Model in the third-party open source library Snorkel. A labeling function can process an input string by means of regular expressions, keyword matching and similar methods and return a result. For example, given a sentence from the referee document, it outputs a binary value, where 1 indicates that the sentence describes a dispute focus and 0 indicates that it does not.
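A minimal sketch of what such labeling functions might look like in plain Python is given here. It does not use the Snorkel API and does not reproduce the generative model; it only shows rules expressed as functions that map a sentence to 1 or 0. The function names, the English patterns and the keyword list are hypothetical and chosen only for illustration.

```python
import re
from typing import List

FOCUS = 1      # the sentence describes a dispute focus
NOT_FOCUS = 0  # the sentence does not describe a dispute focus

def lf_focus_word(sentence: str) -> int:
    """Regular-expression rule: the word "focus" appears in the sentence."""
    return FOCUS if re.search(r"\bfocus\b", sentence, re.IGNORECASE) else NOT_FOCUS

def lf_dispute_keyword(sentence: str) -> int:
    """Keyword-matching rule (hypothetical keyword list)."""
    keywords = ("dispute", "controversy")
    lowered = sentence.lower()
    return FOCUS if any(k in lowered for k in keywords) else NOT_FOCUS

# Each rule votes independently; weighting the votes is left to the generative model.
LABELING_FUNCTIONS = [lf_focus_word, lf_dispute_keyword]

def apply_labeling_functions(sentence: str) -> List[int]:
    """Return one binary vote per labeling function for the given sentence."""
    return [lf(sentence) for lf in LABELING_FUNCTIONS]
```

Each sentence of a document would be passed through `apply_labeling_functions`, and the resulting vote vectors would then be combined by the generative model described in the text.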
Step 106, the dispute focus in the labeled referee document sample is extracted, and a classification model and a type list of dispute focuses are obtained.
The type list of dispute focuses is a list of the various dispute focus types. It may be provided by an expert, for example by first determining the category of the referee document and then listing the dispute focus types that appear in documents of that category. The entries of the type list may take the form of phrases such as "XXX identification" or "XXX recognition", for example "legal relationship recognition (class)" or "authenticity recognition (class)". A classification model is a model that maps focus descriptions to the corresponding focus types.
In one embodiment, the dispute focus in the labeled referee document sample can be extracted by a trained deep neural network model. For example, the embedding layer of the deep neural network model converts an input sentence into feature vectors using a Transformer, the feature vectors are then encoded by a bidirectional LSTM (Long Short-Term Memory), and the final decision is made by a CRF (Conditional Random Field) algorithm. The deep neural network model is robust to label noise, which makes it suitable for labeled referee document samples obtained through unsupervised prediction with the third-party open source library Snorkel. Once trained, the deep neural network model can extract the dispute focus from the labeled referee document sample.
Step 108, the extracted dispute focus is classified according to the classification model and the type list of dispute focuses to obtain an automatic classification result for the extracted dispute focus.
The extracted dispute focus is classified by the classification model into one of the focus types in the type list, and the focus type corresponding to the extracted dispute focus is output. The automatic classification result is this automatically generated focus type.
In the above method for labeling and classifying dispute focuses, a focus positioning rule and a referee document sample are obtained; the focus position in the referee document sample is labeled according to the focus positioning rule to obtain a labeled referee document sample; the dispute focus is extracted from the labeled referee document sample; the classification model and the type list of dispute focuses are obtained; and the extracted dispute focus is classified according to the classification model and the type list to obtain an automatic classification result for the extracted dispute focus.
In one embodiment, the classification model includes a rule model and a similarity model, and as shown in fig. 2, classifying the extracted dispute focus according to the classification model and the type list of dispute focuses to obtain an automatic classification result of the extracted dispute focus includes: step 202, obtaining a focus type candidate set corresponding to the rule model according to the type list of the dispute focus, the rule model and the extracted dispute focus; step 204, obtaining a focus type candidate set corresponding to the similarity model according to the type list of the dispute focus, the similarity model and the extracted dispute focus; and step 206, obtaining the focus type corresponding to the extracted dispute focus according to the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model. The rule model is a model that obtains, based on preset rules, the probability value that a dispute focus belongs to a given focus type, and the similarity model is a model that obtains the distance value between the dispute focus and each focus type through similarity calculation.
In one embodiment, as shown in fig. 3, obtaining a candidate set of focus types corresponding to the rule model according to the list of types of the dispute focus, the rule model, and the extracted dispute focus includes: step 302, removing redundant words and common words from the extracted dispute focus through the rule model, and performing synonym conversion processing; step 304, according to the type list of the dispute focus, counting the dispute focuses of each type in the processed dispute focuses, and generating a focus type keyword dictionary; step 306, acquiring a focus sentence in the referee document sample, and performing redundant word removal and synonym conversion processing on the focus sentence; step 308, acquiring the focus descriptors in the processed focus sentence, and comparing the focus descriptors in turn with the focus type keyword dictionary to obtain, for each focus descriptor, the probability value of each dispute focus type in the focus type keyword dictionary; and step 310, traversing the focus type keyword dictionary, and selecting the dispute focus types whose probability value is larger than a preset threshold value as the focus type candidate set of the rule model.
The rule model first removes redundant words and common words and performs synonym conversion, then collects the keywords contained in each focus type to generate the focus type keyword dictionary. Taking legal relationship identification as an example of a focus type, its keywords may include loan relationship, investment relationship, cooperation relationship, unjust enrichment, debt and the like; the keyword sets of different focus types should preferably overlap little or not at all. When a focus sentence is input, redundant words are removed from it and synonym conversion is applied. The focus descriptors in the focus sentence are then compared in turn with the candidate words listed in the focus type keyword dictionary; the ratio of the number of matching words to the number of keywords in the dictionary entry of a focus type is the probability value that the focus descriptor is predicted as that focus type. The focus type keyword dictionary is traversed, and the k focus types with the highest probability values are taken as the focus type candidate set of the rule model.
For example, suppose the focus descriptors in the focus sentence are a1 and b1, and the focus type keyword dictionary lists candidate words for focus types A, B and C: focus type A contains the keywords a1, a2 and a3, focus type B contains the keywords b1 and b2, and focus type C contains the keywords a1, a2, b1 and b2. Focus descriptor a1 matches 1 of the 3 keywords of focus type A, so the probability value of a1 being predicted as focus type A is 1/3. It matches 0 of the 2 keywords of focus type B, so the probability value for focus type B is 0. It matches 1 of the 4 keywords of focus type C, so the probability value for focus type C is 1/4. Focus descriptor b1 matches 0 of the 3 keywords of focus type A, so the probability value of b1 being predicted as focus type A is 0. It matches 1 of the 2 keywords of focus type B, so the probability value for focus type B is 1/2. It matches 1 of the 4 keywords of focus type C, so the probability value for focus type C is 1/4. Assuming k = 2, the two focus types with the highest probability values are selected as the focus type candidate set of the rule model, that is, the focus type candidate set of the rule model corresponding to the focus sentence includes focus types A and B.
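The keyword-dictionary scoring can be sketched in Python as follows, using two descriptors a1 and b1 against the three-type dictionary of the example. The text does not state how the per-descriptor probabilities are combined into one score per focus type; this sketch assumes the highest probability any descriptor gives a type is used, which yields candidates A and B for k = 2. All function and variable names are hypothetical.

```python
from fractions import Fraction

# Toy focus type keyword dictionary: type A has a1, a2, a3; B has b1, b2; C has a1, a2, b1, b2.
KEYWORD_DICT = {
    "A": {"a1", "a2", "a3"},
    "B": {"b1", "b2"},
    "C": {"a1", "a2", "b1", "b2"},
}

def descriptor_probabilities(descriptor, keyword_dict):
    """Probability per focus type for one descriptor: matching words / dictionary size."""
    return {
        focus_type: Fraction(int(descriptor in keywords), len(keywords))
        for focus_type, keywords in keyword_dict.items()
    }

def rule_model_candidates(descriptors, keyword_dict, k):
    """Return the k best-scoring focus types and the score of every type.

    Assumption (not stated in the source): a type's score is the highest
    probability that any single descriptor assigns to it.
    """
    scores = {focus_type: Fraction(0) for focus_type in keyword_dict}
    for descriptor in descriptors:
        for focus_type, p in descriptor_probabilities(descriptor, keyword_dict).items():
            scores[focus_type] = max(scores[focus_type], p)
    ranked = sorted(scores, key=lambda t: scores[t], reverse=True)
    return ranked[:k], scores

candidates, scores = rule_model_candidates(["a1", "b1"], KEYWORD_DICT, k=2)
```

With the toy dictionary, the scores are 1/3 for A, 1/2 for B and 1/4 for C, so the candidate set is B and A. A threshold-based variant, as in step 310, would keep every type whose score exceeds a preset value instead of the top k.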
In one embodiment, as shown in fig. 4, obtaining a candidate set of focus types corresponding to the similarity model according to the list of types of the dispute focus, the similarity model, and the extracted dispute focus includes: step 402, removing redundant words and common words from the extracted dispute focus through the similarity model, and performing synonym conversion processing; step 404, according to the type list of the dispute focus, counting the dispute focuses of each type in the processed dispute focuses, and generating focus keywords; step 406, converting each dispute focus in the focus keywords into a dispute focus word vector, and taking the mean value of the dispute focus word vectors as a dispute focus sentence vector to obtain a focus type sentence vector; step 408, acquiring a focus sentence in the referee document sample, and converting the focus sentence into a focus description sentence vector; and step 410, traversing the focus type sentence vectors with the focus description sentence vector, calculating a distance value for each focus type through similarity calculation, and selecting the focus types whose distance values meet a preset condition as the focus type candidate set of the similarity model.
The similarity model first removes redundant words and common words and performs synonym conversion to generate the focus keywords. The focus keywords are then converted into word vectors using the skip-gram algorithm, and the mean of the word vectors of the dispute focuses of a given type is taken as the sentence vector of that focus type. When a focus sentence is input, it is processed through the same steps to obtain its focus description sentence vector. The focus description sentence vector is compared with each focus type sentence vector in turn, and a similarity calculation on each pair of sentence vectors yields a distance value; specifically, the Euclidean distance is used, and the k focus types with the smallest distance values are taken as the focus type candidate set of the similarity model.
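The following Python sketch illustrates the mean-vector and Euclidean-distance steps under simplifying assumptions. The two-dimensional word vectors are invented toy values standing in for skip-gram embeddings, the keyword lists are illustrative, and all names are hypothetical.

```python
import math

# Hypothetical toy word vectors; in the described method these come from skip-gram.
WORD_VECTORS = {
    "loan": [1.0, 0.0],
    "debt": [0.8, 0.2],
    "forgery": [0.0, 1.0],
    "signature": [0.2, 0.8],
}

# Hypothetical focus keywords per focus type.
TYPE_KEYWORDS = {
    "legal relationship recognition": ["loan", "debt"],
    "authenticity recognition": ["forgery", "signature"],
}

def mean_vector(words, word_vectors):
    """Sentence vector: component-wise mean of the vectors of the known words."""
    vectors = [word_vectors[w] for w in words if w in word_vectors]
    if not vectors:
        raise ValueError("no word in the input has a vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def similarity_model_candidates(focus_words, type_keywords, word_vectors, k):
    """Return the k focus types closest to the focus sentence, plus all distances."""
    sentence_vec = mean_vector(focus_words, word_vectors)
    distances = {
        focus_type: euclidean(sentence_vec, mean_vector(keywords, word_vectors))
        for focus_type, keywords in type_keywords.items()
    }
    ranked = sorted(distances, key=distances.get)  # smallest distance first
    return ranked[:k], distances

candidates, distances = similarity_model_candidates(
    ["loan"], TYPE_KEYWORDS, WORD_VECTORS, k=1
)
```

Averaging word vectors ignores word order, which keeps the sketch simple and matches the mean-of-vectors construction described in the text.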
In one embodiment, as shown in fig. 5, obtaining a focus type corresponding to the extracted dispute focus according to the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model includes: step 502, obtaining the intersection of the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model; and step 504, when the number of focus types in the intersection is 1, taking the focus type in the intersection as the focus type corresponding to the extracted dispute focus. When the number of focus types in the intersection is larger than 1, the probability value and the corresponding distance value of each focus type are obtained; the product of the probability value of each focus type and the reciprocal of the corresponding distance value is calculated, and the focus type with the maximum product is taken as the focus type corresponding to the extracted dispute focus. When the number of focus types in the intersection is 0, the probability value of each focus type is obtained, and the focus type with the maximum probability value is taken as the focus type corresponding to the extracted dispute focus.
The intersection of the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model is computed. When the intersection contains a single focus type, for example A, focus type A is taken as the focus type corresponding to the extracted dispute focus. When the number of focus types in the intersection is greater than 1, for example the intersection includes A and B, suppose the probability value of focus type A is 1/3 with a corresponding distance value of 1/2, and the probability value of focus type B is 1/4 with a corresponding distance value of 1/2. The product of the probability value and the reciprocal of the corresponding distance value is 2/3 for focus type A and 1/2 for focus type B; the product for focus type A is larger, so focus type A is taken as the focus type corresponding to the extracted dispute focus. When the number of focus types in the intersection is 0, for example the focus type candidate set corresponding to the rule model includes A and B, with a probability value of 1/3 for focus type A and 1/4 for focus type B, and the focus type candidate set corresponding to the similarity model includes only C, focus type A, which has the larger probability value, is taken as the focus type corresponding to the extracted dispute focus.
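The three cases of the fusion step can be sketched in Python as follows. The function name and the dictionary-based inputs are hypothetical; the sketch assumes all distance values are strictly positive so that the reciprocal is defined, and it breaks ties by type name, which the text does not specify.

```python
from fractions import Fraction

def fuse_candidates(rule_probs, sim_distances):
    """Pick one focus type from the two candidate sets.

    rule_probs:    focus type -> probability value (rule model candidates)
    sim_distances: focus type -> distance value (similarity model candidates),
                   assumed strictly positive
    """
    common = sorted(set(rule_probs) & set(sim_distances))
    if len(common) == 1:
        # Exactly one shared type: take it directly.
        return common[0]
    if len(common) > 1:
        # Several shared types: maximize probability * (1 / distance).
        return max(common, key=lambda t: rule_probs[t] / sim_distances[t])
    # No shared type: fall back to the most probable rule-model candidate.
    return max(sorted(rule_probs), key=lambda t: rule_probs[t])

# Probabilities 1/3 and 1/4 with both distances 1/2 give products 2/3 and 1/2.
chosen = fuse_candidates(
    {"A": Fraction(1, 3), "B": Fraction(1, 4)},
    {"A": Fraction(1, 2), "B": Fraction(1, 2)},
)
```

With probabilities 1/3 and 1/4 and equal distances of 1/2, the products are 2/3 for A and 1/2 for B, so the function returns A.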
In one embodiment, before the focus position labeling is performed on the referee document sample according to the focus positioning rule to obtain the labeled referee document sample, the method further comprises: obtaining a model generated by a third-party open source library; and optimizing the model generated by the third-party open source library through Gibbs sampling and a pseudo-likelihood algorithm to obtain an optimized model. Performing focus position labeling on the referee document sample according to the focus positioning rule to obtain the labeled referee document sample then comprises: performing focus position labeling on the referee document sample according to the optimized model and the focus positioning rule to obtain the labeled referee document sample. The generative model of the third-party open source library Snorkel is used. The Snorkel generative model takes the prediction results of the different labeling functions as factors in a factor graph, draws plausible values from the factor graph using Gibbs sampling, and optimizes the whole graphical model with a pseudo-likelihood algorithm. This process does not require ground-truth label values.
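To make the idea of combining several noisy labeling rules concrete, the sketch below applies three hypothetical focus positioning rules to a sentence and merges their votes. It deliberately swaps in a plain majority vote for Snorkel's generative model (no factor graph, Gibbs sampling, or pseudo-likelihood optimization is performed), and the rule names and patterns are illustrative assumptions rather than rules taken from the patent.

```python
import re
from collections import Counter

FOCUS, NOT_FOCUS, ABSTAIN = 1, 0, -1


def lf_mentions_focus(sentence):
    # Hypothetical rule: an explicit "focus of the dispute" phrase.
    return FOCUS if "focus of the dispute" in sentence.lower() else ABSTAIN


def lf_issue_pattern(sentence):
    # Hypothetical rule: "the (main) issue is whether ...".
    pattern = r"\bthe (main )?issue (is|was) whether\b"
    return FOCUS if re.search(pattern, sentence, re.I) else ABSTAIN


def lf_procedural(sentence):
    # Hypothetical rule: procedural sentences are not dispute focuses.
    pattern = r"\b(court fee|hereby orders)\b"
    return NOT_FOCUS if re.search(pattern, sentence, re.I) else ABSTAIN


LABEL_FUNCTIONS = [lf_mentions_focus, lf_issue_pattern, lf_procedural]


def majority_label(sentence):
    """Majority vote over non-abstaining rules; a stand-in for the generative model."""
    votes = [lf(sentence) for lf in LABEL_FUNCTIONS]
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    (top, n), *rest = counts.most_common()
    if rest and rest[0][1] == n:
        return ABSTAIN  # the rules disagree evenly
    return top
```

Like the generative model, this combination needs no ground-truth labels; unlike it, a majority vote weights every rule equally instead of learning how reliable each rule is.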
In one embodiment, after the step of classifying the extracted dispute focus according to the classification model and the type list of the dispute focus to obtain the automatic classification result of the extracted dispute focus, the method further includes: selecting focuses to be manually verified from the classified focuses, and sending the focuses to be manually verified to the terminal for verification; receiving a manual classification result, wherein the manual classification result is obtained by classifying the focuses to be manually verified at the terminal; and comparing the manual classification result with the automatic classification result to obtain the accuracy of the automatic classification result. An expert samples 10 to 20 focuses from each focus type in the automatic classification result for acceptance checking, which yields the manual classification result. According to the manual classification result, the focus types are divided into three categories: focus types for which automatic classification already achieves high precision; focus types that automatic classification can match correctly once classification rules are added; and focus types that automatic classification cannot classify acceptably. The expert adds classification rules, for example by adding to or modifying the keyword dictionary or by providing simple rules that can be located with regular expressions, and the model is optimized according to the added classification rules. The steps of model optimization, model classification, and expert verification are repeated for the first two categories, and focuses in the last category are labeled by professional annotators, until all focuses are classified correctly.
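The sampling and comparison steps can be sketched as follows. The helper names, the `(focus_id, focus_type)` data layout, and the fixed random seed are assumptions made for illustration; the source only specifies that 10 to 20 focuses per type are checked and that the manual and automatic results are compared.

```python
import random
from collections import defaultdict


def sample_for_review(auto_results, per_type=10, seed=0):
    """Pick up to `per_type` focus ids from each automatically assigned type.

    auto_results: list of (focus_id, focus_type) pairs.
    """
    by_type = defaultdict(list)
    for focus_id, focus_type in auto_results:
        by_type[focus_type].append(focus_id)
    rng = random.Random(seed)
    return {t: rng.sample(ids, min(per_type, len(ids))) for t, ids in by_type.items()}


def accuracy_by_type(auto_labels, manual_labels):
    """Share of reviewed focuses whose automatic type matches the manual type.

    Both arguments map focus_id -> focus_type; only ids present in
    manual_labels (the reviewed sample) are counted.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for focus_id, manual_type in manual_labels.items():
        auto_type = auto_labels[focus_id]
        totals[auto_type] += 1
        hits[auto_type] += int(auto_type == manual_type)
    return {t: hits[t] / totals[t] for t in totals}


auto = {1: "A", 2: "A", 3: "B"}
manual = {1: "A", 2: "B", 3: "B"}
per_type_accuracy = accuracy_by_type(auto, manual)
```

Reporting accuracy per automatically assigned type, rather than one global figure, is a design choice that makes it easier to sort types into the three categories described above.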
It should be understood that although the steps in the flow charts of figs. 1-5 are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in figs. 1-5 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 6, a dispute focus label classification device is provided, including: a rule acquisition module 602, a sample labeling module 604, a focus extraction module 606, and a focus classification module 608. The rule acquisition module is used for acquiring a focus positioning rule and a referee document sample, wherein the focus positioning rule is used for locating the focus position in the referee document; the sample labeling module is used for labeling the focus position of the referee document sample according to the focus positioning rule to obtain the labeled referee document sample; the focus extraction module is used for extracting the dispute focus from the labeled referee document sample and acquiring a classification model and a type list of the dispute focus; and the focus classification module is used for classifying the extracted dispute focus according to the classification model and the type list of the dispute focus to obtain an automatic classification result of the extracted dispute focus.
In one embodiment, the classification model includes a rule model and a similarity model, and the focus classification module includes: a rule model focus type candidate set acquisition unit, used for obtaining the focus type candidate set corresponding to the rule model according to the type list of the dispute focus, the rule model, and the extracted dispute focus; a similarity model focus type candidate set acquisition unit, used for obtaining the focus type candidate set corresponding to the similarity model according to the type list of the dispute focus, the similarity model, and the extracted dispute focus; and a focus type determining unit, used for obtaining the focus type corresponding to the extracted dispute focus according to the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model.
In one embodiment, the rule model focus type candidate set obtaining unit includes: the focus processing unit is used for removing redundant words and common words from the extracted dispute focus through a rule model and carrying out synonym conversion processing; the dictionary generating unit is used for counting dispute focuses of various types in the processed dispute focuses according to the type lists of the dispute focuses to generate a focus type keyword dictionary; the focus sentence processing unit is used for acquiring a focus sentence in the referee document sample, and performing redundant word removal and synonym conversion processing on the focus sentence; the probability obtaining unit is used for obtaining focus describing words in the processed focus sentences, and comparing the focus describing words with the focus type keyword dictionary in sequence to obtain the probability value of dispute focus types of all the focus describing words in the focus type keyword dictionary; and the focus type generating unit is used for traversing the focus type keyword dictionary and selecting the dispute focus type with the probability value larger than a preset threshold value as a focus type candidate set of the rule model.
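A minimal sketch of these units is shown below. The stop-word list, the synonym table, and in particular the way a probability value is computed (each type's share of the keyword counts matched by the focus sentence) are assumptions for illustration; the source does not specify the formula.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"the", "of", "is", "whether", "a", "an", "to", "and", "by"}  # hypothetical
SYNONYMS = {"agreement": "contract", "remuneration": "wages"}  # hypothetical


def normalize(text):
    """Remove redundant/common words and map synonyms to one canonical word."""
    words = (w.strip(".,;:?").lower() for w in text.split())
    return [SYNONYMS.get(w, w) for w in words if w and w not in STOP_WORDS]


def build_keyword_dictionary(labelled_focuses):
    """labelled_focuses: iterable of (focus_text, focus_type) pairs."""
    dictionary = defaultdict(Counter)
    for text, focus_type in labelled_focuses:
        dictionary[focus_type].update(normalize(text))
    return dictionary


def rule_candidates(focus_sentence, dictionary, threshold=0.3):
    """Return {focus_type: probability} for types scoring above the threshold."""
    words = normalize(focus_sentence)
    scores = {t: sum(kw[w] for w in words) for t, kw in dictionary.items()}
    total = sum(scores.values())
    if total == 0:
        return {}  # no keyword matched any type
    return {t: s / total for t, s in scores.items() if s / total > threshold}


keyword_dict = build_keyword_dictionary([
    ("whether the contract is valid", "contract validity"),
    ("whether the agreement is void", "contract validity"),
    ("amount of unpaid wages", "wage arrears"),
])
candidates = rule_candidates("The focus is whether the agreement is valid.", keyword_dict)
```

Here "agreement" is first converted to "contract", so the sentence only matches keywords of the "contract validity" type and `candidates` contains that single type with probability 1.0.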
In one embodiment, the similarity model focus type candidate set obtaining unit includes: the preprocessing unit is used for removing redundant words and common words from the extracted dispute focuses through the similarity model and performing synonym conversion processing; the keyword generation unit is used for counting dispute focuses of various types in the processed dispute focuses according to the type list of the dispute focuses and generating a focus keyword; the sentence vector generating unit is used for converting each dispute focus in the focus keywords into dispute focus word vectors, and obtaining focus type sentence vectors by taking the mean value of the dispute focus word vectors as the dispute focus sentence vectors; the focus sentence conversion unit is used for acquiring a focus sentence in the referee document sample and converting the focus sentence into a focus description sentence vector; and the similarity calculation unit is used for traversing the focus description sentence vector through the focus type sentence vector, calculating a distance value of the focus type through the similarity, and selecting the focus type with the distance value meeting the preset condition as a focus type candidate set of the similarity model.
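The vector steps can be illustrated with a small pure-Python sketch. The two-dimensional word vectors are toy values, cosine distance is one possible choice of similarity measure, and "distance not greater than `max_distance`" is an assumed form of the preset condition; none of these specifics come from the source.

```python
import math

# Hypothetical 2-dimensional word vectors; a real system would load trained embeddings.
WORD_VECTORS = {
    "contract": (1.0, 0.0),
    "valid": (0.9, 0.1),
    "wages": (0.0, 1.0),
    "unpaid": (0.1, 0.9),
}


def sentence_vector(words):
    """Mean of the known word vectors, or None if no word is known."""
    vecs = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vecs:
        return None
    return tuple(sum(col) / len(vecs) for col in zip(*vecs))


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def similarity_candidates(sentence_words, type_keywords, max_distance=0.2):
    """Return {focus_type: distance} for types close enough to the focus sentence."""
    query = sentence_vector(sentence_words)
    if query is None:
        return {}
    candidates = {}
    for focus_type, keywords in type_keywords.items():
        type_vec = sentence_vector(keywords)
        if type_vec is None:
            continue
        distance = cosine_distance(query, type_vec)
        if distance <= max_distance:
            candidates[focus_type] = distance
    return candidates


type_keywords = {
    "contract validity": ["contract", "valid"],
    "wage arrears": ["wages", "unpaid"],
}
close_types = similarity_candidates(["contract"], type_keywords)
```

The returned distances are the values that the focus type determining unit would later combine with the rule model's probability values when the two candidate sets share more than one type.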
In one embodiment, the focus type determining unit includes: the intersection acquisition unit is used for solving the intersection of the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model; and the first judging unit is used for taking the focus type in the intersection as the focus type corresponding to the extracted dispute focus when the number of the focus types in the intersection is 1.
In an embodiment, the intersection obtaining unit further includes a second determining unit, configured to obtain, when the number of the focus types in the intersection is greater than 1, a probability value and a corresponding distance value of each focus type; and calculating the product of the probability value of each focus type and the reciprocal of the corresponding distance value, and taking the focus type with the maximum product as the focus type corresponding to the extracted dispute focus.
In an embodiment, the intersection obtaining unit further includes a third determining unit, configured to obtain a probability value of each focus type when the number of the focus types in the intersection is 0; and taking the focus type with the maximum probability value as the focus type corresponding to the extracted dispute focus.
In one embodiment, the device further comprises an optimization module, arranged before the sample labeling module, which is used for obtaining a model generated by a third-party open source library and optimizing the model generated by the third-party open source library through Gibbs sampling and a pseudo-likelihood algorithm to obtain an optimized model; the sample labeling module is then used for labeling the focus position of the referee document sample according to the optimized model and the focus positioning rule to obtain the labeled referee document sample.
In one embodiment, the focus classification module further comprises a verification module used for selecting a focus to be manually verified from the classified focuses and sending the focus to be manually verified to the terminal for verification; receiving a manual classification result, wherein the manual classification result is obtained by classifying the to-be-manually-verified focus through a terminal; and comparing the manual classification result with the automatic classification result to obtain the accuracy of the automatic classification result.
For specific limitations of the dispute focus label classification device, reference may be made to the limitations of the dispute focus label classification method above, which are not repeated here. The modules in the dispute focus label classification device can be implemented wholly or partially by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a dispute focus label classification method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, trackball, or touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, or mouse.
Those skilled in the art will appreciate that the architecture shown in fig. 7 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided that includes a memory and a processor, the memory storing a computer program; when the processor executes the computer program, the steps of the dispute focus label classification method in any of the embodiments are performed.
In one embodiment, a computer readable storage medium is provided, having stored thereon a computer program, which when executed by a processor, performs the steps of the method for label classification of dispute focus in any of the embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, any combination of these technical features should be considered within the scope of the present specification as long as the combination contains no contradiction.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present patent application shall be subject to the appended claims.
Claims (7)
1. A method for classifying annotations of dispute focus, the method comprising:
acquiring a focus positioning rule and a referee document sample, wherein the focus positioning rule is used for positioning the focus position of the referee document;
performing focus position marking on the referee document sample according to the focus positioning rule to obtain a referee document sample with marks;
extracting a dispute focus in the referee document sample with the label, and acquiring a classification model and a type list of the dispute focus; wherein the classification model comprises a rule model and a similarity model;
obtaining a focus type candidate set corresponding to the rule model according to the dispute focus type list, the rule model and the extracted dispute focus;
obtaining a focus type candidate set corresponding to the similarity model according to the dispute focus type list, the similarity model and the extracted dispute focus;
obtaining the intersection of the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model;
when the number of the focus types in the intersection is 1, taking the focus types in the intersection as the focus types corresponding to the extracted dispute focuses;
when the number of the focus types in the intersection is larger than 1, obtaining a probability value and a corresponding distance value of each focus type; and calculating the product of the probability value of each focus type and the reciprocal of the corresponding distance value, and taking the focus type with the maximum product as the focus type corresponding to the extracted dispute focus.
2. The method of claim 1, wherein obtaining a candidate set of focus types corresponding to the rule model according to the list of types of the dispute focus, the rule model and the extracted dispute focus comprises:
removing redundant words and common words from the extracted dispute focus through the rule model, and performing synonym conversion processing;
counting the dispute focuses of various types in the processed dispute focuses according to the type list of the dispute focuses to generate a focus type keyword dictionary;
acquiring a focus sentence in the referee document sample, and performing redundant word removal and synonym conversion processing on the focus sentence;
obtaining focus describing words in the processed focus sentences, and sequentially comparing the focus describing words with the focus type keyword dictionary to obtain probability values of dispute focus types of all the focus describing words in the focus type keyword dictionary;
and traversing the focus type keyword dictionary, and selecting a dispute focus type with the probability value larger than a preset threshold value as a focus type candidate set of the rule model.
3. The method of claim 1, wherein obtaining a candidate set of focus types corresponding to the similarity model according to the list of types of the dispute focus, the similarity model, and the extracted dispute focus comprises:
removing redundant words and common words from the extracted dispute focus through a similarity model, and performing synonym conversion processing;
counting the dispute focuses of various types in the processed dispute focuses according to the type list of the dispute focuses to generate focus keywords;
converting each dispute focus in the focus keywords into dispute focus word vectors, and taking the mean value of the dispute focus word vectors as dispute focus sentence vectors to obtain focus type sentence vectors;
acquiring a focus sentence in the referee document sample, and converting the focus sentence into a focus description sentence vector;
traversing the focus description sentence vector through the focus type sentence vector, calculating a distance value of the focus type through a similarity, and selecting the focus type with the distance value meeting a preset condition as a focus type candidate set of the similarity model.
4. The method of claim 1, wherein after intersecting the candidate set of focus types corresponding to the rule model and the candidate set of focus types corresponding to the similarity model, the method further comprises:
when the number of the focus types in the intersection is 0, acquiring the probability value of each focus type;
and taking the focus type with the maximum probability value as the focus type corresponding to the extracted dispute focus.
5. An apparatus for classifying a label at a point of dispute, the apparatus comprising:
the rule acquisition module is used for acquiring a focus positioning rule and a referee document sample, wherein the focus positioning rule is used for positioning the focus position of the referee document;
the sample marking module is used for marking the focus position of the referee document sample according to the focus positioning rule to obtain the referee document sample with marks;
the focus extraction module is used for extracting a dispute focus in the referee document sample with the label and acquiring a classification model and a type list of the dispute focus; wherein the classification model comprises a rule model and a similarity model;
the focus classification module is used for obtaining a focus type candidate set corresponding to the rule model according to the dispute focus type list, the rule model and the extracted dispute focus; obtaining a focus type candidate set corresponding to the similarity model according to the dispute focus type list, the similarity model and the extracted dispute focus; obtaining the intersection of the focus type candidate set corresponding to the rule model and the focus type candidate set corresponding to the similarity model; when the number of the focus types in the intersection is 1, taking the focus type in the intersection as the focus type corresponding to the extracted dispute focus; when the number of the focus types in the intersection is larger than 1, obtaining a probability value and a corresponding distance value of each focus type; and calculating the product of the probability value of each focus type and the reciprocal of the corresponding distance value, and taking the focus type with the maximum product as the focus type corresponding to the extracted dispute focus.
6. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 4.
7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910185041.5A CN109992664B (en) | 2019-03-12 | 2019-03-12 | Dispute focus label classification method and device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109992664A CN109992664A (en) | 2019-07-09 |
CN109992664B CN109992664B (en) | 2023-04-18 |
Family
ID=67130621
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910185041.5A Active CN109992664B (en) | 2019-03-12 | 2019-03-12 | Dispute focus label classification method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109992664B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110321439B (en) * | 2019-07-10 | 2022-02-25 | 北京市律典通科技有限公司 | Electronic annotation management method and system |
CN110825872B (en) * | 2019-09-11 | 2023-05-23 | 成都数之联科技股份有限公司 | Method and system for extracting and classifying litigation request information |
CN110825879B (en) * | 2019-09-18 | 2024-05-07 | 平安科技(深圳)有限公司 | Decide a case result determination method, device, equipment and computer readable storage medium |
CN110674633A (en) * | 2019-09-18 | 2020-01-10 | 平安科技(深圳)有限公司 | Document review proofreading method and device, storage medium and electronic equipment |
CN110765266B (en) * | 2019-09-20 | 2022-07-22 | 成都星云律例科技有限责任公司 | Method and system for merging similar dispute focuses of referee documents |
CN112580338A (en) * | 2019-09-27 | 2021-03-30 | 北京国双科技有限公司 | Method and device for determining dispute focus, storage medium and equipment |
CN110889502B (en) * | 2019-10-15 | 2024-02-06 | 东南大学 | Deep learning-based dispute focus generation method |
CN111159017A (en) * | 2019-12-17 | 2020-05-15 | 北京中科晶上超媒体信息技术有限公司 | Test case generation method based on slot filling |
CN111814477B (en) * | 2020-07-06 | 2022-06-21 | 重庆邮电大学 | Dispute focus discovery method and device based on dispute focus entity and terminal |
CN112487146B (en) * | 2020-12-02 | 2022-05-31 | 重庆邮电大学 | Legal case dispute focus acquisition method and device and computer equipment |
CN113553856B (en) * | 2021-06-16 | 2022-08-26 | 吉林大学 | Deep neural network-based dispute focus identification method |
CN113792545B (en) * | 2021-11-16 | 2022-03-04 | 成都索贝数码科技股份有限公司 | News event activity name extraction method based on deep learning |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766426A (en) * | 2017-09-14 | 2018-03-06 | 北京百分点信息科技有限公司 | A kind of file classification method, device and electronic equipment |
CN108280149A (en) * | 2018-01-04 | 2018-07-13 | 东南大学 | A kind of doctor-patient dispute class case recommendation method based on various dimensions tag along sort |
CN109033105A (en) * | 2017-06-09 | 2018-12-18 | 北京国双科技有限公司 | The method and apparatus for obtaining judgement document's focus |
CN109033041A (en) * | 2017-06-09 | 2018-12-18 | 北京国双科技有限公司 | The treating method and apparatus of document similarity |
CN109359175A (en) * | 2018-09-07 | 2019-02-19 | 平安科技(深圳)有限公司 | Electronic device, the method for lawsuit data processing and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109992664A (en) | 2019-07-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||