CN109543178A - Method and system for constructing a judicial text label system - Google Patents
- Publication number
- CN109543178A (application CN201811294777.8A)
- Authority
- CN
- China
- Prior art keywords
- label
- vocabulary
- text
- judicial
- accuracy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
Abstract
This application provides a method and system for constructing a judicial text label system. Judicial vocabulary text is obtained with a word segmentation tool, and a primary label system is built from word frequency statistics. Semantically similar labels in the primary label system are merged and obscure labels are extended, giving an extended label system. A text test set is then used to measure the accuracy with which the extended label system retrieves text, which verifies whether construction of the current extended label system is complete; if it is not, the label system is optimized further. This yields label systems targeted to different laws and greatly improves the search precision for judicial text.
Description
Technical field
This application relates to the field of natural language processing, and in particular to a method and system for constructing a judicial text label system.
Background art
As the legal field becomes more open and transparent, more and more judgment documents are placed under public supervision. According to statistics from the China Judgments Online website, more than 50 million documents are currently online, and the number grows by roughly 30,000 per day. However, the growth of legal text resources also brings a series of problems: storage requirements keep increasing, search becomes slower, and search results are often not the desired information. These problems reduce the efficiency with which legal text resources can be used. To solve them, legal text must be processed. A common way of processing massive Internet data is data labeling, i.e., the vector space model (Vector Space Model): data are processed into a series of keywords (terms) or labels, which are then used to generate an index. Legal text processing uses the same model; the difference lies in how the labels are defined.
A great deal of work already exists on text label extraction. Patent CN201510697001 proposes, for existing short-message text, mining notification-type messages by writing regular expressions; the mined XX is used as the identity label information of the message text; for the identities mined from such notification messages, the identity label with the highest frequency is chosen, subject to a threshold, as the final identity label of the service number. This identity label can be updated in real time as new messages arrive. Patent CN201710541481 proposes a text label generation method: for a target text, keywords are extracted with a strategy specific to each label type, giving candidate labels of each type; the candidate labels are cross-validated across label types; and the target labels of the text are determined from the candidates that pass validation. Because labels are extracted separately for different label types (entity words, text segments and/or topics) and then cross-validated, the accuracy of label extraction improves, which addresses the low accuracy of label extraction in the prior art. Patent CN201711213971 proposes a method for generating text label words. First, label words are extracted from the text, and interrelated grouped label words are generated from the extracted label words and preset label-word relations. Next, the grouped label words are aggregated according to the associations between them, and a preset label-word dictionary is searched for entries that fully cover the aggregated grouped label words, giving combined label words. Finally, mapped label words are generated for the text from the combined label words and the preset label-word relations. The method can quickly and autonomously generate label words for a text according to actual needs, without intervention by specialists. CN201510197328 proposes a text label extraction method: first the text category is predicted, then a topic clustering model predicts the topic, then text keywords are extracted, and finally the target category, target topic and target keywords of the text are used as its labels. The labels have different levels, which supports search at different granularities, and articles can be recommended at different granularities according to different labels.
Because legal text contains a large amount of specialized vocabulary and the points in dispute of different cases overlap heavily, the text label extraction methods above cannot meet the precision requirement. We therefore propose a new label system: a label dictionary is constructed through a series of rules, and the label dictionary is verified and optimized through the correspondence between case facts and law articles, which improves the search precision for legal text.
Summary of the invention
To address the large amount of specialized vocabulary in legal text and the high overlap between points in dispute across cases, the present invention proposes a method and system for constructing a judicial text label system. Because it combines the advantages of machine learning and manual review, it can significantly improve the precision of legal text retrieval while reducing manual intervention.
A method for constructing a judicial text label system, characterized by comprising:
Obtaining vocabulary text, where vocabulary text means text expressed in the form of vocabulary items;
Selecting candidate labels according to the word frequency and/or combination word frequency of the vocabulary text, to obtain a primary label system;
Merging and/or extending labels according to the similarity of the labels in the primary label system, to obtain an extended label system;
Determining, according to the accuracy with which the extended label system searches text, that construction of the final label system is complete.
Further, obtaining vocabulary text comprises: constructing a judicial vocabulary, adding the judicial vocabulary to the custom dictionary of a word segmentation tool, and segmenting the judicial text to obtain vocabulary text;
Wherein constructing the judicial vocabulary comprises:
Adding the entries of legal dictionaries, legal professional dictionaries and the like to a preliminary vocabulary;
Counting the combination word frequency of ordinary words, and adding the ordinary word combinations whose combination word frequency meets set threshold I to the preliminary vocabulary as new terms;
Adding the specialized terms that manual review finds to be incorrectly segmented to the preliminary vocabulary;
Obtaining the judicial vocabulary.
Further, selecting candidate labels according to the word frequency and combination word frequency of the vocabulary text to obtain the primary label system comprises:
Defining a window length K; counting, by traversing the text with the window, the number of occurrences of every combination of M words; taking the words in the N most frequent combinations as keywords; counting the word frequency of each individual keyword; and adding the words whose word frequency meets set threshold II to the primary label system as candidate labels.
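The window co-occurrence step can be illustrated with a minimal Python sketch. The function name and the reading of "meets set threshold II" as "greater than or equal to" are assumptions made for illustration; the patent does not give an implementation, and combinations are counted here as unordered sets of distinct words inside each window.

```python
from collections import Counter
from itertools import combinations


def window_cooccurrence_labels(tokens, k, m, n, threshold_ii):
    """Select candidate labels by window co-occurrence (illustrative sketch).

    tokens: segmented vocabulary text, as a list of words
    k: window length K; m: words per combination M; n: number of top combinations N
    threshold_ii: minimum word frequency for a keyword to become a candidate label
    """
    combo_counts = Counter()
    # Slide a window of length k over the text and count every m-word combination.
    for start in range(max(len(tokens) - k + 1, 1)):
        window = sorted(set(tokens[start:start + k]))
        combo_counts.update(combinations(window, m))
    # Words that appear in the n most frequent combinations become keywords.
    keywords = {w for combo, _ in combo_counts.most_common(n) for w in combo}
    # Keep the keywords whose own frequency meets threshold II (assumed to mean >=).
    word_freq = Counter(t for t in tokens if t in keywords)
    return {w for w, c in word_freq.items() if c >= threshold_ii}
```

Sorting the words inside each window makes a combination independent of word order, which is one possible reading of "any M word combination".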
Further, the method for calculating label similarity comprises:
Setting a character-based label similarity weight p and a semantics-based label similarity weight q;
Obtaining the character-based label similarity sim(W1, W2) of labels W1 and W2, where sim(W1, W2) = (number of characters that label W1 and label W2 have in common) / (the larger of the character lengths of label W1 and label W2);
Obtaining the semantics-based label similarity score(W1, W2) of labels W1 and W2, where score(W1, W2) is the relevance value of label W1 and label W2, obtained from a semantic model trained with judicial text as the corpus;
Calculating the label similarity = p*sim(W1, W2) + q*score(W1, W2).
Further,
Merging labels specifically means: when the similarity of two labels meets set threshold III, or the similarity of the two labels ranks among the top R label similarity values in the primary label system, the two labels are merged, i.e., one of the labels is retained and the other is removed from the primary label system;
Extending labels specifically means: when the similarity between several words in the semantic model or thesaurus and a label word meets set threshold IV, these words are taken as expansion words of that label word, and the expansion words are added to the primary label system.
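A minimal Python sketch of the merge and extension steps follows. The function names, the greedy choice of keeping the earlier label of a similar pair, and the toy similarity table are all assumptions for illustration; the patent only states that one label of the pair is retained. Only the threshold III criterion is shown, not the top-R criterion.

```python
def merge_labels(labels, similarity, threshold_iii):
    """Keep a label only if it is not too similar to a label already kept."""
    kept = []
    for label in labels:
        if all(similarity(label, other) < threshold_iii for other in kept):
            kept.append(label)
    return kept


def extend_labels(labels, thesaurus, similarity, threshold_iv):
    """Map each label to the thesaurus words whose similarity meets threshold IV."""
    return {
        label: [w for w in thesaurus
                if w not in labels and similarity(w, label) >= threshold_iv]
        for label in labels
    }


# Made-up similarity values standing in for a trained semantic model.
TOY_SCORES = {
    frozenset({"man and wife", "husband and wife"}): 0.95,
    frozenset({"man and wife", "Mr. and Mrs"}): 0.90,
    frozenset({"motor vehicle", "car"}): 0.85,
}


def toy_similarity(a, b):
    return 1.0 if a == b else TOY_SCORES.get(frozenset({a, b}), 0.0)


primary = ["man and wife", "husband and wife", "motor vehicle"]
merged = merge_labels(primary, toy_similarity, threshold_iii=0.9)
expanded = extend_labels(merged, ["Mr. and Mrs", "neighbor", "car"],
                         toy_similarity, threshold_iv=0.8)
```

In this toy run "husband and wife" is merged into "man and wife", and "Mr. and Mrs" and "car" become expansion words of the two remaining labels.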
Further, the method for calculating the accuracy of searching text comprises:
Establishing a test set comprising a sample set and a search target set. Each sample in the sample set includes a question together with the n case facts and m law articles most relevant to the question. The search target set includes the set of all case facts and law articles;
Extracting the text labels of the questions, case facts and law articles in the sample set to form label vectors;
Using vector matching to recommend, from the search target set, the case facts similar to the question and the applicable law articles, where vector similarity is calculated using Euclidean distance;
Comparing the recommended case facts and law articles with the corresponding case facts and law articles in the sample set to calculate the accuracy, where the accuracy is expressed as the average of recall and precision. Recall, also called the recall ratio, = number of correct samples retrieved / number of all correct samples in the data set; precision, also called the precision ratio, = number of correct samples retrieved / number of all samples retrieved.
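The recommendation and scoring steps above can be sketched in Python as follows. The function names, the identifiers and the numeric label vectors are hypothetical; only the Euclidean distance matching and the average of recall and precision come from the text.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length label vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def recommend(query_vec, target_vecs, top_k):
    """Return the ids of the top_k targets closest to the query vector."""
    ranked = sorted(target_vecs, key=lambda tid: euclidean(query_vec, target_vecs[tid]))
    return ranked[:top_k]


def accuracy(recommended, relevant):
    """Average of recall and precision, as the accuracy measure described above."""
    hits = len(set(recommended) & set(relevant))
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(recommended) if recommended else 0.0
    return (recall + precision) / 2


# Hypothetical label vectors for three search targets.
targets = {"case_a": [1, 0, 1], "case_b": [0, 1, 0], "article_1": [1, 0, 0]}
picked = recommend([1, 0, 1], targets, top_k=2)
score = accuracy(picked, relevant=["case_a", "case_c"])
```

Here one of the two recommended items is relevant and one of the two relevant items is found, so recall and precision are both 0.5.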
Further, the method for calculating the accuracy of searching text comprises:
Presetting a sample set and a search target set, where the sample set SS includes NC samples; a sample S_i includes a search question Q_i and a vocabulary text set X_i relevant to the search question; the vocabulary text set X_i includes H_i vocabulary texts, X_i = {x_i1, x_i2, …, x_iHi}; the search target set Y includes NS vocabulary texts, Y = {y_1, y_2, …, y_NS};
Using the extended label system to obtain the extended labels Z of the search target set Y, Z = {z_1, z_2, …, z_NS};
Extracting the samples S_i from the sample set in order, and obtaining the label vector T_i of the search question Q_i;
Calculating the similarity between the label vector T_i and each extended label z_j, and taking the vocabulary texts corresponding to the H_i extended labels with the highest similarity as the control group T;
Calculating the single-search accuracy = (number of vocabulary texts in the control group T that also appear in the set X_i) / H_i;
Traversing the entire sample set and calculating the average accuracy, which is taken as the accuracy of searching text.
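The procedure above amounts to a top-H_i retrieval score averaged over all samples. The Python sketch below is one possible rendering; the function names are hypothetical, and the dot product used as the similarity is an assumption, since this passage does not say how T_i and z_j are compared.

```python
def dot(u, v):
    """Assumed similarity between two label vectors (not specified in the text)."""
    return sum(a * b for a, b in zip(u, v))


def single_search_accuracy(t_i, x_i, z, similarity=dot):
    """Share of the top-H_i retrieved texts that belong to the relevant set X_i."""
    h_i = len(x_i)
    ranked = sorted(z, key=lambda y_id: similarity(t_i, z[y_id]), reverse=True)
    control_group = set(ranked[:h_i])
    return len(control_group & set(x_i)) / h_i


def search_accuracy(samples, z, similarity=dot):
    """Average single-search accuracy over all (T_i, X_i) samples."""
    scores = [single_search_accuracy(t_i, x_i, z, similarity) for t_i, x_i in samples]
    return sum(scores) / len(scores)


# Hypothetical extended-label vectors z_j for three target texts y_1..y_3.
z = {"y1": [2, 0], "y2": [0, 2], "y3": [1, 1]}
samples = [
    ([1, 0], {"y1", "y3"}),  # both relevant texts rank in the top 2
    ([0, 1], {"y1"}),        # the top result is y2, so this search scores 0
]
overall = search_accuracy(samples, z)
```

Because the control group has the same size as X_i, this single-search accuracy equals both precision and recall at H_i.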
Further, determining according to the accuracy with which the extended label system searches text that construction of the final label system is complete comprises:
When the accuracy of searching text meets set threshold V, the current extended label system is the final label system; otherwise, the values of thresholds I, II, III and IV are adjusted and the current extended label system is updated, until the accuracy with which the updated extended label system searches text meets set threshold V, giving the final label system.
Further, determining according to the accuracy with which the extended label system searches text that construction of the final label system is complete comprises: when the accuracy of searching text meets set threshold V, the current extended label system is the final label system; otherwise, the accuracy of searching text is calculated with a given label removed, and if that accuracy is unchanged or higher than the accuracy obtained before the label was removed, the label is removed from the extended label system; all labels are traversed in this way to obtain the final label system.
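The label-removal ("subtraction") optimization can be sketched as a simple greedy loop. In the Python sketch below, `evaluate` is a hypothetical callable that returns the search accuracy for a given list of labels, and the toy scoring function is made up purely to exercise the loop.

```python
def prune_labels(labels, evaluate):
    """Remove each label whose removal leaves accuracy unchanged or higher."""
    current = list(labels)
    best = evaluate(current)
    for label in list(current):
        trial = [l for l in current if l != label]
        score = evaluate(trial)
        if score >= best:  # unchanged or increased: the label is not needed
            current, best = trial, score
    return current, best


# Toy accuracy function: useful labels raise the score, noise labels lower it.
USEFUL = {"divorce", "custody"}


def toy_evaluate(labels):
    present = set(labels)
    return len(present & USEFUL) / len(USEFUL) - 0.05 * len(present - USEFUL)


final_labels, final_score = prune_labels(
    ["divorce", "the", "custody", "said"], toy_evaluate
)
```

Each label is tested once against the best accuracy seen so far, so the result can depend on the traversal order; the text does not specify an order.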
A system for constructing a judicial text label system, comprising a legal vocabulary module, a data acquisition module, a word segmentation module, a primary label construction module, a label extension module, a label verification module and a label optimization module, wherein:
The legal vocabulary module stores the legal vocabulary, which contains specialized judicial terms;
The data acquisition module collects judicial text and preprocesses it;
The word segmentation module adds the legal vocabulary to a general-purpose word segmentation tool and segments the judicial text provided by the data acquisition module, obtaining judicial vocabulary text;
The primary label construction module obtains the judicial vocabulary text provided by the word segmentation module, counts word frequency and combination word frequency, and extracts the words and word combinations whose word frequency or combination word frequency meets set threshold II as the primary label system;
The label extension module stores an extended label dictionary, calculates the similarity of the labels in the primary label system, merges the labels that meet set threshold III, extracts the corresponding expansion words from the extended label dictionary and adds them to the primary label system, obtaining the extended label system;
The label verification module stores a sample set and a search target set; the sample set includes several question labels and the judicial vocabulary text sets X relevant to the questions, and the search target set includes several judicial vocabulary texts forming the set Y; the module uses the extended label system to obtain the labels of set Y, extracts the question labels from the sample set, and measures the accuracy by comparing the vocabulary texts of set Y retrieved with the question labels against the vocabulary texts in set X;
The label optimization module judges whether the accuracy provided by the label verification module meets set threshold V; if it does, the current label system is the final label system; if it does not, the module adjusts set threshold II in the primary label construction module and set thresholds III and IV in the label extension module.
Adopting at least one of the above technical solutions can achieve the following beneficial effects:
Legal vocabulary from multiple sources is combined to construct the judicial vocabulary, which improves the word segmentation precision for legal text; high-precision segmentation results are the basis of subsequent text processing.
Automatic keyword extraction and part-of-speech tagging are used to establish the primary label system.
Following a layered approach, a different label dictionary is established for each law when constructing the label system, which effectively eliminates cross-interference between laws.
Multiple semantic relatedness methods are used to extend the label dictionaries and fill out the label system, which effectively eliminates the semantic ambiguity introduced by non-standard terms such as colloquial language.
A large number of case facts are used as the test set, and the label system is optimized with the subtraction verification method, which at the same time verifies the validity of the label system.
Description of the drawings
Fig. 1 is a flow chart of an embodiment of this specification.
Detailed description of the embodiments
To make the purposes, technical solutions and advantages of the application clearer, the technical solutions of the application are described clearly and completely below with reference to specific embodiments and the corresponding drawing. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this specification, without creative effort, shall fall within the protection scope of the application.
Embodiment one provides a method for constructing a judicial text label system, which specifically includes:
One, judicial text data collection and preprocessing.
Collect judicial text data, for example judicial documents, including fields such as the case name, plaintiff and defendant information, cause of action, case details, applicable law and specific law articles; also collect laws, law articles and their interpretative provisions, corresponding to the applicable law and specific law articles in the judgment documents.
Preprocess the judicial text data: remove records whose case details or applicable law field is empty, remove records whose case details are shorter than a set case-details length threshold, and remove duplicate records. For each common major category of law, such as marriage and family or traffic safety, enough cases must be collected to guarantee the diversity and comprehensiveness of the data.
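The three filtering rules can be written as a short Python sketch. The record layout (dictionaries with `case_details` and `applicable_law` keys) and the choice of the (case details, applicable law) pair as the duplicate key are assumptions for illustration; the text does not define a data format or what counts as a duplicate.

```python
def preprocess(records, min_details_len):
    """Drop empty, too-short and duplicate judicial text records."""
    seen = set()
    cleaned = []
    for rec in records:
        details = (rec.get("case_details") or "").strip()
        law = (rec.get("applicable_law") or "").strip()
        if not details or not law:
            continue  # case details or applicable law field is empty
        if len(details) < min_details_len:
            continue  # case details shorter than the set threshold
        key = (details, law)  # assumed duplicate criterion
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        cleaned.append(rec)
    return cleaned


raw = [
    {"case_details": "Plaintiff seeks divorce after two years apart.", "applicable_law": "Marriage Law"},
    {"case_details": "Plaintiff seeks divorce after two years apart.", "applicable_law": "Marriage Law"},
    {"case_details": "Too short", "applicable_law": "Marriage Law"},
    {"case_details": "Collision at an intersection caused injury.", "applicable_law": ""},
]
kept = preprocess(raw, min_details_len=20)
```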
Two, obtain vocabulary text, where vocabulary text means text expressed in the form of vocabulary items.
The vocabulary text can be a judicial document after word segmentation, or the text of a particular field of a judicial document after word segmentation. Vocabulary text can be obtained by one or more of the following methods.
A. Obtain vocabulary text directly: obtain it from another system or input it directly.
In one embodiment, the vocabulary text for a law article of the marriage law is, for example: 'bigamy', 'spouse', 'live together', 'two', 'implement', 'family', 'front yard', 'violence', 'maltreat', 'abandon', 'family', 'front yard', 'member', 'three', 'gambling', 'take drugs', 'bad habit', 'refuse to mend one's ways despite repeated admonition', 'four', 'emotion', 'discord', 'separation', 'full', 'two', 'year', 'five', 'cause', 'man and wife', 'emotion', 'rupture', 'situation', 'side', 'declaration', 'missing'.
B. Obtain judicial text and segment it with a word segmentation tool to obtain vocabulary text. Existing word segmentation tools include jieba, Tsinghua University's thulac, Harbin Institute of Technology's hanltp and Fudan University's funltp. These tools offer much the same segmentation functionality: all use a default vocabulary and a fast segmentation algorithm, and can successfully segment everyday words and general professional terms.
In one embodiment, judicial text is obtained, for example the following law article of the marriage law: "(1) committing bigamy, or cohabiting with another person while having a spouse; (2) committing domestic violence, or maltreating or abandoning family members; (3) having vices such as gambling or drug use and refusing to reform despite repeated admonition; (4) having lived apart for two full years because of marital discord; (5) other circumstances that have led to the breakdown of mutual affection. Where one party has been declared missing and the other party files for divorce, the divorce shall be granted."
The judicial text is segmented with the word segmentation tool thulac to obtain vocabulary text. The vocabulary text for the marriage law article is, for example: 'bigamy', 'spouse', 'live together', 'two', 'implement', 'family', 'front yard', 'violence', 'maltreat', 'abandon', 'family', 'front yard', 'member', 'three', 'gambling', 'take drugs', 'bad habit', 'refuse to mend one's ways despite repeated admonition', 'four', 'emotion', 'discord', 'separation', 'full', 'two', 'year', 'five', 'cause', 'man and wife', 'emotion', 'rupture', 'situation', 'side', 'declaration', 'missing'.
Existing word segmentation tools cannot correctly segment highly specialized legal terms, such as 'person with limited capacity for civil conduct' or 'disease that makes marriage inadvisable'. To segment these terms correctly, a custom legal vocabulary must be used.
C. Construct a judicial vocabulary, add it to the custom dictionary of the word segmentation tool in place of the tool's default vocabulary, and segment the judicial text to obtain vocabulary text. The judicial vocabulary is constructed as follows:
C.1) Add the entries of legal dictionaries, legal professional dictionaries and the like to the vocabulary;
C.2) Use a combination word frequency statistics algorithm to form new terms from ordinary word combinations, and add the new terms whose combination word frequency exceeds the set threshold to the vocabulary; combination word frequency is the frequency with which two or more words occur together;
C.3) Add the vocabulary to the custom dictionary of the word segmentation tool in place of the tool's default vocabulary, segment the judicial text to obtain vocabulary text, and review the result manually; the review includes checking the segmentation results one by one and checking the word frequency statistics of the segmentation results, and the incorrectly segmented specialized terms are added to the vocabulary;
C.4) Take the reviewed vocabulary as the judicial vocabulary.
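Steps C.1 to C.4 can be sketched in Python as follows. The function name is hypothetical, combinations are simplified to pairs of adjacent words (the text allows combinations of two or more words), and the Chinese sample tokens are illustrative inputs rather than data from the patent.

```python
from collections import Counter


def build_judicial_vocabulary(dictionary_entries, segmented_texts,
                              threshold_i, reviewed_terms=()):
    """Assemble a judicial vocabulary from three sources (illustrative sketch)."""
    vocab = set(dictionary_entries)  # C.1: legal dictionary entries
    pair_counts = Counter()
    for tokens in segmented_texts:
        pair_counts.update(zip(tokens, tokens[1:]))  # adjacent word pairs only
    # C.2: frequent combinations of ordinary words become new terms.
    vocab.update(a + b for (a, b), count in pair_counts.items() if count > threshold_i)
    # C.3: terms that manual review found to be wrongly segmented.
    vocab.update(reviewed_terms)
    return vocab  # C.4: the reviewed vocabulary


texts = [
    ["实施", "家庭", "暴力"],
    ["家庭", "暴力", "行为"],
    ["家庭", "成员"],
]
vocabulary = build_judicial_vocabulary(
    ["重婚"], texts, threshold_i=1, reviewed_terms=["限制民事行为能力人"]
)
```

The resulting set could then be loaded into a segmenter's custom dictionary; jieba, for instance, provides `jieba.load_userdict` and `jieba.add_word` for this purpose.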
In one embodiment, the judicial text is segmented with the judicial vocabulary to obtain vocabulary text. For the marriage law article, the result is, for example: 'bigamy', 'cohabiting with another person while having a spouse', 'two', 'implement', 'domestic violence', 'maltreat', 'abandon family members', 'three', 'gambling', 'take drugs', 'bad habit', 'refuse to mend one's ways despite repeated admonition', 'four', 'marital discord', 'separation', 'full', 'two', 'year', 'five', 'cause', 'man and wife', 'breakdown of affection', 'situation', 'side', 'declaration', 'missing', 'side', 'propose', 'divorce', 'lawsuit', 'shall', 'grant', 'divorce'.
Compared with using the word segmentation tool thulac directly, segmenting the judicial text with the judicial vocabulary cuts out legal terms such as 'domestic violence' and 'breakdown of affection' correctly. Combining legal vocabulary from multiple sources to construct the judicial vocabulary improves the word segmentation precision for legal text, and high-precision segmentation results are the basis of subsequent text processing.
Further, part-of-speech checking is applied to the words in the obtained vocabulary text: nouns, verbs and adjectives are retained, and all other words are removed.
Three, select candidate labels according to the word frequency and/or combination word frequency of the vocabulary text, obtaining the primary label system. Word frequency is the frequency or number of times a single word occurs; combination word frequency is the frequency or number of times two or more words occur together. One or more of the following modes can be used.
a) Count the word frequency of each single word in the vocabulary text; when the word frequency is greater than the set threshold, add the word to the primary label system as a candidate label, until all words have been counted;
b) Treat each pair of adjacent words as a combination, count the combination word frequencies in the vocabulary text, sort them from high to low, and add the combinations ranked within a set number of top positions to the primary label system as new terms;
c) Use the window co-occurrence method: define a window length K, count by window traversal the number of occurrences of every combination of M words, take the words in the N most frequent combinations as keywords, count the word frequency of each single keyword, and add the words whose word frequency exceeds the set threshold to the primary label system as candidate labels.
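Modes a) and b) reduce to frequency counting, as the short Python sketch below suggests. The function names are hypothetical, and joining the two words of a combination with a space is a choice made for the English sample tokens.

```python
from collections import Counter


def labels_by_word_frequency(tokens, threshold):
    """Mode a): single words whose frequency is greater than the threshold."""
    return {w for w, count in Counter(tokens).items() if count > threshold}


def labels_by_adjacent_pairs(tokens, top_n):
    """Mode b): the top_n most frequent combinations of adjacent words."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    return [" ".join(pair) for pair, _ in pair_counts.most_common(top_n)]


tokens = ["domestic", "violence", "divorce", "domestic", "violence",
          "custody", "divorce", "domestic", "violence"]
single_labels = labels_by_word_frequency(tokens, threshold=2)
pair_labels = labels_by_adjacent_pairs(tokens, top_n=1)
```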
Further, the labels in the primary label system are screened by rules: non-universal words and non-label words are removed from the primary label system, where non-universal words are the words in a preset non-universal vocabulary, such as personal names, and non-label words are the words in a preset non-label vocabulary, such as isolated verbs.
Because legal writing differs from law to law and is highly specialized, the same object plays different roles under different laws. For example, a 'car' is a kind of property under marriage law, but under traffic law it represents the legal subject 'motor vehicle'. Different laws therefore use different label dictionaries, and the label dictionaries of the various laws together form a label system.
The primary label system is established with automatic keyword extraction and part-of-speech tagging. Following a layered approach, a different label dictionary is established for each law when constructing the label system, which effectively eliminates cross-interference between laws.
Four, merge and/or extend labels according to the similarity of the labels in the primary label system, obtaining the extended label system. The label similarity can be calculated in one or more of the following ways.
In one embodiment, a character-based label similarity calculation is used. Let W1 and W2 denote two labels, W1 = {w_11, w_12, …, w_1e1} and W2 = {w_21, w_22, …, w_2e2}, where e1 and e2 are the character lengths of label W1 and label W2; w_11, w_12 and w_1e1 are the 1st, 2nd and e1-th characters of label W1; and w_21, w_22 and w_2e2 are the 1st, 2nd and e2-th characters of label W2.
The label similarity is sim(W1, W2) = (number of characters that label W1 and label W2 have in common) / (the larger of the character lengths of label W1 and label W2).
For example, if label 1 is 'Mr. and Mrs' and label 2 is 'man and wife', each two characters long in the original Chinese, and the character 'husband' is common to both, then the number of identical characters is 1 and the label similarity is 1/2 = 0.5.
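The character-based similarity is small enough to express directly in Python. In the sketch below, shared characters are counted as a multiset intersection, which is one reading of "identical characters" (the text does not say whether position matters), and the two Chinese strings are assumed stand-ins for the two-character labels in the example.

```python
from collections import Counter


def char_similarity(w1, w2):
    """Shared character count divided by the longer label's length."""
    if not w1 or not w2:
        return 0.0
    shared = sum((Counter(w1) & Counter(w2)).values())
    return shared / max(len(w1), len(w2))


# Two two-character labels sharing one character, as in the example above.
example = char_similarity("夫妇", "夫妻")
```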
In one embodiment, a semantics-based label similarity calculation is used: a semantic model is built with a language model such as Word2Vec or GloVe; a large amount of judicial text of various types is obtained as the corpus to train the semantic model; the two labels are input to the semantic model to obtain their relevance score(W1, W2); and the relevance of the two labels is used as the label similarity.
For example, for the two word pairs ('elder brother', 'younger brother') and ('elder brother', 'motor vehicle'), after the semantic model is trained the relevance of the first pair is clearly greater than that of the second.
In one embodiment, a label similarity calculation based on both characters and semantics is used: the character-based and semantics-based label similarity weights p and q are set, the character-based label similarity sim(W1, W2) of labels W1 and W2 is obtained, the semantics-based label similarity score(W1, W2) of labels W1 and W2 is obtained, and the combined label similarity is calculated as p*sim(W1, W2) + q*score(W1, W2).
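The weighted combination can be sketched in Python as below. Using the cosine of two embedding vectors as score(W1, W2) is an assumption (it is a common way to read relevance from Word2Vec-style models, but the text does not specify it), and the vectors and weights are made-up values.

```python
import math


def cosine(u, v):
    """Cosine similarity of two embedding vectors (assumed semantic score)."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def combined_similarity(sim_char, score_semantic, p, q):
    """Label similarity p*sim(W1, W2) + q*score(W1, W2)."""
    return p * sim_char + q * score_semantic


# Made-up inputs: character similarity 0.5, identical toy embedding vectors.
semantic = cosine([3.0, 4.0], [3.0, 4.0])
similarity = combined_similarity(0.5, semantic, p=0.4, q=0.6)
```

With these toy values the result is 0.4 * 0.5 + 0.6 * 1.0 = 0.8; choosing p + q = 1 keeps the combined value in the same range as its two inputs.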
The primary label system is a fairly simple word list. Some words in the list may be semantically similar and need to be merged. In addition, the words in the list cannot adequately cover the semantic diversity of real life and need to be extended. Labels can be merged and/or extended to obtain the extended label system in one or more of the following ways.
In one embodiment, when the similarity of two labels exceeds threshold III, or ranks among the top R of all label similarity values in the primary label system, the two labels are merged: one label is kept and the other is removed from the primary label system. When the similarity between several words in the semantic model or thesaurus and a label word meets set threshold IV, those words are taken as expansion words of that label word, and the expansion vocabulary is added to the primary label system.
For example, the semantic model or thesaurus contains the two words 'Mr. and Mrs' and 'object', and the primary label system contains the label word 'man and wife'. The similarity of each word to the label word is calculated and compared with threshold IV; 'Mr. and Mrs' meets the condition and becomes an expansion word of 'man and wife'.
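The merge and expansion rules can be sketched as two small functions. Everything named here is hypothetical: the function names, the similarity lookup table with its invented values, and the extra label 'husband and wife'. Only the threshold rule is shown; the top-R variant is omitted, and keeping the first label of a merged pair is an assumption because the text does not say which label survives.

```python
from itertools import combinations

# Invented similarity values for illustration only.
TOY_SIMILARITY = {
    frozenset(("man and wife", "husband and wife")): 0.9,
    frozenset(("man and wife", "Mr. and Mrs")): 0.8,
    frozenset(("man and wife", "object")): 0.1,
}


def toy_similarity(a, b):
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def merge_labels(labels, similarity, threshold_iii):
    """Remove the second label of every pair whose similarity exceeds threshold III."""
    kept = list(labels)
    for a, b in combinations(labels, 2):
        if a in kept and b in kept and similarity(a, b) > threshold_iii:
            kept.remove(b)  # keep `a`, drop `b` (assumed tie-breaking rule)
    return kept


def expand_labels(labels, candidates, similarity, threshold_iv):
    """Map each label to the candidate words whose similarity meets threshold IV."""
    return {
        label: [w for w in candidates
                if w not in labels and similarity(w, label) >= threshold_iv]
        for label in labels
    }


merged = merge_labels(["man and wife", "husband and wife", "motor vehicle"],
                      toy_similarity, threshold_iii=0.85)
print(merged)
print(expand_labels(merged, ["Mr. and Mrs", "object"], toy_similarity, threshold_iv=0.6))
```

With these toy values, 'husband and wife' is merged into 'man and wife', and 'Mr. and Mrs' is attached to 'man and wife' as an expansion word while 'object' is rejected.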
Label expansion produces dictionaries such as those captioned below. These dictionaries are used for disambiguation: different expressions with the same meaning are unified into the same word, which completes text normalization.
Table 1. Example label dictionary for the marriage category
Table 2. Example label dictionary for the traffic category
In one embodiment, the expansion vocabulary corresponding to the vocabulary in the primary label system is extracted from the extended label dictionary and added to the primary label system. When the similarity of two labels in the primary label system exceeds threshold III, or ranks among the top R of all label similarity values in the primary label system, the two labels are merged: one label is kept and the other is removed from the primary label system.
Five, determine that construction of the final label system is complete according to the accuracy with which the extended label system searches text.
The basic use of a text label system is text search. Comparing the search accuracy achieved with different versions of the label system verifies the effectiveness of the label system.
In one embodiment, a method for calculating text-search accuracy is provided.
5.1) Obtain judicial texts and extract the text of the case-details and law-article fields. Select candidate labels according to the word frequency and/or combination word frequency of the case-details vocabulary text and the law-article vocabulary text to obtain the primary label system. Merge and/or expand labels according to the similarity of labels in the primary label system to obtain the extended label system.
5.2) Build a test set consisting of a sample set and a search target set. Each sample in the sample set contains one question together with the n case details and the m law articles most relevant to that question. The search target set contains all case details and the set of applicable law articles.
For example, a question in the sample set is 'An accident occurred while I was driving on the road and a non-motor vehicle broke my tail light; how is compensation handled?', with 3 most relevant case details and 6 most relevant law articles.
5.3) Extract the text labels of the questions, case details and law articles in the sample set to form label vectors.
5.4) Use vector matching to recommend the case details in the search target set that are similar to the question, together with the applicable law articles. Vector similarity is calculated with the Euclidean distance: the vector distance is the norm of the difference of the two vectors, and the Euclidean distance is the most commonly used vector distance.
5.5) Compare the recommended case details and law articles with the corresponding case details and law articles in the sample set, and calculate the accuracy. The accuracy is expressed as the average of recall and precision. Recall (also called the recall ratio) = number of correct samples found / total number of correct samples in the data set. Precision (also called the precision ratio) = number of correct samples found / total number of samples found.
For example, if there are 5 recommended results in total and 2 of them are correct, the recall is 40%; if the test set has 10 samples and the recommended results for 5 of them match the true values, the precision is 50%.
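Steps 5.4) and 5.5) can be sketched as follows, assuming each case is already represented by a numeric label vector. The function names, the case identifiers and the toy vectors are hypothetical; only the Euclidean distance and the recall, precision and averaging formulas come from the description.

```python
import math


def euclidean(u, v):
    """Norm of the difference of two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def recommend(query_vec, targets, k):
    """Return the ids of the k targets closest to the query vector."""
    ranked = sorted(targets, key=lambda tid: euclidean(query_vec, targets[tid]))
    return ranked[:k]


def recall_precision(recommended, relevant):
    """Recall = hits / all relevant; precision = hits / all recommended."""
    hits = len(set(recommended) & set(relevant))
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(recommended) if recommended else 0.0
    return recall, precision


def accuracy(recommended, relevant):
    """Average of recall and precision, as in step 5.5)."""
    recall, precision = recall_precision(recommended, relevant)
    return (recall + precision) / 2


# Toy label vectors for three cases (invented for illustration).
targets = {"case_a": [1, 0, 1], "case_b": [1, 1, 1], "case_c": [0, 0, 0]}
found = recommend([1, 0, 1], targets, k=2)
print(found, accuracy(found, relevant=["case_a", "case_c"]))
```

In this toy run the two nearest cases are `case_a` and `case_b`; one of the two relevant cases is found, so recall and precision are both 0.5 and the averaged accuracy is 0.5.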
Table 3. Example search target set
Labels of case details 1 | Law articles applicable to case details 1 | Article 1 of ×× Law | Labels of article 1 |
Labels of case details 2 | Law articles applicable to case details 2 | Article 2 of ×× Law | Labels of article 2 |
… | … | … | … |
Labels of case details N | Law articles applicable to case details N | Article × of another law | Labels of article N |
Table 4. Example comparison of search results with true values
In one embodiment, another method for calculating text-search accuracy is provided.
Preset a sample set and a search target set. The sample set SS contains NC samples; a sample S_i contains a search question Q_i and a set X_i of vocabulary texts relevant to that question; X_i contains H_i vocabulary texts, X_i = {x_i1, x_i2, …, x_iHi}. The search target set Y contains NS vocabulary texts, Y = {y_1, y_2, …, y_NS};
Using the extended label system, obtain the extended labels Z of the search target set Y, Z = {z_1, z_2, …, z_NS};
Take a sample S_i from the sample set in order and obtain the label vector T_i of its search question Q_i;
Calculate the similarity between label vector T_i and each extended label z_j, and take the vocabulary texts corresponding to the H_i extended labels with the highest similarity as the control group T;
Calculate the single-search accuracy = (number of vocabulary texts in control group T that also appear in set X_i) / H_i;
Traverse the entire sample set and calculate the average accuracy, which is taken as the text-search accuracy.
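The procedure above can be sketched as follows. The function names and the toy data are hypothetical, and Jaccard overlap between label sets is used as a stand-in similarity purely for illustration; the description does not fix the similarity measure at this step.

```python
def jaccard(a, b):
    """Stand-in similarity between two label sets (an assumption)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def single_search_accuracy(query_labels, target_labels, relevant_ids, similarity):
    """Share of the top-H retrieved texts that appear in the relevant set X_i."""
    h = len(relevant_ids)
    if h == 0:
        return 0.0
    ranked = sorted(target_labels,
                    key=lambda tid: similarity(query_labels, target_labels[tid]),
                    reverse=True)
    control_group = ranked[:h]  # the H_i most similar texts
    return len(set(control_group) & set(relevant_ids)) / h


def search_accuracy(samples, target_labels, similarity):
    """Average single-search accuracy over the whole sample set."""
    scores = [single_search_accuracy(q, target_labels, x, similarity)
              for q, x in samples]
    return sum(scores) / len(scores) if scores else 0.0


# Toy extended labels Z for three target texts, and two samples (Q_i, X_i).
Z = {
    "y1": {"marriage", "divorce"},
    "y2": {"traffic", "accident"},
    "y3": {"traffic", "compensation"},
}
samples = [
    ({"traffic", "accident", "compensation"}, ["y2", "y3"]),
    ({"divorce"}, ["y2"]),
]
print(search_accuracy(samples, Z, jaccard))
```

In this toy run the first sample retrieves both of its relevant texts (accuracy 1.0) and the second retrieves none (accuracy 0.0), so the average is 0.5.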
Further, when the text-search accuracy is greater than threshold V, the current extended label system is the final label system; otherwise, the label system is optimized.
The label system can be optimized using one or a combination of the following methods:
1) Adjust the values of thresholds I, II, III and IV and update the extended label system until the text-search accuracy of the current extended label system is greater than threshold V, yielding the final label system.
2) Adjust the law vocabulary, the label similarity calculation method, or the text-search accuracy calculation method, and update the extended label system until the text-search accuracy of the current extended label system is greater than threshold V, yielding the final label system.
3) Taking the current extended label system as the object, calculate the text-search accuracy after a given label is removed. If the accuracy is unchanged or higher than the accuracy obtained before the label was removed, remove the label from the extended label system. Traverse all labels to obtain the final label system.
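Method 3) amounts to a leave-one-out pruning loop, which might be sketched as below. The names `prune_labels` and `toy_evaluate` are hypothetical, the evaluation function is a toy stand-in for a real text-search accuracy measurement, and updating the baseline after each removal is an assumption; the description does not say whether the comparison uses the original or the updated accuracy.

```python
def prune_labels(labels, evaluate):
    """Drop every label whose removal leaves search accuracy unchanged or higher."""
    current = list(labels)
    baseline = evaluate(current)
    for label in list(labels):
        trial = [l for l in current if l != label]
        acc = evaluate(trial)
        if acc >= baseline:
            # Removal did not hurt: accept it and move the baseline (assumption).
            current, baseline = trial, acc
    return current


# Toy evaluation: accuracy is the share of "useful" labels still present.
USEFUL = {"traffic", "accident"}


def toy_evaluate(labels):
    return len(USEFUL & set(labels)) / len(USEFUL)


print(prune_labels(["traffic", "noise", "accident"], toy_evaluate))
```

With the toy evaluation, removing 'noise' leaves the accuracy unchanged, so it is pruned, while 'traffic' and 'accident' are kept because removing either lowers the accuracy.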
Embodiment two provides a judicial text label system construction system, comprising a law vocabulary module, a data acquisition module, a word segmentation module, a primary label construction module, a label extension module, a label verification module and a label optimization module, wherein:
The law vocabulary module stores the law vocabulary, which contains judicial professional terms;
The data acquisition module collects judicial texts and preprocesses them;
The word segmentation module adds the law vocabulary to a general-purpose word segmentation tool and segments the judicial texts provided by the data acquisition module to obtain judicial vocabulary texts;
The primary label construction module obtains the judicial vocabulary texts provided by the word segmentation module, counts word frequencies and combination word frequencies, and extracts the words and word combinations whose word frequency and combination word frequency meet set threshold II as the primary label system;
The label extension module stores the extended label dictionary, calculates the similarity of labels in the primary label system, merges the labels that meet set threshold III, and extracts the corresponding expansion vocabulary from the extended label dictionary and adds it to the primary label system to obtain the extended label system;
The label verification module stores the sample set and the search target set. The sample set contains several question labels and the set X of judicial vocabulary texts relevant to each question; the search target set contains a set Y of judicial vocabulary texts. The module obtains the labels of set Y using the extended label system, extracts the question labels from the sample set, and calculates the accuracy with which the vocabulary texts retrieved from set Y using the question labels match the vocabulary texts in set X;
The label optimization module judges whether the accuracy provided by the label verification module meets set threshold V. If it does, the current label system is the final label system; if not, it adjusts set threshold II in the primary label construction module and set thresholds III and IV in the label extension module.
Referring to Fig. 1, the data processing flow of the judicial text label system construction system is as follows:
About 160,000 civil judgment documents from the past 10 years, covering the marriage and traffic categories, are collected and preprocessed. Preprocessing includes: removing judicial text records whose case-details or applicable-law field is empty; removing records whose case-details text is shorter than a set case-details threshold; removing duplicate records; and separately extracting the text of the case-details, applicable-law and specific law-article fields. More than 170 commonly used civil laws are also collected, and the text of two fields, the law article and its specific provisions, is extracted.
Word segmentation: the word segmentation module adds the law vocabulary to a general-purpose word segmentation tool and segments the preprocessed judicial texts to obtain judicial vocabulary texts.
Primary label construction: words whose frequency meets the set threshold are extracted as primary labels.
Label extension: the corresponding expansion vocabulary is extracted from the extended label dictionary.
Label verification: using the correspondence between case details and law articles, the search accuracy of different versions of the extended labels is compared.
Label optimization: judge whether the verified labels meet the requirement. If they do, construction of the label system is complete; if not, feed back to the label verification module.
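The first steps of this flow, preprocessing and frequency-based primary label extraction, can be sketched as follows. The field names `case_details` and `applicable_law`, the function names and the sample records are hypothetical, and the token lists are assumed to come from a word segmentation step that is not shown here.

```python
from collections import Counter


def preprocess(records, min_details_len):
    """Drop records with empty fields, too-short case details, or duplicates."""
    seen, cleaned = set(), []
    for rec in records:
        details = rec.get("case_details", "").strip()
        law = rec.get("applicable_law", "").strip()
        if not details or not law:
            continue  # a required field is empty
        if len(details) < min_details_len:
            continue  # case details shorter than the set threshold
        key = (details, law)
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        cleaned.append({"case_details": details, "applicable_law": law})
    return cleaned


def primary_labels(token_lists, min_freq):
    """Words whose corpus frequency meets the set threshold, sorted for stability."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    return sorted(word for word, n in counts.items() if n >= min_freq)


records = [
    {"case_details": "the parties divorced after a dispute", "applicable_law": "Marriage Law"},
    {"case_details": "the parties divorced after a dispute", "applicable_law": "Marriage Law"},
    {"case_details": "short", "applicable_law": "Marriage Law"},
    {"case_details": "a traffic accident damaged the vehicle", "applicable_law": ""},
]
print(len(preprocess(records, min_details_len=10)))
print(primary_labels([["divorce", "property"], ["divorce", "custody"]], min_freq=2))
```

Of the four toy records, only one survives preprocessing: the second is a duplicate, the third is too short, and the fourth has an empty applicable-law field.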
Although the present application has been described by way of embodiments, those of ordinary skill in the art will appreciate that the application admits many variations and changes without departing from its spirit, and the appended claims are intended to cover such variations and changes.
Claims (10)
1. A method for constructing a judicial text label system, characterized by comprising:
obtaining vocabulary text, the vocabulary text being text represented in the form of words;
selecting candidate labels according to the word frequency and/or combination word frequency of the vocabulary text to obtain a primary label system;
merging and/or expanding labels according to the similarity of labels in the primary label system to obtain an extended label system;
determining that construction of the final label system is complete according to the accuracy with which the extended label system searches text.
2. The method for constructing a judicial text label system according to claim 1, characterized in that obtaining the vocabulary text comprises: building a judicial vocabulary, adding the judicial vocabulary to the custom dictionary of a word segmentation tool, and segmenting judicial texts to obtain the vocabulary text; wherein building the judicial vocabulary comprises:
adding the vocabulary of law dictionaries, legal professional dictionaries and the like to a preliminary vocabulary;
counting the combination word frequency of common words, and adding to the preliminary vocabulary, as new words, the combinations of common words whose combination word frequency meets set threshold I;
rechecking, and adding professional terms that were not segmented correctly to the preliminary vocabulary;
obtaining the judicial vocabulary.
3. The method for constructing a judicial text label system according to claim 1, characterized in that selecting candidate labels according to the word frequency and combination word frequency of the vocabulary text to obtain the primary label system comprises:
defining a window length K; counting, by window traversal, the number of occurrences of any combination of M words; taking the words in the N most frequent combinations as keywords; counting the word frequency of each single word among the keywords; and adding the words whose word frequency meets set threshold II to the primary label system as candidate labels.
4. The method for constructing a judicial text label system according to claim 1, characterized in that the label similarity is calculated by:
setting a weight p for the character-based label similarity and a weight q for the semantics-based label similarity;
obtaining the character-based label similarity sim(W1, W2) of labels W1 and W2, where sim(W1, W2) = (number of characters that label W1 and label W2 have in common) / (the larger of the character lengths of label W1 and label W2);
obtaining the semantics-based label similarity score(W1, W2) of labels W1 and W2, where score(W1, W2) is the correlation value of label W1 and label W2, obtained from a semantic model trained with judicial texts as the corpus;
calculating the label similarity = p*sim(W1, W2) + q*score(W1, W2).
5. The method for constructing a judicial text label system according to claim 1, characterized in that:
merging labels specifically means that when the similarity of two labels meets set threshold III, or ranks among the top R label similarity values of the primary label system, the two labels are merged, one of them is kept, and the other is removed from the primary label system;
expanding labels specifically means that when the similarity between several words in a semantic model or thesaurus and a label word meets set threshold IV, those words are taken as expansion words of that label word, and the expansion vocabulary is added to the primary label system.
6. The method for constructing a judicial text label system according to claim 1, characterized in that the text-search accuracy is calculated by:
building a test set consisting of a sample set and a search target set, each sample in the sample set containing one question together with the n case details and the m law articles most relevant to that question, and the search target set containing all case details and the set of law articles;
extracting the text labels of the questions, case details and law articles in the sample set to form label vectors;
using vector matching to recommend the case details in the search target set that are similar to the question, together with the applicable law articles, wherein vector similarity is calculated with the Euclidean distance;
comparing the recommended case details and law articles with the corresponding case details and law articles in the sample set and calculating the accuracy, wherein the accuracy is expressed as the average of recall and precision; the recall, also called the recall ratio, = number of correct samples found / total number of correct samples in the data set; the precision, also called the precision ratio, = number of correct samples found / total number of samples found.
7. The method for constructing a judicial text label system according to claim 1, characterized in that the text-search accuracy is calculated by:
presetting a sample set and a search target set, wherein the sample set SS contains NC samples; a sample S_i contains a search question Q_i and a set X_i of vocabulary texts relevant to that question; the vocabulary text set X_i contains H_i vocabulary texts, X_i = {x_i1, x_i2, …, x_iHi}; and the search target set Y contains NS vocabulary texts, Y = {y_1, y_2, …, y_NS};
using the extended label system to obtain the extended labels Z of the search target set Y, Z = {z_1, z_2, …, z_NS};
taking a sample S_i from the sample set in order and obtaining the label vector T_i of the search question Q_i;
calculating the similarity between label vector T_i and each extended label z_j, and taking the vocabulary texts corresponding to the H_i extended labels with the highest similarity as the control group T;
calculating the single-search accuracy = (number of vocabulary texts in control group T that also appear in set X_i) / H_i;
traversing the entire sample set and calculating the average accuracy as the text-search accuracy.
8. The method for constructing a judicial text label system according to claim 1, characterized in that determining that construction of the final label system is complete according to the accuracy with which the extended label system searches text comprises: when the text-search accuracy meets set threshold V, the current extended label system is the final label system; otherwise, adjusting the values of thresholds I, II, III and IV and updating the current extended label system until the text-search accuracy of the updated extended label system meets set threshold V, thereby obtaining the final label system.
9. The method for constructing a judicial text label system according to claim 1, characterized in that determining that construction of the final label system is complete according to the accuracy with which the extended label system searches text comprises: when the text-search accuracy meets set threshold V, the current extended label system is the final label system; otherwise, calculating the text-search accuracy after a given label is removed, removing the label from the extended label system if the accuracy is unchanged or higher than the accuracy obtained before the label was removed, and traversing all labels to obtain the final label system.
10. A judicial text label system construction system, comprising a law vocabulary module, a data acquisition module, a word segmentation module, a primary label construction module, a label extension module, a label verification module and a label optimization module, wherein:
the law vocabulary module stores the law vocabulary, which contains judicial professional terms;
the data acquisition module collects judicial texts and preprocesses them;
the word segmentation module adds the law vocabulary to a general-purpose word segmentation tool and segments the judicial texts provided by the data acquisition module to obtain judicial vocabulary texts;
the primary label construction module obtains the judicial vocabulary texts provided by the word segmentation module, counts word frequencies and combination word frequencies, and extracts the words and word combinations whose word frequency and combination word frequency meet set threshold II as the primary label system;
the label extension module stores the extended label dictionary, calculates the similarity of labels in the primary label system, merges the labels that meet set threshold III, and extracts the corresponding expansion vocabulary from the extended label dictionary and adds it to the primary label system to obtain the extended label system;
the label verification module stores the sample set and the search target set, the sample set containing several question labels and the set X of judicial vocabulary texts relevant to each question, and the search target set containing a set Y of judicial vocabulary texts; the module obtains the labels of set Y using the extended label system, extracts the question labels from the sample set, and calculates the accuracy with which the vocabulary texts retrieved from set Y using the question labels match the vocabulary texts in set X;
the label optimization module judges whether the accuracy provided by the label verification module meets set threshold V; if it does, the current label system is the final label system; if not, it adjusts set threshold II in the primary label construction module and set thresholds III and IV in the label extension module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811294777.8A CN109543178B (en) | 2018-11-01 | 2018-11-01 | Method and system for constructing judicial text label system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109543178A true CN109543178A (en) | 2019-03-29 |
CN109543178B CN109543178B (en) | 2023-02-28 |
Family
ID=65846358
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811294777.8A Active CN109543178B (en) | 2018-11-01 | 2018-11-01 | Method and system for constructing judicial text label system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109543178B (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110675241A (en) * | 2019-08-15 | 2020-01-10 | 上海新颜人工智能科技有限公司 | Label calibration system and method |
CN110929513A (en) * | 2019-10-31 | 2020-03-27 | 北京三快在线科技有限公司 | Text-based label system construction method and device |
CN110928981A (en) * | 2019-11-18 | 2020-03-27 | 佰聆数据股份有限公司 | Method, system and storage medium for establishing and perfecting iteration of text label system |
CN111177388A (en) * | 2019-12-30 | 2020-05-19 | 联想(北京)有限公司 | Processing method and computer equipment |
CN111221974A (en) * | 2020-04-22 | 2020-06-02 | 成都索贝数码科技股份有限公司 | Method for constructing news text classification model based on hierarchical structure multi-label system |
CN111353045A (en) * | 2020-03-18 | 2020-06-30 | 智者四海(北京)技术有限公司 | Method for constructing text classification system |
CN111524043A (en) * | 2020-04-24 | 2020-08-11 | 南京擎盾信息科技有限公司 | Method and device for automatically generating litigation risk assessment questionnaire |
CN111666771A (en) * | 2020-06-05 | 2020-09-15 | 北京百度网讯科技有限公司 | Semantic label extraction device, electronic equipment and readable storage medium of document |
CN112084290A (en) * | 2019-06-13 | 2020-12-15 | 北京沃东天骏信息技术有限公司 | Data retrieval method, device, equipment and storage medium |
CN112148868A (en) * | 2020-09-27 | 2020-12-29 | 南京大学 | Law recommendation method based on law co-occurrence |
CN112365372A (en) * | 2020-10-09 | 2021-02-12 | 银江股份有限公司 | Judgment document oriented quality detection and evaluation method and system |
CN112925902A (en) * | 2021-02-22 | 2021-06-08 | 新智认知数据服务有限公司 | Method and system for intelligently extracting text abstract in case text and electronic equipment |
CN113065312A (en) * | 2020-01-02 | 2021-07-02 | 北京沃东天骏信息技术有限公司 | Text label extraction method and device |
CN113505192A (en) * | 2021-05-25 | 2021-10-15 | 平安银行股份有限公司 | Data tag library construction method and device, electronic equipment and computer storage medium |
CN113948087A (en) * | 2021-09-13 | 2022-01-18 | 北京数美时代科技有限公司 | Voice tag determination method, system, storage medium and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004318381A (en) * | 2003-04-15 | 2004-11-11 | National Institute Of Advanced Industrial & Technology | Similarity computing method, similarity computing program, and computer-readable storage medium storing it |
JP2017078919A (en) * | 2015-10-19 | 2017-04-27 | 日本電信電話株式会社 | Word expansion device, classification device, machine learning device, method, and program |
CN106682149A (en) * | 2016-12-22 | 2017-05-17 | 湖南科技学院 | Label automatic generation method based on meta-search engine |
CN107577785A (en) * | 2017-09-15 | 2018-01-12 | 南京大学 | A kind of level multi-tag sorting technique suitable for law identification |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112084290A (en) * | 2019-06-13 | 2020-12-15 | 北京沃东天骏信息技术有限公司 | Data retrieval method, device, equipment and storage medium |
CN112084290B (en) * | 2019-06-13 | 2024-04-05 | 北京沃东天骏信息技术有限公司 | Data retrieval method, device, equipment and storage medium |
CN110675241A (en) * | 2019-08-15 | 2020-01-10 | 上海新颜人工智能科技有限公司 | Label calibration system and method |
CN110929513A (en) * | 2019-10-31 | 2020-03-27 | 北京三快在线科技有限公司 | Text-based label system construction method and device |
CN110928981A (en) * | 2019-11-18 | 2020-03-27 | 佰聆数据股份有限公司 | Method, system and storage medium for establishing and perfecting iteration of text label system |
CN111177388A (en) * | 2019-12-30 | 2020-05-19 | 联想(北京)有限公司 | Processing method and computer equipment |
CN111177388B (en) * | 2019-12-30 | 2023-07-21 | 联想(北京)有限公司 | Processing method and computer equipment |
CN113065312A (en) * | 2020-01-02 | 2021-07-02 | 北京沃东天骏信息技术有限公司 | Text label extraction method and device |
CN111353045A (en) * | 2020-03-18 | 2020-06-30 | 智者四海(北京)技术有限公司 | Method for constructing text classification system |
CN111353045B (en) * | 2020-03-18 | 2023-12-22 | 智者四海(北京)技术有限公司 | Method for constructing text classification system |
CN111221974B (en) * | 2020-04-22 | 2020-08-14 | 成都索贝数码科技股份有限公司 | Method for constructing news text classification model based on hierarchical structure multi-label system |
CN111221974A (en) * | 2020-04-22 | 2020-06-02 | 成都索贝数码科技股份有限公司 | Method for constructing news text classification model based on hierarchical structure multi-label system |
CN111524043A (en) * | 2020-04-24 | 2020-08-11 | 南京擎盾信息科技有限公司 | Method and device for automatically generating litigation risk assessment questionnaire |
CN111666771A (en) * | 2020-06-05 | 2020-09-15 | 北京百度网讯科技有限公司 | Semantic label extraction device, electronic equipment and readable storage medium of document |
CN111666771B (en) * | 2020-06-05 | 2024-03-08 | 北京百度网讯科技有限公司 | Semantic tag extraction device, electronic equipment and readable storage medium for document |
CN112148868A (en) * | 2020-09-27 | 2020-12-29 | 南京大学 | Law recommendation method based on law co-occurrence |
CN112365372B (en) * | 2020-10-09 | 2024-01-12 | 银江技术股份有限公司 | Quality detection and evaluation method and system for referee document |
CN112365372A (en) * | 2020-10-09 | 2021-02-12 | 银江股份有限公司 | Judgment document oriented quality detection and evaluation method and system |
CN112925902B (en) * | 2021-02-22 | 2024-01-30 | 新智认知数据服务有限公司 | Method, system and electronic equipment for intelligently extracting text abstract from case text |
CN112925902A (en) * | 2021-02-22 | 2021-06-08 | 新智认知数据服务有限公司 | Method and system for intelligently extracting text abstract in case text and electronic equipment |
CN113505192A (en) * | 2021-05-25 | 2021-10-15 | 平安银行股份有限公司 | Data tag library construction method and device, electronic equipment and computer storage medium |
CN113948087A (en) * | 2021-09-13 | 2022-01-18 | 北京数美时代科技有限公司 | Voice tag determination method, system, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN109543178B (en) | 2023-02-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109543178A (en) | A kind of judicial style label system construction method and system | |
CN104391942B (en) | Short essay eigen extended method based on semantic collection of illustrative plates | |
CN110442760B (en) | Synonym mining method and device for question-answer retrieval system | |
CN106598944B (en) | A kind of civil aviaton's security public sentiment sentiment analysis method | |
CN106649260B (en) | Product characteristic structure tree construction method based on comment text mining | |
CN105022725B (en) | A kind of text emotion trend analysis method applied to finance Web fields | |
CN103678576B (en) | The text retrieval system analyzed based on dynamic semantics | |
CN102929937B (en) | Based on the data processing method of the commodity classification of text subject model | |
CN107818138A (en) | A kind of case legal regulation recommends method and system | |
CN106951438A (en) | A kind of event extraction system and method towards open field | |
CN101751455B (en) | Method for automatically generating title by adopting artificial intelligence technology | |
CN106844424A (en) | A kind of file classification method based on LDA | |
CN107153658A (en) | A kind of public sentiment hot word based on weighted keyword algorithm finds method | |
CN107239439A (en) | Public sentiment sentiment classification method based on word2vec | |
CN111190900B (en) | JSON data visualization optimization method in cloud computing mode | |
CN103309862B (en) | Webpage type recognition method and system | |
CN107315734B (en) | A kind of method and system to be standardized based on time window and semantic variant word | |
CN104281645A (en) | Method for identifying emotion key sentence on basis of lexical semantics and syntactic dependency | |
CN106599032A (en) | Text event extraction method in combination of sparse coding and structural perceptron | |
CN106294744A (en) | Interest recognition methods and system | |
CN109960756A (en) | Media event information inductive method | |
CN107291895B (en) | Quick hierarchical document query method | |
CN105843796A (en) | Microblog emotional tendency analysis method and device | |
CN101923556B (en) | Method and device for searching webpages according to sentence serial numbers | |
CN105512333A (en) | Product comment theme searching method based on emotional tendency |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 310012 1st floor, building 1, 223 Yile Road, Hangzhou City, Zhejiang Province Applicant after: Yinjiang Technology Co.,Ltd. Address before: 310012 1st floor, building 1, 223 Yile Road, Hangzhou City, Zhejiang Province Applicant before: ENJOYOR Co.,Ltd. |
GR01 | Patent grant | ||