CN106991092A - Method and apparatus for mining similar judgement documents based on big data - Google Patents
- Publication number
- CN106991092A (application CN201610038106.XA)
- Authority
- CN
- China
- Prior art keywords
- text
- keyword
- word
- judgement document
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides a method and apparatus for mining similar judgement documents based on big data. A massive number of published judgement documents are obtained, together with the cause of action of each judgement document. From the text content of each judgement document, text topic feature information describing the case facts and a number of items of keyword-related information describing the parties' points of dispute and the parties' claims are obtained, and a text feature vector of the judgement document is built from the text topic feature information and the keyword-related information. A feature dictionary of keywords is updated from the keyword-related information. Each of the massive number of judgement documents is thus represented accurately as a text feature vector and the feature dictionary of keywords is kept up to date, so that similar judgement documents can be obtained quickly and the efficiency of mining similar judgement documents is improved.
Description
Technical field
The application relates to the computer field, and in particular to a technique for mining similar judgement documents based on big data.
Background
With the rapid development of Internet technology, the volume of text data on the network is growing explosively, and finding the small amount of useful text data within this mass of text is becoming increasingly difficult. For example, in systems that hold large amounts of text data, such as automatic question-answering systems, intelligent retrieval systems and mail filtering systems, finding useful text data is increasingly difficult, time-consuming and laborious.
In the prior art, in the court business scenario, a judge needs to mine effective similar judgement documents in advance or in real time before finding the facts of, and deciding, the case being tried. For example, a people's court compares the judgment results given by several judges in different cases with similar case facts and similar claims in order to review whether a judge's judgment result is reasonable; meanwhile, when actually hearing and deciding a case, a judge also refers to the judgement documents of existing cases with similar facts in order to form the final findings of fact and the judgment result of the judgement document. In practice, however, the people's courts rely on a large amount of manual annotation and searching to find effective similar judgement documents, which is time-consuming and laborious; moreover, the quality of the similar judgement documents found manually depends entirely on personal experience, so court business needs are not well met and efficiency is low. Furthermore, because courts at different levels record judgement documents in different styles, the key case facts and the parties' key claims are usually mined with search templates or traditional natural language processing methods, which easily mine incorrect case facts and claims and, in particular, cannot mine the parties' points of dispute, so the accuracy of the similar judgement documents mined is low. In addition, because the input case under trial is confidential, the text of the case being tried cannot be entered in real time to query similar judgement documents, so querying similar judgement documents has poor real-time performance. Finally, because similar judgement documents are long and complex in content and their judgment results have to be extracted manually, the judgment results of the similar judgement documents returned by a query are poorly visualised, which makes courts inefficient when handling business involving the text of the case being tried.
Therefore, in the prior art, searching a massive body of text data for judgement documents similar to a given input case text is time-consuming and laborious, has poor real-time performance and has low accuracy, which makes routine search work inefficient.
Summary of the invention
The purpose of the application is to provide a method and apparatus for mining similar judgement documents based on big data, so as to solve the prior-art problem that searching a massive number of published judgement documents for judgement documents similar to a given input case text is time-consuming and laborious, has poor real-time performance and has low accuracy, which makes routine search work inefficient.
According to one aspect of the application, there is provided a method for mining similar judgement documents based on big data at a first device end, including:
Obtaining a massive number of published judgement documents, and obtaining the cause of action of each judgement document;
Obtaining, from the text content of each judgement document, text topic feature information on the case facts of the judgement document and a number of items of keyword-related information on the parties' points of dispute and the parties' claims in the judgement document, and building the text feature vector of the judgement document from the text topic feature information and the keyword-related information;
Updating the feature dictionary of keywords based on the keyword-related information.
According to another aspect of the application, there is provided a method for mining similar judgement documents based on big data at a second device end, including:
Obtaining input case text, and extracting a number of candidate keywords of the input case text based on the feature dictionary of keywords in a search database;
Obtaining, from the text content of the input case text and the candidate keywords, text topic feature information and a number of items of keyword-related information of the input case text, and building the text feature vector of the input case text from the text topic feature information and the keyword-related information;
Obtaining, from the search database, a number of candidate judgement documents that have the same cause of action as the input case text;
Calculating the similarity between the text feature vector of each candidate judgement document and the text feature vector of the input case text, and selecting similar judgement documents based on the similarity.
According to another aspect of the application, there is provided a first device for mining similar judgement documents based on big data, including:
A judgement document acquisition apparatus, for obtaining a massive number of published judgement documents and obtaining the cause of action of each judgement document;
A text feature mining apparatus, for obtaining, from the text content of each judgement document, text topic feature information on the case facts of the judgement document and a number of items of keyword-related information on the parties' points of dispute and the parties' claims in the judgement document, and for building the text feature vector of the judgement document from the text topic feature information and the keyword-related information;
A feature dictionary building apparatus, for updating the feature dictionary of keywords based on the keyword-related information.
According to another aspect of the application, there is provided a second device for mining similar judgement documents based on big data, including:
An input apparatus, for obtaining input case text and extracting a number of candidate keywords of the input case text based on the feature dictionary of keywords in a search database;
An input case text feature mining apparatus, for obtaining, from the text content of the input case text and the candidate keywords, text topic feature information and a number of items of keyword-related information of the input case text, and for building the text feature vector of the input case text from the text topic feature information and the keyword-related information;
A candidate judgement document acquisition apparatus, for obtaining, from the search database, a number of candidate judgement documents that have the same cause of action as the input case text;
A similar judgement document acquisition apparatus, for calculating the similarity between the text feature vector of each candidate judgement document and the text feature vector of the input case text, and for selecting similar judgement documents based on the similarity.
According to another aspect of the application, there is provided a system for mining similar judgement documents based on big data, the system including a first device and a second device, wherein:
The first device includes: a judgement document acquisition apparatus, for obtaining a massive number of published judgement documents and obtaining the cause of action of each judgement document; a text feature mining apparatus, for obtaining, from the text content of each judgement document, text topic feature information on the case facts of the judgement document and a number of items of keyword-related information on the parties' points of dispute and the parties' claims in the judgement document, and for building the text feature vector of the judgement document from the text topic feature information and the keyword-related information; a feature dictionary building apparatus, for updating the feature dictionary of keywords based on the keyword-related information; a text structuring apparatus, for performing structuring processing on the judgement document to obtain structured text information; a text structure information acquisition apparatus, for obtaining the judgement-related information of the judgement document from the structured text information, the judgement-related information including party information, case type, cause of action and judgment result; and a sending apparatus, for sending the text feature vectors of all the judgement documents, the feature dictionary and the judgement-related information to the search database of the second device;
The second device includes: a receiving apparatus, for receiving from the first device the text feature vectors of the published judgement documents obtained by the first device, the feature dictionary and the judgement-related information, and saving them in the search database, the judgement-related information including party information, case type, cause of action and judgment result; a text structure information receiving apparatus, for receiving the structured text information, sent by the first device, that results from structuring processing of the judgement documents; a text structure information acquisition apparatus, for obtaining the structured text information of the similar judgement documents; an input apparatus, for obtaining input case text and extracting a number of candidate keywords of the input case text based on the feature dictionary of keywords in the search database; an input case text feature mining apparatus, for obtaining, from the text content of the input case text and the candidate keywords, text topic feature information and a number of items of keyword-related information of the input case text, and for building the text feature vector of the input case text from the text topic feature information and the keyword-related information; a candidate judgement document acquisition apparatus, for obtaining, from the search database, a number of candidate judgement documents that have the same cause of action as the input case text; and a similar judgement document acquisition apparatus, for calculating the similarity between the text feature vector of each candidate judgement document and the text feature vector of the input case text, and for selecting similar judgement documents based on the similarity.
Compared with the prior art, the method and apparatus for mining similar judgement documents based on big data at the first device end according to embodiments of the application obtain a massive number of published judgement documents and the cause of action of each; obtain, from the text content of each judgement document, text topic feature information on the case facts and a number of items of keyword-related information on the parties' points of dispute and the parties' claims; and build the text feature vector of the judgement document from them. Each published judgement document is thus mined along three key elements (the text topic feature information on the case facts, the keyword-related information on the parties' points of dispute, and the keyword-related information on the parties' claims) and represented accurately as a text feature vector. This avoids time-consuming and laborious manual analysis of a massive number of judgement documents that are long, complex in content and varied in style, and so effectively improves the efficiency of mining similar judgement documents. In addition, the feature dictionary of keywords is updated from the keyword-related information, so that the text content of judgement documents is effectively identified through a feature dictionary built from all keywords, their word topic features and their expansion words. Similar judgement documents and their corresponding text feature vectors can therefore be obtained quickly, which improves the efficiency of mining similar judgement documents.
Further, in the method and apparatus for mining similar judgement documents based on big data at the second device end according to embodiments of the application, input case text is first obtained and a number of candidate keywords of the input case text are extracted based on the feature dictionary of keywords in the search database, so that the keywords obtained for the input case text can be found in the search database, which effectively improves keyword-based lookup of similar judgement documents for the input case text. Then, text topic feature information and a number of items of keyword-related information of the input case text are obtained from its text content and the candidate keywords, and the text feature vector of the input case text is built from them, so that the relevant information of the input case text is effectively expressed in the form of a text feature vector. Finally, a number of candidate judgement documents that have the same cause of action as the input case text are obtained from the search database, the similarity between the text feature vector of each candidate judgement document and that of the input case text is calculated, and similar judgement documents are selected based on the similarity. The text feature vectors of the candidate judgement documents sent from the first device are thus compared with the text feature vector mined in real time from the input case text, so that judgement documents similar to the input case text can be screened quickly and accurately from the massive number of published judgement documents, without time-consuming and laborious manual analysis of massive numbers of judgement documents that are long, complex in content and varied in style. This effectively improves the efficiency of mining similar text.
Brief description of the drawings
Other features, objects and advantages of the application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 shows a schematic flow diagram of a method for mining similar judgement documents based on big data at the first device end, according to one aspect of the application;
Fig. 2 shows a schematic flow diagram of a method for mining the text feature vector of a judgement document based on big data at the first device end, according to a preferred embodiment of one aspect of the application;
Fig. 3 shows a schematic flow diagram of a method for mining similar judgement documents based on big data at the second device end, according to one aspect of the application;
Fig. 4 shows a schematic flow diagram of a method for mining the text feature vector of a judgement document based on big data at the second device end, according to a preferred embodiment of one aspect of the application;
Fig. 5 shows a schematic flow diagram of the overall method for mining similar judgement documents based on big data, according to one aspect of the application;
Fig. 6 shows a schematic structural diagram of a first device for mining similar judgement documents based on big data, according to one aspect of the application;
Fig. 7 shows a schematic flow diagram of the trial and judgment stages of a court, for the first device that mines similar judgement documents based on big data, according to one aspect of the application;
Fig. 8 shows a schematic structural diagram of the text feature mining apparatus 12, used in a cloud computing server to mine the text feature vector of a judgement document, according to a preferred embodiment of one aspect of the application;
Fig. 9 shows a schematic structural diagram of a second device for mining similar judgement documents based on big data, according to one aspect of the application;
Fig. 10 shows a schematic structural diagram of the input case text feature mining apparatus 22 in a court intranet server that mines similar judgement documents based on big data, according to a preferred embodiment of one aspect of the application;
Fig. 11 shows a schematic diagram of a system for mining similar judgement documents based on big data, according to one aspect of the application.
The same or similar reference numerals in the drawings denote the same or similar parts.
Detailed description
The application is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows a schematic flow diagram of a method for mining similar judgement documents based on big data at the first device end, according to one aspect of the application. The method includes step S11, step S12 and step S13.
Step S11: obtain a massive number of published judgement documents, and obtain the cause of action of each judgement document. Step S12: obtain, from the text content of each judgement document, text topic feature information on the case facts of the judgement document and a number of items of keyword-related information on the parties' points of dispute and the parties' claims in the judgement document, and build the text feature vector of the judgement document from the text topic feature information and the keyword-related information. Step S13: update the feature dictionary of keywords based on the keyword-related information.
In step S11, the causes of action of the judgement documents include, but are not limited to, contract disputes, matrimonial disputes, infringement, ownership and voluntary service disputes, and cases to which special procedures apply. Of course, any cause of action of a judgement document in court business scenarios that exists now or may appear in the future, if applicable to the application, is incorporated herein by reference.
In step S13, the feature dictionary of keywords includes all the keyword-related information of the massive number of published judgement documents and the expansion-word-related information corresponding to the keywords.
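One way to picture the feature dictionary is as a map from each keyword to its keyword-related information (importance and word topic feature) and its expansion words with their relevance. The sketch below assumes this layout; `LexiconEntry` and `update_lexicon` are hypothetical names, and the merge policy (keep the highest importance and relevance seen so far, overwrite the topic feature) is an assumption made for illustration, since the text only states that the dictionary is updated.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    """Keyword-related information plus expansion words (hypothetical layout)."""
    importance: float
    topic_feature: list[float]
    expansions: dict[str, float] = field(default_factory=dict)  # word -> relevance


def update_lexicon(lexicon, keyword, importance, topic_feature, expansions):
    """Insert a keyword or merge new information into its existing entry.

    Assumed merge policy: keep the highest importance and relevance seen,
    and overwrite the word topic feature with the latest value.
    """
    entry = lexicon.get(keyword)
    if entry is None:
        lexicon[keyword] = LexiconEntry(importance, list(topic_feature), dict(expansions))
        return
    entry.importance = max(entry.importance, importance)
    entry.topic_feature = list(topic_feature)
    for word, relevance in expansions.items():
        entry.expansions[word] = max(relevance, entry.expansions.get(word, 0.0))
```

Calling `update_lexicon` once per keyword of each processed judgement document would grow the dictionary incrementally as new documents are mined.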
Here, the judgement documents include, but are not limited to, judgement documents in the court business scenario, including documents recording the facts found by the court of first instance, documents recording the facts found by the court of second instance, documents recording the facts found by the retrial court, bills of complaint, statements of defence, interrogation records, witness testimony and the like.
The application is explained in detail below through a specific embodiment that takes judgement documents in the court business scenario as an example. This example is used for illustration only; embodiments of the application are not limited to it, and the following embodiments can equally be implemented in other software programs.
Because judgement documents in the people's court business scenario are long and complex in content, and because regional differences lead to varied recording styles, text feature processing needs to be applied to the massive number of published judgement documents so that judicial staff can find the similar judgement documents they need as quickly as possible. The required judgement documents should be searched for along three aspects: the case facts of the judgement document, the parties' points of dispute, and the parties' claims.
It should be noted that the text topic feature information includes, but is not limited to, the case facts of a judgement document in the court business scenario, and the keywords include, but are not limited to, the parties' points of dispute and the parties' claims in a judgement document in the court business scenario. In the preferred embodiment of one aspect of the application described below, the parties' points of dispute and the parties' claims serve as the keywords of the judgement document, the case facts serve as its text topic feature information, and the text feature vector of the judgement document is mined on that basis.
A preferred embodiment of one aspect of the application obtains a massive number of published judgement documents and the cause of action of each judgement document; obtains, from the text content of each judgement document, text topic feature information on the case facts of the judgement document and a number of items of keyword-related information on the parties' points of dispute and the parties' claims; and builds the text feature vector of the judgement document from the text topic feature information and the keyword-related information. The parties' points of dispute and the parties' claims are extracted from the judgement document as keywords, words related to the points of dispute and the claims are extracted as expansion words of the keywords, and the content concerning the case facts is mined as the text topic feature information, so that a judgement document in the court business scenario is represented in the form of a text feature vector. The text content of long and complex judgement documents is thereby expressed efficiently and accurately, so that judicial staff can quickly find the similar judgement documents they need by case facts, points of dispute and claims. Further, the feature dictionary of keywords is updated based on the keyword-related information, so that when judicial staff enter a keyword and its expansion words, the judgement documents related to the entered keyword and expansion words can be found from the feature dictionary as quickly as possible, which effectively improves working efficiency in the court business scenario.
Specifically, in step S11, a massive number of published judgement documents are obtained. For example, the published judgement documents are crawled in the court business scenario: under the rules of the Supreme People's Court, almost all judgement documents must be made public, so once authorised by the Supreme People's Court, all published judgement documents can be crawled. The documents can be obtained with an ordinary web crawler, which captures, for every judgement document in the court business scenario, the corresponding title, content, judgment number, adjudicating court, judge, judgment date and similar information.
Further, after step S11 and before step S12, the method also includes step S14 (not shown) and step S15 (not shown). In step S14, structuring processing is performed on the judgement document to obtain structured text information. In step S15, the judgement-related information of the judgement document is obtained from the structured text information; the judgement-related information includes party information, case type, cause of action and judgment result.
In embodiments of the application, step S14 (not shown) mainly applies text preprocessing and structuring processing to the massive number of published judgement documents obtained in step S11. For example, after the published judgement documents have been crawled from web pages in the court business scenario in step S11, the text content of the crawled judgement documents must be extracted, and text processing and structuring processing must then be applied to the judgement documents. In step S14, the text content of the judgement document is first extracted with a web page parsing method (pageparse), which extracts the content of the different parts of the judgement document mainly by means of configured web page templates. Text preprocessing is then applied to the judgement document: characters such as Chinese (full-width) spaces are replaced with their English (half-width) equivalents, numerical values are normalised to Arabic numerals, line breaks in the document content are removed, and document numbers, court names and the like are normalised. Structuring processing is then applied to the preprocessed judgement document and covers the following four aspects: (1) extract the names of the plaintiff and defendant in the judgement document, and normalise how these names and the terms plaintiff and defendant are expressed in the content; (2) extract the case type of the judgement document, where case types fall into seven main categories: criminal litigation, civil litigation, administrative litigation, intellectual property disputes, written verdicts, compensation cases and enforcement cases; (3) structurally extract the cause of action of the case in the judgement document and normalise it to a cause of action in the people's courts' standard cause-of-action library; (4) structurally extract the judgment result of the judgement document, that is, mainly the object of the judgment, the principal penalty, the accessory penalty, the amount of compensation and which party won or lost.
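The text preprocessing part of step S14 (full-width to half-width conversion, removal of line breaks, normalisation of numerals) can be illustrated with a short sketch. It assumes Unicode NFKC normalisation is an acceptable way to fold full-width characters, and it only converts four-character years written digit by digit in Chinese numerals; general Chinese numerals such as 十二 are deliberately not handled. The function name `preprocess` and both constants are hypothetical.

```python
import re
import unicodedata

# Digit-by-digit map for years written in Chinese numerals (e.g. 二〇一六年).
CN_DIGITS = {"〇": "0", "零": "0", "一": "1", "二": "2", "三": "3",
             "四": "4", "五": "5", "六": "6", "七": "7", "八": "8", "九": "9"}
# Only a run of exactly four Chinese digits directly before 年 is treated as a year,
# so ordinary words such as 一审 are left untouched.
CN_YEAR = re.compile(r"[〇零一二三四五六七八九]{4}(?=年)")


def preprocess(text: str) -> str:
    """Simplified text preprocessing for a crawled judgement document."""
    # NFKC folds full-width letters, digits, brackets and the ideographic
    # space (U+3000) into their half-width ASCII equivalents.
    text = unicodedata.normalize("NFKC", text)
    # Remove line breaks inside the document content.
    text = text.replace("\r", "").replace("\n", "")
    # Normalise years written in Chinese numerals to Arabic numerals.
    return CN_YEAR.sub(lambda m: "".join(CN_DIGITS[c] for c in m.group()), text)
```

Normalising document numbers and court names, and the four structuring steps, would need rules or templates specific to the document source and are outside this sketch.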
Further, in step S12, text topic feature information on the case facts of the judgement document and a number of items of keyword-related information on the parties' points of dispute and the parties' claims are obtained from the text content of each judgement document, and the text feature vector of the judgement document is built from the text topic feature information and the keyword-related information. The specific implementation of step S12 is shown in Fig. 2, which shows a schematic flow diagram of a method for mining the text feature vector of a judgement document based on big data at the first device end, according to a preferred embodiment of one aspect of the application. Step S12 specifically includes step S121, step S122, step S123 and step S124.
Step S121 includes: extracting the text topic feature information of the judgement document and the word topic feature of each word in the judgement document. Step S122 includes: obtaining the context relations between the words, correcting the word topic feature of each word based on the context relations, and determining a number of items of keyword-related information of the judgement document based on the degree of matching between the corrected word topic feature of each word and the text topic feature information, where the keyword-related information includes the keyword, the keyword importance information and the word topic feature corresponding to the keyword. Step S123 includes: updating the text topic feature information of the judgement document based on the keyword-related information. Step S124 includes: obtaining expansion-word-related information based on the keyword-related information, where the expansion-word-related information includes the expansion words of the keyword and the expansion word relevance; building bag-of-words feature information from the keyword-related information and the expansion-word-related information; and determining the text feature vector of the judgement document from the updated text topic feature information and the bag-of-words feature information.
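The final part of step S124, combining the updated text topic feature information with the bag-of-words feature information into one text feature vector, can be sketched as below. The passage does not state how the two parts are combined or how expansion words are weighted, so both choices here are assumptions: the vector is a simple concatenation, and an expansion word is weighted by keyword importance multiplied by its relevance. The function names and the fixed `vocabulary` list are hypothetical.

```python
def bag_of_words_features(keywords, expansions, vocabulary):
    """Build a bag-of-words weight per vocabulary word.

    keywords:   {keyword: importance}
    expansions: {keyword: {expansion_word: relevance}}
    Assumed weighting: keyword -> importance,
    expansion word -> importance * relevance (highest value wins).
    """
    weights = {}
    for keyword, importance in keywords.items():
        weights[keyword] = max(weights.get(keyword, 0.0), importance)
        for word, relevance in expansions.get(keyword, {}).items():
            weights[word] = max(weights.get(word, 0.0), importance * relevance)
    return [weights.get(word, 0.0) for word in vocabulary]


def text_feature_vector(topic_features, keywords, expansions, vocabulary):
    """Concatenate topic features and bag-of-words features (assumed layout)."""
    return list(topic_features) + bag_of_words_features(keywords, expansions, vocabulary)
```

With a shared `vocabulary` order, vectors built this way for different documents have the same length and can be compared component by component.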
Specifically, in step S121, the text topic feature information of the judgement document indicates the case facts of the judgement document. In the embodiments of the application, a topic model method is preferably used to extract the text topic feature information of the judgement document and the word topic feature of each word, where the topic model method is consistent with the model method used in the prior art. Of course, other methods for extracting the text topic feature information of a judgement document and the word topic feature of each word, whether existing or arising in the future, also fall within the scope of protection of the application if applicable to it, and are incorporated herein by reference.
Further, step S122 includes: obtaining the context word co-occurrence relations between the
words; obtaining the context transition probability between any two words; correcting the
word topic feature of each word based on the context word co-occurrence relations and the
context transition probabilities; and determining several pieces of keyword-related
information for the judgement document based on the degree of match between each corrected
word topic feature and the text topic feature information, where the keyword-related
information includes the keyword, the keyword's importance information, and the keyword's
corresponding word topic feature.
In this embodiment, step S122 relies on the text topic feature information of the judgement
document and the word topic feature of each word extracted in step S121. It obtains the
context word co-occurrence relations between the words and the context transition
probability between any two words, corrects the word topic feature of each word on that
basis, and then, based on the degree of match between each corrected word topic feature and
the text topic feature information, determines the keywords of the judgement document and
their corresponding word topic features and obtains the importance information of each
keyword. For example, for the i-th word Wi in a judgement document Ds, if the topic assigned
to the word is Tj, then according to the topic model method the transition probability of
Wi occurring in Ds is:

Pj(Wi|Ds) = P(Wi|Tj) × P(Tj|Ds)

where P(Wi|Tj) is the transition probability of word Wi under topic Tj, and P(Tj|Ds) is the
transition probability of topic Tj in judgement document Ds. The topics of the word are then
enumerated one by one to obtain all transition probabilities Pj(Wi|Ds), where j is a
positive integer from 1 to k. Based on these probabilities, a topic is selected for the i-th
word Wi in Ds; the simplest and most common method is to take the topic Tj that maximises
Pj(Wi|Ds), i.e. max[j] Pj(Wi|Ds). If the topic selected for Wi at this point differs from
the word topic feature obtained in step S121, the selection affects both the transition
probability of the word under a given topic and the transition probability of each topic in
the judgement document. Because these two probabilities in turn affect the calculation of
the transition probability of Wi occurring in Ds, one round of calculating Pj(Wi|Ds) over
all judgement documents and re-selecting the topic of each word is regarded as one
iteration. After n such iterations, the words corresponding to the word topic features
obtained once the judgement document has converged are the keywords of the judgement
document, and the word topic feature of each keyword is the one determined after iteration.
Keywords determined in this way express the judgement document, and the word features of
its keywords, more effectively and accurately.
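The topic selection and iteration described above can be sketched as follows. This is a
minimal illustration, not the patent's implementation: the function names and the toy
probabilities are hypothetical, the correction by context co-occurrence is omitted, and the
re-estimation of P(Tj|Ds) by simple counting with add-one smoothing is an assumption made
only so that the loop has something to update.

```python
def assign_topics(words, p_word_given_topic, p_topic_given_doc):
    """For each word Wi, pick the topic j that maximises P(Wi|Tj) * P(Tj|Ds)."""
    assignment = {}
    for word in words:
        scores = [
            p_word_given_topic[j].get(word, 0.0) * p_topic_given_doc[j]
            for j in range(len(p_topic_given_doc))
        ]
        assignment[word] = max(range(len(scores)), key=lambda j: scores[j])
    return assignment


def iterate_topics(words, p_word_given_topic, p_topic_given_doc, max_iterations=10):
    """Repeat assignment and re-estimation until the assignment stops changing."""
    k = len(p_topic_given_doc)
    previous = None
    current = {}
    for _ in range(max_iterations):
        current = assign_topics(words, p_word_given_topic, p_topic_given_doc)
        if current == previous:
            break  # converged: no word changed its topic
        # Assumption: re-estimate P(Tj|Ds) from the assignment counts,
        # with add-one smoothing so that no topic gets probability zero.
        counts = [0] * k
        for topic in current.values():
            counts[topic] += 1
        p_topic_given_doc = [(c + 1) / (len(current) + k) for c in counts]
        previous = current
    return current


# Toy, made-up probabilities for two topics.
p_word_given_topic = [
    {"loan": 0.30, "interest": 0.20, "injury": 0.01},
    {"loan": 0.02, "interest": 0.03, "injury": 0.40},
]
topics = iterate_topics(["loan", "interest", "injury"], p_word_given_topic, [0.7, 0.3])
```

In this toy run "loan" and "interest" settle on topic 0 and "injury" on topic 1.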
In this embodiment, in step S123, the text topic feature information of the judgement
document is updated based on the keyword-related information determined in step S122. For
example, the text topic feature information of the judgement document is updated by the
following formula:

D = Σ_{i=1}^{n} w_i × I_i

where D is the updated text topic feature information, the judgement document contains n
keywords, w_i is the importance information of the i-th keyword in the judgement document,
and I_i is the word topic feature of the i-th keyword. Taking this weighted sum of the word
topic features of the keywords in the judgement document yields the text topic feature
information of the judgement document and effectively removes the influence of unimportant
words on the construction of the text topic feature information.
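The weighted sum of word topic features can be sketched as below. The function name and the
two-dimensional toy vectors are hypothetical; the source does not specify the dimensionality
of the word topic features or whether the weights are normalised.

```python
def update_text_topic_feature(keywords):
    """keywords: list of (importance w_i, word topic feature I_i) pairs.

    Returns the importance-weighted sum of the word topic features.
    """
    dimension = len(keywords[0][1])
    updated = [0.0] * dimension
    for importance, topic_feature in keywords:
        for d in range(dimension):
            updated[d] += importance * topic_feature[d]
    return updated


# Three keywords with made-up importance values and topic features.
D = update_text_topic_feature([
    (0.5, [1.0, 0.0]),
    (0.25, [0.0, 1.0]),
    (0.25, [1.0, 1.0]),
])
```

Words that were not selected as keywords simply do not appear in the sum, which is how
unimportant words stop influencing the text topic feature information.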
Further, in step S124, expansion-word related information is obtained based on the
keyword-related information, where the expansion-word related information includes the
expansion words of the keyword and their degree of relatedness. The expansion words include
the synonyms of the keyword and the words that are highly correlated with the keyword in
the judgement document. In this embodiment, synonyms are mined by calculating the topic
feature similarity of any two words. For example, for keyword A, the several words with the
highest similarity are taken as the synonyms of keyword A. The highly correlated words of a
keyword are calculated with an algorithm for mining highly correlated words (word2vector):
the algorithm calculates a word vector for each word and then calculates the word vector
similarity of any two words in order to mine highly correlated words. For example, for
keyword A, the several words with the highest word vector similarity are taken as the
highly correlated words of keyword A.
Further, in step S124, bag-of-words feature information is built based on the
keyword-related information and the expansion-word related information. Specifically, step
S124 includes: determining the expansion words of the keyword and their degree of
relatedness based on the keyword and its corresponding word topic feature, where the
expansion words include the synonyms of the keyword and the words that are highly
correlated with it in the judgement document; and building the bag-of-words feature
information with a bag-of-words model, based on the keyword and its corresponding word topic
feature and on the expansion words and their degree of relatedness.
In this embodiment, the bag-of-words feature information indicates the word features of the
keywords in the judgement document and of their expansion words. In the bag-of-words
feature information, the feature value of a keyword feature is the importance information
of the keyword in the judgement document, the feature value of a synonym feature is the
product of the keyword's importance information and the degree of synonymy, and the feature
value of a correlated-word feature is the product of the keyword's importance information
and the degree of relatedness. For example, suppose all judgement documents together
contain 100,000 distinct words; then the bag-of-words feature information of each
judgement document is a 100,000-dimensional vector in which each dimension marks whether
the word at that position appears in the judgement document. For example, suppose word1 is
the 1st dimension of the bag-of-words feature information, word2 is the 2nd dimension,
word3 is the 10th dimension, and word4 is the 30th dimension; word3 and word1 are similar
words with similarity weight13, and word4 and word2 are similar words with similarity
weight24. Judgement document A contains the words word1, word3, and word4, whose importance
information in A is weight1, weight3, and weight4 respectively. Then, in the bag-of-words
feature information of judgement document A, the feature value of the 1st dimension is
weight1 + weight13*weight3, the feature value of the 2nd dimension is weight24*weight4, the
feature value of the 10th dimension is weight3 + weight1*weight13, and the feature value of
the 30th dimension is weight4. The feature values of the highly correlated words of a
keyword can be obtained by the same calculation, so the resulting bag-of-words feature
information contains the feature values corresponding to the word topic features of the
keywords and the feature values corresponding to the word topic features of the expansion
words.
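The worked example with word1 to word4 can be reproduced with a short sketch. The function
name and the concrete numbers are hypothetical stand-ins for weight1, weight3, weight4,
weight13, and weight24; dimensions are 0-based in the code, so the 1st, 2nd, 10th, and 30th
dimensions become indices 0, 1, 9, and 29.

```python
def bag_of_words_feature(importance, similar, index, size):
    """Build a bag-of-words feature vector with expansion-word contributions.

    importance: word -> importance of that word in the document
    similar:    (word_a, word_b) -> similarity, treated as symmetric
    index:      word -> 0-based dimension of the word
    size:       total number of dimensions
    """
    vector = [0.0] * size
    # A word present in the document contributes its own importance.
    for word, weight in importance.items():
        vector[index[word]] += weight
    # A present word also contributes similarity * importance to its partner.
    for (a, b), similarity in similar.items():
        if a in importance:
            vector[index[b]] += similarity * importance[a]
        if b in importance:
            vector[index[a]] += similarity * importance[b]
    return vector


index = {"word1": 0, "word2": 1, "word3": 9, "word4": 29}
importance_in_A = {"word1": 0.5, "word3": 0.25, "word4": 0.125}   # weight1, weight3, weight4
similar = {("word1", "word3"): 0.5, ("word2", "word4"): 0.5}      # weight13, weight24
feature_A = bag_of_words_feature(importance_in_A, similar, index, size=30)
```

With these numbers, index 0 holds weight1 + weight13*weight3, index 1 holds
weight24*weight4 even though word2 itself does not occur in A, index 9 holds
weight3 + weight1*weight13, and index 29 holds weight4.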
Further, step S124 determines the text feature vector of the judgement document based on the
updated text topic feature information and the bag-of-words feature information.
Specifically, step S124 includes: merging the updated text topic feature information and the
bag-of-words feature information to determine the original text feature of the judgement
document; and applying feature normalisation to the original text feature of the judgement
document to determine the text feature vector of the judgement document.
For example, the text topic feature information of the judgement document obtained in step
S123 and the bag-of-words feature information are concatenated into one feature vector to
generate the original text feature of the judgement document. For example, if the text topic
feature information of the judgement document is a 10-dimensional feature vector and the
bag-of-words feature information is a 100-dimensional feature vector, then the original text
feature of the judgement document is a 110-dimensional feature vector. A feature
normalisation method commonly used in the machine learning field is then applied to the
original text feature to generate the text feature vector of the judgement document. For
example, assuming that the same feature across all judgement documents follows a normal
distribution, each feature dimension can be normalised to the standard normal distribution.
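Concatenation followed by per-dimension normalisation can be sketched as below. The sketch
assumes z-score normalisation (subtract the mean, divide by the standard deviation of that
dimension over all documents), which is one common way to map a normally distributed
feature to the standard normal distribution; the function names are hypothetical and the
handling of zero-variance dimensions is an assumption.

```python
import statistics


def raw_feature(topic_feature, bag_of_words):
    """Concatenate text topic feature and bag-of-words feature."""
    return list(topic_feature) + list(bag_of_words)


def fit_normalizer(raw_features):
    """Compute the mean and population standard deviation of every dimension."""
    columns = list(zip(*raw_features))
    means = [statistics.mean(column) for column in columns]
    stds = [statistics.pstdev(column) for column in columns]
    return means, stds


def normalize(raw, means, stds):
    """Z-score each dimension; constant dimensions are mapped to 0.0 (assumption)."""
    return [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(raw, means, stds)]


# Two toy documents: a 1-dimensional topic feature plus a 1-dimensional bag of words.
raw_docs = [raw_feature([1.0], [10.0]), raw_feature([3.0], [10.0])]
means, stds = fit_normalizer(raw_docs)
text_feature_vector = normalize(raw_docs[1], means, stds)
```

Storing `means` and `stds` lets a second device apply exactly the same normalisation to a
newly entered text.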
Further, step S13 updates the feature dictionary of keywords based on the several pieces of
keyword-related information. Specifically, step S13 includes building the feature dictionary
of keywords by using each keyword as the index and storing its word topic feature and
expansion words. For example, in a court business scenario, the words describing the
parties' claims and the words describing the parties' disputes in a judgement document are
used as the keywords extracted from the judgement document, and the words related to the
parties' claims and to the parties' disputes, found through the keywords, are used as the
expansion words of the keywords. Feature extraction is performed on the judgement document
in this way to obtain a feature dictionary composed of the keywords and expansion words of
the judgement document.
Further, the method for mining similar texts at the first device according to one aspect of
this application also includes step S16 (not shown): sending the text feature vectors of all
judgement documents, the feature dictionary, and the judgement-related information to the
retrieval database of the second device. For example, in a court business scenario, the text
feature vector of the judgement document obtained in step S12, the feature dictionary
obtained in step S13, and the text structure information of the judgement document obtained
in step S14 (not shown) are sent to the second device. The second device can then rely on
the feature dictionary calculated by the first device and on a simplified calculation logic,
which ensures that the first device and the second device output the same text feature
vector and feature dictionary for the same judgement document.
Fig. 3 shows a schematic flow diagram of a method, according to one aspect of this
application, for mining similar judgement documents based on big data at the second device.
The method includes step S21, step S22, step S23, and step S24.
Step S21: obtain an input case text and, based on the feature dictionary of keywords in the
retrieval database, extract several candidate keywords of the input case text. Step S22:
obtain the text topic feature information and several pieces of keyword-related information
of the input case text based on its text content and the candidate keywords, and build the
text feature vector of the input case text based on the text topic feature information and
the keyword-related information. Step S23: obtain from the retrieval database several
candidate judgement documents that have the same cause of action as the input case text.
Step S24: calculate the similarity between the text feature vector of each candidate
judgement document and the text feature vector of the input case text, and select similar
judgement documents based on the similarity.
It should be noted that the input case text includes, but is not limited to, existing
judgement documents and documents of cases under trial. Of course, other existing or future
input case texts, if applicable to this application, should also fall within the scope of
protection of this application and are incorporated herein by reference.
In this embodiment, step S25 (not shown) precedes step S21. Step S25 includes receiving from
the first device the text feature vectors of the published judgement documents obtained by
the first device, the feature dictionary, and the judgement-related information, and saving
them in the retrieval database. The judgement-related information includes party
information, case type, cause of action, and judgement result. For example, in a court
business scenario, the retrieval database on the court intranet stores online the text
feature vectors of the judgement documents, the feature dictionary, and the associated
judgement-related information. The stored information about judgement documents covers the
following eight aspects:
(1) The judgement documents corresponding to each case type and cause of action. The key is
the case type and cause of action; the value is the internal system numbers of the
judgement documents.
(2) The structured information of existing judgement documents. The key is the internal
system number of the judgement document; the value is the text structure information
generated by the structured extraction module.
(3) The text feature vectors of existing judgement documents. The key is the internal system
number of the judgement document; the value is the text feature vector generated by the
text feature module.
(4) All keywords of existing judgement documents. The key is a constant; the value is all
keywords generated by the keyword topic module.
(5) The word topic feature of each keyword. The key is the keyword; the value is the word
topic feature of the keyword generated by the keyword topic module.
(6) The synonyms of each keyword. The key is the keyword; the value is the synonyms of the
keyword and their degree of synonymy.
(7) The related words of each keyword. The key is the keyword; the value is the related
words of the keyword and their degree of relatedness.
(8) The mean and variance of the feature values of each feature dimension of the judgement
documents. The key is the feature number; the value is the mean and variance of the feature
values.
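The eight kinds of key-value entries can be pictured with a small in-memory stand-in for the
retrieval database. Everything concrete below is hypothetical: the entry kind names, the
document numbers, the example cause of action, and the stored values are illustrative only,
and the source does not say which storage system is used.

```python
# Hypothetical in-memory stand-in for the retrieval database.
retrieval_db = {}


def put(kind, key, value):
    """Store a value under an (entry kind, key) pair."""
    retrieval_db[(kind, key)] = value


def get(kind, key, default=None):
    """Fetch a value by (entry kind, key), or `default` if absent."""
    return retrieval_db.get((kind, key), default)


put("docs_by_case", ("civil", "private lending dispute"), [101, 102])  # (1)
put("structure", 101, {"parties": ["A", "B"], "result": "upheld"})     # (2)
put("feature_vector", 101, [0.3, -1.2, 0.8])                           # (3)
put("feature_vector", 102, [0.1, 0.4, -0.5])                           # (3)
put("all_keywords", "ALL", ["loan", "interest"])                       # (4)
put("keyword_topic", "loan", [0.9, 0.1])                               # (5)
put("synonyms", "loan", {"borrowing": 0.95})                           # (6)
put("related_words", "loan", {"interest": 0.80})                       # (7)
put("feature_stats", 0, {"mean": 0.1, "variance": 1.4})                # (8)

# Typical read path: case type and cause of action -> document numbers -> vectors.
candidate_ids = get("docs_by_case", ("civil", "private lending dispute"))
candidate_vectors = {doc_id: get("feature_vector", doc_id) for doc_id in candidate_ids}
```

Keying entry (1) by case type and cause of action is what allows the candidate set to be
narrowed before any similarity is computed.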
It should be noted that the text type includes, but is not limited to, the case type of the
input case text in a court business scenario, where case types include criminal litigation,
civil litigation, administrative litigation, intellectual property disputes, written
verdicts, compensation cases, enforcement cases, and the trial-level stage of a case under
trial. Of course, other existing or future text types, if applicable to this application,
should also fall within the scope of protection of this application and are incorporated
herein by reference.
Further, step S21 obtains the input case text and extracts several candidate keywords of the
input case text based on the feature dictionary of keywords in the retrieval database.
Specifically, step S21 includes obtaining the input case text and, based on the cause of
action of the input case text, extracting several candidate keywords of the input case text
from the feature dictionary of keywords in the retrieval database. For example, in a court
business scenario, judgement documents similar to the input case text are searched for among
the mass of published judgement documents. Because the causes of action of judgement
documents differ, and in order to find judgement documents similar to the input case text
quickly, the words that occur both in the input case text and in the feature dictionary of
keywords in the retrieval database are extracted, based on the cause of action of the input
case text, as the candidate keywords of the input case text. This ensures that every keyword
mined from the input case text exists in the retrieval database.
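Candidate keyword extraction by intersection can be sketched as below. The layout of the
feature dictionary (a mapping from cause of action to a set of known keywords) and all
names and sample values are assumptions made for illustration.

```python
def candidate_keywords(input_words, keywords_by_cause, cause_of_action):
    """Keep the input words that are known keywords for this cause of action.

    Order of first appearance in the input is preserved and duplicates
    are dropped.
    """
    known = keywords_by_cause.get(cause_of_action, set())
    seen = set()
    candidates = []
    for word in input_words:
        if word in known and word not in seen:
            seen.add(word)
            candidates.append(word)
    return candidates


# Hypothetical dictionary content and a tokenised input case text.
keywords_by_cause = {
    "private lending dispute": {"loan", "interest", "repayment"},
    "traffic accident": {"injury", "compensation"},
}
words = ["plaintiff", "loan", "interest", "loan", "injury"]
candidates = candidate_keywords(words, keywords_by_cause, "private lending dispute")
```

Here "injury" is dropped because it is not a known keyword for this cause of action, which
mirrors the guarantee that every selected keyword already exists in the retrieval database.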
Further, step S22 includes obtaining the text topic feature information and several pieces
of keyword-related information of the input case text based on its text content and the
candidate keywords, and building the text feature vector of the input case text based on the
text topic feature information and the keyword-related information. The specific execution
of step S22 is shown in Fig. 4, which is a schematic flow diagram of a method, according to
a preferred embodiment of one aspect of this application, for mining the text feature vector
of a judgement document based on big data at the second device. Step S22 specifically
includes step S221, step S222, and step S223.
Step S221 includes: comparing each word of the input case text with all keywords of all
judgement documents in order to extract candidate keywords and their word topic features
from the input case text, and obtaining the text topic feature information of the input case
text based on the word topic features. Step S222 includes: obtaining the context relations
between the candidate keywords, correcting the word topic feature of each candidate keyword
based on those context relations, and determining the keyword-related information of the
input case text based on the degree of match between each corrected word topic feature and
the text topic feature information. Step S223 includes: based on the keyword-related
information, updating the text topic feature information of the input case text and
obtaining expansion-word related information; building the bag-of-words feature information
of the input case text based on the keyword-related information and the expansion-word
related information; and determining the text feature vector of the input case text based
on the updated text topic feature information and the bag-of-words feature information.
In this embodiment, the court intranet in the court business scenario mainly computes, in
real time, the text feature vector of the case text entered by the user. In step S221, each
word of the input case text is compared with all keywords of all judgement documents in
order to extract candidate keywords and their word topic features from the input case text.
For example, when the court intranet mines the keywords of a case text entered online, it
makes one assumption: a keyword of the input case text must also be a keyword of an existing
judgement document. Therefore, the module queries, among the mass of published judgement
documents, all keywords about the parties' claims and the parties' disputes in judgement
documents that have the same cause of action as the input case text, and intersects them
with the words of the input case text to obtain the candidate keywords of the input case
text. This guarantees that every keyword selected from the input case text is also a keyword
of the published judgement documents, so that judgement documents similar to the input case
text, together with their text feature vectors and features, can be mined from the existing
judgement documents. Determining the candidate keywords of the input case text from the
keywords of all published judgement documents also simplifies the calculation logic for the
input case text, because it builds on the processing already done for the mass of judgement
documents.
Specifically, in step S221 the text topic feature of the input case text is obtained based
on the word topic features, where the text topic feature of a judgement document is the case
type of that judgement document. In this embodiment, a topic model method is preferably used
to extract the text topic feature of the input case text and the word topic feature of each
word, where the topic model method is consistent with the topic model method of the prior
art. Of course, other existing or future methods for extracting the text topic feature and
the word topic feature of each word from a judgement document, if applicable to this
application, should also fall within the scope of protection of this application and are
incorporated herein by reference.
Specifically, step S222 first obtains the context transition probability between any two
candidate keywords; corrects the word topic feature of each word based on the context word
co-occurrence relations and the context transition probabilities; and, based on the degree
of match between each corrected word topic feature and the text topic feature information
obtained with the topic model in step S221, determines the keywords of the input case text
and their corresponding word topic features and obtains the importance information of each
keyword. For example, for the i-th candidate keyword Wi in an input case text Ds, if the
topic assigned to the candidate keyword is Tj, then according to the topic model method the
transition probability of Wi occurring in Ds is:

Pj(Wi|Ds) = P(Wi|Tj) × P(Tj|Ds)

where P(Wi|Tj) is the transition probability of word Wi under topic Tj, and P(Tj|Ds) is the
transition probability of topic Tj in the input case text Ds. The topics of the candidate
keyword are then enumerated one by one to obtain all transition probabilities Pj(Wi|Ds),
where j is a positive integer from 1 to k. Based on these probabilities, a topic is selected
for the i-th candidate keyword Wi in Ds; the simplest and most common method is to take the
topic Tj that maximises Pj(Wi|Ds), i.e. max[j] Pj(Wi|Ds). If the topic selected for Wi at
this point differs from the word topic feature obtained in step S221, the selection affects
both the transition probability of the word under a given topic and the transition
probability of each topic in the input case text. Because these two probabilities in turn
affect the calculation of the transition probability of Wi occurring in Ds, one round of
calculating Pj(Wi|Ds) for the input case text and re-selecting the topic of each word is
regarded as one iteration. After n such iterations, the candidate keywords corresponding to
the word topic features obtained once the input case text has converged are the keywords of
the input case text, and the word topic feature of each keyword is the one determined after
iteration. Keywords determined in this way express the input case text, and the word
features of its keywords, more effectively and accurately. The text topic feature
information obtained from these keywords is therefore closer to the case type of the input
case text and expresses its content more accurately, so that the similar judgement documents
found through the text topic feature information of the input case text are more similar to
it, which improves the accuracy of the search for similar judgement documents.
In this embodiment, in step S223 the text topic feature information of the input case text
is updated based on the keywords and their corresponding word topic features. For example,
the text topic feature information of the input case text is updated by the following
formula:

D = Σ_{i=1}^{n} w_i × I_i

where D is the updated text topic feature information, the text contains n keywords, w_i is
the importance information of the i-th keyword in the input case text, and I_i is the word
topic feature of the i-th keyword. Taking this weighted sum of the word topic features of
the keywords in the input case text yields the text topic feature information of the input
case text and effectively removes the influence of unimportant keywords on the construction
of the text topic feature information.
Specifically, step S223 builds the bag-of-words feature information of the input case text
based on the keyword-related information and the expansion-word related information, where
the expansion words of a keyword include the synonyms of the keyword and the words that are
highly correlated with it in the input case text. In step S223, synonyms are first mined by
calculating the topic feature similarity of any two keywords. For example, for keyword A,
the several words with the highest similarity are taken as the synonyms of keyword A. The
highly correlated words of a keyword are calculated with an algorithm for mining highly
correlated words (word2vector): the algorithm calculates a word vector for each word and
then calculates the word vector similarity of any two words in order to mine highly
correlated words. For example, for keyword A, the several words with the highest word vector
similarity are taken as the highly correlated words of keyword A. Next, the expansion-word
related information of the input case text is obtained from the synonyms of the keywords
and their synonym features and from the highly correlated words in the input case text and
their related-word features. Finally, the bag-of-words feature information of the input
case text is built with a bag-of-words model, based on the keyword-related information and
the expansion-word related information.
In this embodiment, the bag-of-words feature information indicates the word features of the
keywords in the input case text and of their expansion words. In the bag-of-words feature
information, the feature value of a keyword feature is the importance information of the
keyword in the input case text, the feature value of a synonym feature is the product of
the keyword's importance information and the degree of synonymy, and the feature value of a
correlated-word feature is the product of the keyword's importance information and the
degree of relatedness. For example, suppose there are 100,000 distinct words in total; then
the bag-of-words feature information of the input case text is likewise a
100,000-dimensional vector in which each dimension marks whether the word at that position
appears in the input case text. For example, suppose word1 is the 1st dimension of the
bag-of-words feature information, word2 is the 2nd dimension, word3 is the 10th dimension,
and word4 is the 30th dimension; word3 and word1 are similar words with similarity weight13,
and word4 and word2 are similar words with similarity weight24. Text A contains the words
word1, word3, and word4, whose importance in A is weight1, weight3, and weight4
respectively. Then, in the bag-of-words feature information of A, the feature value of the
1st dimension is weight1 + weight13*weight3, the feature value of the 2nd dimension is
weight24*weight4, the feature value of the 10th dimension is weight3 + weight1*weight13,
and the feature value of the 30th dimension is weight4. The feature values of the highly
correlated words of a keyword can be obtained by the same calculation, so the resulting
bag-of-words feature information contains the feature values corresponding to the word
topic features of the keywords and the feature values corresponding to the word topic
features of the synonyms and highly correlated words.
In this embodiment, step S223 determines the text feature vector of the input case text
based on the updated text topic feature information and the bag-of-words feature
information. Specifically, the updated text topic feature information and the bag-of-words
feature information are merged to determine the original text feature of the input case
text, and feature normalisation is applied to the original text feature of the input case
text to determine its text feature vector.
For example, the text topic feature information of the input case text obtained in step S223
and the bag-of-words feature information are concatenated into one feature vector to
generate the original text feature of the input case text. For example, if the text topic
feature information of the input case text is a 10-dimensional feature vector and the
bag-of-words feature information is a 100-dimensional feature vector, then the original text
feature of the input case text is a 110-dimensional feature vector. A feature normalisation
method commonly used in the machine learning field is then applied to the original text
feature to generate the text feature vector of the input case text. For example, assuming
that the same feature of the input case text follows a normal distribution, each feature
dimension can be normalised to the standard normal distribution.
In this embodiment, step S24 takes the candidate judgement documents that were obtained from
the retrieval database in step S23 and that have the same cause of action as the input case
text, calculates the similarity between the text feature vector of each candidate judgement
document and the text feature vector of the input case text, and selects similar judgement
documents based on the similarity.
It should be noted that the algorithms for calculating the similarity of text feature
vectors in step S24 include, but are not limited to, the Euclidean distance algorithm and
the cosine similarity algorithm. Of course, other existing or future algorithms for
calculating the similarity of text feature vectors, if applicable to this application,
should also fall within the scope of protection of this application and are incorporated
herein by reference.
For example, the case type and cause of action of the input case text entered by the user are
first used to query all existing judgement documents with the same case type and cause of action,
which serve as the candidate similar judgement documents, and the text feature vectors of these
candidates are then retrieved. Next, the similarity between the input case text and each candidate
similar judgement document is calculated with one of the above algorithms for text feature vector
similarity (the Euclidean distance algorithm or the cosine similarity algorithm). Then, according
to the number N of similar judgement documents requested by the user, the N judgement documents
with the highest similarity are taken as the final similar judgement documents. The text
structured information and judgement-related information of the similar judgement documents are
then queried and fed back to the user who requested them. Finally, the judgement results of the
similar judgement documents are aggregated along text feature dimensions such as principal
penalty, accessory penalty, amount of compensation and which party prevailed, and are presented
in visual form to the user who requested the similar judgement documents. Specifically, suppose
that querying all existing judgement documents with the same case type and cause of action as the
input case text yields 100 candidate judgement documents, and that the user requests 10 judgement
documents similar to the input case text. The text feature vector of the input case text is then
compared, using the above similarity algorithm, with the text feature vector of each of the 100
candidate judgement documents; the resulting similarities are sorted from high to low; the 10
candidates with the highest similarity are taken as the similar judgement documents; and the text
structured information and judgement-related information of these 10 similar judgement documents
are fed back to the user who needs to obtain them.
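The ranking step above can be sketched as follows, using cosine similarity as one of the two
measures the text names. The function names, the document identifiers and the in-memory
dictionary of candidates are assumptions made for illustration; the text itself only states that
candidates are retrieved from a retrieval database.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_similar(query_vec, candidates, n):
    """Return the n candidates with the highest similarity as (similarity, doc_id) pairs.

    candidates maps a hypothetical document id to its stored text feature vector.
    """
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in candidates.items()]
    return heapq.nlargest(n, scored)


candidates = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
ranked = top_n_similar([1.0, 0.0], candidates, 2)
```

With 100 candidates and N = 10, the same call would return the 10 most similar judgement
documents, already ordered from the highest similarity to the lowest.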
Further, the method for mining similar texts at the second device according to one aspect of the
present application also includes: receiving the text structured information sent by the first
device, which is obtained by performing structuring processing on the judgement documents; and
obtaining the text structured information of the similar judgement documents. For example, after
the similarity to the candidate judgement documents has been calculated, the text structured
information of the required number of similar judgement documents is obtained.
Fig. 5 shows a schematic flow chart of an overall method for mining similar judgement documents
based on big data according to one aspect of the present application. The method includes step
S501, step S502, step S503, step S504, step S505, step S506, step S507, step S508, step S509,
step S510 and step S511.
Step S501 includes: obtaining massive judgement documents. Step S502 includes: performing text
preprocessing and structuring processing on the massive judgement documents. Step S503 includes:
mining the text topic feature information of the judgement documents. Step S504 includes: mining
the keyword-related information of the massive judgement documents and establishing the feature
dictionary on keywords. Step S505 includes: generating the text feature vectors of the judgement
documents. Step S506 includes: storing the text feature vectors of the judgement documents and the
feature dictionary online. Step S507 includes: obtaining the input case text. Step S508 includes:
mining online the text topic feature information and keyword-related information of the input
case text. Step S509 includes: mining online the text feature vector of the input case text. Step
S510 includes: retrieving online several candidate judgement documents that have the same cause
of action as the input case text, and calculating the similarity between the text feature vector
of each candidate judgement document and the text feature vector of the input case text. Step
S511 includes: obtaining the similar judgement documents.
In an embodiment of the present application, to meet the need in the court business scenario to
mine similar judgement documents from massive published judgement documents, step S501 first
obtains the massive published judgement documents after authorization by the court. Step S502
performs text preprocessing on the judgement documents so that they are converted into a form on
which text mining can be carried out, and performs structuring processing on the preprocessed
judgement documents to obtain text structured information. Step S503 then mines the text topic
feature information of each judgement document with a topic model method from the prior art, so
as to express the specific case facts of that judgement document. Because the number of judgement
documents in court business keeps growing, while time is tight and the workload in the court
business scenario is heavy, mining similar judgement documents manually or with conventional
natural language processing is time-consuming and laborious; moreover, the massive published
judgement documents are long and complex, and the key elements that determine whether judgement
documents are similar are hidden in large blocks of text. Therefore, in step S504 the present
application mines, from the candidate judgement documents that have the same cause of action as
the input case text, the words that match the parties' claim content and the parties' dispute
content, and obtains the keyword-related information of the candidate judgement documents;
expressing the keyword-related information in the form of a text feature vector makes it easier
to calculate quickly whether a judgement document is similar to the input case text. At the same
time, the words related to those matching the parties' claim content and dispute content are
taken as the expansion words of the candidate judgement documents, and the feature dictionary is
established based on all keyword-related information and expansion-word-related information of
the judgement documents. Then, in step S505, the text feature vector of each judgement document
is obtained based on the bag-of-words feature information and on the text topic feature
information updated with the keyword-related information of the candidate judgement document; the
feature values in the text feature vector are composed of the feature values of the word topic
features corresponding to the keywords, and each dimension of the feature vector represents the
same feature across judgement documents. Next, in step S506, the text feature vectors of all
judgement documents and the feature dictionary are sent to the retrieval database at the second
device for online storage, so that judgement documents similar to an input case text can be found
quickly. Step S507 then obtains the input case text for which similar judgement documents are to
be found. In step S508, the text topic feature information and keyword-related information of the
input case text are mined with the help of the keyword-related information of all judgement
documents sent by the first device. In step S509, the updated text topic feature information and
the bag-of-words feature information of the input case text are obtained from the mined text
topic feature information and keyword-related information, and are merged to obtain the text
feature vector of the input case text. In step S510, the second device retrieves online the
candidate judgement documents that have the same cause of action as the input case text, for
example by finding all existing judgement documents with the same cause of action and case type,
calculates the similarity between the text feature vector of each candidate and the text feature
vector of the input case text, and sorts the similarities from high to low. Finally, in step
S511, according to the requested number of similar judgement documents, the corresponding number
of candidates ranked highest by similarity in step S510 are taken as the similar judgement
documents to be obtained.
In the court business scenario, the judgement results that different judges have reached in
different cases with similar case facts and similar party claims need to be compared in order to
review whether a given judge's judgement result is reasonable. In addition, during an actual
trial a judge may refer to the judgement results of existing cases with similar case facts when
forming the final findings of fact and the judgement result. These varied court business
scenarios therefore all require judgement documents similar to the input case text to be mined in
advance or in real time. However, because the content of each case differs greatly and the number
of cases tried in the court business scenario keeps growing rapidly, traditional manual sorting
can hardly meet the demand. In the embodiments of the present application, the massive published
judgement documents in the court business scenario are therefore processed as shown in Fig. 5 and
the text feature vectors of the judgement documents are mined, so that judgement documents
similar to the input case text can be found quickly.
Fig. 6 shows a schematic structural diagram of a first device for mining similar judgement
documents based on big data according to one aspect of the present application. The device 1
includes a judgement document acquisition device 11, a text feature mining device 12 and a
feature dictionary establishing device 13.
The judgement document acquisition device 11 is configured to obtain massive published judgement
documents and to obtain the cause of action of each judgement document. The text feature mining
device 12 is configured to obtain, based on the text content of each judgement document, text
topic feature information on the case facts of the judgement document and several pieces of
keyword-related information on the parties' dispute content and the parties' claim content in the
judgement document, and to establish the text feature vector of the judgement document based on
the text topic feature information and the keyword-related information. The feature dictionary
establishing device 13 is configured to update the feature dictionary on keywords based on the
keyword-related information.
Here, the device 1 includes, but is not limited to, a user device, or a device formed by
integrating a user device and a network device through a network. The user device includes, but
is not limited to, any mobile electronic product capable of human-computer interaction with a
user through a touch pad, such as a smartphone or a PDA; the mobile electronic product may run
any operating system, such as the Android operating system or the iOS operating system. The
network device includes an electronic device capable of automatically performing numerical
computation and information processing according to instructions set or stored in advance, whose
hardware includes, but is not limited to, microprocessors, application-specific integrated
circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP) and
embedded devices. The network includes, but is not limited to, the Internet, wide area networks,
metropolitan area networks, local area networks, VPNs and wireless ad hoc networks. Preferably,
the device 1 may also be a cloud computing server capable of handling big data computation by
cloud computing means; below, the mining of similar judgement documents based on big data is
explained in detail with a cloud computing server as the first device, as a preferred embodiment
of one aspect of the present application. Of course, those skilled in the art will understand
that the above device 1 is only an example; other existing or future devices 1, if applicable to
the present application, shall also fall within the scope of protection of the present
application and are incorporated herein by reference.
The above devices work continuously with one another. Here, those skilled in the art will
understand that "continuously" means that each of the above devices operates in real time, or
according to a working mode requirement that is set in advance or adjusted in real time.
Here, the judgement documents include, but are not limited to, the judgement documents in the
court business scenario, including the facts found by the court of first instance, the facts
found by the court of second instance, the facts found by the retrial court, the bill of
complaint, the bill of defence, the inquiry record and the witness testimony.
The present application is explained in detail below through a specific embodiment in which the
first device that mines judgement documents in the court business scenario is a cloud computing
server able to handle big data computation by cloud computing means, as a preferred embodiment of
one aspect of the present application. Of course, using a cloud computing server that mines
massive published judgement documents in the court business scenario as the first device is for
the purpose of example only; the embodiments of the present application are not limited to this,
and the following embodiments can equally be implemented in other software programs.
It should be noted that the text topic feature information includes, but is not limited to, the
case facts of a judgement document in the court business scenario, and the keywords include, but
are not limited to, the parties' dispute content and the parties' claim content in a judgement
document in the court business scenario. Below, the mining of the text feature vector of the
judgement document is described taking, as a preferred embodiment of one aspect of the present
application, the parties' dispute content and claim content as the keywords of the judgement
document and the case facts of the judgement document as its text topic feature information.
In an embodiment of the present application, the judgement document acquisition device 11 is
configured to obtain massive published judgement documents and to obtain the cause of action of
each judgement document. Because trial work in the court business scenario proceeds in stages,
the content of the input case text is likely to change considerably as the trial progresses.
Suitable data therefore needs to be fed to the mining system at each stage of the trial flow, so
that the similar cases mined at each stage meet actual business needs. Accordingly, the text
feature mining device 12 needs to mine the massive published judgement documents for similar
judgement documents stage by stage, extracting from them the text topic feature information on
the case facts of the judgement document and several pieces of keyword-related information on the
parties' dispute content and claim content in the judgement document, and establishing the text
feature vector of the judgement document based on the text topic feature information and the
keyword-related information. For example, the cloud computing server uses an Internet of Things
network to store all judgement documents published in the court business scenario, so that, in
the text feature mining device 12, the cloud computing server can use offline feature processing
and the powerful computing capability of cloud computing to mine text features from the published
judgement documents thoroughly. The text feature vectors of the judgement documents mined in this
way, together with the feature dictionary of all judgement documents mined in the feature
dictionary establishing device 13, are transferred in a single batch over the dedicated network
line of the court business scenario to the online storage within the court intranet.
Further, the first device for mining similar judgement documents based on big data according to
one aspect of the present application also includes a text structuring device 14 (not shown),
configured to perform structuring processing on the judgement documents to obtain the text
structured information. After the judgement document acquisition device 11 and before the text
feature mining device 12, the text structuring device performs structuring processing on the
obtained judgement documents of each stage of trial work in the court business scenario. The
first device also includes a text structured information acquisition device 15 (not shown),
configured to obtain the judgement-related information of the judgement documents based on the
text structured information; the judgement-related information includes party information, case
type, cause of action and judgement result.
It should be noted that the case type in the judgement-related information of the judgement
documents obtained by the text structured information acquisition device 15 (not shown)
includes, but is not limited to, the seven major types of judgement document, namely criminal
litigation, civil litigation, administrative litigation, intellectual property disputes, written
verdicts, compensation cases and enforcement cases, as well as each stage of the court trial. The
stages of the court trial are shown in Fig. 7. Of course, other existing or future text topic
features of judgement documents, if applicable to the present application, shall also fall within
the scope of protection of the present application and are incorporated herein by reference.
Fig. 7 shows a schematic flow chart of the stages of a court trial in which the first device
mines similar judgement documents based on big data, according to one aspect of the present
application. The cloud computing server, as the cloud-computing-based device for mining similar
judgement documents, follows the trial flow of the people's court and determines, stage by stage,
the text content of the judgement documents that needs to be mined at each stage. Taking into
account the network characteristics and confidentiality requirements of the people's court
system, the cloud computing server mines judgement documents stage by stage during trial work in
the court business scenario so as to meet the business requirements of that scenario.
Here, as shown in Fig. 7, the trial flow in the court business scenario that the cloud computing
server of the present application needs to handle includes: case filing stage S71, hearing stage
S72, first-instance judgement stage S73, second-instance judgement stage S74, retrial stage S75
and judgement enforcement stage S76. The case filing stage S71 is the stage in which the people's
court, having received the plaintiff's bill of complaint and the defendant's bill of defence,
decides to file the case; the hearing stage S72 is the stage in which the people's court hears
the case; the first-instance judgement stage S73 is the stage in which the people's court delivers
its first-instance judgement; the second-instance judgement stage S74 is the stage in which the
people's court concludes the case at second instance; the retrial stage S75 is the stage in which
the people's court concludes the case on retrial; and the judgement enforcement stage S76 is the
stage in which the final judgement result made by the people's court in the case is enforced. In
the first five stages, judicial personnel need to mine similar judgement documents.
The judgement document data needed for mining similar judgement documents at each trial stage in
Fig. 7 are as follows. In the case filing stage S71, the relevant documents are the bill of
complaint and the bill of defence; in the hearing stage S72, the bill of complaint, the bill of
defence, the inquiry record and the witness testimony; in the first-instance judgement stage S73,
the facts found by the court of first instance; in the second-instance judgement stage S74, the
petition for appeal and the facts found by the court of second instance; and in the retrial stage
S75, the facts found by the retrial court. Here, the bill of complaint is the statement of claim
that the plaintiff submits to the court of first instance; the bill of defence is the reply that
the defendant is required to provide after the court of first instance has accepted the bill of
complaint; the inquiry record is the record, made when the people's court hears the case in open
court, of the questions put by the plaintiff's agent to the defendant and the defendant's
replies, and of the questions put by the defendant to the plaintiff and the plaintiff's replies;
the witness testimony is the record, from the hearing stage, of the testimony of the parties'
witnesses and of the plaintiff's and defendant's questioning of the other side's witnesses; the
facts found by the court of first instance are the facts established by that court after
investigation and trial; the petition for appeal is the second-instance complaint filed, after
the first-instance judgement, by a party who does not accept that judgement; and the facts found
by the court of second instance or the retrial court are the facts established by that court at
second instance or on retrial.
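The stage-to-document correspondence described above can be captured in a small lookup table. The
sketch below is only an illustration: the key names, the helper function and the idea of passing
a case file as a dictionary are assumptions, and the entry for stage S74 follows the reading that
the second-instance stage uses the petition for appeal.

```python
# Document types consulted at each trial stage, transcribed from the description of Fig. 7.
# Stage S76 (enforcement) is omitted because only the first five stages require mining.
STAGE_DOCUMENTS = {
    "S71_case_filing": ["bill_of_complaint", "bill_of_defence"],
    "S72_hearing": ["bill_of_complaint", "bill_of_defence",
                    "inquiry_record", "witness_testimony"],
    "S73_first_instance": ["first_instance_found_facts"],
    "S74_second_instance": ["petition_for_appeal", "second_instance_found_facts"],
    "S75_retrial": ["retrial_found_facts"],
}


def select_stage_text(stage, case_file):
    """Join the texts relevant to a stage, skipping documents the case file lacks.

    case_file is a hypothetical mapping from document type to its text.
    """
    wanted = STAGE_DOCUMENTS[stage]
    return "\n".join(case_file[name] for name in wanted if name in case_file)


case_file = {
    "bill_of_complaint": "complaint text",
    "bill_of_defence": "defence text",
    "inquiry_record": "inquiry text",
}
filing_input = select_stage_text("S71_case_filing", case_file)
hearing_input = select_stage_text("S72_hearing", case_file)
```

Keeping the table separate from the selection logic means a change in which documents a stage
uses only touches the data, not the code that builds the input case text.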
After the cloud computing server has determined the data texts needed at each stage for mining
similar judgement documents in the court business scenario, the text feature mining device 12
must extract the relevant text feature vectors from the judgement documents obtained by the
judgement document acquisition device 11. Specifically, the units included in the text feature
mining device 12 are shown in Fig. 8.
Fig. 8 shows a schematic structural diagram of the text feature mining device 12 with which the
cloud computing server mines the text feature vectors of judgement documents, according to a
preferred embodiment of one aspect of the present application. The text feature mining device 12
includes a first mining unit 121, a second mining unit 122, a third mining unit 123 and a
generation unit 124. The first mining unit 121 is configured to extract the text topic feature
information of the judgement document and the word topic feature of each word in the judgement
document. The second mining unit 122 is configured to obtain the context relations between the
words, correct the word topic feature of each word based on the context relations, and determine
several pieces of keyword-related information of the judgement document based on the degree of
matching between the corrected word topic feature of each word and the text topic feature
information, where the keyword-related information includes the keyword, the keyword importance
information and the word topic feature corresponding to the keyword. The third mining unit 123 is
configured to update the text topic feature information of the judgement document based on the
keyword-related information. The generation unit 124 is configured to obtain expansion-word-related
information based on the keyword-related information, the expansion-word-related information
including the expansion words of the keyword and the expansion word relevance; to establish
bag-of-words feature information based on the keyword-related information and the
expansion-word-related information; and to determine the text feature vector of the judgement
document based on the updated text topic feature information and the bag-of-words feature
information.
Specifically, the text topic feature information of the judgement document in the first mining
unit 121 indicates the case facts in the judgement document. In the embodiments of the present
application, a topic model method is preferably used to extract the text topic feature
information of the obtained judgement document and the word topic feature of each word; this
topic model method is consistent with the topic model methods of the prior art. Of course, other
existing or future methods for extracting the text topic feature information of a judgement
document and the word topic feature of each word, if applicable to the present application, shall
also fall within the scope of protection of the present application and are incorporated herein
by reference.
Further, the second mining unit 122 is configured to: obtain the contextual word co-occurrence
relations between the words; obtain the context transition probability between any two of the
words; correct the word topic feature of each word based on the contextual word co-occurrence
relations and the context transition probabilities; and, based on the degree of matching between
the corrected word topic feature of each word and the text topic feature information, determine
the keywords of the judgement document and their corresponding word topic features, and obtain
the importance information of the keywords.
In the embodiments of the present application, the second mining unit 122 starts from the text
topic feature information of the judgement document and the word topic feature of each word
extracted in the first mining unit 121, corrects the word topic feature of each word according to
the context relations between the words, and, based on the degree of matching between the
corrected word topic feature of each word and the text topic feature information, determines the
keywords of the judgement document and their corresponding word topic features and obtains the
importance information of the keywords. The specific embodiment for determining the keywords of
the judgement document and their corresponding word topic features, and for obtaining the
importance information of the keywords, corresponds to the specific embodiment of step S122
described above and is not repeated here.
In an embodiment of the present application, the third mining unit 123 updates the text topic
feature information of the judgement document based on the keyword-related information determined
in the second mining unit 122. For example, the text topic feature information of the judgement
document is updated by the following formula:
$$D = \sum_{i=1}^{n} w_i I_i$$
where $D$ denotes the updated text topic feature information, $n$ is the number of keywords in
the text, $w_i$ is the importance of the $i$-th keyword in the judgement document, and $I_i$ is
the word topic feature of the $i$-th keyword. Obtaining the text topic feature information of the
judgement document as the weighted sum of the word topic features of its keywords effectively
removes the influence of unimportant words in the judgement document on the construction of the
text topic feature information.
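A minimal sketch of the weighted sum described above follows. It assumes the plain, unnormalized
form in which the updated feature is the sum over all keywords of importance times word topic
feature, and that every word topic feature has the same dimension; the function name and the
input layout are hypothetical.

```python
def update_topic_feature(keywords):
    """Weighted sum of word topic features.

    keywords is a list of (importance, word_topic_feature) pairs, one per keyword.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    dim = len(keywords[0][1])
    updated = [0.0] * dim
    for importance, feature in keywords:
        if len(feature) != dim:
            raise ValueError("all word topic features must have the same dimension")
        for k, value in enumerate(feature):
            updated[k] += importance * value
    return updated


# Two keywords with importance 0.5 and 0.25 over a 2-dimensional topic space.
updated_feature = update_topic_feature([(0.5, [1.0, 0.0]), (0.25, [0.0, 4.0])])
```

Words that were not selected as keywords contribute nothing to the result, which is how
unimportant words are kept out of the updated text topic feature information.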
Further, the generation unit 124 obtains the expansion-word-related information based on the
keyword-related information; the expansion-word-related information includes the expansion words
of the keyword and the expansion word relevance. The expansion words include the synonyms of the
keyword and the words that are highly correlated with the keyword in the judgement document. In
an embodiment of the present application, synonyms are mined by calculating the topic feature
similarity of any two words; for example, for keyword A, the several words with the highest
similarity are taken as the synonyms of keyword A. The highly correlated words of a keyword are
calculated with an algorithm for mining highly correlated words (word2vector): the algorithm
computes a word vector for each word and then the word vector similarity of any two words; for
example, for keyword A, the several words with the highest word vector similarity are taken as
the words highly correlated with keyword A.
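The selection of "the several words with the highest similarity" can be sketched as a
nearest-neighbour lookup over per-word vectors. The sketch assumes the vectors (word topic
features or word vectors) have already been computed, and it assumes cosine similarity as the
measure, which the text does not specify; all names and the toy vectors are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def expansion_candidates(keyword, word_vectors, k):
    """Return the k words most similar to the keyword as (word, relevance) pairs."""
    target = word_vectors[keyword]
    scored = [(word, cosine(target, vec))
              for word, vec in word_vectors.items() if word != keyword]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional vectors standing in for learned word vectors.
word_vectors = {
    "contract": [1.0, 0.0],
    "agreement": [0.9, 0.1],
    "theft": [0.0, 1.0],
}
expansions = expansion_candidates("contract", word_vectors, 1)
```

The similarity returned with each word can serve as the expansion word relevance that is later
multiplied with the keyword importance.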
Further, the generation unit 124 determines the expansion words of the keyword and the expansion
word relevance based on the keyword and its corresponding word topic feature, where the expansion
words include the synonyms of the keyword and the related words that are highly correlated with
it in the judgement document; and, based on the keyword and its corresponding word topic feature
together with the expansion words and the expansion word relevance, establishes the bag-of-words
feature information using a bag-of-words model.
In this embodiment of the application, the bag-of-words characteristic
information indicates the word features of the keywords in a judgement document
and of their expansion words. Within the bag-of-words characteristic
information, the feature value of a keyword feature is the importance of the
keyword in the judgement document; the feature value of a synonym feature is
the product of the keyword's importance and the degree of synonymy; and the
feature value of a correlated-word feature is the product of the keyword's
importance and the degree of correlation. For example, if all judgement
documents together contain 100,000 distinct words, the bag-of-words feature of
each judgement document is a 100,000-dimensional vector, and each dimension
marks whether the word at that position occurs in the judgement document.
Suppose, for example, that word1 is the 1st dimension of the bag-of-words
characteristic information, word2 the 2nd dimension, word3 the 10th dimension
and word4 the 30th dimension; that word3 and word1 are similar words with
similarity weight13; and that word4 and word2 are similar words with similarity
weight24. If judgement document A contains the words word1, word3 and word4,
and their importance in A is weight1, weight3 and weight4 respectively, then in
the bag-of-words feature of A the feature value of the 1st dimension is
weight1+weight13*weight3, the feature value of the 2nd dimension is
weight24*weight4, the feature value of the 10th dimension is
weight3+weight1*weight13, and the feature value of the 30th dimension is
weight4. The feature values of the words highly correlated with a keyword can
be obtained by the same calculation, so the feature values in the resulting
bag-of-words characteristic information include those corresponding to the
word theme features of the keywords and those corresponding to the word theme
features of the expansion words.
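The worked example above can be sketched in a few lines of Python. This is only
an illustration under stated assumptions: the function name
bag_of_words_features, the dictionary layouts and the numeric weights are
hypothetical and do not come from the application; only the arithmetic follows
the example in the text.

```python
from collections import defaultdict


def bag_of_words_features(importance, dimension, similarity):
    """Compute sparse bag-of-words feature values for one document.

    importance: word -> importance of that word in the document
                (only words that actually occur in the document).
    dimension:  word -> index of its dimension in the bag-of-words vector.
    similarity: (word_a, word_b) -> similarity degree, treated as symmetric.
    """
    features = defaultdict(float)
    # A word that occurs contributes its own importance to its own dimension.
    for word, weight in importance.items():
        features[dimension[word]] += weight
    # A word that occurs also contributes importance * similarity
    # to the dimension of each of its similar words.
    for (a, b), degree in similarity.items():
        for present, related in ((a, b), (b, a)):
            if present in importance:
                features[dimension[related]] += importance[present] * degree
    return dict(features)


# Hypothetical numbers standing in for weight1, weight3, weight4,
# weight13 and weight24 from the example.
dimension = {"word1": 1, "word2": 2, "word3": 10, "word4": 30}
similarity = {("word1", "word3"): 0.5, ("word2", "word4"): 0.25}
importance_in_a = {"word1": 0.5, "word3": 0.2, "word4": 0.4}
features_a = bag_of_words_features(importance_in_a, dimension, similarity)
```

With these numbers, dimension 2 receives a value even though word2 does not
occur in document A, because its similar word word4 does.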
Further, the generation unit 124 merges the updated text subject
characteristic information with the bag-of-words characteristic information to
determine the original text feature of the judgement document, and determines
the text feature vector of the judgement document by applying feature
normalization to that original text feature. Specifically, the generation unit
124 concatenates the text subject characteristic information and the
bag-of-words characteristic information of the judgement document obtained in
the third mining unit 123 into a single feature vector, which is the original
text feature of the judgement document. The specific embodiment for generating
the original text feature corresponds to the embodiment of step S124 described
above and is not repeated here.
Further, the feature dictionary establishing device 13 uses the keywords as the
index and builds the feature dictionary on keywords from the word theme feature
and the expansion words of each keyword. For example, in the court business
scenario, the words that match the parties' claim content and the parties'
dispute content in a judgement document are extracted as the keywords of that
judgement document. All words related to the claim words and the dispute-point
words are then looked up on the basis of those keywords and used as the
expansion words of the keywords. Feature extraction is performed on the
judgement document, and the feature dictionary is built from the resulting
keyword relational information and expansion word relevant information.
Further, the first equipment for mining similar judgement documents based on
big data according to one aspect of the application also includes a sending
device 16 (not shown), which sends the text feature vectors of all judgement
documents, the feature dictionary and the judgement-related information to the
retrieval database in the second equipment. For example, in the court business
scenario, the text feature vectors of the judgement documents obtained in the
text feature mining device 12, the feature dictionary obtained in the feature
dictionary establishing device 13, and the text structure information and text
type of the judgement documents obtained in the device 14 (not shown) are sent
to the second equipment. The second equipment can then rely on the feature
dictionary calculated by the first equipment and on a simplified calculation
logic, which ensures that the first equipment and the second equipment output
the same text feature vector and feature dictionary for the same judgement
document. In view of the network characteristics and confidentiality
requirements of the people's court system, the mining of judgement documents
is carried out in stages, with the cloud computing server handling the
published judgement documents, to meet the needs of trial work in the court
business scenario.
In the court business scenario, the text feature vectors of pending input case
texts are all stored on the court intranet server. Apart from judgement
documents that have been published, input case texts in the court business
systems must not leave the court intranet server. To meet this confidentiality
requirement on information related to input case texts, the application
proposes the equipment shown in Fig. 9, which satisfies the confidentiality
requirement and also improves the real-time performance of processing input
case texts.
Fig. 9 shows a structural schematic diagram of a second equipment for mining
similar judgement documents based on big data according to one aspect of the
application. The equipment 2 includes an input device 21, an input case text
feature mining device 22, a candidate judgement document acquisition device 23
and a similar judgement document acquisition device 24.
Here, the input device 21 obtains an input case text and, based on the feature
dictionary on keywords in the retrieval database, extracts several candidate
keywords of the input case text. The input case text feature mining device 22
obtains, from the text content of the input case text and the candidate
keywords, the text subject characteristic information and several pieces of
keyword relational information of the input case text, and builds the text
feature vector of the input case text from them. The candidate judgement
document acquisition device 23 obtains from the retrieval database several
candidate judgement documents that have the same cause of action as the input
case text. The similar judgement document acquisition device 24 calculates the
similarity between the text feature vector of each candidate judgement
document and the text feature vector of the input case text, and selects the
similar judgement documents on the basis of that similarity.
Here, the equipment 2 includes, but is not limited to, user equipment, or
equipment formed by integrating user equipment with network equipment over a
network. The user equipment includes, but is not limited to, any mobile
electronic product that can interact with a user through a touch pad, such as
a smartphone or a PDA, and the mobile electronic product may run any operating
system, such as Android or iOS. The network equipment includes an electronic
device that can automatically perform numerical calculation and information
processing according to instructions set or stored in advance; its hardware
includes, but is not limited to, microprocessors, application-specific
integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital
signal processors (DSP) and embedded devices. The network includes, but is not
limited to, the Internet, wide area networks, metropolitan area networks,
local area networks, VPNs and wireless ad hoc networks. Preferably, the
equipment 2 may also be a court intranet server that runs a simple online
calculation logic built on the output of the offline featurization tool in the
cloud computing server. Below, a court intranet server serving as the second
equipment is used as the preferred embodiment of this aspect of the
application to explain in detail the mining of similar judgement documents
based on big data. Of course, those skilled in the art will understand that
the equipment 2 described above is only an example; other existing or future
equipment 2, where applicable to the application, also falls within the scope
of protection of the application and is incorporated herein by reference.
The above devices work continuously with one another. Here, those skilled in
the art will understand that "continuously" means that each of the above
devices operates in real time, or according to an operating mode that is set
in advance or adjusted in real time.
It should be noted that, in the preferred embodiment of the application, the
cloud computing server of the equipment 1 mines, from the massive set of
published judgement documents in the court business scenario, the judgement
documents similar to the input case text entered at the equipment 2. That is,
the cloud computing server carries out the mining over the massive set of
published judgement documents, while the court intranet server of the
equipment 2 only needs to perform a simple calculation on a single input case
text through the online featurization tool. The feature dictionary output by
the offline featurization tool in the cloud computing server serves as the
input of the online featurization tool in the court intranet server. This
simplifies the online calculation logic in the court intranet server and
ensures that, when the same judgement document is fed to the two tools, they
output the same text feature vector, feature dictionary and structured
information. The cloud computing server transfers the judgement document
features output by the offline featurization tool to the online memory in the
court intranet server in a single pass over a dedicated network line. This
both supports mining over the massive set of published judgement documents and
preserves the confidentiality of the confidential input case texts on the
court intranet server. Similar judgement documents are mined and obtained for
the input case text, which effectively improves the efficiency of mining
similar judgement documents in the court business scenario.
It should be noted that the input case text includes, but is not limited to,
existing judgement documents and the texts of pending cases. Of course, other
existing or future input case texts, where applicable to the application, also
fall within the scope of protection of the application and are incorporated
herein by reference.
In embodiments of the application, the second equipment also includes a
receiving device 25 (not shown). The receiving device 25 receives from the
first equipment the text feature vectors of the published judgement documents
acquired by the first equipment, the feature dictionary and the
judgement-related information, and saves them into the retrieval database. The
judgement-related information includes party information, case type, cause of
action and judgement result. For example, the retrieval database on the
intranet in the court business scenario stores online the text feature vectors
of the judgement documents, the feature dictionary and the associated
judgement-related information. Specifically, the judgement document
information stored through the receiving device 25 covers the following eight
aspects:
(1) The judgement documents corresponding to each case type and cause of
action. The key is the case type and cause of action; the value is the number
of the judgement document inside the system.
(2) The structured information of the existing judgement documents. The key is
the number of the judgement document inside the system; the value is the text
structure information generated by the structuring extraction module.
(3) The text feature vectors of the existing judgement documents. The key is
the number of the judgement document inside the system; the value is the text
feature vector generated by the text feature module.
(4) All keywords of the existing judgement documents. The key is a constant;
the value is all the keywords generated by the keyword topic module.
(5) The word theme feature of each keyword. The key is the keyword; the value
is the word theme feature of the keyword generated by the keyword topic
module.
(6) The synonyms of each keyword. The key is the keyword; the value is the
synonyms of the keyword and their degrees of synonymy.
(7) The related words of each keyword. The key is the keyword; the value is
the related words of the keyword and their degrees of correlation.
(8) The mean and variance of the feature values of each feature dimension
over the judgement documents. The key is the feature number; the value is the
mean and variance of the feature values.
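The eight key-value collections listed above can be pictured as a small
in-memory store. The following sketch is only an illustration: the class name
RetrievalDatabase, its field names and its two helper methods are hypothetical,
and the application does not specify how the retrieval database is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalDatabase:
    # (1) (case type, cause of action) -> internal document numbers
    docs_by_type_and_cause: dict = field(default_factory=dict)
    # (2) document number -> text structure information
    structured_info: dict = field(default_factory=dict)
    # (3) document number -> text feature vector
    feature_vectors: dict = field(default_factory=dict)
    # (4) a single entry holding all keywords
    all_keywords: set = field(default_factory=set)
    # (5) keyword -> word theme feature
    keyword_theme_features: dict = field(default_factory=dict)
    # (6) keyword -> {synonym: degree of synonymy}
    keyword_synonyms: dict = field(default_factory=dict)
    # (7) keyword -> {related word: degree of correlation}
    keyword_related_words: dict = field(default_factory=dict)
    # (8) feature number -> (mean, variance) of that feature
    feature_statistics: dict = field(default_factory=dict)

    def add_document(self, doc_id, case_type, cause, structured, vector):
        """Register one published judgement document under collections 1-3."""
        key = (case_type, cause)
        self.docs_by_type_and_cause.setdefault(key, []).append(doc_id)
        self.structured_info[doc_id] = structured
        self.feature_vectors[doc_id] = vector

    def candidates(self, case_type, cause):
        """Return document numbers sharing the case type and cause of action."""
        return list(self.docs_by_type_and_cause.get((case_type, cause), []))


db = RetrievalDatabase()
db.add_document(1, "civil", "loan contract dispute", {"court": "X"}, [0.1, 0.2])
db.add_document(2, "civil", "loan contract dispute", {"court": "Y"}, [0.3, 0.1])
db.add_document(3, "criminal", "theft", {"court": "Z"}, [0.9, 0.4])
same_cause = db.candidates("civil", "loan contract dispute")
```

Keying collection (1) by case type and cause of action is what lets the lookup
of candidate judgement documents avoid scanning every stored document.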
It should be noted that the text type includes, but is not limited to, the
case type of an input case text in the court business scenario, where the case
type includes criminal litigation, civil litigation, administrative
litigation, intellectual property disputes, written verdicts, compensation
cases, enforcement cases and the trial stage that a pending case has reached.
Of course, other existing or future text types, where applicable to the
application, also fall within the scope of protection of the application and
are incorporated herein by reference.
Further, the input device 21 obtains an input case text and, based on the
feature dictionary on keywords in the retrieval database, extracts several
candidate keywords of the input case text. Specifically, the input device 21
obtains the input case text and, based on the cause of action of the input
case text, extracts the candidate keywords from the feature dictionary on
keywords in the retrieval database. For example, the judgement documents
similar to the input case text are to be found among the massive set of
published judgement documents in the court business scenario. Because
judgement documents in the court business scenario differ in their facts and
causes of action, and in order to find the judgement documents similar to the
input case text quickly, the words that occur both in the input case text and
in the feature dictionary on keywords for the cause of action of the input
case text are extracted as the candidate keywords of the input case text. This
ensures that every keyword mined from the input case text exists in the
retrieval database.
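The candidate keyword extraction described above is a set intersection
restricted to one cause of action. A minimal sketch follows, assuming the
feature dictionary can be viewed as a mapping from cause of action to a set of
keywords; that layout, the function name candidate_keywords and the sample
words are all hypothetical.

```python
def candidate_keywords(input_words, keywords_by_cause, cause):
    """Return the words of the input case text that are also keywords
    of published judgement documents with the same cause of action."""
    known = keywords_by_cause.get(cause, set())
    # Sorting only makes the result deterministic for display and testing.
    return sorted(set(input_words) & known)


# Hypothetical feature dictionary, grouped by cause of action.
keywords_by_cause = {
    "loan contract dispute": {"principal", "interest", "repayment", "guarantee"},
    "theft": {"stolen", "amount", "restitution"},
}
input_words = ["borrower", "interest", "repayment", "overdue", "interest"]
candidates = candidate_keywords(input_words, keywords_by_cause,
                                "loan contract dispute")
```

Words of the input text that are not keywords of any published judgement
document with that cause of action ("borrower", "overdue") are dropped, which
matches the requirement that every mined keyword exists in the retrieval
database.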
Further, the input case text feature mining device 22 queries the relevant
feature dictionary received by the receiving device 25 (not shown) in the
court intranet server and performs online featurization on the input case text
to obtain its text feature vector, as shown in Figure 10. Figure 10 shows a
structural flow chart of the input case text feature mining device 22 in the
court intranet server for mining similar judgement documents based on big data
according to a preferred embodiment of one aspect of the application. The
input case text feature mining device 22 includes a fourth mining unit 221, a
fifth mining unit 222 and a sixth mining unit 223.
Here, the fourth mining unit 221 compares each word of the input case text
with all keywords of all judgement documents, in order to extract the
candidate keywords and their word theme features from the input case text, and
obtains the text subject characteristic information of the input case text
from those word theme features. The fifth mining unit 222 obtains the context
relation between the candidate keywords, corrects the word theme feature of
each candidate keyword on the basis of that context relation, and determines
the keyword relational information of the input case text from the degree of
match between each corrected word theme feature and the text subject
characteristic information. The sixth mining unit 223 updates the text subject
characteristic information of the input case text on the basis of the keyword
relational information and obtains the expansion word relevant information;
builds the bag-of-words characteristic information of the input case text from
the keyword relational information and the expansion word relevant
information; and determines the text feature vector of the input case text
from the updated text subject characteristic information and the bag-of-words
characteristic information.
In this embodiment of the application, the court intranet in the court
business scenario mainly computes the text feature vector of the case text
that a user inputs in real time. When the fourth mining unit 221 of the court
intranet server mines the keywords of an input case text entered online, it
relies on one assumption: the keywords of the input case text entered online
must also be keywords of existing judgement documents. The unit therefore
queries, among the massive set of published judgement documents, the judgement
documents that have the same cause of action as the input case text, collects
all of their keywords that match the parties' claim content and dispute
content, and intersects them with the words of the input case text to obtain
the candidate keywords of the input case text entered online. This guarantees
that every keyword selected for the input case text is a keyword of an
existing judgement document, so that the judgement documents similar to the
input case text, together with their text feature vectors and feature
dictionary, can be mined from the existing judgement documents. Determining
the candidate keywords of the input case text from the keywords of the
published judgement documents also simplifies the calculation logic for the
input case text, because it builds on the processing already done for the
massive set of published judgement documents. Specifically, the method by
which the fourth mining unit 221 mines the text subject features of the input
case text corresponds to the method of mining text subject features in step
S221 of the above embodiment and is not repeated here.
Specifically, the method by which the fifth mining unit 222 in the court
intranet server determines the keywords of the input case text corresponds to
the method described in step S222 of the above embodiment of the application.
Keywords determined in this way express the keywords of the input case text
and their word features more effectively and accurately, so that the text
subject characteristic information obtained from the keywords is closer to the
case type of the input case text and expresses its text content more
accurately. As a result, the similar judgement documents found through the
text subject characteristic information of the input case text have a higher
similarity, which improves the accuracy of the search for similar judgement
documents.
In embodiments of the application, the sixth mining unit 223 updates the text
subject characteristic information of the input case text on the basis of the
keyword relational information and obtains the expansion word relevant
information. The specific way of updating the text subject characteristic
information of the input case text is consistent with the method of updating
it in the embodiment of step S223 described above and is not repeated here.
Likewise, the specific method of obtaining the synonyms of the keywords of the
input case text, the words highly correlated with them in the input case text,
and the bag-of-words features is consistent with the method of obtaining the
synonyms, highly correlated words and bag-of-words features in step S223
described above, and is also not repeated here.
In embodiments of the application, the sixth mining unit 223 merges the
updated text subject characteristic information with the bag-of-words
characteristic information to determine the original text feature of the input
case text, and determines the text feature vector of the input case text by
applying feature normalization to that original text feature. For example, the
text subject characteristic information and the bag-of-words characteristic
information of the input case text obtained in step S123 are concatenated into
a single feature vector, which is the original text feature of the input case
text. If, for example, the text subject characteristic information of the
input case text is a 10-dimensional feature vector and the bag-of-words
characteristic information is a 100-dimensional feature vector, the original
text feature of the input case text is a 110-dimensional feature vector. A
feature normalization method commonly used in the field of machine learning is
then applied to the original text feature to generate the text feature vector
of the input case text. For example, if each feature dimension is assumed to
follow a normal distribution across texts, each dimension can be normalized to
a standard normal distribution.
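The concatenation and normalization step can be sketched as follows. This is a
minimal illustration that assumes the normalization is the usual z-score
(subtract the per-dimension mean, divide by the per-dimension standard
deviation) and that the means and variances have been computed beforehand over
the published judgement documents; the function name and the sample numbers are
hypothetical.

```python
import math


def text_feature_vector(subject_features, bow_features, means, variances):
    """Concatenate the two feature groups, then z-score each dimension."""
    raw = list(subject_features) + list(bow_features)
    if not (len(raw) == len(means) == len(variances)):
        raise ValueError("statistics must cover every feature dimension")
    normalized = []
    for value, mean, variance in zip(raw, means, variances):
        std = math.sqrt(variance)
        # A dimension with zero variance carries no information; map it to 0.
        normalized.append((value - mean) / std if std > 0 else 0.0)
    return normalized


# Tiny stand-in for the 10 + 100 dimensions in the text: 2 + 1 dimensions.
vector = text_feature_vector(
    subject_features=[1.0, 2.0],
    bow_features=[3.0],
    means=[0.0, 2.0, 1.0],
    variances=[1.0, 4.0, 4.0],
)
```

Using the same stored means and variances for published judgement documents
and for input case texts keeps the two kinds of vectors on a comparable scale.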
In embodiments of the application, the similar judgement document acquisition
device 24 takes the candidate judgement documents that the candidate judgement
document acquisition device 23 obtained from the retrieval database as having
the same cause of action as the input case text, calculates the similarity
between the text feature vector of each candidate judgement document and the
text feature vector of the input case text, and selects the similar judgement
documents on the basis of that similarity.
It should be noted that the algorithms used in the similar judgement document
acquisition device 24 to calculate the similarity of text feature vectors
include, but are not limited to, the Euclidean distance algorithm and the
cosine similarity algorithm. Of course, other existing or future algorithms
for calculating the similarity of text feature vectors, where applicable to
the application, also fall within the scope of protection of the application
and are incorporated herein by reference.
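For reference, the two measures named above can be written as follows. This is
a plain-Python sketch of the standard definitions; the function names are
illustrative and the application does not prescribe an implementation.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; larger means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)
```

Note the opposite orientation of the two measures: ranking by Euclidean
distance takes the smallest values, while ranking by cosine similarity takes
the largest.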
For example, according to the case type and cause of action of the input case
text entered by the user, all existing judgement documents with the same case
type and cause of action are first queried as candidate similar judgement
documents, and the text feature vectors of the candidates are retrieved. The
similarity between the input case text and each candidate is then calculated
with one of the above algorithms for the similarity of text feature vectors
(the Euclidean distance algorithm or the cosine similarity algorithm). Next,
according to the number N of similar judgement documents requested by the
user, the N judgement documents with the highest similarity are taken as the
final similar judgement documents. The text structure information and the
judgement-related information of the similar judgement documents are then
queried and fed back to the user who requested them. Finally, statistics on
the judgement results of the similar judgement documents are compiled along
dimensions such as principal penalty, accessory penalty, compensation and
which party prevailed, and are presented in visual form to the user who
requested the similar judgement documents. As a concrete example, suppose that
querying by the case type and cause of action of the user's input case text
yields 100 existing judgement documents as candidates, and that the user has
asked for 10 similar judgement documents. The text feature vector of the input
case text is compared with the text feature vector of each of the 100
candidates using the above similarity algorithm, the resulting similarities
are sorted, and the 10 candidates with the highest similarity are taken as the
similar judgement documents. The text structure information and the
judgement-related information of these 10 similar judgement documents are then
fed back to the user who needs them.
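The selection of the N most similar candidates can be sketched as below. The
sketch assumes cosine similarity as the measure and a dictionary from document
number to text feature vector as the candidate set; the function names and the
sample vectors are hypothetical.

```python
import heapq
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_n_similar(query_vector, candidate_vectors, n):
    """Return the ids of the n candidates most similar to the query,
    most similar first."""
    scored = ((cosine(query_vector, vec), doc_id)
              for doc_id, vec in candidate_vectors.items())
    return [doc_id for _, doc_id in heapq.nlargest(n, scored)]


# Three candidates that already share the case type and cause of action.
candidate_vectors = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
most_similar = top_n_similar([1.0, 0.1], candidate_vectors, n=2)
```

heapq.nlargest avoids sorting the whole candidate list when N is much smaller
than the number of candidates, as in the 10-out-of-100 example above.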
Further, the second equipment for mining similar judgement documents based on
big data according to one aspect of the application also includes: a text
structure information receiving device, for receiving the text structure
information sent by the first equipment, which the first equipment obtained by
applying structuring processing to the judgement documents; and a text
structure information acquisition device, for obtaining the text structure
information of the similar judgement documents. For example, after the
similarity calculation over the candidate judgement documents, the text
structure information of all similar judgement documents that meet the
requested number is obtained.
Figure 11 shows a schematic diagram of a system for mining similar judgement
documents based on big data according to one aspect of the application. The
system includes a cloud computing server 31 and a court intranet server 32.
The cloud computing server 31 includes a published judgement document
acquisition device 311, an offline featurization tool device 312 and a text
feature vector generating device 313 for published judgement documents. The
court intranet server 32 includes an online memory 321, an input case text
acquisition device 322 for texts entered online, an online featurization tool
device 323, a text feature vector generating device 324 for the input case
text, an online similar judgement document calculating tool device 325, and
the similar judgement documents 326 of the input case text.
Here, the function of the cloud computing server 31 is consistent with that of
the first equipment for mining similar judgement documents based on big data
shown in Fig. 6, and the function of the court intranet server 32 is
consistent with that of the second equipment for mining similar judgement
documents based on big data shown in Fig. 9. For brevity in the description
below, the following names are used interchangeably because their substance
is the same: the published judgement document acquisition device 311 in the
cloud computing server 31 and the judgement document acquisition device 11 in
Fig. 6; the offline featurization tool device 312 together with the text
feature vector generating device 313 for published judgement documents, and
the text feature mining device 12 in Fig. 6; the online memory 321 in the
court intranet server 32 and the candidate judgement document acquisition
device 23 in Fig. 9; the input case text acquisition device 322 and the input
device 21 in Fig. 9; the online featurization tool device 323 together with
the text feature vector generating device 324 for the input case text, and the
input case text feature mining device 22 in Fig. 9; and the online similar
judgement document calculating tool device 325 together with the similar
judgement documents 326 of the input case text, and the similar judgement
document acquisition device 24 in Fig. 9.
In embodiments of the application, in the trial work of the court business
scenario, the published judgement document acquisition device 311 in the cloud
computing server 31 uses the Internet to store all published judgement
documents. The offline featurization tool device 312 makes full use of the
computing power of cloud computing to featurize all published judgement
documents and to mine the feature dictionary on keywords. The text feature
vector generating device 313 for published judgement documents produces the
text feature vectors of the published judgement documents and the feature
dictionary on keywords, and transfers them in a single pass over a dedicated
network line to the online memory 321 in the court intranet server 32. The
online memory 321 in the court intranet server 32 stores the text feature
vectors of the published judgement documents and the feature dictionary on
keywords. The input case text acquisition device 322 obtains the text content
of the input case text. The online featurization tool device 323 queries the
feature dictionary on keywords of the published judgement documents in the
online memory to obtain the relevant feature dictionary, and featurizes the
input case text, so that the text feature vector of the input case text is
formed in the text feature vector generating device 324. The online similar
judgement document calculating tool device 325 takes the input case text and
its text feature vector as online input, queries the online memory to retrieve
the text feature vectors of the published candidate judgement documents that
have the same cause of action as the input case text, calculates the
similarity between the text feature vector of each published candidate and the
text feature vector of the input case text, and, after sorting, obtains the
judgement documents most similar to the input case text.
Here, the calculation logic of the offline featurization tool device 312 and
that of the online featurization tool device 323 are the same. The difference
between the two is that the online featurization tool device 323 needs only a
simple calculation to reproduce the calculation logic of the offline
featurization tool device 312. The feature dictionary on keywords output by
the offline featurization tool device 312 serves as the input of the online
featurization tool device 323, and the online featurization tool device 323
relies on this feature dictionary calculated offline and on the simplified
online calculation logic to ensure that the two tool devices produce the same
output for the same data. That is, the same judgement document yields the same
text feature vector and the same feature dictionary on keywords whether it
passes through the offline featurization tool device 312 or the online
featurization tool device 323. The similarity of text feature vectors between
an input case text and the judgement documents can therefore be calculated
more effectively, which improves the efficiency and accuracy of mining similar
judgement documents in the court business scenario. After the calculation by
the online featurization tool device 323, the text feature vector of the input
case text is generated in the text feature vector generating device 324. The
online similar judgement document calculating tool device 325 then calculates
the similarity between the text feature vector of each candidate judgement
document with the same text type and the text feature vector of the input case
text. Finally, for the similar judgement documents 326 of the input case text,
the candidates with the highest similarity, in the number required in the
court business scenario, are taken as the similar judgement documents.
As can be seen from the above embodiments of the equipment for mining similar
judgement documents based on big data, big-data text analysis makes it
possible to mine effectively the three elements of a judgement document: the
text subject characteristic information on the facts of the case, the keyword
relational information on the parties' dispute content, and the keyword
relational information on the parties' claim content. The content of these
elements is then compared pairwise in order to mine similar judgement
documents. The embodiment of the application first builds text feature vectors
for all judgement documents nationwide, comprising text subject characteristic
information, text keyword features and expanded keyword features. It then uses
real-time machine learning computation to calculate the text feature vector of
an input case text entered in real time (or of only the facts of the case and
the complaint stating the parties' claims), and uses a machine learning model
to find the existing judgement documents, with judgement results, that are
most similar to the input case text entered in real time. In this process,
judicial staff can enter whatever text they need similar judgement documents
for, according to the actual situation. The application does not restrict the
structure of the input case text, which fully fits the application scenarios
of court business.
Compared with the prior art, the method and device according to embodiments of the present application for mining similar judgement documents based on big data at the first device end obtain the massive published judgement documents together with the cause of action of each; obtain, from the text content of each judgement document, the text topic feature information on its case facts and several items of keyword-related information on the parties' dispute content and the parties' demand content; and build the text feature vector of the judgement document from that text topic feature information and keyword-related information. Each published judgement document is thus mined along three elements (the text topic feature information on the case facts, and the keyword-related information on the dispute content and on the demand content) and is represented accurately as a text feature vector. This avoids the time-consuming and laborious manual analysis of massive judgement documents that are long, complex in content, and varied in style, and so effectively improves the efficiency of mining similar judgement documents. The feature dictionary of keywords is also updated based on the keyword-related information, so that the text content of the judgement documents is captured in a highly identifiable form, as a feature dictionary built from all keywords together with their word topic features and expanded words. Similar judgement documents and their corresponding text feature vectors can therefore be obtained quickly, which improves the efficiency of mining similar judgement documents.
Further, the method and device according to embodiments of the present application for mining similar judgement documents based on big data at the second device end first obtain an input case text and extract several candidate keywords from it based on the feature dictionary of keywords in the retrieval database. The keywords obtained from the input case text can therefore all be found in the retrieval database, which makes the keyword-based lookup of similar judgement documents for the input case text more effective. Next, the text topic feature information and several items of keyword-related information of the input case text are obtained from its text content and the candidate keywords, and the text feature vector of the input case text is built from them, so that the relevant information of the input case text is expressed effectively in the form of a text feature vector. Finally, several candidate judgement documents having the same cause of action as the input case text are obtained from the retrieval database; the similarity between the text feature vector of each candidate judgement document and that of the input case text is computed; and the similar judgement documents are chosen based on the similarity. Because the text feature vectors of the candidate judgement documents sent from the first device are compared with the text feature vector mined in real time from the input case text, judgement documents similar to the input case text can be screened quickly and accurately from the massive published judgement documents. This avoids the time-consuming and laborious manual analysis of massive judgement documents that are long, complex in content, and varied in style, and effectively improves the efficiency of mining similar texts.
It should be noted that the present application can be implemented in software and/or in a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the present application can be executed by a processor to implement the steps or functions described above. Likewise, the software program of the present application (including related data structures) can be stored in a computer-readable recording medium, for example a RAM memory, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present application can be implemented in hardware, for example as circuitry that cooperates with a processor to perform the individual steps or functions.
In addition, part of the present application can be applied as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present application through the operation of that computer. The program instructions that invoke the method of the present application may be stored in a fixed or removable recording medium, and/or transmitted as a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment of the present application comprises an apparatus that includes a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to run the methods and/or technical solutions of the foregoing embodiments of the present application.
It will be apparent to those skilled in the art that the present application is not limited to the details of the exemplary embodiments above, and that it can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive. The scope of the present application is defined by the appended claims rather than by the description above, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced in the present application. No reference sign in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or apparatuses recited in an apparatus claim may also be implemented by a single unit or apparatus, in software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.
Claims (27)
1. A method for mining similar judgement documents based on big data at a first device end, wherein the method comprises:
obtaining massive published judgement documents, and obtaining the cause of action of each judgement document;
obtaining, based on the text content of each judgement document, text topic feature information on the case facts of the judgement document and several items of keyword-related information on the parties' dispute content and the parties' demand content in the judgement document, and building a text feature vector of the judgement document based on the text topic feature information and the keyword-related information;
updating a feature dictionary of keywords based on the keyword-related information.
2. The method according to claim 1, wherein the method further comprises:
performing structuring processing on the judgement document to obtain structured text information;
obtaining judgement-related information of the judgement document based on the structured text information, the judgement-related information including party information, case type, cause of action, and judgement result.
3. The method according to claim 1 or 2, wherein the method further comprises:
sending the text feature vectors of all the judgement documents, the feature dictionary, and the judgement-related information to a retrieval database of a second device.
4. The method according to any one of claims 1 to 3, wherein obtaining, based on the text content of each judgement document, the text topic feature information on the case facts of the judgement document and the several items of keyword-related information on the parties' dispute content and the parties' demand content in the judgement document, and building the text feature vector of the judgement document based on the text topic feature information and the keyword-related information, comprises:
extracting the text topic feature information of the judgement document and the word topic feature of each word in the judgement document;
obtaining the context relations between the words, correcting the word topic feature of each word based on the context relations, and determining several items of keyword-related information of the judgement document based on the degree of matching between the corrected word topic feature of each word and the text topic feature information, wherein the keyword-related information includes keywords, keyword importance information, and the word topic features corresponding to the keywords;
updating the text topic feature information of the judgement document based on the keyword-related information;
obtaining expanded-word-related information based on the keyword-related information, the expanded-word-related information including the expanded words of the keywords and the expanded-word relevance; building bag-of-words feature information based on the keyword-related information and the expanded-word-related information; and determining the text feature vector of the judgement document based on the updated text topic feature information and the bag-of-words feature information.
5. The method according to claim 4, wherein obtaining the context relations between the words, correcting the word topic feature of each word based on the context relations, and determining several items of keyword-related information of the judgement document based on the degree of matching between the corrected word topic feature of each word and the text topic feature information, wherein the keyword-related information includes keywords, keyword importance information, and the word topic features corresponding to the keywords, comprises:
obtaining the context word co-occurrence relations between the words;
obtaining the context transition probability between any two of the words;
correcting the word topic feature of each word based on the context word co-occurrence relations and the context transition probabilities;
determining several keywords of the judgement document and their corresponding word topic features based on the degree of matching between the corrected word topic feature of each word and the text topic feature information, and obtaining the importance information of the keywords.
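The steps of claim 5 can be illustrated with a minimal sketch. The claim does not give formulas, so everything concrete here is an assumption: the co-occurrence window, the weighting (co-occurrence count times transition probability in either direction), the mixing factor `alpha`, and the use of a dot product as the matching degree. All function names are hypothetical.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count how often two distinct words appear within `window` positions."""
    counts = Counter()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            if tokens[j] != word:
                counts[(word, tokens[j])] += 1
                counts[(tokens[j], word)] += 1
    return counts


def transition_probabilities(tokens):
    """Estimate P(next word | current word) from adjacent word pairs."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    starts = Counter(tokens[:-1])
    return {(a, b): n / starts[a] for (a, b), n in bigrams.items()}


def correct_topic_features(tokens, topic_features, alpha=0.5, window=2):
    """Blend each word's topic feature with those of its context words."""
    cooc = cooccurrence_counts(tokens, window)
    trans = transition_probabilities(tokens)
    corrected = {}
    for word, own in topic_features.items():
        mixed = [0.0] * len(own)
        total = 0.0
        for other, feature in topic_features.items():
            if other == word:
                continue
            # Assumed weighting: co-occurrence count times the transition
            # probability in either direction.
            weight = cooc.get((word, other), 0) * (
                trans.get((word, other), 0.0) + trans.get((other, word), 0.0))
            total += weight
            mixed = [m + weight * f for m, f in zip(mixed, feature)]
        if total == 0.0:
            corrected[word] = list(own)
        else:
            corrected[word] = [(1 - alpha) * o + alpha * m / total
                               for o, m in zip(own, mixed)]
    return corrected


def select_keywords(corrected, text_topic_feature, k):
    """Pick the k words whose corrected feature best matches the text topic.

    The dot product serves as both matching degree and importance score.
    """
    scored = [(sum(a * b for a, b in zip(feature, text_topic_feature)), word)
              for word, feature in corrected.items()]
    scored.sort(reverse=True)
    return [(word, score) for score, word in scored[:k]]
```

When the input word topic features are probability distributions over topics, the blended features remain distributions, since each is a convex combination of distributions.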
6. The method according to claim 4, wherein obtaining the expanded-word-related information based on the keyword-related information, the expanded-word-related information including the expanded words of the keywords and the expanded-word relevance, and building the bag-of-words feature information based on the keyword-related information and the expanded-word-related information, comprises:
determining the expanded words of the keywords and the expanded-word relevance based on the keywords and their corresponding word topic features, wherein the expanded words include synonyms of the keywords and highly related words in the judgement document;
building the bag-of-words feature information with a bag-of-words model, based on the keywords and their corresponding word topic features and on the expanded words and the expanded-word relevance.
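As a rough illustration of how keywords and expanded words might be combined in a bag-of-words feature, consider the sketch below. The weighting scheme is an assumption not stated in the claim: a keyword contributes its importance, and an expanded word contributes the importance of its keyword multiplied by the expanded-word relevance. The function name and the fixed `vocabulary` parameter are hypothetical.

```python
def build_bag_of_words(keywords, expansions, vocabulary):
    """Build a bag-of-words feature over a fixed vocabulary.

    keywords:   dict mapping keyword -> importance
    expansions: dict mapping keyword -> list of (expanded word, relevance)
    vocabulary: ordered list of terms that defines the vector positions
    """
    weights = {}
    for keyword, importance in keywords.items():
        weights[keyword] = max(weights.get(keyword, 0.0), importance)
        for word, relevance in expansions.get(keyword, []):
            # Assumed rule: an expanded word inherits its keyword's
            # importance, scaled by how relevant it is to that keyword.
            weights[word] = max(weights.get(word, 0.0), importance * relevance)
    return [weights.get(term, 0.0) for term in vocabulary]
```

Terms of the vocabulary that are neither keywords nor expanded words receive a weight of zero, so the vector stays sparse for most documents.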
7. The method according to claim 4, wherein determining the text feature vector of the judgement document based on the updated text topic feature information and the bag-of-words feature information comprises:
merging the updated text topic feature information and the bag-of-words feature information to determine the original text feature of the judgement document;
performing feature normalization on the original text feature of the judgement document to determine the text feature vector of the judgement document.
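The merge-and-normalize step of claim 7 might look like the following sketch. Concatenation as the merge operation and L2 normalization as the feature normalization are both assumptions; the claim names neither. The function name is hypothetical.

```python
import math


def build_text_feature_vector(topic_feature, bag_of_words_feature):
    """Merge the two features and normalize the result to unit length."""
    # Assumed merge: simple concatenation gives the original text feature.
    raw = list(topic_feature) + list(bag_of_words_feature)
    norm = math.sqrt(sum(x * x for x in raw))
    if norm == 0.0:
        return raw
    # Assumed normalization: divide by the Euclidean (L2) norm.
    return [x / norm for x in raw]
```

With unit-length vectors, a cosine similarity between two documents reduces to a plain dot product, which is one reason to normalize once at indexing time.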
8. The method according to any one of claims 1 to 7, wherein updating the feature dictionary of keywords based on the keyword-related information comprises:
building the feature dictionary of keywords from the word topic feature and the expanded words of each keyword, with the keyword as the index.
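A keyword-indexed feature dictionary of the kind claim 8 describes could be sketched as a mapping from keyword to an entry holding its word topic feature and expanded words. The entry layout and the update policy (overwrite the topic feature, accumulate the expanded words) are assumptions, and `DictionaryEntry` and `update_feature_dictionary` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    """What the dictionary stores for one keyword (assumed layout)."""
    topic_feature: list
    expansion_words: set = field(default_factory=set)


def update_feature_dictionary(dictionary, keyword_info):
    """Insert or refresh entries, using the keyword itself as the index.

    keyword_info maps keyword -> (word topic feature, expanded words).
    """
    for keyword, (topic_feature, expansion_words) in keyword_info.items():
        entry = dictionary.get(keyword)
        if entry is None:
            dictionary[keyword] = DictionaryEntry(
                list(topic_feature), set(expansion_words))
        else:
            # Assumed policy: latest topic feature wins, expanded words
            # accumulate across judgement documents.
            entry.topic_feature = list(topic_feature)
            entry.expansion_words.update(expansion_words)
    return dictionary
```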
9. A method for mining similar judgement documents based on big data at a second device end, wherein the method comprises:
obtaining an input case text, and extracting several candidate keywords of the input case text based on a feature dictionary of keywords in a retrieval database;
obtaining text topic feature information and several items of keyword-related information of the input case text based on the text content of the input case text and the candidate keywords, and building a text feature vector of the input case text based on the text topic feature information and the keyword-related information;
obtaining, from the retrieval database, several candidate judgement documents having the same cause of action as the input case text;
computing the similarity between the text feature vector of each candidate judgement document and the text feature vector of the input case text, and choosing similar judgement documents based on the similarity.
10. The method according to claim 9, wherein the method further comprises:
receiving, from a first device, the text feature vectors of the published judgement documents obtained by the first device, the feature dictionary, and judgement-related information, and saving them into the retrieval database, the judgement-related information including party information, case type, cause of action, and judgement result.
11. The method according to claim 9 or 10, wherein obtaining the input case text and extracting several candidate keywords of the input case text based on the feature dictionary of keywords in the retrieval database comprises:
obtaining the input case text, and extracting several candidate keywords of the input case text from the feature dictionary of keywords in the retrieval database based on the cause of action of the input case text.
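One way to picture the extraction in claim 11 is a lookup of the input words against the dictionary keywords recorded for the same cause of action. This is a sketch under stated assumptions: the input is already segmented into words, and the dictionary can be viewed as a mapping from cause of action to a keyword set (`keywords_by_cause`, a hypothetical layout). The function name is also hypothetical.

```python
def extract_candidate_keywords(tokens, cause_of_action, keywords_by_cause):
    """Return the input words that are dictionary keywords for this cause.

    tokens:            words of the input case text, already segmented
    keywords_by_cause: dict mapping cause of action -> set of keywords
    """
    known = keywords_by_cause.get(cause_of_action, set())
    seen = set()
    candidates = []
    for token in tokens:
        # Keep first occurrences only, in the order they appear in the text.
        if token in known and token not in seen:
            seen.add(token)
            candidates.append(token)
    return candidates
```

Because every candidate comes from the dictionary, each one is guaranteed to have a stored word topic feature and expanded words to draw on in the later steps.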
12. The method according to any one of claims 9 to 11, wherein obtaining the text topic feature information and the several items of keyword-related information of the input case text based on the text content of the input case text and the candidate keywords, and building the text feature vector of the input case text based on the text topic feature information and the keyword-related information, comprises:
comparing each word of the input case text with all keywords of all the judgement documents, so as to extract candidate keywords and their word topic features from the input case text, and obtaining the text topic feature information of the input case text based on the word topic features;
obtaining the context relations between the candidate keywords, correcting the word topic feature of each candidate keyword based on the context relations, and determining the keyword-related information of the input case text based on the degree of matching between the corrected word topic feature of each candidate keyword and the text topic feature information;
updating the text topic feature information of the input case text and obtaining expanded-word-related information based on the keyword-related information; building bag-of-words feature information of the input case text based on the keyword-related information and the expanded-word-related information; and determining the text feature vector of the input case text based on the updated text topic feature information and the bag-of-words feature information.
13. The method according to any one of claims 9 to 12, wherein the method further comprises:
receiving the structured text information, sent by the first device, that results from structuring processing of the judgement documents;
obtaining the structured text information of the similar judgement documents.
14. A first device for mining similar judgement documents based on big data, wherein the first device comprises:
a judgement document acquisition apparatus, configured to obtain massive published judgement documents and to obtain the cause of action of each judgement document;
a text feature mining apparatus, configured to obtain, based on the text content of each judgement document, text topic feature information on the case facts of the judgement document and several items of keyword-related information on the parties' dispute content and the parties' demand content in the judgement document, and to build a text feature vector of the judgement document based on the text topic feature information and the keyword-related information;
a feature dictionary building apparatus, configured to update a feature dictionary of keywords based on the keyword-related information.
15. The first device according to claim 14, wherein the first device further comprises:
a text structuring apparatus, configured to perform structuring processing on the judgement document to obtain structured text information;
a structured text information acquisition apparatus, configured to obtain judgement-related information of the judgement document based on the structured text information, the judgement-related information including party information, case type, cause of action, and judgement result.
16. The first device according to claim 14 or 15, wherein the first device further comprises:
a sending apparatus, configured to send the text feature vectors of all the judgement documents, the feature dictionary, and the judgement-related information to a retrieval database of a second device.
17. The first device according to any one of claims 14 to 16, wherein the text feature mining apparatus comprises:
a first mining unit, configured to extract the text topic feature information of the judgement document and the word topic feature of each word in the judgement document;
a second mining unit, configured to obtain the context relations between the words, to correct the word topic feature of each word based on the context relations, and to determine several items of keyword-related information of the judgement document based on the degree of matching between the corrected word topic feature of each word and the text topic feature information, wherein the keyword-related information includes keywords, keyword importance information, and the word topic features corresponding to the keywords;
a third mining unit, configured to update the text topic feature information of the judgement document based on the keyword-related information;
a generation unit, configured to obtain expanded-word-related information based on the keyword-related information, the expanded-word-related information including the expanded words of the keywords and the expanded-word relevance; to build bag-of-words feature information based on the keyword-related information and the expanded-word-related information; and to determine the text feature vector of the judgement document based on the updated text topic feature information and the bag-of-words feature information.
18. The first device according to any one of claims 14 to 17, wherein the second mining unit is configured to:
obtain the context word co-occurrence relations between the words;
obtain the context transition probability between any two of the words;
correct the word topic feature of each word based on the context word co-occurrence relations and the context transition probabilities;
determine several keywords of the judgement document and their corresponding word topic features based on the degree of matching between the corrected word topic feature of each word and the text topic feature information, and obtain the importance information of the keywords.
19. The first device according to any one of claims 14 to 18, wherein the generation unit is configured to:
determine the expanded words of the keywords and the expanded-word relevance based on the keywords and their corresponding word topic features, wherein the expanded words include synonyms of the keywords and highly related words in the judgement document;
build the bag-of-words feature information with a bag-of-words model, based on the keywords and their corresponding word topic features and on the expanded words and the expanded-word relevance.
20. The first device according to any one of claims 14 to 19, wherein the generation unit is configured to:
merge the updated text topic feature information and the bag-of-words feature information to determine the original text feature of the judgement document;
perform feature normalization on the original text feature of the judgement document to determine the text feature vector of the judgement document.
21. The first device according to any one of claims 14 to 20, wherein the feature dictionary building apparatus is configured to:
build the feature dictionary of keywords from the word topic feature and the expanded words of each keyword, with the keyword as the index.
22. A second device for mining similar judgement documents based on big data, wherein the second device comprises:
an input apparatus, configured to obtain an input case text and to extract several candidate keywords of the input case text based on a feature dictionary of keywords in a retrieval database;
an input case text feature mining apparatus, configured to obtain text topic feature information and several items of keyword-related information of the input case text based on the text content of the input case text and the candidate keywords, and to build a text feature vector of the input case text based on the text topic feature information and the keyword-related information;
a candidate judgement document acquisition apparatus, configured to obtain, from the retrieval database, several candidate judgement documents having the same cause of action as the input case text;
a similar judgement document acquisition apparatus, configured to compute the similarity between the text feature vector of each candidate judgement document and the text feature vector of the input case text, and to choose similar judgement documents based on the similarity.
23. The second device according to claim 22, wherein the second device further comprises:
a receiving apparatus, configured to receive, from a first device, the text feature vectors of the published judgement documents obtained by the first device, the feature dictionary, and judgement-related information, and to save them into the retrieval database, the judgement-related information including party information, case type, cause of action, and judgement result.
24. The second device according to claim 22 or 23, wherein the input apparatus is configured to:
obtain the input case text, and extract several candidate keywords of the input case text from the feature dictionary of keywords in the retrieval database based on the cause of action of the input case text.
25. The second device according to any one of claims 22 to 24, wherein the input case text feature mining apparatus comprises:
a fourth mining unit, configured to compare each word of the input case text with all keywords of all the judgement documents, so as to extract candidate keywords and their word topic features from the input case text, and to obtain the text topic feature information of the input case text based on the word topic features;
a fifth mining unit, configured to obtain the context relations between the candidate keywords, to correct the word topic feature of each candidate keyword based on the context relations, and to determine the keyword-related information of the input case text based on the degree of matching between the corrected word topic feature of each candidate keyword and the text topic feature information;
a sixth mining unit, configured to update the text topic feature information of the input case text and obtain expanded-word-related information based on the keyword-related information; to build bag-of-words feature information of the input case text based on the keyword-related information and the expanded-word-related information; and to determine the text feature vector of the input case text based on the updated text topic feature information and the bag-of-words feature information.
26. The second device according to any one of claims 22 to 25, wherein the second device further comprises:
a structured text information receiving apparatus, configured to receive the structured text information, sent by the first device, that results from structuring processing of the judgement documents;
a structured text information acquisition apparatus, configured to obtain the structured text information of the similar judgement documents.
27. A system for mining similar judgement documents based on big data, wherein the system comprises a first device and a second device:
the first device comprises:
a judgement document acquisition apparatus, configured to obtain massive published judgement documents and to obtain the cause of action of each judgement document;
a text feature mining apparatus, configured to obtain, based on the text content of each judgement document, text topic feature information on the case facts of the judgement document and several items of keyword-related information on the parties' dispute content and the parties' demand content in the judgement document, and to build a text feature vector of the judgement document based on the text topic feature information and the keyword-related information;
a feature dictionary building apparatus, configured to update a feature dictionary of keywords based on the keyword-related information;
a text structuring apparatus, configured to perform structuring processing on the judgement document to obtain structured text information;
a structured text information acquisition apparatus, configured to obtain judgement-related information of the judgement document based on the structured text information, the judgement-related information including party information, case type, cause of action, and judgement result;
a sending apparatus, configured to send the text feature vectors of all the judgement documents, the feature dictionary, and the judgement-related information to a retrieval database of the second device;
the second device comprises:
a receiving apparatus, configured to receive, from the first device, the text feature vectors of the published judgement documents obtained by the first device, the feature dictionary, and the judgement-related information, and to save them into the retrieval database, the judgement-related information including party information, case type, cause of action, and judgement result;
a structured text information receiving apparatus, configured to receive the structured text information, sent by the first device, that results from structuring processing of the judgement documents;
a structured text information acquisition apparatus, configured to obtain the structured text information of the similar judgement documents;
an input apparatus, configured to obtain an input case text and to extract several candidate keywords of the input case text based on the feature dictionary of keywords in the retrieval database;
an input case text feature mining apparatus, configured to obtain text topic feature information and several items of keyword-related information of the input case text based on the text content of the input case text and the candidate keywords, and to build a text feature vector of the input case text based on the text topic feature information and the keyword-related information;
a candidate judgement document acquisition apparatus, configured to obtain, from the retrieval database, several candidate judgement documents having the same cause of action as the input case text;
a similar judgement document acquisition apparatus, configured to compute the similarity between the text feature vector of each candidate judgement document and the text feature vector of the input case text, and to choose similar judgement documents based on the similarity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610038106.XA CN106991092B (en) | 2016-01-20 | 2016-01-20 | Method and equipment for mining similar referee documents based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610038106.XA CN106991092B (en) | 2016-01-20 | 2016-01-20 | Method and equipment for mining similar referee documents based on big data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106991092A true CN106991092A (en) | 2017-07-28 |
CN106991092B CN106991092B (en) | 2021-11-05 |
Family
ID=59413645
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610038106.XA Active CN106991092B (en) | 2016-01-20 | 2016-01-20 | Method and equipment for mining similar referee documents based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106991092B (en) |
Cited By (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107562938A (en) * | 2017-09-21 | 2018-01-09 | 重庆工商大学 | Intelligent court trial method |
CN107633465A (en) * | 2017-08-21 | 2018-01-26 | 厦门能见易判信息科技有限公司 | Intelligent assisted case adjudication method |
CN107918921A (en) * | 2017-11-21 | 2018-04-17 | 南京擎盾信息科技有限公司 | Criminal case judgment result measuring method and system |
CN108038091A (en) * | 2017-10-30 | 2018-05-15 | 上海思贤信息技术股份有限公司 | Graph-based judgement document case similarity calculation and retrieval method and system |
CN108197163A (en) * | 2017-12-14 | 2018-06-22 | 上海银江智慧智能化技术有限公司 | Structured processing method based on judgement documents |
CN108304386A (en) * | 2018-03-05 | 2018-07-20 | 上海思贤信息技术股份有限公司 | Method and device for inferring legal document judgment results based on logic rules |
CN109285094A (en) * | 2017-07-19 | 2019-01-29 | 北京国双科技有限公司 | Processing method and device for legal documents |
CN109284359A (en) * | 2018-09-13 | 2019-01-29 | 巫溪县片刻网络科技有限公司 | Trial auxiliary data management platform |
CN109426905A (en) * | 2017-08-29 | 2019-03-05 | 北京国双科技有限公司 | Method and device for determining sentencing deviation in criminal documents |
CN109472722A (en) * | 2017-09-08 | 2019-03-15 | 北京国双科技有限公司 | Method and device for obtaining information related to the trial-findings section of a judgement document to be generated |
CN109472017A (en) * | 2017-09-08 | 2019-03-15 | 北京国双科技有限公司 | Method and device for obtaining information related to the court-opinion section of a judgement document to be generated |
CN109583669A (en) * | 2017-09-28 | 2019-04-05 | 北京国双科技有限公司 | Data capture method, device, storage medium and processor |
CN110019669A (en) * | 2017-10-31 | 2019-07-16 | 北京国双科技有限公司 | Text searching method and device |
CN110019697A (en) * | 2017-08-29 | 2019-07-16 | 北京国双科技有限公司 | Method and device for pushing criminal documents |
CN110019670A (en) * | 2017-10-31 | 2019-07-16 | 北京国双科技有限公司 | Text searching method and device |
CN110019672A (en) * | 2017-11-09 | 2019-07-16 | 北京国双科技有限公司 | Similar case pushing method, system, storage medium and processor |
CN110019663A (en) * | 2017-09-30 | 2019-07-16 | 北京国双科技有限公司 | Case information pushing method, system, storage medium and processor |
CN110019668A (en) * | 2017-10-31 | 2019-07-16 | 北京国双科技有限公司 | Text searching method and device |
CN110162590A (en) * | 2019-02-22 | 2019-08-23 | 北京捷风数据技术有限公司 | Database display method and device for project tender texts combined with economic factors |
CN110209760A (en) * | 2019-06-13 | 2019-09-06 | 北京百度网讯科技有限公司 | Method and apparatus for associating cases across trial levels, electronic equipment, and computer-readable medium |
WO2019170015A1 (en) * | 2018-03-09 | 2019-09-12 | 北京国双科技有限公司 | Judicial document searching method and device |
CN110362799A (en) * | 2019-06-17 | 2019-10-22 | 平安科技(深圳)有限公司 | Method, device and computer equipment for generating and processing awards based on online arbitration |
CN110472048A (en) * | 2019-07-19 | 2019-11-19 | 平安科技(深圳)有限公司 | Auxiliary judgment method, apparatus and terminal device |
CN110727787A (en) * | 2019-10-11 | 2020-01-24 | 北京明略软件系统有限公司 | Case text matching method and device, electronic equipment and storage medium |
CN110738039A (en) * | 2019-09-03 | 2020-01-31 | 平安科技(深圳)有限公司 | Prompting method, device, storage medium and server for case auxiliary information |
CN110827177A (en) * | 2018-08-13 | 2020-02-21 | 北京国双科技有限公司 | Case-like document searching method and device |
CN110941645A (en) * | 2018-09-21 | 2020-03-31 | 北京国双科技有限公司 | Method, device, storage medium and processor for automatically judging case string |
CN110955760A (en) * | 2018-09-26 | 2020-04-03 | 北京国双科技有限公司 | Evaluation method of judgment result and related device |
CN110968662A (en) * | 2018-09-27 | 2020-04-07 | 北京国双科技有限公司 | Judicial data processing method and device, storage medium and processor |
CN110990522A (en) * | 2018-09-30 | 2020-04-10 | 北京国双科技有限公司 | Legal document determining method and system |
CN111008261A (en) * | 2018-09-19 | 2020-04-14 | 北京国双科技有限公司 | Method and device for determining referee document based on preposed document |
CN111144095A (en) * | 2019-11-26 | 2020-05-12 | 方正璞华软件(武汉)股份有限公司 | Method and device for generating work damage case sanction book |
CN111259160A (en) * | 2018-11-30 | 2020-06-09 | 百度在线网络技术(北京)有限公司 | Knowledge graph construction method, device, equipment and storage medium |
CN111291152A (en) * | 2018-12-07 | 2020-06-16 | 北大方正集团有限公司 | Case document recommendation method, device, equipment and storage medium |
CN111382769A (en) * | 2018-12-29 | 2020-07-07 | 阿里巴巴集团控股有限公司 | Information processing method, device and system |
CN112784007A (en) * | 2020-07-16 | 2021-05-11 | 上海芯翌智能科技有限公司 | Text matching method and device, storage medium and computer equipment |
CN112925877A (en) * | 2019-12-06 | 2021-06-08 | 中国科学院软件研究所 | One-person multi-case association identification method and system based on deep metric learning |
WO2021164226A1 (en) * | 2020-02-20 | 2021-08-26 | 平安科技(深圳)有限公司 | Method and apparatus for querying knowledge map of legal cases, device and storage medium |
CN117453856A (en) * | 2023-10-19 | 2024-01-26 | 中国司法大数据研究院有限公司 | Method and device for extracting calendar and examination case series based on multi-source data fusion |
CN117830060A (en) * | 2024-03-04 | 2024-04-05 | 天津财经大学 | Injury crime law enforcement supervision and auxiliary decision-making system based on knowledge graph |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101145153A (en) * | 2006-09-13 | 2008-03-19 | 阿里巴巴公司 | Method and system for searching information |
US7577652B1 (en) * | 2008-08-20 | 2009-08-18 | Yahoo! Inc. | Measuring topical coherence of keyword sets |
CN101655857A (en) * | 2009-09-18 | 2010-02-24 | 西安建筑科技大学 | Method for mining data in construction regulation field based on associative regulation mining technology |
JP2013030098A (en) * | 2011-07-29 | 2013-02-07 | Kddi R & D Laboratories Inc | Importance level determination device, importance level determination method, and program |
CN102982063A (en) * | 2012-09-18 | 2013-03-20 | 华东师范大学 | Control method based on tuple elaboration of relation keywords extension |
CN103294820A (en) * | 2013-06-14 | 2013-09-11 | 广东电网公司电力科学研究院 | WEB page classifying method and system based on semantic extension |
US20140040301A1 (en) * | 2012-08-02 | 2014-02-06 | Rule 14 | Real-time and adaptive data mining |
CN103970806A (en) * | 2013-02-05 | 2014-08-06 | 百度在线网络技术(北京)有限公司 | Method and device for establishing lyric-feelings classification models |
CN104298715A (en) * | 2014-09-16 | 2015-01-21 | 北京航空航天大学 | TF-IDF based multiple-index result merging and sequencing method |
CN104424291A (en) * | 2013-09-02 | 2015-03-18 | 阿里巴巴集团控股有限公司 | Method and device for sorting search results |
CN104572849A (en) * | 2014-12-17 | 2015-04-29 | 西安美林数据技术股份有限公司 | Automatic standardized filing method based on text semantic mining |
CN104881424A (en) * | 2015-03-13 | 2015-09-02 | 国家电网公司 | Regular expression-based acquisition, storage and analysis method of power big data |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101145153A (en) * | 2006-09-13 | 2008-03-19 | 阿里巴巴公司 | Method and system for searching information |
US7577652B1 (en) * | 2008-08-20 | 2009-08-18 | Yahoo! Inc. | Measuring topical coherence of keyword sets |
CN101655857A (en) * | 2009-09-18 | 2010-02-24 | 西安建筑科技大学 | Method for mining data in construction regulation field based on associative regulation mining technology |
JP2013030098A (en) * | 2011-07-29 | 2013-02-07 | Kddi R & D Laboratories Inc | Importance level determination device, importance level determination method, and program |
US20140040301A1 (en) * | 2012-08-02 | 2014-02-06 | Rule 14 | Real-time and adaptive data mining |
CN102982063A (en) * | 2012-09-18 | 2013-03-20 | 华东师范大学 | Control method based on tuple elaboration of relation keywords extension |
CN103970806A (en) * | 2013-02-05 | 2014-08-06 | 百度在线网络技术(北京)有限公司 | Method and device for establishing lyric-feelings classification models |
CN103294820A (en) * | 2013-06-14 | 2013-09-11 | 广东电网公司电力科学研究院 | WEB page classifying method and system based on semantic extension |
CN104424291A (en) * | 2013-09-02 | 2015-03-18 | 阿里巴巴集团控股有限公司 | Method and device for sorting search results |
CN104298715A (en) * | 2014-09-16 | 2015-01-21 | 北京航空航天大学 | TF-IDF based multiple-index result merging and sequencing method |
CN104572849A (en) * | 2014-12-17 | 2015-04-29 | 西安美林数据技术股份有限公司 | Automatic standardized filing method based on text semantic mining |
CN104881424A (en) * | 2015-03-13 | 2015-09-02 | 国家电网公司 | Regular expression-based acquisition, storage and analysis method of power big data |
Non-Patent Citations (3)
Title |
---|
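WU D et al.: "Identification of web query intent based on query text and web knowledge", PCSPA2010 First International Conference on * |
Xiang Lixing (向李兴): "Design and Implementation of a Judgement Document Recommendation System Based on Natural Semantic Processing", China Excellent Master's Theses Full-text Database, Information Science and Technology Series * |
Song Wei (宋巍) et al.: "Research on Personalized Query Reformulation Based on Search History Context", Journal of Chinese Information Processing * |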
Cited By (57)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109285094A (en) * | 2017-07-19 | 2019-01-29 | 北京国双科技有限公司 | Processing method and device for legal documents |
CN107633465A (en) * | 2017-08-21 | 2018-01-26 | 厦门能见易判信息科技有限公司 | Intelligent assisted case adjudication method |
CN110019697A (en) * | 2017-08-29 | 2019-07-16 | 北京国双科技有限公司 | Method and device for pushing criminal documents |
CN109426905A (en) * | 2017-08-29 | 2019-03-05 | 北京国双科技有限公司 | Method and device for determining sentencing deviation in criminal documents |
CN109426905B (en) * | 2017-08-29 | 2022-03-18 | 北京国双科技有限公司 | Criminal document criminal deviation judging method and device |
CN109472722A (en) * | 2017-09-08 | 2019-03-15 | 北京国双科技有限公司 | Method and device for obtaining information related to the trial-findings section of a judgement document to be generated |
CN109472722B (en) * | 2017-09-08 | 2021-08-17 | 北京国双科技有限公司 | Method and device for obtaining relevant information of approved finding segment of official document to be generated |
CN109472017A (en) * | 2017-09-08 | 2019-03-15 | 北京国双科技有限公司 | Method and device for obtaining information related to the court-opinion section of a judgement document to be generated |
CN109472017B (en) * | 2017-09-08 | 2022-09-20 | 北京国双科技有限公司 | Method and device for obtaining relevant information of text court deeds of referee to be generated |
CN107562938A (en) * | 2017-09-21 | 2018-01-09 | 重庆工商大学 | Intelligent court trial method |
CN109583669A (en) * | 2017-09-28 | 2019-04-05 | 北京国双科技有限公司 | Data capture method, device, storage medium and processor |
CN110019663A (en) * | 2017-09-30 | 2019-07-16 | 北京国双科技有限公司 | Case information pushing method, system, storage medium and processor |
CN108038091B (en) * | 2017-10-30 | 2021-12-14 | 上海思贤信息技术股份有限公司 | Graph-based referee document case similarity calculation and retrieval method and system |
CN108038091A (en) * | 2017-10-30 | 2018-05-15 | 上海思贤信息技术股份有限公司 | Graph-based judgement document case similarity calculation and retrieval method and system |
CN110019670A (en) * | 2017-10-31 | 2019-07-16 | 北京国双科技有限公司 | Text searching method and device |
CN110019669B (en) * | 2017-10-31 | 2021-06-29 | 北京国双科技有限公司 | Text retrieval method and device |
CN110019669A (en) * | 2017-10-31 | 2019-07-16 | 北京国双科技有限公司 | Text searching method and device |
CN110019668A (en) * | 2017-10-31 | 2019-07-16 | 北京国双科技有限公司 | Text searching method and device |
CN110019672A (en) * | 2017-11-09 | 2019-07-16 | 北京国双科技有限公司 | Similar case pushing method, system, storage medium and processor |
CN107918921B (en) * | 2017-11-21 | 2021-10-08 | 南京擎盾信息科技有限公司 | Criminal case judgment result measuring method and system |
CN107918921A (en) * | 2017-11-21 | 2018-04-17 | 南京擎盾信息科技有限公司 | Criminal case judgment result measuring method and system |
CN108197163B (en) * | 2017-12-14 | 2021-08-10 | 上海银江智慧智能化技术有限公司 | Structured processing method based on referee document |
CN108197163A (en) * | 2017-12-14 | 2018-06-22 | 上海银江智慧智能化技术有限公司 | Structured processing method based on judgement documents |
CN108304386A (en) * | 2018-03-05 | 2018-07-20 | 上海思贤信息技术股份有限公司 | Method and device for inferring legal document judgment results based on logic rules |
WO2019170015A1 (en) * | 2018-03-09 | 2019-09-12 | 北京国双科技有限公司 | Judicial document searching method and device |
CN110827177A (en) * | 2018-08-13 | 2020-02-21 | 北京国双科技有限公司 | Case-like document searching method and device |
CN109284359A (en) * | 2018-09-13 | 2019-01-29 | 巫溪县片刻网络科技有限公司 | Trial auxiliary data management platform |
CN111008261A (en) * | 2018-09-19 | 2020-04-14 | 北京国双科技有限公司 | Method and device for determining referee document based on preposed document |
CN111008261B (en) * | 2018-09-19 | 2023-08-25 | 北京国双科技有限公司 | Method and device for determining referee document based on prepositive document |
CN110941645A (en) * | 2018-09-21 | 2020-03-31 | 北京国双科技有限公司 | Method, device, storage medium and processor for automatically judging case string |
CN110955760A (en) * | 2018-09-26 | 2020-04-03 | 北京国双科技有限公司 | Evaluation method of judgment result and related device |
CN110968662A (en) * | 2018-09-27 | 2020-04-07 | 北京国双科技有限公司 | Judicial data processing method and device, storage medium and processor |
CN110990522A (en) * | 2018-09-30 | 2020-04-10 | 北京国双科技有限公司 | Legal document determining method and system |
CN110990522B (en) * | 2018-09-30 | 2023-07-04 | 北京国双科技有限公司 | Legal document determining method and system |
CN111259160A (en) * | 2018-11-30 | 2020-06-09 | 百度在线网络技术(北京)有限公司 | Knowledge graph construction method, device, equipment and storage medium |
CN111259160B (en) * | 2018-11-30 | 2023-08-29 | 百度在线网络技术(北京)有限公司 | Knowledge graph construction method, device, equipment and storage medium |
CN111291152A (en) * | 2018-12-07 | 2020-06-16 | 北大方正集团有限公司 | Case document recommendation method, device, equipment and storage medium |
CN111382769A (en) * | 2018-12-29 | 2020-07-07 | 阿里巴巴集团控股有限公司 | Information processing method, device and system |
CN111382769B (en) * | 2018-12-29 | 2023-09-22 | 阿里巴巴集团控股有限公司 | Information processing method, device and system |
CN110162590A (en) * | 2019-02-22 | 2019-08-23 | 北京捷风数据技术有限公司 | Database display method and device for project tender texts combined with economic factors |
CN110209760A (en) * | 2019-06-13 | 2019-09-06 | 北京百度网讯科技有限公司 | Method and apparatus for associating cases across trial levels, electronic equipment, and computer-readable medium |
CN110362799A (en) * | 2019-06-17 | 2019-10-22 | 平安科技(深圳)有限公司 | Method, device and computer equipment for generating and processing awards based on online arbitration |
CN110362799B (en) * | 2019-06-17 | 2024-02-06 | 平安科技(深圳)有限公司 | On-line arbitration-based method and device for generating and processing resolution book and computer equipment |
CN110472048A (en) * | 2019-07-19 | 2019-11-19 | 平安科技(深圳)有限公司 | Auxiliary judgment method, apparatus and terminal device |
CN110738039A (en) * | 2019-09-03 | 2020-01-31 | 平安科技(深圳)有限公司 | Prompting method, device, storage medium and server for case auxiliary information |
CN110727787A (en) * | 2019-10-11 | 2020-01-24 | 北京明略软件系统有限公司 | Case text matching method and device, electronic equipment and storage medium |
CN111144095B (en) * | 2019-11-26 | 2024-04-05 | 方正璞华软件(武汉)股份有限公司 | Method and device for generating work case judgment |
CN111144095A (en) * | 2019-11-26 | 2020-05-12 | 方正璞华软件(武汉)股份有限公司 | Method and device for generating work damage case sanction book |
CN112925877B (en) * | 2019-12-06 | 2023-07-07 | 中国科学院软件研究所 | One-person-multiple-case association identification method and system based on deep measurement learning |
CN112925877A (en) * | 2019-12-06 | 2021-06-08 | 中国科学院软件研究所 | One-person multi-case association identification method and system based on deep metric learning |
WO2021164226A1 (en) * | 2020-02-20 | 2021-08-26 | 平安科技(深圳)有限公司 | Method and apparatus for querying knowledge map of legal cases, device and storage medium |
CN112784007A (en) * | 2020-07-16 | 2021-05-11 | 上海芯翌智能科技有限公司 | Text matching method and device, storage medium and computer equipment |
CN112784007B (en) * | 2020-07-16 | 2023-02-21 | 上海芯翌智能科技有限公司 | Text matching method and device, storage medium and computer equipment |
CN117453856A (en) * | 2023-10-19 | 2024-01-26 | 中国司法大数据研究院有限公司 | Method and device for extracting calendar and examination case series based on multi-source data fusion |
CN117453856B (en) * | 2023-10-19 | 2024-05-07 | 中国司法大数据研究院有限公司 | Method and device for extracting hold court trial pieces of calendar series based on multi-source data fusion |
CN117830060A (en) * | 2024-03-04 | 2024-04-05 | 天津财经大学 | Injury crime law enforcement supervision and auxiliary decision-making system based on knowledge graph |
CN117830060B (en) * | 2024-03-04 | 2024-05-28 | 天津财经大学 | Injury crime law enforcement supervision and auxiliary decision-making system based on knowledge graph |
Also Published As
Publication number | Publication date |
---|---|
CN106991092B (en) | 2021-11-05 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106991092A (en) | Method and apparatus for mining similar judgement documents based on big data | |
CN110825881B (en) | Method for establishing electric power knowledge graph | |
CN108984745B (en) | Neural network text classification method fusing multiple knowledge maps | |
CN110309331A (en) | Self-supervised cross-modal deep hashing retrieval method | |
Wen et al. | Research on keyword extraction based on word2vec weighted textrank | |
CN109271537B (en) | Text-to-image generation method and system based on distillation learning | |
CN107526799A (en) | Knowledge graph construction method based on deep learning | |
CN106855853A (en) | Entity relation extraction system based on deep neural network | |
CN110555084B (en) | Remote supervision relation classification method based on PCNN and multi-layer attention | |
CN110889786A (en) | Legal action insured advocate security use judging service method based on LSTM technology | |
CN109165275B (en) | Intelligent substation operation ticket information intelligent search matching method based on deep learning | |
CN110866121A (en) | Knowledge graph construction method for power field | |
CN111914555B (en) | Automatic relation extraction system based on Transformer structure | |
CN112800184B (en) | Short text comment emotion analysis method based on Target-Aspect-Opinion joint extraction | |
CN111062214A (en) | Integrated entity linking method and system based on deep learning | |
Soysal et al. | An introduction to zero-shot learning: An essential review | |
CN116721176B (en) | Text-to-face image generation method and device based on CLIP supervision | |
CN112084788B (en) | Automatic labeling method and system for implicit emotion tendencies of image captions | |
CN115203429B (en) | Automatic knowledge graph expansion method for constructing ontology framework in auditing field | |
Ronghui et al. | Application of Improved Convolutional Neural Network in Text Classification. | |
CN115098646A (en) | Multilevel relation analysis and mining method for image-text data | |
CN113688233A (en) | Text understanding method for semantic search of knowledge graph | |
Tan et al. | Sentiment analysis of chinese short text based on multiple features | |
Hua et al. | Deep semantic correlation with adversarial learning for cross-modal retrieval | |
Perwej et al. | The State-of-the-Art Handwritten Recognition of Arabic Script Using Simplified Fuzzy ARTMAP and Hidden Markov Models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||