CN106991092A - Method and apparatus for mining similar judgment documents based on big data - Google Patents

Method and apparatus for mining similar judgment documents based on big data

Info

Publication number
CN106991092A
Authority
CN
China
Prior art keywords
text
keyword
word
judgement document
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610038106.XA
Other languages
Chinese (zh)
Other versions
CN106991092B (en)
Inventor
王浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610038106.XA
Publication of CN106991092A
Application granted
Publication of CN106991092B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a method and apparatus for mining similar judgment documents based on big data. A large number of published judgment documents are obtained, together with the cause of action of each judgment document. Based on the text content of each judgment document, text topic feature information about the case facts of the judgment document and several pieces of keyword-related information about the parties' disputed issues and the parties' claims in the judgment document are obtained, and a text feature vector of the judgment document is built from the text topic feature information and the keyword-related information. A feature lexicon of keywords is updated based on the keyword-related information. In this way each of the published judgment documents is represented accurately as a text feature vector and the keyword feature lexicon is kept up to date, so that similar judgment documents can be obtained quickly and the efficiency of mining similar judgment documents is improved.

Description

Method and apparatus for mining similar judgment documents based on big data
Technical field
The application relates to the computer field, and in particular to a technique for mining similar judgment documents based on big data.
Background
With the rapid development of Internet technology, text data on the network is growing explosively, and finding the small amount of useful text data within this mass of text is becoming more and more difficult. For example, in systems that hold large amounts of text data, such as automatic question answering systems, intelligent retrieval systems and mail filtering systems, finding useful text data is increasingly difficult, time-consuming and laborious.
In the prior art, in the court business scenario, useful similar judgment documents need to be mined in advance or in real time before a judge makes findings of fact and reaches a judgment in a case being tried. For example, a people's court compares the judgments that several judges reached in different cases with similar case facts and similar claims by the parties, in order to review whether a judge's judgment is reasonable. Likewise, while actually hearing a case, a judge refers to the judgment documents of existing cases with similar facts before forming the final findings of fact and the judgment recorded in the judgment document. In practice, the search for useful similar judgment documents in a people's court relies on a large amount of manual labeling and searching, which is time-consuming and laborious; moreover, the quality of the similar judgment documents found manually depends entirely on personal experience. Court business needs are therefore not well met, and efficiency is low. In addition, courts at different levels record judgment documents in different styles, and the key case facts and the parties' key claims are usually mined with search templates or traditional natural language processing methods. These methods easily extract the wrong case facts and claims and, in particular, cannot extract the parties' points of dispute, so the accuracy of the similar judgment documents mined is low. Furthermore, because a case under trial is confidential, the text of that case cannot be entered in real time to query similar judgment documents, so the query has poor real-time performance. Finally, the similar judgment documents that are found contain a large amount of text and complex content, and their judgment results have to be extracted manually, so the judgment results of the retrieved similar judgment documents are poorly visualized and the court handles business involving the text of a case under trial inefficiently.
Therefore, in the prior art, searching a massive amount of text data for judgment documents similar to a given input case text is time-consuming and laborious, has poor real-time performance and low accuracy, and makes the search business inefficient.
Summary of the invention
The purpose of the application is to provide a method and apparatus for mining similar judgment documents based on big data, so as to solve the prior-art problem that searching a large number of published judgment documents for judgment documents similar to a given input case text is time-consuming and laborious, has poor real-time performance and low accuracy, and makes the search business inefficient.
According to one aspect of the application, a method for mining similar judgment documents based on big data at a first device is provided, including:
obtaining a large number of published judgment documents, and obtaining the cause of action of each judgment document;
obtaining, based on the text content of each judgment document, text topic feature information about the case facts of the judgment document and several pieces of keyword-related information about the parties' disputed issues and the parties' claims in the judgment document, and building a text feature vector of the judgment document based on the text topic feature information and the keyword-related information;
updating a feature lexicon of keywords based on the keyword-related information.
According to a further aspect of the application, a method for mining similar judgment documents based on big data at a second device is provided, including:
obtaining an input case text, and extracting several candidate keywords of the input case text based on the feature lexicon of keywords in a retrieval database;
obtaining, based on the text content of the input case text and the candidate keywords, text topic feature information and several pieces of keyword-related information of the input case text, and building a text feature vector of the input case text based on the text topic feature information and the keyword-related information;
obtaining from the retrieval database several candidate judgment documents that have the same cause of action as the input case text;
calculating the similarity between the text feature vector of each candidate judgment document and the text feature vector of the input case text, and selecting similar judgment documents based on the similarity.
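The retrieval steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the application does not name a similarity measure, so cosine similarity is assumed here, and the record fields `id`, `cause` and `vector`, the function names and the `top_n` parameter are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(query_vector, query_cause, database, top_n=3):
    """Keep candidates with the same cause of action, then rank by similarity."""
    candidates = [doc for doc in database if doc["cause"] == query_cause]
    scored = [(cosine_similarity(query_vector, doc["vector"]), doc["id"])
              for doc in candidates]
    # Highest similarity first; only the score is used as the sort key.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]
```

Filtering by cause of action before scoring keeps the similarity computation restricted to the candidate set, which is the order of operations the method describes.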
According to a further aspect of the application, a first device for mining similar judgment documents based on big data is provided, including:
a judgment document acquisition apparatus for obtaining a large number of published judgment documents and obtaining the cause of action of each judgment document;
a text feature mining apparatus for obtaining, based on the text content of each judgment document, text topic feature information about the case facts of the judgment document and several pieces of keyword-related information about the parties' disputed issues and the parties' claims in the judgment document, and for building a text feature vector of the judgment document based on the text topic feature information and the keyword-related information;
a feature lexicon building apparatus for updating a feature lexicon of keywords based on the keyword-related information.
According to a further aspect of the application, a second device for mining similar judgment documents based on big data is provided, including:
an input apparatus for obtaining an input case text and extracting several candidate keywords of the input case text based on the feature lexicon of keywords in a retrieval database;
an input case text feature mining apparatus for obtaining, based on the text content of the input case text and the candidate keywords, text topic feature information and several pieces of keyword-related information of the input case text, and for building a text feature vector of the input case text based on the text topic feature information and the keyword-related information;
a candidate judgment document acquisition apparatus for obtaining from the retrieval database several candidate judgment documents that have the same cause of action as the input case text;
a similar judgment document acquisition apparatus for calculating the similarity between the text feature vector of each candidate judgment document and the text feature vector of the input case text, and for selecting similar judgment documents based on the similarity.
According to a further aspect of the application, a system for mining similar judgment documents based on big data is provided, the system including a first device and a second device, wherein:
the first device includes: a judgment document acquisition apparatus for obtaining a large number of published judgment documents and obtaining the cause of action of each judgment document; a text feature mining apparatus for obtaining, based on the text content of each judgment document, text topic feature information about the case facts of the judgment document and several pieces of keyword-related information about the parties' disputed issues and the parties' claims in the judgment document, and for building a text feature vector of the judgment document based on the text topic feature information and the keyword-related information; a feature lexicon building apparatus for updating a feature lexicon of keywords based on the keyword-related information; a text structuring apparatus for structuring the judgment documents to obtain structured text information; a structured text information acquisition apparatus for obtaining judgment-related information of the judgment documents based on the structured text information, the judgment-related information including party information, case type, cause of action and judgment result; and a sending apparatus for sending the text feature vectors of all the judgment documents, the feature lexicon and the judgment-related information to the retrieval database of the second device;
the second device includes: a receiving apparatus for receiving from the first device the text feature vectors of the published judgment documents, the feature lexicon and the judgment-related information obtained by the first device, and for saving them in the retrieval database, the judgment-related information including party information, case type, cause of action and judgment result; a structured text information receiving apparatus for receiving the structured text information that the first device obtained by structuring the judgment documents; a structured text information acquisition apparatus for obtaining the structured text information of the similar judgment documents; an input apparatus for obtaining an input case text and extracting several candidate keywords of the input case text based on the feature lexicon of keywords in the retrieval database; an input case text feature mining apparatus for obtaining, based on the text content of the input case text and the candidate keywords, text topic feature information and several pieces of keyword-related information of the input case text, and for building a text feature vector of the input case text based on the text topic feature information and the keyword-related information; a candidate judgment document acquisition apparatus for obtaining from the retrieval database several candidate judgment documents that have the same cause of action as the input case text; and a similar judgment document acquisition apparatus for calculating the similarity between the text feature vector of each candidate judgment document and the text feature vector of the input case text, and for selecting similar judgment documents based on the similarity.
Compared with the prior art, the method and apparatus for mining similar judgment documents based on big data at a first device according to the embodiments of the application obtain a large number of published judgment documents and the cause of action of each judgment document; obtain, based on the text content of each judgment document, text topic feature information about the case facts of the judgment document and several pieces of keyword-related information about the parties' disputed issues and the parties' claims in the judgment document; and build a text feature vector of the judgment document based on the text topic feature information and the keyword-related information. Each published judgment document is thus mined along three key elements, namely the text topic feature information about its case facts and the keyword-related information about the parties' disputed issues and the parties' claims, and is represented accurately as a text feature vector. This avoids the time-consuming and laborious manual analysis of a large number of judgment documents that are long, complex in content and varied in style, and so effectively improves the efficiency of mining similar judgment documents. In addition, the feature lexicon of keywords is updated based on the keyword-related information, so that the text content of the judgment documents is identified precisely through a feature lexicon built from all keywords, their word topic features and their expansion words. Similar judgment documents and their corresponding text feature vectors can therefore be obtained quickly, which improves the efficiency of mining similar judgment documents.
Further, the method and apparatus for mining similar judgment documents based on big data at a second device according to the embodiments of the application first obtain an input case text and extract several candidate keywords of the input case text based on the feature lexicon of keywords in the retrieval database, so that the keywords obtained from the input case text can be found in the retrieval database, which effectively improves the keyword-based search for judgment documents similar to the input case text. Next, text topic feature information and several pieces of keyword-related information of the input case text are obtained based on the text content of the input case text and the candidate keywords, and a text feature vector of the input case text is built from them, so that the relevant information of the input case text is expressed effectively in the form of a text feature vector. Finally, several candidate judgment documents that have the same cause of action as the input case text are obtained from the retrieval database, the similarity between the text feature vector of each candidate judgment document and the text feature vector of the input case text is calculated, and similar judgment documents are selected based on the similarity. The text feature vectors of the candidate judgment documents sent from the first device are thus compared with the text feature vector mined in real time from the input case text, so that judgment documents similar to the input case text can be selected quickly and accurately from the large number of published judgment documents. This avoids the time-consuming and laborious manual analysis of a large number of judgment documents that are long, complex in content and varied in style, and so effectively improves the efficiency of mining similar texts.
Brief description of the drawings
Other features, objects and advantages of the application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a schematic flowchart of a method for mining similar judgment documents based on big data at a first device according to one aspect of the application;
Fig. 2 is a schematic flowchart of a method for mining the text feature vector of a judgment document based on big data at the first device according to a preferred embodiment of one aspect of the application;
Fig. 3 is a schematic flowchart of a method for mining similar judgment documents based on big data at a second device according to one aspect of the application;
Fig. 4 is a schematic flowchart of a method for mining the text feature vector of a judgment document based on big data at the second device according to a preferred embodiment of one aspect of the application;
Fig. 5 is a schematic flowchart of an overall method for mining similar judgment documents based on big data according to one aspect of the application;
Fig. 6 is a schematic structural diagram of a first device for mining similar judgment documents based on big data according to one aspect of the application;
Fig. 7 is a schematic flowchart of the stages of a court trial, as used by the first device for mining similar judgment documents based on big data according to one aspect of the application;
Fig. 8 is a schematic structural diagram of the text feature mining apparatus 12 with which a cloud computing server mines the text feature vector of a judgment document according to a preferred embodiment of one aspect of the application;
Fig. 9 is a schematic structural diagram of a second device for mining similar judgment documents based on big data according to one aspect of the application;
Fig. 10 is a schematic structural diagram of the input case text feature mining apparatus 22 in a court intranet server for mining similar judgment documents based on big data according to a preferred embodiment of one aspect of the application;
Fig. 11 is a schematic diagram of a system for mining similar judgment documents based on big data according to one aspect of the application.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed description
The application is described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a method for mining similar judgment documents based on big data at a first device according to one aspect of the application. The method includes step S11, step S12 and step S13.
In step S11, a large number of published judgment documents are obtained, and the cause of action of each judgment document is obtained. In step S12, text topic feature information about the case facts of the judgment document and several pieces of keyword-related information about the parties' disputed issues and the parties' claims in the judgment document are obtained based on the text content of each judgment document, and a text feature vector of the judgment document is built based on the text topic feature information and the keyword-related information. In step S13, the feature lexicon of keywords is updated based on the keyword-related information.
In step S11, the causes of action of the judgment documents include but are not limited to contract disputes, marriage disputes, ownership infringement and voluntary service disputes, and cases to which special procedures apply. Of course, the causes of action of judgment documents in all existing and future court business scenarios, where applicable to the application, are also included in the application by reference.
In step S13, the feature lexicon of keywords includes all the keyword-related information of the large number of published judgment documents and the expansion-word-related information corresponding to the keywords.
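One possible in-memory layout for such a feature lexicon (feature dictionary) is sketched below in Python. The application does not prescribe a data structure, so the dictionary layout, the field names `keyword`, `importance`, `topic_feature` and `expansions`, and the policy of keeping the highest relevance seen for an expansion word are all assumptions made for illustration.

```python
def update_feature_lexicon(lexicon, keyword_infos):
    """Merge keyword-related information into the lexicon (hypothetical layout).

    lexicon maps keyword -> {"importance", "topic_feature", "expansions"},
    where "expansions" maps an expansion word to its relevance.
    """
    for info in keyword_infos:
        entry = lexicon.setdefault(
            info["keyword"],
            {"importance": 0.0, "topic_feature": None, "expansions": {}},
        )
        # Assumption: the most recent importance and topic feature win.
        entry["importance"] = info["importance"]
        entry["topic_feature"] = info["topic_feature"]
        for word, relevance in info.get("expansions", {}).items():
            previous = entry["expansions"].get(word, 0.0)
            # Assumption: keep the highest relevance observed so far.
            entry["expansions"][word] = max(previous, relevance)
    return lexicon
```

Calling the function once per processed judgment document accumulates keywords and expansion words across the whole corpus.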
Here, the judgment documents include but are not limited to judgment documents in the court business scenario, including fact-finding documents of courts of first instance, fact-finding documents of courts of second instance, fact-finding documents of retrial courts, complaints, statements of defense, interrogation records, witness testimony and the like.
The application is explained in detail below through specific embodiments, taking judgment documents in the court business scenario as an example. This choice is made only by way of example; the embodiments of the application are not limited to it, and the following embodiments can equally be implemented in other software programs.
Judgment documents in the people's court business scenario are long and complex in content, and regional differences also make the style in which they are recorded vary. The large number of published judgment documents therefore needs text feature processing, so that judicial staff can find the similar judgment documents they need among the published judgment documents as quickly as possible. The required judgment documents should be searched for along three aspects: the case facts of the judgment document, the parties' disputed issues and the parties' claims.
It should be noted that the text topic feature information includes but is not limited to the case facts of a judgment document in the court business scenario, and the keywords include but are not limited to the parties' disputed issues and the parties' claims in such a judgment document. In the following preferred embodiment of one aspect of the application, the parties' disputed issues and the parties' claims serve as the keywords of the judgment document, and the case facts serve as its text topic feature information, when the text feature vector of the judgment document is mined.
In a preferred embodiment of one aspect of the application, a large number of published judgment documents and the cause of action of each judgment document are obtained; text topic feature information about the case facts of the judgment document and several pieces of keyword-related information about the parties' disputed issues and the parties' claims in the judgment document are obtained based on the text content of each judgment document; and a text feature vector of the judgment document is built based on the text topic feature information and the keyword-related information. The words in the judgment document that express the parties' disputed issues and the parties' claims are extracted as keywords, words related to those disputed issues and claims are extracted as expansion words of the keywords, and the content about the case facts is mined as the text topic feature information, so that a judgment document in the court business scenario is represented as a text feature vector. The text content of a long and complex judgment document is thereby expressed efficiently and accurately, and judicial staff can quickly find the similar judgment documents they need through the case facts, the parties' disputed issues and the parties' claims. Further, the feature lexicon of keywords is updated based on the keyword-related information, so that when judicial staff enter a keyword and its expansion words, the judgment documents related to them can be found from the feature lexicon as quickly as possible, which effectively improves working efficiency in the court business scenario.
Specifically, in step S11, a large number of published judgment documents are obtained. For example, the published judgment documents are crawled in the court business scenario: under the rules of the Supreme People's Court almost all judgment documents must be made public, so once authorized by the Supreme People's Court, all published judgment documents can be crawled. An ordinary web crawler can be used to capture, for every judgment document in the court business scenario, the corresponding title, content, judgment number, adjudicating court, judge, judgment date and similar information.
Further, step S14 (not shown) and step S15 (not shown) are included after step S11 and before step S12. In step S14, the judgment document is structured to obtain structured text information. In step S15, judgment-related information of the judgment document is obtained based on the structured text information, the judgment-related information including party information, case type, cause of action and judgment result.
In the embodiments of the application, step S14 (not shown) mainly performs text preprocessing and structuring on the large number of published judgment documents obtained in step S11. For example, after the published judgment documents have been crawled from web pages in the court business scenario in step S11, the text content of each crawled judgment document needs to be extracted, and the text of the judgment document needs to be preprocessed and structured. In step S14, the text content of the judgment document is first extracted by a web page parsing method (pageparse), which mainly extracts the content of the different parts of a judgment document by means of configured web page templates. The judgment document is then preprocessed: Chinese spaces and similar characters are replaced with their English counterparts, numeric values are normalized to Arabic numerals, newlines are removed from the document content, and document numbers and court names are normalized. The preprocessed judgment document is then structured, which covers the following four aspects: (1) extracting the names of the plaintiff and the defendant in the judgment document, and normalizing how those names and the plaintiff and defendant are referred to in the content; (2) extracting the case type of the judgment document, the case types being divided into seven major types of judgment document: criminal litigation, civil litigation, administrative litigation, intellectual property disputes, rulings, compensation cases and enforcement cases; (3) extracting the cause of action of the case in the judgment document, and normalizing it to a cause of action in the people's courts' standard cause-of-action library; (4) extracting the judgment result of the judgment document, mainly the object of the judgment, the principal penalty, the accessory penalty, the compensation amount and which party won or lost.
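The text preprocessing described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure used in the application: Unicode NFKC normalization stands in for "replacing Chinese spaces and similar characters with English ones", only runs of two or more Chinese digit characters (such as years) are converted so that single characters inside names or ordinary words are left alone, and normalization of document numbers and court names is omitted. The function name `preprocess` is hypothetical.

```python
import re
import unicodedata

# Digit-by-digit mapping; adequate for years such as "二〇一五",
# not for quantities such as "十二" (a simplifying assumption).
CHINESE_DIGITS = {
    "〇": "0", "零": "0", "一": "1", "二": "2", "三": "3", "四": "4",
    "五": "5", "六": "6", "七": "7", "八": "8", "九": "9",
}
DIGIT_RUN = re.compile("[" + "".join(CHINESE_DIGITS) + "]{2,}")


def preprocess(text):
    # NFKC folds full-width spaces, digits and punctuation to ASCII forms.
    text = unicodedata.normalize("NFKC", text)
    # Remove line breaks inside the document content.
    text = re.sub(r"[\r\n]+", "", text)
    # Convert runs of Chinese digit characters to Arabic numerals.
    text = DIGIT_RUN.sub(
        lambda m: "".join(CHINESE_DIGITS[ch] for ch in m.group()), text
    )
    # Collapse repeated spaces left over from the source page.
    return re.sub(r" {2,}", " ", text).strip()
```

Restricting the numeral conversion to runs of digits is a deliberate trade-off: it misses some numbers but avoids corrupting words that merely contain a digit character.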
Further, in step S12, text topic feature information about the case facts of the judgment document and several pieces of keyword-related information about the parties' disputed issues and the parties' claims in the judgment document are obtained based on the text content of each judgment document, and a text feature vector of the judgment document is built based on the text topic feature information and the keyword-related information. The specific implementation of step S12 is shown in Fig. 2, which is a schematic flowchart of a method for mining the text feature vector of a judgment document based on big data at the first device according to a preferred embodiment of one aspect of the application. Step S12 specifically includes step S121, step S122, step S123 and step S124.
Step S121 includes: extracting the text topic feature information of the judgment document and the word topic feature of each word in the judgment document. Step S122 includes: obtaining the contextual relations between the words, correcting the word topic feature of each word based on the contextual relations, and determining several pieces of keyword-related information of the judgment document based on the degree of matching between the corrected word topic feature of each word and the text topic feature information, wherein the keyword-related information includes a keyword, keyword importance information and the word topic feature corresponding to the keyword. Step S123 includes: updating the text topic feature information of the judgment document based on the keyword-related information. Step S124 includes: obtaining expansion-word-related information based on the keyword-related information, the expansion-word-related information including the expansion words of the keyword and the relevance of each expansion word; building bag-of-words feature information based on the keyword-related information and the expansion-word-related information; and determining the text feature vector of the judgment document based on the updated text topic feature information and the bag-of-words feature information.
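As a rough illustration of step S124, the Python sketch below combines topic features with bag-of-words features built from keywords and their expansion words. How the two parts are weighted and combined is not specified at this point in the text, so everything here is an assumption: keywords are weighted by their importance, expansion words by keyword importance multiplied by relevance, and the final vector is a simple concatenation. All names and parameters are hypothetical.

```python
def build_text_feature_vector(topic_features, keywords, expansions, vocabulary):
    """Concatenate topic features with bag-of-words features (assumed scheme).

    topic_features: list of floats, one per topic
    keywords:       dict keyword -> importance
    expansions:     dict keyword -> {expansion word: relevance}
    vocabulary:     ordered list of words that defines the bag-of-words axes
    """
    index = {word: i for i, word in enumerate(vocabulary)}
    bag = [0.0] * len(vocabulary)
    for word, importance in keywords.items():
        if word in index:
            bag[index[word]] = max(bag[index[word]], importance)
    for keyword, related in expansions.items():
        base = keywords.get(keyword, 0.0)
        for word, relevance in related.items():
            if word in index:
                # Assumption: an expansion word inherits a discounted weight.
                bag[index[word]] = max(bag[index[word]], base * relevance)
    return list(topic_features) + bag
```

Because every document is projected onto the same vocabulary and topic axes, the resulting vectors have equal length and can be compared directly.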
Specifically, in step S121, the text subject characteristic information of the judgement document indicates the facts of the case. In the embodiments of the present application, a topic model method is preferably used to extract the text subject characteristic information of the judgement document and the word theme feature of each word, where the topic model method is the same as topic model methods in the prior art. Of course, other existing or future methods of extracting the text subject characteristic information of a judgement document and the word theme feature of each word, if applicable to the present application, also fall within its scope of protection and are incorporated herein by reference.
Further, step S122 includes: obtaining the context word co-occurrence relation between the words; obtaining the context transition probability between any two words; correcting the word theme feature of each word based on the context word co-occurrence relation and the context transition probability; and determining several pieces of keyword relational information of the judgement document based on the degree of matching between each corrected word theme feature and the text subject characteristic information, where the keyword relational information includes the keyword, the keyword importance information and the word theme feature corresponding to the keyword.
In the embodiments of the present application, step S122 relies on the text subject characteristic information of the judgement document and the word theme feature of each word extracted in step S121. It obtains the context word co-occurrence relation between the words; obtains the context transition probability between any two words; corrects the word theme feature of each word based on the co-occurrence relation and the transition probability; determines the keywords of the judgement document and their word theme features based on the degree of matching between each corrected word theme feature and the text subject characteristic information; and obtains the importance information of each keyword. For example, for the i-th word Wi in a judgement document Ds, if the topic corresponding to the word is Tj, then according to the topic model method the transition probability with which word Wi occurs in judgement document Ds is Pj(Wi|Ds) = P(Wi|Tj) × P(Tj|Ds), where P(Wi|Tj) is the transition probability of word Wi under topic Tj and P(Tj|Ds) is the transition probability of topic Tj in judgement document Ds. The topics of the word are then enumerated one by one to obtain all transition probabilities Pj(Wi|Ds), where j ranges over the positive integers from 1 to k, and a topic is selected for the i-th word Wi in Ds according to these probabilities. The simplest and most common method is to take the topic Tj that maximises Pj(Wi|Ds), i.e. the j that attains max[j] Pj(Wi|Ds). If the i-th word Wi in Ds now selects a topic different from the word theme feature obtained in step S121, this changes the transition probability of the word under the given topic and the transition probability of each topic in the judgement document, and these in turn affect the calculation of the transition probability with which Wi occurs in Ds. Therefore, computing Pj(Wi|Ds) once for all judgement documents and reselecting the topic of each word is regarded as one iteration. After n such iterations, the words corresponding to the converged word theme features are the keywords of the judgement document, and the word theme feature of each keyword is the one determined after iteration. Keywords determined in this way express the judgement document, and the word features of its keywords, more effectively and accurately.
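The topic selection rule described above can be sketched as follows. This is a minimal illustration of a single selection pass only, assuming the probabilities P(Wi|Tj) and P(Tj|Ds) are already available as plain Python structures; the function name `select_topics` and the sample probabilities are hypothetical, and the re-estimation of the probabilities between iterations is not shown.

```python
def select_topics(doc_words, p_word_given_topic, p_topic_given_doc):
    """For each word Wi, pick the topic j that maximises P(Wi|Tj) * P(Tj|Ds).

    p_word_given_topic: list indexed by topic j, each a dict word -> P(Wi|Tj)
    p_topic_given_doc:  list indexed by topic j holding P(Tj|Ds)
    (Both structures are illustrative assumptions, not taken from the source.)
    """
    assignments = {}
    for word in doc_words:
        # Pj(Wi|Ds) for every candidate topic j = 0 .. k-1
        scores = [
            p_word_given_topic[j].get(word, 0.0) * p_topic_given_doc[j]
            for j in range(len(p_topic_given_doc))
        ]
        # Simplest rule named in the text: take the topic with the largest score
        assignments[word] = max(range(len(scores)), key=scores.__getitem__)
    return assignments


# Made-up probabilities for two topics
p_word_given_topic = [
    {"loan": 0.6, "interest": 0.3, "injury": 0.1},  # topic 0
    {"loan": 0.1, "interest": 0.1, "injury": 0.8},  # topic 1
]
p_topic_given_doc = [0.7, 0.3]
topics = select_topics(["loan", "injury"], p_word_given_topic, p_topic_given_doc)
```

In a full implementation this pass would be repeated, with the probabilities recomputed from the new assignments after each pass, until the assignments stop changing.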
In the embodiments of the present application, in step S123, the text subject characteristic information of the judgement document is updated based on the keyword relational information determined in step S122. For example, the text subject characteristic information of the judgement document is updated by the following formula:
$$D = \sum_{i=1}^{n} w_i I_i$$
where D denotes the updated text subject characteristic information, the judgement document contains n keywords, w_i is the importance information of the i-th keyword in the judgement document, and I_i is the word theme feature of the i-th keyword. Taking the weighted sum of the word theme features of the keywords in the judgement document yields the text subject characteristic information of the judgement document and effectively removes the influence of unimportant words on it.
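The weighted sum over keyword theme features can be illustrated with a short sketch. The function name `update_topic_feature` and the sample values are assumptions made for illustration; the source only specifies that D is the importance-weighted sum of the keywords' word theme features.

```python
def update_topic_feature(keywords):
    """Weighted sum of keyword theme features.

    keywords: list of (importance w_i, theme feature vector I_i) pairs,
    all vectors having the same length (illustrative structure).
    """
    dim = len(keywords[0][1])
    d = [0.0] * dim
    for importance, feature in keywords:
        for k in range(dim):
            d[k] += importance * feature[k]
    return d


# Two keywords with two-dimensional theme features (made-up values)
D = update_topic_feature([(0.5, [1.0, 0.0]), (0.25, [0.0, 2.0])])
```

Words that are not keywords contribute nothing to D, which is how unimportant words are kept out of the document-level feature.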
Further, in step S124, expansion word relevant information is obtained based on the keyword relational information, where the expansion word relevant information includes the expansion words of the keyword and the expansion word degree of correlation. The expansion words include the synonyms of the keyword and the words highly related to the keyword in the judgement document. In the embodiments of the present application, synonyms are mined by computing the theme feature similarity of any two words. For example, for keyword A, the several words with the highest similarity are taken as the synonyms of keyword A. The highly related words of a keyword are computed by a word vector algorithm (word2vector): the algorithm computes a word vector for each word and then the word vector similarity of any two words, so as to mine highly related words. For example, for keyword A, the several words with the highest word vector similarity are taken as the words highly related to keyword A.
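Both mining steps above reduce to taking the top few neighbours of a keyword under a vector similarity. The sketch below assumes cosine similarity as the measure, which the source does not specify for this step, and uses hypothetical names (`cosine`, `top_k_similar`) and made-up vectors; the vectors could equally be word theme features or word vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_similar(keyword, vectors, k):
    """Return the k words most similar to `keyword`, with their similarity."""
    scored = [
        (other, cosine(vectors[keyword], vec))
        for other, vec in vectors.items()
        if other != keyword
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up two-dimensional vectors
vectors = {"loan": [1.0, 0.0], "borrowing": [0.9, 0.1], "injury": [0.0, 1.0]}
expansions = top_k_similar("loan", vectors, k=1)
```

The similarity returned with each word can serve as the degree of synonymy or degree of correlation used later in the bag of words features.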
Further, in step S124, expansion word relevant information is obtained based on the keyword relational information, and bag of words characteristic information is built based on the keyword relational information and the expansion word relevant information. Specifically, step S124 includes: determining the expansion words of the keyword and the expansion word degree of correlation based on the keyword and its word theme feature, where the expansion words include the synonyms of the keyword and the words highly related to it in the judgement document; and building the bag of words characteristic information with a bag of words model, based on the keyword and its word theme feature together with the expansion words and the expansion word degree of correlation.
In the embodiments of the present application, the bag of words characteristic information indicates the word features of the keywords in the judgement document and of their expansion words. In the bag of words characteristic information, the characteristic value of a keyword feature is the importance information of the keyword in the judgement document; the characteristic value of a synonym feature is the product of the keyword importance information and the degree of synonymy; and the characteristic value of a related word feature is the product of the keyword importance information and the degree of correlation. For example, if all judgement documents together contain 100,000 distinct words, the bag of words characteristic information of each judgement document is a 100,000-dimensional vector, each dimension of which marks whether the word at that position occurs in the judgement document. For example, suppose word word1 is the 1st dimension of the bag of words characteristic information, word2 the 2nd dimension, word3 the 10th dimension and word4 the 30th dimension; word3 and word1 are similar words with similarity weight13, and word4 and word2 are similar words with similarity weight24. Judgement document A contains the words word1, word3 and word4, whose importance information in A is weight1, weight3 and weight4 respectively. Then, in the bag of words characteristic information of judgement document A, the characteristic value of the 1st dimension is weight1+weight13*weight3, that of the 2nd dimension is weight24*weight4, that of the 10th dimension is weight3+weight1*weight13, and that of the 30th dimension is weight4. The characteristic values of the words highly related to a keyword can be obtained by the same computation, so the resulting bag of words characteristic information contains both the characteristic values of the keywords and the characteristic values of the expansion words.
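The worked example above can be reproduced with a small sketch. It assumes that similarity is symmetric and that a word's importance is propagated to each of its similar words; the function name `bag_of_words`, the 0-based indexing and the numeric values are illustrative assumptions.

```python
def bag_of_words(importance, similar, index, dim):
    """Build a bag of words feature vector with expansion word propagation.

    importance: word -> importance of the word in the document
    similar:    (word_a, word_b) -> similarity, treated as symmetric
    index:      word -> 0-based position in the vector
    """
    vec = [0.0] * dim
    # Words present in the document contribute their own importance
    for word, weight in importance.items():
        vec[index[word]] += weight
    # Each present word also contributes importance * similarity
    # to the dimension of every word it is similar to
    for (a, b), sim in similar.items():
        for src, dst in ((a, b), (b, a)):
            if src in importance:
                vec[index[dst]] += sim * importance[src]
    return vec


# Positions follow the example: dimensions 1, 2, 10, 30 become 0, 1, 9, 29
index = {"word1": 0, "word2": 1, "word3": 9, "word4": 29}
importance = {"word1": 0.5, "word3": 0.25, "word4": 1.0}   # weight1, weight3, weight4
similar = {("word1", "word3"): 0.5, ("word2", "word4"): 0.5}  # weight13, weight24
vec = bag_of_words(importance, similar, index, dim=30)
```

With these values the 1st dimension is weight1 + weight13*weight3 = 0.625, the 2nd is weight24*weight4 = 0.5, the 10th is weight3 + weight1*weight13 = 0.5 and the 30th is weight4 = 1.0, matching the pattern in the text.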
Further, step S124 determines the text feature vector of the judgement document based on the updated text subject characteristic information and the bag of words characteristic information. Specifically, step S124 includes: merging the updated text subject characteristic information with the bag of words characteristic information to determine the raw text feature of the judgement document; and performing feature normalisation on the raw text feature of the judgement document to determine the text feature vector of the judgement document.
For example, the text subject characteristic information of the judgement document obtained in step S123 and the bag of words characteristic information are concatenated into one feature vector to generate the raw text feature of the judgement document. For example, if the text subject characteristic information of the judgement document is a 10-dimensional feature vector and the bag of words characteristic information is a 100-dimensional feature vector, the raw text feature of the judgement document is a 110-dimensional feature vector. A feature normalisation method commonly used in the field of machine learning is then applied to the raw text feature to generate the text feature vector of the judgement document. For example, assuming that the same feature across all judgement documents follows a normal distribution, each feature dimension can be normalised to the standard normal distribution.
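Concatenation followed by per-dimension standardisation can be sketched as below. The sketch assumes the mean and variance of each feature dimension have already been computed over all judgement documents; the function name `normalize` and all numeric values are hypothetical.

```python
import math


def normalize(raw, means, variances):
    """Standardise each dimension: (x - mean) / sqrt(variance).

    Dimensions with zero variance are mapped to 0.0 (an assumption made
    here to avoid division by zero; the source does not cover this case).
    """
    return [
        (x - m) / math.sqrt(v) if v else 0.0
        for x, m, v in zip(raw, means, variances)
    ]


topic_feature = [0.5, 1.0]      # stands in for the 10-dimensional topic feature
bow_feature = [1.0, 0.0, 3.0]   # stands in for the 100-dimensional bag of words
raw = topic_feature + bow_feature  # concatenation gives the raw text feature

means = [0.5, 0.5, 0.5, 0.0, 1.0]       # per-dimension mean over the corpus
variances = [1.0, 0.25, 0.25, 1.0, 4.0]  # per-dimension variance over the corpus
text_vector = normalize(raw, means, variances)
```

Storing the per-dimension mean and variance lets a second system normalise a new text in exactly the same way without re-reading the corpus.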
Further, step S13 updates the feature dictionary on keywords based on the several pieces of keyword relational information. Specifically, step S13 includes: using each keyword as an index, building the feature dictionary on keywords from the word theme feature and the expansion words of each keyword. For example, in the court business scenario, the words describing the parties' claims and the words describing the parties' points of dispute in a judgement document are taken as the keywords extracted from the judgement document; the words related to the parties' claims and to the parties' points of dispute that are found from these keywords are taken as the expansion words of the keywords; and feature extraction is performed on the judgement document to obtain a feature dictionary composed of the keywords of the judgement document and their expansion words.
Further, the method of one aspect of the present application for mining similar texts at the first device end also includes step S16 (not shown): sending the text feature vectors of all judgement documents, the feature dictionary and the judgement relevant information to the searching database of the second device. For example, in the court business scenario, the text feature vectors of the judgement documents obtained in step S12, the feature dictionary of the judgement documents obtained in step S13 and the text structure information of the judgement documents obtained in step S14 (not shown) are sent to the second device, so that the second device can rely on the feature dictionary computed by the first device and on simplified calculation logic, which ensures that the first device and the second device output the same text feature vector and feature dictionary for the same judgement document.
Fig. 3 is a schematic flow chart of a method, according to one aspect of the present application, by which the second device end mines similar judgement documents based on big data. The method includes step S21, step S22, step S23 and step S24.
Step S21: obtaining an input case text, and extracting several candidate keywords of the input case text based on the feature dictionary on keywords in the searching database. Step S22: obtaining the text subject characteristic information and several pieces of keyword relational information of the input case text based on the text content of the input case text and the candidate keywords, and building the text feature vector of the input case text based on the text subject characteristic information and the keyword relational information. Step S23: obtaining from the searching database several candidate judgement documents that have the same cause of action as the input case text. Step S24: computing the similarity between the text feature vector of each candidate judgement document and the text feature vector of the input case text, and choosing similar judgement documents based on the similarity.
It should be noted that the input case text includes but is not limited to existing judgement documents and the documents of cases under trial. Of course, other existing or future input case texts, if applicable to the present application, also fall within its scope of protection and are incorporated herein by reference.
In the embodiments of the present application, step S25 (not shown) is further included before step S21. Step S25 includes: receiving from the first device the text feature vectors of the published judgement documents, the feature dictionary and the judgement relevant information acquired by the first device, and saving them in the searching database, where the judgement relevant information includes party information, case type, cause of action and verdict. For example, the searching database on the court intranet in the court business scenario stores online the text feature vectors of the judgement documents, the feature dictionary and the associated judgement relevant information. The stored information about judgement documents covers the following eight aspects:
(1) The judgement documents corresponding to each case type and cause of action. Key is the case type and cause of action; value is the number of the judgement document inside the system.
(2) The structured information of existing judgement documents. Key is the number of the judgement document inside the system; value is the text structure information generated by the structuring extraction module.
(3) The text feature vectors of existing judgement documents. Key is the number of the judgement document inside the system; value is the text feature vector generated by the text feature module.
(4) All keywords of existing judgement documents. Key is a constant; value is all the keywords generated by the keyword subject module.
(5) The word theme feature of each keyword. Key is the keyword; value is the word theme feature of the keyword generated by the keyword subject module.
(6) The synonyms of each keyword. Key is the keyword; value is the synonyms of the keyword and their degree of synonymy.
(7) The related words of each keyword. Key is the keyword; value is the related words of the keyword and their degree of correlation.
(8) The mean and variance of the characteristic values of each feature dimension across judgement documents. Key is the feature number; value is the mean and variance of the characteristic values.
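The eight key-value stores listed above can be pictured with an in-memory stand-in. This is only a sketch: the store names, field names, document numbers and sample values are all hypothetical, and a real deployment would use an actual key-value database rather than nested dictionaries.

```python
# Hypothetical in-memory stand-in for the searching database; every store
# name and every value below is an illustrative assumption.
retrieval_db = {
    # (1) (case type, cause of action) -> internal document numbers
    "docs_by_type_and_cause": {("civil", "private lending dispute"): ["doc-1", "doc-2"]},
    # (2) document number -> text structure information
    "structured_info": {"doc-1": {"parties": ["A", "B"], "verdict": "claim upheld"}},
    # (3) document number -> text feature vector
    "text_vectors": {"doc-1": [0.1, 0.7], "doc-2": [0.3, 0.2]},
    # (4) constant key -> all keywords
    "all_keywords": {"ALL": ["loan", "interest"]},
    # (5) keyword -> word theme feature
    "keyword_theme_feature": {"loan": [0.9, 0.1]},
    # (6) keyword -> synonyms with degree of synonymy
    "keyword_synonyms": {"loan": [("borrowing", 0.8)]},
    # (7) keyword -> related words with degree of correlation
    "keyword_related": {"loan": [("interest", 0.6)]},
    # (8) feature number -> (mean, variance) of that feature dimension
    "feature_stats": {0: (0.2, 0.01), 1: (0.5, 0.04)},
}


def candidate_vectors(db, case_type, cause):
    """Look up the feature vectors of all documents sharing type and cause."""
    doc_ids = db["docs_by_type_and_cause"].get((case_type, cause), [])
    return {d: db["text_vectors"][d] for d in doc_ids if d in db["text_vectors"]}


found = candidate_vectors(retrieval_db, "civil", "private lending dispute")
```

Stores (1) and (3) together are enough to fetch the candidate vectors for a query, which is the lookup the later retrieval step depends on.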
It should be noted that the text type includes but is not limited to the case type of the input case text in the court business scenario, where the case types include criminal litigation, civil litigation, administrative litigation, intellectual property disputes, written verdicts, compensation cases, enforcement cases, and the trial level of cases under trial. Of course, other existing or future text types, if applicable to the present application, also fall within its scope of protection and are incorporated herein by reference.
Further, step S21 obtains the input case text and extracts several candidate keywords of the input case text based on the feature dictionary on keywords in the searching database. Specifically, step S21 includes: obtaining the input case text and, based on the cause of action of the input case text, extracting several candidate keywords of the input case text from the feature dictionary on keywords in the searching database. For example, in the court business scenario, judgement documents similar to the input case text are searched for among a massive number of published judgement documents. Because judgement documents in the court business scenario differ in cause of action, in order to find the judgement documents similar to the input case text quickly, the words that occur both in the feature dictionary on keywords in the searching database (restricted to the cause of action of the input case text) and in the input case text are extracted as the candidate keywords of the input case text. This ensures that every keyword mined from the input case text is present in the searching database.
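The intersection described above can be sketched in a few lines. It assumes the input case text has already been segmented into words and that the dictionary's keywords can be grouped by cause of action; the names `candidate_keywords` and `keywords_by_cause` and the sample data are hypothetical.

```python
def candidate_keywords(input_words, keywords_by_cause, cause):
    """Keep the input words that are known keywords for the given cause.

    input_words:       words of the input case text, in order of appearance
    keywords_by_cause: cause of action -> set of keywords from the dictionary
    Returns the matching words without duplicates, in first-seen order.
    """
    known = keywords_by_cause.get(cause, set())
    seen = set()
    result = []
    for word in input_words:
        if word in known and word not in seen:
            seen.add(word)
            result.append(word)
    return result


keywords_by_cause = {"private lending dispute": {"loan", "interest", "repayment"}}
words = ["the", "loan", "interest", "loan", "overdue"]
candidates = candidate_keywords(words, keywords_by_cause, "private lending dispute")
```

Because only dictionary keywords survive, every candidate already has a stored word theme feature and stored expansion words, so nothing needs to be recomputed for the input text.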
Further, step S22 includes: obtaining the text subject characteristic information and several pieces of keyword relational information of the input case text based on the text content of the input case text and the candidate keywords, and building the text feature vector of the input case text based on the text subject characteristic information and the keyword relational information. The specific implementation of step S22 is shown in Fig. 4, which is a schematic flow chart of a method, according to a preferred embodiment of one aspect of the present application, by which the second device end mines the text feature vector of a judgement document based on big data. Step S22 specifically includes step S221, step S222 and step S223.
Step S221 includes: comparing each word of the input case text with all keywords of all the judgement documents, so as to extract candidate keywords and their word theme features from the input case text, and obtaining the text subject characteristic information of the input case text based on the word theme features. Step S222 includes: obtaining the context relation between the candidate keywords, correcting the word theme feature of each candidate keyword based on the context relation, and determining the keyword relational information of the input case text based on the degree of matching between the corrected word theme feature of each candidate keyword and the text subject characteristic information. Step S223 includes: based on the keyword relational information, updating the text subject characteristic information of the input case text and obtaining expansion word relevant information; building the bag of words characteristic information of the input case text based on the keyword relational information and the expansion word relevant information; and determining the text feature vector of the input case text based on the updated text subject characteristic information and the bag of words characteristic information.
In the embodiments of the present application, the court intranet in the court business scenario mainly computes, in real time, the text feature vector of the input case text entered by the user. In step S221, each word of the input case text is compared with all keywords of all the judgement documents, so as to extract candidate keywords and their word theme features from the input case text. For example, the court intranet in the court business scenario mines the keywords of an input case text entered online under one assumption: a keyword of an input case text entered online must also be a keyword of an existing judgement document. The module therefore queries, among the massive number of published judgement documents, all keywords about the parties' claims and the parties' points of dispute from the judgement documents that have the same cause of action as the input case text, and takes their intersection with the words of the input case text as the candidate keywords of the input case text entered online. This guarantees that the keywords selected from the input case text are all keywords of published judgement documents, so that judgement documents similar to the input case text, together with their text feature vectors and features, can be mined from the existing judgement documents. Determining the candidate keywords of the input case text from all keywords of published judgement documents also simplifies the calculation logic for the input case text, since it builds on the massive number of judgement documents already processed.
Specifically, in step S221, the text subject feature of the input case text is obtained based on the word theme features, where the text subject feature of a judgement document is the case type of the judgement document. In the embodiments of the present application, a topic model method is preferably used to extract the text subject feature of the input case text and the word theme feature of each word, where the topic model method is the same as topic model methods in the prior art. Of course, other existing or future methods of extracting the text subject feature of a judgement document and the word theme feature of each word, if applicable to the present application, also fall within its scope of protection and are incorporated herein by reference.
Specifically, in step S222, the context transition probability between any two candidate keywords is first obtained; the word theme feature of each word is corrected based on the context word co-occurrence relation and the context transition probability; the keywords of the input case text and their word theme features are determined based on the degree of matching between each corrected word theme feature and the text subject characteristic information obtained with the topic model in step S221; and the importance information of each keyword is obtained. For example, for the i-th candidate keyword Wi in an input case text Ds, if the topic corresponding to the candidate keyword is Tj, then according to the topic model method the transition probability with which candidate keyword Wi occurs in input case text Ds is Pj(Wi|Ds) = P(Wi|Tj) × P(Tj|Ds), where P(Wi|Tj) is the transition probability of word Wi under topic Tj and P(Tj|Ds) is the transition probability of topic Tj in input case text Ds. The topics of the candidate keyword are then enumerated one by one to obtain all transition probabilities Pj(Wi|Ds), where j ranges over the positive integers from 1 to k, and a topic is selected for the i-th candidate keyword Wi in Ds according to these probabilities. The simplest and most common method is to take the topic Tj that maximises Pj(Wi|Ds), i.e. the j that attains max[j] Pj(Wi|Ds). If the i-th candidate keyword Wi in Ds now selects a topic different from the word theme feature obtained in step S221, this changes the transition probability of the word under the given topic and the transition probability of each topic in the input case text, and these in turn affect the calculation of the transition probability with which Wi occurs in Ds. Therefore, computing Pj(Wi|Ds) once for the input case text and reselecting the topic of each word is regarded as one iteration. After n such iterations, the candidate keywords corresponding to the converged word theme features are the keywords of the input case text, and the word theme feature of each keyword is the one determined after iteration. Keywords determined in this way express the input case text, and the word features of its keywords, more effectively and accurately. As a result, the text subject characteristic information obtained from the keywords is closer to the case type of the input case text and expresses its specific content more accurately, so the similar judgement documents found through the text subject characteristic information of the input case text have higher similarity, which improves the accuracy of the search for similar judgement documents.
In the embodiments of the present application, in step S223, the text subject characteristic information of the input case text is updated based on the keywords and their word theme features. For example, the text subject characteristic information of the input case text is updated by the following formula:
$$D = \sum_{i=1}^{n} w_i I_i$$
where D denotes the updated text subject characteristic information, the text contains n keywords, w_i is the importance information of the i-th keyword in the input case text, and I_i is the word theme feature of the i-th keyword. Taking the weighted sum of the word theme features of the keywords in the input case text yields the text subject characteristic information of the input case text and effectively removes the influence of unimportant keywords on it.
Specifically, step S223 builds the bag of words characteristic information of the input case text based on the keyword relational information and the expansion word relevant information, where the expansion words of a keyword include the synonyms of the keyword and the words highly related to it in the input case text. In step S223, synonyms are first mined by computing the theme feature similarity of any two keywords. For example, for keyword A, the several words with the highest similarity are taken as the synonyms of keyword A. The highly related words of a keyword are computed by a word vector algorithm (word2vector): the algorithm computes a word vector for each word and then the word vector similarity of any two words, so as to mine highly related words. For example, for keyword A, the several words with the highest word vector similarity are taken as the words highly related to keyword A. Next, the expansion word relevant information of the input case text is obtained from the synonyms of the keywords and their synonym features and from the highly related words in the input case text and their related word features; and the bag of words characteristic information of the input case text is built with a bag of words model, based on the keyword relational information and the expansion word relevant information.
In the embodiments of the present application, the bag of words characteristic information indicates the word features of the keywords in the input case text and of their expansion words. In the bag of words characteristic information, the characteristic value of a keyword feature is the importance information of the keyword in the input case text; the characteristic value of a synonym feature is the product of the keyword importance information and the degree of synonymy; and the characteristic value of a related word feature is the product of the keyword importance information and the degree of correlation. For example, if there are 100,000 distinct words in total, the bag of words characteristic information of every input case text is a 100,000-dimensional vector, each dimension of which marks whether the word at that position occurs in the input case text. For example, suppose word word1 is the 1st dimension of the bag of words characteristic information, word2 the 2nd dimension, word3 the 10th dimension and word4 the 30th dimension; word3 and word1 are similar words with similarity weight13, and word4 and word2 are similar words with similarity weight24. Document A contains the words word1, word3 and word4, whose importance information in A is weight1, weight3 and weight4 respectively. Then, in the bag of words characteristic information of document A, the characteristic value of the 1st dimension is weight1+weight13*weight3, that of the 2nd dimension is weight24*weight4, that of the 10th dimension is weight3+weight1*weight13, and that of the 30th dimension is weight4. The characteristic values of the words highly related to a keyword can be obtained by the same computation, so the resulting bag of words characteristic information contains the characteristic values of the keywords as well as the characteristic values of their synonyms and highly related words.
In the embodiments of the present application, step S223 determines the text feature vector of the input case text based on the updated text subject characteristic information and the bag of words characteristic information. Specifically, the updated text subject characteristic information and the bag of words characteristic information are merged to determine the raw text feature of the input case text, and feature normalisation is performed on the raw text feature of the input case text to determine the text feature vector of the input case text.
For example, the text subject characteristic information of the input case text obtained in step S223 and the bag of words characteristic information are concatenated into one feature vector to generate the raw text feature of the input case text. For example, if the text subject characteristic information of the input case text is a 10-dimensional feature vector and the bag of words characteristic information is a 100-dimensional feature vector, the raw text feature of the input case text is a 110-dimensional feature vector. A feature normalisation method commonly used in the field of machine learning is then applied to the raw text feature to generate the text feature vector of the input case text. For example, assuming that the same feature across input case texts follows a normal distribution, each feature dimension can be normalised to the standard normal distribution.
In the embodiments of the present application, step S24 takes the several candidate judgement documents obtained from the searching database in step S23, which have the same cause of action as the input case text, computes the similarity between the text feature vector of each candidate judgement document and the text feature vector of the input case text, and chooses similar judgement documents based on the similarity.
It should be noted that calculating the algorithm of the similarity of Text eigenvector in the step S24 Including but not limited to Euclidean distance algorithm and cosine similarity algorithm etc..Certainly, other existing or the presents The algorithm of the similarity for the calculating Text eigenvector being likely to occur afterwards is such as applicable to the application, should also wrap It is contained within the application protection domain, and is incorporated herein by reference herein.
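As an illustration of the two measures named above, the following sketch computes the Euclidean distance and the cosine similarity of two feature vectors. The function names are hypothetical, and returning 0.0 for a zero vector is an assumption made here for robustness.

```python
import math


def euclidean_distance(a, b):
    """Smaller distance means more similar vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Larger value means more similar vectors; 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Note that the two measures rank in opposite directions: candidates are sorted by ascending distance or by descending cosine similarity.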
For example, based on the case type and cause of action of the input case text entered by the user, all existing judgement documents with the same case type and cause of action are first queried as candidate similar judgement documents, and the text feature vectors of these candidates are retrieved. The similarity between the input case text and each candidate is then calculated using one of the above algorithms for text feature vector similarity (the Euclidean distance algorithm or the cosine similarity algorithm). Next, according to the number N of similar judgement documents requested by the user, the N judgement documents with the highest similarity are taken as the final similar judgement documents. The text structured information and judgement-related information of the similar judgement documents are then queried and fed back to the user who requested them. Finally, the judgement results of the similar judgement documents are aggregated along dimensions such as principal penalty, accessory penalty, compensation amount, and which party won, and presented in visual form to the user who requested the similar judgement documents.
Specifically, suppose that querying by the case type and cause of action of the input case text returns 100 existing judgement documents as candidates, and that the user requests 10 similar judgement documents. The similarity between the text feature vector of the input case text and the text feature vector of each of the 100 candidates is calculated with the above similarity algorithm, the candidates are sorted by similarity in descending order, and the 10 candidates with the highest similarity are taken as the similar judgement documents. The text structured information and judgement-related information of these 10 similar judgement documents are then fed back to the user who requested them.
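The retrieval flow in the example above (filter candidates by case type and cause of action, score them, keep the top N) can be sketched as follows. The record layout (`doc_id`, `case_type`, `cause`, `vector`) and the function name `find_similar` are assumptions made for illustration; cosine similarity is used here, although the text equally allows Euclidean distance.

```python
import heapq
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_similar(query, database, n):
    """Return the n (similarity, doc_id) pairs with the highest similarity."""
    # Keep only candidates with the same case type and cause of action.
    candidates = [
        doc for doc in database
        if doc["case_type"] == query["case_type"] and doc["cause"] == query["cause"]
    ]
    scored = (
        (cosine_similarity(query["vector"], doc["vector"]), doc["doc_id"])
        for doc in candidates
    )
    # nlargest returns the results sorted from highest to lowest similarity.
    return heapq.nlargest(n, scored)


database = [
    {"doc_id": "d1", "case_type": "civil", "cause": "loan", "vector": [1.0, 0.0]},
    {"doc_id": "d2", "case_type": "civil", "cause": "loan", "vector": [1.0, 1.0]},
    {"doc_id": "d3", "case_type": "civil", "cause": "loan", "vector": [0.0, 1.0]},
    {"doc_id": "d4", "case_type": "criminal", "cause": "theft", "vector": [1.0, 0.0]},
]
query = {"case_type": "civil", "cause": "loan", "vector": [1.0, 0.0]}
top = find_similar(query, database, 2)
```

Using `heapq.nlargest` avoids sorting all candidates when N is much smaller than the candidate set.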
Further, the method for mining similar texts at the second device end according to one aspect of the present application also includes: receiving, from the first device, the text structured information obtained by structuring the judgement documents; and obtaining the text structured information of the similar judgement documents. For example, after the similarity of the candidate judgement documents has been calculated, the text structured information of the similar judgement documents that meet the requested quantity is obtained.
Fig. 5 shows a schematic overall flowchart of a method for mining similar judgement documents based on big data according to one aspect of the present application. The method includes step S501, step S502, step S503, step S504, step S505, step S506, step S507, step S508, step S509, step S510 and step S511.
Step S501 includes: obtaining massive judgement documents. Step S502 includes: performing text preprocessing and structuring on the massive judgement documents. Step S503 includes: mining the text topic feature information of the judgement documents. Step S504 includes: mining the keyword-related information of the massive judgement documents and establishing the feature dictionary on keywords. Step S505 includes: generating the text feature vectors of the judgement documents. Step S506 includes: storing the text feature vectors and the feature dictionary of the judgement documents online. Step S507 includes: obtaining the input case text. Step S508 includes: mining online the text topic feature information and keyword-related information of the input case text. Step S509 includes: mining online the text feature vector of the input case text. Step S510 includes: retrieving online the candidate judgement documents that have the same cause of action as the input case text, and calculating the similarity between the text feature vector of each candidate judgement document and the text feature vector of the input case text. Step S511 includes: obtaining the similar judgement documents.
In an embodiment of the present application, the court business scenario requires mining similar judgement documents from a massive number of published judgement documents. First, in step S501, the massive published judgement documents are obtained with the authorization of the court. In step S502, text preprocessing converts the judgement documents into a form on which text mining can be performed, and the preprocessed judgement documents are structured to obtain text structured information. In step S503, a topic model method of the prior art mines the text topic feature information of each judgement document, which expresses the specific case facts of that document.
The number of judgement documents in court business keeps growing and court work is busy, so mining similar judgement documents by traditional manual means or by natural language processing is time-consuming and laborious. Moreover, the published judgement documents contain a great deal of text with complex content, and the key elements that determine whether two documents are similar are hidden in long passages. Therefore, in step S504, the present application mines, from the candidate judgement documents that have the same cause of action as the input case text, the words that express the parties' claims and the parties' points of dispute, obtaining the keyword-related information of the candidate judgement documents. Expressing the keyword-related information in the form of a text feature vector makes it easier to calculate quickly whether a judgement document is similar to the input case text. At the same time, the words related to those keywords are taken as the expansion words of the candidate judgement documents, and the feature dictionary is established from the keyword-related information and expansion word-related information of all judgement documents.
Then, in step S505, the text feature vector of each judgement document is obtained from the text topic feature information, as updated with the keyword-related information of the candidate judgement documents, and from the bag-of-words feature information. The feature values in the text feature vector are composed of the feature values corresponding to the word topic features of the keywords, and each dimension represents the same feature across judgement documents. In step S506, the text feature vectors and the feature dictionary of all judgement documents are sent to the retrieval database at the second device end for online storage, so that judgement documents similar to an input case text can be found quickly. In step S507, the input case text for which similar judgement documents are sought is obtained. In step S508, the text topic feature information and keyword-related information of the input case text are mined with the help of the keyword-related information of all judgement documents sent from the first device end. In step S509, the updated text topic feature information and the bag-of-words feature information of the input case text are obtained from the mined text topic feature information and keyword-related information, and the two are merged to obtain the text feature vector of the input case text. In step S510, the second device end retrieves online the candidate judgement documents that have the same cause of action as the input case text, for example all existing judgement documents with the same cause of action and the same case type, calculates the similarity between the text feature vector of each candidate and the text feature vector of the input case text, and sorts the candidates by similarity from high to low. Finally, in step S511, according to the input number of similar judgement documents required, that number of top-ranked candidates from step S510 is taken as the similar judgement documents to be obtained.
In the court business scenario, the judgement results reached by several judges in different cases with similar facts and similar party claims need to be compared in order to audit whether a judge's judgement result is reasonable. In addition, when actually trying a case, a judge may refer to the judgement results of existing cases with similar facts when forming the final findings of fact and the judgement result. In these complex court business scenarios, judgement documents similar to the input case text therefore need to be mined in advance or in real time. However, the content of each case varies widely and the number of cases tried keeps growing rapidly, so traditional manual sorting can hardly meet the demand. In the embodiments of the present application, the massive published judgement documents in the court business scenario are therefore processed as shown in Fig. 5 and their text feature vectors are mined, so that judgement documents similar to the input case text can be found quickly.
Fig. 6 shows a schematic structural diagram of a first device for mining similar judgement documents based on big data according to one aspect of the present application. The device 1 includes a judgement document acquisition device 11, a text feature mining device 12 and a feature dictionary establishing device 13.
The judgement document acquisition device 11 is used to obtain the massive published judgement documents and the cause of action of each judgement document. The text feature mining device 12 is used to obtain, based on the text content of each judgement document, the text topic feature information on the case facts of the judgement document and several pieces of keyword-related information on the parties' dispute content and the parties' claim content in the judgement document, and to establish the text feature vector of the judgement document based on the text topic feature information and the keyword-related information. The feature dictionary establishing device 13 is used to update the feature dictionary on keywords based on the keyword-related information.
Here, the device 1 includes, but is not limited to, user equipment, or a device formed by integrating user equipment and network equipment through a network. The user equipment includes, but is not limited to, any mobile electronic product that can interact with a user through a touch pad, such as a smartphone or a PDA; the mobile electronic product may run any operating system, such as the android operating system or the iOS operating system. The network equipment includes an electronic device that can automatically perform numerical calculation and information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a programmable gate array (FPGA), a digital signal processor (DSP), an embedded device and the like. The network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, a wireless ad hoc network (Ad Hoc network) and the like. Preferably, the device 1 may also be a cloud computing server that can process big data computation by cloud computing means; in the following, a cloud computing server is used as the first device, as a preferred embodiment of one aspect of the present application, to explain in detail the mining of similar judgement documents based on big data. Of course, those skilled in the art will understand that the above device 1 is only an example; other devices 1, whether existing now or appearing in the future, that are applicable to the present application shall also fall within the scope of protection of the present application and are incorporated herein by reference.
The above devices work continuously with one another. Here, those skilled in the art will understand that "continuously" means that each of the above devices operates in real time, or according to an operating mode that is preset or adjusted in real time.
Here, the judgement documents include, but are not limited to, the judgement documents in the court business scenario, including the facts found by the court of first instance, the facts found by the court of second instance, the facts found by the retrial court, the complaint, the statement of defence, the cross-examination record, the witness testimony and the like.
In the following, a cloud computing server that can process big data computation by cloud computing means in the court business scenario is used as the first device for mining judgement documents, as a preferred embodiment of one aspect of the present application, to explain specific embodiments of the present application in detail. Of course, the cloud computing server that mines the massive published judgement documents in the court business scenario is used as the first device here only by way of example; the embodiments of the present application are not limited thereto, and the following embodiments can equally be implemented in other software programs.
It should be noted that the text topic feature information includes, but is not limited to, the case facts in the judgement documents of the court business scenario, and that the keywords include, but are not limited to, the parties' dispute content and the parties' claim content in those judgement documents. In the following, as a preferred embodiment of one aspect of the present application, the parties' dispute content and claim content serve as the keywords of a judgement document and the case facts serve as its text topic feature information when the text feature vector of the judgement document is mined.
In an embodiment of the present application, the judgement document acquisition device 11 is used to obtain the massive published judgement documents and the cause of action of each judgement document. Because trial work in the court business scenario proceeds in stages, the content of the input case text is likely to change substantially as the trial progresses. The data suited to each stage of the trial flow therefore needs to be fed to the mining system, so that the similar cases mined at each stage meet actual business needs. Accordingly, the text feature mining device 12 continuously mines the massive published judgement documents stage by stage, extracts from them the text topic feature information on the case facts of each judgement document and several pieces of keyword-related information on the parties' dispute content and claim content, and establishes the text feature vector of the judgement document based on the text topic feature information and the keyword-related information. For example, the cloud computing server stores all published judgement documents of the court business scenario using an Internet of Things network. In the text feature mining device 12, the cloud computing server then performs offline feature engineering on the published judgement documents, making full use of the computing power of cloud computing, and mines the text feature vectors of the judgement documents; the feature dictionary of all judgement documents is mined in the feature dictionary establishing device 13. The results are transferred in a single batch, over the dedicated network line of the court business scenario, to the online storage within the court intranet.
Further, the first device for mining similar judgement documents based on big data according to one aspect of the present application also includes a text structuring device 14 (not shown), which structures the judgement documents to obtain text structured information. After the judgement document acquisition device 11 and before the text feature mining device 12, the text structuring device structures the judgement documents obtained for each stage of trial work in the court business scenario. A text structured information acquisition device 15 (not shown) then obtains, based on the text structured information, the judgement-related information of the judgement documents, which includes party information, case type, cause of action and judgement result.
It should be noted that the case types in the judgement-related information obtained by the text structured information acquisition device 15 (not shown) include, but are not limited to, the seven major judgement document types, such as criminal litigation, civil litigation, administrative litigation, intellectual property disputes, compensation cases and enforcement cases, as well as the stages of the court trial. The stages of the court trial are shown in Fig. 7. Of course, other text topic features of judgement documents, whether existing now or appearing in the future, that are applicable to the present application shall also fall within the scope of protection of the present application and are incorporated herein by reference.
Fig. 7 shows a schematic flowchart of the court trial stages for which the first device mines similar judgement documents based on big data according to one aspect of the present application. The cloud computing server, a cloud-computing-based device for mining similar judgement documents, follows the trial flow of the people's court and determines, stage by stage, the text content of the judgement documents that needs to be mined at each stage. Taking into account the network characteristics and confidentiality requirements of the people's court system, the cloud computing server mines judgement documents stage by stage during trial work to meet the business needs of the court business scenario.
Here, the trial flow in the court business scenario that the cloud computing server of the present application needs to handle, as shown in Fig. 7, includes: the case filing stage S71, the court hearing stage S72, the first-instance judgement stage S73, the second-instance judgement stage S74, the retrial stage S75 and the judgement enforcement stage S76. The case filing stage S71 is the stage in which the people's court, having received the plaintiff's complaint and the defendant's statement of defence, decides to file the case. The court hearing stage S72 is the stage in which the people's court hears the case. The first-instance judgement stage S73 is the stage in which the people's court renders the first-instance judgement. The second-instance judgement stage S74 is the stage in which the people's court concludes the second-instance trial. The retrial stage S75 is the stage in which the people's court concludes the retrial. The judgement enforcement stage S76 is the stage in which the final judgement result rendered by the people's court in the case is enforced. In the first five stages, judicial personnel need to mine similar judgement documents.
The judgement document data needed to mine similar judgement documents at each trial stage in Fig. 7 is as follows. In the case filing stage S71, the relevant documents are the complaint and the statement of defence. In the court hearing stage S72, the relevant documents are the complaint, the statement of defence, the cross-examination record and the witness testimony. In the first-instance judgement stage S73, the relevant document is the facts found by the court of first instance. In the second-instance judgement stage S74, the relevant documents are the petition for appeal and the facts found by the court of second instance. In the retrial stage S75, the relevant document is the facts found by the retrial court.
The complaint is the document of claim that the plaintiff submits to the court of first instance. The statement of defence is the reply that the court of first instance requires the defendant to provide after receiving the complaint. The cross-examination record contains, for the court hearing, the questions put by the plaintiff's agent to the defendant and the defendant's replies, as well as the questions put by the defendant to the plaintiff and the plaintiff's replies. The witness testimony contains, for the court hearing stage, the testimony of the parties' witnesses and the record of the plaintiff's and the defendant's questioning of the other side's witnesses. The facts found by the court of first instance are the facts that the court of first instance finds after investigation and trial. The petition for appeal is the second-instance complaint filed, after the first-instance judgement, by a party that does not accept the judgement. The facts found by the second-instance or retrial court are the facts found by the court of second instance or the retrial court.
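The stage-to-document correspondence above lends itself to a simple lookup table. The sketch below is illustrative only: the key names, the English document labels and the helper `select_inputs` are hypothetical, and stage S76 is omitted because the text states that only the first five stages need similar judgement documents.

```python
# Documents fed to the mining system at each trial stage (labels are assumed).
STAGE_INPUTS = {
    "S71_case_filing": ["complaint", "statement_of_defence"],
    "S72_court_hearing": [
        "complaint",
        "statement_of_defence",
        "cross_examination_record",
        "witness_testimony",
    ],
    "S73_first_instance_judgement": ["first_instance_found_facts"],
    "S74_second_instance_judgement": [
        "petition_for_appeal",
        "second_instance_found_facts",
    ],
    "S75_retrial": ["retrial_found_facts"],
}


def select_inputs(stage, available_docs):
    """Pick, from the documents available for a case, those used at this stage."""
    return {
        name: available_docs[name]
        for name in STAGE_INPUTS[stage]
        if name in available_docs
    }


case_docs = {
    "complaint": "text of the complaint",
    "statement_of_defence": "text of the defence",
    "witness_testimony": "text of the testimony",
}
filing_inputs = select_inputs("S71_case_filing", case_docs)
```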
After the data texts for each stage in which the cloud computing server mines similar judgement documents in the court business scenario have been determined, the text feature mining device 12 extracts the relevant text feature vectors from the judgement documents obtained by the judgement document acquisition device 11. Specifically, the units included in the text feature mining device 12 are shown in Fig. 8.
Fig. 8 shows a schematic structural diagram of the text feature mining device 12 with which the cloud computing server mines the text feature vectors of judgement documents according to a preferred embodiment of one aspect of the present application. The text feature mining device 12 includes a first mining unit 121, a second mining unit 122, a third mining unit 123 and a generation unit 124. The first mining unit 121 is used to extract the text topic feature information of the judgement document and the word topic feature of each word in the judgement document. The second mining unit 122 is used to obtain the context relations between the words, to correct the word topic feature of each word based on the context relations, and to determine several pieces of keyword-related information of the judgement document based on the degree of matching between each corrected word topic feature and the text topic feature information, where the keyword-related information includes the keyword, the keyword importance information and the word topic feature corresponding to the keyword. The third mining unit 123 is used to update the text topic feature information of the judgement document based on the keyword-related information. The generation unit 124 is used to obtain expansion word-related information based on the keyword-related information, where the expansion word-related information includes the expansion words of the keyword and the expansion word correlation; to establish bag-of-words feature information based on the keyword-related information and the expansion word-related information; and to determine the text feature vector of the judgement document based on the updated text topic feature information and the bag-of-words feature information.
Specifically, the text topic feature information of the judgement document in the first mining unit 121 indicates the case facts of the judgement document. In the embodiment of the present application, a topic model method is preferably used to extract the text topic feature information of the obtained judgement document and the word topic feature of each word, where the topic model method is the same as the topic model methods of the prior art. Of course, other methods for extracting the text topic feature information of a judgement document and the word topic feature of each word, whether existing now or appearing in the future, that are applicable to the present application shall also fall within the scope of protection of the present application and are incorporated herein by reference.
Further, the second mining unit 122 is used to: obtain the contextual word co-occurrence relations between the words; obtain the contextual transition probability between any two words; correct the word topic feature of each word based on the contextual word co-occurrence relations and the contextual transition probabilities; and, based on the degree of matching between each corrected word topic feature and the text topic feature information, determine several keywords of the judgement document and their corresponding word topic features, and obtain the importance information of the keywords.
In the embodiment of the present application, the second mining unit 122 starts from the text topic feature information of the judgement document and the word topic feature of each word extracted in the first mining unit 121, corrects the word topic feature of each word according to the context relations between the words, and, based on the degree of matching between each corrected word topic feature and the text topic feature information, determines several keywords of the judgement document and their corresponding word topic features and obtains the importance information of the keywords. The specific embodiment for determining the keywords of the judgement document and their corresponding word topic features and for obtaining the keyword importance information corresponds to the specific embodiment of the step 122 described above and is not repeated here.
In an embodiment of the present application, the third mining unit 123 updates the text topic feature information of the judgement document based on the keyword-related information determined in the second mining unit 122. For example, the text topic feature information of the judgement document is updated as the weighted sum of the word topic features of its keywords:
$$D = \sum_{i=1}^{n} w_i I_i$$
where $D$ is the updated text topic feature information, $n$ is the number of keywords in the text, $w_i$ is the importance of the $i$-th keyword in the judgement document, and $I_i$ is the word topic feature of the $i$-th keyword. Obtaining the text topic feature information of the judgement document as the weighted sum of the word topic features of its keywords effectively removes the influence of unimportant words in the judgement document on the text topic feature information.
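The weighted sum described above can be sketched in a few lines. The function name `update_topic_feature` and the input layout, a list of (importance, word topic feature) pairs, are assumptions made for illustration.

```python
def update_topic_feature(keywords):
    """Weighted sum of keyword topic vectors.

    keywords: non-empty list of (importance, topic_vector) pairs, where all
    topic vectors have the same length (assumed layout).
    """
    dim = len(keywords[0][1])
    updated = [0.0] * dim
    for importance, topic_vector in keywords:
        for j, value in enumerate(topic_vector):
            updated[j] += importance * value
    return updated


# Two keywords with importances 0.5 and 0.25 over a 2-dimensional topic space.
doc_topic = update_topic_feature([(0.5, [1.0, 0.0]), (0.25, [0.0, 4.0])])
```

Words that were not selected as keywords contribute nothing to the result, which is how unimportant words are filtered out.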
Further, the generation unit 124 obtains expansion word-related information based on the keyword-related information, where the expansion word-related information includes the expansion words of the keyword and the expansion word correlation. The expansion words include the synonyms of the keyword and the words that are highly correlated with the keyword in the judgement document. In an embodiment of the present application, synonyms are mined by calculating the topic feature similarity of any two words; for example, for keyword A, the several words with the highest similarity are taken as the synonyms of keyword A. The highly correlated words of a keyword are calculated with an algorithm for mining highly correlated words (word2vector): the algorithm computes a term vector for each word and then calculates the term vector similarity of any two words in order to mine the highly correlated words. For example, for keyword A, the several words with the highest term vector similarity are taken as the highly correlated words of keyword A.
Further, the generation unit 124 determines the expansion words of the keyword and the expansion word correlation based on the keyword and its corresponding word topic feature, where the expansion words include the synonyms of the keyword and the words that are highly correlated with it in the judgement document; and, based on the keyword and its corresponding word topic feature as well as the expansion words and the expansion word correlation, establishes the bag-of-words feature information using a bag-of-words model.
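Both kinds of expansion word are found the same way: rank all other words by vector similarity to the keyword and keep the top few. The sketch below assumes the vectors are already available as a plain dictionary (word topic features for synonyms, term vectors for highly correlated words); the function name `top_related`, the sample words and the use of cosine similarity are illustrative assumptions, and training the term vectors themselves is out of scope here.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_related(keyword, vectors, k):
    """Return the k words most similar to keyword as (word, similarity) pairs."""
    target = vectors[keyword]
    scored = [
        (cosine_similarity(target, vec), word)
        for word, vec in vectors.items()
        if word != keyword
    ]
    scored.sort(reverse=True)
    return [(word, score) for score, word in scored[:k]]


# Hypothetical 2-dimensional vectors for three words.
vectors = {"loan": [1.0, 0.0], "debt": [0.9, 0.1], "theft": [0.0, 1.0]}
related = top_related("loan", vectors, 1)
```

The returned similarity can serve as the expansion word correlation that is later multiplied with the keyword importance.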
In the embodiment of the present application, the bag-of-words feature information indicates the word features of the keywords in a judgement document and of their expansion words. In the bag-of-words feature information, the feature value of a keyword feature is the importance of the keyword in the judgement document, the feature value of a synonym feature is the product of the keyword's importance and the degree of synonymy, and the feature value of a related-word feature is the product of the keyword's importance and the degree of correlation. For example, if all judgement documents together contain 100,000 distinct words, then the bag-of-words feature of each judgement document is a 100,000-dimensional vector, and each dimension indicates whether the word at that position occurs in the judgement document. Suppose further that word1 is the 1st dimension of the bag-of-words feature information, word2 the 2nd dimension, word3 the 10th dimension and word4 the 30th dimension; that word3 and word1 are similar words with similarity weight13; and that word4 and word2 are similar words with similarity weight24. If judgement document A contains the words word1, word3 and word4, whose importance in A is weight1, weight3 and weight4 respectively, then in the bag-of-words feature of judgement document A the feature value of the 1st dimension is weight1+weight13*weight3, the feature value of the 2nd dimension is weight24*weight4, the feature value of the 10th dimension is weight3+weight1*weight13, and the feature value of the 30th dimension is weight4. The feature values of the word features of words highly correlated with a keyword can be obtained by the same calculation. The feature values in the resulting bag-of-words feature information therefore include the feature values corresponding to the word subject features of the keywords and the feature values corresponding to the word subject features of the expansion words.
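The word1 to word4 example above can be sketched in code as follows. This is only an illustrative sketch: the function name `bag_of_words_features`, the dictionary layout and the numeric weights are assumptions made for the example and do not come from the application; a 30-dimensional vector stands in for the 100,000-dimensional one.

```python
from typing import Dict, List


def bag_of_words_features(
    importance: Dict[str, float],
    similar: Dict[str, Dict[str, float]],
    index: Dict[str, int],
    dim: int,
) -> List[float]:
    """Build a bag-of-words vector with expansion-word contributions.

    importance: importance of each word that occurs in the document
    similar:    symmetric map word -> {similar word: similarity}
    index:      zero-based dimension of each word in the vocabulary
    """
    features = [0.0] * dim
    for word, weight in importance.items():
        # A word that occurs contributes its own importance.
        features[index[word]] += weight
        # It also contributes importance * similarity to each similar word.
        for other, degree in similar.get(word, {}).items():
            features[index[other]] += weight * degree
    return features


# Hypothetical values: word1..word4 sit in dimensions 1, 2, 10 and 30.
index = {"word1": 0, "word2": 1, "word3": 9, "word4": 29}
similar = {
    "word1": {"word3": 0.8}, "word3": {"word1": 0.8},  # weight13
    "word2": {"word4": 0.5}, "word4": {"word2": 0.5},  # weight24
}
importance_a = {"word1": 0.5, "word3": 0.4, "word4": 0.2}  # weight1, weight3, weight4

features_a = bag_of_words_features(importance_a, similar, index, dim=30)
```

With these assumed weights, the 2nd dimension is non-zero even though word2 does not occur in document A, because word4 contributes weight24*weight4 to it.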
Further, the generation unit 124 merges the updated text subject feature information with the bag-of-words feature information to determine the original text feature of the judgement document, and determines the text feature vector of the judgement document by normalizing that original text feature. Specifically, the generation unit 124 concatenates the text subject feature information and the bag-of-words feature information of the judgement document obtained in the third mining unit 123 into a single feature vector, producing the original text feature of the judgement document. The specific embodiment of generating the original text feature of the judgement document corresponds to the embodiment of step S124 described above and is not repeated here.
Further, the feature dictionary establishing device 13 uses the keywords as indexes and establishes, from the word subject feature and the expansion words of each keyword, a feature dictionary on keywords. For example, in the court business scenario, words in a judgement document that are identical to the content of the parties' claims and the parties' points of dispute are extracted as the keywords of the judgement document; based on these keywords, all words related to the claim words and the dispute-point words are looked up as expansion words of the keywords; feature extraction is then performed on the judgement document, and the feature dictionary is established from the resulting keyword-related information and expansion-word-related information.
Further, the first device for mining similar judgement documents based on big data according to one aspect of the present application also includes a sending device 16 (not shown) for sending the text feature vectors of all judgement documents, the feature dictionary and the judgement-related information to the searching database in the second device. For example, in the court business scenario, the text feature vectors of the judgement documents obtained in the text feature mining device 12, the feature dictionary of the judgement documents obtained in the feature dictionary establishing device 13, and the text structure information and text type of the judgement documents obtained in the device 14 (not shown) are sent to the second device. The second device can then rely on the feature dictionary calculated by the first device and on simplified calculation logic, which ensures that the first device and the second device output the same text feature vector and feature dictionary for the same judgement document. In addition, in view of the network characteristics and security requirements of the people's court system, the cloud computing server is used to mine judgement documents in stages, so as to meet the business demands of trial work in the court business scenario.
In the court business scenario, the text feature vectors of input case texts under trial are all stored in the court intranet server, and apart from judgement documents that have already been published, input case texts in other court business systems must not leave the court intranet server. To meet this confidentiality requirement on information relating to input case texts in the court business scenario, the present application proposes the device shown in Fig. 9, which satisfies the confidentiality requirement on input case texts while improving the real-time performance of processing them.
Fig. 9 shows a structural schematic of a second device for mining similar judgement documents based on big data according to one aspect of the present application. The device 2 includes an input device 21, an input case text feature mining device 22, a candidate judgement document acquisition device 23 and a similar judgement document acquisition device 24.
The input device 21 is used to obtain an input case text and to extract several candidate keywords of the input case text based on the feature dictionary on keywords in the searching database. The input case text feature mining device 22 is used to obtain the text subject feature information and several items of keyword-related information of the input case text based on the text content of the input case text and the candidate keywords, and to establish the text feature vector of the input case text based on the text subject feature information and the keyword-related information. The candidate judgement document acquisition device 23 is used to obtain from the searching database several candidate judgement documents that have the same cause of action as the input case text. The similar judgement document acquisition device 24 is used to calculate the similarity between the text feature vectors of the candidate judgement documents and the text feature vector of the input case text, and to select similar judgement documents based on the similarity.
Here, the device 2 includes but is not limited to a user device, or a device formed by integrating a user device and a network device through a network. The user device includes but is not limited to any mobile electronic product capable of human-computer interaction with a user through a touch pad, such as a smartphone or a PDA; the mobile electronic product may run any operating system, such as the Android operating system or the iOS operating system. The network device includes an electronic device that can automatically perform numerical calculation and information processing according to instructions set or stored in advance, and its hardware includes but is not limited to a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device and the like. The network includes but is not limited to the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, a wireless ad hoc network and the like. Preferably, the device 2 may also be a court intranet server that uses simple online calculation logic relying on the offline featurization tool in the cloud computing server. In the following, the court intranet server is taken as the second device, and the mining of similar judgement documents based on big data is explained in detail as a preferred embodiment of one aspect of the present application. Of course, those skilled in the art will understand that the above device 2 is only an example; other existing or future devices 2, if applicable to the present application, should also fall within its scope of protection and are incorporated herein by reference.
The above devices work continuously with one another. Here, those skilled in the art will understand that "continuously" means that each of the above devices operates in real time, or according to operating-mode requirements that are set in advance or adjusted in real time.
It should be noted that, in the preferred embodiment of the present application, the cloud computing server of device 1 mines, from the massive published judgement documents in the court business scenario, the similar judgement documents corresponding to the input case text entered into device 2. That is, the cloud computing server performs the mining over the massive published judgement documents, while the court intranet server of device 2 only needs to perform a simple calculation, through the online featurization tool, on the single input case text that is entered. The feature dictionary output by the offline featurization tool in the cloud computing server serves as the input of the online featurization tool in the court intranet server, which simplifies the online calculation logic in the court intranet server and ensures that the same judgement document, when input to the two tools, yields the same text feature vector, feature dictionary and structured information. The cloud computing server transfers the relevant features of the judgement documents output by the offline featurization tool to the on-line memory in the court intranet server in a single transfer over a dedicated network line. This both enables mining over the massive published judgement documents and preserves the confidentiality of the confidential input case texts in the court intranet server; the judgement documents similar to the input case text are mined and obtained, which effectively improves the efficiency of mining similar judgement documents in the court business scenario.
It should be noted that the input case text includes but is not limited to existing judgement documents and the texts of cases under trial. Of course, other existing or future input case texts, if applicable to the present application, should also fall within its scope of protection and are incorporated herein by reference.
In the embodiment of the present application, the second device also includes a receiving device 25 (not shown). The receiving device 25 receives from the first device the text feature vectors of the published judgement documents acquired by the first device, the feature dictionary and the judgement-related information, and saves them in the searching database; the judgement-related information includes party information, case type, cause of action and court verdict. For example, in the court business scenario the searching database in the intranet stores online the text feature vectors of the judgement documents, the feature dictionary and the associated judgement-related information. In particular, the information on judgement documents stored through the receiving device 25 covers the following eight aspects:
(1) The judgement documents corresponding to each case type and cause of action. The key is the case type and cause of action; the value is the internal system numbers of the judgement documents.
(2) The structured information of the existing judgement documents. The key is the internal system number of the judgement document; the value is the text structure information generated by the structuring extraction module.
(3) The text feature vectors of the existing judgement documents. The key is the internal system number of the judgement document; the value is the text feature vector generated by the text feature module.
(4) All keywords of the existing judgement documents. The key is a constant; the value is all keywords generated by the keyword subject module.
(5) The word subject feature of each keyword. The key is the keyword; the value is the word subject feature of the keyword generated by the keyword subject module.
(6) The synonyms of each keyword. The key is the keyword; the value is the synonyms of the keyword and their degrees of synonymy.
(7) The related words of each keyword. The key is the keyword; the value is the related words of the keyword and their degrees of correlation.
(8) The mean and variance of the feature values of each feature dimension over the judgement documents. The key is the feature number; the value is the mean and variance of the feature value.
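The eight key/value groups listed above can be pictured as a small in-memory store. The sketch below is illustrative only: the group names, the document numbers, the sample words and the helper `candidate_ids` are all hypothetical, and the application does not specify a storage technology.

```python
# Hypothetical layout of the searching database, one dict per group (1)-(8).
ALL_KEYWORDS = "ALL_KEYWORDS"  # the constant key used by group (4)

store = {
    # (1) (case type, cause of action) -> internal document numbers
    "docs_by_type_and_cause": {("civil", "loan dispute"): [101, 102]},
    # (2) document number -> text structure information
    "structure_info": {101: {"parties": "...", "verdict": "..."},
                       102: {"parties": "...", "verdict": "..."}},
    # (3) document number -> text feature vector
    "text_vectors": {101: [0.1, -0.3, 1.2], 102: [0.0, 0.4, -0.7]},
    # (4) constant key -> all keywords
    "keywords": {ALL_KEYWORDS: ["loan", "interest", "repayment"]},
    # (5) keyword -> word subject feature
    "keyword_subjects": {"loan": [0.7, 0.3], "interest": [0.2, 0.8]},
    # (6) keyword -> synonyms with degree of synonymy
    "synonyms": {"loan": {"borrowing": 0.9}},
    # (7) keyword -> related words with degree of correlation
    "related_words": {"loan": {"interest": 0.6}},
    # (8) feature number -> (mean, variance) of that feature dimension
    "feature_stats": {0: (0.05, 1.0), 1: (0.02, 0.5), 2: (0.1, 2.0)},
}


def candidate_ids(db: dict, case_type: str, cause: str) -> list:
    """Look up group (1): documents sharing a case type and cause of action."""
    return db["docs_by_type_and_cause"].get((case_type, cause), [])


ids = candidate_ids(store, "civil", "loan dispute")
vectors = [store["text_vectors"][doc_id] for doc_id in ids]
```

Keying group (1) by case type and cause of action is what lets the later retrieval step narrow the search to candidates before any similarity is computed.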
It should be noted that the text type includes but is not limited to the case type of the input case text in the court business scenario, where the case type includes criminal litigation, civil litigation, administrative litigation, intellectual property disputes, written verdicts, compensation cases, enforcement cases and the trial stage of a case under trial. Of course, other existing or future text types, if applicable to the present application, should also fall within its scope of protection and are incorporated herein by reference.
Further, the input device 21 obtains the input case text and extracts several candidate keywords of the input case text based on the feature dictionary on keywords in the searching database. Specifically, the input device 21 obtains the input case text and, based on the cause of action of the input case text, extracts the candidate keywords of the input case text from the feature dictionary on keywords in the searching database. For example, the task is to find, among the massive published judgement documents in the court business scenario, the judgement documents similar to the input case text. Because judgement documents in the court business scenario differ in their causes of action, a quick search is made possible by taking, based on the cause of action of the input case text, the words in the feature dictionary on keywords that also occur in the input case text (their intersection) as the candidate keywords of the input case text. This ensures that every keyword mined from the input case text exists in the searching database.
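The intersection step described above can be sketched as follows. The sketch assumes the input case text has already been segmented into words (segmentation is outside its scope); the function name `extract_candidate_keywords`, the per-cause keyword table and the sample words are hypothetical.

```python
from typing import Dict, List, Set


def extract_candidate_keywords(
    input_words: List[str],
    keywords_by_cause: Dict[str, Set[str]],
    cause_of_action: str,
) -> List[str]:
    """Keep only the words of the input case text that are already keywords
    of published judgement documents with the same cause of action."""
    known = keywords_by_cause.get(cause_of_action, set())
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(w for w in input_words if w in known))


# Hypothetical feature-dictionary keywords, grouped by cause of action.
keywords_by_cause = {
    "loan dispute": {"loan", "interest", "repayment"},
    "theft": {"stolen", "value"},
}
input_words = ["plaintiff", "loan", "overdue", "interest", "loan"]

candidates = extract_candidate_keywords(input_words, keywords_by_cause, "loan dispute")
```

Because every returned word comes from the stored keyword set, each candidate keyword is guaranteed to have an entry in the feature dictionary.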
Further, the input case text feature mining device 22 queries the relevant feature dictionary received by the receiving device 25 (not shown) in the court intranet server and performs online featurization on the input case text to obtain the text feature vector of the input case text, as shown in Fig. 10. Fig. 10 shows a structural schematic of the input case text feature mining device 22 in the court intranet server for mining similar judgement documents based on big data according to a preferred embodiment of one aspect of the present application. The input case text feature mining device 22 includes a fourth mining unit 221, a fifth mining unit 222 and a sixth mining unit 223.
The fourth mining unit 221 is used to compare each word of the input case text with all keywords of all judgement documents, so as to extract candidate keywords and their word subject features from the input case text, and to obtain the text subject feature information of the input case text based on the word subject features. The fifth mining unit 222 is used to obtain the context relation between the candidate keywords, to correct the word subject feature of each candidate keyword based on the context relation, and to determine the keyword-related information of the input case text based on the degree of matching between the corrected word subject feature of each candidate keyword and the text subject feature information. The sixth mining unit 223 is used to update the text subject feature information of the input case text and obtain expansion-word-related information based on the keyword-related information, to establish the bag-of-words feature information of the input case text based on the keyword-related information and the expansion-word-related information, and to determine the text feature vector of the input case text based on the updated text subject feature information and the bag-of-words feature information.
In the embodiment of the present application, the court intranet in the court business scenario mainly computes the text feature vector of the input case text that a user enters in real time. The mining of the keywords of an online input case text by the fourth mining unit 221 of the court intranet server rests on one assumption: a keyword of the input case text entered online must also be a keyword of an existing judgement document. The module therefore queries, among the massive published judgement documents, all keywords that are identical to the content of the parties' claims and the parties' points of dispute in judgement documents having the same cause of action as the input case text, and intersects them with the words of the input case text to obtain the candidate keywords of the input case text entered online. This ensures that every keyword selected for the input case text is a keyword of an existing judgement document, so that judgement documents similar to the input case text, together with their text feature vectors and feature dictionary, can be mined from the existing judgement documents. Determining the candidate keywords of the input case text from the keywords of the published judgement documents also simplifies the calculation logic for the input case text, because it builds on the processing already done for the massive published judgement documents. Specifically, the method by which the fourth mining unit 221 mines the text subject feature of the input case text corresponds to the method of mining the text subject feature in step S221 of the above embodiment and is not repeated here.
Specifically, the method by which the fifth mining unit 222 in the court intranet server determines the keywords of the input case text corresponds to the method described for step S222 in the above embodiment of the present application. Keywords determined in this way express the keywords of the input case text and their word features more effectively and accurately, so that the text subject feature information obtained from them is closer to the case type of the input case text and expresses its text content more accurately. As a result, the similar judgement documents found through the text subject feature information of the input case text have a higher similarity, which improves the accuracy of searching for similar judgement documents.
In the embodiment of the present application, the sixth mining unit 223 updates the text subject feature information of the input case text and obtains the expansion-word-related information based on the keyword-related information. The method of updating the text subject feature information of the input case text here is consistent with the method of updating it in the embodiment of step S223 described above and is not repeated here. Likewise, the specific method of obtaining the synonyms of the keywords of the input case text, the words highly correlated within the input case text and the bag-of-words feature is consistent with the method of obtaining the synonyms and highly correlated words of the keywords and the bag-of-words feature in step S223 described above, and is also not repeated here.
In the embodiment of the present application, the sixth mining unit 223 merges the updated text subject feature information with the bag-of-words feature information to determine the original text feature of the input case text, and determines the text feature vector of the input case text by normalizing that original text feature. For example, the text subject feature information and the bag-of-words feature information of the input case text obtained in the step S123 are concatenated into a single feature vector, producing the original text feature of the input case text. If the text subject feature information of the input case text is a 10-dimensional feature vector and the bag-of-words feature information is a 100-dimensional feature vector, then the original text feature of the input case text is a 110-dimensional feature vector. A feature normalization method commonly used in the field of machine learning is then applied to the original text feature to produce the text feature vector of the input case text. For example, assuming that each individual feature is normally distributed across input case texts, every feature dimension can be normalized to a standard normal distribution.
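The concatenation and normalization described above can be sketched as a z-score transform that reuses the per-dimension mean and variance stored in the searching database. The function name `build_text_feature_vector`, the small `eps` guard against zero variance and the 2+3-dimensional sample values are assumptions for illustration; the application names only "a commonly used feature normalization method".

```python
import math
from typing import List, Sequence


def build_text_feature_vector(
    subject_features: Sequence[float],
    bag_of_words: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
    eps: float = 1e-12,
) -> List[float]:
    """Concatenate the two feature groups, then z-score every dimension."""
    raw = list(subject_features) + list(bag_of_words)  # original text feature
    if not (len(raw) == len(means) == len(variances)):
        raise ValueError("statistics must cover every feature dimension")
    # (x - mean) / std maps a normally distributed feature to N(0, 1).
    return [(x - m) / math.sqrt(v + eps) for x, m, v in zip(raw, means, variances)]


# Hypothetical 2-dim subject features and 3-dim bag-of-words features.
subject = [0.2, 0.8]
bow = [1.0, 0.0, 3.0]
means = [0.2, 0.5, 0.5, 0.0, 1.0]
variances = [1.0, 0.09, 0.25, 1.0, 4.0]

vector = build_text_feature_vector(subject, bow, means, variances)
```

Using the stored statistics rather than statistics of the single input text keeps the online vector on the same scale as the vectors computed offline for the published judgement documents.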
In the embodiment of the present application, the similar judgement document acquisition device 24 takes the candidate judgement documents that the candidate judgement document acquisition device 23 obtained from the searching database as having the same cause of action as the input case text, calculates the similarity between the text feature vectors of the candidate judgement documents and the text feature vector of the input case text, and selects similar judgement documents based on the similarity.
It should be noted that the algorithms used in the similar judgement document acquisition device 24 to calculate the similarity of text feature vectors include but are not limited to the Euclidean distance algorithm and the cosine similarity algorithm. Of course, other existing or future algorithms for calculating the similarity of text feature vectors, if applicable to the present application, should also fall within its scope of protection and are incorporated herein by reference.
For example, according to the case type and cause of action of the input case text entered by the user, all existing judgement documents with the same case type and cause of action are first queried as candidate similar judgement documents, and the text feature vectors of the candidate similar judgement documents are retrieved. The above algorithm for calculating the similarity of text feature vectors (the Euclidean distance algorithm or the cosine similarity algorithm) is then used to calculate the similarity between the input case text and each candidate similar judgement document. Next, according to the number N of similar judgement documents requested by the user, the N judgement documents with the highest similarity are taken as the final similar judgement documents. The text structure information and judgement-related information of the similar judgement documents are then queried and fed back to the user who requested the similar judgement documents. Finally, the court verdicts of the similar judgement documents are aggregated and, along dimensions of the text feature such as principal penalty, accessory penalty, compensation and which party prevailed, are presented in visual form to the user who requested the similar judgement documents. Specifically, suppose that a query by the case type and cause of action of the input case text returns 100 existing judgement documents with the same case type and cause of action as candidate judgement documents, and that the user requests 10 judgement documents similar to the input case text. The text feature vector of the input case text is then compared, by the above similarity algorithm, with the text feature vector of each of the 100 candidate judgement documents; the calculated similarities are sorted from high to low; the 10 candidate judgement documents with the highest similarity are taken as the similar judgement documents; and the text structure information and judgement-related information of these 10 similar judgement documents are fed back to the user who needs them.
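The retrieval steps above amount to scoring every candidate and keeping the N best. The following sketch uses cosine similarity, one of the two algorithms the text names; the function names, document identifiers and two-dimensional vectors are hypothetical and chosen only to keep the example small.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_similar(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    n: int,
) -> List[Tuple[str, float]]:
    """Score every candidate and return the n most similar, best first."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # high to low
    return scored[:n]


# Hypothetical candidates that share the input's case type and cause of action.
candidates = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
query_vector = [1.0, 0.0]

best = top_n_similar(query_vector, candidates, n=2)
best_ids = [doc_id for doc_id, _ in best]
```

If Euclidean distance were used instead, the sort would run from low to high, since a smaller distance means a closer match.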
Further, the second device for mining similar judgement documents based on big data according to one aspect of the present application also includes: a text structure information receiving device, for receiving the text structure information, sent by the first device, that results from structuring the judgement documents; and a text structure information acquisition device, for obtaining the text structure information of the similar judgement documents. For example, after the similarity calculation over the candidate judgement documents, the text structure information of all similar judgement documents that meet the requested quantity is obtained.
Figure 11 shows a schematic of a system for mining similar judgement documents based on big data according to one aspect of the present application. The system includes a cloud computing server 31 and a court intranet server 32. The cloud computing server 31 includes a published judgement document acquisition device 311, an offline featurization tool device 312 and a text feature vector generating device 313 for published judgement documents. The court intranet server 32 includes an on-line memory 321, an input case text acquisition device 322 for online input, an online featurization tool device 323, a text feature vector generating device 324 for the input case text, an online similar judgement document calculation tool device 325, and the similar judgement documents 326 of the input case text.
The function of the cloud computing server 31 is consistent with that of the first device for mining similar judgement documents based on big data shown in Fig. 6, and the function of the court intranet server 32 is consistent with that of the second device for mining similar judgement documents based on big data shown in Fig. 9. For brevity in the description below, the following are treated as interchangeable, their substance being identical: the published judgement document acquisition device 311 in the cloud computing server 31 and the judgement document acquisition device 11 in Fig. 6; the offline featurization tool device 312 together with the text feature vector generating device 313 for published judgement documents, and the text feature mining device 12 in Fig. 6; the on-line memory 321 in the court intranet server 32 and the candidate judgement document acquisition device 23 in Fig. 9; the input case text acquisition device 322 for online input and the input device 21 in Fig. 9; the online featurization tool device 323 together with the text feature vector generating device 324 for the input case text, and the input case text feature mining device 22 in Fig. 9; and the online similar judgement document calculation tool device 325 together with the similar judgement documents 326 of the input case text, and the similar judgement document acquisition device 24 in Fig. 9.
In the embodiment of the present application, in the trial work of the court business scenario, the published judgement document acquisition device 311 in the cloud computing server 31 stores all judgement documents published on the Internet. The offline featurization tool device 312 makes full use of the computing power of cloud computing to featurize all published judgement documents and to mine the feature dictionary on keywords. The text feature vector generating device 313 for published judgement documents mines the text feature vectors of the published judgement documents and the feature dictionary on keywords, and transfers them in a single transfer over a dedicated network line to the on-line memory 321 in the court intranet server 32. The on-line memory 321 in the court intranet server 32 stores the text feature vectors of the published judgement documents and the feature dictionary on keywords. The input case text acquisition device 322 for online input obtains the relevant text content of the input case text. The online featurization tool device 323 queries the feature dictionary on keywords of the published judgement documents in the on-line memory to obtain the relevant feature dictionary and featurizes the input case text, so that the text feature vector of the input case text is formed in the text feature vector generating device 324 for the input case text. The online similar judgement document calculation tool device 325 takes the input case text entered online and its text feature vector, queries the on-line memory, retrieves online the text feature vectors of the published candidate judgement documents that have the same cause of action as the input case text, calculates the similarity between the text feature vectors of the published candidate judgement documents and the text feature vector of the input case text, and after sorting obtains the judgement documents most similar to the input case text.
Here, the calculation logic of the offline featurization tool device 312 and the online featurization tool device 323 is the same; the difference between the two is that the online featurization tool device 323 needs only a simple calculation to realize the same calculation logic as the offline featurization tool device 312. The feature dictionary on keywords output by the offline featurization tool device 312 serves as the input of the online featurization tool device 323, and the online featurization tool device 323 relies on this offline-computed feature dictionary on keywords and on simplified online calculation logic to ensure that the same data, when input to the two tool devices, produces the same output. That is, the same judgement document passed through the offline featurization tool device 312 and through the online featurization tool device 323 yields the same text feature vector and the same feature dictionary on keywords. This makes the similarity calculation between the text feature vectors of the input case text and of the judgement documents more effective, and improves the efficiency and accuracy of mining similar judgement documents in the court business scenario. After the calculation by the online featurization tool device 323, the text feature vector of the input case text is generated in the text feature vector generating device 324 for the input case text. In the online similar judgement document calculation tool device 325, the similarity between the text feature vector of each candidate judgement document having the same text type and the text feature vector of the input case text is calculated. In the similar judgement documents 326 of the input case text, based on the number of similar judgement documents needed in the court business scenario, the corresponding number of candidate judgement documents with the highest similarity are taken as the similar judgement documents.
As can be seen from the above embodiments of the device for mining similar judgement documents based on big data, the present application uses big data text analysis technology to mine effectively the three elements of a judgement document, namely the text subject feature information on the case facts of the judgement document and the keyword-related information on the parties' points of dispute and on the parties' claims in the judgement document, and completes a pairwise comparison of the element content so as to mine similar judgement documents. The embodiment of the present application first establishes text feature vectors for all judgement documents nationwide, including text subject feature information, text keyword features and expanded keyword features. It then uses real-time machine learning computation to calculate the text feature vector of an input case text entered in real time (or of only the case facts and a complaint stating the parties' claims), and uses a machine learning model to find the judgement documents with existing court verdicts that are most similar to the input case text entered in real time. In this process, judicial personnel can enter whatever text they need similar judgement documents for, according to the actual situation; the device of the present application does not restrict the structure of the input case text and thus fully accommodates the application scenarios of court business.
Compared with the prior art, the method and equipment for mining similar judgement documents based on big data at the first equipment end, according to the embodiments of the application, obtain a massive number of published judgement documents and the cause of action of each judgement document; obtain, based on the text content of each judgement document, the text topic feature information about the case facts of the judgement document and several pieces of keyword-related information about the parties' dispute content and the parties' claim content in the judgement document; and establish the text feature vector of the judgement document based on the text topic feature information and the keyword-related information. Each published judgement document is thereby reduced to these three elements and represented accurately as a text feature vector. This avoids the time and labour of manually analysing a massive number of judgement documents that are long, complex in content and varied in style, and so improves the efficiency of mining similar judgement documents. In addition, the feature dictionary of keywords is updated based on the keyword-related information, so that the text content of the judgement documents is captured in a dictionary built from all keywords together with their word topic features and expansion words. Similar judgement documents and their corresponding text feature vectors can then be obtained quickly, which further improves the efficiency of mining similar judgement documents.
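A feature dictionary indexed by keyword, holding each keyword's word topic feature and expansion words, could be represented as in the sketch below. The data layout and the update policy (overwrite the topic feature, keep the highest relevance seen per expansion word) are assumptions for illustration; `DictionaryEntry` and `update_feature_dictionary` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class DictionaryEntry:
    """One dictionary entry: a keyword's topic feature and its expansion words."""
    topic_feature: list
    expansions: dict = field(default_factory=dict)  # {expansion_word: relevance}

def update_feature_dictionary(dictionary, keyword_info):
    """Insert or update entries, using the keyword as the index.

    keyword_info: iterable of (keyword, topic_feature, expansions) tuples.
    Assumed policy: the latest topic feature replaces the old one, and each
    expansion word keeps the highest relevance observed so far.
    """
    for keyword, topic_feature, expansions in keyword_info:
        entry = dictionary.get(keyword)
        if entry is None:
            dictionary[keyword] = DictionaryEntry(list(topic_feature), dict(expansions))
        else:
            entry.topic_feature = list(topic_feature)
            for word, relevance in expansions.items():
                entry.expansions[word] = max(relevance, entry.expansions.get(word, 0.0))
    return dictionary

# Hypothetical updates from two judgement documents.
feature_dictionary = {}
update_feature_dictionary(feature_dictionary, [("loan", [0.9, 0.1], {"borrowing": 0.8})])
update_feature_dictionary(
    feature_dictionary,
    [("loan", [0.8, 0.2], {"borrowing": 0.6, "repayment": 0.7})],
)
```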
Further, the method and equipment for mining similar judgement documents based on big data at the second equipment end, according to the embodiments of the application, first obtain an input case text and extract several candidate keywords of the input case text based on the feature dictionary of keywords in the retrieval database. Because the keywords obtained from the input case text can all be found in the retrieval database, the keyword-based search for judgement documents similar to the input case text becomes more effective. Next, the text topic feature information and several pieces of keyword-related information of the input case text are obtained based on its text content and the candidate keywords, and the text feature vector of the input case text is established from them, so that the relevant information of the input case text is expressed in the form of a text feature vector. Finally, several candidate judgement documents having the same cause of action as the input case text are obtained from the retrieval database, the similarity between the text feature vector of each candidate judgement document and the text feature vector of the input case text is calculated, and similar judgement documents are chosen based on the similarity. The text feature vectors of the candidate judgement documents sent from the first equipment are thus compared with the text feature vector mined in real time from the input case text, so that judgement documents similar to the input case text can be screened quickly and accurately from the massive number of published judgement documents. This avoids the time and labour of manually analysing a massive number of judgement documents that are long, complex in content and varied in style, and so improves the efficiency of mining similar texts.
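The first and third steps at the second equipment end (extracting candidate keywords with the dictionary, then restricting candidates to the same cause of action) can be sketched as follows. The sketch assumes the input case text has already been segmented into tokens by some word segmenter, and that each database record carries a `cause` field; both are assumptions, and all names and sample records are hypothetical.

```python
def extract_candidate_keywords(tokens, feature_dictionary):
    """Keep the tokens that appear as keys in the feature dictionary.

    tokens: the input case text, already segmented into words (assumed).
    Returns candidate keywords in order of first appearance, without repeats.
    """
    seen = set()
    candidates = []
    for token in tokens:
        if token in feature_dictionary and token not in seen:
            seen.add(token)
            candidates.append(token)
    return candidates

def candidates_with_same_cause(database, cause_of_action):
    """Select the stored judgement documents with the same cause of action."""
    return [record for record in database if record["cause"] == cause_of_action]

# Hypothetical dictionary keys and database records.
feature_dictionary = {"loan": None, "interest": None}
tokens = ["the", "loan", "carried", "interest", "on", "the", "loan"]
keywords = extract_candidate_keywords(tokens, feature_dictionary)

database = [
    {"id": "doc_a", "cause": "private lending dispute"},
    {"id": "doc_b", "cause": "labour dispute"},
]
same_cause = candidates_with_same_cause(database, "private lending dispute")
```

Filtering by cause of action before any similarity calculation keeps the comparison set small, which matters when the database holds a massive number of documents.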
It should be noted that the application can be implemented in software and/or in a combination of software and hardware, for example, using an application-specific integrated circuit (ASIC), a general-purpose computer or any other similar hardware device. In one embodiment, the software program of the application may be executed by a processor to implement the steps or functions described above. Likewise, the software program of the application (including related data structures) can be stored in a computer-readable recording medium, for example, RAM, a magnetic or optical drive, a floppy disk or a similar device. In addition, some steps or functions of the application may be implemented in hardware, for example, as circuitry that cooperates with a processor to perform each step or function.
In addition, part of the application can be embodied as a computer program product, such as computer program instructions which, when executed by a computer, invoke or provide the method and/or technical solution according to the application through the operation of that computer. The program instructions that invoke the method of the application may be stored in a fixed or removable recording medium, and/or transmitted as a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment of the application comprises a device that includes a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to run the methods and/or technical solutions according to the foregoing embodiments of the application.
It is obvious to a person skilled in the art that the application is not limited to the details of the above exemplary embodiments, and that the application can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive. The scope of the application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are intended to be included in the application. Any reference sign in a claim shall not be construed as limiting that claim. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any particular order.

Claims (27)

1. A method for mining similar judgement documents based on big data at a first equipment end, wherein the method comprises:
obtaining a massive number of published judgement documents, and obtaining the cause of action of each judgement document;
obtaining, based on the text content of each judgement document, text topic feature information about the case facts of the judgement document and several pieces of keyword-related information about the parties' dispute content and the parties' claim content in the judgement document, and establishing a text feature vector of the judgement document based on the text topic feature information and the several pieces of keyword-related information;
updating a feature dictionary of keywords based on the several pieces of keyword-related information.
2. The method according to claim 1, wherein the method further comprises:
structuring the judgement document to obtain text structuring information;
obtaining judgement-related information of the judgement document based on the text structuring information, the judgement-related information comprising party information, case type, cause of action and judgement result.
3. The method according to claim 1 or 2, wherein the method further comprises:
sending the text feature vectors of all the judgement documents, the feature dictionary and the judgement-related information to a retrieval database of a second equipment.
4. The method according to any one of claims 1 to 3, wherein obtaining, based on the text content of each judgement document, the text topic feature information about the case facts of the judgement document and the several pieces of keyword-related information about the parties' dispute content and the parties' claim content in the judgement document, and establishing the text feature vector of the judgement document based on the text topic feature information and the several pieces of keyword-related information comprises:
extracting the text topic feature information of the judgement document and the word topic feature of each word in the judgement document;
obtaining the context relation between the words, correcting the word topic feature of each word based on the context relation, and determining several pieces of keyword-related information of the judgement document based on the degree of matching between the corrected word topic feature of each word and the text topic feature information, wherein the keyword-related information comprises a keyword, keyword importance information and the word topic feature corresponding to the keyword;
updating the text topic feature information of the judgement document based on the keyword-related information;
obtaining expansion-word-related information based on the keyword-related information, the expansion-word-related information comprising the expansion words of the keyword and the expansion word relevance, establishing bag-of-words feature information based on the keyword-related information and the expansion-word-related information, and determining the text feature vector of the judgement document based on the updated text topic feature information and the bag-of-words feature information.
5. The method according to claim 4, wherein obtaining the context relation between the words, correcting the word topic feature of each word based on the context relation, and determining the several pieces of keyword-related information of the judgement document based on the degree of matching between the corrected word topic feature of each word and the text topic feature information, wherein the keyword-related information comprises a keyword, keyword importance information and the word topic feature corresponding to the keyword, comprises:
obtaining the context word co-occurrence relation between the words;
obtaining the context transition probability between any two of the words;
correcting the word topic feature of each word based on the context word co-occurrence relation and the context transition probability;
determining several keywords of the judgement document and their corresponding word topic features based on the degree of matching between the corrected word topic feature of each word and the text topic feature information, and obtaining the importance information of the keywords.
6. The method according to claim 4, wherein obtaining the expansion-word-related information based on the keyword-related information, the expansion-word-related information comprising the expansion words of the keyword and the expansion word relevance, and establishing the bag-of-words feature information based on the keyword-related information and the expansion-word-related information comprises:
determining the expansion words of the keyword and the expansion word relevance based on the keyword and its corresponding word topic feature, wherein the expansion words comprise synonyms of the keyword and related words that are highly correlated with the keyword in the judgement document;
establishing the bag-of-words feature information using a bag-of-words model, based on the keyword and its corresponding word topic feature and on the expansion words and the expansion word relevance.
7. The method according to claim 4, wherein determining the text feature vector of the judgement document based on the updated text topic feature information and the bag-of-words feature information comprises:
merging the updated text topic feature information and the bag-of-words feature information to determine the original text feature of the judgement document;
normalizing the original text feature of the judgement document to determine the text feature vector of the judgement document.
8. The method according to any one of claims 1 to 7, wherein updating the feature dictionary of keywords based on the several pieces of keyword-related information comprises:
establishing the feature dictionary of keywords from the word topic feature and the expansion words of each keyword, using the keyword as the index.
9. A method for mining similar judgement documents based on big data at a second equipment end, wherein the method comprises:
obtaining an input case text, and extracting several candidate keywords of the input case text based on a feature dictionary of keywords in a retrieval database;
obtaining text topic feature information and several pieces of keyword-related information of the input case text based on the text content of the input case text and the several candidate keywords, and establishing a text feature vector of the input case text based on the text topic feature information and the several pieces of keyword-related information;
obtaining, from the retrieval database, several candidate judgement documents having the same cause of action as the input case text;
calculating the similarity between the text feature vector of each candidate judgement document and the text feature vector of the input case text, and choosing similar judgement documents based on the similarity.
10. The method according to claim 9, wherein the method further comprises:
receiving, from a first equipment, the text feature vectors of the published judgement documents obtained by the first equipment, the feature dictionary and judgement-related information, and saving them to the retrieval database, the judgement-related information comprising party information, case type, cause of action and judgement result.
11. The method according to claim 9 or 10, wherein obtaining the input case text and extracting the several candidate keywords of the input case text based on the feature dictionary of keywords in the retrieval database comprises:
obtaining the input case text, and extracting the several candidate keywords of the input case text from the feature dictionary of keywords in the retrieval database based on the cause of action of the input case text.
12. The method according to any one of claims 9 to 11, wherein obtaining the text topic feature information and the several pieces of keyword-related information of the input case text based on the text content of the input case text and the several candidate keywords, and establishing the text feature vector of the input case text based on the text topic feature information and the several pieces of keyword-related information comprises:
comparing each word of the input case text with all keywords of all the judgement documents to extract candidate keywords and their word topic features from the input case text, and obtaining the text topic feature information of the input case text based on the word topic features;
obtaining the context relation between the candidate keywords, correcting the word topic feature of each candidate keyword based on the context relation, and determining the keyword-related information of the input case text based on the degree of matching between the corrected word topic feature of each candidate keyword and the text topic feature information;
updating the text topic feature information of the input case text and obtaining expansion-word-related information based on the keyword-related information, establishing bag-of-words feature information of the input case text based on the keyword-related information and the expansion-word-related information, and determining the text feature vector of the input case text based on the updated text topic feature information and the bag-of-words feature information.
13. The method according to any one of claims 9 to 12, wherein the method further comprises:
receiving the text structuring information, sent by the first equipment, that results from structuring the judgement documents;
obtaining the text structuring information of the similar judgement documents.
14. A first equipment for mining similar judgement documents based on big data, wherein the first equipment comprises:
a judgement document acquisition device, configured to obtain a massive number of published judgement documents and to obtain the cause of action of each judgement document;
a text feature mining device, configured to obtain, based on the text content of each judgement document, text topic feature information about the case facts of the judgement document and several pieces of keyword-related information about the parties' dispute content and the parties' claim content in the judgement document, and to establish a text feature vector of the judgement document based on the text topic feature information and the several pieces of keyword-related information;
a feature dictionary establishing device, configured to update a feature dictionary of keywords based on the several pieces of keyword-related information.
15. The first equipment according to claim 14, wherein the first equipment further comprises:
a text structuring device, configured to structure the judgement document to obtain text structuring information;
a text structuring information acquisition device, configured to obtain judgement-related information of the judgement document based on the text structuring information, the judgement-related information comprising party information, case type, cause of action and judgement result.
16. The first equipment according to claim 14 or 15, wherein the first equipment further comprises:
a sending device, configured to send the text feature vectors of all the judgement documents, the feature dictionary and the judgement-related information to a retrieval database of a second equipment.
17. The first equipment according to any one of claims 14 to 16, wherein the text feature mining device comprises:
a first mining unit, configured to extract the text topic feature information of the judgement document and the word topic feature of each word in the judgement document;
a second mining unit, configured to obtain the context relation between the words, to correct the word topic feature of each word based on the context relation, and to determine several pieces of keyword-related information of the judgement document based on the degree of matching between the corrected word topic feature of each word and the text topic feature information, wherein the keyword-related information comprises a keyword, keyword importance information and the word topic feature corresponding to the keyword;
a third mining unit, configured to update the text topic feature information of the judgement document based on the keyword-related information;
a generation unit, configured to obtain expansion-word-related information based on the keyword-related information, the expansion-word-related information comprising the expansion words of the keyword and the expansion word relevance, to establish bag-of-words feature information based on the keyword-related information and the expansion-word-related information, and to determine the text feature vector of the judgement document based on the updated text topic feature information and the bag-of-words feature information.
18. The first equipment according to any one of claims 14 to 17, wherein the second mining unit is configured to:
obtain the context word co-occurrence relation between the words;
obtain the context transition probability between any two of the words;
correct the word topic feature of each word based on the context word co-occurrence relation and the context transition probability;
determine several keywords of the judgement document and their corresponding word topic features based on the degree of matching between the corrected word topic feature of each word and the text topic feature information, and obtain the importance information of the keywords.
19. The first equipment according to any one of claims 14 to 18, wherein the generation unit is configured to:
determine the expansion words of the keyword and the expansion word relevance based on the keyword and its corresponding word topic feature, wherein the expansion words comprise synonyms of the keyword and related words that are highly correlated with the keyword in the judgement document;
establish the bag-of-words feature information using a bag-of-words model, based on the keyword and its corresponding word topic feature and on the expansion words and the expansion word relevance.
20. The first equipment according to any one of claims 14 to 19, wherein the generation unit is configured to:
merge the updated text topic feature information and the bag-of-words feature information to determine the original text feature of the judgement document;
normalize the original text feature of the judgement document to determine the text feature vector of the judgement document.
21. The first equipment according to any one of claims 14 to 20, wherein the feature dictionary establishing device is configured to:
establish the feature dictionary of keywords from the word topic feature and the expansion words of each keyword, using the keyword as the index.
22. A second equipment for mining similar judgement documents based on big data, wherein the second equipment comprises:
an input device, configured to obtain an input case text and to extract several candidate keywords of the input case text based on a feature dictionary of keywords in a retrieval database;
an input case text feature mining device, configured to obtain text topic feature information and several pieces of keyword-related information of the input case text based on the text content of the input case text and the several candidate keywords, and to establish a text feature vector of the input case text based on the text topic feature information and the several pieces of keyword-related information;
a candidate judgement document acquisition device, configured to obtain, from the retrieval database, several candidate judgement documents having the same cause of action as the input case text;
a similar judgement document acquisition device, configured to calculate the similarity between the text feature vector of each candidate judgement document and the text feature vector of the input case text, and to choose similar judgement documents based on the similarity.
23. The second equipment according to claim 22, wherein the second equipment further comprises:
a receiving device, configured to receive, from a first equipment, the text feature vectors of the published judgement documents obtained by the first equipment, the feature dictionary and judgement-related information, and to save them to the retrieval database, the judgement-related information comprising party information, case type, cause of action and judgement result.
24. The second equipment according to claim 22 or 23, wherein the input device is configured to:
obtain the input case text, and extract the several candidate keywords of the input case text from the feature dictionary of keywords in the retrieval database based on the cause of action of the input case text.
25. The second equipment according to any one of claims 22 to 24, wherein the input case text feature mining device comprises:
a fourth mining unit, configured to compare each word of the input case text with all keywords of all the judgement documents to extract candidate keywords and their word topic features from the input case text, and to obtain the text topic feature information of the input case text based on the word topic features;
a fifth mining unit, configured to obtain the context relation between the candidate keywords, to correct the word topic feature of each candidate keyword based on the context relation, and to determine the keyword-related information of the input case text based on the degree of matching between the corrected word topic feature of each candidate keyword and the text topic feature information;
a sixth mining unit, configured to update the text topic feature information of the input case text and to obtain expansion-word-related information based on the keyword-related information, to establish bag-of-words feature information of the input case text based on the keyword-related information and the expansion-word-related information, and to determine the text feature vector of the input case text based on the updated text topic feature information and the bag-of-words feature information.
26. The second equipment according to any one of claims 22 to 25, wherein the second equipment further comprises:
a text structuring information receiving device, configured to receive the text structuring information, sent by the first equipment, that results from structuring the judgement documents;
a text structuring information acquisition device, configured to obtain the text structuring information of the similar judgement documents.
27. A system for mining similar judgement documents based on big data, wherein the system comprises a first equipment and a second equipment:
the first equipment comprises:
a judgement document acquisition device, configured to obtain a massive number of published judgement documents and to obtain the cause of action of each judgement document;
a text feature mining device, configured to obtain, based on the text content of each judgement document, text topic feature information about the case facts of the judgement document and several pieces of keyword-related information about the parties' dispute content and the parties' claim content in the judgement document, and to establish a text feature vector of the judgement document based on the text topic feature information and the several pieces of keyword-related information;
a feature dictionary establishing device, configured to update a feature dictionary of keywords based on the several pieces of keyword-related information;
a text structuring device, configured to structure the judgement document to obtain text structuring information;
a text structuring information acquisition device, configured to obtain judgement-related information of the judgement document based on the text structuring information, the judgement-related information comprising party information, case type, cause of action and judgement result;
a sending device, configured to send the text feature vectors of all the judgement documents, the feature dictionary and the judgement-related information to a retrieval database of the second equipment;
the second equipment comprises:
a receiving device, configured to receive, from the first equipment, the text feature vectors of the published judgement documents obtained by the first equipment, the feature dictionary and the judgement-related information, and to save them to the retrieval database, the judgement-related information comprising party information, case type, cause of action and judgement result;
a text structuring information receiving device, configured to receive the text structuring information, sent by the first equipment, that results from structuring the judgement documents;
a text structuring information acquisition device, configured to obtain the text structuring information of the similar judgement documents;
an input device, configured to obtain an input case text and to extract several candidate keywords of the input case text based on the feature dictionary of keywords in the retrieval database;
an input case text feature mining device, configured to obtain text topic feature information and several pieces of keyword-related information of the input case text based on the text content of the input case text and the several candidate keywords, and to establish a text feature vector of the input case text based on the text topic feature information and the several pieces of keyword-related information;
a candidate judgement document acquisition device, configured to obtain, from the retrieval database, several candidate judgement documents having the same cause of action as the input case text;
a similar judgement document acquisition device, configured to calculate the similarity between the text feature vector of each candidate judgement document and the text feature vector of the input case text, and to choose similar judgement documents based on the similarity.
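The context-based correction of word topic features recited in claims 5 and 18 can be illustrated with the sketch below. The claims do not specify a formula, so everything here is an assumption: transition probabilities are estimated from adjacent-word counts, and each word's topic feature is blended with the probability-weighted topic features of the words that follow it, using a hypothetical mixing weight `alpha`. The function names and sample data are likewise hypothetical.

```python
from collections import Counter, defaultdict

def transition_probabilities(tokens):
    """Estimate P(next word | word) from adjacent-word co-occurrence counts."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    first_counts = Counter(tokens[:-1])
    probabilities = defaultdict(dict)
    for (word, nxt), count in pair_counts.items():
        probabilities[word][nxt] = count / first_counts[word]
    return probabilities

def correct_topic_features(tokens, topic, alpha=0.5):
    """Blend each word's topic feature with that of its likely successors.

    topic: {word: topic feature vector}. Assumed rule:
    corrected = (1 - alpha) * own + alpha * sum(P(next | word) * topic[next]).
    Words with no observed successor keep their own feature unchanged.
    """
    probabilities = transition_probabilities(tokens)
    corrected = {}
    for word, own in topic.items():
        successors = probabilities.get(word, {})
        if not successors:
            corrected[word] = list(own)
            continue
        context = [
            sum(p * topic[nxt][i] for nxt, p in successors.items() if nxt in topic)
            for i in range(len(own))
        ]
        corrected[word] = [(1 - alpha) * o + alpha * c for o, c in zip(own, context)]
    return corrected

# Hypothetical two-dimensional topic features for a short token sequence.
tokens = ["loan", "contract", "loan", "interest"]
topic = {"loan": [1.0, 0.0], "contract": [0.0, 1.0], "interest": [1.0, 0.0]}
corrected = correct_topic_features(tokens, topic)
```

Keywords would then be chosen by how well each corrected word topic feature matches the document's text topic feature information; that matching step is not shown here.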
CN201610038106.XA 2016-01-20 2016-01-20 Method and equipment for mining similar referee documents based on big data Active CN106991092B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610038106.XA CN106991092B (en) 2016-01-20 2016-01-20 Method and equipment for mining similar referee documents based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610038106.XA CN106991092B (en) 2016-01-20 2016-01-20 Method and equipment for mining similar referee documents based on big data

Publications (2)

Publication Number Publication Date
CN106991092A true CN106991092A (en) 2017-07-28
CN106991092B CN106991092B (en) 2021-11-05

Family

ID=59413645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610038106.XA Active CN106991092B (en) 2016-01-20 2016-01-20 Method and equipment for mining similar referee documents based on big data

Country Status (1)

Country Link
CN (1) CN106991092B (en)

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107562938A (en) * 2017-09-21 2018-01-09 重庆工商大学 A kind of law court intelligently tries method
CN107633465A (en) * 2017-08-21 2018-01-26 厦门能见易判信息科技有限公司 Intelligence aids in method of deciding a case
CN107918921A (en) * 2017-11-21 2018-04-17 南京擎盾信息科技有限公司 Criminal case court verdict measure and system
CN108038091A (en) * 2017-10-30 2018-05-15 上海思贤信息技术股份有限公司 A kind of similar calculating of judgement document's case based on figure and search method and system
CN108197163A (en) * 2017-12-14 2018-06-22 上海银江智慧智能化技术有限公司 A kind of structuring processing method based on judgement document
CN108304386A (en) * 2018-03-05 2018-07-20 上海思贤信息技术股份有限公司 A kind of logic-based rule infers the method and device of legal documents court verdict
CN109285094A (en) * 2017-07-19 2019-01-29 北京国双科技有限公司 The processing method and processing device of legal documents
CN109284359A (en) * 2018-09-13 2019-01-29 巫溪县片刻网络科技有限公司 A kind of trial ancillary data management platform
CN109426905A (en) * 2017-08-29 2019-03-05 北京国双科技有限公司 A kind of determination method and device that the criminal document measurement of penalty deviates
CN109472722A (en) * 2017-09-08 2019-03-15 北京国双科技有限公司 Obtain the method and device that judgement document to be generated finds out section relevant information through trying
CN109472017A (en) * 2017-09-08 2019-03-15 北京国双科技有限公司 Obtain the method and device that judgement document the court to be generated thinks section relevant information
CN109583669A (en) * 2017-09-28 2019-04-05 北京国双科技有限公司 Data capture method, device, storage medium and processor
CN110019669A (en) * 2017-10-31 2019-07-16 北京国双科技有限公司 A kind of text searching method and device
CN110019697A (en) * 2017-08-29 2019-07-16 北京国双科技有限公司 A kind of method for pushing and device of criminal document
CN110019670A (en) * 2017-10-31 2019-07-16 北京国双科技有限公司 A kind of text searching method and device
CN110019672A (en) * 2017-11-09 2019-07-16 北京国双科技有限公司 A kind of method for pushing of similar case, system, storage medium and processor
CN110019663A (en) * 2017-09-30 2019-07-16 北京国双科技有限公司 A kind of method for pushing, system, storage medium and the processor of case information
CN110019668A (en) * 2017-10-31 2019-07-16 北京国双科技有限公司 A kind of text searching method and device
CN110162590A (en) * 2019-02-22 2019-08-23 北京捷风数据技术有限公司 A kind of database displaying method and device thereof of calling for tenders of project text combination economic factor
CN110209760A (en) * 2019-06-13 2019-09-06 北京百度网讯科技有限公司 Go through the associated method and apparatus of part of trying a case, electronic equipment, computer-readable medium
WO2019170015A1 (en) * 2018-03-09 2019-09-12 北京国双科技有限公司 Judicial document searching method and device
CN110362799A (en) * 2019-06-17 2019-10-22 平安科技(深圳)有限公司 Processing method, device and computer equipment are generated based on the award arbitrated online
CN110472048A (en) * 2019-07-19 2019-11-19 平安科技(深圳)有限公司 A kind of auxiliary judgement method, apparatus and terminal device
CN110727787A (en) * 2019-10-11 2020-01-24 北京明略软件系统有限公司 Case text matching method and device, electronic equipment and storage medium
CN110738039A (en) * 2019-09-03 2020-01-31 平安科技(深圳)有限公司 Prompting method, device, storage medium and server for case auxiliary information
CN110827177A (en) * 2018-08-13 2020-02-21 北京国双科技有限公司 Case-like document searching method and device
CN110941645A (en) * 2018-09-21 2020-03-31 北京国双科技有限公司 Method, device, storage medium and processor for automatically judging case string
CN110955760A (en) * 2018-09-26 2020-04-03 北京国双科技有限公司 Evaluation method of judgment result and related device
CN110968662A (en) * 2018-09-27 2020-04-07 北京国双科技有限公司 Judicial data processing method and device, storage medium and processor
CN110990522A (en) * 2018-09-30 2020-04-10 北京国双科技有限公司 Legal document determining method and system
CN111008261A (en) * 2018-09-19 2020-04-14 北京国双科技有限公司 Method and device for determining referee document based on preposed document
CN111144095A (en) * 2019-11-26 2020-05-12 方正璞华软件(武汉)股份有限公司 Method and device for generating work damage case sanction book
CN111259160A (en) * 2018-11-30 2020-06-09 百度在线网络技术(北京)有限公司 Knowledge graph construction method, device, equipment and storage medium
CN111291152A (en) * 2018-12-07 2020-06-16 北大方正集团有限公司 Case document recommendation method, device, equipment and storage medium
CN111382769A (en) * 2018-12-29 2020-07-07 阿里巴巴集团控股有限公司 Information processing method, device and system
CN112784007A (en) * 2020-07-16 2021-05-11 上海芯翌智能科技有限公司 Text matching method and device, storage medium and computer equipment
CN112925877A (en) * 2019-12-06 2021-06-08 中国科学院软件研究所 One-person multi-case association identification method and system based on depth measurement learning
WO2021164226A1 (en) * 2020-02-20 2021-08-26 平安科技(深圳)有限公司 Method and apparatus for querying knowledge map of legal cases, device and storage medium
CN117453856A (en) * 2023-10-19 2024-01-26 中国司法大数据研究院有限公司 Method and device for extracting calendar and examination case series based on multi-source data fusion
CN117830060A (en) * 2024-03-04 2024-04-05 天津财经大学 Injury crime law enforcement supervision and auxiliary decision-making system based on knowledge graph

Citations (12)

Publication number Priority date Publication date Assignee Title
CN101145153A (en) * 2006-09-13 2008-03-19 阿里巴巴公司 Method and system for searching information
US7577652B1 (en) * 2008-08-20 2009-08-18 Yahoo! Inc. Measuring topical coherence of keyword sets
CN101655857A (en) * 2009-09-18 2010-02-24 西安建筑科技大学 Method for mining data in construction regulation field based on associative regulation mining technology
JP2013030098A (en) * 2011-07-29 2013-02-07 Kddi R & D Laboratories Inc Importance level determination device, importance level determination method, and program
CN102982063A (en) * 2012-09-18 2013-03-20 华东师范大学 Control method based on tuple elaboration of relation keywords extension
CN103294820A (en) * 2013-06-14 2013-09-11 广东电网公司电力科学研究院 WEB page classifying method and system based on semantic extension
US20140040301A1 (en) * 2012-08-02 2014-02-06 Rule 14 Real-time and adaptive data mining
CN103970806A (en) * 2013-02-05 2014-08-06 百度在线网络技术(北京)有限公司 Method and device for establishing lyric-feelings classification models
CN104298715A (en) * 2014-09-16 2015-01-21 北京航空航天大学 TF-IDF based multiple-index result merging and sequencing method
CN104424291A (en) * 2013-09-02 2015-03-18 阿里巴巴集团控股有限公司 Method and device for sorting search results
CN104572849A (en) * 2014-12-17 2015-04-29 西安美林数据技术股份有限公司 Automatic standardized filing method based on text semantic mining
CN104881424A (en) * 2015-03-13 2015-09-02 国家电网公司 Regular expression-based acquisition, storage and analysis method of power big data

Patent Citations (12)

Publication number Priority date Publication date Assignee Title
CN101145153A (en) * 2006-09-13 2008-03-19 阿里巴巴公司 Method and system for searching information
US7577652B1 (en) * 2008-08-20 2009-08-18 Yahoo! Inc. Measuring topical coherence of keyword sets
CN101655857A (en) * 2009-09-18 2010-02-24 西安建筑科技大学 Method for mining data in construction regulation field based on associative regulation mining technology
JP2013030098A (en) * 2011-07-29 2013-02-07 Kddi R & D Laboratories Inc Importance level determination device, importance level determination method, and program
US20140040301A1 (en) * 2012-08-02 2014-02-06 Rule 14 Real-time and adaptive data mining
CN102982063A (en) * 2012-09-18 2013-03-20 华东师范大学 Control method based on tuple elaboration of relation keywords extension
CN103970806A (en) * 2013-02-05 2014-08-06 百度在线网络技术(北京)有限公司 Method and device for establishing lyric-feelings classification models
CN103294820A (en) * 2013-06-14 2013-09-11 广东电网公司电力科学研究院 WEB page classifying method and system based on semantic extension
CN104424291A (en) * 2013-09-02 2015-03-18 阿里巴巴集团控股有限公司 Method and device for sorting search results
CN104298715A (en) * 2014-09-16 2015-01-21 北京航空航天大学 TF-IDF based multiple-index result merging and sequencing method
CN104572849A (en) * 2014-12-17 2015-04-29 西安美林数据技术股份有限公司 Automatic standardized filing method based on text semantic mining
CN104881424A (en) * 2015-03-13 2015-09-02 国家电网公司 Regular expression-based acquisition, storage and analysis method of power big data

Non-Patent Citations (3)

Title
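WU D et al.: "Identification of web query intent based on query text and web knowledge", 《PCSPA2010 FIRST INTERNATIONAL CONFERENCE ON》 *
XIANG Lixing: "Design and implementation of a judgment document recommendation system based on natural semantic processing", 《China Master's Theses Full-text Database, Information Science and Technology》 *
SONG Wei et al.: "Research on personalized query reformulation based on retrieval history context", 《Journal of Chinese Information Processing》 *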

Cited By (57)

Publication number Priority date Publication date Assignee Title
CN109285094A (en) * 2017-07-19 2019-01-29 北京国双科技有限公司 The processing method and processing device of legal documents
CN107633465A (en) * 2017-08-21 2018-01-26 厦门能见易判信息科技有限公司 Intelligence aids in method of deciding a case
CN110019697A (en) * 2017-08-29 2019-07-16 北京国双科技有限公司 A kind of method for pushing and device of criminal document
CN109426905A (en) * 2017-08-29 2019-03-05 北京国双科技有限公司 A kind of determination method and device that the criminal document measurement of penalty deviates
CN109426905B (en) * 2017-08-29 2022-03-18 北京国双科技有限公司 Criminal document criminal deviation judging method and device
CN109472722A (en) * 2017-09-08 2019-03-15 北京国双科技有限公司 Obtain the method and device that judgement document to be generated finds out section relevant information through trying
CN109472722B (en) * 2017-09-08 2021-08-17 北京国双科技有限公司 Method and device for obtaining relevant information of approved finding segment of official document to be generated
CN109472017A (en) * 2017-09-08 2019-03-15 北京国双科技有限公司 Obtain the method and device that judgement document the court to be generated thinks section relevant information
CN109472017B (en) * 2017-09-08 2022-09-20 北京国双科技有限公司 Method and device for obtaining relevant information of text court deeds of referee to be generated
CN107562938A (en) * 2017-09-21 2018-01-09 重庆工商大学 A kind of law court intelligently tries method
CN109583669A (en) * 2017-09-28 2019-04-05 北京国双科技有限公司 Data capture method, device, storage medium and processor
CN110019663A (en) * 2017-09-30 2019-07-16 北京国双科技有限公司 A kind of method for pushing, system, storage medium and the processor of case information
CN108038091B (en) * 2017-10-30 2021-12-14 上海思贤信息技术股份有限公司 Graph-based referee document case similarity calculation and retrieval method and system
CN108038091A (en) * 2017-10-30 2018-05-15 上海思贤信息技术股份有限公司 A kind of similar calculating of judgement document's case based on figure and search method and system
CN110019670A (en) * 2017-10-31 2019-07-16 北京国双科技有限公司 A kind of text searching method and device
CN110019669B (en) * 2017-10-31 2021-06-29 北京国双科技有限公司 Text retrieval method and device
CN110019669A (en) * 2017-10-31 2019-07-16 北京国双科技有限公司 A kind of text searching method and device
CN110019668A (en) * 2017-10-31 2019-07-16 北京国双科技有限公司 A kind of text searching method and device
CN110019672A (en) * 2017-11-09 2019-07-16 北京国双科技有限公司 A kind of method for pushing of similar case, system, storage medium and processor
CN107918921B (en) * 2017-11-21 2021-10-08 南京擎盾信息科技有限公司 Criminal case judgment result measuring method and system
CN107918921A (en) * 2017-11-21 2018-04-17 南京擎盾信息科技有限公司 Criminal case court verdict measure and system
CN108197163B (en) * 2017-12-14 2021-08-10 上海银江智慧智能化技术有限公司 Structured processing method based on referee document
CN108197163A (en) * 2017-12-14 2018-06-22 上海银江智慧智能化技术有限公司 A kind of structuring processing method based on judgement document
CN108304386A (en) * 2018-03-05 2018-07-20 上海思贤信息技术股份有限公司 A kind of logic-based rule infers the method and device of legal documents court verdict
WO2019170015A1 (en) * 2018-03-09 2019-09-12 北京国双科技有限公司 Judicial document searching method and device
CN110827177A (en) * 2018-08-13 2020-02-21 北京国双科技有限公司 Case-like document searching method and device
CN109284359A (en) * 2018-09-13 2019-01-29 巫溪县片刻网络科技有限公司 A kind of trial ancillary data management platform
CN111008261A (en) * 2018-09-19 2020-04-14 北京国双科技有限公司 Method and device for determining referee document based on preposed document
CN111008261B (en) * 2018-09-19 2023-08-25 北京国双科技有限公司 Method and device for determining referee document based on prepositive document
CN110941645A (en) * 2018-09-21 2020-03-31 北京国双科技有限公司 Method, device, storage medium and processor for automatically judging case string
CN110955760A (en) * 2018-09-26 2020-04-03 北京国双科技有限公司 Evaluation method of judgment result and related device
CN110968662A (en) * 2018-09-27 2020-04-07 北京国双科技有限公司 Judicial data processing method and device, storage medium and processor
CN110990522A (en) * 2018-09-30 2020-04-10 北京国双科技有限公司 Legal document determining method and system
CN110990522B (en) * 2018-09-30 2023-07-04 北京国双科技有限公司 Legal document determining method and system
CN111259160A (en) * 2018-11-30 2020-06-09 百度在线网络技术(北京)有限公司 Knowledge graph construction method, device, equipment and storage medium
CN111259160B (en) * 2018-11-30 2023-08-29 百度在线网络技术(北京)有限公司 Knowledge graph construction method, device, equipment and storage medium
CN111291152A (en) * 2018-12-07 2020-06-16 北大方正集团有限公司 Case document recommendation method, device, equipment and storage medium
CN111382769A (en) * 2018-12-29 2020-07-07 阿里巴巴集团控股有限公司 Information processing method, device and system
CN111382769B (en) * 2018-12-29 2023-09-22 阿里巴巴集团控股有限公司 Information processing method, device and system
CN110162590A (en) * 2019-02-22 2019-08-23 北京捷风数据技术有限公司 A kind of database displaying method and device thereof of calling for tenders of project text combination economic factor
CN110209760A (en) * 2019-06-13 2019-09-06 北京百度网讯科技有限公司 Go through the associated method and apparatus of part of trying a case, electronic equipment, computer-readable medium
CN110362799A (en) * 2019-06-17 2019-10-22 平安科技(深圳)有限公司 Processing method, device and computer equipment are generated based on the award arbitrated online
CN110362799B (en) * 2019-06-17 2024-02-06 平安科技(深圳)有限公司 On-line arbitration-based method and device for generating and processing resolution book and computer equipment
CN110472048A (en) * 2019-07-19 2019-11-19 平安科技(深圳)有限公司 A kind of auxiliary judgement method, apparatus and terminal device
CN110738039A (en) * 2019-09-03 2020-01-31 平安科技(深圳)有限公司 Prompting method, device, storage medium and server for case auxiliary information
CN110727787A (en) * 2019-10-11 2020-01-24 北京明略软件系统有限公司 Case text matching method and device, electronic equipment and storage medium
CN111144095B (en) * 2019-11-26 2024-04-05 方正璞华软件(武汉)股份有限公司 Method and device for generating work case judgment
CN111144095A (en) * 2019-11-26 2020-05-12 方正璞华软件(武汉)股份有限公司 Method and device for generating work damage case sanction book
CN112925877B (en) * 2019-12-06 2023-07-07 中国科学院软件研究所 One-person-multiple-case association identification method and system based on deep measurement learning
CN112925877A (en) * 2019-12-06 2021-06-08 中国科学院软件研究所 One-person multi-case association identification method and system based on depth measurement learning
WO2021164226A1 (en) * 2020-02-20 2021-08-26 平安科技(深圳)有限公司 Method and apparatus for querying knowledge map of legal cases, device and storage medium
CN112784007A (en) * 2020-07-16 2021-05-11 上海芯翌智能科技有限公司 Text matching method and device, storage medium and computer equipment
CN112784007B (en) * 2020-07-16 2023-02-21 上海芯翌智能科技有限公司 Text matching method and device, storage medium and computer equipment
CN117453856A (en) * 2023-10-19 2024-01-26 中国司法大数据研究院有限公司 Method and device for extracting calendar and examination case series based on multi-source data fusion
CN117453856B (en) * 2023-10-19 2024-05-07 中国司法大数据研究院有限公司 Method and device for extracting hold court trial pieces of calendar series based on multi-source data fusion
CN117830060A (en) * 2024-03-04 2024-04-05 天津财经大学 Injury crime law enforcement supervision and auxiliary decision-making system based on knowledge graph
CN117830060B (en) * 2024-03-04 2024-05-28 天津财经大学 Injury crime law enforcement supervision and auxiliary decision-making system based on knowledge graph

Also Published As

Publication number Publication date
CN106991092B (en) 2021-11-05

Similar Documents

Publication Publication Date Title
CN106991092A (en) The method and apparatus that similar judgement document is excavated based on big data
CN110825881B (en) Method for establishing electric power knowledge graph
CN108984745B (en) Neural network text classification method fusing multiple knowledge maps
CN110309331A (en) A kind of cross-module state depth Hash search method based on self-supervisory
Wen et al. Research on keyword extraction based on word2vec weighted textrank
CN109271537B (en) Text-to-image generation method and system based on distillation learning
CN107526799A (en) A kind of knowledge mapping construction method based on deep learning
CN106855853A (en) Entity relation extraction system based on deep neural network
CN110555084B (en) Remote supervision relation classification method based on PCNN and multi-layer attention
CN110889786A (en) Legal action insured advocate security use judging service method based on LSTM technology
CN109165275B (en) Intelligent substation operation ticket information intelligent search matching method based on deep learning
CN110866121A (en) Knowledge graph construction method for power field
CN111914555B (en) Automatic relation extraction system based on Transformer structure
CN112800184B (en) Short text comment emotion analysis method based on Target-Aspect-Opinion joint extraction
CN111062214A (en) Integrated entity linking method and system based on deep learning
Soysal et al. An introduction to zero-shot learning: An essential review
CN116721176B (en) Text-to-face image generation method and device based on CLIP supervision
CN112084788B (en) Automatic labeling method and system for implicit emotion tendencies of image captions
CN115203429B (en) Automatic knowledge graph expansion method for constructing ontology framework in auditing field
Ronghui et al. Application of Improved Convolutional Neural Network in Text Classification.
CN115098646A (en) Multilevel relation analysis and mining method for image-text data
CN113688233A (en) Text understanding method for semantic search of knowledge graph
Tan et al. Sentiment analysis of chinese short text based on multiple features
Hua et al. Deep semantic correlation with adversarial learning for cross-modal retrieval
Perwej et al. The State-of-the-Art Handwritten Recognition of Arabic Script Using Simplified Fuzzy ARTMAP and Hidden Markov Models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant