CN105930470A - File retrieval method based on feature weight analysis technology - Google Patents

File retrieval method based on feature weight analysis technology

Info

Publication number
CN105930470A
Authority
CN
China
Prior art keywords
case
feature
tree
retrieval
condition
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610259097.7A
Other languages
Chinese (zh)
Other versions
CN105930470B (en)
Inventor
张静川
周宇
贾真
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Fu Chi Information Technology Co Ltd
Original Assignee
Anhui Fu Chi Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-04-25
Filing date
2016-04-25
Publication date
2016-09-07
2016-04-25 Application filed by Anhui Fu Chi Information Technology Co Ltd
2016-04-25 Priority to CN201610259097.7A
2016-09-07 Publication of CN105930470A
Application granted
2019-03-26 Publication of CN105930470B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs

Abstract

The invention relates to a file retrieval method based on feature weight analysis technology, which overcomes the prior-art defect that effective retrieval cannot be performed in a specific field. The method comprises the following steps: organizing judgment documents, in which the judgment documents are organized by hierarchical classification according to cause of action; constructing a case feature tree, in which the public features and private features of a specified cause of action are screened and organized into a tree structure according to the logical relationships between features; performing weight training on the case feature tree, in which a decision tree method is trained for different targets and the comprehensive weights of the case features are calculated; acquiring retrieval information, in which the filter conditions and query conditions are input by selecting conditions, by entering text containing the conditions, or by entering an entire judgment document; calculating a case similarity matrix; and outputting the retrieval result. Building on a case feature tree carefully constructed under the guidance of industry characteristics, the method greatly improves retrieval accuracy and coverage through semantic analysis and knowledge reasoning.

Description

A document retrieval method based on feature weight analysis technology
Technical field
The present invention relates to the field of data retrieval technology, and specifically to a document retrieval method based on feature weight analysis technology.
Background art
Document retrieval technology is widely used in daily life and makes obtaining everyday information far more convenient. This is especially true in specialized fields such as the investigation of judicial cases: when researching difficult cases, professionals rely not only on their own professional knowledge and experience but often also need to retrieve existing similar cases to understand the relevant circumstances. Existing ordinary retrieval approaches include general search engines, industry websites, and guiding cases, all of which have the following problems:
(1) General search engines, such as Baidu and Yahoo: they are not customized for the judicial domain at all, and both retrieval accuracy and coverage are very low;
(2) Industry websites, such as China Judgements Online and Wusong: compared with general search engines, retrieval accuracy and coverage are clearly better and multiple filters are allowed; however, retrieval is based mainly on keyword matching and stays at the surface level, so accuracy remains relatively low, and the filter conditions are preset and inflexible;
(3) Guiding cases: issued by the Supreme People's Court, they are authoritative and targeted; however, there are few of them, their publication lags seriously, and they are isolated from one another, so retrieval coverage is very low. This top-down guidance model also has to take regional adaptability into account.
In addition, none of the above retrieval technologies supports semantic retrieval; they cannot freely combine filter and query conditions, cannot run follow-up retrieval based on previous results, and provide neither statistics on nor an intuitive display of the retrieval results. How to design a more specialized retrieval method has therefore become a technical problem that urgently needs to be solved.
Summary of the invention
The purpose of the invention is to overcome the prior-art defect that effective retrieval cannot be performed in a specific field, by providing a document retrieval method based on feature weight analysis technology.
To achieve this purpose, the technical solution of the present invention is as follows:
A document retrieval method based on feature weight analysis technology comprises the following steps:
Organizing judgment documents: the judgment documents are organized by hierarchical classification according to cause of action;
Constructing the case feature tree: for a specified cause of action, its public features and private features are screened and organized into a tree structure according to the logical relationships between features;
Performing weight training on the case feature tree: a decision tree method is used to train for different targets, and the comprehensive weight of each case feature is calculated;
Acquiring retrieval information: the filter conditions and query conditions of the retrieval information are input, either by selecting conditions or by entering text containing the conditions or an entire judgment document;
Calculating the case similarity matrix: valid feature trees are screened from the feature tree set according to the filter conditions of the retrieval information; according to the query conditions, the weights are applied and a weighted Manhattan distance method is used to calculate the pairwise similarities within the set of valid feature trees, forming a similarity matrix whose values are then normalized;
Outputting the retrieval result: similar cases are obtained from the case similarity matrix, namely the n cases most similar to the query conditions or the cases whose similarity exceeds s; this information is compiled into statistics and presented visually.
Constructing the case feature tree comprises the following steps:
Defining public features: public features are the general attributes of a case;
Defining private features: private features are the specific attributes of a case;
Organizing the public and private features into a tree structure according to the logical relationships between features, forming the case feature tree.
Calculating the case similarity matrix comprises the following steps:
Generating the matrix of pairwise case similarities by calculation from the case feature tree, the feature weight tree, and the query conditions;
Obtaining the valid cases from the filter conditions, obtaining the corresponding feature values and weights according to the query conditions, and calculating the similarity between the query conditions and each case and between pairs of cases.
Beneficial effects
Compared with the prior art, the document retrieval method of the present invention builds on a case feature tree carefully constructed under the guidance of industry characteristics and, through semantic analysis and knowledge reasoning, greatly improves retrieval accuracy and coverage. Because retrieval is driven by the retrieval information, filter and query conditions can be freely combined; constructing the case similarity matrix enables cascaded, case-based retrieval; and the retrieval results are statistically analyzed so that the relevant information is displayed intuitively.
Description of the drawings
Fig. 1 is the method flow diagram of the present invention.
Detailed description of the invention
To give a better understanding of the structural features of the present invention and the effects it achieves, a preferred embodiment is described in detail below with reference to the accompanying drawing:
As shown in Fig. 1, the document retrieval method based on feature weight analysis technology of the present invention comprises the following steps:
Step 1: organize the judgment documents by hierarchical classification according to cause of action. The distinguishing point of this application is that the feature tree is constructed according to the industry characteristics of a given field, and these characteristics differ from field to field. For convenience in explaining the technical solution, the characteristics of judicial cases are used here to illustrate the classification and design; judgment documents are accordingly organized hierarchically by cause of action.
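A minimal sketch of this organizing step, assuming judgment documents are mounted on a nested dictionary keyed by cause-of-action names. The helpers make_node, file_document and documents_under, the document IDs and the second cause path are hypothetical illustrations, not part of the patent.

```python
def make_node():
    # A node in the (hypothetical) cause-of-action hierarchy: child causes
    # keyed by name, plus the judgment documents filed at this level.
    return {"children": {}, "documents": []}


def file_document(root, cause_path, doc_id):
    """Mount a judgment document under its cause-of-action path,
    creating intermediate cause nodes as needed."""
    node = root
    for cause in cause_path:
        node = node["children"].setdefault(cause, make_node())
    node["documents"].append(doc_id)


def documents_under(node):
    """Collect documents filed at this node and under all sub-causes."""
    docs = list(node["documents"])
    for child in node["children"].values():
        docs.extend(documents_under(child))
    return docs


root = make_node()
file_document(root, ("civil", "marriage and family", "divorce dispute"), "doc-001")
file_document(root, ("civil", "contract", "sales contract dispute"), "doc-002")
print(documents_under(root["children"]["civil"]))  # ['doc-001', 'doc-002']
```

Browsing a higher-level cause such as "civil" returns every document filed beneath it, which is what makes the hierarchy convenient for navigation.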
Step 2: construct the case feature tree. For a specified cause of action, screen its public features and private features and organize them into a tree structure according to the logical relationships between features. Each case feature tree corresponds one-to-one to a cause of action. Because causes of action themselves form a hierarchy (for example civil / marriage and family / divorce dispute), mounting each feature tree under the corresponding node of the cause-of-action hierarchy organizes all feature trees into one large tree structure that is easy to maintain and browse. In this technical solution, case features are extracted from structured databases and from the text of judgment documents, which involves semantic analysis and knowledge reasoning; compared with prior-art similar-case retrieval systems, both accuracy and coverage are substantially improved. Step 2 specifically includes the following sub-steps:
(1) Define public features. Public features are general attributes of a case, such as case time, region, and case entity information, which are shared by cases of different causes of action. Public features are usually recorded in the structured database of the court business system and can be obtained directly.
(2) Define private features. Private features are specific attributes of a case, such as the reason for divorce, information about children, and community property in a divorce dispute case, which are peculiar to a particular cause of action. Private features are usually recorded in the text of the judgment document. In general, the private features of a case include the key trial points of guiding cases and other points in dispute, and they are the points of comparison for case similarity.
(3) According to the logical relationships between features, organize the public and private features into a tree structure, forming the case feature tree.
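The three sub-steps can be sketched as a small tree data structure. This is an illustration under assumptions: the class FeatureNode, the grouping into "public features" and "private features" subtrees, the "number of children" leaf and the sample values are hypothetical. The patent names case time, region, reason for divorce, child information and community property as example features but does not specify a representation. Feature values are passed in directly here, whereas the patent extracts public features from the court's structured database and private features from judgment text.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureNode:
    name: str
    kind: str = "group"      # "public", "private", or "group" for internal nodes
    value: object = None     # feature value, only meaningful on leaves
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child

    def leaves(self):
        """Return the leaf features in depth-first, left-to-right order."""
        if not self.children:
            return [self]
        out = []
        for child in self.children:
            out.extend(child.leaves())
        return out


def build_divorce_tree(case):
    """Build a (hypothetical) feature tree for one divorce-dispute case."""
    root = FeatureNode("divorce dispute")
    public = root.add(FeatureNode("public features"))
    public.add(FeatureNode("case time", "public", case["year"]))
    public.add(FeatureNode("region", "public", case["region"]))
    private = root.add(FeatureNode("private features"))
    private.add(FeatureNode("reason for divorce", "private", case["reason"]))
    kids = private.add(FeatureNode("child information"))
    kids.add(FeatureNode("number of children", "private", case["n_children"]))
    private.add(FeatureNode("community property", "private", case["property"]))
    return root


tree = build_divorce_tree({"year": 2015, "region": "region A",
                           "reason": "domestic violence", "n_children": 1,
                           "property": "house"})
print([leaf.name for leaf in tree.leaves() if leaf.kind == "private"])
# ['reason for divorce', 'number of children', 'community property']
```

Keeping internal "group" nodes separate from leaf features is what later lets a weight tree mirror this structure node for node.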
Step 3: perform weight training on the case feature tree. Based on domain knowledge, case feature weight values are calculated using information-theoretic principles; a decision tree method is used to train for different targets, and the comprehensive weight of each case feature is calculated.
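The patent does not spell out the training procedure. One plausible reading, sketched below, uses the information-gain split criterion of ID3-style decision trees (an information-theoretic measure) to score each feature against each training target, then averages the per-target normalized scores into a comprehensive weight. The averaging rule, the target name "granted" and the toy rows are assumptions for illustration only, not the patent's method.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, feature, target):
    """Entropy reduction of `target` after splitting `rows` on `feature`."""
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def comprehensive_weights(rows, features, targets):
    """Average, over all targets, of each feature's normalized information gain."""
    totals = dict.fromkeys(features, 0.0)
    for target in targets:
        gains = {f: information_gain(rows, f, target) for f in features}
        norm = sum(gains.values())
        for f in features:
            # Fall back to equal weights if no feature is informative.
            totals[f] += gains[f] / norm if norm > 0 else 1 / len(features)
    return {f: totals[f] / len(targets) for f in features}


rows = [
    {"reason": "violence", "children": "yes", "granted": "yes"},
    {"reason": "violence", "children": "no", "granted": "yes"},
    {"reason": "other", "children": "yes", "granted": "no"},
    {"reason": "other", "children": "no", "granted": "no"},
]
print(comprehensive_weights(rows, ["reason", "children"], ["granted"]))
# {'reason': 1.0, 'children': 0.0}
```

In this toy data the reason fully determines the outcome while the children feature carries no information, so all the weight goes to the reason. With several targets, each feature's weight is its average share across them.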
The case feature weight tree is a data structure that describes the relative weights between case features. Unlike existing similar-case retrieval systems, the information in the search conditions here carries weights, which are used to calculate the similarity between search conditions and cases and between cases. Introducing information weights means that:
(1) when the search conditions cannot all be satisfied, cases that satisfy the higher-weighted conditions are ranked higher;
(2) when the search conditions can all be satisfied, cases can be further ranked by the weights of other features.
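Effect (1) can be illustrated with a minimal sketch, assuming each case is scored by the total weight of the query conditions it satisfies. This scoring rule, the function name rank_by_weighted_match and the sample data are hypothetical; tie-breaking by further features, effect (2), would need an additional sort key and is not shown.

```python
def rank_by_weighted_match(cases, conditions, weights):
    """Order cases by the total weight of the query conditions they satisfy."""
    def score(case):
        return sum(weights[f] for f, v in conditions.items() if case.get(f) == v)
    # sorted() is stable, so equally scored cases keep their input order.
    return sorted(cases, key=score, reverse=True)


conditions = {"reason": "violence", "children": "yes"}
weights = {"reason": 0.7, "children": 0.3}
cases = [
    {"id": "A", "reason": "other", "children": "yes"},     # score 0.3
    {"id": "B", "reason": "violence", "children": "no"},   # score 0.7
    {"id": "C", "reason": "violence", "children": "yes"},  # score 1.0
]
print([c["id"] for c in rank_by_weighted_match(cases, conditions, weights)])
# ['C', 'B', 'A']
```

Neither A nor B satisfies every condition, but B matches the higher-weighted one and therefore ranks above A.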
Case feature weights can be determined by various methods, for example based on domain knowledge or on information-theoretic principles. Because this solution organizes case features into a tree structure, the corresponding feature weights also form a tree structure and satisfy certain constraints, for example that the weight of a parent node equals the sum of the weights of its child nodes.
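The parent-equals-sum-of-children constraint can be enforced with a short sketch, assuming raw leaf weights are already available (for example from the training step) and that the tree is a nested dictionary. The function names, the dictionary layout and the leaf values below are hypothetical.

```python
def fill_parent_weights(node):
    """Set every internal node's weight to the sum of its children's weights."""
    children = node.get("children", [])
    if children:
        node["weight"] = sum(fill_parent_weights(c) for c in children)
    return node["weight"]


def normalize(node, total=None):
    """Divide all weights by the root weight; the sum constraint still holds."""
    if total is None:
        total = node["weight"]
    node["weight"] /= total
    for child in node.get("children", []):
        normalize(child, total)


def satisfies_sum_constraint(node, tol=1e-9):
    """Check parent == sum(children) everywhere, within floating-point tolerance."""
    children = node.get("children", [])
    if not children:
        return True
    ok = abs(node["weight"] - sum(c["weight"] for c in children)) <= tol
    return ok and all(satisfies_sum_constraint(c, tol) for c in children)


tree = {"name": "divorce dispute", "children": [
    {"name": "public features", "children": [
        {"name": "case time", "weight": 1.0}, {"name": "region", "weight": 1.0}]},
    {"name": "private features", "children": [
        {"name": "reason for divorce", "weight": 4.0},
        {"name": "child information", "weight": 2.0},
        {"name": "community property", "weight": 2.0}]},
]}
fill_parent_weights(tree)
normalize(tree)
print(tree["weight"], tree["children"][1]["weight"])  # 1.0 0.8
```

Normalizing to a root weight of 1 makes weights comparable across causes of action; the tolerance in the check absorbs floating-point rounding.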
Step 4: acquire the retrieval information. Input the filter conditions and query conditions of the retrieval information, either by selecting conditions or by entering text containing the conditions or an entire judgment document.
Here, a filter condition acts as a filter: it restricts case time, region, and so on, is usually a public feature of the case, and does not take part in the case similarity calculation. A query condition acts as a request: it specifies a retrieval dimension, is usually a private feature of the case, and constitutes a dimension of the case similarity calculation. The fundamental difference between the two kinds of condition is that filter conditions must be satisfied, whereas query conditions need not be. Dividing the user's search conditions into filter and query parts helps improve the controllability and flexibility of the retrieval system.
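A minimal sketch of the filter/query split, assuming a condition is either a literal value to match or a predicate function. The RetrievalInfo class, the satisfies and valid_cases helpers and the sample cases are hypothetical illustrations.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalInfo:
    # Filter conditions: usually public features; every one must hold.
    filters: dict = field(default_factory=dict)
    # Query conditions: usually private features; they only shape similarity.
    queries: dict = field(default_factory=dict)


def satisfies(value, condition):
    """A condition is either a literal to match or a predicate function."""
    return condition(value) if callable(condition) else value == condition


def valid_cases(cases, info):
    """Apply the mandatory filter conditions; query conditions are ignored here."""
    return [c for c in cases
            if all(satisfies(c.get(k), cond) for k, cond in info.filters.items())]


info = RetrievalInfo(
    filters={"region": "region A", "year": lambda y: y >= 2014},
    queries={"reason": "domestic violence"},
)
cases = [
    {"id": "a", "region": "region A", "year": 2015, "reason": "other"},
    {"id": "b", "region": "region B", "year": 2015, "reason": "domestic violence"},
    {"id": "c", "region": "region A", "year": 2016, "reason": "domestic violence"},
]
print([c["id"] for c in valid_cases(cases, info)])  # ['a', 'c']
```

Case "a" survives even though it does not match the query, because query conditions are optional; case "b" matches the query but is excluded because it fails a mandatory filter.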
Step 5: calculate the case similarity matrix. Screen valid feature trees from the feature tree set according to the filter conditions of the retrieval information; according to the query conditions, apply the weights and use a weighted Manhattan distance method to calculate the pairwise similarities within the set of valid feature trees, forming a similarity matrix whose values are then normalized. This specifically includes the following sub-steps:
(1) Generate the matrix of pairwise case similarities by calculation from the case feature tree, the feature weight tree, and the query conditions. This matrix describes the pairwise similarity of cases and changes dynamically with the query conditions.
(2) Obtain the valid cases from the filter conditions, obtain the corresponding feature values and weights according to the query conditions, and calculate the similarity between the query conditions and each case, and between cases. That is, after the user inputs a set of retrieval information, the filter conditions determine the valid cases, and the feature values and weights selected by the query conditions are then used to compute the similarities. Case similarity can be calculated by defining a suitable distance and combining it with the weight information. If the number of valid cases is N, the case similarity matrix has dimension (N+1) × (N+1), one row and column for the query and one for each valid case. Because case-to-case similarity under the query conditions is also calculated, cascaded case-based retrieval becomes possible.
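A sketch of the (N+1) × (N+1) matrix under stated assumptions: each case and the query are reduced to numeric vectors over the query dimensions only, every value is encoded in [0, 1], and the weights are rescaled to sum to 1. The weighted Manhattan distance is then d(x, y) = Σ_i w_i |x_i − y_i|, which lies in [0, 1], and similarity is taken as 1 − d. The patent says only that results are normalized, so this encoding and normalization scheme, the function names and the sample vectors are assumptions.

```python
def weighted_manhattan(x, y, weights):
    """d(x, y) = sum_i w_i * |x_i - y_i|."""
    return sum(w * abs(a - b) for a, b, w in zip(x, y, weights))


def similarity_matrix(query, cases, weights):
    """(N+1) x (N+1) similarity matrix: index 0 is the query, 1..N the valid cases.

    Assumes every feature value is encoded in [0, 1]; dividing the weights by
    their sum then keeps each distance, and hence each similarity, in [0, 1].
    """
    total = sum(weights)
    w = [wi / total for wi in weights]
    vectors = [query] + cases
    return [[1.0 - weighted_manhattan(u, v, w) for v in vectors] for u in vectors]


query = [1.0, 0.0]                           # values on the query dimensions
cases = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matrix = similarity_matrix(query, cases, weights=[3.0, 1.0])
print(matrix[0])  # [1.0, 1.0, 0.0, 0.75]
```

Row 0 compares the query with every case, while the remaining rows hold case-to-case similarities; the matrix is symmetric because the distance is.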
Step 6: output the retrieval result. Obtain similar cases from the case similarity matrix, namely the n cases most similar to the query conditions or the cases whose similarity exceeds s; compile statistics on this information and present it visually. At this point, any case in the results can be selected as the new condition, and cascaded retrieval results are obtained directly from the similarity matrix.
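Selecting results and cascading can both be read off a precomputed matrix, as in this minimal sketch. The function retrieve, its parameters and the hand-written toy matrix (index 0 for the query, 1..N for cases) are hypothetical; n and s play the roles described above.

```python
def retrieve(matrix, case_ids, row=0, n=None, s=None):
    """Rank cases by similarity to row `row` of the (N+1) x (N+1) matrix.

    row 0 ranks cases against the query; row k >= 1 performs cascade
    retrieval using case_ids[k - 1] as the new condition.
    """
    scored = [(case_ids[j - 1], matrix[row][j])
              for j in range(1, len(matrix)) if j != row]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if s is not None:
        scored = [pair for pair in scored if pair[1] > s]   # similarity above s
    if n is not None:
        scored = scored[:n]                                  # top-n most similar
    return scored


matrix = [[1.0, 1.0, 0.0, 0.75],
          [1.0, 1.0, 0.0, 0.75],
          [0.0, 0.0, 1.0, 0.25],
          [0.75, 0.75, 0.25, 1.0]]
ids = ["case-1", "case-2", "case-3"]
print(retrieve(matrix, ids, n=2))         # [('case-1', 1.0), ('case-3', 0.75)]
print(retrieve(matrix, ids, row=3, n=1))  # [('case-1', 0.75)]
```

Because case-to-case similarities are already in the matrix, a cascaded search is only a change of row and needs no new distance computation.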
The basic principles, main features, and advantages of the present invention have been shown and described above. Those skilled in the art will understand that the present invention is not limited to the above embodiments; the embodiments and description merely illustrate the principles of the invention, and various changes and improvements may be made without departing from its spirit and scope, all of which fall within the scope of the claimed invention. The scope of protection is defined by the appended claims and their equivalents.

Claims (3)

1. A document retrieval method based on feature weight analysis technology, characterized by comprising the following steps:
11) organizing judgment documents: the judgment documents are organized by hierarchical classification according to cause of action;
12) constructing a case feature tree: for a specified cause of action, its public features and private features are screened and organized into a tree structure according to the logical relationships between features;
13) performing weight training on the case feature tree: a decision tree method is used to train for different targets, and the comprehensive weights of the case features are calculated;
14) acquiring retrieval information: the filter conditions and query conditions of the retrieval information are input, the input mode being condition selection, text containing the conditions, or an entire judgment document;
15) calculating a case similarity matrix: valid feature trees are screened from the feature tree set according to the filter conditions of the retrieval information; according to the query conditions of the retrieval information, the weights are applied and a weighted Manhattan distance method is used to calculate the pairwise similarities within the set of valid feature trees, forming a similarity matrix, and the result is normalized;
16) outputting the retrieval result: similar cases are obtained from the case similarity matrix by finding the n cases most similar to the query conditions or the cases whose similarity is greater than s; this information is compiled into statistics and presented visually.
2. The document retrieval method based on feature weight analysis technology according to claim 1, characterized in that constructing the case feature tree comprises the following steps:
21) defining public features, the public features being general attributes of a case;
22) defining private features, the private features being specific attributes of a case;
23) organizing the public features and the private features into a tree structure according to the logical relationships between features, forming the case feature tree.
3. The document retrieval method based on feature weight analysis technology according to claim 1, characterized in that calculating the case similarity matrix comprises the following steps:
31) generating the matrix of pairwise case similarities by calculation from the case feature tree, the feature weight tree, and the query conditions;
32) obtaining valid cases from the filter conditions, obtaining the corresponding feature values and weights according to the query conditions, and calculating the similarity between the query conditions and the cases and between cases.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610259097.7A CN105930470B (en) 2016-04-25 2016-04-25 Document retrieval method based on feature weight analysis technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610259097.7A CN105930470B (en) 2016-04-25 2016-04-25 Document retrieval method based on feature weight analysis technology

Publications (2)

Publication Number Publication Date
CN105930470A (en) 2016-09-07
CN105930470B (en) 2019-03-26

Family

ID=56837041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610259097.7A Active CN105930470B (en) 2016-04-25 2016-04-25 Document retrieval method based on feature weight analysis technology

Country Status (1)

Country Link
CN (1) CN105930470B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066468A (en) * 2016-11-18 2017-08-18 北京市高级人民法院 A kind of case search method based on genetic algorithm and nearest neighbor algorithm
WO2018113498A1 (en) * 2016-12-23 2018-06-28 北京国双科技有限公司 Method and apparatus for retrieving legal knowledge
CN108595548A (en) * 2018-04-09 2018-09-28 南京网感至察信息科技有限公司 A kind of case judge's prediction of result method based on Markov Logic Network
WO2018184427A1 (en) * 2017-04-06 2018-10-11 北京国双科技有限公司 Method and apparatus for recommending judicial knowledge
CN109033041A (en) * 2017-06-09 2018-12-18 北京国双科技有限公司 The treating method and apparatus of document similarity
CN109947897A (en) * 2019-03-15 2019-06-28 南京邮电大学 Judicial case event tree constructs system and method
CN110019655A (en) * 2017-07-21 2019-07-16 北京国双科技有限公司 Precedent case acquisition methods and device
CN110032721A (en) * 2018-01-11 2019-07-19 北京国双科技有限公司 A kind of judgement document's method for pushing and device
CN112561744A (en) * 2019-09-25 2021-03-26 北京国双科技有限公司 Method and device for generating similar case retrieval report
CN113160000A (en) * 2021-04-22 2021-07-23 广州广电运通信息科技有限公司 Legal information analysis method, system, device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120166477A1 (en) * 2000-01-05 2012-06-28 Apple Inc. Universal Interface for Retrieval of Information in a Computer System
US20140236895A1 (en) * 2013-02-15 2014-08-21 Red Hat, Inc. File link migration for decommisioning a storage server
US20140379695A1 (en) * 2013-06-19 2014-12-25 Research In Motion Limited Searching data using pre-prepared search data
CN105320699A (en) * 2014-08-04 2016-02-10 中国科学院深圳先进技术研究院 Old age preferential treatment certificate service pushing method and system
CN105354282A (en) * 2015-10-30 2016-02-24 青岛海尔智能家电科技有限公司 XML file retrieval method and apparatus
CN105447198A (en) * 2015-12-30 2016-03-30 深圳市瑞铭无限科技有限公司 Convenient page script importing method and device
CN105512339A (en) * 2015-12-31 2016-04-20 深圳市朗科科技股份有限公司 File searcher and searching method

Also Published As

Publication number Publication date
CN105930470B (en) 2019-03-26

Similar Documents

Publication Publication Date Title
CN105930470A (en) File retrieval method based on feature weight analysis technology
da Costa Pereira et al. Multidimensional relevance: Prioritized aggregation in a personalized Information Retrieval setting
JP5063682B2 (en) Method for document region identification in a document database
US20140181125A1 (en) Systems and methods for facilitating the gathering of open source intelligence
US9348934B2 (en) Systems and methods for facilitating open source intelligence gathering
US8135739B2 (en) Online relevance engine
CN105930473A (en) Random forest technology-based similar file retrieval method
CN106339502A (en) Modeling recommendation method based on user behavior data fragmentation cluster
US20140040267A1 (en) Information searching apparatus, information searching method, and computer product
JP2009093653A (en) Refining search space responding to user input
CN106066873A (en) A kind of travel information based on body recommends method
JP2009093650A (en) Selection of tag for document by paragraph analysis of document
CN105095281B (en) A kind of web catalogue method for optimization analysis based on Web log mining
CN105426514A (en) Personalized mobile APP recommendation method
CN106126695A (en) A kind of similar case search method and device
Portmann et al. FORA–A fuzzy set based framework for online reputation management
JPWO2010013473A1 (en) Data classification system, data classification method, and data classification program
KR20160053933A (en) Smart search refinement
CN104408083A (en) Socialized media analyzing system
CN106227788A (en) Database query method based on Lucene
Mazeika et al. Entity timelines: visual analytics and named entity evolution
EP2354975B1 (en) Automatic association of informational entities
JP5500070B2 (en) Data classification system, data classification method, and data classification program
CN113987139A (en) Knowledge graph-based visual query management system for software defect cases of aircraft engine FADEC system
Rauch et al. Knowminer search-a multi-visualisation collaborative approach to search result analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant