CN105930470B - A document retrieval method based on feature weight analysis technology - Google Patents

Info

Publication number
CN105930470B
Authority
CN
China
Prior art keywords
case
feature
tree
condition
weight
Prior art date
Legal status
Active
Application number
CN201610259097.7A
Other languages
Chinese (zh)
Other versions
CN105930470A (en)
Inventor
张静川
周宇
贾真
Current Assignee
Anhui Fu Chi Information Technology Co Ltd
Original Assignee
Anhui Fu Chi Information Technology Co Ltd
Priority date
2016-04-25
Filing date
2016-04-25
Publication date
2019-03-26
Application filed by Anhui Fu Chi Information Technology Co Ltd
Priority to CN201610259097.7A
Publication of CN105930470A
Application granted
Publication of CN105930470B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs

Abstract

The present invention relates to a document retrieval method based on feature weight analysis technology, which addresses the inability of prior-art methods to retrieve effectively within a specific domain. The method comprises the following steps: organizing judgment documents, in which judgment documents are organized hierarchically by cause of action; constructing case feature trees, in which the public features and private features of a given cause of action are selected and organized into a tree structure according to the logical relations between the features; training weights for the case feature tree, in which a decision tree method is trained for different targets to compute the comprehensive weight of each case feature; acquiring retrieval information, in which the filter conditions and query conditions are input by selecting conditions, by entering text containing the conditions, or by submitting an entire judgment document; computing a case similarity matrix; and outputting the retrieval results. Building on case feature trees carefully constructed under the guidance of industry characteristics, and using semantic analysis and knowledge reasoning, the invention substantially improves retrieval accuracy and coverage.

Description

A document retrieval method based on feature weight analysis technology
Technical field
The present invention relates to the field of data retrieval technology, and specifically to a document retrieval method based on feature weight analysis technology.
Background art
Document search technology is widely used in daily life and greatly facilitates everyday information acquisition. In specialized domains such as judicial cases, however, professionals studying difficult cases often need, in addition to their own professional knowledge and experience, to retrieve existing similar cases in order to understand how related cases were handled. Existing general search approaches include general-purpose search engines, industry-specific websites, and guiding cases, and they have the following problems:
(1) General-purpose search engines, such as Baidu and Yahoo: they are not customized for the judicial domain at all, so their retrieval accuracy and coverage in that domain are very low;
(2) Industry-specific websites, such as China Judgements Online and Wusong: compared with general-purpose search engines, their retrieval accuracy and coverage are clearly higher and they allow multiple filters; however, retrieval is based mainly on keyword matching and stays at the surface level, so accuracy is still relatively low, and the filter conditions are preset and inflexible;
(3) Guiding cases: issued by the Supreme People's Court, they are authoritative and targeted; however, there are very few of them, they lag seriously behind practice, and they are isolated from one another, so retrieval coverage is very low. Moreover, the regional applicability of this top-down guidance model remains to be considered.
In addition, the above retrieval techniques do not support semantic retrieval, do not allow filter and query conditions to be combined freely, cannot perform cascaded retrieval based on earlier results, and neither compute statistics on nor visually present the retrieval results. How to design a more specialized retrieval method has therefore become a technical problem in urgent need of a solution.
Summary of the invention
The purpose of the present invention is to overcome the inability of the prior art to retrieve effectively within a specific domain by providing a document retrieval method based on feature weight analysis technology that solves the above problems.
To achieve the above purpose, the technical solution of the present invention is as follows:
A document retrieval method based on feature weight analysis technology comprises the following steps:
Organizing judgment documents: the judgment documents are organized hierarchically by cause of action;
Constructing case feature trees: for a given cause of action, its public features and private features are selected and organized into a tree structure according to the logical relations between the features;
Training weights for the case feature tree: a decision tree method is trained for different targets to compute the comprehensive weight of each case feature;
Acquiring retrieval information: the filter conditions and query conditions of the retrieval information are input by selecting conditions, by entering text containing the conditions, or by submitting an entire judgment document;
Computing the case similarity matrix: valid feature trees are selected from the set of feature trees according to the filter conditions of the retrieval information; then, according to the query conditions and using the feature weights, the pairwise similarities within the set of valid feature trees are computed with a weighted Manhattan distance method to form a similarity matrix, and the result is normalized;
Outputting retrieval results: similar cases are obtained from the case similarity matrix by finding the n cases most similar to the query conditions, or the cases whose similarity exceeds s; statistics are computed on this information and the results are visualized.
Constructing the case feature tree comprises the following steps:
Defining public features: public features are the general attributes of cases;
Defining private features: private features are the specific attributes of a case;
Organizing the public and private features into a tree structure according to the logical relations between the features to form the case feature tree.
Computing the case similarity matrix comprises the following steps:
The matrix of pairwise case similarities is computed from the case feature trees, the feature weight tree, and the query conditions;
Valid cases are obtained through the filter conditions, the corresponding feature values and weights are obtained according to the query conditions, and the similarities between the query conditions and cases, and between pairs of cases, are computed.
Beneficial effects
Compared with the prior art, the document retrieval method based on feature weight analysis technology of the present invention builds on carefully constructed case feature trees guided by industry characteristics and uses semantic analysis and knowledge reasoning to substantially improve retrieval accuracy and coverage. Taking the retrieval information as the guiding principle, filter and query conditions can be combined freely; constructing a case similarity matrix enables cascaded retrieval based on cases; and statistical analysis of the retrieval results presents the relevant information intuitively.
Description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Specific embodiments
To provide a better understanding of the structural features of the invention and the effects it achieves, preferred embodiments are described in detail below with reference to the drawings:
As shown in Fig. 1, the document retrieval method based on feature weight analysis technology of the present invention comprises the following steps:
Step 1: organize the judgment documents hierarchically by cause of action. A particularity of this specification is that feature trees are constructed according to the characteristics of each domain and industry. Because different domains have different characteristics, the characteristics of judicial cases are used here, for convenience of describing the technical solution, to illustrate the classification and design. Accordingly, judgment documents are classified and loaded hierarchically according to their causes of action.
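The hierarchical organization of Step 1 can be illustrated with a minimal Python sketch. It assumes each judgment document is a record with a hypothetical `id` and a hypothetical `cause` field holding a slash-separated cause-of-action path such as "civil/marriage and family/divorce dispute"; the patent does not specify a storage format, so this is only one possible layout.

```python
from typing import Any


def organize_by_cause(documents: list[dict[str, Any]]) -> dict[str, Any]:
    """Group judgment documents into a nested dict keyed by cause-of-action levels.

    Assumption: each document has hypothetical fields "id" and "cause", where
    "cause" is a slash-separated path from the broadest to the narrowest level.
    """
    root: dict[str, Any] = {}
    for doc in documents:
        node = root
        # Walk (creating as needed) one nested level per cause-of-action segment.
        for level in doc["cause"].split("/"):
            node = node.setdefault(level, {})
        # Document IDs are stored at the leaf cause under a reserved key.
        node.setdefault("_docs", []).append(doc["id"])
    return root
```

For example, two divorce judgments and one sales-contract judgment end up under `root["civil"]["marriage and family"]["divorce dispute"]` and `root["civil"]["contract"]["sales contract dispute"]` respectively, which is also where a feature tree for each cause of action could later be mounted.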
Step 2: construct the case feature trees. For a given cause of action, its public features and private features are selected and organized into a tree structure according to the logical relations between them. Each case feature tree corresponds one-to-one with a cause of action. Because causes of action are themselves hierarchical (for example, civil / marriage and family / divorce dispute), mounting each feature tree under its corresponding node in the cause-of-action hierarchy organizes all feature trees into one large tree structure that is easy to maintain and browse. In this technical solution, case features are extracted from structured databases and from the text of judgment documents, which involves semantic analysis and knowledge reasoning; compared with prior-art similar-case retrieval systems, this substantially improves accuracy and coverage. This step specifically comprises the following sub-steps:
(1) Define public features. Public features are general attributes of cases, such as the case time, the region, and information about the parties; they are shared by cases with different causes of action. Public features are generally recorded in the structured database of the court's business system and can be obtained directly.
(2) Define private features. Private features are attributes specific to a case, such as the reason for divorce, information about children, and community property in a divorce dispute; they are unique to a particular cause of action. Private features are generally recorded in the text of judgment documents. The private features of a case generally include the key points of adjudication of guiding cases and other central issues, and they are the points of comparison for case similarity.
(3) Organize the public and private features into a tree structure according to the logical relations between the features to form the case feature tree.
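A case feature tree of the kind described in sub-steps (1) to (3) might be represented as below. This is a sketch under stated assumptions: the node class, the "public"/"private"/"group" labels, and every feature name (`case_time`, `region`, `parties`, `reason_for_divorce`, `children`, `community_property`) are hypothetical, and the private features are assumed to have already been extracted from the judgment text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class FeatureNode:
    """One node of a case feature tree (illustrative structure only)."""
    name: str
    kind: str = "group"        # assumed labels: "public", "private", or "group"
    value: Any = None          # leaf feature value; None for internal nodes
    children: list[FeatureNode] = field(default_factory=list)

    def add(self, child: FeatureNode) -> FeatureNode:
        self.children.append(child)
        return child

    def leaves(self) -> Iterator[FeatureNode]:
        """Yield leaf features in depth-first order."""
        if not self.children:
            yield self
        for child in self.children:
            yield from child.leaves()


def divorce_feature_tree(record: dict[str, Any], text_facts: dict[str, Any]) -> FeatureNode:
    """Build a feature tree for the 'divorce dispute' cause of action.

    record: public features, assumed to come from the court's structured database.
    text_facts: private features, assumed to be already extracted from the judgment text.
    """
    root = FeatureNode("divorce dispute")
    public = root.add(FeatureNode("public features"))
    for key in ("case_time", "region", "parties"):
        public.add(FeatureNode(key, "public", record.get(key)))
    private = root.add(FeatureNode("private features"))
    for key in ("reason_for_divorce", "children", "community_property"):
        private.add(FeatureNode(key, "private", text_facts.get(key)))
    return root
```

Grouping the leaves under "public features" and "private features" is just one possible logical relation; a real tree would likely nest further, e.g. grouping several property items under one parent.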
Step 3: train the weights of the case feature tree. Based on domain knowledge, case feature weight values are calculated using information-theoretic principles: a decision tree method is trained for different targets, and the comprehensive weight of each case feature is computed.
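The patent names decision trees and information-theoretic principles but gives no formulas. One plausible reading, sketched here purely as an assumption, scores each categorical feature by its information gain (the split criterion of ID3-style decision trees) against each training target (for example a hypothetical judgment-outcome label), normalizes the gains per target, and averages them into a comprehensive weight. The averaging rule is an assumption, not the patent's stated method.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Any


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Reduction in label entropy after splitting on a categorical feature."""
    n = len(labels)
    groups: dict[Hashable, list[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def comprehensive_weights(cases: list[dict[str, Any]], features: list[str],
                          targets: list[str]) -> dict[str, float]:
    """Average of per-target normalized information gains (assumed combination rule)."""
    totals = {f: 0.0 for f in features}
    for target in targets:
        labels = [c[target] for c in cases]
        gains = {f: information_gain([c[f] for c in cases], labels) for f in features}
        total_gain = sum(gains.values())
        for f in features:
            # Fall back to uniform weights if no feature is informative for this target.
            totals[f] += gains[f] / total_gain if total_gain > 0 else 1 / len(features)
    return {f: totals[f] / len(targets) for f in features}
```

Training against several targets is what makes the weight "comprehensive" in this reading: a feature that is uninformative for one target can still carry weight if it explains another.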
The case feature weight tree is a data structure that describes the relative weights among case features. Unlike existing similar-case retrieval systems, the information in the search conditions carries weights, which are used to compute the similarity between search conditions and cases and between pairs of cases. Introducing information weights makes the following possible:
(1) When not all search conditions can be satisfied, cases that satisfy higher-weight conditions are ranked higher;
(2) When all search conditions can be satisfied, cases can be further ranked by the weights of other features.
Case feature weights can be determined by many methods, for example based on domain knowledge or on information-theoretic principles. Because this solution organizes case features into a tree structure, the corresponding feature weights also form a tree structure and satisfy certain constraints, such as the weight of a parent node being equal to the sum of the weights of its child nodes.
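The parent-equals-sum-of-children constraint can be met, for instance, by distributing weight top-down. The sketch below assumes each node carries a hypothetical non-negative raw score (for example from domain knowledge) and splits its parent's weight among siblings in proportion to those scores, with the root fixed at 1; the input format is illustrative only.

```python
from typing import Any


def normalize_weight_tree(node: dict[str, Any], weight: float = 1.0) -> dict[str, Any]:
    """Return a weight tree in which every parent's weight equals the sum of its children's.

    node: {"name": str, "raw": float, "children": [...]} with non-negative raw scores
    (a hypothetical input format). The root receives `weight` and each child gets a
    share of its parent's weight proportional to its raw score.
    """
    children = node.get("children", [])
    total = sum(c["raw"] for c in children)
    out = []
    for c in children:
        # Equal split among siblings if all their raw scores are zero.
        share = c["raw"] / total if total > 0 else 1 / len(children)
        out.append(normalize_weight_tree(c, weight * share))
    return {"name": node["name"], "weight": weight, "children": out}


def satisfies_sum_constraint(tree: dict[str, Any], tol: float = 1e-9) -> bool:
    """Check the parent-equals-sum-of-children constraint recursively."""
    kids = tree["children"]
    if not kids:
        return True
    return (abs(tree["weight"] - sum(k["weight"] for k in kids)) <= tol
            and all(satisfies_sum_constraint(k, tol) for k in kids))
```

Because weights are assigned top-down, the leaf weights always sum to the root weight, so a whole subtree (e.g. all private features) can be reweighted by changing one parent score.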
Step 4: acquire the retrieval information. The filter conditions and query conditions of the retrieval information are input by selecting conditions, by entering text containing the conditions, or by submitting an entire judgment document.
Filter conditions (filter) restrict the case time, region, and so on; they are usually public features of cases and do not take part in the case similarity calculation. Query conditions (requestor) specify the retrieval dimensions; they are usually private features of cases and form the dimensions of the case similarity calculation. The fundamental difference between the two is that filter conditions must be satisfied, whereas query conditions need not be. Dividing the user's search conditions into filters and queries improves the controllability and flexibility of the retrieval system.
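The distinction between hard filter conditions and soft query conditions can be sketched as follows. The sketch assumes cases are flat dicts of feature values, filters are hypothetical per-feature predicates, and each matched query condition contributes its feature weight to a ranking score; it also mirrors point (1) of Step 3, where cases satisfying higher-weight conditions rank first when not all conditions can be met.

```python
from typing import Any, Callable


def apply_filters(cases: list[dict[str, Any]],
                  filters: dict[str, Callable[[Any], bool]]) -> list[dict[str, Any]]:
    """Keep only cases that satisfy every filter condition (hard constraints)."""
    return [c for c in cases if all(pred(c.get(k)) for k, pred in filters.items())]


def rank_by_query(cases: list[dict[str, Any]], query: dict[str, Any],
                  weights: dict[str, float]) -> list[dict[str, Any]]:
    """Order cases by the total weight of the query conditions they match (soft constraints).

    A case that matches no query condition is kept, just ranked last.
    """
    def matched_weight(case: dict[str, Any]) -> float:
        return sum(weights.get(k, 0.0) for k, v in query.items() if case.get(k) == v)

    return sorted(cases, key=matched_weight, reverse=True)
```

For example, a filter on year and region removes non-matching cases entirely, while a query on reason for divorce and children only reorders the survivors.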
Step 5: compute the case similarity matrix. Valid feature trees are selected from the set of feature trees according to the filter conditions of the retrieval information; then, according to the query conditions and using the feature weights, the pairwise similarities within the set of valid feature trees are computed with a weighted Manhattan distance method to form a similarity matrix, and the result is normalized. This step specifically comprises the following sub-steps:
(1) Compute the matrix of pairwise case similarities from the case feature trees, the feature weight tree, and the query conditions. The matrix is therefore generated from these three inputs and changes dynamically with the query conditions.
(2) Obtain valid cases through the filter conditions, obtain the corresponding feature values and weights according to the query conditions, and compute the similarities between the query conditions and cases and between pairs of cases. That is, after the user inputs a set of retrieval information, the filter conditions first narrow the cases down to the valid cases, and the query conditions then determine which feature values and weights enter the similarity calculation. Case similarity can be computed by defining a suitable distance and combining it with the weight information. If the number of valid cases is N, the case similarity matrix has dimension (N+1) × (N+1), since the query conditions occupy one row and one column in addition to the N cases. Computing case-to-case similarities under the query conditions enables cascaded retrieval based on cases.
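A minimal sketch of the (N+1) × (N+1) matrix follows. It assumes the query conditions and the N valid cases are already encoded as numeric feature values scaled to [0, 1], that only the features named in the query are compared, and that normalization maps weighted Manhattan distances to similarities via 1 - d / d_max. The encoding and this particular normalization are assumptions; the patent states only that a weighted Manhattan distance is used and the result is normalized.

```python
def weighted_manhattan(a: dict[str, float], b: dict[str, float],
                       weights: dict[str, float]) -> float:
    """Weighted L1 distance over the features listed in `weights`."""
    return sum(w * abs(a[k] - b[k]) for k, w in weights.items())


def similarity_matrix(query: dict[str, float], cases: list[dict[str, float]],
                      weights: dict[str, float]) -> list[list[float]]:
    """Build an (N+1) x (N+1) similarity matrix: index 0 is the query, 1..N are the valid cases.

    Only the features named in the query are compared (they are the retrieval dimensions).
    Distances are divided by the largest pairwise distance and turned into
    similarities in [0, 1] via 1 - d / d_max (an assumed normalization).
    """
    dims = {k: weights[k] for k in query}
    rows = [query] + cases
    n = len(rows)
    dist = [[weighted_manhattan(rows[i], rows[j], dims) for j in range(n)] for i in range(n)]
    d_max = max(max(r) for r in dist)
    if d_max == 0:
        # Every row is identical on the query dimensions.
        return [[1.0] * n for _ in range(n)]
    return [[1.0 - d / d_max for d in r] for r in dist]
```

Because the query dimensions select which weights are used, the same set of valid cases yields a different matrix for each query, matching the dynamic behaviour described in sub-step (1).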
Step 6: output the retrieval results. Similar cases are obtained from the case similarity matrix by finding the n cases most similar to the query conditions, or the cases whose similarity exceeds s; statistics are computed on this information and the results are visualized. A case in the results can then be selected as the new condition, and cascaded retrieval results are obtained from the similarity matrix.
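Reading results out of the matrix, including cascaded retrieval, could look like the sketch below. It assumes the layout where index 0 is the query and indices 1..N are cases, treats n and s as optional and combinable, and uses a simple count over one hypothetical case attribute as the "statistics" step; visualization is left out.

```python
from collections import Counter
from typing import Any, Optional


def top_results(sim: list[list[float]], row: int = 0, n: Optional[int] = None,
                s: Optional[float] = None) -> list[tuple[int, float]]:
    """Rank cases by similarity to one row of the (N+1) x (N+1) matrix.

    row=0 ranks cases against the query; row=i (i >= 1) performs cascaded retrieval
    using case i as the new condition. Index 0 (the query) and the row itself are
    excluded. Keeps the top n results and/or only those with similarity > s.
    """
    scored = [(j, v) for j, v in enumerate(sim[row]) if j not in (0, row)]
    scored.sort(key=lambda t: t[1], reverse=True)
    if s is not None:
        scored = [t for t in scored if t[1] > s]
    if n is not None:
        scored = scored[:n]
    return scored


def summarize(results: list[tuple[int, float]], cases: list[dict[str, Any]],
              key: str) -> Counter:
    """Count results by one case attribute; matrix index j maps to cases[j - 1]."""
    return Counter(cases[j - 1].get(key) for j, _ in results)
```

Cascaded retrieval needs no recomputation here: selecting case i simply reads row i of the matrix already built for the current query.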
The basic principles, main features and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited to the above embodiments; the embodiments and the description only illustrate the principles of the present invention, and various changes and improvements may be made without departing from the spirit and scope of the present invention, all of which fall within the scope of the claimed invention. The scope of protection of the present invention is defined by the appended claims and their equivalents.

Claims (3)

1. A document retrieval method based on feature weight analysis technology, characterized by comprising the following steps:
11) Organizing judgment documents: the judgment documents are organized hierarchically by cause of action;
12) Constructing case feature trees: for a given cause of action, its public features and private features are selected and organized into a tree structure according to the logical relations between the features;
13) Training weights for the case feature tree: a decision tree method is trained for different targets to compute the comprehensive weight of each case feature;
14) Acquiring retrieval information: the filter conditions and query conditions of the retrieval information are input by selecting conditions, by entering text containing the conditions, or by submitting an entire judgment document;
15) Computing the case similarity matrix: valid feature trees are selected from the set of feature trees according to the filter conditions of the retrieval information; then, according to the query conditions and using the feature weights, the pairwise similarities within the set of valid feature trees are computed with a weighted Manhattan distance method to form a similarity matrix, and the result is normalized;
16) Outputting retrieval results: similar cases are obtained from the case similarity matrix by finding the n cases most similar to the query conditions, or the cases whose similarity exceeds s; statistics are computed on this information and the results are visualized.
2. The document retrieval method based on feature weight analysis technology according to claim 1, characterized in that constructing the case feature tree comprises the following steps:
21) Defining public features: public features are the general attributes of cases;
22) Defining private features: private features are the specific attributes of a case;
23) Organizing the public and private features into a tree structure according to the logical relations between the features to form the case feature tree.
3. The document retrieval method based on feature weight analysis technology according to claim 1, characterized in that computing the case similarity matrix comprises the following steps:
31) Computing the matrix of pairwise case similarities from the case feature trees, the feature weight tree, and the query conditions;
32) Obtaining valid cases through the filter conditions, obtaining the corresponding feature values and weights according to the query conditions, and computing the similarities between the query conditions and cases and between pairs of cases.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610259097.7A CN105930470B (en) 2016-04-25 2016-04-25 A document retrieval method based on feature weight analysis technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610259097.7A CN105930470B (en) 2016-04-25 2016-04-25 A document retrieval method based on feature weight analysis technology

Publications (2)

Publication Number Publication Date
CN105930470A CN105930470A (en) 2016-09-07
CN105930470B CN105930470B (en) 2019-03-26

Family

ID=56837041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610259097.7A Active CN105930470B (en) 2016-04-25 2016-04-25 A document retrieval method based on feature weight analysis technology

Country Status (1)

Country Link
CN (1) CN105930470B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066468A (en) * 2016-11-18 2017-08-18 北京市高级人民法院 A kind of case search method based on genetic algorithm and nearest neighbor algorithm
CN108241621B (en) * 2016-12-23 2019-12-10 北京国双科技有限公司 legal knowledge retrieval method and device
CN108694178B (en) * 2017-04-06 2020-11-27 北京国双科技有限公司 Method and device for recommending judicial knowledge
CN109033041A (en) * 2017-06-09 2018-12-18 北京国双科技有限公司 The treating method and apparatus of document similarity
CN110019655A (en) * 2017-07-21 2019-07-16 北京国双科技有限公司 Precedent case acquisition methods and device
CN110032721B (en) * 2018-01-11 2023-11-03 北京国双科技有限公司 Judge document pushing method and device
CN108595548A (en) * 2018-04-09 2018-09-28 南京网感至察信息科技有限公司 A kind of case judge's prediction of result method based on Markov Logic Network
CN109947897B (en) * 2019-03-15 2020-12-15 南京邮电大学 Judicial case event tree construction method
CN112561744A (en) * 2019-09-25 2021-03-26 北京国双科技有限公司 Method and device for generating similar case retrieval report
CN113160000A (en) * 2021-04-22 2021-07-23 广州广电运通信息科技有限公司 Legal information analysis method, system, device and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN105320699A (en) * 2014-08-04 2016-02-10 中国科学院深圳先进技术研究院 Old age preferential treatment certificate service pushing method and system
CN105354282A (en) * 2015-10-30 2016-02-24 青岛海尔智能家电科技有限公司 XML file retrieval method and apparatus
CN105447198A (en) * 2015-12-30 2016-03-30 深圳市瑞铭无限科技有限公司 Convenient page script importing method and device
CN105512339A (en) * 2015-12-31 2016-04-20 深圳市朗科科技股份有限公司 File searcher and searching method

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US6847959B1 (en) * 2000-01-05 2005-01-25 Apple Computer, Inc. Universal interface for retrieval of information in a computer system
US8983908B2 (en) * 2013-02-15 2015-03-17 Red Hat, Inc. File link migration for decommisioning a storage server
US9292525B2 (en) * 2013-06-19 2016-03-22 BlackBerry Limited; 2236008 Ontario Inc. Searching data using pre-prepared search data

Also Published As

Publication number Publication date
CN105930470A (en) 2016-09-07

Similar Documents

Publication Publication Date Title
CN105930470B (en) A document retrieval method based on feature weight analysis technology
CN105930473B (en) A kind of similar documents search method based on random forest technology
US10235421B2 (en) Systems and methods for facilitating the gathering of open source intelligence
US20160019217A1 (en) Systems and methods for recommending media items
CN106339502A (en) Modeling recommendation method based on user behavior data fragmentation cluster
US8224805B2 (en) Method for generating context hierarchy and system for generating context hierarchy
AU2011224139B2 (en) Analysis of object structures such as benefits and provider contracts
CN101639859A (en) Table classification device, table classification method, and table classification program
CN107180093A (en) Information search method and device and ageing inquiry word recognition method and device
US20120099785A1 (en) Using near-duplicate video frames to analyze, classify, track, and visualize evolution and fitness of videos
Hussein et al. Using the interestingness measure lift to generate association rules
KR102108683B1 (en) Method for providing recommendation contents including non-interest contents
Tibély et al. Extracting tag hierarchies
CN107341199A (en) A kind of recommendation method based on documentation & info general model
CN106354860A (en) Method for automatically labelling and pushing information resource based on label sets
CN108280124A (en) Product classification method and device, ranking list generation method and device, electronic equipment
CN110569273A (en) Patent retrieval system and method based on relevance sorting
CN104408083A (en) Socialized media analyzing system
CN108241713A (en) A kind of inverted index search method based on polynary cutting
JP5500070B2 (en) Data classification system, data classification method, and data classification program
Garcia-Buendia et al. A bibliometric study of lean supply chain management research: 1996–2020
Singh et al. Structure-aware visualization of text corpora
CN104462552A (en) Question and answer page core word extracting method and device
CN110825792A (en) High-concurrency distributed data retrieval method based on golang middleware coroutine mode
Zainol et al. Visualizing military explicit knowledge using document clustering techniques

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant