CN101625680B - Document retrieval method in patent field - Google Patents
Document retrieval method in patent field
- Publication number
- CN101625680B
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a document retrieval method for the patent field, comprising the following steps: preprocessing the query text and the patent texts; retrieving the patent texts related to the query text, computing several similarity values with different similarity calculation methods, combining these values into a new similarity, and ranking the patent texts by the new similarity; applying several decision methods to map the similarity ranking of the patent texts into different relevance rankings of patent categories; integrating these category relevance rankings and re-ranking to obtain a new relevance ranking of patent categories; and selecting from the new ranking the patent category most relevant to the query text. Because the method uses several similarity calculation methods to produce the final measure of relevance between the query text and the patent texts, it exploits feature information from multiple angles, and by combining several systems it lets them complement one another and improves system performance.
Description
Technical Field
The invention relates to a data retrieval method, in particular to a document retrieval method oriented to the patent field.
Background
With the rapid development of science and technology and the massive growth of documents recording scientific and technological achievements, patents are increasingly regarded as one of the most important means of intellectual property protection. Patent documents describe the technical solutions of the newest inventions, but the literature on scientific and technological achievements also includes non-patent documents such as research papers and technical reports. Patents and non-patent documents are related: for example, studying the relation between research papers and patents can help predict technological development trends. Studying patent documents together with non-patent research documents reveals the latest technologies in each field, which helps to avoid duplicated development and infringement and to analyze the development of a whole technical industry; it also allows the technical status and strategy of competitors to be analyzed and patent invalidity searches to be carried out. Searching across patent documents and non-patent documents is a relatively new problem in patent research.
Patent documents usually cite related patents or research papers, but studying the relation between non-patent and patent documents through these citations alone is very limited. Moreover, a patent database holds millions of patent documents, and working through them manually is time-consuming and laborious. How to retrieve relevant patents from a huge patent database and obtain useful patent information is a difficult problem in patent research.
Existing patent search and classification methods fall into two kinds: classified search based on a patent database, and search based on natural language processing technology.
Most early patent retrieval methods are based on patent databases, for example the patent with publication number CN1996290A. They mainly use the structured text information of patents to extract citation relations and build a patent association graph. Given a query condition such as application number, patent number, application date, announcement date, inventor or patentee, the patent is looked up in the association graph and the matching patents are returned. This approach depends on the fixed structured fields of a patent, is not intelligent enough, and does not analyze the patent content.
Methods based on natural language processing analyze the content of the patent text: they extract useful features that represent a patent from its title, abstract, description, claims and similar text, assign weights to those features, and retrieve related patent texts. For example, the article "Some Issues in the Automatic Classification of U.S. Patents" (by Leah S. Larkey, an invited report at the AAAI-98 Workshop on Learning for Text Categorization) introduces a method for classifying patents with natural language processing technology. The article "POSTECH at NTCIR-5 Patent Retrieval: Smoothing Experiments in a Language Modeling Approach to Patent Retrieval" (by In-Su Kang, Seung-Hoon Na, Jungi Kim and Jong-Hyeok Lee, published in Proceedings of the NTCIR-5 Workshop Meeting, December 6-9, 2005, Tokyo, Japan) implements patent retrieval with natural language processing techniques.
However, existing methods are limited to keyword search and address only search among patent texts. They do not consider the relations between non-patent texts and patent texts, among non-patent texts, or between non-patent texts and patent categories, and so cannot provide intelligent full-text retrieval across non-patent and patent texts.
Disclosure of Invention
The prior art in patent-field document retrieval does not consider the relations between non-patent texts and patent texts, among non-patent texts, or between non-patent texts and patent categories, and cannot provide intelligent full-text retrieval across non-patent and patent texts. To address these defects, the technical problem solved by the invention is to provide a patent retrieval method that represents patent texts as feature vectors, calculates the similarity between a non-patent text and the related patent texts, and retrieves the most relevant patent texts.
To solve this technical problem, the patent retrieval method of the invention, based on natural language processing technology, adopts a technical scheme comprising the following steps:
preprocessing the query text and the patent text;
retrieving the patent texts related to the query text, obtaining several similarity values with several different similarity calculation methods, combining these values to recalculate the similarity, and ranking the patent texts by the new similarity values;
adopting several different decision methods to map the similarity ranking of the patent texts into different relevance rankings of patent categories; integrating these category relevance rankings and re-ranking to obtain a new relevance ranking of patent categories;
and selecting the patent category most relevant to the query text from the new relevance ranking of patent categories.
Preprocessing a text yields feature word candidates, collects statistics on them, selects features with a feature selection method, and converts the text into a vector representation. Specifically: remove markup that is not part of the patent text and extract the patent text information, obtaining the patent number, the IPC category labels, the patent title, the abstract, the claims and the description; for English text, retain words written entirely in capitals; remove words containing digits; remove stop words; lemmatize the English text to obtain the feature candidate word list; count over the candidate list to obtain term frequency, document frequency and per-category term frequency; then select a feature word list from the candidates, calculate the weight of each feature word, and convert the patent texts and the query text into computable vectors of feature words and weights.
The different similarity calculation methods each produce a similarity value between the query text and a patent text, and these values are integrated with a log-linear model. The formula image is not preserved in this text; the surviving definitions correspond to the standard log-linear form:

$$\mathrm{Sim}(\vec{D_1}, \vec{D_2}) = \frac{\exp\left(\vec{\lambda} \cdot \vec{f}(\vec{D_1}, \vec{D_2})\right)}{\sum_{k=1}^{n} \exp\left(\vec{\lambda} \cdot \vec{f}(\vec{D_1}, \vec{D_{2,k}})\right)}$$

where $\vec{f}(\vec{D_1}, \vec{D_2})$ is the feature vector formed by the similarity values that the different similarity calculation methods give for the query text $\vec{D_1}$ and the patent text $\vec{D_2}$; $\vec{\lambda}$ is the weight vector for those similarity values; $n$ is the total number of patent texts related to the query text; and $\vec{D_{2,k}}$ is the $k$-th related patent text vector.
The decision methods comprise a similarity addition method weighted by patent category, a similarity addition method weighted by the ranking position of the patent text, and a plain patent text similarity addition method. The formula of the category-weighted similarity addition is not preserved in this text; its terms are defined as follows:
$k_r$ is a penalty factor constant; $k$ is the number of candidate patent texts in the patent text similarity ranking result; $c_i$ is the position, in the similarity-based ordering of patent categories, of the category to which candidate patent text $i$ belongs; $\mathrm{sim}(D_1, d_i)$ is the similarity value between the query text $D_1$ and patent text $d_i$; ICF is the inverse category frequency, computed from $C_x$, the number of texts in category $x$, and $N$, the total number of texts; $\mathrm{score}(x)$ is the relevance value between the query text and patent category $x$; and $\mathrm{role}(x, i)$ indicates whether patent text $d_i$ belongs to patent category $x$.
The formula of the similarity addition weighted by patent text ranking position is not preserved in this text.
The relevance ranking results of the different patent categories are integrated in one of two ways. In the first, the category relevance rankings produced by each combination of similarity value and category decision method are used as features of the patent category positions, and the rankings are combined with a Rank-SVM model.
In the second, the position values at which each category appears in the different category relevance rankings are added together to give a new relevance value for the category.
The invention has the following beneficial effects and advantages:
The method adopts natural language processing technology and uses several similarity calculation methods to produce the final measure of relevance between the query text and the patent texts, making full use of feature information from multiple angles. It also combines several systems so that they complement one another, which improves system performance.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a text pre-processing flow diagram;
FIG. 3 is a flow chart of similarity calculation between query text and patent text;
FIG. 4 is a flowchart of the query text to patent category correlation calculation;
Detailed Description
The method of the invention is further described below with reference to the embodiment and the drawings.
As shown in FIG. 1, the document retrieval method for the patent field includes the following steps:
preprocessing the query text and the patent texts; retrieving the patent texts related to the query text, obtaining several similarity values with several different similarity calculation methods, combining these values to recalculate the similarity, and ranking the patent texts by the new similarity values; adopting several different decision methods to map the similarity ranking of the patent texts into different relevance rankings of patent categories, integrating these category relevance rankings, and re-ranking to obtain a new relevance ranking of patent categories; and selecting the patent category most relevant to the query text from the new relevance ranking of patent categories.
As shown in fig. 2, the preprocessing of the query text and the patent text includes the following steps:
a) remove markup that is not part of the patent text and extract the patent text information, obtaining the patent number, IPC category labels, patent title, abstract, claims and description; strip non-letter or non-Chinese characters inside words, such as '-', '(' and ')'; for English text, retain words written entirely in capitals; remove words containing digits; remove stop words, such as "mail" and "said" in English patents and "step" and "feature" in Chinese patents, as well as prepositions, adverbs, articles and the like; lemmatize the English text to obtain the feature candidate word list;
b) count over the feature candidate word list to obtain term frequency, document frequency and per-category term frequency;
c) select a feature word list from the candidates, calculate the feature weight of each feature word, and convert the patent texts and the query text into computable vectors of feature words and weights;
d) build and store an inverted index over the patent documents and patent text vectors, using the feature words of the patents as index terms.
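Steps a) to c) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the stop-word list, the `lemmatize` stand-in, the smoothed TF-IDF weight and all function names are hypothetical, the example handles English text only, and it omits feature selection and per-category frequencies because the text does not name a specific scheme for them.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the text only names a few examples.
STOP_WORDS = {"the", "a", "an", "of", "and", "said", "in", "is", "to"}

def lemmatize(word):
    """Stand-in for word-form reduction: only strips a plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def tokenize(text):
    """Step a) for English text: clean, filter and reduce each word."""
    tokens = []
    for raw in text.split():
        word = re.sub(r"[^A-Za-z0-9]", "", raw)  # strip '-', '(', ')' and similar
        if not word or any(ch.isdigit() for ch in word):
            continue  # drop empty strings and words containing digits
        if word.isupper() and len(word) > 1:
            tokens.append(word)  # keep all-capital words unchanged
            continue
        word = word.lower()
        if word not in STOP_WORDS:
            tokens.append(lemmatize(word))
    return tokens

def build_vectors(docs):
    """Steps b) and c): document frequencies, then smoothed TF-IDF vectors."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    n_docs = len(docs)
    vectors = {}
    for doc_id, tokens in tokenized.items():
        term_freq = Counter(tokens)
        vectors[doc_id] = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        }
    return vectors

print(tokenize("The sensors (IPC) use 3D well-known methods"))
# ['sensor', 'IPC', 'use', 'wellknown', 'method']
```

A real system would replace `lemmatize` with a proper lemmatizer and add a Chinese word segmenter; the dictionary-of-weights representation is chosen here only because it keeps sparse vectors easy to inspect.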
As shown in FIG. 3, the similarity calculation includes the following steps:
Find the patent texts in the patent text library that share feature words with the query text; these form the related patent text set.
Calculate the similarity between each patent in the related patent text set and the query text. This embodiment uses several similarity calculation methods, as follows:
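The inverted index and the co-occurrence lookup can be sketched as below. The function names and the sample patent ids are hypothetical, and the feature words are given directly as sets rather than produced by preprocessing.

```python
from collections import defaultdict

def build_inverted_index(doc_terms):
    """Map each feature word to the set of patent ids that contain it."""
    index = defaultdict(set)
    for doc_id, terms in doc_terms.items():
        for term in terms:
            index[term].add(doc_id)
    return index

def related_patents(index, query_terms):
    """Return ids of patents sharing at least one feature word with the query."""
    related = set()
    for term in query_terms:
        related |= index.get(term, set())
    return related

# Hypothetical patent ids with their feature words.
patents = {
    "CN-A": {"image", "retrieval", "index"},
    "CN-B": {"engine", "valve"},
    "CN-C": {"text", "retrieval"},
}
index = build_inverted_index(patents)
candidates = related_patents(index, {"retrieval", "query"})
print(sorted(candidates))  # ['CN-A', 'CN-C']
```

Only the candidates returned here need to be scored by the similarity methods, which is what makes the index worthwhile on a collection of millions of patents.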
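1. Vector cosine method
The query text $\vec{D_1}$ and the patent text $\vec{D_2}$ are represented in the vector space model, and the cosine of the angle between the two vectors is calculated as:

$$\cos(\vec{D_1}, \vec{D_2}) = \frac{\vec{D_1} \cdot \vec{D_2}}{\lVert \vec{D_1} \rVert \, \lVert \vec{D_2} \rVert}$$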
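A minimal sketch of the cosine calculation over sparse vectors, assuming each text is stored as a dictionary from feature word to weight (a representation chosen for this example, not stated in the text):

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as {term: weight}."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)

query = {"retrieval": 1.0, "patent": 1.0}
patent = {"retrieval": 1.0, "image": 1.0}
print(round(cosine_similarity(query, patent), 3))  # 0.5
```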
2. BM25 method
BM25 has many variants. In this embodiment it is calculated as follows:

$$\mathrm{score}(D_1, D_2) = \sum_{i=1}^{n} \mathrm{IDF}(t_i) \cdot \frac{f(t_i, D_2)\,(k_1 + 1)}{f(t_i, D_2) + k_1 \left(1 - b + b \cdot \frac{|D_2|}{\mathrm{avgdl}}\right)}$$

where $n$ is the number of feature words in the query text $D_1$; $f(t_i, D_2)$ is the number of occurrences of feature word $t_i$ in the patent text $D_2$; $|D_2|$ is the text length of the patent text $D_2$; avgdl is the average text length in the set of patent texts related to the query text; $k_1$ and $b$ are free parameters, set in this example to $k_1 = 2.0$ and $b = 0.75$; and $\mathrm{IDF}(t_i)$ is the inverse document frequency weight of the search term $t_i$, calculated as:

$$\mathrm{IDF}(t_i) = \log \frac{N - n(t_i) + 0.5}{n(t_i) + 0.5}$$

where $N$ is the total number of documents in the entire dataset and $n(t_i)$ is the number of documents containing the search term $t_i$.
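The BM25 score can be sketched as follows, assuming the standard Okapi BM25 form with the parameter values given in this embodiment ($k_1 = 2.0$, $b = 0.75$). The function names and the sample numbers are hypothetical.

```python
import math

def idf(n_docs, doc_freq):
    """IDF weight of a term: log((N - n(t) + 0.5) / (n(t) + 0.5))."""
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25(query_terms, doc_tokens, doc_freqs, n_docs, avgdl, k1=2.0, b=0.75):
    """BM25 score of one patent text (a token list) for a list of query terms."""
    doc_len = len(doc_tokens)
    score = 0.0
    for term in query_terms:
        freq = doc_tokens.count(term)
        if freq == 0:
            continue  # terms absent from the patent text contribute nothing
        norm = freq + k1 * (1 - b + b * doc_len / avgdl)
        score += idf(n_docs, doc_freqs.get(term, 0)) * freq * (k1 + 1) / norm
    return score

# Hypothetical collection statistics: 10 documents, "retrieval" occurs in 2.
doc = ["patent", "retrieval", "retrieval", "method"]
score = bm25(["retrieval", "query"], doc, {"retrieval": 2}, n_docs=10, avgdl=4.0)
print(round(score, 4))
```

With this IDF form, a term that occurs in more than half of the documents gets a negative weight, so some implementations clamp the value at zero; the text does not say whether the embodiment does.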
3. SMART method
The SMART formula is not preserved in this text; its terms are defined as follows:
$T$ is the set of feature words that co-occur in the query text $D_1$ and the patent text $D_2$; $tf_i$ is the term frequency of the $i$-th feature word in the text vector; $N$ is the number of texts in the whole patent text collection, and $n$ is the number of patent texts containing the $i$-th feature word; avtf is the average term frequency of the feature words per document in the related patent text set; utf is the number of feature words in the patent text vector $\vec{D_2}$; and pivot is the average number of feature words per document in the whole patent text collection.
The three methods each yield a similarity value between the query text and each patent text.
The similarity values from each method are normalized so that they lie between 0 and 1.
The logarithm of each normalized similarity value is then taken.
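The log similarity values serve as the features of the log-linear model. The formula image is not preserved in this text; the surviving definitions correspond to the standard log-linear form:

$$\mathrm{Sim}(\vec{D_1}, \vec{D_2}) = \frac{\exp\left(\vec{\lambda} \cdot \vec{f}(\vec{D_1}, \vec{D_2})\right)}{\sum_{k=1}^{n} \exp\left(\vec{\lambda} \cdot \vec{f}(\vec{D_1}, \vec{D_{2,k}})\right)}$$

where $\vec{f}(\vec{D_1}, \vec{D_2})$ is the feature vector formed by the similarity values that the different similarity calculation methods give for the query text $\vec{D_1}$ and the patent text $\vec{D_2}$; $\vec{\lambda}$ is the weight vector for those similarity values; $n$ is the total number of patent texts related to the query text; and $\vec{D_{2,k}}$ is the $k$-th related patent text vector.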
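The normalization, logarithm and log-linear steps can be sketched as follows. Min-max scaling, the small `eps` floor, the softmax form of the model and the equal weights are all assumptions made for illustration: the text says only that values are normalized to lie between 0 and 1, and it does not state how the weight vector is obtained. The raw scores are invented sample values.

```python
import math

def min_max_normalize(scores, eps=1e-6):
    """Scale raw scores into [eps, 1]; eps keeps the later logarithm finite."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [eps + (1 - eps) * (s - lo) / (hi - lo) for s in scores]

def log_linear_combine(feature_rows, weights):
    """Softmax over weighted sums of log-similarity features, one row per patent."""
    raw = [sum(w * f for w, f in zip(weights, row)) for row in feature_rows]
    top = max(raw)  # subtract the maximum for numerical stability
    exps = [math.exp(r - top) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]

# Invented raw scores of three candidate patents under the three methods.
cosine = [0.8, 0.4, 0.1]
bm25 = [5.0, 7.0, 1.0]
smart = [2.0, 1.0, 0.5]

columns = [min_max_normalize(col) for col in (cosine, bm25, smart)]
features = [[math.log(col[i]) for col in columns] for i in range(3)]
weights = [1.0, 1.0, 1.0]  # hypothetical weights, one per similarity method
combined = log_linear_combine(features, weights)
ranking = sorted(range(3), key=lambda i: combined[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

With all weights equal to 1 the combined value is proportional to the product of the normalized similarities, so a patent must score well under every method to rank highly.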
As shown in FIG. 4, several patent category decision methods are applied to the patent text similarity ranking to calculate the relevance ranking between the query text and the patent categories. This embodiment uses the following three patent category decision methods:
1. Similarity addition method, calculated as:

$$\mathrm{score}(x) = \sum_{i=1}^{k} \mathrm{sim}(D_1, d_i) \cdot \mathrm{role}(x, i)$$

where $x$ is an IPC category; $k$ is the number of candidate patent texts in the patent text similarity ranking result; $\mathrm{sim}(D_1, d_i)$ is the similarity value of the $i$-th candidate patent text $d_i$ with respect to the query text $D_1$; and $\mathrm{role}(x, i)$ indicates whether patent text $d_i$ belongs to patent category $x$ (1 if it does, 0 otherwise).
2. Patent category weight summation method. The formula is not preserved in this text; its terms are defined as follows:
$k_r$ is a penalty factor constant; $k$ is the number of candidate patent texts in the patent text similarity ranking result; $c_i$ is the position, in the similarity-based ordering of patent categories, of the category to which candidate patent text $i$ belongs; $\mathrm{sim}(D_1, d_i)$ is the similarity value between the query text $D_1$ and patent text $d_i$; ICF is the inverse category frequency, computed from $C_x$, the number of texts in category $x$, and $N$, the total number of texts; $\mathrm{score}(x)$ is the relevance value between the query text and patent category $x$; and $\mathrm{role}(x, i)$ indicates whether patent text $d_i$ belongs to patent category $x$.
3. Patent text similarity position weight summation method. The formula is not preserved in this text; its terms are defined as follows:
$k_i$ is a penalty factor constant; $k$ is the number of candidate patent texts in the patent text similarity ranking result; $\mathrm{sim}(D_1, d_i)$ is the similarity value between the query text $D_1$ and patent text $d_i$; and $\mathrm{role}(x, i)$ indicates whether patent text $d_i$ belongs to patent category $x$.
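Of the three decision methods, the plain similarity addition method is the simplest to illustrate: each candidate patent text adds its similarity value to every category it belongs to. The sketch below assumes that $\mathrm{role}(x, i)$ is 1 when patent $i$ carries category $x$ and 0 otherwise; the patent ids, similarity values and category assignments are invented, and the other two methods are not sketched because their formulas are not available in this text.

```python
from collections import defaultdict

def similarity_sum(ranked, patent_classes):
    """score(x): sum of similarity values of the top-k patents in category x."""
    scores = defaultdict(float)
    for patent_id, sim in ranked:
        for ipc in patent_classes[patent_id]:  # role(x, i) is 1 for these categories
            scores[ipc] += sim
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Invented top-3 similarity ranking and IPC category assignments.
ranked = [("p1", 0.9), ("p2", 0.7), ("p3", 0.4)]
patent_classes = {"p1": ["G06F"], "p2": ["H04L"], "p3": ["G06F", "H04L"]}

result = similarity_sum(ranked, patent_classes)
print([ipc for ipc, _ in result])  # ['G06F', 'H04L']
```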
The category relevance rankings produced by methods 1 to 3 are then combined and the categories re-ranked. Many combinations are possible; this embodiment uses two:
In the first, the category relevance rankings produced by each combination of similarity value and category decision method are used as features of the patent category positions, and the rankings are combined with a Rank-SVM model.
In the second, the position values at which each category appears in the different category relevance rankings are added together to give a new relevance value for the category.
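The position-sum combination can be sketched as below. Two points are assumptions, since the text does not settle them: a smaller position sum is taken to mean a more relevant category, and a category missing from one ranking is given that ranking's length plus one as its position. The Rank-SVM variant is not sketched because it needs a trained model.

```python
def fuse_by_position(rankings):
    """Sum each category's 1-based position over all rankings; smaller is better."""
    classes = set().union(*rankings)
    totals = {}
    for ipc in classes:
        total = 0
        for ranking in rankings:
            # Assumption: a category missing from a list gets position len(list) + 1.
            total += ranking.index(ipc) + 1 if ipc in ranking else len(ranking) + 1
        totals[ipc] = total
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(totals, key=lambda c: (totals[c], c))

# Invented category rankings from three decision methods.
by_similarity_sum = ["G06F", "H04L", "G06K"]
by_category_weight = ["H04L", "G06F", "G06K"]
by_position_weight = ["G06F", "G06K"]

fused = fuse_by_position([by_similarity_sum, by_category_weight, by_position_weight])
print(fused)  # ['G06F', 'H04L', 'G06K']
```

Here G06F totals 1 + 2 + 1 = 4, H04L totals 2 + 1 + 3 = 6 and G06K totals 3 + 3 + 2 = 8, so G06F is selected as the most relevant category.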
Through the steps above, the similarity values between the query text and the patent texts are obtained and mapped to category relevance values; the categories are ranked by those values, and the patent category most relevant to the query text is selected.
The method of the invention is not limited to the examples given in the detailed description; other embodiments that those skilled in the art derive from the technical solution of the invention also fall within the scope of its technical innovation.
Claims (4)
1. A patent-field-oriented document retrieval method comprises the following steps:
preprocessing the query text and the patent text;
retrieving the patent texts related to the query text, obtaining several similarity values with several different similarity calculation methods, combining these values to recalculate the similarity, and ranking the patent texts by the new similarity values;
adopting several different decision methods to map the similarity ranking of the patent texts into different relevance rankings of patent categories; integrating these category relevance rankings and re-ranking to obtain a new relevance ranking of patent categories;
selecting the patent category most relevant to the query text from the new relevance ranking of patent categories;
the different similarity calculation methods each produce a similarity value between the query text and a patent text, and these values are integrated with a log-linear model; the formula image is not preserved in this text, and the surviving definitions correspond to the standard log-linear form:

$$\mathrm{Sim}(\vec{D_1}, \vec{D_2}) = \frac{\exp\left(\vec{\lambda} \cdot \vec{f}(\vec{D_1}, \vec{D_2})\right)}{\sum_{k=1}^{n} \exp\left(\vec{\lambda} \cdot \vec{f}(\vec{D_1}, \vec{D_{2,k}})\right)}$$

where $\vec{f}(\vec{D_1}, \vec{D_2})$ is the feature vector formed by the similarity values that the different similarity calculation methods give for the query text $\vec{D_1}$ and the patent text $\vec{D_2}$; $\vec{\lambda}$ is the weight vector for those similarity values; $n$ is the total number of patent texts related to the query text; and $\vec{D_{2,k}}$ is the $k$-th related patent text vector;
the decision methods comprise a similarity addition method weighted by patent category, a similarity addition method weighted by the ranking position of the patent text, and a plain patent text similarity addition method, wherein the formula of the category-weighted similarity addition is not preserved in this text and its terms are defined as follows:
$k_r$ is a penalty factor constant; $k$ is the number of candidate patent texts in the patent text similarity ranking result; $c_i$ is the position, in the similarity-based ordering of patent categories, of the category to which candidate patent text $i$ belongs; $\mathrm{sim}(D_1, d_i)$ is the similarity value between the query text $D_1$ and patent text $d_i$; ICF is the inverse category frequency, computed from $C_x$, the number of texts in category $x$, and $N$, the total number of texts; $\mathrm{score}(x)$ is the relevance value between the query text and patent category $x$; and $\mathrm{role}(x, i)$ indicates whether patent text $d_i$ belongs to patent category $x$;
the similarity addition formula weighted by patent text similarity ranking position is as follows (formula not preserved in this text).
2. A patent-domain-oriented document retrieval method as claimed in claim 1, wherein: preprocessing a text yields feature word candidates, collects statistics on them, selects features with a feature selection method, and converts the text into a vector representation, and specifically comprises the following steps:
removing markup that is not part of the patent text and extracting the patent text information, obtaining the patent number, IPC category labels, patent title, abstract, claims and description; for English text, retaining words written entirely in capitals; removing words containing digits; removing stop words; lemmatizing the English text to obtain the feature candidate word list;
counting over the feature candidate word list to obtain term frequency, document frequency and per-category term frequency;
and selecting a feature word list from the candidates, calculating the feature weight of each feature word, and converting the patent texts and the query text into computable vectors of feature words and weights.
3. A patent-domain-oriented document retrieval method as claimed in claim 1, wherein: to integrate the relevance ranking results of the different patent categories, the category relevance rankings produced by each combination of similarity value and category decision method are used as features of the patent category positions, and the rankings are combined with a ranking-based support vector machine model.
4. A patent-domain-oriented document retrieval method as claimed in claim 1, wherein: to integrate the relevance ranking results of the different patent categories, the position values at which each category appears in the different category relevance rankings are added together to give a new relevance value for the category.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200810012248A CN101625680B (en) | 2008-07-09 | 2008-07-09 | Document retrieval method in patent field |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200810012248A CN101625680B (en) | 2008-07-09 | 2008-07-09 | Document retrieval method in patent field |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101625680A (en) | 2010-01-13 |
CN101625680B (en) | 2012-08-29 |
Family
ID=41521531
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200810012248A Expired - Fee Related CN101625680B (en) | 2008-07-09 | 2008-07-09 | Document retrieval method in patent field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101625680B (en) |
Families Citing this family (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9110971B2 (en) * | 2010-02-03 | 2015-08-18 | Thomson Reuters Global Resources | Method and system for ranking intellectual property documents using claim analysis |
US9189563B2 (en) | 2011-11-02 | 2015-11-17 | Microsoft Technology Licensing, Llc | Inheritance of rules across hierarchical levels |
US9558274B2 (en) * | 2011-11-02 | 2017-01-31 | Microsoft Technology Licensing, Llc | Routing query results |
CN102768679B (en) * | 2012-06-25 | 2015-04-22 | 深圳市汉络计算机技术有限公司 | Searching method and searching system |
CN103577462B (en) * | 2012-08-02 | 2018-10-16 | 北京百度网讯科技有限公司 | A kind of Document Classification Method and device |
CN103455609B (en) * | 2013-09-05 | 2017-06-16 | 江苏大学 | A kind of patent document similarity detection method based on kernel function Luke cores |
CN104778276A (en) * | 2015-04-29 | 2015-07-15 | 北京航空航天大学 | Multi-index combining and sequencing algorithm based on improved TF-IDF (term frequency-inverse document frequency) |
US10073890B1 (en) | 2015-08-03 | 2018-09-11 | Marca Research & Development International, Llc | Systems and methods for patent reference comparison in a combined semantical-probabilistic algorithm |
US10621499B1 (en) | 2015-08-03 | 2020-04-14 | Marca Research & Development International, Llc | Systems and methods for semantic understanding of digital information |
CN107193814B (en) * | 2016-03-14 | 2020-07-31 | 北京京东尚科信息技术有限公司 | Method and device for realizing automatic book sorting in digital reading |
US10540439B2 (en) | 2016-04-15 | 2020-01-21 | Marca Research & Development International, Llc | Systems and methods for identifying evidentiary information |
CN107153689A (en) * | 2017-04-29 | 2017-09-12 | 安徽富驰信息技术有限公司 | A kind of case search method based on Topic Similarity |
CN108090047B (en) * | 2018-01-10 | 2022-05-24 | 华南师范大学 | Text similarity determination method and equipment |
CN110633407B (en) | 2018-06-20 | 2022-05-24 | 百度在线网络技术(北京)有限公司 | Information retrieval method, device, equipment and computer readable medium |
CN109726401B (en) * | 2019-01-03 | 2022-09-23 | 中国联合网络通信集团有限公司 | Patent combination generation method and system |
CN109960757A (en) * | 2019-02-27 | 2019-07-02 | 北京搜狗科技发展有限公司 | Web search method and device |
CN110334269B (en) * | 2019-07-11 | 2021-05-07 | 中国船舶工业综合技术经济研究院 | Information retrieval method and system |
CN110516062B (en) * | 2019-08-26 | 2022-11-04 | 腾讯科技(深圳)有限公司 | Method and device for searching and processing document |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1758244A (en) * | 2004-04-30 | 2006-04-12 | 微软公司 | Method and system for ranking documents of a search result to improve diversity and information richness |
CN101030217A (en) * | 2007-03-22 | 2007-09-05 | 华中科技大学 | Method for indexing and acquiring semantic net information |
-
2008
- 2008-07-09 CN CN200810012248A patent/CN101625680B/en not_active Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
CN101625680A (en) | 2010-01-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101625680B (en) | Document retrieval method in patent field | |
CN105653706B (en) | A kind of multilayer quotation based on literature content knowledge mapping recommends method | |
CN109101477B (en) | Enterprise field classification and enterprise keyword screening method | |
CN109670014B (en) | Paper author name disambiguation method based on rule matching and machine learning | |
CN108509629B (en) | Text emotion analysis method based on emotion dictionary and support vector machine | |
CN110543564B (en) | Domain label acquisition method based on topic model | |
Wang et al. | Ptr: Phrase-based topical ranking for automatic keyphrase extraction in scientific publications | |
CN108197117A (en) | A kind of Chinese text keyword extracting method based on document subject matter structure with semanteme | |
CN108763348B (en) | Classification improvement method for feature vectors of extended short text words | |
CN101097570A (en) | Advertisement classification method capable of automatic recognizing classified advertisement type | |
CN110543595B (en) | In-station searching system and method | |
CN105426426A (en) | KNN text classification method based on improved K-Medoids | |
CN107122382A (en) | A kind of patent classification method based on specification | |
CN105320646A (en) | Incremental clustering based news topic mining method and apparatus thereof | |
CN113312474A (en) | Similar case intelligent retrieval system of legal documents based on deep learning | |
CN108228612B (en) | Method and device for extracting network event keywords and emotional tendency | |
CN114491062B (en) | Short text classification method integrating knowledge graph and topic model | |
Alsharef et al. | Exploring the efficiency of text-similarity measures in automated resume screening for recruitment | |
Murthy et al. | A comparative study on term weighting methods for automated telugu text categorization with effective classifiers | |
CN115982359A (en) | Method, system, terminal and medium for extracting and aggregating efficacy words of files | |
CN103150371A (en) | Confusion removal text retrieval method based on positive and negative training | |
Guo et al. | A new method for rare feature extraction in patent documents | |
CN110807099A (en) | Text analysis retrieval method based on fuzzy set | |
Jiang et al. | Technical University of Munich | |
Gaur | Data mining and visualization on legal documents |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20120829 |
CF01 | Termination of patent right due to non-payment of annual fee |