CN109255014A - Method for improving file keyword recognition accuracy based on multiple algorithms - Google Patents


Info

Publication number
CN109255014A
Authority
CN
China
Prior art keywords
keyword
model
module
calculated result
extraction
Prior art date
Legal status
Pending
Application number
CN201811210049.4A
Other languages
Chinese (zh)
Inventor
张永静
张彤
郝佳
高晓琼
李世成
郑春
郑春一
李景田
司敬
徐海
左晓辉
Current Assignee
Beijing Jinghang Computing Communication Research Institute
Original Assignee
Beijing Jinghang Computing Communication Research Institute
Priority date
2018-10-17
Filing date
2018-10-17
Publication date
Application filed by Beijing Jinghang Computing Communication Research Institute
Priority to CN201811210049.4A
Publication of CN109255014A


Abstract

The invention belongs to the technical field of keyword retrieval, and specifically relates to a recognition method that improves the accuracy of file keyword recognition by combining multiple algorithms. The keyword hit counts produced by the individual algorithms are compared; the weight ratio assigned to each algorithm can be configured by the user or left at the default configuration; the hit counts are combined according to these weight ratios, and the weighted result is taken as the final result. The algorithms, which are used to accurately identify and extract the keywords of files and folders, are: a Chinese keyword extraction algorithm based on a separation model, a Chinese keyword extraction algorithm based on high-dimensional clustering, a semantics-based Chinese text keyword extraction algorithm, and a Chinese keyword extraction algorithm based on a naive Bayes model. In this way, the method improves the accuracy of file keyword recognition in the field of keyword retrieval.

Description

Method for improving file keyword recognition accuracy based on multiple algorithms
Technical field
The invention belongs to the technical field of keyword retrieval, and specifically relates to a recognition method that improves the accuracy of file keyword recognition based on multiple algorithms.
Background art
In the field of natural language processing, the key task when handling massive volumes of text files is to extract the content that users care about most. For both long and short texts, a few keywords are often enough to reveal the main idea of the whole text. At the same time, both text-based recommendation and text-based search depend heavily on text keywords, so the accuracy of keyword extraction directly determines the final effectiveness of a recommender or search system. Keyword extraction is therefore a very important part of text mining.
Keyword recognition and retrieval is a technology that, under a unified policy and using deep content analysis, identifies, monitors, and protects data at rest, data in motion, and data in use in real time.
Most current schemes rely mainly on a separation-model algorithm to extract single keywords and keyword strings. Existing technical solutions use only one algorithm, yet each algorithm has its own strengths and characteristics, so computing keywords with a single algorithm cannot avoid that algorithm's inherent drawbacks. The accuracy of the keyword recognition technology currently on the market therefore needs to be improved.
Summary of the invention
(1) Technical problem to be solved
The technical problem to be solved by the present invention is that current approaches rely on a single algorithm and therefore cannot combine the results of multiple scans into an accurate, comprehensive analysis.
(2) Technical solution
To solve the above technical problem, the present invention provides a recognition method for improving file keyword accuracy based on multiple algorithms. The recognition method is implemented on a recognition system that includes: an original text input module, a text preprocessing module, a Chinese keyword extraction module based on a separation model, a Chinese keyword extraction module based on high-dimensional clustering, a semantics-based Chinese keyword extraction module, a Chinese keyword extraction module based on a naive Bayes model, an algorithm weight ratio allocation module, and a keyword recognition result generation module. Specifically,
The recognition methods includes the following steps:
Step 1: input, through the original text input module, the original text on which keyword recognition is to be performed;
Step 2: through the text preprocessing module, perform text format conversion preprocessing on the original text input by the original text input module, forming candidate words for processing by the subsequent recognition algorithms;
Step 3: through the Chinese keyword extraction module based on a separation model, perform keyword extraction and keyword-string extraction on the candidate words from the text preprocessing module, generate the separation-model calculation result, and obtain the extracted keyword count information;
Step 4: through the Chinese keyword extraction module based on high-dimensional clustering, perform keyword extraction and keyword-string extraction on the candidate words from the text preprocessing module, generate the high-dimensional-clustering calculation result, and obtain the extracted keyword count information;
Step 5: through the semantics-based Chinese keyword extraction module, apply the semantics-based Chinese text keyword extraction algorithm to perform keyword extraction and keyword-string extraction on the candidate words from the text preprocessing module, generate the semantics-based calculation result, and obtain the extracted keyword count information;
Step 6: through the Chinese keyword extraction module based on a naive Bayes model, perform keyword extraction and keyword-string extraction on the candidate words from the text preprocessing module, generate the naive Bayes calculation result, and obtain the extracted keyword count information;
Step 7: through the algorithm weight ratio allocation module, configure the weight ratio that each of the above calculation results (separation model, high-dimensional clustering, semantics-based, and naive Bayes) carries in the process of computing and generating the final keyword result;
Step 8: through the keyword recognition result generation module, compare the hit counts of each keyword in the separation-model, high-dimensional-clustering, semantics-based, and naive Bayes calculation results, and compute the final keyword recognition result according to the preconfigured weight ratios.
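Steps 3 to 6 run four independent extractors over the same candidate words and record, per algorithm, a hit count for each keyword it selects. The Python sketch below shows only that orchestration, under stated assumptions: each extractor is modeled as a plain function from the candidate list to the keywords it selects, and a keyword's hit count is assumed to be its frequency among the candidate words. The name `run_extractors` and the `Extractor` type are hypothetical and do not come from the patent.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical interface: an extractor maps candidate words to the keywords it selects.
Extractor = Callable[[List[str]], List[str]]


def run_extractors(candidates: List[str],
                   extractors: Dict[str, Extractor]) -> Dict[str, Dict[str, int]]:
    """Steps 3 to 6: run every extractor on the same candidate words.

    Returns {algorithm name: {keyword: hit count}}. The hit count is assumed
    to be the keyword's frequency among the candidate words.
    """
    freq = Counter(candidates)
    results: Dict[str, Dict[str, int]] = {}
    for name, extract in extractors.items():
        selected = set(extract(candidates))  # ignore duplicate picks by one algorithm
        results[name] = {kw: freq[kw] for kw in selected}
    return results
```

For example, with the candidates `["关键词", "提取", "关键词"]` and a toy extractor `lambda c: ["关键词"]` registered as `"separation"`, the `"separation"` entry becomes `{"关键词": 2}`. Keeping extractors behind one function type lets the four real algorithms be swapped in without changing the fusion step.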
Wherein, the Chinese keyword extraction module based on a separation model uses the separation-model Chinese keyword extraction algorithm, which treats keyword identification and extraction as a classification problem: each candidate keyword in the text is classified as either a keyword or a non-keyword.
Wherein, the separation model establishes separate models for single keywords and for keyword strings; in selecting keyword features, each of these models uses a different set of features.
Wherein, the Chinese keyword extraction module based on high-dimensional clustering extracts keywords in four steps: fast word segmentation based on a small dictionary, secondary segmentation, high-dimensional clustering, and keyword selection.
Wherein, the semantics-based Chinese keyword extraction module incorporates word semantic features into the keyword extraction process, constructs a word semantic similarity network, and uses betweenness density to measure the semantic importance of each word.
Wherein, the Chinese keyword extraction module based on a naive Bayes model first obtains the parameters of the naive Bayes model through a training process and then, on that basis, completes keyword extraction in a test process.
Wherein, the algorithm weight ratio allocation module sets the weight ratio of the separation-model, high-dimensional-clustering, semantics-based, and naive Bayes calculation results in the final keyword computation to 2:3:4:3.
Wherein, the 2:3:4:3 weight ratio is the default configuration.
Wherein, the weight ratio can be configured by the user according to the specific application scenario.
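As a worked check of the default ratio, assume the final score of a keyword k is the weighted sum of the hit counts h_1(k) to h_4(k) reported by the separation-model, high-dimensional-clustering, semantics-based, and naive Bayes extractors, in that order, with each weight divided by the total. The patent does not state whether the weights are normalized; dividing by a constant does not change the ranking. Because 2 + 3 + 4 + 3 = 12, the default shares are twelfths:

```latex
\begin{aligned}
S(k) &= \sum_{i=1}^{4} \frac{w_i}{\sum_{j=1}^{4} w_j}\, h_i(k),
  \qquad (w_1, w_2, w_3, w_4) = (2, 3, 4, 3), \qquad \sum_{j=1}^{4} w_j = 12,\\
\left(\frac{w_1}{12}, \frac{w_2}{12}, \frac{w_3}{12}, \frac{w_4}{12}\right)
  &= \left(\tfrac{2}{12}, \tfrac{3}{12}, \tfrac{4}{12}, \tfrac{3}{12}\right)
   = \left(\tfrac{1}{6}, \tfrac{1}{4}, \tfrac{1}{3}, \tfrac{1}{4}\right).
\end{aligned}
```

For instance, a keyword hit 3 times by the separation model and 2 times by the naive Bayes model, and not at all by the other two, scores 3/6 + 2/4 = 1.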
Wherein, the formats of the original text include WORD and PDF.
(3) Beneficial effects
Compared with the prior art, the present invention combines the separation-model Chinese keyword extraction algorithm, the Chinese keyword extraction algorithm based on high-dimensional clustering, the semantics-based Chinese text keyword extraction algorithm, and the Chinese keyword extraction algorithm based on a naive Bayes model, and makes a comprehensive matching judgment across them to improve the accuracy of keyword extraction and recognition.
The keyword hit counts of the algorithms are compared. By default, the recognition result is computed with a 2:3:4:3 weight ratio across the algorithms, and the weights can be configured according to the specific application scenario. The hit counts are combined according to each algorithm's weight, and the combined value is taken as the final result.
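The weighted combination of hit counts described above can be sketched in Python as follows. This is a minimal illustration under assumptions: the dictionary keys (`"separation"`, `"clustering"`, `"semantic"`, `"bayes"`) and the function name `fuse_hit_counts` are hypothetical, weights are normalized by their sum (which does not change the ranking), and an algorithm missing from the weight table contributes nothing. Only the default 2:3:4:3 ratio comes from the patent.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Default 2:3:4:3 ratio from the patent; the key names are hypothetical labels.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "separation": 2.0,  # separation-model extractor
    "clustering": 3.0,  # high-dimensional clustering extractor
    "semantic": 4.0,    # semantics-based (SKE) extractor
    "bayes": 3.0,       # naive Bayes extractor
}


def fuse_hit_counts(hit_counts: Dict[str, Dict[str, int]],
                    weights: Optional[Dict[str, float]] = None,
                    top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Combine per-algorithm hit counts into one weighted score per keyword.

    Returns (keyword, score) pairs sorted by descending score, ties broken
    alphabetically; top_k optionally truncates the list.
    """
    weights = DEFAULT_WEIGHTS if weights is None else weights
    total = sum(weights.values())
    scores: Dict[str, float] = defaultdict(float)
    for algorithm, counts in hit_counts.items():
        share = weights.get(algorithm, 0.0) / total  # normalized weight of this algorithm
        for keyword, hits in counts.items():
            scores[keyword] += share * hits
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if top_k is None else ranked[:top_k]
```

Passing a different `weights` dictionary corresponds to the user-configured ratio for a specific application scenario; omitting it applies the default.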
In this way, the method improves the accuracy of file keyword recognition in the field of keyword retrieval.
Description of the drawings
Fig. 1 is a schematic diagram of the technical solution of the present invention.
Detailed description of the embodiments
To make the objectives, content, and advantages of the present invention clearer, specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples.
In addition, the present invention provides a recognition system for improving file keyword accuracy based on multiple algorithms. As shown in Fig. 1, the recognition system includes:
an original text input module, configured to input the original text on which keyword recognition is to be performed;
a text preprocessing module, configured to perform text format conversion preprocessing on the original text input by the original text input module, forming candidate words for processing by the subsequent recognition algorithms;
a Chinese keyword extraction module based on a separation model, configured to perform keyword extraction and keyword-string extraction on the candidate words from the text preprocessing module based on a separation model, generate the separation-model calculation result, and obtain the extracted keyword count information;
a Chinese keyword extraction module based on high-dimensional clustering, configured to perform keyword extraction and keyword-string extraction on the candidate words from the text preprocessing module based on high-dimensional clustering, generate the high-dimensional-clustering calculation result, and obtain the extracted keyword count information;
a semantics-based Chinese keyword extraction module, configured to apply the semantics-based Chinese text keyword extraction (SKE) algorithm to perform keyword extraction and keyword-string extraction on the candidate words from the text preprocessing module, generate the semantics-based calculation result, and obtain the extracted keyword count information;
a Chinese keyword extraction module based on a naive Bayes model, configured to perform keyword extraction and keyword-string extraction on the candidate words from the text preprocessing module based on a naive Bayes model, generate the naive Bayes calculation result, and obtain the extracted keyword count information;
an algorithm weight ratio allocation module, configured to set, for the specific application scenario, the weight ratio that each of the separation-model, high-dimensional-clustering, semantics-based, and naive Bayes calculation results carries in the process of computing and generating the final keyword result;
a keyword recognition result generation module, configured to compare the hit counts of each keyword in the separation-model, high-dimensional-clustering, semantics-based, and naive Bayes calculation results, and to compute the final keyword recognition result according to the preconfigured weight ratios.
Wherein, the Chinese keyword extraction module based on a separation model uses the separation-model Chinese keyword extraction algorithm, which treats keyword identification and extraction as a classification problem and classifies each candidate keyword in the text as either a keyword or a non-keyword;
Wherein, the separation model establishes separate models for single keywords and for keyword strings, and in selecting keyword features each model uses a different set of features.
Extracting single keywords and keyword strings with different features improves extraction accuracy. This is the most commonly used algorithm for keyword recognition, and its calculation result carries 2 of the 12 weight parts (2/12) in the default 2:3:4:3 ratio.
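The separation idea (one classifier for single words, another for keyword strings, each with its own features) can be sketched in Python as below. Everything concrete here is an assumption for illustration: the `Candidate` fields, both feature sets, and the use of logistic regression trained by plain stochastic gradient descent are hypothetical choices, since the patent does not name the classifier or list the features.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Candidate:
    text: str
    parts: List[str]   # component words; more than one means a keyword string
    tf: float          # occurrence count in the document
    first_pos: float   # relative position of first occurrence (0.0 = start)
    in_title: bool


def word_features(c: Candidate) -> List[float]:
    # Hypothetical feature set for single-word candidates.
    return [c.tf, 1.0 - c.first_pos, float(len(c.text))]


def string_features(c: Candidate) -> List[float]:
    # Hypothetical, deliberately different feature set for keyword strings.
    return [c.tf, float(len(c.parts)), 1.0 if c.in_title else 0.0]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticClassifier:
    """Keyword / non-keyword classifier trained by stochastic gradient descent."""

    def __init__(self, featurize: Callable[[Candidate], List[float]]):
        self.featurize = featurize
        self.w: List[float] = []
        self.b = 0.0

    def prob(self, c: Candidate) -> float:
        x = self.featurize(c)
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def fit(self, data: Sequence[Candidate], labels: Sequence[int],
            lr: float = 0.1, epochs: int = 500) -> "LogisticClassifier":
        if not data:
            return self
        self.w = [0.0] * len(self.featurize(data[0]))
        for _ in range(epochs):
            for c, y in zip(data, labels):
                x = self.featurize(c)
                err = self.prob(c) - y  # gradient of the log-loss w.r.t. the logit
                self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= lr * err
        return self


class SeparationModel:
    """Routes single words and keyword strings to separately trained classifiers."""

    def __init__(self) -> None:
        self.word_model = LogisticClassifier(word_features)
        self.string_model = LogisticClassifier(string_features)

    def _model_for(self, c: Candidate) -> LogisticClassifier:
        return self.string_model if len(c.parts) > 1 else self.word_model

    def fit(self, data: Sequence[Candidate], labels: Sequence[int]) -> "SeparationModel":
        for model in (self.word_model, self.string_model):
            pairs = [(c, y) for c, y in zip(data, labels) if self._model_for(c) is model]
            model.fit([c for c, _ in pairs], [y for _, y in pairs])
        return self

    def is_keyword(self, c: Candidate) -> bool:
        return self._model_for(c).prob(c) >= 0.5
```

Because each model only ever sees its own candidate type, its weights can specialize; a shared model would have to compromise between features that matter for words and those that matter for strings.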
Wherein, the Chinese keyword extraction module based on high-dimensional clustering addresses the low accuracy of keyword extraction methods based on statistical information by using a Chinese keyword extraction algorithm based on high-dimensional clustering, which extracts keywords in four steps: fast word segmentation based on a small dictionary, secondary segmentation, high-dimensional clustering, and keyword selection.
Theoretical analysis and experiments show that the Chinese keyword extraction method based on high-dimensional clustering offers better stability, higher efficiency, and more accurate results. The algorithm is fast and has high recognition accuracy, and its calculation result carries 3 of the 12 weight parts (3/12).
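The four steps can be sketched in Python as follows, with every concrete choice labeled as an assumption because the patent names the steps but not their internals. Step 1 is assumed to be forward maximum matching against the small dictionary; step 2 assumes a hypothetical rule that merges runs of two or more out-of-dictionary single characters into a new word; step 3 represents each word as a vector with one dimension per sentence and groups words by simple leader clustering on cosine similarity (a stand-in, not necessarily the clustering method the authors used); step 4 keeps the most frequent word of each cluster. The function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Set


def fmm_segment(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Step 1: fast segmentation by forward maximum matching on a small dictionary."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:  # fall back to a single character
                tokens.append(piece)
                i += size
                break
    return tokens


def secondary_segment(tokens: List[str], dictionary: Set[str]) -> List[str]:
    """Step 2 (assumed rule): merge runs of 2+ unknown single characters."""
    out: List[str] = []
    run: List[str] = []
    for tok in tokens + [""]:  # empty sentinel flushes the final run
        if len(tok) == 1 and tok not in dictionary:
            run.append(tok)
            continue
        if len(run) >= 2:
            out.append("".join(run))
        else:
            out.extend(run)
        run = []
        if tok:
            out.append(tok)
    return out


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_keywords(text: str, dictionary: Set[str],
                     threshold: float = 0.8, min_len: int = 2) -> List[str]:
    """Run all four steps and return one representative keyword per cluster."""
    sentences = [s for s in re.split(r"[。！？；\n]", text) if s]
    per_sentence = [
        Counter(w for w in secondary_segment(fmm_segment(s, dictionary), dictionary)
                if len(w) >= min_len)
        for s in sentences
    ]
    vocab = sorted(set().union(*per_sentence))
    # Step 3: high-dimensional vectors, one dimension per sentence.
    vectors = {w: [float(c[w]) for c in per_sentence] for w in vocab}
    clusters: List[List[str]] = []
    for w in vocab:  # leader clustering: join the first cluster whose leader is similar
        for cluster in clusters:
            if cosine(vectors[w], vectors[cluster[0]]) >= threshold:
                cluster.append(w)
                break
        else:
            clusters.append([w])
    # Step 4: keep the most frequent word of each cluster as its keyword.
    totals = {w: sum(v) for w, v in vectors.items()}
    return [max(cluster, key=totals.__getitem__) for cluster in clusters]
```

With the dictionary `{"关键词", "提取", "算法", "聚类", "文本"}`, the sentence `聚类算法处理文本` first splits into `聚类 / 算法 / 处 / 理 / 文本`, and the secondary pass recovers the out-of-dictionary word `处理`.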
Wherein, the semantics-based Chinese keyword extraction module uses the semantics-based Chinese text keyword extraction (SKE) algorithm, which incorporates word semantic features into the keyword extraction process, constructs a word semantic similarity network, and uses betweenness density to measure the semantic importance of each word.
Compared with keyword extraction algorithms based on statistical features, the SKE algorithm performs better at keyword extraction. Its keyword recognition accuracy is high, and its calculation result carries 4 of the 12 weight parts (4/12).
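A Python sketch of the network idea follows, under two explicit simplifications. First, semantic similarity is replaced by a hypothetical stand-in, character-set (Jaccard) overlap, because a real SKE implementation would rely on a semantic lexicon that is not specified here. Second, the betweenness density measure is approximated by plain betweenness centrality, computed with Brandes' algorithm for an unweighted, undirected graph. The function names and the 0.2 threshold are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Callable, Dict, List, Set


def char_jaccard(a: str, b: str) -> float:
    """Hypothetical stand-in for word semantic similarity: character-set overlap."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def build_network(words: List[str], similarity: Callable[[str, str], float],
                  threshold: float) -> Dict[str, Set[str]]:
    """Undirected word network: an edge joins words whose similarity >= threshold."""
    adj: Dict[str, Set[str]] = {w: set() for w in words}
    for a, b in combinations(words, 2):
        if similarity(a, b) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def betweenness(adj: Dict[str, Set[str]]) -> Dict[str, float]:
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack: List[str] = []
        queue = deque([s])
        preds: Dict[str, List[str]] = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: score / 2.0 for v, score in bc.items()}  # each pair was counted twice


def semantic_keywords(words: List[str], threshold: float = 0.2, top_k: int = 3) -> List[str]:
    unique = list(dict.fromkeys(words))  # de-duplicate, keep first-seen order
    scores = betweenness(build_network(unique, char_jaccard, threshold))
    return sorted(unique, key=lambda w: -scores[w])[:top_k]
```

For the words `关键`, `关键词`, `词语`, and `提取`, only `关键词` sits on the path linking `关键` and `词语`, so it receives the highest betweenness and ranks first: a word that bridges otherwise separate groups of related words is treated as semantically central.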
Wherein, the Chinese keyword extraction module based on a naive Bayes model uses a Chinese keyword extraction algorithm based on a naive Bayes model: it first obtains the parameters of the naive Bayes model through a training process and then, on that basis, completes keyword extraction in a test process. Experiments show that, compared with traditional methods, the algorithm extracts more accurate keywords from small document sets and can flexibly add feature items that characterize word importance, giving it better extensibility. Its keyword recognition accuracy on small documents is high, and its calculation result carries 3 of the 12 weight parts (3/12).
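The train-then-test structure can be sketched in Python as a categorical naive Bayes classifier over discretized candidate features. This is a hypothetical illustration, not the authors' model: the two features (a TF-IDF bin and a first-occurrence bin), the bin thresholds, and the class and function names are assumptions, Laplace smoothing is an added choice, and both classes must appear in the training data. Representing a candidate as a tuple of feature values is what makes it easy to add further feature items, as the text describes.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Sequence, Set, Tuple

Features = Tuple[str, ...]  # discretized feature values for one candidate word


def discretize(tf_idf: float, first_pos: float) -> Features:
    """Hypothetical binning of two candidate features into categorical values."""
    return ("tfidf_high" if tf_idf >= 0.1 else "tfidf_low",
            "early" if first_pos < 0.2 else "late")


class NaiveBayesKeywordModel:
    """Categorical naive Bayes estimating P(keyword | features)."""

    def fit(self, samples: Sequence[Features], labels: Sequence[int]) -> "NaiveBayesKeywordModel":
        # Training process: count class priors and feature values per class.
        self.class_counts: Counter = Counter(labels)
        self.value_counts: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
        self.seen: Dict[int, Set[str]] = defaultdict(set)
        for x, y in zip(samples, labels):
            for i, value in enumerate(x):
                self.value_counts[(y, i)][value] += 1
                self.seen[i].add(value)
        return self

    def _log_joint(self, x: Features, y: int) -> float:
        n_y = self.class_counts[y]
        log_p = math.log(n_y / sum(self.class_counts.values()))  # log prior
        for i, value in enumerate(x):
            k = len(self.seen[i]) + 1  # Laplace smoothing, one extra slot for unseen values
            log_p += math.log((self.value_counts[(y, i)][value] + 1) / (n_y + k))
        return log_p

    def prob_keyword(self, x: Features) -> float:
        # Test process: posterior for class 1 (keyword) via Bayes' rule in log space.
        a, b = self._log_joint(x, 1), self._log_joint(x, 0)
        m = max(a, b)
        return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))
```

At test time, each candidate word is discretized the same way, and candidates are ranked by `prob_keyword`; extending the model means appending another binned value to the tuple returned by `discretize`.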
Wherein, the algorithm weight ratio allocation module sets the weight ratio of the separation-model, high-dimensional-clustering, semantics-based, and naive Bayes calculation results in the final keyword computation to 2:3:4:3.
Wherein, the 2:3:4:3 weight ratio is the default configuration.
Wherein, the weight ratio can be configured by the user according to the specific application scenario.
Wherein, the formats of the original text include WORD and PDF.
Embodiment 1
This embodiment provides a method for improving the recognition accuracy of file keywords based on multiple algorithms. The file is processed and parsed for keywords by the separation-model Chinese keyword extraction algorithm, the Chinese keyword extraction algorithm based on high-dimensional clustering, the semantics-based Chinese text keyword extraction (SKE) algorithm, and the Chinese keyword extraction algorithm based on a naive Bayes model, and their results are combined by weight to improve accuracy.
Wherein, the separation-model Chinese keyword extraction algorithm performs both keyword extraction and keyword-string extraction, and designs different features for these two problems to improve extraction accuracy.
Wherein, the Chinese keyword extraction algorithm based on high-dimensional clustering was proposed to address the low accuracy of keyword extraction methods based on statistical information. It extracts keywords in four steps: fast word segmentation based on a small dictionary, secondary segmentation, high-dimensional clustering, and keyword selection. Theoretical analysis and experiments show that this method offers better stability, higher efficiency, and more accurate results.
Wherein, the semantics-based Chinese text keyword extraction (SKE) algorithm incorporates word semantic features into the keyword extraction process, constructs a word semantic similarity network, and uses betweenness density to measure the semantic importance of each word. Compared with keyword extraction algorithms based on statistical features, the SKE algorithm performs better at keyword extraction.
Wherein, the Chinese keyword extraction algorithm based on a naive Bayes model first obtains the parameters of the naive Bayes model through a training process and then, on that basis, completes keyword extraction in a test process. Experiments show that, compared with the traditional TF*IDF method, the algorithm extracts more accurate keywords from small document sets and can flexibly add feature items that characterize word importance, giving it better extensibility.
Each algorithm extracts keywords so that the keyword count information in the file or folder is obtained accurately. The keyword hit counts of the algorithms are compared; by default, the recognition result is computed with a 2:3:4:3 weight ratio, which can be configured according to the specific application scenario. The hit counts are combined according to each algorithm's weight, and the combined value is taken as the final result.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the technical principles of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A recognition method for improving file keyword accuracy based on multiple algorithms, characterized in that the recognition method is implemented on a recognition system, the recognition system comprising: an original text input module, a text preprocessing module, a Chinese keyword extraction module based on a separation model, a Chinese keyword extraction module based on high-dimensional clustering, a semantics-based Chinese keyword extraction module, a Chinese keyword extraction module based on a naive Bayes model, an algorithm weight ratio allocation module, and a keyword recognition result generation module; specifically,
The recognition method includes the following steps:
Step 1: inputting, through the original text input module, the original text on which keyword recognition is to be performed;
Step 2: performing, through the text preprocessing module, text format conversion preprocessing on the original text input by the original text input module, to form candidate words for processing by the subsequent recognition algorithms;
Step 3: performing, through the Chinese keyword extraction module based on a separation model, keyword extraction and keyword-string extraction on the candidate words from the text preprocessing module, generating the separation-model calculation result, and obtaining the extracted keyword count information;
Step 4: performing, through the Chinese keyword extraction module based on high-dimensional clustering, keyword extraction and keyword-string extraction on the candidate words from the text preprocessing module, generating the high-dimensional-clustering calculation result, and obtaining the extracted keyword count information;
Step 5: performing, through the semantics-based Chinese keyword extraction module and the semantics-based Chinese text keyword extraction algorithm, keyword extraction and keyword-string extraction on the candidate words from the text preprocessing module, generating the semantics-based calculation result, and obtaining the extracted keyword count information;
Step 6: performing, through the Chinese keyword extraction module based on a naive Bayes model, keyword extraction and keyword-string extraction on the candidate words from the text preprocessing module, generating the naive Bayes calculation result, and obtaining the extracted keyword count information;
Step 7: configuring, through the algorithm weight ratio allocation module, the weight ratio that each of the separation-model, high-dimensional-clustering, semantics-based, and naive Bayes calculation results carries in the process of computing and generating the final keyword result;
Step 8: comparing, through the keyword recognition result generation module, the hit counts of each keyword in the separation-model, high-dimensional-clustering, semantics-based, and naive Bayes calculation results, and computing the final keyword recognition result according to the preconfigured weight ratios.
2. the recognition methods of file keyword accuracy is promoted based on many algorithms as described in claim 1, which is characterized in that The Chinese key extraction module based on disjunctive model, using the Chinese key extraction algorithm based on disjunctive model, The identification of keyword is extracted as a classification, distinguishes keyword or non-key word to candidate keywords each in text.
3. the recognition methods of file keyword accuracy is promoted based on many algorithms as claimed in claim 2, which is characterized in that The disjunctive model is respectively established to key words and crucial word string, in the selection of keyword feature, is established respectively Each model choose different features.
4. the recognition methods of file keyword accuracy is promoted based on many algorithms as described in claim 1, which is characterized in that The Chinese key extraction module of the High Dimensional Clustering Analysis technology, it is poly- by fast word segmentation, secondary participle, the higher-dimension according to small dictionary Class and keyword select the extraction that four steps realize keyword.
5. the recognition methods of file keyword accuracy is promoted based on many algorithms as described in claim 1, which is characterized in that The semantic-based Chinese key extraction module during phrase semantic feature is incorporated keyword extraction, constructs word Language semantic similarity network simultaneously utilizes degree Density Metric phrase semantic criticality between two parties.
6. The method for improving the accuracy of file keyword recognition based on multiple algorithms according to claim 1, characterized in that the naive Bayesian Chinese keyword extraction module first obtains the parameters of the naive Bayesian model through a training process and then uses them as the basis for keyword extraction in a testing process.
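The two phases of claim 6 map naturally onto a categorical naive Bayes classifier: training estimates class priors and per-feature value frequencies, and testing picks the label with the highest posterior. A minimal sketch, assuming each candidate is described by a tuple of discrete feature values and labeled keyword (`True`) or non-keyword (`False`); the feature slots and the add-one smoothing are illustrative choices, not taken from the claims.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Features = Tuple[str, ...]  # one discrete value per feature slot, e.g. ("title", "first", "high")


def train(samples: List[Tuple[Features, bool]]):
    """Training process: count class priors and per-class feature-value frequencies."""
    class_counts: Counter = Counter(label for _, label in samples)
    value_counts: Dict[bool, Dict[int, Counter]] = {
        label: defaultdict(Counter) for label in class_counts
    }
    vocab: Dict[int, Set[str]] = defaultdict(set)  # distinct values observed in each slot
    for feats, label in samples:
        for slot, value in enumerate(feats):
            value_counts[label][slot][value] += 1
            vocab[slot].add(value)
    return class_counts, value_counts, vocab


def predict(model, feats: Features) -> bool:
    """Testing process: pick the label with the highest log posterior."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = False, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for slot, value in enumerate(feats):
            count = value_counts[label][slot][value]
            # Add-one smoothing; the extra +1 reserves mass for values never seen in training
            score += math.log((count + 1) / (n + len(vocab[slot]) + 1))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids numeric underflow when a candidate has many features.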
7. The method for improving the accuracy of file keyword recognition based on multiple algorithms according to claim 1, characterized in that the algorithm weight ratio distribution module assigns, at a ratio of 2:3:4:3, the respective weights that the disjunctive-model result, the high-dimensional clustering result, the semantic-based result, and the naive Bayesian result carry when the final keyword result is generated.
8. The method for improving the accuracy of file keyword recognition based on multiple algorithms according to claim 7, characterized in that the 2:3:4:3 weight ratio is the default configuration.
9. The method for improving the accuracy of file keyword recognition based on multiple algorithms according to claim 1, characterized in that the weight ratio is custom-configured according to the specific application scenario.
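Claims 7 to 9 describe a default 2:3:4:3 ratio that can be replaced for a given application scenario. The sketch below shows one way such a configuration might be resolved; the algorithm keys, the helper `resolve_weights`, and the choice to let a partial override keep the defaults for unspecified algorithms are all assumptions, not details from the claims.

```python
from typing import Dict, Optional

# Default ratio from claims 7 and 8:
# disjunctive model : high-dimensional clustering : semantic : naive Bayes = 2:3:4:3
DEFAULT_WEIGHTS: Dict[str, float] = {
    "disjunctive": 2.0,
    "clustering": 3.0,
    "semantic": 4.0,
    "naive_bayes": 3.0,
}


def resolve_weights(custom: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Return normalized weights: the default ratio unless a scenario-specific one is given."""
    raw = dict(DEFAULT_WEIGHTS)
    if custom:
        unknown = set(custom) - set(raw)
        if unknown:
            raise ValueError(f"unknown algorithms: {sorted(unknown)}")
        raw.update(custom)  # algorithms missing from `custom` keep their default weight
    if any(w < 0 for w in raw.values()):
        raise ValueError("weights must be non-negative")
    total = sum(raw.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    # Normalize so the weights sum to 1 (a 2:3:4:3 ratio becomes 2/12, 3/12, 4/12, 3/12)
    return {name: w / total for name, w in raw.items()}
```

Setting a weight to 0, for example `resolve_weights({"semantic": 0})`, effectively disables that algorithm for a scenario where it performs poorly.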
10. The method for improving the accuracy of file keyword recognition based on multiple algorithms according to claim 1, characterized in that the formats of the original text include Word and PDF.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811210049.4A CN109255014A (en) 2018-10-17 2018-10-17 The recognition methods of file keyword accuracy is promoted based on many algorithms

Publications (1)

Publication Number Publication Date
CN109255014A 2019-01-22

Family

ID=65045874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811210049.4A Pending CN109255014A (en) 2018-10-17 2018-10-17 The recognition methods of file keyword accuracy is promoted based on many algorithms

Country Status (1)

Country Link
CN (1) CN109255014A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202042A (en) * 2016-07-06 2016-12-07 中央民族大学 A kind of keyword abstraction method based on figure
US20170140464A1 (en) * 2015-11-16 2017-05-18 Uberple Co., Ltd. Method and apparatus for evaluating relevance of keyword to asset price
CN107480858A (en) * 2017-07-10 2017-12-15 武汉楚鼎信息技术有限公司 A kind of Aided intelligent decision-making and method based on the analysis of stock big data
CN108595425A (en) * 2018-04-20 2018-09-28 昆明理工大学 Based on theme and semantic dialogue language material keyword abstraction method

Non-Patent Citations (1)

Title
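Wang Bo (王博): "Research and Application of Multi-Level Text Keyword Extraction Based on Cloud Computing" (基于云计算的多层次文本关键词抽取研究与应用), China Master's Theses Full-text Database, Information Science and Technology series *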

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN111339770A (en) * 2020-02-18 2020-06-26 百度在线网络技术(北京)有限公司 Method and apparatus for outputting information
CN111339770B (en) * 2020-02-18 2023-07-21 百度在线网络技术(北京)有限公司 Method and device for outputting information
CN111726285A (en) * 2020-08-21 2020-09-29 支付宝(杭州)信息技术有限公司 Instant messaging method and device
CN112307175A (en) * 2020-12-02 2021-02-02 龙马智芯(珠海横琴)科技有限公司 Text processing method, text processing device, server and computer readable storage medium
CN112307175B (en) * 2020-12-02 2021-11-02 龙马智芯(珠海横琴)科技有限公司 Text processing method, text processing device, server and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN104376406B (en) A kind of enterprise innovation resource management and analysis method based on big data
CN107515877B (en) Sensitive subject word set generation method and device
Li et al. Twiner: named entity recognition in targeted twitter stream
CN106294593B (en) In conjunction with the Relation extraction method of subordinate clause grade remote supervisory and semi-supervised integrated study
CN103544255B (en) Text semantic relativity based network public opinion information analysis method
CN106649490B (en) Image retrieval method and device based on depth features
TW201737118A (en) Method and device for webpage text classification, method and device for webpage text recognition
CN104881458B (en) A kind of mask method and device of Web page subject
CN108009135B (en) Method and device for generating document abstract
CN105824959A (en) Public opinion monitoring method and system
CN104866558B (en) A kind of social networks account mapping model training method and mapping method and system
CN110134792B (en) Text recognition method and device, electronic equipment and storage medium
CN112347223B (en) Document retrieval method, apparatus, and computer-readable storage medium
WO2023071118A1 (en) Method and system for calculating text similarity, device, and storage medium
CN109255014A (en) The recognition methods of file keyword accuracy is promoted based on many algorithms
CN112069312B (en) Text classification method based on entity recognition and electronic device
CN110222250A (en) A kind of emergency event triggering word recognition method towards microblogging
CN109033212A (en) A kind of file classification method based on similarity mode
CN110287314A (en) Long text credibility evaluation method and system based on Unsupervised clustering
CN110489745A (en) The detection method of paper text similarity based on citation network
US20160283582A1 (en) Device and method for detecting similar text, and application
CN111368539A (en) Hotspot analysis modeling method
CN105678244A (en) Approximate video retrieval method based on improvement of editing distance
CN114491062A (en) Short text classification method fusing knowledge graph and topic model
CN110674243A (en) Corpus index construction method based on dynamic K-means algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zhang Yongjing

Inventor after: Xu Hai

Inventor after: Zuo Xiaohui

Inventor after: Wang Jun

Inventor after: Zhang Tong

Inventor after: Hao Jia

Inventor after: Gao Xiaoqiong

Inventor after: Li Shicheng

Inventor after: Zheng Chunyi

Inventor after: Li Jingtian

Inventor after: Si Jing

Inventor before: Zhang Yongjing

Inventor before: Zuo Xiaohui

Inventor before: Zhang Tong

Inventor before: Hao Jia

Inventor before: Gao Xiaoqiong

Inventor before: Li Shicheng

Inventor before: Zheng Chunyi

Inventor before: Li Jingtian

Inventor before: Si Jing

Inventor before: Xu Hai

RJ01 Rejection of invention patent application after publication

Application publication date: 20190122