CN107402960A - An inverted index optimization algorithm based on semantic and tone weighting - Google Patents
An inverted index optimization algorithm based on semantic and tone weighting
- Publication number
- CN107402960A (application CN201710453251.9A)
- Authority
- CN
- China
- Prior art keywords
- semantic
- phrase
- keyword
- tone
- document
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/328—Management therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/319—Inverted lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3335—Syntactic pre-processing, e.g. stopword elimination, stemming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- Artificial Intelligence (AREA)
- Machine Translation (AREA)
Abstract
The present invention discloses an inverted index optimization algorithm based on semantic and tone weighting, and relates to the technical field of document information processing. It addresses the low accuracy and high indexing difficulty of prior-art inverted indexes that rely solely on the term frequency of specific words. It also addresses the problem that, when the keyword sequence overlaps with the semantic weighting lexicon, semantic weighting becomes ineffective or cannot materially change the inverted index. Drawing on the semantic and tone features of a document, the invention constructs a new definition of weighted term frequency and sorts inverted index entries by it. The weighted term frequency reflects not only how densely a keyword occurs in a document but also how strongly the keyword is expressed, which helps users find the desired documents first.
Description
Technical field
The present invention relates to the technical field of document information processing, and in particular to an inverted index optimization algorithm based on semantic and tone weighting.
Background
Search engines have become the most commonly used internet tool, and data organization and indexing are research hotspots in information science. The inverted index model looks up associated information in reverse, by term frequency, and is well suited to the working scenario of a search engine. However, term frequency alone, and weight-ranking strategies computed from term frequency, cannot fully reflect how strongly a keyword is expressed in a document.
By applying semantic and tone weighting, the present invention quantifies more completely the importance of a keyword to what a document expresses. An inverted indexing method based on this weighted term frequency can better help users find the corresponding documents and information.
Summary of the invention
In view of the above prior art, the object of the present invention is to provide an inverted index optimization algorithm based on semantic and tone weighting. It solves the technical problem that prior-art inverted indexes have low accuracy and high indexing difficulty because they rely solely on the term frequency of specific words. It also solves the technical problem that, when the keyword sequence overlaps with the semantic weighting lexicon, semantic weighting becomes ineffective or cannot materially change the inverted index.
To achieve the above object, the present invention adopts the following technical solution:
An inverted index optimization algorithm based on semantic and tone weighting, comprising the following steps:
Step 1: preset a semantic stop-word set, then define a semantic-enhancing word set and a semantic-weakening word set with different semantic weight values, both as subsets of the semantic stop-word set;
Step 2: perform word segmentation on each input document to obtain an ordered word sequence;
Step 3: match the ordered word sequence against the semantic stop-word set and filter out the words that appear in the stop-word set, obtaining the keyword sequence of the input document;
Step 4: traverse the keyword sequence; after obtaining the tone weight value of the current keyword, search the document's words, in the range between the current keyword's position and the position of its previous occurrence, for words matching the semantic-enhancing or semantic-weakening word set; compute the weighted term frequency of the current keyword from the semantic weight values of the matched words combined with the tone weight value; once the traversal is complete, the weighted term frequency of the document is obtained;
Step 5: sort by document weighted term frequency to obtain the document sequence of the optimized index.
In the above scheme, in step 1, the semantic-enhancing word set and the semantic-weakening word set are defined by means of degree adverbs.
In the above scheme, in step 4, the tone weight value of the current keyword is determined by the sentence-final feature of the original sentence containing it.
In the above scheme, in step 4, obtaining the tone weight value comprises:
Step (1): define a preset tone weight value associated with each sentence-final punctuation mark of an original sentence;
Step (2): obtain the tone weight value of the current keyword from the sentence-final punctuation mark of the original sentence containing it.
In the above scheme, in step 4, the weighted term frequency f_key of the current keyword Key_index in original sentence j is defined as:
$$f_{key} = \sum_{index=1}^{n} \prod_{i=1}^{m} W_i \, Key_{index} \, W_j$$
where W_i is the semantic weight value of keyword key, n is the number of occurrences of keyword key in the document, m is the number of words between keyword key and the preceding keyword that match the semantic-enhancing or semantic-weakening word set, and W_j is the tone weight value.
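The definition above can be checked with a small worked example. The sketch below reads the weighted term frequency as a sum over the occurrences of a keyword, where each occurrence contributes the product of its matched semantic weights multiplied by the tone weight of its sentence. The function name `weighted_term_frequency` and all numeric weights are illustrative assumptions; the patent does not fix concrete values.

```python
from math import prod


def weighted_term_frequency(occurrences):
    """Sum, over every occurrence of one keyword, the product of its
    semantic modifier weights (W_i) times the tone weight (W_j).

    `occurrences` is a list of (modifier_weights, tone_weight) pairs,
    one pair per occurrence of the keyword (hypothetical input shape).
    """
    total = 0.0
    for modifier_weights, tone_weight in occurrences:
        # prod([]) == 1, so an occurrence without modifiers keeps weight 1.
        # The literal 1 stands for the occurrence itself (Key_index).
        total += prod(modifier_weights) * 1 * tone_weight
    return total


# Assumed weights: one enhancer (1.5) in an exclamation (1.25),
# then an unmodified occurrence in a question (0.75).
occurrences = [([1.5], 1.25), ([], 0.75)]
print(weighted_term_frequency(occurrences))  # 2.625
```

A plain term frequency would count 2 for this keyword; under these assumed weights the emphatic first occurrence raises the score and the questioning second occurrence lowers it.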
A method for determining the weighted term frequency of a document, comprising the following steps:
Step 1: define a lexicon whose entries carry different semantic weight values;
Step 2: match the words of the document against the lexicon, and take all words that are not matched as the keyword sequence;
Step 3: quantify the sentence-final features of original sentences by category, determine the tone weight value corresponding to each kind of sentence-final feature, and then determine the tone weight value of each keyword in the keyword sequence from the sentence-final feature of the original sentence containing it;
Step 4: search the document's words, in the range between the current keyword's position and the position of its previous occurrence, for words matching the lexicon; obtain the semantic weight value of the current keyword from the matched words; combine it with the tone weight value by weight product to compute the weighted term frequency of the current keyword; then traverse the keyword sequence and obtain the weighted term frequency of the document by summation.
In the above scheme, in step 1, a semantic stop-word set is preset, and the semantic-enhancing word set and the semantic-weakening word set are then defined as subsets of the semantic stop-word set.
A method for determining the semantic weight value of a keyword, comprising the following steps:
Step 1: define a semantic-enhancing word set and a semantic-weakening word set whose intersection with the set of keywords is empty, the two sets having different semantic weight values;
Step 2: search the document's words, in the range between the keyword's position and the position of its previous occurrence, for words matching the semantic-enhancing or semantic-weakening word set;
Step 3: compute the semantic weight value of the keyword as the product of the semantic weight values of the matched words.
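The three steps above can be sketched as a window scan followed by a product. This sketch follows the reading used in Embodiment 1, where the window is the run of words between the adjacent keyword on the left and the current keyword. The dictionaries `S_POS` and `S_NEG`, their entries, and their weights are assumptions made for illustration only.

```python
# Assumed example weights; the patent only requires that enhancing and
# weakening words carry different semantic weight values.
S_POS = {"very": 1.5, "really": 2.0}      # semantic-enhancing words
S_NEG = {"maybe": 0.5, "somewhat": 0.75}  # semantic-weakening words


def semantic_weight(tokens, key_pos, prev_key_pos):
    """Multiply the weights of all enhancing/weakening words that lie
    strictly between the previous keyword and the current keyword.

    Pass prev_key_pos=-1 for the first keyword of the document.
    """
    weight = 1.0  # no matched modifier means a weight of 1
    for token in tokens[prev_key_pos + 1:key_pos]:
        weight *= S_POS.get(token, S_NEG.get(token, 1.0))
    return weight


tokens = ["food", "is", "really", "very", "good"]
print(semantic_weight(tokens, key_pos=4, prev_key_pos=0))  # 3.0
```

Because the modifier sets are disjoint from the keywords, a word such as "very" can only ever act as a weight and never as an indexed term.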
Compared with the prior art, the beneficial effects of the present invention are as follows:
The inverted index entries of the present invention are sorted by weighted term frequency, which reflects not only how densely a keyword occurs in a document but also how strongly the keyword is expressed, so that users can more easily find the desired documents first. The invention also overcomes technical barriers present in the prior art: how to construct the stop-word set and define its enhancing and weakening subsets (so that these words are first filtered out and then reused), how to use tone and semantics for accurately quantified weighting, and how to avoid keywords that duplicate the semantic word sets and thereby invalidate the semantic weighting.
Brief description of the drawings
Fig. 1 is a schematic diagram of the main processing flow of the present invention;
Fig. 2 is a schematic diagram of the processing flow of an embodiment of the present invention.
Detailed description of the embodiments
All features disclosed in this specification, and all steps of any method or process disclosed, may be combined in any manner, except for mutually exclusive features and/or steps.
The present invention is further described below with reference to the accompanying drawings:
An inverted index optimization algorithm based on semantic and tone weighting comprises the following steps:
S0: preset the semantic-enhancing word set S(pos), the semantic-weakening word set S(neg), and the semantic stop-word set S(stop), where S(pos) and S(neg) are subsets of S(stop);
S1: perform word segmentation on each input document, representing the document as an ordered word sequence L(org);
S2: process the stop words in L(org): scan the words in L(org) one by one and filter out those that occur in S(stop), obtaining the document keyword sequence L(key);
S3: compute the weighted term frequency of each keyword, checking the tone of the original sentence containing the keyword, and sum:
$$f_{key} = \sum_{index=1}^{n} \prod_{i=1}^{m} W_i \, Key_{index} \, W_j$$
where W_i is the semantic weight value of keyword key, n is the number of occurrences of keyword key in the document, m is the number of semantic-enhancing or semantic-weakening words between keyword key and the preceding keyword, and W_j is the tone weight value;
S4: sort by document weighted term frequency: within the document collection, documents are indexed and ordered by weighted term frequency.
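Steps S0 to S2 can be sketched as follows. The sketch keeps each keyword's position in L(org) so that later steps can look back at the words that were filtered out. Whitespace splitting stands in for a real word segmenter (Chinese text would need a dedicated one), and the word lists are assumed examples rather than the patent's preset resources.

```python
# S0: assumed example sets; S(pos) and S(neg) are subsets of S(stop).
S_POS = {"very", "really"}
S_NEG = {"maybe", "somewhat"}
S_STOP = {"the", "is", "a"} | S_POS | S_NEG


def extract_keywords(l_org):
    """S2: return (position, word) pairs for words not found in S(stop)."""
    return [(i, word) for i, word in enumerate(l_org) if word not in S_STOP]


# S1: whitespace splitting is a placeholder for real word segmentation.
l_org = "the soup is very good".split()
l_key = extract_keywords(l_org)
print(l_key)  # [(1, 'soup'), (4, 'good')]
```

Making the modifier sets subsets of the stop-word set means a single filtering pass removes them from L(key), while L(org) still holds them for the weighting step.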
Further, the preset semantic-enhancing word set consists of adverbs, particles, and similar words that positively reinforce the meaning of a word, including but not limited to, for example, "very", "really", "especially", "extremely", "highly", and "quite".
Further, the preset semantic-weakening word set consists of adverbs, particles, and similar words that weaken the meaning expressed and reduce the certainty of a statement, including but not limited to, for example, "possibly", "probably", "a little", "vaguely", "seemingly", and "whether".
Further, both word sets are preset resources.
Further, as the weighted term frequency formula shows, the semantic weight of each keyword is positively correlated with the product of the weights of the semantic words that precede it. If no semantic word precedes the keyword, the weight is 1.
Further, W_j is the tone weight value. The tone weight value comes from the tone of the sentence and is applied to every keyword in that sentence. A sentence ending with a full stop has a declarative tone and a weight value of 1. A sentence ending with an exclamation mark has an exclamatory or imperative tone; the tone is strong and the weight value is greater than 1. A sentence ending with a question mark has a questioning or rhetorical tone that signals uncertainty on the part of the speaker or the other party; the tone weakens the meaning and the weight value is less than 1.
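The punctuation rule above maps directly to a lookup table. In this minimal sketch the values 1.25 and 0.75 are assumptions chosen only to satisfy "greater than 1" and "less than 1"; the text gives no concrete numbers. The table `TONE_WEIGHTS` and the function `tone_weight` are hypothetical names.

```python
# Assumed tone weights keyed by sentence-final punctuation, covering
# both ASCII and full-width Chinese marks.
TONE_WEIGHTS = {
    ".": 1.0, "。": 1.0,    # declarative: weight 1
    "!": 1.25, "！": 1.25,  # exclamatory/imperative: weight > 1
    "?": 0.75, "？": 0.75,  # questioning/rhetorical: weight < 1
}


def tone_weight(sentence, default=1.0):
    """Return the tone weight implied by the last character of a sentence.

    Sentences that are empty or end in an unlisted character fall back
    to `default`, which is an assumption of this sketch.
    """
    stripped = sentence.rstrip()
    if not stripped:
        return default
    return TONE_WEIGHTS.get(stripped[-1], default)


print(tone_weight("The soup is very good!"))  # 1.25
print(tone_weight("Is the soup good?"))       # 0.75
```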
Embodiment 1
As shown in Fig. 2, one implementation process of the present invention is:
S01: load the preset semantic-enhancing lexicon S(pos) and semantic-weakening lexicon S(neg), together with their weights;
S02: read any one document from the document library;
S03: segment the document to obtain its word sequence L(org); filter the stop words out of L(org) to obtain the keyword sequence L(key);
S04: for each key in L(key), obtain its tone weight value Wj(key) from the sentence containing it;
S05: traverse L(key), recording each keyword key and the adjacent keyword key-1 to its left;
S06: traverse L(org) and, for each word between key-1 and key, check whether it exists in S(pos) or S(neg);
S07: when a semantic-enhancing or semantic-weakening word is found, multiply its weight value into the weighted term frequency of keyword key; find the punctuation mark that ends the sentence containing keyword key and multiply the corresponding tone weight into the weighted term frequency of keyword key;
S08: during the traversal, add the weighted term frequency of key computed in the current context to the weighted term frequency of keyword key accumulated over earlier contexts;
S09: if L(key) still contains unprocessed keywords, return to step S05 and continue;
S10: if the document collection still contains unprocessed documents, return to step S02 and continue;
S11: index a list of documents under each keyword key; the position of a document in the list is determined by the weighted term frequency of keyword key in that document, in descending order.
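Steps S01 to S11 can be sketched end to end in a few lines. This is a simplified illustration and not the patented implementation: whitespace splitting replaces real word segmentation, sentences are split on ASCII punctuation with a regular expression, and every lexicon entry and weight is an assumed value.

```python
import re
from collections import defaultdict

# S01: assumed lexicons and weights.
S_POS = {"very": 1.5, "really": 2.0}
S_NEG = {"maybe": 0.5, "somewhat": 0.75}
S_STOP = {"the", "is", "a", "it"} | set(S_POS) | set(S_NEG)
TONE = {".": 1.0, "!": 1.25, "?": 0.75}


def weighted_tf(text):
    """S03-S09: return {keyword: weighted term frequency} for one document."""
    scores = defaultdict(float)
    semantic = 1.0  # product of modifier weights since the last keyword
    # Each match is (sentence body, closing punctuation mark).
    for body, mark in re.findall(r"([^.!?]+)([.!?])", text):
        tone = TONE[mark]
        for word in body.lower().split():
            if word in S_POS:
                semantic *= S_POS[word]
            elif word in S_NEG:
                semantic *= S_NEG[word]
            elif word not in S_STOP:
                scores[word] += semantic * tone  # S07 and S08
                semantic = 1.0  # the window restarts after each keyword
    return dict(scores)


def build_index(docs):
    """S02, S10, S11: postings per keyword, sorted by descending weight."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for key, score in weighted_tf(text).items():
            index[key].append((doc_id, score))
    for postings in index.values():
        postings.sort(key=lambda posting: posting[1], reverse=True)
    return dict(index)


docs = {
    "d1": "The soup is very good!",
    "d2": "Is the soup good?",
    "d3": "The soup is good. The soup is maybe good.",
}
index = build_index(docs)
print(index["good"])  # [('d1', 1.875), ('d3', 1.5), ('d2', 0.75)]
```

With a plain term frequency, d3 would rank first for "good" because the word occurs twice there. Under the assumed weights, the emphatic single occurrence in d1 outranks it, and the questioning occurrence in d2 ranks last. Text that does not end in one of the listed punctuation marks is ignored by the regular expression, which is a limitation of the sketch.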
The foregoing is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.
Claims (8)
1. An inverted index optimization algorithm based on semantic and tone weighting, characterized by comprising the following steps:
Step 1: preset a semantic stop-word set S(stop), then define a semantic-enhancing word set S(pos) and a semantic-weakening word set S(neg) with different semantic weight values, both as subsets of the semantic stop-word set S(stop);
Step 2: perform word segmentation on each input document to obtain an ordered word sequence L(org);
Step 3: match the ordered word sequence L(org) against the semantic stop-word set S(stop) and filter out the words that appear in S(stop), obtaining the keyword sequence L(key) of the input document;
Step 4: traverse the keyword sequence L(key); after obtaining the tone weight value of the current keyword, search the document's words, in the range between the current keyword's position and the position of its previous occurrence, for words matching the semantic-enhancing word set S(pos) or the semantic-weakening word set S(neg); compute the weighted term frequency of the current keyword from the semantic weight values of the matched words combined with the tone weight value; once the traversal is complete, the weighted term frequency of the document is obtained;
Step 5: sort by document weighted term frequency to obtain the document sequence of the optimized index.
2. The inverted index optimization algorithm based on semantic and tone weighting according to claim 1, characterized in that, in step 1, the semantic-enhancing word set S(pos) and the semantic-weakening word set S(neg) are defined by means of degree adverbs.
3. The inverted index optimization algorithm based on semantic and tone weighting according to claim 1, characterized in that, in step 4, the tone weight value of the current keyword is determined by the sentence-final feature of the original sentence containing it.
4. The inverted index optimization algorithm based on semantic and tone weighting according to claim 3, characterized in that, in step 4, obtaining the tone weight value comprises:
Step (1): define a preset tone weight value associated with each sentence-final punctuation mark of an original sentence;
Step (2): obtain the tone weight value of the current keyword from the sentence-final punctuation mark of the original sentence containing it.
5. The inverted index optimization algorithm based on semantic and tone weighting according to claim 1, characterized in that, in step 4, the weighted term frequency f_key of the current keyword Key_index in original sentence j is defined as:
$$f_{key} = \sum_{index=1}^{n} \prod_{i=1}^{m} W_i \, Key_{index} \, W_j$$
where W_i is the semantic weight value of keyword key, n is the number of occurrences of keyword key in the document, m is the number of words between keyword key and the preceding keyword that match the semantic-enhancing word set S(pos) or the semantic-weakening word set S(neg), and W_j is the tone weight value.
6. A method for determining the weighted term frequency of a document, characterized by comprising the following steps:
Step 1: define a lexicon whose entries carry different semantic weight values;
Step 2: match the words of the document against the lexicon, and take all words that are not matched as the keyword sequence L(key);
Step 3: quantify the sentence-final features of original sentences by category, determine the tone weight value corresponding to each kind of sentence-final feature, and then determine the tone weight value of each keyword in the keyword sequence L(key) from the sentence-final feature of the original sentence containing it;
Step 4: search the document's words, in the range between the current keyword's position and the position of its previous occurrence, for words matching the lexicon; obtain the semantic weight value of the current keyword from the matched words; combine it with the tone weight value by weight product to compute the weighted term frequency of the current keyword; then traverse the keyword sequence L(key) and obtain the weighted term frequency of the document by summation.
7. The method for determining the weighted term frequency of a document according to claim 6, characterized in that, in step 1, a semantic stop-word set S(stop) is preset, and the semantic-enhancing word set S(pos) and the semantic-weakening word set S(neg) are then defined as subsets of the semantic stop-word set S(stop).
8. A method for determining the semantic weight value of a keyword, characterized by comprising the following steps:
Step 1: define a semantic-enhancing word set S(pos) and a semantic-weakening word set S(neg) whose intersection with the set of keywords is empty, the two sets having different semantic weight values;
Step 2: search the document's words, in the range between the keyword's position and the position of its previous occurrence, for words matching the semantic-enhancing word set S(pos) or the semantic-weakening word set S(neg);
Step 3: compute the semantic weight value of the keyword as the product of the semantic weight values of the matched words.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710453251.9A CN107402960B (en) | 2017-06-15 | 2017-06-15 | Reverse index optimization algorithm based on semantic mood weighting |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710453251.9A CN107402960B (en) | 2017-06-15 | 2017-06-15 | Reverse index optimization algorithm based on semantic mood weighting |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107402960A (en) | 2017-11-28 |
CN107402960B (en) | 2020-11-10 |
Family
ID=60404390
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710453251.9A Active CN107402960B (en) | 2017-06-15 | 2017-06-15 | Reverse index optimization algorithm based on semantic mood weighting |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107402960B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109033478A (en) * | 2018-09-12 | 2018-12-18 | 重庆工业职业技术学院 | A kind of text information law analytical method and system for search engine |
CN109710796A (en) * | 2019-01-14 | 2019-05-03 | Oppo广东移动通信有限公司 | Voice-based image searching method, device, storage medium and terminal |
CN109977292A (en) * | 2019-03-21 | 2019-07-05 | 腾讯科技(深圳)有限公司 | Searching method, calculates equipment and computer readable storage medium at device |
CN111506726A (en) * | 2020-03-18 | 2020-08-07 | 大箴(杭州)科技有限公司 | Short text clustering method and device based on part-of-speech coding and computer equipment |
CN112559474A (en) * | 2019-09-26 | 2021-03-26 | 中国电信股份有限公司 | Log processing method and device |
CN117009384A (en) * | 2023-09-27 | 2023-11-07 | 湖南立人科技有限公司 | List query method based on quick search algorithm |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060106767A1 (en) * | 2004-11-12 | 2006-05-18 | Fuji Xerox Co., Ltd. | System and method for identifying query-relevant keywords in documents with latent semantic analysis |
CN101499091A (en) * | 2009-03-17 | 2009-08-05 | 辽宁般若网络科技有限公司 | Web page representative words recommending method |
CN101826102A (en) * | 2010-03-26 | 2010-09-08 | 浙江大学 | Automatic book keyword generation method |
CN103064969A (en) * | 2012-12-31 | 2013-04-24 | 武汉传神信息技术有限公司 | Method for automatically creating keyword index table |
CN103377232A (en) * | 2012-04-25 | 2013-10-30 | 阿里巴巴集团控股有限公司 | Headline keyword recommendation method and system |
CN103530789A (en) * | 2012-07-03 | 2014-01-22 | 百度在线网络技术(北京)有限公司 | Method, device and apparatus for determining key index terms |
CN103699567A (en) * | 2013-11-04 | 2014-04-02 | 北京中搜网络技术股份有限公司 | Method for realizing same news clustering based on title fingerprint and text fingerprint |
CN105095223A (en) * | 2014-04-25 | 2015-11-25 | 阿里巴巴集团控股有限公司 | Method for classifying texts and server |
CN106126561A (en) * | 2016-06-16 | 2016-11-16 | 北京百度网讯科技有限公司 | The generation method and device of Search Results summary |
CN106557508A (en) * | 2015-09-28 | 2017-04-05 | 北京神州泰岳软件股份有限公司 | A kind of text key word extracting method and device |
-
2017
- 2017-06-15 CN CN201710453251.9A patent/CN107402960B/en active Active
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060106767A1 (en) * | 2004-11-12 | 2006-05-18 | Fuji Xerox Co., Ltd. | System and method for identifying query-relevant keywords in documents with latent semantic analysis |
CN101499091A (en) * | 2009-03-17 | 2009-08-05 | 辽宁般若网络科技有限公司 | Web page representative words recommending method |
CN101826102A (en) * | 2010-03-26 | 2010-09-08 | 浙江大学 | Automatic book keyword generation method |
CN103377232A (en) * | 2012-04-25 | 2013-10-30 | 阿里巴巴集团控股有限公司 | Headline keyword recommendation method and system |
CN103530789A (en) * | 2012-07-03 | 2014-01-22 | 百度在线网络技术(北京)有限公司 | Method, device and apparatus for determining key index terms |
CN103064969A (en) * | 2012-12-31 | 2013-04-24 | 武汉传神信息技术有限公司 | Method for automatically creating keyword index table |
CN103699567A (en) * | 2013-11-04 | 2014-04-02 | 北京中搜网络技术股份有限公司 | Method for realizing same news clustering based on title fingerprint and text fingerprint |
CN105095223A (en) * | 2014-04-25 | 2015-11-25 | 阿里巴巴集团控股有限公司 | Method for classifying texts and server |
CN106557508A (en) * | 2015-09-28 | 2017-04-05 | 北京神州泰岳软件股份有限公司 | A kind of text key word extracting method and device |
CN106126561A (en) * | 2016-06-16 | 2016-11-16 | 北京百度网讯科技有限公司 | The generation method and device of Search Results summary |
Non-Patent Citations (2)
Title |
---|
TIAN XIA,YANMEI CHAI: ""An Improvement to TF-IDF: Term Distribution"", 《JOURNAL OF SOFTWARE》 * |
张建娥: ""基于TFIDF和词语关联度的中文关键词提取方法"", 《情报科学》 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109033478A (en) * | 2018-09-12 | 2018-12-18 | 重庆工业职业技术学院 | A kind of text information law analytical method and system for search engine |
CN109710796A (en) * | 2019-01-14 | 2019-05-03 | Oppo广东移动通信有限公司 | Voice-based image searching method, device, storage medium and terminal |
CN109977292A (en) * | 2019-03-21 | 2019-07-05 | 腾讯科技(深圳)有限公司 | Searching method, calculates equipment and computer readable storage medium at device |
CN109977292B (en) * | 2019-03-21 | 2022-12-27 | 腾讯科技(深圳)有限公司 | Search method, search device, computing equipment and computer-readable storage medium |
CN112559474A (en) * | 2019-09-26 | 2021-03-26 | 中国电信股份有限公司 | Log processing method and device |
CN111506726A (en) * | 2020-03-18 | 2020-08-07 | 大箴(杭州)科技有限公司 | Short text clustering method and device based on part-of-speech coding and computer equipment |
CN111506726B (en) * | 2020-03-18 | 2023-09-22 | 大箴(杭州)科技有限公司 | Short text clustering method and device based on part-of-speech coding and computer equipment |
CN117009384A (en) * | 2023-09-27 | 2023-11-07 | 湖南立人科技有限公司 | List query method based on quick search algorithm |
CN117009384B (en) * | 2023-09-27 | 2023-12-19 | 湖南立人科技有限公司 | List query method based on quick search algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN107402960B (en) | 2020-11-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107402960A (en) | A kind of inverted index optimized algorithm based on the weighting of the semantic tone | |
CN106445998B (en) | Text content auditing method and system based on sensitive words | |
EP1622052B1 (en) | Phrase-based generation of document description | |
AU2005203239A1 (en) | Phrase-based indexing in an information retrieval system | |
Sun et al. | The keyword extraction of Chinese medical web page based on WF-TF-IDF algorithm | |
CN105095204A (en) | Method and device for obtaining synonym | |
CN108920599B (en) | Question-answering system answer accurate positioning and extraction method based on knowledge ontology base | |
CN111680509A (en) | Method and device for automatically extracting text keywords based on co-occurrence language network | |
CN110134777B (en) | Question duplication eliminating method and device, electronic equipment and computer readable storage medium | |
Sangodiah et al. | Question Classification Using Statistical Approach: A Complete Review. | |
CN107092675B (en) | Uyghur semantic string extraction method based on statistics and shallow language analysis | |
CN114780691B (en) | Model pre-training and natural language processing method, device, equipment and storage medium | |
CN114706972A (en) | Unsupervised scientific and technical information abstract automatic generation method based on multi-sentence compression | |
CN107239554B (en) | Method for retrieving English text based on matching degree | |
CN115238040A (en) | Steel material science knowledge graph construction method and system | |
CN102722526B (en) | Part-of-speech classification statistics-based duplicate webpage and approximate webpage identification method | |
Tomar et al. | Web page classification using modified naïve bayesian approach | |
JPH10254883A (en) | Automatic document sorting method | |
JP6340351B2 (en) | Information search device, dictionary creation device, method, and program | |
JPH06282587A (en) | Automatic classifying method and device for document and dictionary preparing method and device for classification | |
CN114997161A (en) | Keyword extraction method and device, electronic equipment and storage medium | |
Souza et al. | Extraction of keywords from texts: an exploratory study using Noun Phrases | |
Jia et al. | University of Otago at INEX 2010 | |
Premakumara et al. | Optimized Text Summarization method based on fuzzy logic | |
CN113609247A (en) | Big data text duplicate removal technology based on improved Simhash algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||