CN107402960A - Inverted index optimization algorithm based on semantic and tone weighting - Google Patents

Inverted index optimization algorithm based on semantic and tone weighting

Publication number
CN107402960A
Authority
CN
China
Prior art keywords
semantic
phrase
keyword
tone
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710453251.9A
Other languages
Chinese (zh)
Other versions
CN107402960B (en)
Inventor
夏珺峥
傅玉生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Gifted Data Co Ltd
Original Assignee
Chengdu Gifted Data Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Gifted Data Co Ltd
Priority to CN201710453251.9A
Publication of CN107402960A
Application granted
Publication of CN107402960B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/328 Management therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/319 Inverted lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an inverted index optimization algorithm based on semantic and tone weighting, in the technical field of document information processing. It addresses two problems in the prior art: indexing by the term frequency of specific words alone makes the inverted index inaccurate and difficult to build, and overlap between the keyword sequence and the semantic weighting lexicon makes semantic weighting ineffective or unable to change the inverted index materially. The invention combines the semantic and tone features of a document to define a new weighted term frequency, and sorts inverted index entries by it. The weighted term frequency reflects not only how densely a keyword occurs in a document but also how strongly the keyword is expressed, so that a searching user is more likely to find the desired document first.

Description

An inverted index optimization algorithm based on semantic and tone weighting
Technical field
The present invention relates to the technical field of document information processing, and in particular to an inverted index optimization algorithm based on semantic and tone weighting.
Background
Search engines have become the most commonly used internet tool, and data organization and indexing are a research focus in information science. The inverted index model looks up associated information in reverse according to term frequency, which suits the working scenario of a search engine well. However, term frequency alone, and weight-based ranking strategies computed from term frequency, cannot fully reflect how strongly a keyword is expressed in a document.
The present invention applies semantic and tone weighting to quantify more completely how important a keyword is to what a document expresses. An inverted index method based on this weighted term frequency can better help users find the corresponding documents and information.
Summary of the invention
In view of the above prior art, the object of the present invention is to provide an inverted index optimization algorithm based on semantic and tone weighting. It solves the technical problem that indexing by the term frequency of specific words alone makes the inverted index inaccurate and difficult to build, and the technical problem that overlap between the keyword sequence and the semantic weighting lexicon makes semantic weighting ineffective or unable to change the inverted index materially.
To achieve the above object, the present invention adopts the following technical solution:
An inverted index optimization algorithm based on semantic and tone weighting, comprising the following steps:
Step 1: preset a semantic stop-word set, then define a semantic-enhancing word set and a semantic-weakening word set whose words carry different semantic weights, both being subsets of the semantic stop-word set;
Step 2: segment each input document into words to obtain an ordered word sequence;
Step 3: match the ordered word sequence against the semantic stop-word set and filter out every word that appears in the stop-word set, obtaining the keyword sequence of the input document;
Step 4: traverse the keyword sequence; for the current keyword, obtain its tone weight, then search the document's words between the previous keyword position and the current keyword position for words that match the semantic-enhancing or semantic-weakening word set; compute the weighted term frequency of the current keyword from the semantic weights of the matched words combined with the tone weight; when the traversal completes, the weighted term frequency of the document is obtained;
Step 5: sort by document weighted term frequency to obtain the document ordering of the optimized index.
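Steps 1 to 3 can be illustrated with a minimal Python sketch. This is not the patented implementation: the word lists, the numeric weights, and the whitespace-based `segment` function are all hypothetical stand-ins (a real system would use a proper word segmenter, especially for Chinese text).

```python
# Hypothetical word sets and weights; the patent does not give numeric values.
S_POS = {"very": 1.5, "especially": 1.8}     # semantic-enhancing words
S_NEG = {"possibly": 0.6, "seemingly": 0.7}  # semantic-weakening words
# S(pos) and S(neg) are subsets of the stop-word set S(stop).
S_STOP = {"the", "is", "a"} | set(S_POS) | set(S_NEG)


def segment(text: str) -> list[str]:
    """Stand-in for word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def extract_keywords(words: list[str]) -> list[tuple[int, str]]:
    """Return (position, word) pairs for the words that are not in S(stop)."""
    return [(pos, w) for pos, w in enumerate(words) if w not in S_STOP]


words = segment("The service is very fast")
keywords = extract_keywords(words)
```

Keeping the original positions alongside each keyword is a design choice of this sketch: the later weighting step needs to look at the words that lie between two consecutive keywords in the unfiltered sequence.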
In the above scheme, in step 1, the semantic-enhancing and semantic-weakening word sets are built from degree adverbs.
In the above scheme, in step 4, the tone weight of the current keyword is determined by the sentence-final feature of the original sentence containing it.
In the above scheme, in step 4, obtaining the tone weight comprises:
Step (1): define a preset tone weight associated with each sentence-final punctuation mark;
Step (2): obtain the tone weight of the current keyword from the final punctuation mark of the original sentence containing it.
In the above scheme, in step 4, the weighted term frequency f_key of the current keyword Key_index in original sentence j is defined as:
$$f_{key} = \sum_{index=1}^{n} \prod_{i=1}^{m} W_i \, Key_{index} \, W_j$$
where W_i is the semantic weight applied to keyword key by the i-th matched semantic word, n is the number of occurrences of keyword key in the document, m is the number of words between keyword key and the previous keyword that match the semantic-enhancing or semantic-weakening word set, and W_j is the tone weight.
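The weighted term frequency can be evaluated as in the following sketch, which assumes that each occurrence Key_index contributes a count of 1 and that the semantic weights and tone weight of every occurrence have already been collected. The function name and the numeric weights are hypothetical.

```python
from math import prod


def weighted_term_frequency(occurrences):
    """Sum, over all occurrences of one keyword, of
    (product of semantic weights) * (tone weight).

    occurrences: list of (semantic_weights, tone_weight) pairs,
    one pair per occurrence of the keyword in the document.
    """
    total = 0.0
    for semantic_weights, tone_weight in occurrences:
        # prod([]) is 1, so an occurrence with no matched
        # semantic word keeps a semantic weight of 1.
        total += prod(semantic_weights) * tone_weight
    return total


# Three occurrences of the same keyword (hypothetical weights):
occurrences = [
    ([1.5], 1.0),       # one enhancing word, declarative sentence
    ([], 1.25),         # no semantic word, exclamatory sentence
    ([0.5, 2.0], 0.5),  # one weakening and one enhancing word, question
]
f_key = weighted_term_frequency(occurrences)
```

With all semantic and tone weights equal to 1 the result reduces to the plain occurrence count n, which is the unweighted term frequency.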
A method for determining the weighted term frequency of a document, comprising the following steps:
Step 1: provide a lexicon whose entries carry different semantic weights;
Step 2: match the words of the document against the lexicon, and take all words that are not matched as the keyword sequence;
Step 3: classify and quantify the sentence-final features of original sentences, determining the tone weight corresponding to each kind of sentence-final feature; then determine the tone weight of each keyword in the keyword sequence from the sentence-final feature of the original sentence containing it;
Step 4: search the document's words between the previous keyword position and the current keyword position for words that match the lexicon; obtain the semantic weight of the current keyword from the matched words; combine it with the tone weight by multiplying the weights to compute the weighted term frequency of the current keyword; then traverse the keyword sequence and obtain the weighted term frequency of the document by summation.
In the above scheme, in step 1, a semantic stop-word set is preset, and the semantic-enhancing and semantic-weakening word sets are then defined as subsets of the semantic stop-word set.
A method for determining the semantic weight of a keyword, comprising the following steps:
Step 1: provide a semantic-enhancing word set and a semantic-weakening word set whose intersection with the keyword set is empty, the two sets carrying different semantic weights;
Step 2: search the document's words between the previous keyword position and the keyword's position for words that match the semantic-enhancing or semantic-weakening word set;
Step 3: compute the semantic weight of the keyword as the product of the semantic weights of the matched words.
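The three steps for the semantic weight of a keyword can be sketched as follows. The modifier table and its values are hypothetical, and the convention of passing -1 as the previous keyword position for the first keyword is an assumption of this sketch rather than something the text specifies.

```python
from math import prod

# Hypothetical semantic-enhancing (> 1) and semantic-weakening (< 1) weights.
MODIFIER_WEIGHTS = {"very": 1.5, "really": 2.0, "possibly": 0.5}


def semantic_weight(words, prev_kw_pos, kw_pos):
    """Product of the weights of all modifiers that lie strictly between
    the previous keyword position and the current keyword position."""
    between = words[prev_kw_pos + 1 : kw_pos]
    # prod() of an empty sequence is 1: no modifier means weight 1.
    return prod(MODIFIER_WEIGHTS[w] for w in between if w in MODIFIER_WEIGHTS)


words = ["service", "is", "really", "very", "fast"]
weight_fast = semantic_weight(words, prev_kw_pos=0, kw_pos=4)
weight_service = semantic_weight(words, prev_kw_pos=-1, kw_pos=0)
```

Because the modifier sets do not intersect the keyword set, a word is either scanned as a modifier or scored as a keyword, never both.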
Compared with the prior art, the beneficial effects of the present invention are as follows:
In the present invention, inverted index entries are sorted by weighted term frequency, which reflects not only how densely a keyword occurs in a document but also how strongly it is expressed, so that a searching user is more likely to find the desired document first. The invention also overcomes technical obstacles in the prior art: how to construct the stop-word set and define the enhancing and weakening semantic word sets as its subsets (so that they can be reused after being filtered out), how to quantify tone and semantics accurately for weighting, and how to prevent keywords that duplicate semantic words from making semantic weighting ineffective.
Brief description of the drawings
Fig. 1 is a schematic diagram of the main processing flow of the present invention;
Fig. 2 is a schematic diagram of the processing flow of an embodiment of the present invention.
Detailed description of the embodiments
All features disclosed in this specification, and all steps of any method or process disclosed, may be combined in any way, except for mutually exclusive features and/or steps.
The present invention is further described below with reference to the accompanying drawings:
An inverted index optimization algorithm based on semantic and tone weighting comprises the following steps:
S0: preset the semantic-enhancing word set S(pos), the semantic-weakening word set S(neg), and the semantic stop-word set S(stop), where S(pos) and S(neg) are subsets of S(stop);
S1: segment each input document into words, representing the document as an ordered word sequence L(org);
S2: process the stop words in L(org): scan the words of L(org) one by one and filter out those that appear in S(stop), obtaining the document keyword sequence L(key);
S3: compute the weighted term frequency of each keyword, checking the tone of the original sentence containing the keyword, and sum:
$$f_{key} = \sum_{index=1}^{n} \prod_{i=1}^{m} W_i \, Key_{index} \, W_j$$
where W_i is the semantic weight of keyword key, n is the number of occurrences of keyword key in the document, m is the number of enhancing or weakening semantic words between keyword key and the previous keyword, and W_j is the tone weight;
S4: sort by document weighted term frequency: within the document collection, documents are indexed and ordered by weighted term frequency.
Further, the preset semantic-enhancing word set consists of adverbs, particles and the like that strengthen the meaning of a word; it includes, but is not limited to, words such as "very", "really", "especially", "extremely", "exceedingly" and "quite".
Further, the preset semantic-weakening word set consists of adverbs, particles and the like that weaken the meaning expressed and reduce the certainty of a statement; it includes, but is not limited to, words such as "possibly", "probably", "a little", "vaguely", "seemingly" and "whether".
Further, both word sets are preset resources.
Further, as the weighted term frequency formula shows, the semantic weight of each keyword is positively correlated with the product of the weights of the semantic words preceding it. If no semantic word precedes the keyword, the weight is 1.
Further, W_j is the tone weight. The tone weight comes from the tone of the sentence and is applied to every keyword in that sentence. A sentence ending with a full stop has a declarative tone, with weight 1. A sentence ending with an exclamation mark has an exclamatory or imperative tone; the tone is strong and the weight is greater than 1. A sentence ending with a question mark has an interrogative or rhetorical tone that shows the uncertainty of the speaker or of the other party; the tone weakens the meaning and the weight is less than 1.
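The punctuation rule can be sketched in Python as below. Only the ordering (full stop equals 1, exclamation mark above 1, question mark below 1) comes from the text; the concrete values 1.25 and 0.5, the function name, and the choice to treat an unterminated sentence as declarative are assumptions of this sketch.

```python
import re

# Hypothetical tone weights for ASCII and full-width punctuation marks.
TONE_WEIGHTS = {
    ".": 1.0, "。": 1.0,    # declarative
    "!": 1.25, "！": 1.25,  # exclamatory or imperative: weight > 1
    "?": 0.5, "？": 0.5,    # interrogative or rhetorical: weight < 1
}


def split_sentences(text):
    """Return (sentence, tone_weight) pairs, one per sentence."""
    pairs = []
    # Each match is a run of non-terminal characters plus an optional terminator.
    for m in re.finditer(r"[^.!?。！？]+([.!?。！？])?", text):
        sentence = m.group(0).strip()
        if sentence:
            # A sentence with no final mark falls back to the declarative weight.
            pairs.append((sentence, TONE_WEIGHTS.get(m.group(1), 1.0)))
    return pairs


sentences = split_sentences("It works. Is it fast? It is fast!")
```

Every keyword found inside a sentence would then inherit that sentence's tone weight.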
Embodiment 1
As shown in Fig. 2, one implementation of the present invention proceeds as follows:
S01: load the preset semantic-enhancing lexicon S(pos) and semantic-weakening lexicon S(neg), together with their weights;
S02: read any document from the document library;
S03: segment the document into words to obtain its word sequence L(org); filter stop words out of L(org) to obtain the keyword sequence L(key);
S04: for each key in L(key), obtain its tone weight Wj(key) from the sentence containing it;
S05: traverse L(key), recording each keyword key and the adjacent keyword key-1 to its left;
S06: traverse L(org) and check whether each word between key-1 and key appears in S(pos) or S(neg);
S07: when an enhancing or weakening semantic word is found, multiply its weight into the weighted term frequency of keyword key; find the punctuation mark ending the sentence that contains keyword key, and multiply the tone weight given by that punctuation mark into the weighted term frequency of keyword key;
S08: during the traversal, add the weighted term frequency computed for key in the current context to the weighted term frequency accumulated for keyword key in earlier contexts;
S09: if L(key) still contains unprocessed keywords, jump to step S05 and continue;
S10: if the document collection still contains unprocessed documents, jump to step S02 and continue;
S11: for each keyword key, index a list of documents; the position of a document in the list is determined by sorting in descending order of the weighted term frequency of keyword key in that document.
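Steps S01 to S11 can be put together in one short Python sketch. It is an illustration under stated assumptions, not the patented implementation: the word sets, the weights, the whitespace tokenizer, the ASCII-only sentence splitting, and the function names are all hypothetical.

```python
import re
from collections import defaultdict

# S01: hypothetical lexicons and weights.
S_POS = {"very": 1.5, "really": 2.0}
S_NEG = {"possibly": 0.5}
S_STOP = {"the", "is", "it", "a"} | set(S_POS) | set(S_NEG)
TONE = {".": 1.0, "!": 1.25, "?": 0.5}


def document_weighted_tf(text):
    """S03-S08: return {keyword: weighted term frequency} for one document."""
    scores = defaultdict(float)
    modifier_product = 1.0  # product of modifier weights since the previous keyword
    # Each match is one sentence body plus its optional final punctuation mark.
    for m in re.finditer(r"([^.!?]+)([.!?])?", text):
        tone = TONE.get(m.group(2), 1.0)
        for word in m.group(1).lower().split():
            if word in S_POS:
                modifier_product *= S_POS[word]
            elif word in S_NEG:
                modifier_product *= S_NEG[word]
            elif word not in S_STOP:
                # Keyword: add this occurrence's weighted count, then reset.
                scores[word] += modifier_product * tone
                modifier_product = 1.0
    return dict(scores)


def build_inverted_index(docs):
    """S11: map each keyword to [(doc_id, weighted_tf), ...],
    sorted in descending order of weighted term frequency."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for key, score in document_weighted_tf(text).items():
            index[key].append((doc_id, score))
    for postings in index.values():
        postings.sort(key=lambda p: p[1], reverse=True)
    return dict(index)


docs = {
    "d1": "The service is fast.",
    "d2": "The service is really fast!",
    "d3": "Is the service possibly fast?",
}
index = build_inverted_index(docs)
```

All three documents contain "fast" exactly once, so a plain term-frequency index could not rank them; with the weights assumed here the emphatic document d2 comes first and the hedged question d3 comes last.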
The foregoing is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.

Claims (8)

1. An inverted index optimization algorithm based on semantic and tone weighting, characterized by comprising the following steps:
Step 1: preset a semantic stop-word set S(stop), then define a semantic-enhancing word set S(pos) and a semantic-weakening word set S(neg) whose words carry different semantic weights, both being subsets of the semantic stop-word set S(stop);
Step 2: segment each input document into words to obtain an ordered word sequence L(org);
Step 3: match the ordered word sequence L(org) against the semantic stop-word set S(stop) and filter out every word that appears in S(stop), obtaining the keyword sequence L(key) of the input document;
Step 4: traverse the keyword sequence L(key); for the current keyword, obtain its tone weight, then search the document's words between the previous keyword position and the current keyword position for words that match the semantic-enhancing word set S(pos) or the semantic-weakening word set S(neg); compute the weighted term frequency of the current keyword from the semantic weights of the matched words combined with the tone weight; when the traversal completes, the weighted term frequency of the document is obtained;
Step 5: sort by document weighted term frequency to obtain the document ordering of the optimized index.
2. The inverted index optimization algorithm based on semantic and tone weighting according to claim 1, characterized in that, in step 1, the semantic-enhancing word set S(pos) and the semantic-weakening word set S(neg) are built from degree adverbs.
3. The inverted index optimization algorithm based on semantic and tone weighting according to claim 1, characterized in that, in step 4, the tone weight of the current keyword is determined by the sentence-final feature of the original sentence containing it.
4. The inverted index optimization algorithm based on semantic and tone weighting according to claim 3, characterized in that, in step 4, obtaining the tone weight comprises:
Step (1): define a preset tone weight associated with each sentence-final punctuation mark;
Step (2): obtain the tone weight of the current keyword from the final punctuation mark of the original sentence containing it.
5. The inverted index optimization algorithm based on semantic and tone weighting according to claim 1, characterized in that, in step 4, the weighted term frequency f_key of the current keyword Key_index in original sentence j is defined as:
$$f_{key} = \sum_{index=1}^{n} \prod_{i=1}^{m} W_i \, Key_{index} \, W_j$$
where W_i is the semantic weight of keyword key, n is the number of occurrences of keyword key in the document, m is the number of words between keyword key and the previous keyword that match the semantic-enhancing word set S(pos) or the semantic-weakening word set S(neg), and W_j is the tone weight.
6. A method for determining the weighted term frequency of a document, characterized by comprising the following steps:
Step 1: provide a lexicon whose entries carry different semantic weights;
Step 2: match the words of the document against the lexicon, and take all words that are not matched as the keyword sequence L(key);
Step 3: classify and quantify the sentence-final features of original sentences, determining the tone weight corresponding to each kind of sentence-final feature; then determine the tone weight of each keyword in the keyword sequence L(key) from the sentence-final feature of the original sentence containing it;
Step 4: search the document's words between the previous keyword position and the current keyword position for words that match the lexicon; obtain the semantic weight of the current keyword from the matched words; combine it with the tone weight by multiplying the weights to compute the weighted term frequency of the current keyword; then traverse the keyword sequence L(key) and obtain the weighted term frequency of the document by summation.
7. The method for determining the weighted term frequency of a document according to claim 6, characterized in that, in step 1, a semantic stop-word set S(stop) is preset, and the semantic-enhancing word set S(pos) and the semantic-weakening word set S(neg) are then defined as subsets of the semantic stop-word set S(stop).
8. A method for determining the semantic weight of a keyword, characterized by comprising the following steps:
Step 1: provide a semantic-enhancing word set S(pos) and a semantic-weakening word set S(neg) whose intersection with the keyword set is empty, the two sets carrying different semantic weights;
Step 2: search the document's words between the previous keyword position and the keyword's position for words that match the semantic-enhancing word set S(pos) or the semantic-weakening word set S(neg);
Step 3: compute the semantic weight of the keyword as the product of the semantic weights of the matched words.
CN201710453251.9A 2017-06-15 2017-06-15 Reverse index optimization algorithm based on semantic mood weighting Active CN107402960B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710453251.9A CN107402960B (en) 2017-06-15 2017-06-15 Reverse index optimization algorithm based on semantic mood weighting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710453251.9A CN107402960B (en) 2017-06-15 2017-06-15 Reverse index optimization algorithm based on semantic mood weighting

Publications (2)

Publication Number Publication Date
CN107402960A (en) 2017-11-28
CN107402960B (en) 2020-11-10

Family

ID=60404390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710453251.9A Active CN107402960B (en) 2017-06-15 2017-06-15 Reverse index optimization algorithm based on semantic mood weighting

Country Status (1)

Country Link
CN (1) CN107402960B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060106767A1 (en) * 2004-11-12 2006-05-18 Fuji Xerox Co., Ltd. System and method for identifying query-relevant keywords in documents with latent semantic analysis
CN101499091A (en) * 2009-03-17 2009-08-05 辽宁般若网络科技有限公司 Web page representative words recommending method
CN101826102A (en) * 2010-03-26 2010-09-08 浙江大学 Automatic book keyword generation method
CN103064969A (en) * 2012-12-31 2013-04-24 武汉传神信息技术有限公司 Method for automatically creating keyword index table
CN103377232A (en) * 2012-04-25 2013-10-30 阿里巴巴集团控股有限公司 Headline keyword recommendation method and system
CN103530789A (en) * 2012-07-03 2014-01-22 百度在线网络技术(北京)有限公司 Method, device and apparatus for determining key index terms
CN103699567A (en) * 2013-11-04 2014-04-02 北京中搜网络技术股份有限公司 Method for realizing same news clustering based on title fingerprint and text fingerprint
CN105095223A (en) * 2014-04-25 2015-11-25 阿里巴巴集团控股有限公司 Method for classifying texts and server
CN106126561A (en) * 2016-06-16 2016-11-16 北京百度网讯科技有限公司 The generation method and device of Search Results summary
CN106557508A (en) * 2015-09-28 2017-04-05 北京神州泰岳软件股份有限公司 A kind of text key word extracting method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TIAN XIA,YANMEI CHAI: ""An Improvement to TF-IDF: Term Distribution"", 《JOURNAL OF SOFTWARE》 *
张建娥: ""基于TFIDF和词语关联度的中文关键词提取方法"", 《情报科学》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033478A (en) * 2018-09-12 2018-12-18 重庆工业职业技术学院 A kind of text information law analytical method and system for search engine
CN109710796A (en) * 2019-01-14 2019-05-03 Oppo广东移动通信有限公司 Voice-based image searching method, device, storage medium and terminal
CN109977292A (en) * 2019-03-21 2019-07-05 腾讯科技(深圳)有限公司 Searching method, calculates equipment and computer readable storage medium at device
CN109977292B (en) * 2019-03-21 2022-12-27 腾讯科技(深圳)有限公司 Search method, search device, computing equipment and computer-readable storage medium
CN112559474A (en) * 2019-09-26 2021-03-26 中国电信股份有限公司 Log processing method and device
CN111506726A (en) * 2020-03-18 2020-08-07 大箴(杭州)科技有限公司 Short text clustering method and device based on part-of-speech coding and computer equipment
CN111506726B (en) * 2020-03-18 2023-09-22 大箴(杭州)科技有限公司 Short text clustering method and device based on part-of-speech coding and computer equipment
CN117009384A (en) * 2023-09-27 2023-11-07 湖南立人科技有限公司 List query method based on quick search algorithm
CN117009384B (en) * 2023-09-27 2023-12-19 湖南立人科技有限公司 List query method based on quick search algorithm

Also Published As

Publication number Publication date
CN107402960B (en) 2020-11-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant