CN106294662A - Method for building a query representation and hybrid retrieval model based on context-aware topics
- Publication number
- CN106294662A (application number CN201610634174.2A)
- Authority
- CN
- China
- Prior art keywords
- context
- query
- topic
- aware
- model
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for building a query representation and hybrid retrieval model based on context-aware topics, comprising the following steps. Step one: based on the keyword set of a query, obtain the pseudo-relevant feedback documents of the query and select from them the contexts relevant to the query. Step two: introduce a context-aware topic model, incorporate the contexts into it, mine the topic information implied by the context windows based on the topics of the corpus, and obtain the corresponding topic vector. Step three: represent the query jointly by the topic vector and the keyword set, and build a hybrid retrieval model based on the topic vector and the keyword set to obtain the final retrieval score.
Description
Technical Field
The invention relates to the technical field of Internet information retrieval, and in particular to a method for building a query representation and hybrid retrieval model based on a context-aware topic model.
Background
Query representation has long been at the core of information retrieval. The most common problem is that user queries are too short (containing only a few keywords), so relevant documents easily fail to match the query during retrieval. For example, for the user query "water shortage", a document containing query-related words such as "drought" is highly relevant, yet its final matching score will be low because it does not contain the original query keyword "water shortage", which in turn reduces retrieval accuracy.
A common solution is query expansion based on pseudo-relevance feedback. Starting from the preliminary retrieval result, this method assumes that the top K ranked documents (referred to as "pseudo-relevant feedback documents") are relevant to the original query, and extracts keywords from them with a suitable algorithm to expand the query representation. However, the method is unsupervised and tends to bring in terms that are not relevant to the query. In theory, a supervised classification method that considers a variety of features of the expansion words could pick out the words truly relevant to the query, but it depends on feature engineering and labeled training sets, so its cost in practical application is high.
Some recent studies have focused on using various kinds of context information to mitigate the introduction of irrelevant expansion words into query representations. The sources of context information are mainly high-quality external data sources (such as encyclopedias and domain ontologies) and pseudo-relevant feedback documents drawn from the data set itself. The former suit only some queries, and external data sources are mostly slow to update and difficult to acquire, so they are not widely used in practice. The latter also provide a contextual background description of the query and have greater research prospects. For example, for the query "water shortage", pseudo-relevant feedback document 1 states: "The UK will face the problem of water shortage in the coming years, so please save water and repair your faucet."; pseudo-relevant feedback document 2 states: "Dry farming: a method to alleviate the problems of drought and water deficit." Both describe countermeasures to water shortage, and such context information can assist query representation. However, existing methods select expansion words only by their degree of co-occurrence with the original query words within the context windows of the pseudo-relevant feedback documents, and the following problems remain: (1) the words used for the final query expansion must be chosen explicitly, and without supervision some irrelevant or even "harmful" words are still introduced. For example, the keyword "water shortage" appears frequently in articles on environmental resources, but terms such as "hydroelectric power generation" and "natural gas" can also appear in those articles, causing the original query to drift and lowering retrieval accuracy; (2) the final query representation is still based on a dictionary space and ignores the semantic information implied by the query, such as latent topics; (3) retrieval models based on such query representations mainly consider keyword matching and ignore matching between documents and queries at the semantic level.
Disclosure of Invention
The invention aims to address the shortcomings of the prior art by providing a method for designing a query representation and hybrid retrieval model based on a context-aware topic model. Context topic information derived from pseudo-relevance feedback is integrated into the query representation, so that topic matching is added to the original keyword-matching retrieval model and the accuracy of the retrieval results is improved.
The invention provides a method for building a query representation and hybrid retrieval model based on context-aware topics, which comprises the following steps:
step one: acquiring the pseudo-relevant feedback documents of a query based on the keyword set of the query, and selecting contexts relevant to the query from the pseudo-relevant feedback documents;
step two: introducing a context-aware topic model, merging the contexts into the context-aware topic model, mining the topic information implied by the context windows based on the topics of the corpus, and obtaining the corresponding topic vector;
step three: representing the query jointly by the topic vector and the keyword set, and building a hybrid retrieval model based on the topic vector and the keyword set to obtain a final retrieval score.
In the method for building the query representation and hybrid retrieval model based on context-aware topics, in step one, each pseudo-relevant feedback document is divided into a plurality of sliding windows, the relevance between each window and the query is calculated, and the windows whose relevance is higher than a threshold are taken as the context windows relevant to the query.
In the method for building the query representation and hybrid retrieval model based on context-aware topics, the threshold for selecting query-relevant contexts is the average relevance of all windows under the query.
In the method for building the query representation and hybrid retrieval model based on context-aware topics, the context-aware topic model is designed according to the query-relevant contexts and the whole corpus; during topic modeling, the model assumes that a context window and the pseudo-relevant feedback document containing it share the same topic distribution, from which the topic vector of the context is obtained.
In the method for building the query representation and hybrid retrieval model based on context-aware topics, the pseudo-relevant feedback documents are obtained by calculating the keyword matching scores of the retrieval model.
In the method for building the query representation and hybrid retrieval model based on context-aware topics, the retrieval score is expressed by the following formula:
where s represents the keyword-matching score of a traditional retrieval model, s' represents the topic-matching score based on the new query representation Q', and λ is the weighting parameter between the two scores, that is, the weighting coefficient of the two matching modes.
The beneficial effects of the invention are as follows. The method makes full use of the corpus context information obtained from pseudo-relevance feedback, which avoids the difficulty of acquiring high-quality external data sources. The pseudo-relevant feedback documents are divided into context windows, and the context segments relevant to the query are selected from them for query representation, which reduces the introduction of noise and query drift and serves as a novel means of controlling the quality of the query representation. The context-aware topic model provided by the invention fully mines the topic information of the query-related contexts, goes beyond the traditional keyword-level understanding, and supports a more comprehensive and deeper understanding of the user query. Traditional retrieval models are based mainly on keyword matching and ignore deeper semantic relevance, whereas the hybrid retrieval model designed by the invention considers both keyword matching and topic matching, and this diversified matching helps improve retrieval effectiveness. The query representation method and hybrid retrieval model provided by the invention have been shown to be effective on the Microblog Track 2011-2014 data sets: with context topic information incorporated into the query, the final retrieval MAP exceeds that of some recent query representation methods.
Drawings
FIG. 1 is a flowchart of a method for building a context-aware topic-based query representation and hybrid search model according to the present invention.
FIG. 2 is a flow diagram of context selection based on pseudo-relevance feedback.
FIG. 3 is a graphical model representation of a context-aware topic model.
Detailed Description
The present invention will be described in further detail with reference to the following specific examples and the accompanying drawings. The procedures, conditions, experimental methods and the like for carrying out the present invention are general knowledge and common general knowledge in the art except for the contents specifically mentioned below, and the present invention is not particularly limited.
As shown in FIG. 1, the method for building a query representation and hybrid retrieval model based on context-aware topics comprises the following steps:
step one: acquiring the pseudo-relevant feedback documents of a query based on the keyword set of the query, and selecting contexts relevant to the query from the pseudo-relevant feedback documents;
step two: introducing a context-aware topic model, merging the contexts into the context-aware topic model, mining the topic information implied by the context windows based on the topics of the corpus, and obtaining the corresponding topic vector;
step three: representing the query jointly by the topic vector and the keyword set, and building a hybrid retrieval model based on the topic vector and the keyword set to obtain a final retrieval score.
(I) Relevant context selection based on pseudo-relevant feedback
Since the pseudo-relevant feedback document is easy to obtain and contains much content relevant to the query, the context relevant to the query is selected from the pseudo-relevant feedback document and used for query representation, and the specific flow of the method is shown in fig. 2.
First, the pseudo-relevant feedback documents are segmented to obtain a number of context windows of size n. Define $Q = \{q_1, q_2, \ldots, q_{|Q|}\}$ as a query, where $q_i$ denotes a query keyword and $|Q|$ denotes the number of keywords in the query. The pseudo-relevant feedback document set corresponding to the query Q consists of the documents ranked in the top k of the first retrieval. Each pseudo-relevant feedback document in this set is divided, in sliding-window fashion as shown in FIG. 2, into several context windows of size n (each containing n words), namely $Q_{c_1}, Q_{c_2}, \ldots, Q_{c_l}$, where $l$ denotes the number of context windows.
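The sliding-window segmentation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `sliding_windows`, whitespace tokenization, and the `stride` parameter are assumptions, since the description fixes only the window size n.

```python
from typing import List


def sliding_windows(tokens: List[str], n: int, stride: int = 1) -> List[List[str]]:
    """Split a tokenized document into context windows of n words each.

    `stride` (how far the window moves each step) is an assumption; the
    description only states that windows of size n are taken by sliding.
    """
    if n <= 0 or stride <= 0:
        raise ValueError("n and stride must be positive")
    if not tokens:
        return []
    if len(tokens) <= n:
        # A document no longer than the window yields a single window.
        return [list(tokens)]
    return [tokens[i:i + n] for i in range(0, len(tokens) - n + 1, stride)]


# Toy feedback document, tokenized on whitespace (an assumption).
doc = "the uk will face water shortage so please save water".split()
windows = sliding_windows(doc, n=4, stride=2)
```

With a stride smaller than n, consecutive windows overlap, so a phrase that straddles a window boundary still appears intact in at least one window.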
Second, the relevance of each context window to the original query is computed. For a query and context window pair $(Q, Q_c)$, the invention combines multiple measures to compute their relevance $R(Q, Q_c)$, such as pointwise mutual information (PMI) based on word co-occurrence, Jaccard similarity based on word sets, and semantic similarity based on word2vec word vectors, and finally takes the mean of these measures.
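Averaging several relevance measures can be sketched as below. Only two of the named measures are shown: Jaccard similarity and a word-vector similarity computed as the cosine between centroid vectors. The centroid approach, the toy two-dimensional vectors, and all function names are assumptions for illustration; a PMI measure would need corpus co-occurrence counts and could be added to the same list of callables.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Measure = Callable[[Sequence[str], Sequence[str]], float]


def jaccard(query: Sequence[str], window: Sequence[str]) -> float:
    """Jaccard similarity between the word sets of query and window."""
    a, b = set(query), set(window)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def make_embedding_similarity(vectors: Dict[str, List[float]]) -> Measure:
    """Build a measure that compares centroid word vectors by cosine."""

    def centroid(words: Sequence[str]) -> Optional[List[float]]:
        known = [vectors[w] for w in words if w in vectors]
        if not known:
            return None
        dim = len(known[0])
        return [sum(v[i] for v in known) / len(known) for i in range(dim)]

    def similarity(query: Sequence[str], window: Sequence[str]) -> float:
        q, c = centroid(query), centroid(window)
        if q is None or c is None:
            return 0.0
        dot = sum(x * y for x, y in zip(q, c))
        norm = math.sqrt(sum(x * x for x in q)) * math.sqrt(sum(y * y for y in c))
        return dot / norm if norm else 0.0

    return similarity


def relevance(query: Sequence[str], window: Sequence[str],
              measures: Sequence[Measure]) -> float:
    """R(Q, Q_c): the mean of all supplied relevance measures."""
    return sum(m(query, window) for m in measures) / len(measures)


# Hypothetical toy vectors standing in for trained word2vec embeddings.
toy_vectors = {"water": [1.0, 0.0], "shortage": [0.0, 1.0], "drought": [0.6, 0.8]}
measures = [jaccard, make_embedding_similarity(toy_vectors)]
score = relevance(["water", "shortage"], ["drought", "water"], measures)
```

Passing the measures as a list keeps the averaging step independent of which measures are actually available.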
Contexts relevant to the query are then screened out. The relevance scores obtained above are first normalized. The threshold is then set to the average relevance of all windows under the query, context windows whose relevance is lower than the threshold are filtered out, and the remaining contexts, which are more relevant to the query, are used for context-aware topic modeling.
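The normalization and mean-threshold filtering can be sketched as follows. Min-max normalization is an assumption, since the description does not say which normalization is used; `select_windows` is a hypothetical name. Windows whose score equals the threshold are kept, matching the statement that only windows below the threshold are filtered out.

```python
from typing import List, Sequence, TypeVar

W = TypeVar("W")


def select_windows(windows: Sequence[W], scores: Sequence[float]) -> List[W]:
    """Keep the windows whose normalized relevance is not below the mean."""
    if len(windows) != len(scores):
        raise ValueError("windows and scores must have the same length")
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All windows are equally relevant; nothing falls below the mean.
        normalized = [1.0] * len(scores)
    else:
        # Min-max normalization to [0, 1] (an assumption).
        normalized = [(s - lo) / (hi - lo) for s in scores]
    threshold = sum(normalized) / len(normalized)
    return [w for w, s in zip(windows, normalized) if s >= threshold]


selected = select_windows(["w1", "w2", "w3", "w4"], [0.2, 0.8, 0.5, 0.1])
```

Because the threshold is the per-query mean, it adapts to each query and needs no manually tuned cutoff.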
(II) Context-aware topic modeling and query representation
Given the query-related contexts obtained in (I) and the entire corpus, the invention designs a context-aware topic model that incorporates the query-related context information into the topic model to generate a new query representation.
Inspired by related research, since each context window selected in (I) and the pseudo-relevant feedback document containing it are both closely related to the query, they are assumed to share the same topic distribution. Under this assumption, the traditional LDA topic model is extended to obtain the context-aware topic model CAT, which is shown in FIG. 3. The symbols used in the model are described in Table 1. The model is a generative model, and the specific modeling process is given in Algorithm 1.
TABLE 1 description of related symbols in context aware topic model CAT
To solve for the parameters of the model, the invention employs the widely used Gibbs sampling algorithm.
First, according to the Gibbs sampling algorithm, the probability that the i-th word in document d is assigned to topic k is expressed by the following formula (1):
where the symbols denote, in order: the topic assignment vector of all words other than the current i-th word; the number of words in document d assigned to topic k (excluding the current word); and the number of times the word $w_i$ is assigned to topic k in the entire corpus (excluding the current word). A superscript or subscript missing from a count denotes summation over the omitted dimension, and 1 is a vector whose elements are all 1.
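For orientation, the well-known collapsed Gibbs conditional of standard LDA with symmetric priors is given below. It is offered as a reference point only: the symbols $\alpha$, $\beta$, $V$ (vocabulary size) and the count notation are assumptions of this sketch, and formula (1) of the invention may differ in notation and in any context-specific terms.

```latex
p(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\;
\left( n_{d,k}^{-i} + \alpha \right)
\frac{n_{k,w_i}^{-i} + \beta}{n_{k,\cdot}^{-i} + V\beta}
```

Here $n_{d,k}^{-i}$ counts the words in document d assigned to topic k, $n_{k,w_i}^{-i}$ counts the assignments of word $w_i$ to topic k over the corpus, and $n_{k,\cdot}^{-i}$ sums the latter over all words, each excluding the current position i.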
Similarly, the probability that the jth query-relevant context window in document d is assigned to topic k can be expressed by the following equation (2):
where the symbols denote, in order: the topic assignment vector of all windows other than the current j-th query-related context window; the number of context windows related to query Q that are assigned to topic k (excluding the current window); and $\theta_{d,k}$, the probability of topic k in document d, which can be further calculated by the following formula:
where the count in the formula is the total number of words in document d that are assigned to topic k.
When the model converges or reaches a preset number of iterations, the following distributions will result: "document-topic" distribution θ, "topic-word" distribution Φ, and "topic-query context" distribution η. Each column of η represents the distribution of all relevant contexts of a query over the topic, which is also the resulting new query representation. It can be seen that the representation naturally fuses together context information and topic information at the same time, and theoretically would be superior to the representation methods that model each separately.
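The sampling loop and the final read-out of the distributions can be illustrated with a collapsed Gibbs sampler for standard LDA. This is a sketch under stated assumptions, not the CAT model: it covers only the word-level update and the θ and φ estimates, and it omits the context-window update of formula (2) and the η distribution, whose exact forms are not reproduced in this text. The function name, the hyperparameters `alpha` and `beta`, and the fixed iteration count are all illustrative.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[int]], num_topics: int, vocab_size: int,
              alpha: float = 0.1, beta: float = 0.01,
              iterations: int = 50, seed: int = 0
              ) -> Tuple[List[List[float]], List[List[float]]]:
    """Collapsed Gibbs sampling for standard LDA; docs hold word ids."""
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]                 # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]    # topic-word counts
    n_k = [0] * num_topics                                  # words per topic
    z: List[List[int]] = []
    for d, doc in enumerate(docs):                          # random initialization
        assignments = []
        for w in doc:
            k = rng.randrange(num_topics)
            assignments.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(assignments)
    for _ in range(iterations):                             # preset iteration count
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                                 # remove current word
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + vocab_size * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k                                 # add it back
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha)
              for k in range(num_topics)] for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
            for w in range(vocab_size)] for k in range(num_topics)]
    return theta, phi


# Toy corpus of word ids over a 4-word vocabulary.
theta, phi = lda_gibbs([[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]],
                       num_topics=2, vocab_size=4)
```

Each row of `theta` is a "document-topic" distribution and each row of `phi` a "topic-word" distribution, in the sense used above.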
(III) Design of the hybrid retrieval model
Based on the new query representation obtained above, the invention designs a hybrid retrieval model that considers keyword matching and topic matching simultaneously, and the retrieval score is calculated by the following formula:
where s represents the keyword-matching score of a traditional retrieval model, such as a language model retrieval score or a BM25 retrieval score, s' represents the topic-matching score based on the new query representation Q', and λ is the weighting parameter between the two scores, that is, the weighting coefficient of the two matching modes.
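A weighted combination of the two scores can be sketched as a linear interpolation. The formula itself is not reproduced in this text, so the form λ·s + (1 − λ)·s', with λ applied to the keyword score, is an assumption, as is the requirement that both scores are already on comparable scales. The names `hybrid_score`, `doc_a` and `doc_b` are hypothetical.

```python
def hybrid_score(keyword_score: float, topic_score: float, lam: float) -> float:
    """Interpolate keyword-matching score s and topic-matching score s'.

    Which term carries lam is an assumption of this sketch.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * keyword_score + (1.0 - lam) * topic_score


# Hypothetical candidates: (keyword score s, topic score s') per document.
candidates = {"doc_a": (0.8, 0.4), "doc_b": (0.5, 0.9)}
ranked = sorted(candidates,
                key=lambda d: hybrid_score(*candidates[d], lam=0.75),
                reverse=True)
```

Setting `lam` to 1 recovers pure keyword matching, and lowering it gives topic matching more influence on the ranking.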
Various methods may be used to calculate the topic matching score. Specifically, given the topic distribution vectors of the new query representation and of the document, the score can be derived from the similarity between the two topic distributions, for example the Jensen-Shannon divergence (JSD) or cosine similarity.
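Both similarity options can be sketched in a few lines. The Jensen-Shannon divergence below uses base-2 logarithms, so it lies in [0, 1]; turning it into a matching score as 1 − JSD is an assumption of this sketch, and the function names are illustrative.

```python
import math
from typing import Sequence


def _kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence between two topic distributions."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def jsd_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Topic-matching score as 1 - JSD (an assumption, not from the source)."""
    return 1.0 - jsd(p, q)


def cosine(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine similarity between two topic vectors."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm = math.sqrt(sum(pi * pi for pi in p)) * math.sqrt(sum(qi * qi for qi in q))
    return dot / norm if norm else 0.0


query_topics = [0.7, 0.2, 0.1]   # hypothetical topic vector of the query
doc_topics = [0.6, 0.3, 0.1]     # hypothetical topic vector of a document
s_prime = jsd_similarity(query_topics, doc_topics)
```

JSD is symmetric and stays finite even when one distribution assigns zero probability to a topic, which plain KL divergence does not.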
The protection of the present invention is not limited to the above embodiments. Variations and advantages that may occur to those skilled in the art may be incorporated into the invention without departing from the spirit and scope of the inventive concept, and the scope of the appended claims is intended to be protected.
Claims (6)
1. A method for building a query representation and hybrid retrieval model based on context-aware topics, characterized by comprising the following steps:
step one: acquiring the pseudo-relevant feedback documents of a query based on the keyword set of the query, and selecting contexts relevant to the query from the pseudo-relevant feedback documents;
step two: introducing a context-aware topic model, merging the contexts into the context-aware topic model, mining the topic information implied by the context windows based on the topics of the corpus, and obtaining the corresponding topic vector;
step three: representing the query jointly by the topic vector and the keyword set, and building a hybrid retrieval model based on the topic vector and the keyword set to obtain a final retrieval score.
2. The method according to claim 1, wherein the pseudo-relevance feedback document is divided into a plurality of sliding windows, the relevance of each window to the query is calculated, and the window with the relevance higher than a threshold value is taken as the contextual window relevant to the query.
3. The method of claim 2, wherein the context selection threshold associated with the query is an average of the correlations of all windows under the query.
4. The method for building a query representation and hybrid retrieval model based on context-aware topics as claimed in claim 1, wherein the context-aware topic model is designed according to a query-related context and a whole corpus, and a topic vector of a context is obtained by assuming that a context window and a pseudo-related feedback document where the context window is located share the same topic distribution in a topic modeling process by using the context-aware topic model.
5. The method of claim 1, wherein the pseudo-relevant feedback documents are computed using a search model keyword matching score.
6. The method for building a query representation and hybrid retrieval model based on context-aware topics as claimed in claim 1, wherein the retrieval score is expressed by the following formula:
wherein s represents the keyword-matching score of a traditional retrieval model, s' represents the topic-matching score based on the new query representation Q', and λ is the weighting parameter between the two scores, that is, the weighting coefficient of the two matching modes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610634174.2A CN106294662A (en) | 2016-08-05 | 2016-08-05 | Method for building a query representation and hybrid retrieval model based on context-aware topics |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106294662A (en) | 2017-01-04 |
Family
ID=57664982
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610634174.2A Pending CN106294662A (en) | Method for building a query representation and hybrid retrieval model based on context-aware topics | 2016-08-05 | 2016-08-05 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106294662A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102750315A (en) * | 2012-04-25 | 2012-10-24 | 北京航空航天大学 | Rapid discovering method of conceptual relations based on sovereignty iterative search |
CN103678412A (en) * | 2012-09-21 | 2014-03-26 | 北京大学 | Document retrieval method and device |
CN104050235A (en) * | 2014-03-27 | 2014-09-17 | 浙江大学 | Distributed information retrieval method based on set selection |
CN103927177A (en) * | 2014-04-18 | 2014-07-16 | 扬州大学 | Characteristic-interface digraph establishment method based on LDA model and PageRank algorithm |
CN104391942A (en) * | 2014-11-25 | 2015-03-04 | 中国科学院自动化研究所 | Short text characteristic expanding method based on semantic atlas |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108804443A (en) * | 2017-04-27 | 2018-11-13 | 安徽富驰信息技术有限公司 | A kind of judicial class case searching method based on multi-feature fusion |
CN108121699A (en) * | 2017-12-21 | 2018-06-05 | 北京百度网讯科技有限公司 | For the method and apparatus of output information |
CN108520033A (en) * | 2018-03-28 | 2018-09-11 | 华中师范大学 | Enhancing pseudo-linear filter model information search method based on superspace simulation language |
CN108710611B (en) * | 2018-05-17 | 2021-08-03 | 南京大学 | Short text topic model generation method based on word network and word vector |
CN108710611A (en) * | 2018-05-17 | 2018-10-26 | 南京大学 | A kind of short text topic model generation method of word-based network and term vector |
CN110333700A (en) * | 2019-05-24 | 2019-10-15 | 蓝炬兴业(赤壁)科技有限公司 | Industrial computer server remote management platform system and method |
CN110427400A (en) * | 2019-06-21 | 2019-11-08 | 贵州电网有限责任公司 | Search method is excavated based on operation of power networks information interactive information user's demand depth |
WO2021250488A1 (en) * | 2020-06-08 | 2021-12-16 | International Business Machines Corporation | Refining a search request to a content provider |
US11238052B2 (en) | 2020-06-08 | 2022-02-01 | International Business Machines Corporation | Refining a search request to a content provider |
GB2611237A (en) * | 2020-06-08 | 2023-03-29 | Ibm | Refining a search request to a content provider |
AU2021289542B2 (en) * | 2020-06-08 | 2023-06-01 | International Business Machines Corporation | Refining a search request to a content provider |
CN111897928A (en) * | 2020-08-04 | 2020-11-06 | 广西财经学院 | Chinese query expansion method for embedding expansion words into query words and counting expansion word union |
CN112685440A (en) * | 2020-12-31 | 2021-04-20 | 王程 | Structural query information expression method for marking search semantic role |
CN112685440B (en) * | 2020-12-31 | 2022-03-22 | 上海欣兆阳信息科技有限公司 | Structural query information expression method for marking search semantic role |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | C06 | Publication | |
 | PB01 | Publication | |
 | C10 | Entry into substantive examination | |
 | SE01 | Entry into force of request for substantive examination | |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170104 |