CN107066589A - Ranking method and device using entity semantics and word frequency based on comprehensive knowledge - Google Patents

Ranking method and device using entity semantics and word frequency based on comprehensive knowledge

Info

Publication number
CN107066589A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710252110.0A
Other languages
Chinese (zh)
Other versions
CN107066589B (en)
Inventor
靳小波
王胜
曹鹤玲
肖乐
费选
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Technology
Original Assignee
Henan University of Technology
Application filed by Henan University of Technology
Priority to CN201710252110.0A
Publication of CN107066589A
Application granted
Publication of CN107066589B
Legal status: Expired - Fee Related


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

The invention relates to a method and device for ranking entities using entity semantics and word frequency based on comprehensive knowledge. After domain knowledge has been crawled to expand the entity objects, a variety of effective features are designed, including word-frequency features and semantic features, and a learning-to-rank method is used to predict the relevance between a query and an entity. The invention makes full use of low-level word-frequency features and high-level entity semantic features, which better reflect the relevance between query and entity and improve retrieval performance: entity search results become more accurate, which in turn improves user satisfaction with search.

Description

Ranking method and device using entity semantics and word frequency based on comprehensive knowledge
Technical field
The invention belongs to the technical field of entity search, and specifically relates to a method and device for ranking entities using entity semantics and word frequency based on comprehensive knowledge.
Background
The mainstream technique used by current search engines, keyword search, is an "existence search": it returns to the user a list of web pages that contain the keywords. The user usually has to browse these pages and filter out a large amount of useless information before finding the desired result. This process is costly in terms of information consumption and greatly degrades the user experience; users would rather "obtain the answer directly". For example, for the query "Who is Barack Obama's wife", the result the user wants is the concise data item "Michelle Obama" rather than a large number of web pages. This kind of search is entity search (Entity Search). The distinguishing feature of entity search is that it "gives the answer directly". It is concerned with "objects", which may belong to many different categories, such as people, films, companies, and novels. For example, for the query "films starring Tom Hanks", the desired result is a list of entities of the category "film".
Traditional entity search falls into three classes: web-based question answering, web-based information extraction, and type-annotated search. Web-based question answering finds the answer to a specific question by mining the diversity of web pages. It searches for information of a specific type near certain keywords and verifies additional evidence to determine the final answer. Web-based information extraction tries to find all <query term, entity> pairs; it needs to record a large amount of context and to count matching information. Type-annotated search aims to find information of a specific type; it matches certain adjacency patterns between keywords and type words and then accumulates all matching information to form the final ranking score.
Deep learning methods use convolutional neural networks to embed sentences into a low-dimensional space while preserving the syntactic and semantic relations between them. However, they use only the literal meaning of an entity and do not consider its underlying meaning, so the resulting ranking models have a large bias.
Previous entity ranking methods either focus on the co-occurrence of query and entity or directly measure the relation between query and entity under specific model assumptions. However, co-occurrence features are too weak to represent the relation between a query and an entity. Moreover, these methods seldom consider the semantic relation between query and entity, and it is precisely the semantic relation that captures the need behind the user's query terms. Semantic relations strengthen the ability of full-text document retrieval to extract and process semantic information, and in particular improve its handling of semantic ambiguity and semantic expansion.
Summary of the invention
The object of the invention is to provide a method and device for ranking entities using entity semantics and word frequency based on comprehensive knowledge, so as to solve the prior-art problem that the predicted relevance between a query and an entity is inaccurate.
To solve the above technical problem, the technical solution of the invention is as follows:
The ranking method of the invention, using entity semantics and word frequency based on comprehensive knowledge, comprises the following steps:
1) collecting description information about the entity from external resources to expand the entity;
2) extracting data streams from the query description and the entity description;
3) applying word segmentation to the data streams to obtain a word stream;
4) extracting word-frequency features and semantic features from the word stream, and using the extracted features together as the input of a learning-to-rank method to obtain a ranked sequence of entities ordered by the similarity between query and entity.
Further, the description information from external resources is collected using multithreading, a crawler proxy pool, cross-referencing among multiple search engines, and vertical-website crawling.
Further, the data stream is the title, the body text, or the combination of title and body text of the query and the entity.
Further, the word segmentation includes Chinese word segmentation and 2-gram segmentation.
Further, the word-frequency features include TF-IDF features, BM25 features, and LMIR features.
Further, the semantic features include the sum-of-similarity feature, the sum-of-weighted-similarity feature, the maximum-similarity feature, and the maximum-weighted-similarity feature between the query and the entity, obtained using word2vec.
Further, the learning-to-rank method is a pointwise learning-to-rank method.
Further, the classifiers used in the pointwise learning-to-rank method include AdaBoost, random forest, and ExtraTree.
The ranking device of the invention, using entity semantics and word frequency based on comprehensive knowledge, comprises the following modules:
a module for collecting description information from external resources to expand the entity;
a module for extracting data streams from the query description and the entity description;
a module for applying word segmentation to the data streams to obtain a word stream;
a module for extracting word-frequency features and semantic features from the word stream, and using the extracted features together as the input of a learning-to-rank method to obtain a ranked sequence of entities ordered by the similarity between query and entity.
Further, the description information about the entity is collected from external resources using multithreading, a crawler proxy pool, cross-referencing among multiple search engines, and vertical-website crawling.
Beneficial effects of the invention:
After crawling domain knowledge to expand the entity objects, the invention designs a variety of effective features, including word-frequency features and semantic features, and uses a learning-to-rank method to predict the relevance between a query and an entity. The invention makes full use of low-level text word-frequency features and high-level entity semantic features, which better reflect the relevance between query and entity and improve retrieval performance: entity search results become more accurate, which in turn improves user satisfaction with search.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the invention;
Fig. 2 shows example ranking results obtained with the invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments, but the embodiments of the invention are not limited to these.
Embodiment of the ranking method of the invention, using entity semantics and word frequency based on comprehensive knowledge:
A query is a set of keywords or key phrases that describes the user's need. An entity is an individual with independent characteristics, such as a person, a restaurant, a film, or a TV series. The goal is to find entities that satisfy a particular description, such as an evaluation of a certain film or an impression of a restaurant's environment. The invention trains a ranking model and uses it to predict the order of candidate entities.
First, entity description information and user review information are collected from Baidu, Douban, and Dazhong Dianping to expand the entity. The techniques applied include multithreading, a crawler proxy pool, cross-referencing among multiple search engines, and vertical-website crawling. Multithreading runs several crawling threads in parallel, which greatly speeds up crawling. The crawler proxy pool avoids the anti-crawler blocking triggered by frequent requests. Cross-referencing among multiple search engines yields a more accurate description of the entity. Vertical crawling makes the crawled information more targeted and also improves crawling efficiency.
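The collection step can be sketched roughly as follows. This is only an illustration under stated assumptions: `fetch` is a hypothetical callable supplied by the caller (the patent does not name a crawling library), the source and proxy names are placeholders, and proxies are simply handed out round-robin.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def crawl_descriptions(entities, sources, proxies, fetch, max_workers=8):
    """Collect description text for each entity from several sources in parallel.

    `fetch(source, entity, proxy)` is a hypothetical callable provided by the
    caller; it returns a text snippet or None. Proxies are assigned round-robin
    so that no single address issues every request.
    """
    proxy_cycle = itertools.cycle(proxies)
    jobs = [(entity, source, next(proxy_cycle))
            for entity in entities for source in sources]

    def run(job):
        entity, source, proxy = job
        return entity, fetch(source, entity, proxy)

    expanded = {entity: [] for entity in entities}
    # Several crawling threads run in parallel; map() keeps the job order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for entity, text in pool.map(run, jobs):
            if text:  # skip sources that returned nothing
                expanded[entity].append(text)
    return expanded
```

Collecting from several sources per entity is one simple way to approximate the cross-referencing idea: descriptions that agree across sources can be trusted more in later steps.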
Data streams are extracted from the query description and the entity description. A data stream can be the title, the body text, or the combination of title and body text. Any of these three data streams can produce a final learning-to-rank result. The processing below is described for the data stream that combines body text and title.
Next, word segmentation is applied to the query and the entity. Two methods are used: Chinese word segmentation and 2-gram segmentation. Chinese word segmentation divides a character sequence into basic Chinese word units. It is a key step in Chinese language processing, but the performance of the algorithm depends on the domain dictionary and corpus used. Most segmentation algorithms cannot handle polysemous words and out-of-vocabulary words well. Since most Chinese phrases consist of two characters, the 2-gram representation is added as a supplement to word segmentation. The word streams produced by the two segmentation methods are merged into a single word stream.
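A minimal sketch of merging the two segmentations is given below. It assumes that "2-gram" means overlapping character bigrams, and `segment` is a hypothetical stand-in for whatever dictionary-based Chinese word segmenter is used; neither detail is fixed by the text.

```python
def char_bigrams(text):
    """Return the overlapping 2-character grams of `text`, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def build_word_stream(text, segment):
    """Merge dictionary-based words and 2-grams into one word stream.

    `segment` is a hypothetical callable that maps a string to a list of words.
    """
    return list(segment(text)) + char_bigrams(text)
```

Because the bigrams are appended rather than deduplicated, a two-character word found by both methods is counted twice, which slightly boosts its term frequency; whether that is desirable is a design choice left open here.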
Then a variety of effective features are designed and used together as the input of the subsequent ranking algorithm, so that the similarity probability of each query-entity pair can be predicted more accurately and accurate entity ranking or recommendation can be achieved. Specifically, this embodiment uses low-level text features, context features, and high-level semantic features.
Text features include TF-IDF features, BM25 features, and LMIR features. LMIR in turn covers three smoothing methods: LMIR.JM, LMIR.ABS, and LMIR.DIR. LMIR assumes a uniform prior and is used to compute features related to the language model; the language model reduces to computing the conditional probability $p(q \mid d)$:
$$p(q \mid d) = \prod_{i=1}^{m} p(q_i \mid d)$$
These features are described in detail below.
1) In web-content mining, TF-IDF (Term Frequency-Inverse Document Frequency) is a weighting technique commonly used in information retrieval and text mining to assess how important a word is to a document collection or to one document in a corpus.
TF-IDF is the product TF*IDF. Term frequency (TF) is the number of times a given word occurs in a given document. The more often a word occurs in a data stream, the more it contributes to describing the meaning of that data stream. For a word $t_i$ in a particular data stream $d_j$, its importance can be expressed as:
$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}$$
where $\mathrm{tf}_{i,j}$ is the term-frequency importance of word $t_i$ in the data stream, $n_{i,j}$ is the number of occurrences of the word in data stream $d_j$, and $\sum_{k} n_{k,j}$ is the total number of occurrences of all words in $d_j$.
Inverse document frequency (IDF) measures the general importance of a word: the more documents a word occurs in, the less it contributes to any particular document. The IDF of a word is obtained by dividing the total number of documents by the number of documents containing the word and taking the logarithm of the quotient:
$$\mathrm{idf}_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$
where $\mathrm{idf}_i$ is the inverse document frequency of $t_i$, $|D|$ is the total number of data streams in the corpus, and $|\{j : t_i \in d_j\}|$ is the number of data streams containing the word $t_i$, that is, the number of data streams with $n_{i,j} \neq 0$. In practice $n_{i,j} \neq 0$ is hard to guarantee, so smoothing is needed. In the implementation, 0.5 is added to both the numerator and the denominator to make the system more robust, and the formula becomes:
$$\mathrm{idf}_i = \log \frac{|D| + 0.5}{|\{j : t_i \in d_j\}| + 0.5}$$
TF-IDF is then:
$$\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \cdot \mathrm{idf}_i$$
A high term frequency within a particular document, combined with a low document frequency of the word across the whole document collection, yields a high TF-IDF weight. TF-IDF therefore tends to filter out common words and retain important ones.
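The TF-IDF computation with the 0.5 smoothing can be sketched as below. The function names are illustrative, and each data stream is assumed to be a plain list of words.

```python
import math
from collections import Counter


def tf(term, stream):
    """Term frequency: occurrences of `term` divided by the stream length."""
    return Counter(stream)[term] / len(stream) if stream else 0.0


def smoothed_idf(term, corpus):
    """IDF with 0.5 added to both numerator and denominator."""
    containing = sum(1 for stream in corpus if term in stream)
    return math.log((len(corpus) + 0.5) / (containing + 0.5))


def tfidf(term, stream, corpus):
    """TF-IDF weight of `term` in `stream`, relative to `corpus`."""
    return tf(term, stream) * smoothed_idf(term, corpus)
```

With this smoothing, a word that occurs in every data stream gets an IDF of exactly zero, and a word that occurs in none still has a finite IDF instead of a division by zero.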
2) The BM25 feature extraction method was proposed by Robertson et al. and is a typical probabilistic retrieval model. The BM25 model assumes that all elements are mutually independent. In fact, the elements within one document are not isolated from each other: semantic relations exist among them to a greater or lesser degree, and through these relations the context elements influence, to some extent, the relevance of the elements in the document. BM25 is an empirical ranking function, computed as follows:
$$\mathrm{BM25}(q, d) = \sum_{i} \mathrm{idf}(q_i) \cdot \frac{f(q_i, d)\,(k_1 + 1)}{f(q_i, d) + k_1 \left(1 - b + b\,\frac{|d|}{\mathrm{avg}(d)}\right)} \cdot \frac{f(q_i, q)\,(k_3 + 1)}{f(q_i, q) + k_3}$$
where the query consists of the query words $q_1, \dots, q_m$, $\mathrm{idf}(q_i)$ is the IDF value of the query word, $f(q_i, d)$ is the number of times $q_i$ occurs in document $d$, $f(q_i, q)$ is the number of times the query word $q_i$ occurs in query $q$, $|d|$ is the total number of words in document $d$, and $\mathrm{avg}(d)$ is the average document length over the whole data set. The parameters are set empirically to $k_1 = 2.0$, $k_3 = 0$, $b = 0.75$.
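A sketch of a BM25 scorer using the parameter values stated above follows. It assumes the standard Okapi form with a query-frequency factor and reuses the 0.5-smoothed IDF from the TF-IDF description; the text does not say which IDF variant BM25 uses, so that choice is an assumption.

```python
import math


def bm25(query, doc, corpus, k1=2.0, k3=0.0, b=0.75):
    """Okapi BM25 score of `doc` for `query` (both are lists of words)."""
    avg_len = sum(len(d) for d in corpus) / len(corpus)
    score = 0.0
    for term in set(query):  # each distinct query word contributes once
        f_d = doc.count(term)
        if f_d == 0:
            continue
        f_q = query.count(term)
        n_t = sum(1 for d in corpus if term in d)
        idf = math.log((len(corpus) + 0.5) / (n_t + 0.5))  # assumed IDF variant
        doc_part = f_d * (k1 + 1) / (f_d + k1 * (1 - b + b * len(doc) / avg_len))
        query_part = f_q * (k3 + 1) / (f_q + k3)  # equals 1 when k3 = 0
        score += idf * doc_part * query_part
    return score
```

With $k_3 = 0$ the query-frequency factor is always 1, so repeating a word in the query does not change the score.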
3) LMIR.JM is the Jelinek-Mercer method, which estimates $p(q_i \mid d)$ by linear interpolation between the maximum-likelihood estimate and the corpus model. It is a simple mixture model:
$$p(q_i \mid d) = (1 - \lambda)\,p_{ml}(q_i \mid d) + \lambda\,p(q_i \mid C)$$
where $\lambda = 0.1$ controls the influence of the corpus model, and $p_{ml}(q_i \mid d)$ and $p(q_i \mid C)$ are the relative frequencies of $q_i$ in document $d$ and in corpus $C$, respectively.
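The Jelinek-Mercer estimate can be sketched as follows, with the document and the corpus both represented as non-empty lists of words. Multiplying the per-word probabilities to obtain $p(q \mid d)$ is the usual unigram assumption and is stated here as an assumption rather than taken from the text.

```python
from collections import Counter


def jm_prob(term, doc, collection, lam=0.1):
    """p(term | doc) = (1 - lam) * p_ml(term | doc) + lam * p(term | C)."""
    p_doc = Counter(doc)[term] / len(doc)
    p_col = Counter(collection)[term] / len(collection)
    return (1 - lam) * p_doc + lam * p_col


def query_likelihood(query, doc, collection, lam=0.1):
    """Product of the smoothed word probabilities over the query words."""
    prob = 1.0
    for term in query:
        prob *= jm_prob(term, doc, collection, lam)
    return prob
```

A query word that is absent from the document still receives the probability mass $\lambda\,p(q_i \mid C)$, so a single missing word does not drive the whole query likelihood to zero.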
4) LMIR.DIR gives a Bayesian smoothed estimate of $p(q_i \mid d)$ based on a Dirichlet prior:
$$p(q_i \mid d) = \frac{f(q_i, d) + \mu\,p(q_i \mid C)}{|d| + \mu}$$
where the smoothing parameter $\mu$ is set to 2000.
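A sketch of Dirichlet-prior smoothing in its standard form is shown below; the document and corpus are assumed to be non-empty lists of words, and the function name is illustrative.

```python
def dirichlet_prob(term, doc, collection, mu=2000.0):
    """p(term | doc) = (f(term, doc) + mu * p(term | C)) / (|doc| + mu)."""
    f = doc.count(term)
    p_col = collection.count(term) / len(collection)
    return (f + mu * p_col) / (len(doc) + mu)
```

The larger $\mu$ is relative to the document length, the closer the estimate moves to the corpus probability, so short documents are smoothed more strongly than long ones.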
5) LMIR.ABS is the absolute discounting method, which strikes a compromise between the document probability and the corpus probability of a word by subtracting a constant. It computes $p(q_i \mid d)$ as follows:
$$p(q_i \mid d) = \frac{\max\big(f(q_i, d) - \delta,\ 0\big)}{|d|} + \frac{\delta\,|d|_{\mu}}{|d|}\,p(q_i \mid C)$$
where $|d|_{\mu}$ is the number of distinct words in document $d$ and $\delta \in [0, 1]$ is the subtracted constant, set here to $\delta = 0.7$. $f(q_i, d)$ is the number of times the query word $q_i$ occurs in document $d$, and $p(w \mid C)$ is the probability that word $w$ occurs in corpus $C$.
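Absolute discounting can be sketched as below, assuming the standard form in which the discounted mass is redistributed according to the corpus probability; the document and corpus are non-empty lists of words.

```python
def abs_discount_prob(term, doc, collection, delta=0.7):
    """Absolute-discounting estimate of p(term | doc).

    `delta` is subtracted from each observed count, and the freed mass
    (delta * distinct words / document length) is spread by p(term | C).
    """
    f = doc.count(term)
    distinct = len(set(doc))
    p_col = collection.count(term) / len(collection)
    return max(f - delta, 0.0) / len(doc) + delta * distinct / len(doc) * p_col
```

When every document word also occurs in the corpus list, the estimates over the corpus vocabulary sum to 1, which is a quick sanity check on the redistribution term.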
For word-level semantic features, word2vec is commonly used to produce word embedding vectors. It builds a two-layer neural network on a large corpus and then outputs a high-dimensional word vector for each word. To establish a semantic similarity measure between sentences, we first define the similarity between a word and a sentence.
Word2vec is a toolkit for obtaining word vectors that Google released as open source in 2013. It is simple and efficient: after training on a given corpus, the optimized model can quickly and effectively express a word in vector form.
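The similarity between a query word $q_i$ and a sentence $s$ is defined as the maximum similarity between $q_i$ and the words $s_j$ of sentence $s$:
$$\mathrm{sim}(q_i, s) = \max_{s_j \in s} q_i^{T} s_j$$
where $q_i$ and $s_j$ are vectors generated by the word2vec algorithm and normalized to length 1.
Arranging all query words $q_i \in q$ ($i = 1, 2, \dots, m$) as the matrix $Q = [q_1, q_2, \dots, q_m]^{T}$ and all sentence words $s_j \in s$ ($j = 1, 2, \dots, n$) as $S = [s_1, s_2, \dots, s_n]^{T}$ gives:
$$R = QS^{T}$$
where $R = [r_1, r_2, \dots, r_m]^{T}$ and $r_i \in \mathbb{R}^{n}$ holds the similarities between $q_i$ and each word of $s$. Then:
$$\mathrm{sim}(q_i, s) = \|r_i\|_{\infty}$$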
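The matrix form can be sketched in plain Python as follows. The vectors here are short illustrative lists rather than real word2vec output, all vectors are assumed to be nonzero, and the per-word similarity is taken as the row maximum, following the definition of $\mathrm{sim}(q_i, s)$ as a maximum similarity.

```python
import math


def normalize(v):
    """Scale a nonzero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(query_vecs, sentence_vecs):
    """R = Q S^T on unit-length rows: R[i][j] is the cosine of q_i and s_j."""
    Q = [normalize(q) for q in query_vecs]
    S = [normalize(s) for s in sentence_vecs]
    return [[sum(a * b for a, b in zip(q, s)) for s in S] for q in Q]


def word_sentence_similarity(R):
    """sim(q_i, s) for every query word: the largest entry of row i of R."""
    return [max(row) for row in R]
```

Computing the whole matrix once and then reducing each row avoids recomputing dot products for every query word and sentence word pair.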
Based on the sum and maximum operations, four statistical semantic features can be defined: sum of similarity (SS), sum of weighted similarity (SWS), maximum of similarity (MS), and maximum of weighted similarity (MWS).
After the similarity between the query and each sentence of the entity's answer text has been computed, we take the average and the maximum of the similarities over all sentences.
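The four statistics and their aggregation over sentences might be computed as sketched below. The per-word weights are an assumption: the text does not say which weights are used, so `weights` is a hypothetical input (an IDF value per query word would be one plausible choice), and the feature names with `_mean` and `_max` suffixes are illustrative.

```python
def semantic_features(sims, weights):
    """SS, SWS, MS, and MWS from the per-query-word similarities of one sentence.

    `sims[i]` is sim(q_i, s); `weights[i]` is a hypothetical weight for q_i.
    """
    weighted = [w * s for w, s in zip(weights, sims)]
    return {
        "SS": sum(sims),
        "SWS": sum(weighted),
        "MS": max(sims),
        "MWS": max(weighted),
    }


def aggregate_over_sentences(per_sentence):
    """Average and maximum of each feature across the sentences of one entity."""
    aggregated = {}
    for name in per_sentence[0]:
        values = [features[name] for features in per_sentence]
        aggregated[name + "_mean"] = sum(values) / len(values)
        aggregated[name + "_max"] = max(values)
    return aggregated
```

Each entity then contributes eight semantic values (four features, each averaged and maximized over its sentences) to the feature vector passed to the ranking model.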
Finally, learning to rank is applied to the above features in order to find a decision function that can accurately predict the labels of unknown samples. Here the pointwise (point-wise) learning-to-rank approach is chosen. The extracted features are used together as the input of the learning-to-rank method, which produces a ranked sequence of entities ordered by the similarity between query and entity. Ensemble methods are chosen as the pointwise ranking algorithm and are used to predict the similarity probability of each query-entity pair. The classifier is selected mainly by cross-validation from among AdaBoost, random forest, and ExtraTree. For the pointwise ranking algorithm, the number of ensemble members is chosen at random from the interval [100, 500] and the tree depth is chosen at random from {4, 6, 8, 10, 12}, in order to find the best model parameters.
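The random parameter search and the final sort can be sketched as follows. The classifier itself is left out: `evaluate` is a hypothetical callable that would train the named ensemble with the sampled settings and return a cross-validated score, and the dictionary keys are illustrative names rather than any library's parameters.

```python
import random


def sample_config(rng):
    """Draw one candidate configuration from the ranges given in the text."""
    return {
        "model": rng.choice(["AdaBoost", "RandomForest", "ExtraTree"]),
        "n_estimators": rng.randint(100, 500),  # inclusive interval [100, 500]
        "max_depth": rng.choice([4, 6, 8, 10, 12]),
    }


def random_search(evaluate, n_trials=20, seed=0):
    """Keep the configuration with the best score returned by `evaluate`."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def rank_entities(entities, probabilities):
    """Pointwise ranking: sort entities by predicted relevance, highest first."""
    order = sorted(range(len(entities)), key=lambda i: probabilities[i], reverse=True)
    return [entities[i] for i in order]
```

Because the pointwise approach scores each query-entity pair independently, ranking reduces to a single sort over the predicted probabilities.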
The pointwise ranking model is simple and its training time is short. Relevant and irrelevant are relative concepts: it suffices to sort the documents by score from high to low, without predicting each document's score accurately. In other embodiments, pairwise or listwise ranking algorithms may also be chosen. However, compared with the pointwise algorithm, the pairwise model is more complex, takes longer to train, and requires a more efficient learning algorithm. Listwise algorithms have some problems in practical applications. For example, training data is hard to obtain: for a given query, the annotator has to rank all documents by relevance, which is time-consuming and laborious, so a large amount of training data cannot realistically be obtained.
A concrete example is given below.
As shown in Fig. 2, when we query the term "Jinyiwei" (the Embroidered Uniform Guard), a ranking model is built on the training set from the descriptions and review information of the candidate films (entities), such as "Xiuchun Dao" and "Jinyiwei". The model predicts the similarity between each query in the test set and each candidate entity, and the entities are sorted by similarity to obtain a ranking. Finally, for all queries in the test set, the difference between the predicted ranking and the true ranking of the entities is measured with mean average precision (MAP), which gives an evaluation value between 0 and 1. The larger the MAP, the more accurate the ranking model.
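The MAP evaluation can be sketched as below. It assumes binary relevance labels, with the set of relevant entities for each query taken from the true ranking; the function names are illustrative.

```python
def average_precision(ranked, relevant):
    """Average of the precision values at the rank of each relevant entity."""
    hits, total = 0, 0.0
    for rank, entity in enumerate(ranked, start=1):
        if entity in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of the average precision over (ranked list, relevant set) pairs."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)
```

A perfect ranking, with every relevant entity ahead of every irrelevant one, gives an average precision of 1 for that query.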
Embodiment of the ranking device of the invention, using entity semantics and word frequency based on comprehensive knowledge:
The ranking device of the invention comprises the following modules: a module for collecting description information about the entity from external resources to expand the entity; a module for extracting data streams from the query description and the entity description; a module for applying word segmentation to the data streams to obtain a word stream; and a module for extracting word-frequency features and semantic features from the word stream, and using the extracted features together as the input of a learning-to-rank method to obtain a ranked sequence of entities ordered by the similarity between query and entity.
The device is in fact a computer solution, that is, a software architecture, based on the flow of the ranking method of the invention. The modules above are the processes or programs that correspond to the steps of the method flow. Since the method has been described clearly and completely above, the device is not described in further detail.
Although the content of the invention has been described in detail through the preferred embodiments above, it should be understood that the above description is not to be regarded as limiting the invention. Various modifications and substitutions will be apparent to those skilled in the art after reading the above. The scope of protection of the invention is therefore defined by the appended claims.

Claims (10)

1. A method for ranking entities using entity semantics and word frequency based on comprehensive knowledge, characterized by comprising the following steps:
1) collecting description information about the entity from external resources to expand the entity;
2) extracting data streams from the query description and the entity description;
3) applying word segmentation to the data streams to obtain a word stream;
4) extracting word-frequency features and semantic features from the word stream, and using the extracted features together as the input of a learning-to-rank method to obtain a ranked sequence of entities ordered by the similarity between query and entity.
2. The method according to claim 1, characterized in that the description information about the entity is collected from external resources using multithreading, a crawler proxy pool, cross-referencing among multiple search engines, and vertical-website crawling.
3. The method according to claim 1, characterized in that the data stream is the title, the body text, or the combination of title and body text of the query and the entity.
4. The method according to claim 1, characterized in that the word segmentation includes Chinese word segmentation and 2-gram segmentation.
5. The method according to claim 1, characterized in that the word-frequency features include TF-IDF features, BM25 features, and LMIR features.
6. The method according to claim 1, characterized in that the semantic features include the sum-of-similarity feature, the sum-of-weighted-similarity feature, the maximum-similarity feature, and the maximum-weighted-similarity feature between the query and the entity, obtained using word2vec.
7. The method according to claim 1, characterized in that the learning-to-rank method is a pointwise learning-to-rank method.
8. The method according to claim 7, characterized in that the classifiers used in the pointwise learning-to-rank method include AdaBoost, random forest, and ExtraTree.
9. A device for ranking entities using entity semantics and word frequency based on comprehensive knowledge, characterized by comprising the following modules:
a module for collecting description information about the entity from external resources to expand the entity;
a module for extracting data streams from the query description and the entity description;
a module for applying word segmentation to the data streams to obtain a word stream;
a module for extracting word-frequency features and semantic features from the word stream, and using the extracted features together as the input of a learning-to-rank method to obtain a ranked sequence of entities ordered by the similarity between query and entity.
10. The device according to claim 9, characterized in that the description information about the entity is collected from external resources using multithreading, a crawler proxy pool, cross-referencing among multiple search engines, and vertical-website crawling.
CN201710252110.0A 2017-04-17 2017-04-17 Entity semantics and word frequency ordering method and device based on comprehensive knowledge Expired - Fee Related CN107066589B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710252110.0A CN107066589B (en) 2017-04-17 2017-04-17 Entity semantics and word frequency ordering method and device based on comprehensive knowledge

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710252110.0A CN107066589B (en) 2017-04-17 2017-04-17 Entity semantics and word frequency ordering method and device based on comprehensive knowledge

Publications (2)

Publication Number Publication Date
CN107066589A (en) 2017-08-18
CN107066589B (en) 2020-04-10

Family

ID=59600458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710252110.0A Expired - Fee Related CN107066589B (en) 2017-04-17 2017-04-17 Entity semantics and word frequency ordering method and device based on comprehensive knowledge

Country Status (1)

Country Link
CN (1) CN107066589B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101059806A (en) * 2007-06-06 2007-10-24 华东师范大学 Word sense based local file searching method
CN104199809A (en) * 2014-04-24 2014-12-10 江苏大学 Semantic representation method for patent text vectors
CN104408148A (en) * 2014-12-03 2015-03-11 复旦大学 Field encyclopedia establishment system based on general encyclopedia websites

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108256968A (en) * 2018-01-12 2018-07-06 湖南大学 A kind of electric business platform commodity comment of experts generation method
CN108256968B (en) * 2018-01-12 2022-03-18 湖南大学 E-commerce platform commodity expert comment generation method
CN108460016A (en) * 2018-02-09 2018-08-28 中云开源数据技术(上海)有限公司 A kind of entity name analysis recognition method
CN108733745B (en) * 2018-03-30 2021-10-15 华东师范大学 Query expansion method based on medical knowledge
CN108733745A (en) * 2018-03-30 2018-11-02 华东师范大学 A kind of enquiry expanding method based on medical knowledge
CN112487195B (en) * 2019-09-12 2023-06-27 医渡云(北京)技术有限公司 Entity ordering method, entity ordering device, entity ordering medium and electronic equipment
CN112487195A (en) * 2019-09-12 2021-03-12 医渡云(北京)技术有限公司 Entity sorting method, device, medium and electronic equipment
CN110765239A (en) * 2019-10-29 2020-02-07 腾讯科技(深圳)有限公司 Hot word recognition method, device and storage medium
CN110765239B (en) * 2019-10-29 2023-03-28 腾讯科技(深圳)有限公司 Hot word recognition method, device and storage medium
CN111159348A (en) * 2019-12-30 2020-05-15 苏州电力设计研究院有限公司 User behavior intention mining method based on entity search words
CN111159348B (en) * 2019-12-30 2023-10-20 苏州电力设计研究院有限公司 User behavior intention mining method based on entity retrieval words
CN111401052A (en) * 2020-04-24 2020-07-10 南京莱科智能工程研究院有限公司 Semantic understanding-based multilingual text matching method and system
CN112948537A (en) * 2021-01-25 2021-06-11 昆明理工大学 Cross-border national culture text retrieval method integrating document word weight
CN114064855A (en) * 2021-11-10 2022-02-18 国电南瑞南京控制系统有限公司 Information retrieval method and system based on transformer knowledge base

Also Published As

Publication number Publication date
CN107066589B (en) 2020-04-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200410

Termination date: 20210417