CN103020311A - Method and system for processing user search terms - Google Patents

Publication number
CN103020311A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013100058046A
Other languages
Chinese (zh)
Other versions
CN103020311B (en)
Inventor
车天文
雷大伟
石志伟
周步恋
杨振东
王更生
王喜民
何宏靖
徐忆苏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen easou world Polytron Technologies Inc
Original Assignee
Shenzhen Yisou Science & Technology Development Co Ltd
Application filed by Shenzhen Yisou Science & Technology Development Co Ltd
Priority to CN201310005804.6A
Publication of CN103020311A
Application granted
Publication of CN103020311B

Abstract

The invention belongs to the field of information retrieval and provides a method for processing user search terms. The method comprises the following steps: establishing a resource library related to identifying the core words of a user search; performing basic layering on the search terms input by a user; performing entity introduction on the search terms after basic layering; and outputting the hierarchy of the identified search terms. The invention also provides a system for processing user search terms. The technical scheme ensures the accuracy of entity extraction, and avoids both the local-optimum problem caused by assessing levels from vocabulary alone and the poor recognition of special entities caused by relying only on analysis of the overall sentence structure. Finally, the core words of the query are further refined by means of subordination relations, so that the core vocabulary of the user's sentence is identified and the search engine is given as much supporting information as possible. At the same time, the method does not depend entirely on result information from the online search engine, and is therefore easier to operate and implement.

Description

A method and system for processing user search terms
Technical field
The present invention relates to the field of information retrieval, and in particular to a method and system for processing user search terms.
Background art
The emergence of search engines has given users a tool for searching and obtaining information from massive amounts of data. However, not every user understands how a search engine works, so most users simply compose their own query statements and assume that the more words and the more detail they enter, the more satisfactory the results will be. In fact this is not necessarily so. On the one hand, for performance reasons a search engine imposes a maximum length on the query statement input by the user; anything beyond that length is truncated, and only part of the query is used for retrieval. On the other hand, the engine returns any result that contains the search terms, so the results include a large amount of irrelevant information, accuracy is low, and the user's real intention is missed.
In addition, current search engines introduce merchant advertisements according to the user's input as a source of revenue. Sometimes, however, the advertisements retrieved have nothing to do with the information the user entered. The main reason, again, is that the search engine fails to identify the user's core demand and merely hits part of the query words.
Therefore, to make search results better satisfy the user's requirement and come closer to the user's essential demand, the retrieval information input by the user must be understood. Given the complexity of real language, the query statement input by a user contains many qualifying words that have little practical significance for retrieval. The search engine therefore needs to identify the core, or trunk, of the query, so that what is hit in the search results is the core words (trunk words) of the user's query statement rather than discard words or modifiers of little significance. How to extract the corresponding core words from the user's search demand has become one of the urgent problems in query (Query) analysis in current search engines.
When a user inputs a query statement, the search engine can analyze it automatically. It identifies the core words of the input, which must be hit before a search result is returned, and it identifies the discard words and modifiers of the input, for which it makes little difference whether they are hit or not. In this way the results displayed (including advertisements) can better satisfy the user's core demand.
To date there are few schemes for identifying the core words of a user search in a search engine. They fall into roughly two kinds: one extracts the core words after the fact from click information on search results; the other analyzes Chinese semantics based on a word framework.
For example, Chinese patent CN102043845A provides a method and apparatus for extracting core keywords based on query sequence clusters. When a large number of search demands in the network lead to the same search results being clicked by users, these search demands often reflect the same theme. By obtaining a query sequence cluster made up of multiple query sequences, each of which corresponds to at least one search result clicked by the same users, the corresponding core keywords are extracted and the search demand of the users who input the query sequences in that cluster is obtained. Based on these core keywords, closer search suggestions or related search demands can also be offered so that users obtain a better search experience. Its weaknesses are as follows. First, it places high demands on the search engine: performance and effect must be stable and the search results must basically satisfy user demand; only then are the users' click results reliable and the analysis based on them consistent with the users' actual needs. Second, search results are generally obtained after the user's query has been processed, for example by Query expansion or Query synonym substitution, so the results do not necessarily contain the user's search terms, and the core words of the user's search cannot be extracted directly.
As another example, Chinese patent CN102681982A, a method for automatic semantic recognition of natural language sentences that a computer can understand, proposes a way for a computer to understand Chinese accurately. It abandons the earlier approach of selecting and extracting individual words and, starting from the language features of Chinese, uses a word framework to let the computer know precisely the language content input by the operator and to analyze the exact meaning of a Chinese sentence. First, an ontology library is established for a given field by collecting all unambiguous, precisely descriptive words of that field (comprising a domain-knowledge ontology library and a general-term ontology library). Then a semantic-frame knowledge base is established based on the understanding of natural language sentences and on the domain ontology. Finally, ontology mapping based on the semantic frame matches natural language sentences directly to semantic structures. Its weaknesses are as follows. First, information on the Internet grows sharply every day, new terms keep appearing, and common words gradually acquire new meanings; whether such a word is a core word or an auxiliary modifier depends on the user's query statement and cannot be decided in general. Second, the semantic-frame knowledge base resembles a set of rules; it is enormous, cannot be compiled quickly, and its effect still needs further investigation and improvement.
Identifying the core words of a user search after the fact, from the search itself, first places high demands on the search engine: it is only viable when system performance is stable and the search effect is reasonably good. Second, it depends too heavily on search results and user reactions and easily introduces unnecessary noise (such as advertisements and other information); moreover, search results are obtained through various transformations, do not necessarily contain the user's search terms, and do not necessarily correspond directly to the query statement. Furthermore, results obtained offline can only serve as a reference when later users input identical or similar queries, so recall is low.
The core-word identification method based on building a semantic-frame knowledge base handles special entities poorly and cannot reliably distinguish entity words that also have a common, everyday meaning. The semantic-frame knowledge base consists of rules composed of various kinds of words; summarizing these rules takes a long time, and their effect can only be improved gradually.
Summary of the invention
The technical problem solved by the present invention is to provide a method and system for processing user search terms, so as to address the current inability to identify the core words of a user search.
To solve the above problem, an embodiment of the invention provides a method for processing user search terms, comprising:
establishing a resource library related to identifying the core words of a user search;
performing basic layering on the search terms input by the user;
performing entity introduction on the search terms after the basic layering;
outputting the hierarchy of the identified search terms.
In the above method, the resource library related to identifying the core words of a user search is a series of word lists related to identifying those core words, comprising a stop-word list, a modifier list, and an entity resource dictionary.
In the above method, performing basic layering on the search terms input by the user comprises:
after the user's query statement is segmented into words, obtaining a series of query terms term and parts of speech pos, namely term[1]_pos[1], term[2]_pos[2], ..., term[n]_pos[n], where term[i] is the i-th term and pos[i] is its part of speech;
performing basic layering on the query terms input by the user using the stop-word list and modifier list of the resource library together with the parts of speech of the terms, as follows:
$$
level[i] =
\begin{cases}
0, & term[i] \in stopwordList \ ||\ pos[i] \in cposList \\
1, & term[i] \in modifywordList \\
2, & \text{other}
\end{cases}
\qquad i = 1, 2, \ldots, n
$$
where term[i] denotes the i-th search term, level[i] is its level, stopwordList is the stop-word list, modifywordList is the modifier list, requirewordList is the demand-word list, and cposList is a list of unimportant parts of speech, including but not limited to adjectives, adverbs, prepositions, interjections, auxiliary words, modal particles, conjunctions, and symbols;
if term[i] belongs to the stop-word list or its part of speech belongs to cposList, level[i] is 0; if term[i] belongs to the modifier list, level[i] is 1; otherwise level[i] is 2.
In the above method, performing entity introduction on the search terms after the basic layering comprises:
extracting the actual entity vocabulary set entityList from the entity dictionary in combination with the user's query statement;
$$
level[i] =
\begin{cases}
2, & term[i] \in entityList \\
level[i], & \text{other}
\end{cases}
\qquad i = 1, 2, \ldots, n
$$
where term[i] denotes the i-th search term, level[i] is its level, and entityList is the set of extracted entities.
In the above method, extracting the actual entity vocabulary set entityList from the entity dictionary in combination with the user's query statement comprises:
taking the category of the user's query into account, and extracting the entity word when the category of the entity is related to the query category; or
extracting entity words using sentence-pattern rules.
Further, before outputting the hierarchy of the identified user search terms, the above method also comprises:
performing sentence-pattern syntactic analysis on the user search terms; and/or
performing subordination-relation identification on the user search terms.
An embodiment of the invention also provides a system for processing user search terms, comprising:
a resource library establishing module, used to establish a resource library related to identifying the core words of a user search;
a basic layering module, used to perform basic layering on the search terms input by the user;
an entity introduction module, used to perform entity introduction on the search terms after the basic layering;
an output module, used to output the hierarchy of the identified search terms.
In the above system, the resource library related to identifying the core words of a user search is a series of word lists related to identifying those core words, comprising a stop-word list, a modifier list, and an entity resource dictionary.
In the above system, performing basic layering on the search terms input by the user specifically comprises:
the basic layering module being used, after the user's query statement is segmented into words, to obtain a series of query terms term and parts of speech pos, namely term[1]_pos[1], term[2]_pos[2], ..., term[n]_pos[n], where term[i] is the i-th term and pos[i] is its part of speech;
and being used to perform basic layering on the query terms input by the user using the stop-word list and modifier list of the resource library together with the parts of speech of the terms, as follows:
$$
level[i] =
\begin{cases}
0, & term[i] \in stopwordList \ ||\ pos[i] \in cposList \\
1, & term[i] \in modifywordList \\
2, & \text{other}
\end{cases}
\qquad i = 1, 2, \ldots, n
$$
where term[i] denotes the i-th search term, level[i] is its level, stopwordList is the stop-word list, modifywordList is the modifier list, requirewordList is the demand-word list, and cposList is a list of unimportant parts of speech, including but not limited to adjectives, adverbs, prepositions, interjections, auxiliary words, modal particles, conjunctions, and symbols;
if term[i] belongs to the stop-word list or its part of speech belongs to cposList, level[i] is 0; if term[i] belongs to the modifier list, level[i] is 1; otherwise level[i] is 2.
Further, the above system also comprises:
a sentence-pattern syntactic analysis module, used to perform sentence-pattern syntactic analysis on the user search terms;
a subordination-relation identification module, used to perform subordination-relation identification on the user search terms.
The technical scheme of the present invention considers both the lexical features of the query statement and the special role of entity words. It introduces entities and performs entity disambiguation, which ensures the accuracy of entity extraction, and it analyzes the user's query statement as a whole through sentence-pattern syntactic analysis. This avoids the local-optimum problem caused by assessing levels from vocabulary alone, as well as the poor recognition of special entities caused by relying only on analysis of the overall sentence structure. Finally, the core words of the query statement are further refined through subordination relations, so that the core vocabulary of the user's sentence is identified and the search engine is given as much supporting information as possible. At the same time, the scheme does not depend entirely on result information from the online search engine and is easier to operate and implement.
Description of drawings
The accompanying drawings described here provide a further understanding of the present invention and form a part of it. The illustrative embodiments of the invention and their description are used to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the first embodiment of the invention;
Fig. 2 is a structural diagram of the second embodiment of the invention.
Embodiments
To make the technical problem to be solved, the technical scheme, and the beneficial effects of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
When searching, a user inputs a query statement as needed; in general, a query statement consists of several search terms. Given the richness and complexity of the Chinese language, the statements users input are highly varied, and users do not spare words in describing their demand in detail. In fact, however, many of these words only assist analysis by making the expressed meaning more definite, and have little practical significance for retrieval. In an embodiment of the invention, the search terms contained in the user's query statement are divided into four levels:
Discard words: words without practical meaning, such as stop words and punctuation marks. They can be dropped directly and left out of the search query, which improves retrieval efficiency without harming the retrieval effect.
Modifiers: words of a modifying nature that the user employs when expressing a meaning. They are not decisive and merely enrich the semantics; they may or may not be hit in the search results.
Core words: the core of the user's query statement, the words that express the user's search demand. They must be hit in a search result before that result is returned to the user.
Demand words: an attribute of the thing the user actually needs, generally a supplement to or emphasis of the user's demand, such as "download", "song", "lyrics", or "film". If a demand word is hit in a search result, so much the better: it indicates that the result describes this kind of resource, and the result is ranked higher.
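The four levels can be represented as a small enumeration. This is only an illustrative sketch: the names TermLevel, DISCARD, MODIFIER, CORE, and DEMAND are assumptions, the values 0 to 2 follow the level values used in the basic layering formula of this description, and the value 3 for demand words is an assumption, since the description assigns no number to that level.

```python
from enum import IntEnum


class TermLevel(IntEnum):
    """Hypothetical encoding of the four term levels."""
    DISCARD = 0   # stop words, punctuation: may be dropped from the query
    MODIFIER = 1  # enriches the semantics; may or may not be hit
    CORE = 2      # must be hit for a result to be returned
    DEMAND = 3    # assumption: the description gives no numeric value


# Levels compare like integers, so an upgrade is a simple assignment.
level = TermLevel.DISCARD
level = TermLevel.CORE
```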
Fig. 1 is a flowchart of the first embodiment of the invention, which provides a method for processing user search terms, specifically comprising:
Step S101: establish a resource library related to identifying the core words of a user search.
The resource library is a series of word lists related to identifying the core words of a user search, comprising a stop-word list (stopwordList), a modifier list (modifywordList), and an entity resource dictionary (dicResource).
The stop-word list contains the common stop words of Chinese, such as "in" and "what". The modifier list contains common modifiers, such as "beauty" and "good-looking". The entity resource dictionary contains the names of current resources of all kinds, such as channel resources like novel names, software names, and film names, together with their categories. These can be mined from query logs or crawled from vertical websites, extracting the required information so that the resource information in the library is as complete as possible.
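One possible in-memory layout for such a resource library is sketched below. It is not the patented implementation: the class ResourceBank and its method names are assumptions, and the entries loaded in the usage lines are English glosses of examples from this description.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ResourceBank:
    """Hypothetical container for the three word lists of step S101."""
    stopword_list: Set[str] = field(default_factory=set)
    modifyword_list: Set[str] = field(default_factory=set)
    # entity name -> set of categories (a name may belong to several)
    dic_resource: Dict[str, Set[str]] = field(default_factory=dict)

    def add_resource(self, name: str, category: str) -> None:
        """Register an entity name under a category."""
        self.dic_resource.setdefault(name, set()).add(category)

    def categories_of(self, name: str) -> Set[str]:
        """Return the categories of a name, or an empty set if unknown."""
        return self.dic_resource.get(name, set())


bank = ResourceBank(stopword_list={"in", "what"},
                    modifyword_list={"beauty", "good-looking"})
bank.add_resource("why", "song")
```

Keeping a set of categories per name allows the same string to be, for example, both a song and a film title.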
Step S102: perform basic layering on the search terms input by the user.
When the user inputs a query statement, segmenting it into words yields a series of query terms term and parts of speech pos: term[1]_pos[1], term[2]_pos[2], ..., term[n]_pos[n], where term[i] is the i-th term and pos[i] is its part of speech.
Basic layering of the query terms input by the user is performed using the stop-word list and modifier list of the resource library together with the parts of speech of the terms, as follows:
$$
level[i] =
\begin{cases}
0, & term[i] \in stopwordList \ ||\ pos[i] \in cposList \\
1, & term[i] \in modifywordList \\
2, & \text{other}
\end{cases}
\qquad i = 1, 2, \ldots, n
$$
where term[i] denotes the i-th search term, level[i] is its level, stopwordList is the stop-word list, modifywordList is the modifier list, requirewordList is the demand-word list, and cposList is a list of unimportant parts of speech, including but not limited to adjectives, adverbs, prepositions, interjections, auxiliary words, modal particles, conjunctions, and symbols.
If term[i] belongs to the stop-word list or its part of speech belongs to cposList, level[i] is 0 (discard word); if term[i] belongs to the modifier list, level[i] is 1 (modifier); otherwise level[i] is 2 (core word).
This step assigns an initial level to each term of the user's query.
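The basic layering rule can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name basic_layering is hypothetical, word segmentation and part-of-speech tagging are assumed to have been done already, and the tag strings "r", "n", "a", "u", and "w" are placeholders rather than the tag set of any particular segmenter.

```python
def basic_layering(terms, pos_tags, stopword_list, modifyword_list, cpos_list):
    """Assign an initial level (0, 1 or 2) to each segmented term."""
    levels = []
    for term, pos in zip(terms, pos_tags):
        if term in stopword_list or pos in cpos_list:
            levels.append(0)  # discard word
        elif term in modifyword_list:
            levels.append(1)  # modifier
        else:
            levels.append(2)  # core word
    return levels


levels = basic_layering(
    terms=["why", "mobile phone", "good-looking"],
    pos_tags=["r", "n", "a"],          # hypothetical tags
    stopword_list={"why"},
    modifyword_list={"good-looking"},
    cpos_list={"u", "w"},              # hypothetical unimportant tags
)
```

The cases are tested in the order given by the formula, so a term whose part of speech is in cposList receives level 0 even if it also appears in the modifier list.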
Step S103: perform entity introduction on the search terms after basic layering.
The terms in a user's query statement differ in importance and level. In distinguishing the more important, more representative vocabulary, entity words are comparatively the most important and generally best reflect the user's original demand. If the query statement contains an entity word, its role should be highlighted.
Entity introduction mainly "fishes out" important words that basic layering assigned to the modifier or discard level and gives them an important level again.
Given the importance and complexity of entities, whether a word is an entity must be decided in combination with the user's input itself. For example, "why" is a very common word, but it may also exist in the entity dictionary with the category song. Distinguishing the entity words among such words, especially ambiguous ones, is the most important part of this step and can be called entity disambiguation.
Two methods of extracting entities are considered. The first takes the category of the user's query into account: an entity is extracted if its category is related to the query category, and is not extracted otherwise.
Specifically, the first method uses external information, namely the Query category (the category of the user's query statement), which is commonly available in search engines. For example, for the user query "the comedy time of Zhou Xingchi download", the Query category is the download class; for "May song why audition", the Query category is the song class; for "why mobile phone does not connect computer", the Query category is the question-and-answer class.
Entity extraction from the query statement uses exactly this category information. "Time" is a common word, but in the user query above it is actually the name of a film, that is, an entity name whose entity category is the film class (candidate entity words in the user's statement and their entity categories are obtained by looking up the entity resource dictionary described above). When the Query category (download class) is related to the category of the entity (film), the entity is extracted. Similarly, "why" is a stop word and is assigned to the discard level in basic layering, but through the entity resource dictionary it also appears as a candidate entity of the song class (there is a song named "why"). When the Query category (song class) is related to the entity category (song), "why" is treated as an entity. In "why mobile phone does not connect computer", "why" also appears as a candidate entity, but the Query category (question-and-answer class) is not related to the entity category (song), so it is not treated as an entity.
This association can be configured flexibly by hand in an association table that records which entity categories each Query category may be related to, for example "download class: song, film, TV series, game, software"; "song class: song"; "video class: film, TV series, animation"; and so on.
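The category-based disambiguation can be sketched as a table lookup. This is a minimal illustration, not the patented implementation: the names QUERY_TO_ENTITY_CATEGORIES and extract_entities_by_category, the data layout, and the English category labels are assumptions; only the association entries come from the example in the text, and the query classifier that produces the Query category is assumed to exist elsewhere.

```python
# Association table: Query category -> entity categories it may relate to.
QUERY_TO_ENTITY_CATEGORIES = {
    "download": {"song", "film", "TV series", "game", "software"},
    "song": {"song"},
    "video": {"film", "TV series", "animation"},
}


def extract_entities_by_category(candidates, query_category):
    """Keep candidate entities whose category relates to the Query category.

    candidates maps each candidate term found in the entity resource
    dictionary to the set of entity categories it is listed under.
    """
    allowed = QUERY_TO_ENTITY_CATEGORIES.get(query_category, set())
    return [term for term, cats in candidates.items() if cats & allowed]


candidates = {"why": {"song"}}
as_song_query = extract_entities_by_category(candidates, "song")
as_qa_query = extract_entities_by_category(candidates, "question and answer")
```

With these inputs, "why" is kept for a song-class query and rejected for a question-and-answer query, mirroring the two examples in the text.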
Of course, in practice not every Query has a category. What if the user's query statement has no category? Experience shows that if a Query contains an obvious entity word, its category can usually be determined. If the category really cannot be determined, candidates can be preferred directly according to the length of the candidate entity and the number of words it is segmented into, so as to ensure accuracy.
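A possible fallback for queries without a category is sketched below. The text only says that candidates are preferred by length and by the number of segmented words; reading this as "prefer the longer candidate, then the one made of more words" is an assumption, as are the function name and the tuple layout.

```python
def pick_entity_without_category(candidates):
    """Pick one candidate entity when no Query category is available.

    candidates is a list of (entity_text, tokens) pairs, where tokens is
    the list of words the entity text was segmented into.
    Assumption: longer text wins, ties are broken by token count.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: (len(c[0]), len(c[1])))
    return best[0]


chosen = pick_entity_without_category([
    ("why", ["why"]),
    ("because love", ["because", "love"]),
])
```

A production system would probably also apply minimum thresholds so that a lone short, common word is not accepted at all; the description does not specify such thresholds.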
The main purpose of entity introduction is to "fish out" core words. After basic layering the terms have a rough initial layering, but ordinary common words may have been placed at the discard or modifier level, and closer analysis may show that such a word is in fact an important entity word. Such words are therefore "fished out" and given the core-word level. For example, "because love" is segmented into "because" and "love"; "because" is very common and is assigned the discard level in basic layering, but it is part of an entity (the song "because love"), so this step gives it the core-word level. As stated above, the most important work in entity introduction is entity disambiguation, that is, extracting the entities that are really useful while introducing as little noise as possible, so as to ensure recall and accuracy. Two methods are proposed for this step.
The first method relies on the external Query category, and its accuracy is higher.
2) Extraction using sentence-pattern rules, such as (person name | demand word) + word T, or (person name) word T + (demand word): if T appears in the entity dictionary, it is extracted. For example, for the user queries "the song why by Cai Zhuoyan" and "song why", "why" can be treated as an entity.
The second method works directly from rules: an entity word generally appears together with a person name or a demand word (song, film, and so on), especially when the entity word also has a common meaning. In "song why" above, "why" is an entity, whereas in "why mobile phone does not connect computer", "why" is not an entity. This method is simple to implement.
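The rule-based method can be sketched over a list of segmented terms. This is a simplified reading of the two patterns, not the patented implementation: a term T found in the entity dictionary is accepted if the term immediately before it is a person name or a demand word, or if the term immediately after it is a demand word. The function name and the word lists passed in are assumptions.

```python
def extract_by_pattern(terms, person_names, demand_words, entity_dict):
    """Extract entity words that sit next to a person name or demand word."""
    entities = []
    for i, term in enumerate(terms):
        if term not in entity_dict:
            continue
        prev_term = terms[i - 1] if i > 0 else None
        next_term = terms[i + 1] if i + 1 < len(terms) else None
        preceded = prev_term in person_names or prev_term in demand_words
        followed = next_term in demand_words
        if preceded or followed:
            entities.append(term)
    return entities


demand_words = {"song", "film"}
entity_dict = {"why"}
hit = extract_by_pattern(["song", "why"], set(), demand_words, entity_dict)
miss = extract_by_pattern(
    ["why", "mobile phone", "not", "connect", "computer"],
    set(), demand_words, entity_dict,
)
```

Checking only the immediate neighbours keeps the rule cheap; function words between the demand word and T would need to be skipped in a fuller version.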
The actual entity vocabulary set entityList is extracted from the entity dictionary in combination with the user's query statement, and the levels are updated as follows:
$$
level[i] =
\begin{cases}
2, & term[i] \in entityList \\
level[i], & \text{other}
\end{cases}
\qquad i = 1, 2, \ldots, n
$$
where term[i] denotes the i-th search term, level[i] is its level, and entityList is the set of extracted entities.
This step raises the level of the entity vocabulary contained in the user's query statement (which basic layering may have assigned to the discard or modifier level), so as to highlight the user's intention.
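The level update of entity introduction is a single pass over the terms. The sketch below assumes that the levels from basic layering and the extracted entityList are already available; the function name introduce_entities is hypothetical.

```python
def introduce_entities(terms, levels, entity_list):
    """Raise every term found in entity_list to the core-word level (2)."""
    return [2 if term in entity_list else level
            for term, level in zip(terms, levels)]


# "why" was a discard word (0) after basic layering, but it was
# extracted as a song entity, so it is promoted to a core word.
updated = introduce_entities(["song", "why"], [2, 0], {"why"})
```

Terms that are not entities keep the level they received in basic layering.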
Step S106: output the hierarchy of the identified user search terms.
For each query statement, the above steps finally yield the level of every term the statement contains, that is, whether the term is a demand word, a core word, a modifier, or a discard word.
The above steps basically complete the identification of the user's search terms. To achieve a better effect, the embodiment of the invention may further include the following two steps, S104 and S105, which may be performed in either order; alternatively, only one of them may be used:
Step S104: perform sentence-pattern syntactic analysis on the user search terms.
The two steps above, basic layering and entity introduction, layer the vocabulary input by the user, but both do so from the angle of individual words. The query statements users input contain many fixed sentence patterns, and sentence-pattern rules can assist the layering. Examples are: (from) $Adress .* $Adress; (.* mobile phone) .* download; (discuss) .* and .* (relation); (take) .* as (topic) composition. The vocabulary inside the parentheses can be given the modifier level.
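Sentence-pattern rules of this kind map naturally onto regular expressions with capture groups. The sketch below is an illustration under stated assumptions: the patterns are English glosses of two of the examples above, the terms are joined with single spaces before matching, and the function name apply_sentence_patterns and the placeholder term "brandX" are hypothetical.

```python
import re

# Capture groups mark the words that receive the modifier level (1).
PATTERNS = [
    re.compile(r"^(discuss) .* and .* (relation)$"),
    re.compile(r"^(.* mobile phone) .* download$"),
]


def apply_sentence_patterns(terms, levels, patterns=PATTERNS):
    """Set level 1 for every term that lies inside a matched group."""
    text = " ".join(terms)
    # Character span of each term inside the joined text.
    spans, pos = [], 0
    for term in terms:
        spans.append((pos, pos + len(term)))
        pos += len(term) + 1
    new_levels = list(levels)
    for pattern in patterns:
        match = pattern.search(text)
        if match is None:
            continue
        for g in range(1, pattern.groups + 1):
            if match.group(g) is None:
                continue
            g_start, g_end = match.span(g)
            for i, (start, end) in enumerate(spans):
                if start >= g_start and end <= g_end:
                    new_levels[i] = 1
    return new_levels


adjusted = apply_sentence_patterns(
    ["discuss", "A", "and", "B", "relation"], [2, 2, 0, 2, 2])
```

Mapping group spans back to term spans keeps the adjustment aligned with the segmenter's output even when a group covers several terms.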
In addition, dependency parsing can be applied to the user's query statement to analyze its composition and obtain the words it contains and the dependency relations between them. Using particular sentence structures, the hierarchy of the vocabulary is then adjusted from the angle of the sentence.
This step grasps the user's input statement as a whole and adjusts the levels of the vocabulary accordingly.
Step S105: perform subordination-relation identification on the user search terms.
As an embodiment, subordination relations are divided into two classes: regional subordination and industry (field) subordination.
Regional subordination is subordination of geographic position. When two place names stand in a subordinate (superior and subordinate) relation, the superior place name is adjusted to a modifier so as to highlight the core place name. For example, in "Haidian, Beijing", Haidian belongs to Beijing, so "Haidian" is more likely to be the core word than "Beijing", and "Beijing" is adjusted to a modifier here. Place-name codes can be used to identify regional subordination.
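One way to identify regional subordination with place-name codes is prefix matching on hierarchical codes. The sketch below is an assumption throughout: the function name is hypothetical, and the codes "11" and "1108" are made-up values chosen only so that the superior region's code is a prefix of the subordinate region's code; they are not taken from any official coding standard.

```python
def adjust_regional_subordination(terms, levels, place_codes):
    """Demote a superior place name to modifier (1) when a subordinate
    place name from the same query is also present.

    place_codes maps place names to hierarchical codes in which the
    code of a superior region is a prefix of its subordinates' codes.
    """
    new_levels = list(levels)
    places = [(i, place_codes[t]) for i, t in enumerate(terms)
              if t in place_codes]
    for i, code_i in places:
        for j, code_j in places:
            if i != j and code_j != code_i and code_j.startswith(code_i):
                new_levels[i] = 1  # term i is the superior region
    return new_levels


place_codes = {"Beijing": "11", "Haidian": "1108"}  # hypothetical codes
adjusted = adjust_regional_subordination(
    ["Beijing", "Haidian"], [2, 2], place_codes)
```

Two place names that are not in a superior and subordinate relation are left unchanged.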
Field subordination concerns the category to which an entity name belongs, such as the TV series class, the film class, or the song class; this information comes from the entity dictionary described above. After the entity recognition of step S103, if a word related to the entity's category appears before or after the entity, that word is adjusted to a demand word. In essence, a demand word indicates an attribute of the thing the user is searching for, so it is tied to a specific entity and generally accompanies it. Therefore, after the entities have been identified, subordination is checked to decide whether a demand word is present. For example, in "song of Liu Dehua lustily water", "lustily water" belongs to the category "song", so the word "song" is adjusted to a demand word, and the core words are "Liu Dehua" and "lustily water". In this way the core vocabulary of the user's input statement is made explicit, and the user's essential demand (song) is also made explicit, which can be used to optimize the ranking of results. By contrast, for the user input "film of Liu Dehua" no subordination relation is identified, so the word "film" remains a core word and is not identified as a demand word; otherwise the retrieved results might have nothing to do with films.
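Field subordination can be sketched as a check on the neighbours of each recognized entity. This is a simplified illustration: the function name is hypothetical, only the immediately adjacent terms are inspected, and the numeric value 3 for the demand-word level is an assumption, since the description assigns numbers only to levels 0 to 2.

```python
DEMAND = 3  # assumption: no numeric value is given for demand words


def adjust_field_subordination(terms, levels, entity_categories,
                               category_words):
    """Mark a category word next to an entity of that category as a
    demand word.

    entity_categories maps recognized entity terms to their categories;
    category_words maps each category to the words that express it.
    """
    new_levels = list(levels)
    for i, term in enumerate(terms):
        for category in entity_categories.get(term, ()):
            related = category_words.get(category, set())
            for j in (i - 1, i + 1):
                if 0 <= j < len(terms) and terms[j] in related:
                    new_levels[j] = DEMAND
    return new_levels


category_words = {"song": {"song"}, "film": {"film"}}
with_entity = adjust_field_subordination(
    ["Liu Dehua", "song", "lustily water"], [2, 2, 2],
    {"lustily water": {"song"}}, category_words)
without_entity = adjust_field_subordination(
    ["Liu Dehua", "film"], [2, 2], {}, category_words)
```

In the second call no entity of the film class is present, so "film" keeps its core-word level, as in the example in the text.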
As shown in Figure 2, which is a structural diagram of the second embodiment of the invention, a system for processing user search terms is provided, comprising:
A resource pool establishment module 201, configured to establish a resource pool related to identifying the core words of a user search;
A basic layering module 202, configured to perform basic layering on the search terms input by the user;
An entity introduction module 203, configured to perform entity introduction on the search terms after the basic layering;
An output module 204, configured to output the hierarchical structure of the identified search terms.
Further, the resource pool related to identifying the core words of a user search comprises a series of vocabularies related to identifying those core words, including a stop-word list, a modifier-word list and an entity resource dictionary.
Further, the basic layering module being configured to perform basic layering on the search terms input by the user specifically comprises:
The basic layering module is configured to segment the user's search statement into words, which yields a series of query words term with parts of speech pos, namely term[1]_pos[1], term[2]_pos[2], ..., term[n]_pos[n], where term[i] is the i-th word and pos[i] is its part of speech;
and is configured to perform basic layering on the query words input by the user, using the stop-word list and the modifier-word list of the resource pool together with the parts of speech of the words, as follows:
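$$
\mathit{level}[i] = \begin{cases} 0, & \mathit{term}[i] \in \mathit{stopwordList} \;\|\; \mathit{pos}[i] \in \mathit{cposList} \\ 1, & \mathit{term}[i] \in \mathit{modifywordList} \\ 2, & \text{other} \end{cases} \qquad i = 1, 2, \ldots, n
$$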
Here term[i] denotes the i-th search term, level[i] is its level, stopwordList is the stop-word list, requirewordList is the demand-word list, and cposList is a list of unimportant parts of speech, including but not limited to adjectives, adverbs, prepositions, interjections, auxiliary words, modal particles, conjunctions and symbols.
If term[i] belongs to the stop-word list or its part of speech belongs to cposList, level[i] is 0; if term[i] is a modifier, level[i] is 1; in all other cases level[i] is 2.
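The basic layering rule can be illustrated with a short sketch. The word lists and part-of-speech tags below are hypothetical placeholders, and the query is assumed to be already segmented and tagged; only the three-way level assignment comes from the description.

```python
# Hypothetical resource-pool contents; real lists would be much larger.
STOPWORD_LIST = {"the", "of", "a"}
MODIFYWORD_LIST = {"download", "online"}
# Hypothetical tags for the unimportant parts of speech (adjective, adverb, ...).
CPOS_LIST = {"adj", "adv", "prep", "interj", "aux", "modal", "conj", "sym"}


def basic_level(term, pos):
    """Return 0 for stop words or unimportant parts of speech, 1 for modifiers, 2 otherwise."""
    if term in STOPWORD_LIST or pos in CPOS_LIST:
        return 0
    if term in MODIFYWORD_LIST:
        return 1
    return 2


def basic_layering(tagged_terms):
    """Apply the rule to a list of (term, pos) pairs produced by word segmentation."""
    return [basic_level(term, pos) for term, pos in tagged_terms]


query = [("the", "det"), ("free", "adj"), ("download", "v"), ("movie", "n")]
print(basic_layering(query))
```

Note that the stop-word and part-of-speech test is applied first, so a modifier whose part of speech is in cposList is placed at level 0 rather than level 1, which matches the order of the cases in the formula.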
Further, the system also comprises:
A sentence-pattern syntactic analysis module, configured to perform sentence-pattern syntactic analysis on the user search terms; and/or
A subordinate relation identification module, configured to perform subordinate relation identification on the user search terms.
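How modules 201 to 204 might fit together can be sketched as follows. The class and function names are hypothetical, entity extraction is simplified to a plain dictionary lookup (the category check and sentence-pattern extraction are omitted), and the optional syntactic-analysis and subordinate-relation modules are left out. The entity-introduction step sets level[i] to 2 for terms in the extracted entity set and leaves other levels unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Module 201: vocabularies used to identify core words (contents are hypothetical)."""
    stopwords: set = field(default_factory=set)
    modifiers: set = field(default_factory=set)
    unimportant_pos: set = field(default_factory=set)
    entity_dict: set = field(default_factory=set)


def basic_layering(pool, tagged):
    """Module 202: assign level 0, 1 or 2 to each (term, pos) pair."""
    levels = []
    for term, pos in tagged:
        if term in pool.stopwords or pos in pool.unimportant_pos:
            levels.append(0)
        elif term in pool.modifiers:
            levels.append(1)
        else:
            levels.append(2)
    return levels


def introduce_entities(pool, tagged, levels):
    """Module 203: raise terms found in the extracted entity set to level 2."""
    entity_list = {term for term, _ in tagged if term in pool.entity_dict}
    return [2 if term in entity_list else lv for (term, _), lv in zip(tagged, levels)]


def process_query(pool, tagged):
    """Module 204: output each term together with its final level."""
    levels = introduce_entities(pool, tagged, basic_layering(pool, tagged))
    return [(term, lv) for (term, _), lv in zip(tagged, levels)]


pool = ResourcePool(stopwords={"of"}, modifiers={"download"},
                    unimportant_pos={"adj"}, entity_dict={"Happy"})
print(process_query(pool, [("download", "v"), ("of", "prep"), ("Happy", "adj"), ("song", "n")]))
```

In this example the entity "Happy" would be dropped to level 0 by its part of speech alone; entity introduction restores it to level 2, which is the kind of case the entity step is meant to handle.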
The above description shows and describes a preferred embodiment of the invention. As stated above, however, it should be understood that the invention is not limited to the form disclosed herein, which should not be regarded as excluding other embodiments; the invention can be used in various other combinations, modifications and environments, and can be changed within the scope of the inventive concept described herein through the above teachings or the skill or knowledge of the related art. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the invention shall all fall within the protection scope of the appended claims.

Claims (10)

1. A method for processing user search terms, characterized by comprising:
establishing a resource pool related to identifying the core words of a user search;
performing basic layering on the search terms input by the user;
performing entity introduction on the search terms after the basic layering;
outputting the hierarchical structure of the identified search terms.
2. The method according to claim 1, characterized in that the resource pool related to identifying the core words of a user search comprises a series of vocabularies related to identifying those core words, including a stop-word list, a modifier-word list and an entity resource dictionary.
3. The method according to claim 2, characterized in that performing basic layering on the search terms input by the user comprises:
segmenting the user's search statement into words, which yields a series of query words term with parts of speech pos, namely term[1]_pos[1], term[2]_pos[2], ..., term[n]_pos[n], where term[i] is the i-th word and pos[i] is its part of speech;
performing basic layering on the query words input by the user, using the stop-word list and the modifier-word list of the resource pool together with the parts of speech of the words, as follows:
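$$
\mathit{level}[i] = \begin{cases} 0, & \mathit{term}[i] \in \mathit{stopwordList} \;\|\; \mathit{pos}[i] \in \mathit{cposList} \\ 1, & \mathit{term}[i] \in \mathit{modifywordList} \\ 2, & \text{other} \end{cases} \qquad i = 1, 2, \ldots, n
$$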
where term[i] denotes the i-th search term, level[i] is its level, stopwordList is the stop-word list, requirewordList is the demand-word list, and cposList is a list of unimportant parts of speech, including but not limited to adjectives, adverbs, prepositions, interjections, auxiliary words, modal particles, conjunctions and symbols;
if term[i] belongs to the stop-word list or its part of speech belongs to cposList, level[i] is 0; if term[i] is a modifier, level[i] is 1; in all other cases level[i] is 2.
4. The method according to claim 3, characterized in that performing entity introduction on the search terms after the basic layering comprises:
extracting the actual entity word set entityList according to the entity dictionary in combination with the user's search statement;
$$
\mathit{level}[i] = \begin{cases} 2, & \mathit{term}[i] \in \mathit{entityList} \\ \mathit{level}[i], & \text{other} \end{cases} \qquad i = 1, 2, \ldots, n
$$
where term[i] denotes the i-th search term, level[i] is its level, and entityList is the extracted entity set.
5. The method according to claim 4, characterized in that extracting the actual entity word set entityList according to the entity dictionary in combination with the user's search statement comprises:
taking the category of the user's search into account, and extracting the entity word when the category of the entity is related to the category information; or,
extracting the entity word by using sentence patterns.
6. The method according to any one of claims 1 to 5, characterized in that, before outputting the hierarchical structure of the identified user search terms, the method further comprises:
performing sentence-pattern syntactic analysis on the user search terms; and/or,
performing subordinate relation identification on the user search terms.
7. A system for processing user search terms, characterized by comprising:
a resource pool establishment module, configured to establish a resource pool related to identifying the core words of a user search;
a basic layering module, configured to perform basic layering on the search terms input by the user;
an entity introduction module, configured to perform entity introduction on the search terms after the basic layering;
an output module, configured to output the hierarchical structure of the identified search terms.
8. The system according to claim 7, characterized in that the resource pool related to identifying the core words of a user search comprises a series of vocabularies related to identifying those core words, including a stop-word list, a modifier-word list and an entity resource dictionary.
9. The system according to claim 8, characterized in that the basic layering module being configured to perform basic layering on the search terms input by the user specifically comprises:
the basic layering module is configured to segment the user's search statement into words, which yields a series of query words term with parts of speech pos, namely term[1]_pos[1], term[2]_pos[2], ..., term[n]_pos[n], where term[i] is the i-th word and pos[i] is its part of speech;
and is configured to perform basic layering on the query words input by the user, using the stop-word list and the modifier-word list of the resource pool together with the parts of speech of the words, as follows:
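$$
\mathit{level}[i] = \begin{cases} 0, & \mathit{term}[i] \in \mathit{stopwordList} \;\|\; \mathit{pos}[i] \in \mathit{cposList} \\ 1, & \mathit{term}[i] \in \mathit{modifywordList} \\ 2, & \text{other} \end{cases} \qquad i = 1, 2, \ldots, n
$$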
where term[i] denotes the i-th search term, level[i] is its level, stopwordList is the stop-word list, requirewordList is the demand-word list, and cposList is a list of unimportant parts of speech, including but not limited to adjectives, adverbs, prepositions, interjections, auxiliary words, modal particles, conjunctions and symbols;
if term[i] belongs to the stop-word list or its part of speech belongs to cposList, level[i] is 0; if term[i] is a modifier, level[i] is 1; in all other cases level[i] is 2.
10. The system according to claim 9, characterized by further comprising:
a sentence-pattern syntactic analysis module, configured to perform sentence-pattern syntactic analysis on the user search terms; and/or,
a subordinate relation identification module, configured to perform subordinate relation identification on the user search terms.
CN201310005804.6A 2013-01-08 2013-01-08 Method and system for processing user search terms Active CN103020311B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310005804.6A CN103020311B (en) 2013-01-08 2013-01-08 A kind of processing method of user search word and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310005804.6A CN103020311B (en) 2013-01-08 2013-01-08 A kind of processing method of user search word and system

Publications (2)

Publication Number Publication Date
CN103020311A true CN103020311A (en) 2013-04-03
CN103020311B CN103020311B (en) 2016-05-18

Family

ID=47968914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310005804.6A Active CN103020311B (en) 2013-01-08 2013-01-08 Method and system for processing user search terms

Country Status (1)

Country Link
CN (1) CN103020311B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997012333A1 (en) * 1995-09-15 1997-04-03 Infonautics Corporation Restricted expansion of query terms using part of speech tagging
CN101510221A (en) * 2009-02-17 2009-08-19 北京大学 Enquiry statement analytical method and system for information retrieval

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
姜维: "统计中文词法分析及其强化学习机制的研究", 《中国博士学位论文全文数据库 信息科技辑》 *
齐波: "基于短语识别的自然语言理解搜索方法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
齐波: "基于短语识别的自然语言理解搜索方法研究", 《中国优秀硕士学位论文数据库》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491556A (en) * 2017-09-04 2017-12-19 湖北地信科技集团股份有限公司 Space-time total factor semantic query service system and its method
CN109492214A (en) * 2017-09-11 2019-03-19 苏州大学 The identification of attribute word and its level construction method, device, equipment and storage medium
CN109492214B (en) * 2017-09-11 2023-09-19 苏州大学 Attribute word recognition and hierarchy construction method, device, equipment and storage medium
CN107992586A (en) * 2017-12-08 2018-05-04 成都谷问信息技术有限公司 Search method based on the intelligent meaning of one's words
CN112800175A (en) * 2020-11-03 2021-05-14 广东电网有限责任公司 Cross-document searching method for knowledge entities of power system

Also Published As

Publication number Publication date
CN103020311B (en) 2016-05-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP03 Change of name, title or address

Address after: 518057 C Building 5, Nanshan District software industry base, Shenzhen, Guangdong 403-409, China

Patentee after: Shenzhen easou world Polytron Technologies Inc

Address before: 518026 Guangdong city of Shenzhen province Futian District Binhe Road and CaiTian Road Interchange Union Square Tower A, A5501-A

Patentee before: Shenzhen Yisou Science & Technology Development Co., Ltd.

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Method and system for processing user search terms

Effective date of registration: 20170918

Granted publication date: 20160518

Pledgee: Shenzhen SME financing Company limited by guarantee

Pledgor: Shenzhen easou world Polytron Technologies Inc

Registration number: 2017990000881

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20200428

Granted publication date: 20160518

Pledgee: Shenzhen SME financing Company limited by guarantee

Pledgor: Shenzhen easou world Polytron Technologies Inc

Registration number: 2017990000881
