CN104850554B

CN104850554B - Searching method and system

Info

Publication number: CN104850554B
Application number: CN201410051875.4A
Authority: CN
Inventors: 张友书; 张坤; 张阔
Original assignee: Beijing Sogou Technology Development Co Ltd
Current assignee: Beijing Sogou Technology Development Co Ltd
Priority date: 2014-02-14
Filing date: 2014-02-14
Publication date: 2020-05-19
Anticipated expiration: 2034-02-14
Also published as: CN104850554A

Abstract

The application provides a search method and a search system. The method comprises: when a query word string is received, performing semantic analysis on it to obtain a corresponding semantic expression; performing matching analysis using the semantic expression to determine the semantic tag to which each word in the current query word string belongs; rewriting the query word string according to the semantic tags; and searching with the rewritten query word string to obtain matching network information. By obtaining the semantic expression through semantic analysis, determining the semantic tag of each word that fits the current context, and rewriting the query word string based on those tags, the application better matches the user's intent, achieves a high success rate of information matching during search, and improves search quality and efficiency.

Description

Searching method and system

Technical Field

The present application relates to the field of computer technologies, and in particular, to a search method and a search system.

Background

Query rewriting modifies the original query terms that a user enters into a search engine so that better search results are returned. In the prior art, query rewriting mainly corrects user input errors. For example, when a user types a mistyped variant of the singer's name 周杰伦 (Zhou Jielun, Jay Chou), or the pinyin strings 'zoujielilun' or 'zhoujielilun', the search engine has difficulty finding the correct web pages. With query correction, an error-correction model analyzes 'zoujielilun'; because text matches for 周杰伦 account for a large share of the analysis results, the query is corrected to 周杰伦, which matches the user's original intent. The search engine can then return web pages that match the user's intent without any user intervention, improving the user experience.

Existing web search technology is mainly keyword-based. When a user enters search terms, the search engine performs Chinese word segmentation to convert them into several keywords, looks these keywords up in an inverted index of web pages, retrieves the pages that contain them, ranks the hits with a ranking algorithm that considers relevance, timeliness, user intent, and so on, and returns the page links to the user in ranked order.

Keyword-based search, i.e. the 'query -> keywords -> search' mode that relies on string matching, simply segments the query. This easily loses part of the information and drifts away from the user's intent, so the keywords may fail to produce useful results.

For example, as shown in FIG. 1, when a search engine processes the query 'Whose son is Nicholas Tse (谢霆锋)?', word segmentation yields the keywords 'Nicholas Tse', 'who', and 'son', and the search is performed with these three keywords. Because 'Lucas', Nicholas Tse's son, appears on the web far more often than 谢贤 (Xie Xian), his father, most pages returned by pure text matching describe Nicholas Tse's son, i.e. they are about Lucas. Search results obtained by text matching alone therefore often have a low matching success rate and hardly meet user needs.
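The keyword pipeline criticized above can be illustrated with a minimal sketch. It assumes pages and queries are already segmented into keyword lists and replaces the ranking step with a plain "contains every keyword" filter; the function names and page contents are hypothetical toy data, not the results shown in FIG. 1.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each keyword to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, tokens in pages.items():
        for token in tokens:
            index[token].add(page_id)
    return index


def keyword_search(index, query_tokens):
    """Return ids of pages containing every query keyword (no ranking)."""
    postings = [index.get(token, set()) for token in query_tokens]
    if not postings:
        return set()
    return set.intersection(*postings)


# Toy, pre-segmented pages (hypothetical data for illustration only).
pages = {
    "p1": ["Nicholas Tse", "son", "Lucas", "who"],
    "p2": ["Nicholas Tse", "father", "Xie Xian"],
}
index = build_inverted_index(pages)
print(sorted(keyword_search(index, ["Nicholas Tse", "who", "son"])))  # ['p1']
```

Pure string matching returns only the page about the son, even though the user was asking about the father, which is the failure mode the application addresses.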

Disclosure of Invention

The technical problem to be solved by the application is to provide a search method and a search system that overcome the prior-art problems of a low matching success rate in search results and difficulty in meeting user needs.

In order to solve the above problem, the present application discloses a search method, including:

when a query word string is received, performing semantic analysis on the query word string to obtain a semantic expression corresponding to the query word string;

performing matching analysis using the semantic expression to determine the semantic tag to which each word in the current query word string belongs;

rewriting the query word string according to the semantic tags;

and searching by using the rewritten query word string to obtain matched network information.

Preferably, the step of performing semantic analysis on a received query word string to obtain its corresponding semantic expression includes:

looking up, in an entity word list preset in a knowledge base, the entity words corresponding to the query word string;

and looking up, in an attribute word list preset in the knowledge base, the attribute words corresponding to the query word string.

Preferably, the step of determining the semantic tag to which each word in the current query word string belongs includes:

extracting the preset semantic tags of the attribute words;

labeling the entity words with one or more original semantic tags;

determining, for each original semantic tag, whether the entity word labeled with that tag has a predefined association with the attribute word labeled with its semantic tag; if so, determining that the original semantic tag having the predefined association is the semantic tag to which the entity word belongs.

Preferably, the step of rewriting the query word string according to the semantic tag includes:

looking up preset identification entity words using the semantic tags;

replacing the entity words with the preset identification entity words;

and/or,

replacing the attribute words with preset identification attribute words;

and/or,

determining whether the query word string conforms to a syntactic rule of reverse expression; if so, obtaining the corresponding preset expression, stored in the server, that conforms to the syntactic rule of forward expression, the preset expression having a usage frequency;

and when the usage frequency of the preset expression is higher than a preset threshold, rewriting the query word string according to the syntactic rule of forward expression.

Preferably, the identification entity word is the entity word that has the same semantic tag as the entity word being replaced and is used most frequently;

the identification attribute word is the attribute word that describes the same kind of entity as the attribute word being replaced and is used most frequently.

Preferably, the step of determining whether the query word string conforms to a syntactic rule of a reverse expression includes:

performing syntactic analysis on the query word string to obtain a subject, a modifier, and the dependency relationship between them, the dependency relationship including one in which the subject depends on the modifier;

and when the subject is the entity word, the modifier is the attribute word, and the subject depends on the modifier, determining that the query word string conforms to the syntactic rule of reverse expression.

The present application also discloses a search system, comprising:

a part-of-speech analysis module, configured to perform semantic analysis on a query word string when it is received, to obtain the semantic expression corresponding to the query word string;

a semantic tag determining module, configured to perform matching analysis using the semantic expression and determine the semantic tag to which each word in the current query word string belongs;

a rewriting module, configured to rewrite the query word string according to the semantic tags;

and a query module, configured to search with the rewritten query word string to obtain matching network information.

Preferably, the part-of-speech analysis module includes:

an entity word searching module, configured to look up, in an entity word list preset in a knowledge base, the entity words corresponding to the query word string;

and an attribute word searching module, configured to look up, in an attribute word list preset in the knowledge base, the attribute words corresponding to the query word string.

Preferably, the semantic tag determining module comprises:

an extraction submodule, configured to extract the preset semantic tags of the attribute words;

a marking submodule, configured to label the entity words with one or more original semantic tags;

an association judging module, configured to determine, for each original semantic tag, whether the entity word labeled with that tag has a predefined association with the attribute word labeled with its semantic tag, and if so, to call the determining submodule;

and a determining submodule, configured to determine that the original semantic tag having the predefined association is the semantic tag to which the current entity word belongs.

Preferably, the rewriting module includes:

an identification entity word searching submodule, configured to look up preset identification entity words using the semantic tags;

an identification entity word replacing submodule, configured to replace the entity words with the preset identification entity words;

and/or,

an identification attribute word replacing submodule, configured to replace the attribute words with preset identification attribute words;

and/or,

a reverse expression judging submodule, configured to determine whether the query word string conforms to a syntactic rule of reverse expression, and if so, to call the preset expression obtaining submodule;

a preset expression obtaining submodule, configured to obtain the corresponding preset expression, stored in the server, that conforms to the syntactic rule of forward expression, the preset expression having a usage frequency;

and a forward expression rewriting submodule, configured to rewrite the query word string according to the syntactic rule of forward expression when the usage frequency of the preset expression is higher than a preset threshold.

Preferably, the reverse expression judging submodule includes:

a syntax analysis submodule, configured to perform syntactic analysis on the query word string to obtain a subject, a modifier, and the dependency relationship between them, the dependency relationship including one in which the subject depends on the modifier;

and a judging submodule, configured to determine that the query word string conforms to the syntactic rule of reverse expression when the subject is the entity word, the modifier is the attribute word, and the subject depends on the modifier.

Compared with the prior art, the application has the following advantages:

The application performs semantic analysis on the query word string to obtain a semantic expression, determines the semantic tag of each word that fits the current context, and rewrites the query word string based on those tags. The rewritten query better matches the user's intent, raises the success rate of information matching during search, and improves search quality and efficiency.

The application rewrites entity words and attribute words into search-engine-friendly identification entity words and identification attribute words, and rewrites uncommon reverse expressions into common forward expressions. This increases the search engine's coverage of relevant information and further raises the success rate of information matching.

Drawings

FIG. 1 is an exemplary diagram of a search result of the prior art;

FIG. 2 is a flow chart of the steps of one embodiment of a search method of the present application;

FIG. 3 is an exemplary diagram of a forward expression rewrite of the present application;

FIG. 4 is an exemplary diagram of a search result of the present application;

FIG. 5 is a block diagram of a search system embodiment of the present application.

Detailed Description

To make the above objects, features, and advantages of the application clearer and easier to understand, the application is described in further detail below with reference to the accompanying drawings and specific embodiments.

In knowledge engineering, a knowledge base is a structured, easy-to-operate, easy-to-use, comprehensive, and organized cluster of knowledge. It is an interconnected set of knowledge pieces that are stored, organized, managed, and used in computer memory in one or more knowledge representation forms, according to the problem-solving needs of one or more domains. These pieces include domain-related theoretical knowledge (such as definitions, theorems, and algorithms), factual data, heuristic knowledge derived from expert experience, common-sense knowledge, and the like.

One of the core ideas of the application is to rewrite the query word string according to grammar specifications derived from the knowledge base, so as to obtain search results that more fully match the user's intent.

Referring to FIG. 2, a flow chart of steps of an embodiment of a search method of the present application is shown.

Step 201, when a query word string is received, performing semantic analysis on the query word string to obtain a semantic expression corresponding to the query word string;

The query word string may be a phrase or sentence entered by a user at a client (e.g. a search engine web page or a browser search plug-in) to request related information.

The query word string undergoes semantic analysis, which may include checking whether it exceeds a preset length, segmenting it into words, and the like, followed by identifying the entity words and attribute words it contains.

In a preferred embodiment of the present application, the step 201 may specifically include the following sub-steps:

Substep S11, looking up the attribute words corresponding to the query word string in the attribute word list preset in the knowledge base;

Substep S12, looking up the entity words corresponding to the query word string in the entity word list preset in the knowledge base.

In this embodiment, the knowledge base can be built in advance by analyzing data crawled from across the web. Specifically, the knowledge base may store an entity word list and an attribute word list.

In the entity word list, entity words collected in advance can be recorded; in the attribute word list, attribute words collected in advance may be recorded.
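Substeps S11 and S12 amount to dictionary lookups against the two word lists. The source does not say which matching algorithm is used; the sketch below assumes forward maximum matching, a common choice for unsegmented Chinese text, and the word lists are hypothetical toy entries.

```python
def find_lexicon_words(query, lexicon, max_len=None):
    """Forward maximum matching: scan left to right and, at each position,
    take the longest lexicon entry that starts there."""
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=0)
    found, i = [], 0
    while i < len(query):
        for size in range(min(max_len, len(query) - i), 0, -1):
            candidate = query[i:i + size]
            if candidate in lexicon:
                found.append(candidate)
                i += size
                break
        else:
            i += 1  # no entry starts here; move on one character
    return found


# Hypothetical entity and attribute word lists (toy data).
ENTITY_WORDS = {"谢霆锋", "刘德华", "笑傲江湖"}
ATTRIBUTE_WORDS = {"儿子", "父亲", "老婆", "太太"}

query = "谢霆锋是谁的儿子"
entities = find_lexicon_words(query, ENTITY_WORDS)
attributes = find_lexicon_words(query, ATTRIBUTE_WORDS)
print(entities, attributes)  # ['谢霆锋'] ['儿子']
```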

Based on the Resource Description Framework (RDF), a data model for network resource objects and the relationships between them, triples of the form 'entity-attribute-value' may be used to describe resources and the relationships between them.

1. Entity: a specific individual in a category such as celebrities, e.g. 刘德华 (Andy Lau) or 张柏芝 (Cecilia Cheung); the term also covers broad categories of individuals, such as people, film stars, and singers.

2. Attribute: a property of an entity. Besides its name, each attribute has a type that reflects the type of its values, e.g. [height: length], [age: integer], [date of birth].

3. Attribute value: the value of an attribute, e.g. 168 cm (height) or 87 kg (weight). Attribute values constitute the knowledge in the knowledge base. Each attribute value also records its source, to help the user judge the reliability of the knowledge.
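The triple model above can be represented with simple record types. This is a sketch of one possible in-memory layout, not the patent's storage format: each attribute carries a value type, and each triple records the source of its value, as item 3 describes. The entity name, sources, and the "mass" type for weight are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """An attribute name plus the type of its values, e.g. height: length."""
    name: str
    value_type: str


@dataclass(frozen=True)
class Triple:
    """One entity-attribute-value fact, with the source of the value."""
    entity: str
    attribute: Attribute
    value: str
    source: str


HEIGHT = Attribute("height", "length")
WEIGHT = Attribute("weight", "mass")  # value type assumed for illustration

# Toy knowledge base; the entity name and sources are placeholders.
KB = [
    Triple("entity_a", HEIGHT, "168cm", "web page A"),
    Triple("entity_a", WEIGHT, "87kg", "web page B"),
]


def lookup(kb, entity, attribute_name):
    """Return (value, source) pairs for one attribute of one entity."""
    return [(t.value, t.source) for t in kb
            if t.entity == entity and t.attribute.name == attribute_name]


print(lookup(KB, "entity_a", "height"))  # [('168cm', 'web page A')]
```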

Attribute words can be obtained by mining web pages and search logs.

Using the RDF triple 'entity-attribute-value', attribute words describing the husband-wife relationship can be found as follows, taking the entity 刘德华 (Andy Lau), the attribute 'wife', and the value 朱丽倩 (Zhu Liqian) as an example:

1. Mine web pages and search logs for the text fragments connecting each entity to its value, e.g. 刘德华老婆朱丽倩 and 刘德华太太朱丽倩 (both 'Andy Lau's wife Zhu Liqian', using 老婆 and 太太 respectively, two words for 'wife'), and 冯小刚老婆徐帆 ('Feng Xiaogang's wife Xu Fan').

2. Count the usage frequency of the text fragments for each entity-value pair, e.g. 刘德华老婆朱丽倩 occurs 2 times, 刘德华太太朱丽倩 3 times, and 冯小刚老婆徐帆 2 times.

3. Count the usage frequency of each connecting fragment across entity-value pairs of the same type, e.g. '<entity>老婆<value>' occurs 4 times (2 + 2) and '<entity>太太<value>' occurs 3 times.

4. Extract as attribute words the fragments whose usage count exceeds a preset threshold. For example, with a threshold of 2, fragments used more than 2 times are extracted, so the attribute words for the wife relationship are 老婆 and 太太.
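The four mining steps can be sketched in a few lines. This assumes the (entity, value) pairs for one attribute are already known from the knowledge base and that each mined snippet contains the entity immediately followed by the connecting fragment and then the value; the function name is hypothetical, and the snippets reproduce the counts in the example above.

```python
from collections import Counter


def mine_attribute_words(snippets, pairs, threshold):
    """Count the fragments that connect known (entity, value) pairs and keep
    those used more than `threshold` times as attribute words."""
    counts = Counter()
    for text in snippets:
        for entity, value in pairs:
            start = text.find(entity)
            if start == -1:
                continue
            middle_start = start + len(entity)
            end = text.find(value, middle_start)
            if end == -1:
                continue
            fragment = text[middle_start:end]
            if fragment:
                counts[fragment] += 1  # aggregated across same-type pairs
    attribute_words = {w for w, n in counts.items() if n > threshold}
    return counts, attribute_words


snippets = (["刘德华老婆朱丽倩"] * 2 + ["刘德华太太朱丽倩"] * 3
            + ["冯小刚老婆徐帆"] * 2)
pairs = [("刘德华", "朱丽倩"), ("冯小刚", "徐帆")]
counts, words = mine_attribute_words(snippets, pairs, threshold=2)
print(counts)  # Counter({'老婆': 4, '太太': 3})
```

Both fragments exceed the threshold of 2, so 老婆 and 太太 are both extracted, matching step 4.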

Step 202, performing matching analysis using the semantic expression, and determining the semantic tags to which the words in the query word string belong;

The query word string, whose entity words and attribute words have been identified, is syntactically analyzed with a knowledge-base-driven, context-free method to obtain the associations between entity words and attribute words, from which the semantic tags of the entity words that fit the current context are identified.

A context-free grammar, also called a type-2 grammar in the Chomsky hierarchy, is a formal grammar used in formal language theory to describe context-free languages. Specifically, a set of grammar rules is defined; these rules can be used in syntactic analysis to obtain sentence structure and the associations between sentence components. The grammar rules may be stored in the knowledge base.

In a preferred embodiment of the present application, the step 202 may specifically include the following sub-steps:

Substep S21, extracting the preset semantic tags of the attribute words;

An attribute word may have a semantic tag with a defined meaning, stored in the knowledge base.

Substep S22, labeling the entity word with one or more original semantic tags;

An original semantic tag is information expressing a possible meaning of the entity word.

For example, in the query 'When is 笑傲江湖 (The Smiling, Proud Wanderer) released?', the entity word 笑傲江湖 can carry many original semantic tags, such as film, TV series, novel, game, and so on.

Substep S23, determining, for each original semantic tag, whether the entity word labeled with that tag has a predefined association with the attribute word labeled with its semantic tag; if so, proceeding to substep S24;

For example, suppose the grammar rule '<entity_person> <attribute_wife>' is defined as an association. For the query 'Andy Lau's wife', the semantic expression may be '刘德华<entity_person> 老婆<attribute_wife>'. Checking shows that '<entity_person> <attribute_wife>' satisfies the grammar rule, so the expression is legal, i.e. a predefined association exists, and it follows that 老婆<attribute_wife> depends on 刘德华<entity_person>.

Conversely, if '<entity_person> <attribute_height>' is not predefined, then for the query 'Andy Lau's height', the expression '刘德华<entity_person> 身高<attribute_height>' is illegal, because there is no predefined association.

Substep S24, determining that the original semantic tag having the predefined association is the semantic tag to which the entity word currently belongs.

For the query 'When is 笑傲江湖 released?', syntactic analysis shows that 'released on which day' modifies 笑傲江湖, and the grammar rules show that a release date is an attribute of entities in the film category. It can therefore be determined that 笑傲江湖 here refers to a film rather than a TV series, novel, game, etc.
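Substeps S21 to S24 can be read as a filter over candidate tags. The sketch below assumes the grammar rules reduce to a set of allowed (entity tag, attribute tag) pairs and that attribute words map to a single preset tag; every rule, tag name, and word list here is a hypothetical example mirroring the cases above.

```python
# Hypothetical predefined associations: (entity tag, attribute tag) pairs.
ASSOCIATIONS = {
    ("person", "wife"),
    ("movie", "release_date"),
}

# Hypothetical preset semantic tags of attribute words (substep S21).
ATTRIBUTE_TAGS = {"老婆": "wife", "上映": "release_date", "身高": "height"}


def resolve_entity_tags(candidate_tags, attribute_word):
    """Keep the entity word's original tags that have a predefined
    association with the attribute word's tag (substeps S22 to S24)."""
    attr_tag = ATTRIBUTE_TAGS.get(attribute_word)
    if attr_tag is None:
        return []
    return [tag for tag in candidate_tags if (tag, attr_tag) in ASSOCIATIONS]


print(resolve_entity_tags(["movie", "tv_series", "novel", "game"], "上映"))  # ['movie']
print(resolve_entity_tags(["person"], "身高"))  # [] : no predefined association
```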

Step 203, rewriting the query word string using the semantic tags;

In this embodiment, once the semantic tags are determined, the query word string annotated with entity and attribute labels can be rewritten: the natural language entered by the user (the query word string) is converted into search-engine-friendly keywords so that the search results better match the semantics of the query. This increases search coverage and improves search efficiency and quality.

Rewriting falls into two categories: replacement of entity words and attribute words, and replacement of sentence patterns.

In a preferred embodiment of the present application, the step 203 may specifically include the following sub-steps:

Substep S31, looking up the preset identification entity word using the semantic tag;

Substep S32, replacing the entity word with the preset identification entity word;

In this embodiment, a correspondence between natural-language queries and search-engine language is established in advance for the entity words and attribute words in the knowledge base and recorded in a translation dictionary. When entity words and attribute words are rewritten, their search-engine-friendly forms are obtained by looking them up in the translation dictionary. The translation dictionary may be stored in the knowledge base.

Because the knowledge base is built from knowledge extracted from the Internet, the standard web descriptions of each entity word and attribute word can be counted. Web pages are processed through standard-description recognition, body-text extraction, Chinese word segmentation, entity word recognition, attribute word recognition, and so on, and the number of times each entity word and attribute word occurs on the Internet is counted. Among the different expressions of the same entity (or attribute), the one that occurs most frequently on the Internet is the most search-engine-friendly and is defined as the identification entity word (or identification attribute word), which improves coverage. For example, 史恒侠 (Shi Hengxia), 'Bingke', and 芙蓉姐姐 (Sister Furong) are the same entity, all referring to Sister Furong. Counting their occurrences in Internet text, taking context into account, shows that 芙蓉姐姐 is used far more often than 史恒侠 or 'Bingke'. 芙蓉姐姐 is therefore taken as the search-engine-friendly entity word for this entity, and 史恒侠 or 'Bingke' in a user's natural-language query is replaced by the identification entity word 芙蓉姐姐.

That is, in this embodiment, the identification entity word may be the entity word that has the same semantic tag as the entity word and is used most frequently;

and/or,

Substep S33, replacing the attribute word with the preset identification attribute word;

In this embodiment, the correspondence between natural-language queries and search-engine language can be established for attribute words in the same way as for entity words.

The search-engine-friendly keyword used as the identification attribute word is chosen according to how often the different descriptions (i.e. attribute words) of the same attribute of the same type of entity are used on the Internet.

That is, in this embodiment, the identification attribute word may be the attribute word that describes the same type of entity as the attribute word and is used most frequently.

Rewriting is thus a translation-dictionary lookup. For example, for the query 'Where was 史恒侠 born?', after the semantic tag of the entity word is determined, the semantic expression may be 'where 史恒侠<entity_person> was born<attribute_place_of_birth>'. Looking up the translation dictionary, the identification entity word for 史恒侠 may be 芙蓉姐姐, and the identification attribute word for 'where ... born' may be 'place of birth'.
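Building and applying the translation dictionary can be sketched as follows. It assumes the aliases have already been grouped by shared semantic tag, so each group denotes one entity or one attribute of one entity type. The usage counts and the surface form 在哪出生 are invented for illustration, and the function names are hypothetical.

```python
def build_translation_dictionary(alias_groups, usage_counts):
    """Map every expression in a group to the group's most-used expression,
    which serves as the identification word."""
    dictionary = {}
    for group in alias_groups:
        identification_word = max(group, key=lambda w: usage_counts.get(w, 0))
        for word in group:
            dictionary[word] = identification_word
    return dictionary


def rewrite_words(words, dictionary):
    """Replace each word that has an identification word; keep the rest."""
    return [dictionary.get(w, w) for w in words]


# Hypothetical usage counts, invented for illustration only.
usage_counts = {"芙蓉姐姐": 900, "史恒侠": 60, "Bingke": 5,
                "出生地": 500, "在哪出生": 80}
alias_groups = [("芙蓉姐姐", "史恒侠", "Bingke"), ("出生地", "在哪出生")]
dictionary = build_translation_dictionary(alias_groups, usage_counts)
print(rewrite_words(["史恒侠", "在哪出生"], dictionary))  # ['芙蓉姐姐', '出生地']
```

Precomputing the dictionary offline keeps the online rewrite a constant-time lookup per word.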

and/or,

Substep S34, determining whether the query word string conforms to the syntactic rule of reverse expression; if so, proceeding to substep S35;

A reverse expression is the counterpart of a forward expression: the two have the same semantics and describe the same fact from opposite perspectives.

In a preferred embodiment of the present application, the sub-step S34 further includes the following sub-steps:

Substep S341, performing syntactic analysis on the query word string to obtain a subject, a modifier, and the dependency relationship between them; the dependency relationship includes one in which the subject depends on the modifier;

Syntactic analysis infers the syntactic structure of a sentence from a given grammar and identifies the syntactic units in the sentence and the relationships between them.

In a specific implementation, the syntactic analysis can be performed statistically, in three main steps:

1. Manually annotate the syntactic structure of each sentence in a collected corpus and assemble the annotated sentences into a sentence bank;

2. Learn a PCFG (Probabilistic Context-Free Grammar) model from the sentence bank;

3. Parse sentences with the PCFG model to obtain their components (subject, predicate, object, modifier, etc.) and the dependency relationships between the components. A dependency relationship may be one in which the subject depends on the modifier, or one in which the modifier depends on the subject.
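Step 2 learns rule probabilities from the annotated sentence bank. The source does not name the estimator; a common choice for treebank PCFGs is the relative-frequency (maximum-likelihood) estimate P(A -> b) = count(A -> b) / count(A), sketched below. Trees are encoded as nested tuples `(label, *children)` with words as plain strings; this encoding and the toy treebank are assumptions, and smoothing and the parsing algorithm itself (e.g. CKY) are omitted.

```python
from collections import Counter


def extract_rules(tree, rules):
    """Count every A -> children rule in a tree of nested tuples."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            extract_rules(child, rules)


def estimate_pcfg(treebank):
    """Relative-frequency estimate: count(A -> b) / count(A)."""
    rules = Counter()
    for tree in treebank:
        extract_rules(tree, rules)
    lhs_totals = Counter()
    for (lhs, _), n in rules.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in rules.items()}


# Toy hand-annotated treebank (hypothetical).
treebank = [
    ("S", ("NP", "X"), ("VP", ("V", "is"), ("NP", "Y"))),
    ("S", ("NP", "X"), ("VP", ("V", "runs"))),
]
probs = estimate_pcfg(treebank)
print(probs[("S", ("NP", "VP"))])  # 1.0
```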

Substep S342, when the subject is the entity word, the modifier is the attribute word, and the subject depends on the modifier, the query word string conforms to the syntactic rule of reverse expression.

In this case, the subject depending on the modifier means that the entity word depends on the attribute word.

Conversely, when the subject is the entity word, the modifier is the attribute word, and the modifier depends on the subject, the query word string conforms to the syntactic rule of forward expression.

In that case, the modifier depending on the subject means that the attribute word depends on the entity word. For example, in the query 'Who is Nicholas Tse's father?', the attribute word 'father' depends on the entity word 'Nicholas Tse', so the query conforms to the syntactic rule of forward expression; in the query 'Whose son is Nicholas Tse?', the entity word 'Nicholas Tse' depends on the attribute word 'son', so the query conforms to the syntactic rule of reverse expression. In the PCFG model, one element depends on another when it cannot exist independently of it. For example, in 'Who is Nicholas Tse's father?', 'father' cannot exist independently of 'Nicholas Tse', so 'father' depends on 'Nicholas Tse'; conversely, 'Nicholas Tse' can exist independently of 'father'.
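Given a parse, the forward/reverse decision in substeps S341 and S342 is a simple check. The sketch assumes the PCFG-based parser (not implemented here) has already produced the subject, the modifier, and the head each one depends on; the `Parse` record and function name are hypothetical.

```python
from typing import NamedTuple, Optional


class Parse(NamedTuple):
    subject: str
    modifier: str
    head_of_subject: Optional[str]   # word the subject depends on, if any
    head_of_modifier: Optional[str]  # word the modifier depends on, if any


def classify_expression(parse, entity_words, attribute_words):
    """Return 'reverse', 'forward', or None following substeps S341/S342."""
    if parse.subject not in entity_words or parse.modifier not in attribute_words:
        return None
    if parse.head_of_subject == parse.modifier:
        return "reverse"  # entity word depends on attribute word
    if parse.head_of_modifier == parse.subject:
        return "forward"  # attribute word depends on entity word
    return None


entities, attributes = {"Nicholas Tse"}, {"son", "father"}
print(classify_expression(Parse("Nicholas Tse", "son", "son", None),
                          entities, attributes))  # reverse
```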

Substep S35, obtaining the preset expression, stored in the server, that conforms to the syntactic rule of forward expression; the preset expression has a usage frequency;

In a specific implementation, the correspondence between forward and reverse expressions can be obtained by mining web pages on the Internet based on the knowledge base: starting from text pairs of knowledge-base entities and attribute values, a machine translation model mines all forward and reverse expressions of entity attributes on the Internet.

Substep S36, when the usage frequency of the preset expression is higher than the preset threshold, rewriting the query word string according to the syntactic rule of forward expression.

In this embodiment, the usage frequencies of the various forward expressions can be counted, and forward expressions whose usage frequency exceeds the preset threshold are used as search-engine-friendly sentence patterns.

In a specific implementation, the dependency in which the entity word depends on the attribute word can be rewritten into one in which the attribute word depends on the entity word, so that the query word string is rewritten into one that conforms to the syntactic rule of forward expression.

For example, as shown in FIG. 3, in the query 'Whose son is Nicholas Tse?', the entity word 'Nicholas Tse' depends on the attribute word 'son'. The relationship between the two is visible from syntax-tree analysis, and the corresponding forward expression and its usage frequency are found in the reverse-to-forward correspondence table prepared in advance in the knowledge base. In this example, the syntax pattern of the reverse expression is '<entity_person>是谁的<attribute_person_son>' ('<entity_person> is whose <attribute_person_son>'), and that of the corresponding forward expression is '<entity_person>的<attribute_person_father>是谁' ('who is the <attribute_person_father> of <entity_person>'). The identification entity word for 'Nicholas Tse' is then obtained from the translation dictionary, as is the search-engine-friendly word for <attribute_person_father>, namely 'father' (the identification attribute word). Rewriting with the identification entity word and identification attribute word according to the forward-expression syntactic rule yields the final query 'Who is Nicholas Tse's father?', which is searched in place of the original 'Whose son is Nicholas Tse?', so that web pages about Xie Xian are returned.
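Substeps S35 and S36 can be sketched as a table lookup with a frequency gate. Here a regular expression stands in for the syntactic reverse-expression check, and the correspondence table, its usage frequencies, the threshold, and the placeholder name 张三 are all hypothetical; in the method the table is mined from the web.

```python
import re

# Hypothetical reverse -> forward table: regex for the reverse expression,
# forward-expression template, and an invented usage frequency.
REVERSE_TO_FORWARD = {
    r"^(?P<entity>.+)是谁的儿子$": ("{entity}的父亲是谁", 120),
    r"^(?P<entity>.+)是谁的老婆$": ("{entity}的老公是谁", 20),
}


def rewrite_reverse_expression(query, table, threshold):
    """Rewrite a reverse expression into its forward form only when the
    forward expression's usage frequency exceeds the threshold (S36)."""
    for pattern, (template, frequency) in table.items():
        match = re.match(pattern, query)
        if match and frequency > threshold:
            return template.format(**match.groupdict())
    return query  # no sufficiently frequent forward form: keep the query


print(rewrite_reverse_expression("谢霆锋是谁的儿子", REVERSE_TO_FORWARD, 50))  # 谢霆锋的父亲是谁
print(rewrite_reverse_expression("张三是谁的老婆", REVERSE_TO_FORWARD, 50))  # 张三是谁的老婆
```

The second query is left unchanged because its forward form falls below the threshold, which is the safeguard substep S36 describes.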

Note that entity-word rewriting (sub-steps S31 and S32), attribute-word rewriting (sub-step S33), and sentence-pattern rewriting (sub-steps S34, S35, and S36) may be used individually or in any combination of two or all three; the embodiments of the present application are not limited in this respect.
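Because the three kinds of rewriting may be used alone or in combination, one natural structure is a pipeline of optional stages. The sketch below assumes each stage is a function from query string to query string; `compose` and the two toy stages are hypothetical stand-ins, not the application's sub-steps.

```python
from typing import Callable, Iterable

Rewriter = Callable[[str], str]


def compose(stages: Iterable[Rewriter]) -> Rewriter:
    """Chain any subset of rewriting stages, applied in the given order."""
    stage_list = list(stages)

    def run(query: str) -> str:
        for stage in stage_list:
            query = stage(query)
        return query

    return run


# Hypothetical toy stages standing in for entity and attribute rewriting.
def expand_entity(query: str) -> str:
    return query.replace("Anqiu", "Anqiu City")


def translate_attribute(query: str) -> str:
    return query.replace("where", "geographical position")
```

Enabling only `translate_attribute`, or both stages, or neither, corresponds to using one rewriting type, a combination, or none.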

Step 204: search using the rewritten query word string to obtain matched network information.

After the query word string has been rewritten, network information can be retrieved and matched.

As shown in fig. 4, with the embodiment of the present application, the query word string "whose son is thank you front" entered by the user can be rewritten as "who is the father of thank you front", and the search is then performed on the rewritten string. Compared with the search results shown in fig. 2, the information returned by the embodiment of the present application better matches the user's needs.

In the embodiments of the present application, semantic analysis is performed on the natural language in the query word string to obtain a semantic expression; the semantic label of each word in the semantic expression that fits the current context is then determined, and the query word string is rewritten based on these semantic labels. The rewritten query better reflects the user's intent, which raises the success rate of information matching during search and improves search quality and efficiency, thereby meeting user needs and improving the user experience.

In addition, the embodiments of the present application can rewrite entity words and attribute words into search-engine-friendly identification entity words and identification attribute words, and can rewrite uncommon reverse-expression query word strings into common forward-expression ones. This increases the search engine's coverage of searchable information and further raises the success rate of information matching.

To help those skilled in the art better understand the application, the following example illustrates how the embodiments of the application are applied to the query word string "where is Anqiu".

1. Perform semantic analysis on the query word string "where is Anqiu" using the knowledge base, as follows:

a) Entity word analysis: by looking up the entity word list in the knowledge base, "Anqiu" is identified as an entity word whose types (original semantic labels) are "person" and "place name"; its semantic expression is "Anqiu<entity_person><entity_location>";

b) Attribute word analysis: by looking up the attribute word list in the knowledge base, "where" is identified as an attribute word of type "location", and it is marked with the semantic label <attribute_location_position>.

2. The semantic expression corresponding to the query word string is therefore "where<attribute_location_position> is Anqiu<entity_person><entity_location>".
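Steps 1 and 2 (word-list lookup and construction of the semantic expression) can be sketched as below. This assumes whitespace tokenization and models the knowledge-base word lists as small dicts; `ENTITY_LIST`, `ATTRIBUTE_LIST`, and `semantic_expression` are hypothetical names, and a real knowledge base would be far larger.

```python
from typing import Dict, List

# Hypothetical knowledge-base word lists: word -> original semantic labels.
ENTITY_LIST: Dict[str, List[str]] = {"Anqiu": ["entity_person", "entity_location"]}
ATTRIBUTE_LIST: Dict[str, List[str]] = {"where": ["attribute_location_position"]}


def semantic_expression(tokens: List[str]) -> str:
    """Attach every candidate label from the word lists to each token."""
    parts = []
    for token in tokens:
        labels = ENTITY_LIST.get(token) or ATTRIBUTE_LIST.get(token) or []
        parts.append(token + "".join(f"<{label}>" for label in labels))
    return " ".join(parts)
```

At this stage an ambiguous entity word such as "Anqiu" still carries both of its original labels; choosing between them is the job of the matching analysis in step 3.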

3. Perform matching analysis on the semantic expression. Syntactic analysis first shows that the attribute word "where" depends on the entity word "Anqiu", which has two types: "person" and "place name". Checking the type consistency between the entity word and the attribute word shows that their common type is <place>, so the semantic label of the entity word "Anqiu" in the current context is determined to be "place". The result of semantic label analysis is "where<attribute_location_position> is Anqiu<entity_location>";
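The type-consistency check in step 3 can be sketched as a set intersection over coarse types. This is a minimal sketch, assuming every label maps to one coarse type ("person" or "place"); the `LABEL_TYPE` table and `disambiguate` function are hypothetical.

```python
from typing import List

# Hypothetical mapping from each semantic label to its coarse type.
LABEL_TYPE = {
    "entity_person": "person",
    "entity_location": "place",
    "attribute_location_position": "place",
}


def disambiguate(entity_labels: List[str], attribute_labels: List[str]) -> List[str]:
    """Keep only the entity labels whose type the attribute word shares."""
    attribute_types = {LABEL_TYPE[label] for label in attribute_labels}
    return [label for label in entity_labels if LABEL_TYPE[label] in attribute_types]
```

For "Anqiu" with labels `["entity_person", "entity_location"]` and the attribute label `attribute_location_position`, only `entity_location` survives, matching the result above.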

4. Rewrite the query word string according to the semantic labels:

a) Look up the search-engine-friendly identification entity word and identification attribute word corresponding to the entity word and the attribute word. Searching the translation dictionary yields the identification entity word "Anqiu City" for the entity word "Anqiu" and the identification attribute word "geographical position" for the attribute word "where";

b) Replace the entity and the attribute in the query word string with these search-engine-friendly words (i.e., the identification entity word and the identification attribute word), giving the rewritten query word string "Anqiu City geographical position";
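Step 4 can be sketched as a dictionary lookup keyed by word and disambiguated label. The sketch assumes two things the example does not state explicitly: untagged function words such as "is" are dropped, and entity words are placed before attribute words in the output. `TRANSLATION_DICT` and `rewrite` are hypothetical names.

```python
from typing import List, Optional, Tuple

# Hypothetical translation dictionary: (word, label) -> friendly word.
TRANSLATION_DICT = {
    ("Anqiu", "entity_location"): "Anqiu City",
    ("where", "attribute_location_position"): "geographical position",
}


def rewrite(tagged_words: List[Tuple[str, Optional[str]]]) -> str:
    """Build the rewritten query from (word, label) pairs; label None = untagged."""
    entities = [TRANSLATION_DICT.get((w, l), w)
                for w, l in tagged_words if l and l.startswith("entity_")]
    attributes = [TRANSLATION_DICT.get((w, l), w)
                  for w, l in tagged_words if l and l.startswith("attribute_")]
    # Untagged words are dropped; entities precede attributes (assumed order).
    return " ".join(entities + attributes)
```

Keying the dictionary on the label as well as the word lets the same surface word map to different friendly words depending on which sense step 3 selected.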

5. Search using the rewritten query word string "Anqiu City geographical position" and return the results to the user.

It should be noted that, for simplicity of description, the method embodiments are expressed as a series of combined actions; however, those skilled in the art should understand that the embodiments of the application are not limited by the described order of actions, because according to the embodiments of the application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the embodiments of the application.

Referring to fig. 5, a block diagram of a search system according to an embodiment of the present application is shown; the system may specifically include the following modules:

a part-of-speech analysis module 501, configured to perform semantic analysis on a query word string when the query word string is received, to obtain a semantic expression corresponding to the query word string;

a semantic tag determining module 502, configured to perform matching analysis in combination with the semantic expression, and determine a semantic tag to which each word in the current query word string belongs;

a rewriting module 503, configured to rewrite the query word string according to the semantic tag;

and a query module 504, configured to search using the rewritten query word string to obtain the matched network information.
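The data flow between the four modules of fig. 5 can be sketched as a small class with each module injected as a callable. This is an illustrative wiring only, assuming each module can be modeled as a function; the `SearchSystem` class and its field names are hypothetical, and the module numbers appear only in comments.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class SearchSystem:
    analyze: Callable[[str], Any]           # part-of-speech analysis module 501
    determine_tags: Callable[[Any], Any]    # semantic tag determining module 502
    rewrite: Callable[[str, Any], str]      # rewriting module 503
    search: Callable[[str], List[str]]      # query module 504

    def handle(self, query: str) -> List[str]:
        expression = self.analyze(query)         # semantic expression
        tags = self.determine_tags(expression)   # per-word semantic tags
        rewritten = self.rewrite(query, tags)    # rewritten query word string
        return self.search(rewritten)            # matched network information
```

Injecting the modules keeps each one replaceable, for instance swapping in a different rewriting module without touching the analysis or query modules.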

In a preferred embodiment of the present application, the part-of-speech analysis module 501 may include the following sub-modules:

In a preferred embodiment of the present application, the semantic tag determination module 502 may include the following sub-modules:

In a preferred embodiment of the present application, the rewrite module 503 may include the following sub-modules:

and/or

In a preferred embodiment of the present application, the identification entity word may be the most frequently used entity word that has the same semantic tag as the entity word;

the identification attribute word may be the most frequently used attribute word that describes entity words of the same type.
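The "most frequently used word with the same semantic tag" rule can be sketched as a filtered maximum over usage counts. This is a minimal sketch, assuming usage counts and a word-to-tag mapping are available; the function name, the tag strings, and the counts are hypothetical illustration data, not figures from the application.

```python
from collections import Counter
from typing import Dict, Optional


def identification_word(counts: Counter, tag: str, tag_of: Dict[str, str]) -> Optional[str]:
    """Return the most frequently used word carrying `tag`, or None if none does."""
    same_tag = [(word, n) for word, n in counts.items() if tag_of.get(word) == tag]
    if not same_tag:
        return None
    return max(same_tag, key=lambda pair: pair[1])[0]


# Hypothetical usage counts and tags.
counts = Counter({"Anqiu City": 120, "Anqiu": 80, "Weifang": 300})
tag_of = {"Anqiu City": "place:Anqiu", "Anqiu": "place:Anqiu", "Weifang": "place:Weifang"}
```

Here `identification_word(counts, "place:Anqiu", tag_of)` returns "Anqiu City": "Weifang" is more frequent but carries a different tag. The same function could select an identification attribute word if `tag_of` maps attribute words to the entity type they describe.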

In a preferred embodiment of the present application, the reverse expression judging sub-module further includes the following sub-modules:

Because the system embodiment is substantially similar to the method embodiment, it is described only briefly; for related details, refer to the corresponding parts of the description of the method embodiment.

The embodiments in this specification are described progressively: each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be cross-referenced.

The application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices. The application is preferably applied to embedded systems.

Finally, it should also be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

The search method and search system provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the application, and the description of the above embodiments is intended only to help understand the method of the application and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the application.

Claims

1. A method of searching, comprising:

when a query word string is received, performing semantic analysis on the query word string to obtain a semantic expression corresponding to the query word string, wherein the semantic analysis identifies entity words and attribute words in the query word string;

performing matching analysis on the semantic expression to determine the semantic label of each word in the current query word string, including: performing syntactic analysis, based on a knowledge base, on the query word string in which the entity words and attribute words have been identified, to obtain the association relationship between the entity words and the attribute words, and identifying the semantic labels of the entity words that conform to the current context;

rewriting the query word string according to the semantic tag;

searching with the rewritten query word string to obtain matched network information;

wherein the step of performing syntactic analysis, based on the knowledge base, on the query word string in which the entity words and attribute words have been identified, to obtain the association relationship between the entity words and the attribute words, comprises:

defining a grammar rule;

performing syntactic analysis on the semantic expression by using the grammar rule to obtain the association relation between entity words and attribute words in the semantic expression;

wherein the step of rewriting the query word string according to the semantic tag comprises:

2. The method according to claim 1, wherein the step of performing semantic analysis on the query word string to obtain the semantic expression corresponding to the query word string when receiving the query word string comprises:

3. The method of claim 2, wherein the step of determining the semantic label to which each word in the current query word string belongs comprises:

extracting preset semantic tags of the attribute words;

marking one or more original semantic labels on the entity words;

4. The method of claim 1, 2 or 3, wherein the step of rewriting the query word string according to the semantic tag further comprises:

looking up preset identification entity words using the semantic tags;

replacing the entity words with preset identification entity words;

and/or

replacing the attribute words with preset identification attribute words.

5. The method of claim 4, wherein the identification entity words are the most frequently used entity words having the same semantic label as the entity words;

6. The method of claim 4, wherein said step of determining whether said query string complies with a syntactic rule of reverse expression comprises:

when the subject is the entity word, the modifier is the attribute word, and the dependency relationship is one in which the subject depends on the modifier, determining that the query word string conforms to the syntactic rule of reverse expression.

7. A search system, comprising:

the part-of-speech analysis module, configured to perform semantic analysis on a query word string when the query word string is received, to obtain a semantic expression corresponding to the query word string, wherein the semantic analysis identifies entity words and attribute words in the query word string;

the semantic tag determining module, configured to perform matching analysis on the semantic expression to determine the semantic tag of each word in the current query word string, including performing syntactic analysis, based on a knowledge base, on the query word string in which the entity words and attribute words have been identified, to obtain the association relationship between the entity words and the attribute words, and identifying the semantic tags of the entity words that conform to the current context;

the query module, configured to search using the rewritten query word string to obtain matched network information;

wherein the semantic tag determination module is further configured to:

defining a grammar rule;

wherein the rewrite module includes:

8. The system of claim 7, wherein the part-of-speech analysis module comprises:

9. The system of claim 8, wherein the semantic tag determination module comprises:

10. The system of claim 7, 8 or 9, wherein the rewrite module further comprises:

and/or

an identification attribute word replacing sub-module, configured to replace the attribute words with preset identification attribute words.

11. The system of claim 10, wherein the identification entity words are the most frequently used entity words having the same semantic label as the entity word;

12. The system of claim 10, wherein the reverse expression decision sub-module comprises:

a judgment sub-module, configured to determine that the query word string conforms to the syntactic rule of reverse expression when the subject is the entity word, the modifier is the attribute word, and the dependency relationship is one in which the subject depends on the modifier.