CN106156141B - Method and device for constructing semantic query word template - Google Patents

Info

Publication number
CN106156141B
Authority
CN
China
Prior art keywords
semantic
words
word
query
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510172096.4A
Other languages
Chinese (zh)
Other versions
CN106156141A (en)
Inventor
蒋雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201510172096.4A
Publication of CN106156141A
Application granted
Publication of CN106156141B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a method and a device for constructing a semantic query word template, and belongs to the technical field of information. The method comprises the following steps: acquiring a seed semantic query word template, wherein the seed semantic query word template comprises at least one core word; querying according to the core words in the seed semantic query word template to obtain a plurality of target words for each core word, wherein each target word comprises a core word and a semantic modifier; querying according to the semantic modifier of each target word to obtain similar words of each semantic modifier; and constructing a semantic query word template based on the similar words of each semantic modifier. The invention queries according to the core words contained in the seed semantic query word template and keeps expanding from the original core words during the query process, thereby automatically mining a large number of semantic query word templates.

Description

Method and device for constructing semantic query word template
Technical Field
The invention relates to the technical field of information, in particular to a method and a device for constructing a semantic query word template.
Background
In a search engine, a user sometimes inputs query words that carry semantics, simply referred to as semantic query words, such as "cantonese song", "song listened to before sleep", and "classic old song" in a music search. Because semantic query words lack specificity, querying them directly by ordinary keyword matching rarely returns the results the user wants. To solve this problem, semantic query word templates are usually constructed. Each constructed template contains a core word, and when a semantic query word input by the user contains the core word of any template, that template is used to search for the user. For example, if the user inputs the semantic query word "children song", which contains the core word "song" of the semantic query word template "·song", the search is performed for the user using the template "·song".
In the prior art, a semantic query word template is usually constructed by manual observation: semantic query words are identified from the massive query words on the Internet, and the template is then constructed from the identified semantic query words. Because the number of query words on the Internet is large, constructing semantic query word templates by manual observation is slow and costly.
Disclosure of Invention
In order to solve the problems of the related art, the embodiment of the invention provides a method and a device for constructing a semantic query word template. The technical scheme is as follows:
in one aspect, a method for constructing a semantic query term template is provided, the method comprising:
acquiring a seed semantic query word template, wherein the seed semantic query word template at least comprises a core word;
querying according to the core words in the seed semantic query word template to obtain a plurality of target words for each core word, wherein each target word comprises a core word and a semantic modifier;
querying according to the semantic modifier of each target word to obtain similar words of each semantic modifier;
and constructing a semantic query word template based on the similar words of each semantic modifier.
In another aspect, an apparatus for constructing a semantic query term template is provided, the apparatus comprising:
the acquisition module is used for acquiring a seed semantic query word template, and the seed semantic query word template at least comprises a core word;
the first query module is used for querying according to the core words in the seed semantic query word template to obtain a plurality of target words of each core word, and each target word comprises the core words and semantic modifiers;
the second query module is used for querying according to the semantic modifiers of each target word to obtain similar words of each semantic modifier;
and the building module is used for building a semantic query word template based on the similar words of each semantic modifier.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
queries are performed according to the core words contained in the seed semantic query word template, and expansion continues from the original core words during the query process, so that a large number of semantic query word templates are mined automatically. No manual observation is needed in this process, which reduces the cost and increases the construction speed.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart of a method for constructing a semantic query term template according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for constructing a semantic query term template according to another embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating similarity calculation according to click-through rate according to another embodiment of the present invention;
FIG. 4 is an exemplary diagram of constructing a semantic query term template provided by another embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an apparatus for constructing a template of semantic query terms according to another embodiment of the present invention;
FIG. 6 is a block diagram of an apparatus for constructing a semantic query term template according to another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
With the development of information technology, information on the Internet is growing explosively, and more and more users query with search engines to quickly obtain the information they need. When querying with a search engine, a user may input semantic query words such as "children's fairy", "children's songs", and "light music". Because semantic query words lack specificity, results obtained by querying directly with them rarely meet the user's needs. To return results that better meet the user's needs, searches are currently performed mainly with constructed semantic query word templates. However, constructing semantic query word templates by manual observation is slow and costly, so the embodiment of the invention provides a method for constructing a semantic query word template. Referring to FIG. 1, the method provided by this embodiment comprises the following steps:
101. Acquire a seed semantic query word template, wherein the seed semantic query word template comprises at least one core word.
102. Query according to the core words in the seed semantic query word template to obtain a plurality of target words for each core word, wherein each target word comprises a core word and a semantic modifier.
103. Query according to the semantic modifier of each target word to obtain similar words of each semantic modifier.
104. Construct a semantic query word template based on the similar words of each semantic modifier.
According to the method provided by the embodiment of the invention, queries are performed according to the core words contained in the seed semantic query word template, and expansion continues from the original core words during the query process, so that a large number of semantic query word templates are mined automatically. No manual observation is needed in this process, which reduces the cost and increases the construction speed.
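The data flow of steps 101 to 104 can be sketched roughly as follows. This is only an illustrative skeleton: the function name `construct_templates` and its three callable parameters are hypothetical, and it assumes that a semantic modifier is whatever remains of a target word once the core word is deleted, which the text does not state explicitly.

```python
from typing import Callable, Dict, Iterable, List


def construct_templates(
    seed_core_words: Iterable[str],
    find_targets: Callable[[str], List[str]],
    find_similar: Callable[[str], List[str]],
    build: Callable[[Dict[str, List[str]]], List[str]],
) -> List[str]:
    """Wire steps 101-104 together; each step is supplied as a function."""
    similar_by_modifier: Dict[str, List[str]] = {}
    for core in seed_core_words:               # 101: core words of the seed templates
        for target in find_targets(core):      # 102: target words containing the core word
            # Assumption: the modifier is the target word with the core word deleted.
            modifier = target.replace(core, "").strip()
            if modifier:
                similar_by_modifier[modifier] = find_similar(modifier)  # 103
    return build(similar_by_modifier)          # 104: build templates from similar words
```

Passing the individual steps in as functions keeps the skeleton independent of how the keyword query, the similarity query, and the template construction are actually carried out.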
In another embodiment of the present invention, performing a query according to core words in a seed semantic query word template to obtain a plurality of target words of each core word, includes:
querying whether an internet query word containing a core word in the seed semantic query word template exists in the internet query word set;
and when the internet query words containing the core words exist in the internet query word set, taking the internet query words containing the core words as target words.
In another embodiment of the present invention, the querying according to the semantic modifiers of each target word to obtain the similar words of each semantic modifier includes:
calculating the similarity between the semantic modifier of any target word and each internet query word in the internet query word set;
sorting the internet query words in descending order of similarity to obtain a sorting result;
and according to the sorting result, taking the internet query words ranked before a first specified rank as the similar words of the semantic modifier.
In another embodiment of the present invention, calculating the similarity between the semantic modifier of any target word and each internet query word in the internet query word set comprises:
acquiring a first click rate of a semantic modifier of a target word in a specified document;
acquiring a second click rate of any internet query word in the internet query word set in the specified document;
and calculating the similarity between the semantic modifiers of the target words and the internet query words according to the first click rate and the second click rate.
In another embodiment of the present invention, calculating the similarity between the semantic modifier of the target word and the internet query word according to the first click rate and the second click rate includes:
generating a first vector according to the first click rate;
generating a second vector according to the second click rate;
calculating the cosine value of the included angle between the first vector and the second vector;
taking the cosine value of the included angle as the similarity between the semantic modifier of the target word and the internet query word;
and the dimensions of the first vector and the second vector are equal to the number of the specified documents.
In another embodiment of the present invention, constructing a semantic query term template based on the similar terms of each semantic modifier comprises:
removing the semantic modifiers contained in the similar words of each semantic modifier to obtain semantic expansion words of each semantic modifier;
merging the semantic expansion words to obtain target semantic expansion words;
and removing noise words in the target semantic expansion words to obtain a semantic query word template.
In another embodiment of the present invention, removing noise words in the target semantic expansion words to obtain a semantic query word template, includes:
sorting the target semantic expansion words in descending order of frequency to obtain a sorting result;
and according to the sorting result, taking the target semantic expansion words ranked before a second specified rank as semantic query word templates.
All the above-mentioned optional technical solutions can be combined arbitrarily to form the optional embodiments of the present invention, and are not described herein again.
The embodiment of the invention provides a method for constructing a semantic query word template, and referring to fig. 2, the method provided by the embodiment comprises the following steps:
201. The server acquires a seed semantic query word template, wherein the seed semantic query word template comprises at least one core word.
In a search engine, semantic query words are query words with fuzzy semantics input by the user, for example "80 year song", "classical old song", and "revolutionary song" input in a music search engine. Semantic query words differ from general query words: because their semantics are fuzzy, a search engine querying by a semantic query word needs to perform semantic similarity analysis with some algorithm before querying, and the result the user wants is usually a class of results rather than one specific result. The semantics of general query words are clear, so a search engine can query them directly by keyword matching and only needs to provide one specific result. For example, when the user inputs "Party A, Party B" in a film and television search engine, the search engine only needs to provide the film and television information related to "Party A, Party B".
In this embodiment, the seed semantic query word template is a query word template specified in advance by the user before semantic query word templates are constructed. It generally consists of the symbol "~" and a core word, for example "~ European and American song" and "~ piano song" specified in advance by the user in a music search. The core word is a keyword used for querying in a search engine. Its part of speech may be a noun, an adjective, and so on, and this embodiment does not specifically limit it. For example, the core word may be "European and American song" or "piano song" as described above.
When the server acquires the seed semantic query word template, a user may randomly collect a preset number of query words from the Internet, remove some of the characters in each query word to obtain query keywords, and input the query keywords into the server; the server then acquires the query keywords input by the user and uses them as seed semantic query word templates. The preset number may be 10, 20, 30, and so on, and the embodiment of the present invention does not specifically limit it. Of course, the more seed semantic query word templates there are, the faster semantic query word templates can be constructed.
202. The server queries according to the core words in the seed semantic query word template to obtain a plurality of target words for each core word, wherein each target word comprises a core word and a semantic modifier.
To better provide a search service for the user, the method provided by this embodiment needs to construct a large number of semantic query word templates based on the seed semantic query word templates specified by the user. In this process, the server queries according to the core words in the seed semantic query word template to obtain a plurality of target words for each core word. Each target word comprises a core word from a seed semantic query word template and a semantic modifier. For example, if the core word in the seed semantic query word template is "child song", the queried words containing "child song", such as "child song grand", "child song list", and "good-listening child song", are referred to as target words.
When the server queries according to the core words in the seed semantic query word template to obtain a plurality of target words for each core word, it can query whether the internet query word set contains internet query words that include a core word of the seed semantic query word template; when such internet query words exist, they are used as the target words. For example, suppose the internet query word set contains "children song master", "children song list", "good-listening children song", "old song master", "recommended several old songs", "which movie songs are available", "classic movies", and the like, and the core word in the seed semantic query word template is "good-listening". Since the internet query word set contains the internet query word "good-listening children song", which includes "good-listening", the internet query word "good-listening children song" is taken as a target word, where "good-listening" is the core word of the target word and "children song" is its semantic modifier.
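The keyword-matching query in this step amounts to a containment check over the internet query word set. A minimal sketch follows, using the example data above; the function name `find_target_words` is hypothetical, and splitting off the modifier by deleting the core word is an assumption made for illustration.

```python
from typing import Iterable, List, Tuple


def find_target_words(core_word: str, query_words: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (target_word, semantic_modifier) pairs for queries containing core_word."""
    results = []
    for query in query_words:
        if core_word in query and query != core_word:
            # Assumption: the modifier is the query word with the core word deleted.
            modifier = query.replace(core_word, "").strip()
            results.append((query, modifier))
    return results


query_set = [
    "children song master", "children song list", "good-listening children song",
    "old song master", "recommended several old songs", "classic movies",
]
print(find_target_words("good-listening", query_set))
# [('good-listening children song', 'children song')]
```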
203. The server queries according to the semantic modifier of each target word to obtain similar words of each semantic modifier.
Since the number of the target words queried according to the core words in the seed semantic query word template in step 202 is limited, the semantic query word template constructed according to the limited target words still cannot meet the search requirement of the user, and at this time, the number of the semantic query words used for constructing the semantic query word template needs to be further increased. In the process, the server can take the semantic modifiers of each target word as the core words of the query to perform the query so as to obtain the similar words of each semantic modifier.
It should be noted that the query performed in this step is a similarity query, which differs from the query in step 202. The query in step 202 is based on keyword matching, and its results must contain the core word. A similarity query is broader than a keyword-matching query: any result whose similarity to the query word reaches a certain level can be treated as a result that meets the user's query requirement.
In specific implementation, the server performs query according to the semantic modifiers of each target word to obtain the similar words of each semantic modifier, including but not limited to the following (1) to (3):
(1) The server calculates the similarity between the semantic modifier of any target word and each internet query word in the internet query word set.
In practical application, the server can calculate the similarity between the semantic modifier of any target word and each internet query word in the internet query word set according to document click rates. Specifically, if the click rates on the documents returned for the semantic modifier are similar to the click rates on the documents returned for an internet query word, the semantic modifier and the internet query word are similar. The server can also calculate the similarity according to character strings. Specifically, if the similarity between the characters contained in the semantic modifier and the characters contained in an internet query word reaches a certain ratio, such as 60%, 75%, or 80%, the semantic modifier is similar to the internet query word. For example, if the semantic modifier contains 4 characters, the internet query word also contains 4 characters, and 3 of the characters are the same, the similarity between the semantic modifier and the internet query word is 75%. Of course, besides the above methods, other methods may be used to calculate the similarity between the semantic modifier of any target word and each internet query word in the internet query word set, and this embodiment does not describe them one by one.
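The character-based comparison can be illustrated with a short sketch. The text does not define exactly how shared characters are counted, so this version assumes an order-independent count of shared characters divided by the length of the longer string; the names `char_overlap_similarity` and `is_similar` and the default threshold are hypothetical.

```python
from collections import Counter


def char_overlap_similarity(a: str, b: str) -> float:
    """Share of characters the two strings have in common (order ignored)."""
    if not a or not b:
        return 0.0
    shared = sum((Counter(a) & Counter(b)).values())  # multiset intersection
    return shared / max(len(a), len(b))


def is_similar(a: str, b: str, threshold: float = 0.75) -> bool:
    """Treat two strings as similar once the overlap reaches the threshold."""
    return char_overlap_similarity(a, b) >= threshold


# Two 4-character strings sharing 3 characters, as in the 75% example.
print(char_overlap_similarity("abcd", "abce"))  # 0.75
```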
For the above several ways of calculating the similarity between the semantic modifier of any target word and each internet query word in the internet query word set, details are given below by taking the click rate of a specified document as an example. See the following (a) to (c):
(a) The server acquires a first click rate of the semantic modifier of the target word in the specified documents.
There are at least two specified documents, and they are the documents that the server retrieves both for the semantic modifier and for the internet query word. The first click rate comprises the user's document click rate for each of the specified documents, so the number of document click rates contained in the first click rate equals the number of specified documents. When the specified documents comprise a first document, a second document, and a third document, the first click rate comprises the document click rate of the first document, that of the second document, and that of the third document. In addition, since users click on each of the specified documents differently within the specified time, at least two of the document click rates contained in the first click rate are different, and if no user clicks on one of the specified documents within the specified time, the click rate of that document is 0%.
Ways for the server to obtain the first click rate of the semantic modifier of the target word in the specified documents include, but are not limited to: collecting users' click results on the specified documents within the specified time, and obtaining the first click rate based on the collected click results.
(b) The server acquires a second click rate of any internet query word in the internet query word set in the specified documents.
The second click rate comprises the user's document click rate for each of the specified documents, so the number of document click rates contained in the second click rate equals the number of specified documents. When the specified documents comprise a first document, a second document, a third document, and a fourth document, the second click rate comprises the document click rate of the first document, that of the second document, that of the third document, and that of the fourth document. In addition, since users click on each of the specified documents differently within the specified time, at least two of the document click rates contained in the second click rate are different, and if no user clicks on one of the specified documents within the specified time, the click rate of that document is 0%.
(c) The server calculates the similarity between the semantic modifier of the target word and the internet query word according to the first click rate and the second click rate.
Ways for the server to calculate the similarity between the semantic modifier of the target word and the internet query word according to the first click rate and the second click rate include, but are not limited to, the following (c1) to (c3):
(c1) The server generates a first vector according to the first click rate.
The dimension of the first vector is equal to the number of specified documents: when there are 4 specified documents, the first vector is a four-dimensional vector; when there are 6, it is a six-dimensional vector. When the server generates the first vector according to the first click rate, the click rate of each document contained in the first click rate can be used directly as a coordinate of the first vector. When the specified documents comprise n (n ≥ 2) documents, and the click rate of document one is $a_1$, the click rate of document two is $a_2$, …, and the click rate of document n is $a_n$, the first vector is $A = (a_1, a_2, \ldots, a_n)$. For example, if there are 3 specified documents, namely document one, document two, and document three, with click rates of 30%, 40%, and 30% respectively, the server generates the first vector (0.3, 0.4, 0.3) according to the first click rate.
(c2) The server generates a second vector according to the second click rate.
The dimension of the second vector is equal to the number of specified documents: when there are 4 specified documents, the second vector is a four-dimensional vector; when there are 6, it is a six-dimensional vector. When the server generates the second vector according to the second click rate, the click rate of each document contained in the second click rate can be used directly as a coordinate of the second vector. When the specified documents comprise n (n ≥ 2) documents, and the click rate of document one is $b_1$, the click rate of document two is $b_2$, …, and the click rate of document n is $b_n$, the second vector is $B = (b_1, b_2, \ldots, b_n)$. For example, if there are 3 specified documents, namely document one, document two, and document three, with click rates of 20%, 50%, and 30% respectively, the server generates the second vector (0.2, 0.5, 0.3) according to the second click rate.
(c3) The server calculates the cosine value of the included angle between the first vector and the second vector, and uses this cosine value as the similarity between the semantic modifier of the target word and the internet query word.
Based on the first vector and the second vector generated in (c1) and (c2), the server may calculate the cosine value of the included angle between the two vectors by using the following formula, where $a_i$ and $b_i$ are the click rates of document i in the first vector A and the second vector B:
$$\cos\theta = \frac{A \cdot B}{|A|\,|B|} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$
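The cosine of the included angle between two click-rate vectors can be computed as in the following minimal sketch. The function name `cosine_similarity` is hypothetical, and returning 0.0 when a vector has no clicks at all is an assumption, since the text does not cover that case.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the included angle between two click-rate vectors."""
    if len(a) != len(b):
        raise ValueError("both vectors need one coordinate per specified document")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a vector with no clicks is similar to nothing
    return dot / (norm_a * norm_b)


# Example vectors from (c1) and (c2): three specified documents.
first_vector = (0.3, 0.4, 0.3)
second_vector = (0.2, 0.5, 0.3)
print(cosine_similarity(first_vector, second_vector))
```

Because click rates are never negative, the result lies between 0 and 1, with values closer to 1 indicating more similar click behaviour.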
The above manner of calculating the similarity between the semantic modifier of any target word and each internet query word in the internet query word set according to the click rates of the specified documents is described in detail below, taking FIG. 3 as an example.
Referring to FIG. 3, the specified documents are a first document, a second document, a third document, and a fourth document. When a query is performed according to the semantic modifier of the target word, the click rate of the first document is 20%, the click rate of the second document is 50%, the click rate of the third document is 30%, and the click rate of the fourth document is 0%, so the first vector generated by the server is A = (0.2, 0.5, 0.3, 0). When a query is performed according to a certain internet query word, the click rate of the first document is 0%, the click rate of the second document is 20%, the click rate of the third document is 50%, and the click rate of the fourth document is 30%, so the second vector generated by the server is B = (0, 0.2, 0.5, 0.3). The cosine value of the included angle between the first vector A and the second vector B is as follows:
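$$\cos\theta = \frac{0.2 \times 0 + 0.5 \times 0.2 + 0.3 \times 0.5 + 0 \times 0.3}{\sqrt{0.2^2 + 0.5^2 + 0.3^2 + 0^2}\,\sqrt{0^2 + 0.2^2 + 0.5^2 + 0.3^2}} = \frac{0.25}{0.38} \approx 0.658$$
That is, the similarity between the semantic modifier of the target word and the internet query word is approximately 0.658.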
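The click-rate vectors used in the FIG. 3 example could be derived from raw click counts roughly as follows. This sketch assumes that a document's click rate is its share of all clicks on the specified documents, which matches the examples (the rates sum to 100%) but is not stated outright; the function name `click_rate_vector`, the document identifiers, and the click counts are hypothetical.

```python
from typing import Dict, List, Sequence


def click_rate_vector(clicks: Dict[str, int], documents: Sequence[str]) -> List[float]:
    """One coordinate per specified document; unclicked documents get 0.0."""
    total = sum(clicks.get(doc, 0) for doc in documents)
    if total == 0:
        return [0.0] * len(documents)
    # Assumption: click rate = this document's clicks / clicks on all specified documents.
    return [clicks.get(doc, 0) / total for doc in documents]


documents = ["doc1", "doc2", "doc3", "doc4"]
modifier_clicks = {"doc1": 20, "doc2": 50, "doc3": 30}   # doc4 was never clicked
query_clicks = {"doc2": 20, "doc3": 50, "doc4": 30}      # doc1 was never clicked

print(click_rate_vector(modifier_clicks, documents))  # [0.2, 0.5, 0.3, 0.0]
print(click_rate_vector(query_clicks, documents))     # [0.0, 0.2, 0.5, 0.3]
```

Fixing the document order once and reusing it for both vectors keeps the coordinates aligned, so that coordinate i always refers to the same document in both vectors.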
(2) The server sorts the internet query words in descending order of similarity to obtain a sorting result.
To improve the precision of the constructed semantic query word template, after calculating the similarity between the semantic modifier of any target word and each internet query word as described above, the method provided by this embodiment further sorts the internet query words in descending order of similarity to obtain a sorting result. Of course, the internet query words may also be sorted in other manners, which this embodiment does not describe one by one.
It should be noted that, when sorting, if at least two internet query words have the same similarity to the semantic modifier of the target word, those internet query words can be assigned the same rank.
(3) According to the sorting result, the server takes the internet query words ranked before a first specified rank as the similar words of the semantic modifier.
The first specified rank may be 10, 20, 30, and so on, and this embodiment does not specifically limit it. Taking a first specified rank of 10 as an example, the server may use the top 10 internet query words as the similar words of the semantic modifier.
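Steps (2) and (3) can be sketched as a sort followed by a rank cut-off. The text says equal similarities may share a rank but does not say how the next rank is numbered, so this sketch assumes dense ranking (1, 1, 2, ...); the function name `top_similar_words` and the sample scores are hypothetical.

```python
from typing import Dict, List


def top_similar_words(similarities: Dict[str, float], first_rank: int) -> List[str]:
    """Keep words whose rank is within first_rank; equal similarities share a rank."""
    ordered = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    rank, previous = 0, None
    for word, score in ordered:
        if score != previous:   # a new, lower similarity starts the next rank
            rank += 1
            previous = score
        if rank > first_rank:
            break
        selected.append(word)
    return selected


scores = {"a": 0.9, "b": 0.9, "c": 0.5, "d": 0.1}
print(top_similar_words(scores, first_rank=2))  # ['a', 'b', 'c']
```

With shared ranks, the number of words kept can exceed the specified rank, as in the usage above where two words tie for first place.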
204. Based on the similar words of each semantic modifier, the server constructs a semantic query word template.
Based on the original seed semantic query word template, by adopting the query of the steps 202-203, the server acquires a large number of similar words, and then the server can construct the semantic query word template according to the similar words of each semantic modifier.
Based on the similar words of each semantic modifier, the server can adopt the following (1) to (3) when constructing the semantic query word template:
(1) The server removes the semantic modifier contained in the similar words of each semantic modifier to obtain the semantic expansion words of each semantic modifier.
When a query is performed according to the semantic modifier of a target word, the obtained similar words contain not only the semantic modifier but also words obtained by expanding it, that is, semantic expansion words. After the semantic modifier contained in the similar words of each semantic modifier is removed, the semantic expansion words of each semantic modifier are obtained. In this process, if the obtained similar words contain words similar to the semantic modifier, those words can also be removed to obtain the semantic expansion words of the semantic modifier.
(2) And the server combines the semantic expansion words to obtain the target semantic expansion words.
Since the semantic expansion words of the semantic modifiers obtained in the step (1) may be the same, and the semantic query word templates obtained according to the same semantic expansion words are also the same, in order to avoid repeated templates in the constructed semantic query word templates, the server also performs a merging operation on the semantic expansion words. And merging the semantic expansion words to obtain the target semantic expansion words. In addition, in the process of merging the semantic expansion words, the server also records the frequency of occurrence of each semantic expansion word.
(3) And the server removes the noise words in the target semantic expansion words to obtain a semantic query word template.
In the process of query, the target semantic expansion words may have noise words due to interference of other internet words, and in order to improve the accuracy of the constructed semantic query word template, the noise words are often removed to obtain the semantic query word template. And because the frequency of the noise words is generally low, when the server removes the noise words in the target semantic expansion words to obtain the semantic query word template, the server can sequence the target semantic expansion words according to the frequency of occurrence of each target semantic expansion word from high to low to obtain a sequencing result, and then according to the sequencing result, the target semantic expansion words with the digits before the second designated digit are used as the semantic query word template. The second designated bit number may be 5 bits, 6 bits, 7 bits, etc., and the second designated bit number is not specifically limited in this embodiment.
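Steps (1) to (3) can be illustrated with a short sketch. The function name, the plain substring removal, the whitespace trimming (needed only for space-separated example words) and the sample data are all assumptions made for illustration; the embodiment does not prescribe a particular implementation.

```python
from collections import Counter
from typing import Dict, List


def build_templates(similar_words: Dict[str, List[str]], second_designated_position: int) -> List[str]:
    """Strip modifiers, merge identical expansion words, keep the most frequent ones.

    similar_words maps each semantic modifier to its similar words.
    """
    frequency = Counter()
    for modifier, words in similar_words.items():
        for word in words:
            # Step (1): remove the semantic modifier from the similar word.
            expansion = word.replace(modifier, "").strip()
            if expansion:
                # Step (2): merging identical expansion words while counting them.
                frequency[expansion] += 1
    # Step (3): low-frequency words are treated as noise; keep the top ones.
    ordered = sorted(frequency.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ordered[:second_designated_position]]


# Illustrative data only (not taken from any real query log).
example = {
    "children's songs": ["children's songs collection", "children's songs list", "good children's songs"],
    "old songs": ["old songs collection", "good old songs", "recommended old songs"],
    "movie songs": ["good movie songs", "recommended movie songs"],
}
templates = build_templates(example, 3)
```

In this toy input, "list" occurs only once and is dropped as noise, while "good", "collection" and "recommended" are kept.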
For ease of understanding, the whole process of constructing the semantic query word template is described in detail below, taking fig. 4 as an example.
In the first step, a user designates the template "classic", and the server acquires this template and uses it as the seed semantic query word template.
In the second step, the server queries according to the core word "classic" in the seed semantic query word template and finds the target words "classic children's songs", "classic old songs" and "classic movie songs". The semantic modifier of the target word "classic children's songs" is "children's songs", the semantic modifier of the target word "classic old songs" is "old songs", and the semantic modifier of the target word "classic movie songs" is "movie songs".
In the third step, queries are performed according to the semantic modifier of each target word. Querying with the semantic modifier "children's songs" finds the similar words "complete collection of children's songs", "children's songs list" and "pleasant-sounding children's songs"; querying with the semantic modifier "old songs" finds the similar words "complete collection of old songs", "pleasant-sounding old songs" and "recommend several old songs"; querying with the semantic modifier "movie songs" finds the similar words "pleasant-sounding movie songs", "recommended movie songs" and "movie songs".
In the fourth step, the semantic modifiers are removed from the obtained similar words. For example, removing "children's songs" from "complete collection of children's songs", "children's songs list" and "pleasant-sounding children's songs" yields the semantic expansion words "complete collection", "list" and "pleasant-sounding"; "old songs" and "movie songs" are removed from their respective similar words in the same way. The semantic expansion words are then merged to obtain the target semantic expansion words, the target semantic expansion words are sorted by frequency of occurrence, and the noise words are removed according to the ranking result, finally yielding semantic query word templates such as "complete collection", "pleasant-sounding" and "recommend several".
205. The server judges whether the number of obtained semantic query word templates meets the requirement. If so, the process ends; if not, the obtained semantic query word templates are used as seed semantic query word templates for a further iteration.
So that more query results meeting the user's needs can later be found using the constructed semantic query word templates, the server also judges, after constructing the templates by the above method, whether their number meets the requirement. Specifically, the server compares the number of obtained semantic query word templates with a preset threshold, which may be 2000, 3000, 5000, etc. If the number is larger than the preset threshold, the server determines that the number meets the requirement and ends the process of constructing semantic query word templates. If the number is smaller than the preset threshold, the server determines that the number does not meet the requirement and performs a further iteration using the obtained semantic query word templates as seed semantic query word templates; the specific implementation is the same as steps 201 to 204 and is not repeated here.
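The iteration in step 205 can be sketched as a simple loop. The `expand` callback stands in for steps 201 to 204 and is hypothetical, as are the accumulation of templates across rounds, the `max_rounds` safeguard and the early exit when a round finds nothing new; none of these details are specified in the embodiment.

```python
from typing import Callable, Iterable, List, Set


def mine_templates(
    seeds: Iterable[str],
    expand: Callable[[List[str]], Iterable[str]],
    preset_threshold: int,
    max_rounds: int = 10,
) -> Set[str]:
    """Repeat the expansion until the number of templates exceeds the threshold.

    expand(seeds) is a hypothetical stand-in for steps 201 to 204.
    """
    templates: Set[str] = set()
    current = list(seeds)
    for _ in range(max_rounds):  # assumed safeguard against endless iteration
        new_found = set(expand(current)) - templates
        if not new_found:
            break  # assumed stop condition: nothing new was mined
        templates |= new_found
        if len(templates) > preset_threshold:
            break  # the number of templates meets the requirement
        # Otherwise the newly obtained templates become the next seeds.
        current = sorted(new_found)
    return templates
```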
According to the method provided by this embodiment of the invention, queries are performed according to the core words contained in the seed semantic query word template, and the original core words are continuously expanded during the query process, so that a large number of semantic query word templates are mined automatically. No manual observation by a user is required in this process, which reduces cost and increases construction speed.
Referring to fig. 5, an embodiment of the present invention provides an apparatus for constructing a semantic query term template, where the apparatus includes:
an obtaining module 501, configured to obtain a seed semantic query term template, where the seed semantic query term template at least includes a core term;
a first query module 502, configured to perform query according to core words in the seed semantic query word template to obtain multiple target words of each core word, where each target word includes a core word and a semantic modifier;
the second query module 503 is configured to perform query according to the semantic modifiers of each target word to obtain similar words of each semantic modifier;
a building module 504, configured to build a semantic query term template based on the similar terms of each semantic modifier.
In another embodiment of the present invention, the first query module 502 is configured to query whether an internet query term including a core term in a seed semantic query term template exists in the internet query term set; and when the internet query words containing the core words exist in the internet query word set, taking the internet query words containing the core words as target words.
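The lookup performed by the first query module can be sketched as a containment check over the internet query word set. The function name and the way the semantic modifier is extracted (deleting the first occurrence of the core word and trimming whitespace) are assumptions for illustration.

```python
from typing import Dict, Iterable


def find_target_words(core_word: str, internet_query_words: Iterable[str]) -> Dict[str, str]:
    """Return a mapping from each target word to its semantic modifier.

    A target word is an internet query word that contains the core word.
    Assumption: the modifier is what remains after deleting the core word.
    """
    targets: Dict[str, str] = {}
    for query_word in internet_query_words:
        if core_word in query_word:
            modifier = query_word.replace(core_word, "", 1).strip()
            if modifier:  # skip query words that are just the core word
                targets[query_word] = modifier
    return targets


targets = find_target_words(
    "classic",
    ["classic children's songs", "classic old songs", "weather today", "classic"],
)
```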
In another embodiment of the present invention, the second query module 503 is configured to calculate a similarity between the semantic modifier of any target word and each internet query word in the internet query word set; sort the internet query words in descending order of similarity to obtain a ranking result; and, according to the ranking result, take the internet query words ranked before the first designated position as the similar words of the semantic modifier.
In another embodiment of the present invention, the second query module 503 is specifically configured to obtain a first click rate of a semantic modifier of a target word in a specified document; acquiring a second click rate of any internet query word in the internet query word set in the specified document; and calculating the similarity between the semantic modifiers of the target words and the internet query words according to the first click rate and the second click rate.
In another embodiment of the present invention, the second query module 503 is specifically configured to generate a first vector according to the first click rate; generate a second vector according to the second click rate; calculate a cosine value of an included angle between the first vector and the second vector; and take the cosine value of the included angle as the similarity between the semantic modifier of the target word and the internet query word; wherein the dimensions of the first vector and the second vector are equal to the number of the specified documents.
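The cosine-based similarity can be sketched as follows, with one vector component per specified document. The function name and the choice to return 0.0 when either vector is all zeros are assumptions; the embodiment does not say how that case is handled.

```python
import math
from typing import Sequence


def click_similarity(first_vector: Sequence[float], second_vector: Sequence[float]) -> float:
    """Cosine of the angle between two click-rate vectors.

    Each vector holds one click rate per specified document, so both must
    have the same dimension.
    """
    if len(first_vector) != len(second_vector):
        raise ValueError("both vectors must have one entry per specified document")
    dot = sum(a * b for a, b in zip(first_vector, second_vector))
    norm_first = math.sqrt(sum(a * a for a in first_vector))
    norm_second = math.sqrt(sum(b * b for b in second_vector))
    if norm_first == 0 or norm_second == 0:
        return 0.0  # assumed convention for a word with no clicks at all
    return dot / (norm_first * norm_second)
```

Two words whose clicks are distributed over the same documents in the same proportions score close to 1, while words clicked on disjoint documents score 0.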
In another embodiment of the present invention, the constructing module 504 is configured to remove the semantic modifier contained in the similar words of each semantic modifier to obtain the semantic expansion words of each semantic modifier; merge the semantic expansion words to obtain target semantic expansion words; and remove noise words from the target semantic expansion words to obtain a semantic query word template.
In another embodiment of the present invention, the constructing module 504 is specifically configured to sort the target semantic expansion words in descending order of frequency to obtain a ranking result; and, according to the ranking result, take the target semantic expansion words ranked before the second designated position as the semantic query word templates.
In summary, the apparatus provided in the embodiment of the present invention performs queries according to the core words contained in the seed semantic query word template and continuously expands the original core words during the query process, so that a large number of semantic query word templates are mined automatically.
FIG. 6 is a block diagram illustrating an apparatus 600 for constructing a semantic query term template in accordance with an exemplary embodiment. For example, the apparatus 600 may be provided as a server that constructs a template of semantic query terms. Referring to fig. 6, the apparatus 600 includes a processing component 622 that further includes one or more processors and memory resources, represented by memory 632, for storing instructions, such as applications, that are executable by the processing component 622. The application programs stored in memory 632 may include one or more modules that each correspond to a set of instructions. Further, the processing component 622 is configured to execute instructions to perform the above-described method of constructing a semantic query term template, the method comprising:
acquiring a seed semantic query word template, wherein the seed semantic query word template at least comprises a core word;
querying according to the core words in the seed semantic query word template to obtain a plurality of target words of each core word, wherein each target word comprises a core word and a semantic modifier;
querying according to the semantic modifier of each target word to obtain similar words of each semantic modifier;
and constructing a semantic query word template based on the similar words of each semantic modifier.
In another embodiment of the present invention, performing a query according to core words in a seed semantic query word template to obtain a plurality of target words of each core word, includes:
querying whether an internet query word containing a core word in the seed semantic query word template exists in the internet query word set;
and when the internet query words containing the core words exist in the internet query word set, taking the internet query words containing the core words as target words.
In another embodiment of the present invention, the querying according to the semantic modifiers of each target word to obtain the similar words of each semantic modifier includes:
calculating the similarity between the semantic modifier of any target word and each internet query word in the internet query word set;
sorting the internet query words in descending order of similarity to obtain a ranking result;
and according to the ranking result, taking the internet query words ranked before the first designated position as the similar words of the semantic modifier.
In another embodiment of the present invention, calculating the similarity between the semantic modifier of any target word and each internet query word in the internet query word set comprises:
acquiring a first click rate of a semantic modifier of a target word in a specified document;
acquiring a second click rate of any internet query word in the internet query word set in the specified document;
and calculating the similarity between the semantic modifiers of the target words and the internet query words according to the first click rate and the second click rate.
In another embodiment of the present invention, calculating the similarity between the semantic modifier of the target word and the internet query word according to the first click rate and the second click rate includes:
generating a first vector according to the first click rate;
generating a second vector according to the second click rate;
calculating a cosine value of an included angle between the first vector and the second vector;
taking the cosine value of the included angle as the similarity between the semantic modifier of the target word and the internet query word;
and the dimensions of the first vector and the second vector are equal to the number of the specified documents.
In another embodiment of the present invention, constructing a semantic query term template based on the similar terms of each semantic modifier comprises:
removing the semantic modifiers contained in the similar words of each semantic modifier to obtain semantic expansion words of each semantic modifier;
merging the semantic expansion words to obtain target semantic expansion words;
and removing noise words in the target semantic expansion words to obtain a semantic query word template.
In another embodiment of the present invention, removing noise words in the target semantic expansion words to obtain a semantic query word template, includes:
sorting the target semantic expansion words in descending order of frequency to obtain a ranking result;
and according to the ranking result, taking the target semantic expansion words ranked before the second designated position as the semantic query word templates.
The server 600 may also include a power component 626 configured to perform power management of the server 600, a wired or wireless network interface 650 configured to connect the server 600 to a network, and an input/output (I/O) interface 658. The server 600 may operate based on an operating system stored in memory 632, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
The device provided by the embodiment of the invention performs queries according to the core words contained in the seed semantic query word template and continuously expands the original core words during the query process, so that a large number of semantic query word templates are mined automatically. No manual observation by a user is required in this process, which reduces cost and increases construction speed.
It should be noted that the apparatus for constructing a semantic query word template according to the above embodiments is described using the above division into functional modules only as an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for constructing a semantic query word template and the method for constructing a semantic query word template provided in the above embodiments belong to the same concept; the specific implementation process is detailed in the method embodiments and is not described here again.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (16)

1. A method of constructing a semantic query term template, the method comprising:
acquiring a seed semantic query word template designated by a user in advance, wherein the seed semantic query word template at least comprises a core word;
querying according to the core words in the seed semantic query word template to obtain a plurality of target words of each core word, wherein each target word comprises the core word and a semantic modifier;
querying according to the semantic modifier of each target word to obtain similar words of each semantic modifier;
and constructing a semantic query word template based on the similar words of each semantic modifier, wherein the semantic query word template is used to perform a search for the user when the user inputs a semantic query word into a search engine, and the semantic query word is a query word whose meaning is fuzzy at the semantic level.
2. The method of claim 1, wherein the querying according to the core words in the seed semantic query word template to obtain a plurality of target words of each core word comprises:
querying whether an internet query word containing a core word in the seed semantic query word template exists in an internet query word set;
and when the internet query words containing the core words exist in the internet query word set, taking the internet query words containing the core words as target words.
3. The method according to claim 1, wherein said querying according to the semantic modifiers of each target word to obtain the similar words of each semantic modifier comprises:
calculating the similarity between the semantic modifier of any target word and each internet query word in the internet query word set;
sorting the internet query words in descending order of similarity to obtain a ranking result;
and according to the ranking result, taking the internet query words ranked before the first designated position as the similar words of the semantic modifier.
4. The method of claim 3, wherein the calculating the similarity between the semantic modifier of any target word and each internet query word in the internet query word set comprises:
acquiring a first click rate of the semantic modifier of the target word in a specified document;
acquiring a second click rate of any internet query word in the internet query word set in the specified document;
and calculating the similarity between the semantic modifiers of the target words and the internet query words according to the first click rate and the second click rate.
5. The method according to claim 4, wherein the calculating the similarity between the semantic modifier of the target word and the internet query word according to the first click rate and the second click rate comprises:
generating a first vector according to the first click rate;
generating a second vector according to the second click rate;
calculating a cosine value of an included angle between the first vector and the second vector;
taking the cosine value of the included angle as the similarity between the semantic modifier of the target word and the internet query word;
and the dimensions of the first vector and the second vector are equal to the number of the specified documents.
6. The method of claim 1, wherein constructing a semantic query term template based on the similar terms of each semantic modifier comprises:
removing the semantic modifier contained in the similar words of each semantic modifier to obtain semantic expansion words of each semantic modifier;
merging the semantic expansion words to obtain target semantic expansion words;
and removing noise words in the target semantic expansion words to obtain a semantic query word template.
7. The method according to claim 6, wherein the removing the noise word in the target semantic expansion word to obtain a semantic query word template comprises:
sorting the target semantic expansion words in descending order of frequency to obtain a ranking result;
and according to the ranking result, taking the target semantic expansion words ranked before the second designated position as the semantic query word templates.
8. An apparatus for constructing a template of semantic query terms, the apparatus comprising:
an acquisition module, configured to acquire a seed semantic query word template designated by a user in advance, wherein the seed semantic query word template at least comprises a core word;
a first query module, configured to query according to the core words in the seed semantic query word template to obtain a plurality of target words of each core word, wherein each target word comprises the core word and a semantic modifier;
a second query module, configured to query according to the semantic modifier of each target word to obtain similar words of each semantic modifier;
and a construction module, configured to construct a semantic query word template based on the similar words of each semantic modifier, wherein the semantic query word template is used to perform a search for the user when the user inputs a semantic query word into a search engine, and the semantic query word is a query word whose meaning is fuzzy at the semantic level.
9. The apparatus of claim 8, wherein the first query module is configured to query whether there is an internet query term in the internet query term set that includes a core term in the seed semantic query term template; and when the internet query words containing the core words exist in the internet query word set, taking the internet query words containing the core words as target words.
10. The apparatus of claim 8, wherein the second query module is configured to calculate a similarity between the semantic modifier of any target word and each internet query word in the internet query word set; sort the internet query words in descending order of similarity to obtain a ranking result; and, according to the ranking result, take the internet query words ranked before the first designated position as the similar words of the semantic modifier.
11. The apparatus according to claim 10, wherein the second query module is specifically configured to obtain a first click rate of the semantic modifier of the target word in a specified document; acquiring a second click rate of any internet query word in the internet query word set in the specified document; and calculating the similarity between the semantic modifiers of the target words and the internet query words according to the first click rate and the second click rate.
12. The apparatus of claim 11, wherein the second query module is specifically configured to generate a first vector according to the first click rate; generate a second vector according to the second click rate; calculate a cosine value of an included angle between the first vector and the second vector; and take the cosine value of the included angle as the similarity between the semantic modifier of the target word and the internet query word;
and the dimensions of the first vector and the second vector are equal to the number of the specified documents.
13. The apparatus according to claim 8, wherein the constructing module is configured to remove the semantic modifiers included in the similar words of each semantic modifier to obtain the semantic expansion words of each semantic modifier; merging the semantic expansion words to obtain target semantic expansion words; and removing noise words in the target semantic expansion words to obtain a semantic query word template.
14. The apparatus according to claim 13, wherein the building module is specifically configured to sort the target semantic expansion words in descending order of frequency to obtain a ranking result; and, according to the ranking result, take the target semantic expansion words ranked before the second designated position as the semantic query word templates.
15. A computer-readable storage medium storing one or more programs for use by one or more processors in performing the method of constructing a semantic query term template according to any one of claims 1 to 7.
16. A server, comprising one or more processors and memory, the memory storing one or more programs, the one or more programs being used by the one or more processors to perform the method of constructing semantic query term templates according to any one of claims 1 to 7.
CN201510172096.4A 2015-04-13 2015-04-13 Method and device for constructing semantic query word template Active CN106156141B (en)

Publications (2)

Publication Number Publication Date
CN106156141A CN106156141A (en) 2016-11-23
CN106156141B true CN106156141B (en) 2020-04-24

Family

ID=57336713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510172096.4A Active CN106156141B (en) 2015-04-13 2015-04-13 Method and device for constructing semantic query word template

Country Status (1)

Country Link
CN (1) CN106156141B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102236664A (en) * 2010-04-28 2011-11-09 百度在线网络技术(北京)有限公司 Retrieval system, retrieval method and information processing method based on semantic normalization
WO2012142552A1 (en) * 2011-04-15 2012-10-18 Microsoft Corporation Interactive semantic query suggestion for content search
CN104331456A (en) * 2014-10-31 2015-02-04 百度在线网络技术(北京)有限公司 Method and device for mining sort named entities

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant