CN111324805B - Query intention determining method and device, searching method and searching engine - Google Patents


Publication number
CN111324805B
Authority
CN
China
Prior art keywords
query
intention
search
type information
strength
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811523459.4A
Other languages
Chinese (zh)
Other versions
CN111324805A (en)
Inventor
肖佳坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201811523459.4A
Publication of CN111324805A
Application granted
Publication of CN111324805B
Legal status: Active


Abstract

The invention discloses a query intention determining method and a query intention determining device. The method comprises the following steps: receiving a query term; acquiring historical query information of the query term; obtaining an intention recognition feature of the query term according to the historical query information of the query term; and determining the intention strength of the query term for querying specified type information by using the intention recognition feature of the query term and a pre-constructed specified-type intention strength prediction model. With the method and the device, the intention behind a query term input by a user can be identified accurately.

Description

Query intention determining method and device, searching method and searching engine
Technical Field
The invention relates to the field of information search, in particular to a query intention determining method and device, and also relates to a searching method and a searching engine.
Background
At present, as the amount of information on the Internet keeps growing, users often receive a massive number of search results when searching with a traditional search engine, and a large part of those results are far from the user's search intention and therefore of little use to the user. Users therefore strongly expect that, during information retrieval, a search engine will understand their personalized information needs and return search results that closely match their query intention. This is especially true for hot events or news events, for which users prefer to obtain the latest relevant information.
Disclosure of Invention
One aspect of the embodiments of the present invention provides a query intention determining method and device, so as to accurately identify the intention of a query term input by a user to query specified type information.
Another aspect of the embodiments of the present invention provides a search method and a search engine, which can accurately provide a user with search results that match the query intention.
To this end, the invention provides the following technical solutions:
A query intention determining method, the method comprising:
receiving a query term;
acquiring historical query information of the query term;
obtaining an intention recognition feature of the query term according to the historical query information of the query term;
and determining the intention strength of the query term for querying specified type information by using the intention recognition feature of the query term and a pre-constructed specified-type intention strength prediction model.
Optionally, acquiring the historical query information of the query term includes:
counting the number of queries for the query term in each of at least one unit of time within a preset time period to obtain a query count list.
Optionally, obtaining the intention recognition feature of the query term according to the historical query information of the query term includes:
converting the query count list into a line graph;
and acquiring a change trend feature of the line graph, and taking the change trend feature as the intention recognition feature of the query term for that time period.
Optionally, the change trend feature includes any one or more of the following: the number of peaks, the positions of the peaks, the width of the peaks, and the height of the peaks; the width of the peaks includes the width of the most recent peak and/or the width of the widest peak; the height of the peaks includes the height of the most recent peak and/or the height of the highest peak.
Optionally, there are multiple preset time periods, the different time periods have an inclusion relation, and the unit time granularity differs between time periods;
the method further comprises:
weighting the intention strengths of the query term for querying the specified type information obtained at the different granularity levels, and taking the weighted result as the intention strength of the query term for querying the specified type information.
Optionally, the method further comprises:
acquiring related words of the query term, and determining the intention strength of the related words for querying the specified type information, wherein the related words comprise synonyms and/or near-synonyms;
and correcting the intention strength of the query term for querying the specified type information according to the intention strength of the related words for querying the specified type information.
Optionally, correcting the intention strength of the query term for querying the specified type information according to the intention strength of the related words for querying the specified type information includes:
weighting the intention strength of the query term for querying the specified type information together with the intention strength of the related words for querying the specified type information, and taking the weighted result as the intention strength of the query term for querying the specified type information.
A search method, comprising:
receiving a search sentence input by a user, and extracting a query term from the search sentence;
obtaining search results corresponding to the query term;
determining the intention strength of the query term for querying specified type information by using the query intention determining method described above;
and ranking the search results according to the intention strength of the query term for querying the specified type information.
Optionally, ranking the search results according to the intention strength of the query term for querying the specified type information includes:
if the intention strength of the query term for querying the specified type information is greater than a set value, placing the documents corresponding to the specified type information in the search results before other documents.
A query intention determining device, the device comprising:
a query term acquisition module, used for receiving a query term;
a historical information acquisition module, used for acquiring historical query information of the query term;
a feature acquisition module, used for obtaining an intention recognition feature of the query term according to the historical query information;
and an intention determining module, used for determining the intention strength of the query term for querying specified type information by using the intention recognition feature of the query term and a pre-constructed specified-type intention strength prediction model.
Optionally, the historical information acquisition module is specifically configured to count the number of queries for the query term in each of at least one unit of time within a preset time period, so as to obtain a query count list.
Optionally, the feature acquisition module includes:
a data conversion unit, used for converting the query count list into a line graph;
and a feature determining unit, used for acquiring a change trend feature of the line graph and taking the change trend feature as the intention recognition feature of the query term for that time period.
Optionally, the change trend feature includes any one or more of the following: the number of peaks, the positions of the peaks, the width of the peaks, and the height of the peaks; the width of the peaks includes the width of the most recent peak and/or the width of the widest peak; the height of the peaks includes the height of the most recent peak and/or the height of the highest peak.
Optionally, there are at least two preset time periods, the different time periods have an inclusion relation, and the unit time granularity differs between time periods;
the device further comprises:
a weighting processing module, used for weighting the intention strengths of the query term for querying the specified type information that the intention determining module obtains at the different granularity levels, and taking the weighted result as the intention strength of the query term for querying the specified type information.
Optionally, the device further comprises a related word acquisition module and a correction module;
the related word acquisition module is used for acquiring related words of the query term, wherein the related words comprise synonyms and/or near-synonyms;
the historical information acquisition module is also used for acquiring historical query information of the related words;
the feature acquisition module is also used for obtaining intention recognition features of the related words according to the historical query information of the related words;
the intention determining module is also used for determining the intention strength of the related words for querying the specified type information by using the intention recognition features of the related words;
the correction module is used for correcting the intention strength of the query term for querying the specified type information according to the intention strength of the related words for querying the specified type information.
Optionally, the correction module is specifically configured to weight the intention strength of the query term for querying the specified type information together with the intention strength of the related words for querying the specified type information, and to take the weighted result as the intention strength of the query term for querying the specified type information.
A search engine, comprising: a search front end, a search back end, and the aforementioned query intention determining device;
the search front end is used for receiving a search sentence input by a user and transmitting the search sentence to the search back end;
the search back end is used for extracting a query term from the search sentence and obtaining search results corresponding to the query term;
the query intention determining device is used for determining the intention strength of the query term for querying specified type information;
and the search back end is also used for ranking the search results according to the intention strength of the query term for querying the specified type information.
Optionally, when the intention strength of the query term for querying the specified type information is greater than a set value, the search back end places the documents corresponding to the specified type information in the search results before other documents.
An electronic device, comprising: one or more processors and a memory;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the methods described above.
A readable storage medium having stored thereon instructions that, when executed, implement the methods described above.
According to the query intention determining method and device provided by the embodiments of the present invention, an intention recognition feature of a query term is obtained according to the historical query information of the query term, and the intention strength of the query term for querying specified type information is then determined by using that feature and a pre-constructed specified-type intention strength prediction model.
Because changes in the number of queries indirectly reflect the popularity of a query term, the scheme of the invention counts the number of queries for the query term and uses the change in that number as the intention recognition feature, which allows the intention of the query term to query the specified type information to be predicted better. This matters especially for breaking news: if many media outlets on the network have not yet had time to publish articles, the prior art cannot obtain many news documents and therefore cannot predict that the query term is intended to query news-type information; with the scheme of the invention, the query volume for breaking news is usually large, so the intention of the query term to query news information can still be predicted accurately.
Further, for a plurality of time periods with different unit time granularities, the intention strength of the query term for querying the specified type information can be predicted separately at each granularity, and the intention strengths at the different granularities can then be combined to determine the intention strength of the query term, so that the final prediction result is more accurate.
Further, the intention strengths of the synonyms and/or near-synonyms of the query term for querying the specified type information are taken into account to correct the intention strength of the query term itself, so that a more accurate prediction result can be obtained. In particular, when documents carrying the same specified type information can be expressed in several semantically equivalent ways, this avoids the prediction result depending on a single query term.
According to the search method and the search engine provided by the embodiments of the present invention, the search results are ranked according to the intention strength of the query term for querying the specified type information, which can greatly improve the user experience.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may obtain other drawings from them.
FIG. 1 is a flow chart of a query intent determination method in accordance with an embodiment of the present invention;
FIG. 2 is an example of a line graph corresponding to a query count list in an embodiment of the present invention;
FIG. 3 is another flow chart of a query intent determination method in accordance with an embodiment of the present invention;
FIG. 4 is a flow chart of a search method according to an embodiment of the present invention;
FIG. 5 is a block diagram showing a construction of a query intention determining apparatus according to an embodiment of the present invention;
FIG. 6 is another block diagram of a query intent determination device in accordance with an embodiment of the present invention;
FIG. 7 is another block diagram of the query intention determining apparatus according to the embodiment of the present invention;
FIG. 8 is a block diagram of a search engine according to an embodiment of the present invention;
FIG. 9 is a block diagram illustrating an apparatus for a query intent determination method, according to an example embodiment;
FIG. 10 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the accompanying drawings and implementations.
The embodiments of the present invention provide a query intention determining method and a query intention determining device, which obtain an intention recognition feature of a query term according to the historical query information of the query term, and then determine the intention strength of the query term for querying specified type information by using that feature and a pre-constructed specified-type intention strength prediction model.
Referring to FIG. 1, a flowchart of a query intention determining method according to an embodiment of the present invention, the method includes the following steps:
Step 101, receiving a query term.
The query term may be all or part of the text of a search sentence input by the user. The user may enter the search sentence into the search field of a browser using any of the input methods provided by a smart device, for example voice input, text input, or handwriting input.
It should be noted that obtaining the query term requires some processing of the search sentence, such as removing non-keywords and suitably transforming some words, so as to increase the accuracy and comprehensiveness of the recalled search results. This processing may be performed using prior art techniques and is not described in detail here.
Step 102, acquiring historical query information of the query term.
The historical query information of the query term mainly refers to the historical number of queries for the query term. Changes in the number of queries reflect changes in the popularity of the query term, and therefore reflect, to a certain extent, the intention strength of the query term for querying the specified type information.
Specifically, the number of queries for the query term per unit time can be counted over a recent period, such as the last 48 hours, week, or month, to obtain a query count list. The unit time can be set to a different granularity for each statistical time period, for example:
for historical query information within 48 hours, the number of queries per hour is counted;
for historical query information within one week, the number of queries per day is counted;
for historical query information within one month, the number of queries per week is counted.
It should be noted that the unit time granularities above are merely examples. In practice the granularity may be set as needed; for example, for historical query information within one month, the number of queries per day or per hour may be counted instead. The embodiments of the present invention are not limited in this respect.
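The counting in step 102 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `query_count_list`, the representation of the query log as a list of timestamps, and the example log itself are hypothetical and do not come from the patent. Index 0 of the result holds the most recent unit of time.

```python
from datetime import datetime, timedelta


def query_count_list(timestamps, now, unit, num_units):
    """Count queries in each of the last `num_units` units of time before `now`.

    Index 0 is the most recent unit (for example the past 0-1 hour).
    Timestamps outside the covered period are ignored.
    """
    counts = [0] * num_units
    period = unit * num_units
    for ts in timestamps:
        age = now - ts
        if timedelta(0) <= age < period:
            # timedelta // timedelta gives the whole number of units elapsed
            counts[age // unit] += 1
    return counts


# Hypothetical query log for one query term: minutes before `now`.
now = datetime(2018, 12, 13, 12, 0)
log = [now - timedelta(minutes=m) for m in (5, 30, 70, 200)]

# 48-hour period at hourly granularity.
hourly = query_count_list(log, now, timedelta(hours=1), 48)
```

The same function covers the other granularities by changing `unit` and `num_units`, for example `timedelta(days=1)` with 7 units for a one-week period.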
Step 103, obtaining the intention recognition feature of the query term according to the historical query information of the query term.
Because changes in the number of queries indirectly reflect the popularity of the query term, the embodiments of the present invention may use the change in the number of queries as the intention recognition feature of the query term.
To describe the change in the number of queries more conveniently, the query count list can be converted into a line graph, in which the horizontal axis represents time and the vertical axis represents the counted number of queries. The change trend feature of the line graph is then obtained and used as the intention recognition feature of the query term for that time period.
The change trend feature of the line graph may specifically include, but is not limited to: the number of peaks, the positions of the peaks, the width of the peaks, and the height of the peaks. The width of the peaks may be the width of the most recent peak and/or the width of the widest peak; the height of the peaks may be the height of the most recent peak and/or the height of the highest peak. For example, for a query term, the number of queries per day over the last 15 days is as follows:
(53, 105, 85, 106, 77, 32, 10, 0, 0, 0, 0, 0, 0, 0, 0).
The line graph corresponding to these statistics is shown in FIG. 2.
In this line graph there are 2 peaks, located at t=2 and t=4; the peak height is 105 and the peak width is 2, as marked in the figure.
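The peak features described above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the function name `peak_features`, the rule that a peak is a point strictly higher than both neighbours, and the definition of width as the span between the points where the curve stops falling on each side are all assumptions, since the text does not define them.

```python
def peak_features(counts):
    """Return one dict per peak with its 1-based position, height, and width.

    Assumed definitions (not taken from the patent text):
    - a peak is a point strictly higher than both neighbours, with the
      series treated as 0 beyond either end;
    - the width is the distance between the points on each side where the
      curve stops falling.
    """
    n = len(counts)
    peaks = []
    for i in range(n):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < n - 1 else 0
        if counts[i] > left and counts[i] > right:
            lo = i
            while lo > 0 and counts[lo - 1] < counts[lo]:
                lo -= 1  # walk downhill to the left
            hi = i
            while hi < n - 1 and counts[hi + 1] < counts[hi]:
                hi += 1  # walk downhill to the right
            peaks.append({"position": i + 1, "height": counts[i], "width": hi - lo})
    return peaks


# Daily query counts from the 15-day example in the text.
daily = [53, 105, 85, 106, 77, 32, 10, 0, 0, 0, 0, 0, 0, 0, 0]
peaks = peak_features(daily)
num_peaks = len(peaks)
positions = [p["position"] for p in peaks]
```

Under these assumed definitions the 15-day example yields two peaks, at positions 2 and 4, with heights 105 and 106 and widths 2 and 5.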
Step 104, determining the intention strength of the query term for querying the specified type information by using the intention recognition feature of the query term and a pre-constructed specified-type intention strength prediction model.
The specified type information is mainly information with relatively high timeliness, such as news information, stock information, and exchange rate information. Timeliness means that the same thing differs greatly in nature, effect, or value at different times.
The specified-type intention strength prediction model may be a GBDT (Gradient Boosting Decision Tree) model, that is, a decision tree (DT) model trained with a gradient boosting (GB) strategy. The input of the model is the intention recognition feature of the query term, and the output is the intention strength score of the query term for querying the specified type information.
The model is trained in a manner similar to a conventional GBDT, which is not repeated here.
When determining the intention strength of the query term for querying the specified type information, the intention recognition feature of the query term obtained in step 103 is input into the specified-type intention strength prediction model, and the intention strength is obtained from the output of the model.
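To make the idea of a GBDT-style predictor concrete, the following pure-Python sketch boosts depth-1 regression trees (stumps) under squared loss. It is a toy stand-in, not the patent's model: the feature layout (number of peaks, most recent peak height, most recent peak width), the training rows, the labels, and the hyperparameters are all hypothetical, and a production system would more likely use an established GBDT library with deeper trees.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split that minimises squared error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            if not left or not right:
                continue
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, f, t, lm, rm)
    return best[1:] if best else None


def fit_gbdt(X, y, n_trees=20, learning_rate=0.3):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        f, t, lm, rm = stump
        stumps.append(stump)
        pred = [p + learning_rate * (lm if row[f] <= t else rm)
                for row, p in zip(X, pred)]
    return base, learning_rate, stumps


def predict(model, features):
    """Sum the base value and the shrunken contribution of every stump."""
    base, lr, stumps = model
    return base + sum(lr * (lm if features[f] <= t else rm)
                      for f, t, lm, rm in stumps)


# Hypothetical training data: [num_peaks, latest_peak_height, latest_peak_width]
X_train = [[0, 0, 0], [1, 5, 1], [2, 105, 2], [3, 400, 3]]
y_train = [0.0, 0.1, 0.8, 1.0]  # hypothetical intention strength labels

model = fit_gbdt(X_train, y_train)
score = predict(model, [2, 105, 2])  # intention strength score for one query term
```

The design point this illustrates is that the model consumes only trend features of the query count series, not features of the recalled documents.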
For example, for a query term, the number of queries per hour, that is, the number of page views (pv) per hour, over the last 48 hours is counted. These pv numbers show how the hourly number of queries has changed over the last 48 hours; this information is used to determine the intention recognition feature of the query term, and the feature is input into the specified-type intention strength prediction model to obtain the intention strength of the query term at the "hour" level.
The query intention determining method provided by the embodiments of the present invention does not judge intention from the features of the documents recalled by the query term. Instead, it predicts the intention of the query term to query the specified type information from the historical query information of the query term, so that the prediction can reach higher accuracy. In particular, for query terms related to breaking news or hot events, the intention to query that type of information can be reflected in time.
It should be noted that, in practice, the number of queries for a query term may be counted simultaneously over several time periods with different unit time granularities, to obtain the intention strength of the query term for querying the specified type information at each granularity level. The intention strengths at the different granularity levels are then considered together; for example, the final intention strength is determined by a weighted calculation. The different time periods have an inclusion relation, and the unit time granularity differs between time periods.
For example, for the query term "Chongqing bus falls into river cause", the intention strength for querying news-type information (referred to below as the news intention strength, for convenience) is determined as follows:
(1) The number of times the query term was queried in each hour of the last 48 hours is counted to form a query count list (18 25 29 26 31 21 11 5 0 0 29 6 20 19 21 32 34 44 16 21 10 34 31 0 0 0 0 0 0 0 2 26 0 12 10 32 3 3 5). The list shows that the term was searched 18 times in the past 0-1 hour, 25 times in the past 1-2 hours, and so on.
A line graph is drawn from the query count list, with the horizontal axis representing time and the vertical axis representing the counted number of queries. From the peaks, troughs, and other information in the line graph, various features can be constructed, such as the positions of the peaks, the number of peaks, the width of the peaks, and the height of the peaks.
These features are input into a news-type intention strength prediction model, which yields the news intention strength of the query term at the "hour" level.
(2) For the same query term, the number of times it was queried on each day of the last month is counted, for example (219 78 43 25 26 59 37 105 14 11 20 10 6 7 7 7 9 9 15 16 5 14 7 5 0 3 15 12 12).
The news intention strength of the query term at the "day" level is obtained in the same way as in step (1).
(3) The number of times the query term was queried in each week of the last 48 weeks is likewise counted, and the news intention strength of the query term at the "week" level is obtained.
(4) The news intention strengths at the different granularity levels are weighted, and the weighted result is taken as the news intention strength of the query term.
For example, if the news intention of a query term is strong at the hour level, the news probably occurred within the last few hours, and the overall news intention is strong. If the news intention is weak at the hour level but strong at the day level, the news probably occurred within the last 1-2 days, and the overall news intention is still fairly strong, though weaker than in the first case. If the news intention is weak at both the hour level and the day level but strong at the week level, the news probably occurred within the last 1-2 weeks, and the overall news intention is weaker still. Different weights are assigned to the hour-level, day-level, and week-level news intention strengths, and the weighted strengths are added to obtain the final news intention strength of the query term.
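The weighted combination across granularity levels can be sketched as a simple weighted sum. The weight values below are hypothetical; the text only says that the hour, day, and week levels receive different weights, and the choice of a larger weight for finer granularity here merely reflects the ordering described in the example above.

```python
def combine_granularities(strengths, weights):
    """Weighted sum of the intention strengths obtained at each granularity level."""
    return sum(weights[level] * strength for level, strength in strengths.items())


# Hypothetical weights: finer granularity counts more.
WEIGHTS = {"hour": 0.5, "day": 0.3, "week": 0.2}

# Hypothetical per-level news intention strengths for three query terms.
recent_hours = combine_granularities({"hour": 1.0, "day": 0.0, "week": 0.0}, WEIGHTS)
recent_days = combine_granularities({"hour": 0.0, "day": 1.0, "week": 0.0}, WEIGHTS)
recent_weeks = combine_granularities({"hour": 0.0, "day": 0.0, "week": 1.0}, WEIGHTS)
```

With these assumed weights, a term that is strong only at the hour level scores higher than one that is strong only at the day level, which in turn scores higher than one that is strong only at the week level.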
Further, given the richness of language, documents carrying the same specified type information may be recalled by query terms with several different wordings. For this case, in another embodiment of the method of the present invention, intention prediction may also be performed on related words of the query term, and the intention strength of the query term for querying the specified type information is corrected according to the intention strength of the related words for querying the specified type information, so as to further improve the accuracy of the prediction. The related words of the query term may include synonyms and/or near-synonyms of the query term.
FIG. 3 shows another flowchart of a query intention determining method according to an embodiment of the present invention, which includes the following steps:
Step 301, obtaining a query term and its related words, wherein the related words comprise synonyms and/or near-synonyms.
The query term may be all or part of the text of a search sentence input by a user. The related words may be determined by looking up the corresponding dictionaries: for example, a synonym dictionary is queried to obtain the synonyms of the query term, and a near-synonym dictionary is queried to obtain its near-synonyms.
A query term may have one or more synonyms or near-synonyms; if it has several, they may all be obtained at the same time.
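The dictionary lookup in step 301 can be sketched as follows. The function name, the in-memory dictionaries, and their entries are hypothetical placeholders; the text does not say how the dictionaries are stored or built.

```python
def related_words(query_term, synonym_dict, near_synonym_dict):
    """Look up all synonyms and near-synonyms of a query term at once."""
    return {
        "synonyms": list(synonym_dict.get(query_term, [])),
        "near_synonyms": list(near_synonym_dict.get(query_term, [])),
    }


# Hypothetical dictionaries mapping a term to its related words.
SYNONYMS = {"bus crash cause": ["bus accident cause"]}
NEAR_SYNONYMS = {"bus crash cause": ["bus crash reason", "why the bus crashed"]}

found = related_words("bus crash cause", SYNONYMS, NEAR_SYNONYMS)
missing = related_words("unknown term", SYNONYMS, NEAR_SYNONYMS)
```

Keeping the two kinds of related word separate in the result makes it possible to weight them differently later.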
Step 302, acquiring historical query information of the query term and historical query information of the related words.
Step 303, determining the intention strength of the query term for querying the specified type information and the intention strength of the related words for querying the specified type information, respectively, by using the historical query information and the specified-type intention strength prediction model.
It should be noted that, when predicting the intention strength of the query term for querying the specified type information, the number of queries for the query term may, as described above, be counted simultaneously over several time periods with different unit time granularities to obtain the intention strength at each granularity level, and the intention strengths at the different levels are then considered together to determine the intention strength of the query term for querying the specified type information.
Likewise, the number of queries for a related word may be counted over several time periods with different unit time granularities to obtain its intention strength at each granularity level, and these are then considered together to determine the intention strength of the related word for querying the specified type information. When one query term has several related words, the intention strength may be predicted for one, several, or all of the related words.
Step 304, correcting the intention strength of the query term for querying the specified type information according to the intention strength of the related words for querying the specified type information.
For example, the intention strength of the query term for querying the specified type information may be weighted together with the intention strength of the related words for querying the specified type information, and the weighted result taken as the intention strength of the query term for querying the specified type information. The weight for each related word may be determined, for example, from the historical number of queries for that related word. Of course, other correction methods may also be used; the embodiments of the present invention are not limited in this respect.
As noted above, the related words may include synonyms and near-synonyms. Because synonyms and near-synonyms differ in how closely they relate to the query term, different weights may be given to the intention strength of the synonyms and the intention strength of the near-synonyms when correcting the intention strength of the query term for querying the specified type information, so that the corrected intention strength is more accurate.
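One possible form of the correction in step 304 is a normalised weighted average, sketched below. The text only says that each related word's weight may be based on its historical number of queries and that synonyms and near-synonyms may be weighted differently; weighting the query term itself by its own query count, the normalisation, and the factor values are assumptions made for this illustration.

```python
# Hypothetical factors: near-synonyms count half as much as synonyms.
KIND_FACTOR = {"synonym": 1.0, "near_synonym": 0.5}


def correct_intent_strength(query_strength, query_count, related):
    """Correct a query term's intention strength using its related words.

    `related` is a list of (strength, historical_query_count, kind) tuples,
    where kind is "synonym" or "near_synonym".
    """
    total_weight = float(query_count)
    weighted_sum = query_strength * query_count
    for strength, count, kind in related:
        weight = count * KIND_FACTOR[kind]
        total_weight += weight
        weighted_sum += strength * weight
    if total_weight == 0:
        return query_strength  # no query history at all: nothing to correct with
    return weighted_sum / total_weight


# Hypothetical numbers: a weak query term with one strongly newsy related word.
with_synonym = correct_intent_strength(0.2, 100, [(0.8, 100, "synonym")])
with_near_synonym = correct_intent_strength(0.2, 100, [(0.8, 100, "near_synonym")])
```

In this example the synonym pulls the corrected strength up to 0.5, while the same evidence from a near-synonym pulls it up only to 0.4.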
With the query intention determining method provided by the embodiment of the invention, the query term's intention for querying the specified type of information is predicted from the historical query information of the query term, the same intention is predicted for the related words of the query term, and the query term's intention strength is then corrected according to the prediction results for the related words. This yields a more accurate prediction; in particular, when the same document corresponds to multiple ways of expressing the same meaning, it prevents the prediction from being skewed by a single query term.
Correspondingly, based on the above prediction of the query term's intention for querying the specified type of information, the embodiment of the invention also provides a searching method. Fig. 4 is a flow chart of the searching method, which comprises the following steps:
Step 401, receiving a search sentence input by a user, and extracting a query word from the search sentence.
Step 402, obtaining search results corresponding to the query term.
The search results may be those obtained by a search engine using some existing search techniques, the content of which may include documents, pictures, audio, video, and the like.
Step 403, determining the query term's intention strength for querying the specified type of information.
Step 404, sorting the search results according to that intention strength.
Of course, when ranking the search results, the intention strength may be taken into account on top of the existing ranking rules. For example, if the query term's intention strength for querying news information is greater than a set value, news documents in the search results are ranked before other documents; otherwise, news documents are ranked after other documents.
For example, suppose the query term is "Chongqing bus". If the "Chongqing bus falls into river" event occurred within the last few hours or the last two days, the news intention can be judged to be strong, and news documents about the Chongqing bus should be ranked before other documents. If the event occurred several weeks ago, or there has been no recent news related to Chongqing buses, the documents ranked first should instead be documents such as "Chongqing bus timetable" and "Chongqing bus route inquiry".
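The threshold rule in step 404 can be illustrated with a short sketch. The result representation (dictionaries with a `type` field), the function name `rank_results`, and the default threshold of 0.5 are assumptions for illustration; the text only speaks of "a set value".

```python
def rank_results(results, news_strength, threshold=0.5):
    """Move news documents to the front when news intent is strong, otherwise to the back.

    The relative order produced by the existing ranking rules is preserved
    within the news group and within the non-news group.
    """
    news = [r for r in results if r["type"] == "news"]
    other = [r for r in results if r["type"] != "news"]
    if news_strength > threshold:
        return news + other
    return other + news
```

Because the two groups are built with order-preserving filters, the existing ranking is only adjusted, not replaced.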
The sorted search results are then returned and presented to the user, which can greatly improve the user's search experience.
Correspondingly, the embodiment of the invention also provides a query intention determining device; fig. 5 is a structural block diagram of the device.
In this embodiment, the apparatus comprises the following modules:
a query term acquisition module 501, configured to receive a query term;
a history information obtaining module 502, configured to obtain history query information of the query word;
a feature obtaining module 503, configured to obtain, according to the historical query information, an intent recognition feature of the query term;
an intention determining module 504, configured to determine an intention strength of the query term for the specified type of information by using an intention recognition feature of the query term and a pre-constructed specified type intention strength prediction model.
The historical query information of the query term mainly refers to its historical query counts. Changes in the query count reflect changes in the popularity of the query term, and therefore reflect, to a certain extent, the query term's intention strength for querying the specified type of information. Accordingly, the history information obtaining module 502 may specifically count the number of queries of the query term in at least one unit time within a preset time period to obtain a query count list.
Accordingly, the feature obtaining module 503 may use the change in the query count as the intention recognition feature of the query term. A specific structure of the feature obtaining module 503 may include the following units:
the data conversion unit is used for converting the query count list into a line graph;
the feature determining unit is used for acquiring the change trend features of the line graph and taking them as the intention recognition features of the query term for the time period.
The change trend features of the line graph may specifically include, but are not limited to: the number of peaks, the positions of the peaks, the widths of the peaks and the heights of the peaks. The peak width may include the width of the most recent peak and/or the width of the widest peak; the peak height may include the height of the most recent peak and/or the height of the highest peak.
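As a rough illustration of how such trend features might be extracted from a query count list, the sketch below treats each contiguous run of values above the series mean as one peak. That peak definition, and the names `find_peaks` and `trend_features`, are assumptions made for the example; the text does not specify how peaks are detected.

```python
from statistics import mean


def find_peaks(counts):
    """Return (position, width, height) for each run of values above the series mean."""
    counts = list(counts)
    if not counts:
        return []
    baseline = mean(counts)
    peaks, start = [], None
    # A sentinel equal to the baseline closes a run that reaches the end of the list.
    for i, value in enumerate(counts + [baseline]):
        if value > baseline and start is None:
            start = i
        elif value <= baseline and start is not None:
            run = counts[start:i]
            height = max(run)
            peaks.append((start + run.index(height), i - start, height))
            start = None
    return peaks


def trend_features(counts):
    """Collect the peak-based features named in the text into a flat dictionary."""
    peaks = find_peaks(counts)
    if not peaks:
        return {"peak_count": 0, "latest_position": -1, "latest_width": 0,
                "widest_width": 0, "latest_height": 0, "highest_height": 0}
    latest_position, latest_width, latest_height = peaks[-1]
    return {
        "peak_count": len(peaks),
        "latest_position": latest_position,
        "latest_width": latest_width,
        "widest_width": max(p[1] for p in peaks),
        "latest_height": latest_height,
        "highest_height": max(p[2] for p in peaks),
    }
```

A feature dictionary of this kind could then be flattened into the input vector of the prediction model.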
The intention determining module 504 may input each feature obtained by the feature obtaining module 503 into the intention strength prediction model of the specified type, and obtain the intention strength of the query term for querying the information of the specified type according to the model output.
The model for predicting the strength of the intention of the specified type in the embodiment of the present invention may be pre-constructed by a model construction module (not shown), which may be a part of the apparatus of the present invention or may be independent of the apparatus, which is not limited thereto.
The specified type intention strength prediction model may be a GBDT (gradient boosting decision tree) model whose input is the intention recognition features of the query term and whose output is a score for the query term's intention strength for querying the specified type of information. The specified type intention strength prediction model is trained in a manner similar to a conventional GBDT, which is not repeated here.
The query intention determining device provided by the embodiment of the invention does not rely on the features of the documents recalled by the query term to judge the intention; instead, it predicts the query term's intention for querying the specified type of information from the historical query information of the query term. The prediction can therefore reach higher accuracy and, especially for query terms related to breaking news or hot events, can reflect the intention for querying that type of information in a timely manner.
It should be noted that, in practical applications, query counts for a query term may be collected simultaneously over several time periods with different unit-time granularities to obtain the query term's intention strength for querying the specified type of information at each granularity level; the intention strengths at the different granularity levels are then considered together, for example by determining the final intention strength through a weighted calculation.
FIG. 6 is another block diagram of the query intent determination device according to an embodiment of the present invention.
Compared with the embodiment shown in fig. 5, in this embodiment the history information obtaining module 502 needs to obtain the query counts of the query term over a plurality of time periods with different unit-time granularities. For example, for one query term, it counts the number of queries per hour over the last 48 hours, the number of queries per day over the last week, the number of queries per week over the last month, and so on.
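The multi-granularity counting described here can be sketched as bucketing query timestamps. The function `count_list` and its parameters are hypothetical, and the sketch assumes the query log is available as a list of `datetime` values and that the period is a whole multiple of the unit.

```python
from datetime import datetime, timedelta


def count_list(timestamps, now, period, unit):
    """Count queries per `unit` over the last `period` before `now` (oldest bucket first)."""
    buckets = [0] * (period // unit)  # timedelta // timedelta yields an int
    start = now - period
    for ts in timestamps:
        if start <= ts < now:
            buckets[(ts - start) // unit] += 1
    return buckets


# Example: one query log viewed at two granularity levels.
now = datetime(2018, 12, 13, 12, 0)
log = [now - timedelta(minutes=30), now - timedelta(minutes=90), now - timedelta(days=3)]
hourly = count_list(log, now, timedelta(hours=48), timedelta(hours=1))
daily = count_list(log, now, timedelta(days=7), timedelta(days=1))
```

Each resulting list plays the role of one query count list, from which the trend features for that time period are then derived.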
Accordingly, the feature obtaining module 503 obtains the intention recognition features of the query term for each time period from the historical query information of the query term in the different time periods; the intention determining module 504 determines the query term's intention strength for querying the specified type of information at each granularity level by using the intention recognition features for each time period and the specified type intention strength prediction model.
In this embodiment, the apparatus further comprises a weighting processing module 505, configured to weight the intention strengths for querying the specified type of information obtained by the intention determining module 504 at the different granularity levels, and to use the weighted result as the query term's intention strength for querying the specified type of information. That is, the intention strengths at several granularity levels are fused to determine the final intention strength of the query term, which makes the final prediction more accurate.
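A minimal sketch of the fusion performed by the weighting processing module 505 is given below. The level names and weight values are illustrative assumptions; the text does not say how the weights are chosen.

```python
def fuse_strengths(strengths, weights):
    """Weighted average of per-granularity intention strengths.

    `strengths` and `weights` are dictionaries keyed by granularity level;
    only levels present in `strengths` contribute to the result.
    """
    total = sum(weights[level] for level in strengths)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(strengths[level] * weights[level] for level in strengths) / total


# Hypothetical scores from the model at three granularity levels.
strengths = {"hourly": 0.9, "daily": 0.6, "weekly": 0.3}
weights = {"hourly": 0.5, "daily": 0.3, "weekly": 0.2}
final_strength = fuse_strengths(strengths, weights)
```

Giving the finest granularity the largest weight, as assumed here, would favor the most recent change in popularity.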
FIG. 7 is another block diagram showing the structure of the query intention determining apparatus according to the embodiment of the present invention.
Unlike the embodiment shown in fig. 6, in this embodiment, the apparatus further comprises: a related word acquisition module 601 and a correction module 602.
The related word obtaining module 601 is configured to obtain related words of the query term, where the related words include synonyms and/or near-synonyms. For example, synonyms of the query term can be obtained by querying a synonym dictionary, and near-synonyms of the query term can be obtained by querying a near-synonym dictionary. There may be one or more synonyms and one or more near-synonyms.
In addition, in this embodiment, the history information obtaining module 502 obtains not only the historical query information of the query term but also the historical query information of the related words. Likewise, the feature obtaining module 503 obtains the intention recognition features of the query term from the historical query information of the query term, and the intention recognition features of the related words from the historical query information of the related words. The intention determining module 504 determines, from the intention recognition features obtained by the feature obtaining module 503, both the query term's and the related words' intention strength for querying the specified type of information.
In this embodiment, the weighting processing module 505 is optional. That is, for the query term and its related words, the intention strength for querying the specified type of information may be determined for each by fusing the intention strengths at several granularity levels; alternatively, the historical query information of only a single time period may be collected and used to determine each intention strength. Of course, the query term and the related words need not be treated in the same way: for either of them, the historical query information of a single time period or of several time periods may be collected and used to obtain its final intention strength.
In this embodiment, the correction module 602 is configured to correct the query term's intention strength for querying the specified type of information according to the related words' intention strength for querying the specified type of information. For example, the two may be combined by weighting, and the weighted result used as the query term's intention strength for querying the specified type of information. Of course, other correction methods may be used; the embodiments of the present invention are not limited in this respect.
With the query intention determining device provided by the embodiment of the invention, the query term's intention for querying the specified type of information is predicted from the historical query information of the query term, the same intention is predicted for the related words of the query term, and the query term's intention strength is then corrected according to the prediction results for the related words. This yields a more accurate prediction; in particular, when the same document corresponds to multiple ways of expressing the same meaning, it prevents the prediction from being skewed by a single query term.
Correspondingly, based on the query intention determining device of each embodiment, the embodiment of the invention also provides a search engine which can sort the search results according to the intention strength of the predicted query term query specified type information, and return and present the sorted search results to the user, so that the search experience of the user can be greatly improved.
Fig. 8 is a block diagram of a search engine according to an embodiment of the present invention. The search engine includes a search front end 71, a search back end 72, and the query intention determining device 70 described above. Wherein:
the search front end 71 is configured to receive a search sentence input by a user, and transmit the search sentence to the search back end;
the search back end 72 is configured to extract a query term from the search sentence, and obtain search results corresponding to the query term;
the query intention determining device 70 is configured to determine the query term's intention strength for querying the specified type of information;
the search back end 72 is also configured to sort the search results according to the query term's intention strength for querying the specified type of information. For example, when the query term's intention strength for querying news information is greater than a set value, news documents in the search results are ranked before other documents; otherwise, news documents in the search results are ranked after other documents.
In this way, the search back end 72 returns the ordered search results to the search front end 71, and the search front end 71 presents the search results to the user, so that the user can conveniently and rapidly find the document which the user wants to query, and the user experience is greatly improved.
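The division of work between the three components can be sketched as follows. The class names, the injected `retrieve` and `intent_strength` callables, the result format, and the threshold of 0.5 are all assumptions for illustration; a real back end would extract query terms and retrieve documents in a far more elaborate way.

```python
class SearchBackEnd:
    """Hypothetical back end 72: retrieves results and reorders them by intent strength."""

    def __init__(self, retrieve, intent_strength, threshold=0.5):
        self.retrieve = retrieve                # callable: query term -> list of result dicts
        self.intent_strength = intent_strength  # stands in for the intention determining device 70
        self.threshold = threshold              # the "set value"; 0.5 is an assumption

    def search(self, sentence):
        term = sentence.strip()  # placeholder for real query-term extraction
        results = self.retrieve(term)
        news = [r for r in results if r["type"] == "news"]
        other = [r for r in results if r["type"] != "news"]
        if self.intent_strength(term) > self.threshold:
            return news + other
        return other + news


class SearchFrontEnd:
    """Hypothetical front end 71: passes the search sentence on and presents the titles."""

    def __init__(self, back_end):
        self.back_end = back_end

    def handle(self, sentence):
        return [r["title"] for r in self.back_end.search(sentence)]
```

Injecting the intent function keeps the back end independent of how the intention strength is predicted, which mirrors the separation between the search back end 72 and the device 70 in the block diagram.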
FIG. 9 is a block diagram illustrating an apparatus 800 for a query intent determination method, according to an example embodiment. For example, apparatus 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 9, apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the apparatus 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various categories of data to support operations at the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or nonvolatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power component 806 provides power to the various components of the device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the apparatus 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 800 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the apparatus 800. For example, the sensor assembly 814 may detect the on/off state of the apparatus 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor assembly 814 may also detect a change in position of the apparatus 800 or of one component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communication between the apparatus 800 and other devices, either in a wired or wireless manner. The apparatus 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as the memory 804 including instructions executable by the processor 820 of the apparatus 800 to perform the above-described query intention determining method. For example, the non-transitory computer readable storage medium may be ROM, Random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
The invention also provides a non-transitory computer readable storage medium storing instructions which, when executed by a processor of a mobile terminal, cause the mobile terminal to perform all or part of the steps in the method embodiments of the invention described above.
Fig. 10 is a schematic structural diagram of a server according to an embodiment of the present invention. The server 1900 may vary considerably in configuration or performance and may include one or more central processing units (Central Processing Units, CPU) 1922 (e.g., one or more processors) and memory 1932, one or more storage media 1930 (e.g., one or more mass storage devices) that store applications 1942 or data 1944. Wherein the memory 1932 and storage medium 1930 may be transitory or persistent. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instruction operations on a server. Still further, a central processor 1922 may be provided in communication with a storage medium 1930 to execute a series of instruction operations in the storage medium 1930 on the server 1900.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (8)

1. A method of query intent determination, the method comprising:
receiving a query term;
counting the number of queries of the query term in at least one unit time within a preset time period to obtain a query count list;
converting the query count list into a line graph, wherein the horizontal axis of the line graph represents time and the vertical axis of the line graph represents the counted number of queries;
acquiring the change trend features of the line graph and taking them as the intention recognition features of the query term for the time period, thereby obtaining the intention recognition features of the query term, wherein the change trend features of the line graph comprise the number of peaks of the line graph, the positions of the peaks, the widths of the peaks and the heights of the peaks;
and determining the query term's intention strength for querying specified type information by using the intention recognition features of the query term and a pre-constructed specified type intention strength prediction model.
2. The method according to claim 1, wherein the preset time period comprises a plurality of time periods, the different time periods have an inclusion relationship, and the granularity of the unit time differs between the different time periods; the method further comprises:
weighting the query term's intention strengths for querying the specified type information obtained at the different granularity levels, and taking the weighted result as the query term's intention strength for querying the specified type information.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
acquiring related words of the query term, and determining the related words' intention strength for querying the specified type information, wherein the related words comprise synonyms and/or near-synonyms;
and correcting the query term's intention strength for querying the specified type information according to the related words' intention strength for querying the specified type information.
4. A search method, comprising:
receiving a search sentence input by a user, and extracting a query word from the search sentence;
obtaining search results corresponding to the query terms;
determining the query term's intention strength for querying specified type information by using the method of any one of claims 1 to 3;
and sorting the search results according to the query term's intention strength for querying the specified type information.
5. A query intent determination device, the device comprising:
the query term acquisition module is used for receiving the query terms;
the history information obtaining module is used for counting the number of queries of the query term in at least one unit time within a preset time period to obtain a query count list;
the feature obtaining module is used for converting the query count list into a line graph, wherein the horizontal axis of the line graph represents time and the vertical axis of the line graph represents the counted number of queries; and for acquiring the change trend features of the line graph and taking them as the intention recognition features of the query term for the time period, thereby obtaining the intention recognition features of the query term, wherein the change trend features of the line graph comprise the number of peaks of the line graph, the positions of the peaks, the widths of the peaks and the heights of the peaks;
and the intention determining module is used for determining the query term's intention strength for querying specified type information by using the intention recognition features of the query term and a pre-constructed specified type intention strength prediction model.
6. A search engine, comprising: search front end, search back end, and query intent determination device as claimed in claim 5;
the search front end is used for receiving a search sentence input by a user and transmitting the search sentence to the search back end;
the search back end is used for extracting a query term from the search sentence and obtaining search results corresponding to the query term;
the query intention determining device is used for determining the query term's intention strength for querying specified type information;
and the search back end is also used for sorting the search results according to the query term's intention strength for querying the specified type information.
7. An electronic device, comprising: one or more processors, memory;
the memory is for storing computer executable instructions and the processor is for executing the computer executable instructions to implement the method of any one of claims 1 to 4.
8. A readable storage medium having stored thereon instructions that are executed to implement the method of any of claims 1 to 4.
CN201811523459.4A 2018-12-13 2018-12-13 Query intention determining method and device, searching method and searching engine Active CN111324805B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811523459.4A CN111324805B (en) 2018-12-13 2018-12-13 Query intention determining method and device, searching method and searching engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811523459.4A CN111324805B (en) 2018-12-13 2018-12-13 Query intention determining method and device, searching method and searching engine

Publications (2)

Publication Number Publication Date
CN111324805A CN111324805A (en) 2020-06-23
CN111324805B true CN111324805B (en) 2024-02-13

Family

ID=71162869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811523459.4A Active CN111324805B (en) 2018-12-13 2018-12-13 Query intention determining method and device, searching method and searching engine

Country Status (1)

Country Link
CN (1) CN111324805B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076080B (en) * 2021-04-21 2022-05-17 百度在线网络技术(北京)有限公司 Model training method and device and intention recognition method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1755685A (en) * 2004-09-30 2006-04-05 微软公司 Query formulation
EP1705588A1 (en) * 2005-03-25 2006-09-27 Sony Corporation Content searching method, content list searching method, content searching apparatus, content list searching apparatus, and searching server
CN101246499A (en) * 2008-03-27 2008-08-20 腾讯科技(深圳)有限公司 Network information search method and system
CN101464897A (en) * 2009-01-12 2009-06-24 阿里巴巴集团控股有限公司 Word matching and information query method and device
CN104778176A (en) * 2014-01-13 2015-07-15 阿里巴巴集团控股有限公司 Data search processing method and device
CN105678229A (en) * 2015-12-29 2016-06-15 中国科学院深圳先进技术研究院 High spectral image retrieval method
CN105991699A (en) * 2015-02-06 2016-10-05 北京中搜网络技术股份有限公司 Distributed downloading system of Internet crawlers
CN106484671A (en) * 2015-08-25 2017-03-08 北京中搜网络技术股份有限公司 A kind of recognition methodss of ageing inquiry content
CN106599278A (en) * 2016-12-23 2017-04-26 北京奇虎科技有限公司 Identification method and method of application search intention

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Search ranking algorithm based on user query intent; Zhang Meizhen; Wang Zhiying; Journal of Tianjin University of Technology (No. 03); pp. 48-53 *

Also Published As

Publication number Publication date
CN111324805A (en) 2020-06-23

Similar Documents

Publication Publication Date Title
KR101778784B1 (en) Method and device for training classifier, recognizing type
CN107608532B (en) Association input method and device and electronic equipment
WO2017092198A1 (en) Recommendation method and device, and device for recommendation
CN111291069B (en) Data processing method and device and electronic equipment
CN108121736B (en) Method and device for establishing subject term determination model and electronic equipment
CN108345612B (en) Problem processing method and device for problem processing
CN107315487B (en) Input processing method and device and electronic equipment
CN109815396B (en) Search term weight determination method and device
CN111382339B (en) Search processing method and device for search processing
CN111708943B (en) Search result display method and device for displaying search result
CN112784142A (en) Information recommendation method and device
CN110019885B (en) Expression data recommendation method and device
CN112307281A (en) Entity recommendation method and device
CN111324805B (en) Query intention determining method and device, searching method and searching engine
CN109521888B (en) Input method, device and medium
CN109918565B (en) Processing method and device for search data and electronic equipment
CN110110207A (en) Information recommendation method, device and electronic equipment
CN110110046B (en) Method and device for recommending entities with same name
CN112052395B (en) Data processing method and device
CN107301188B (en) Method for acquiring user interest and electronic equipment
CN110020206B (en) Search result ordering method and device
CN112836026B (en) Dialogue-based inquiry method and device
CN110020153B (en) Searching method and device
CN112837813A (en) Automatic inquiry method and device
CN112883295B (en) Data processing method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant