CN107180093B - Information searching method and device and timeliness query word identification method and device - Google Patents

Info

Publication number
CN107180093B
Authority
CN
China
Prior art keywords
query
word
timeliness
determining
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710340129.0A
Other languages
Chinese (zh)
Other versions
CN107180093A (en
Inventor
王天畅
陈英傑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201710340129.0A
Publication of CN107180093A
Application granted
Publication of CN107180093B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

An embodiment of the invention provides an information search method and device and a timeliness query word identification method and device. The information search method comprises: searching for information according to a query word input by a user; judging whether the query word is a target query word, where a target query word is a query word whose corresponding information search results have a timeliness requirement; and, if so, increasing the weight of the timeliness factor to a preset value, and sorting and outputting the retrieved information based on the preset value, where the target query word corresponds to the timeliness factor used for calculating the score of the retrieved information. By judging whether the query word is a target query word, increasing the weight of the timeliness factor accordingly, and sorting and outputting the retrieved information based on the increased weight, the embodiment meets the user's need to search for timely information.

Description

Information searching method and device and timeliness query word identification method and device
Technical Field
The invention relates to the technical field of network information search, and in particular to an information search method and device and a timeliness query word identification method and device.
Background
As websites have developed, users can find the information they want by entering query words on a website.
The existing information search approach is as follows: the user inputs a query word; after receiving it, the website retrieves multiple pieces of information corresponding to the query word, calculates a score for each piece of retrieved information based on three dimensions, and outputs the information to the user sorted by score from high to low.
The three dimensions are: the relevance dimension, i.e. how relevant the retrieved information is to the query word; the quality dimension, i.e. the quality of the retrieved information as determined by its own attributes (for example, if the retrieved information is a video, these attributes may be definition, duration and the like); and the timeliness dimension, i.e. how recent the retrieved information is (for example, if the retrieved information is a video, timeliness may be measured by how early or late the video was uploaded).
However, in the process of implementing the invention, the inventors found that the prior art has at least the following problem. Because the score of the retrieved information is calculated from the weights of the factors of the three dimensions, the score-ordered results shown to the user may end up ordered mainly by quality, by relevance, or by timeliness, depending on which factor dominates the score. Yet users sometimes have a timeliness requirement for the information they search for. For example, suppose the query word input by the user is a keyword of a recent entertainment event. This indicates that the user wants to watch newer videos, that is, videos uploaded closer to the current time. If quality dominates the score in this case, the retrieved videos are shown in order of quality from high to low, and the user cannot tell which of the displayed videos are the newer ones.
Therefore, the existing information searching mode cannot meet the searching requirement of the user on the timeliness information.
Disclosure of Invention
The embodiment of the invention aims to provide an information searching method and device and a time-dependent query word identification method and device so as to meet the searching requirement of a user on time-dependent information. The specific technical scheme is as follows:
an information search method, comprising:
searching information according to the query words input by the user;
judging whether the query word is a target query word or not, wherein the target query word is a query word of which the corresponding information search result has timeliness requirements;
and if so, increasing the weight of the timeliness factor to a preset value, sequencing and outputting the searched information based on the preset value, wherein the target query word corresponds to the timeliness factor used for calculating the score of the searched information.
Optionally, the step of determining whether the query term is a target query term includes:
comparing the query words with a pre-established timeliness query word bank, wherein the timeliness query word bank stores the query words of which the corresponding information search results have timeliness requirements;
and determining whether the query word is the target query word according to the comparison result.
A time-sensitive query word recognition method comprises the following steps:
obtaining a query word input by a user;
comparing the query words with a pre-established timeliness query word bank, wherein the timeliness query word bank stores the query words of which the corresponding information search results have timeliness requirements;
and determining whether the query word is a time-efficient query word according to the comparison result.
Optionally, the process of establishing the timeliness query word bank includes:
obtaining a plurality of reference query words whose search quantity within a target time period is greater than a preset search quantity threshold, and calculating a search quantity variation value corresponding to each reference query word, wherein the target time period is a time period of a preset duration from the current time;
determining the reference query words with the search quantity variation value larger than a preset search quantity variation threshold value as candidate query words in the reference query words;
obtaining search result information corresponding to the candidate query words through an information search engine;
acquiring target result information of which the difference value between uploading time and current time is within a first preset range from the search result information;
determining candidate query words corresponding to target result information with the number larger than a preset information number threshold value as first-class query words, and determining the first-class query words as timeliness query words;
and generating a time-efficient query word bank according to the determined time-efficient query words.
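The filtering steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `QueryStats` record, the function name, and all threshold parameters are hypothetical, and the "first preset range" is assumed here to mean "uploaded no more than `max_age_hours` ago".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryStats:
    """Hypothetical per-query statistics gathered for the target time period."""
    word: str
    search_count: int            # searches within the target time period
    change_value: float          # search quantity variation value
    result_ages_hours: List[float] = field(default_factory=list)  # age of each search result


def select_timeliness_words(stats, min_count, min_change, max_age_hours, min_fresh):
    """Apply the three filters in order and return the surviving query words."""
    selected = []
    for s in stats:
        if s.search_count <= min_count:
            continue  # not a reference query word
        if s.change_value <= min_change:
            continue  # not a candidate query word
        # Count results uploaded recently enough (assumed meaning of the first preset range).
        fresh = sum(1 for age in s.result_ages_hours if age <= max_age_hours)
        if fresh > min_fresh:
            selected.append(s.word)  # first-type query word, kept as a timeliness query word
    return selected
```

Each filter uses a strict "greater than" comparison, matching the wording of the steps above.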
Optionally, the step of calculating a search quantity variation value corresponding to each reference query word includes:
determining the search quantity variation ratio of each obtained reference query term under different time dimensions;
and determining the search quantity variation value corresponding to each reference query term according to the search quantity variation ratio of each reference query term in different time dimensions.
Optionally, the step of generating a time-dependent query thesaurus according to the determined time-dependent query term includes:
performing word segmentation operation on each timeliness query word to obtain a plurality of reference word segments;
taking each reference word segment as a node, and connecting every two nodes to form an edge;
and taking the target graph, which is composed of the plurality of reference word segments and comprises a plurality of edges, as the timeliness query word bank.
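A minimal sketch of the graph construction follows. It assumes that word segmentation has already been done by some external segmenter, and that edges are formed between the segments of the same timeliness query word; the text does not state this explicitly, so it is an interpretation. Each edge is stored as an alphabetically sorted pair so that the graph is undirected. The function name is hypothetical.

```python
from itertools import combinations


def build_target_graph(segmented_words):
    """Build an undirected edge set from pre-segmented timeliness query words.

    segmented_words: list of lists, one list of word segments per query word.
    Returns a set of (a, b) tuples with a < b.
    """
    edges = set()
    for segments in segmented_words:
        unique = sorted(set(segments))        # deduplicate and fix an ordering
        for a, b in combinations(unique, 2):  # connect every two nodes
            edges.add((a, b))
    return edges
```

A query word that yields a single segment contributes a node but no edge under this representation.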
Optionally, before the step of determining the first type of query term as a time-sensitive query term, the method further includes:
obtaining a plurality of news titles corresponding to the first type of query words through a news search engine;
determining the first type query words corresponding to the plurality of news titles with the release time within a second preset range or within a third preset range as second type query words;
determining the similarity of a plurality of news titles corresponding to each second type of query word;
the step of determining the first type of query term as a time-sensitive query term includes:
and determining the second type query words corresponding to the news titles with the similarity exceeding the preset similarity threshold as the timeliness query words.
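The text does not say how the similarity of news titles is measured. Purely as an illustration, the sketch below uses the mean pairwise Jaccard similarity over whitespace-separated tokens. Both choices are assumptions: Chinese titles would need a proper word segmenter, and the function names are hypothetical.

```python
from itertools import combinations


def jaccard(title_a, title_b):
    """Jaccard similarity of the token sets of two titles."""
    sa, sb = set(title_a.split()), set(title_b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def mean_title_similarity(titles):
    """Average Jaccard similarity over all pairs of titles (0.0 if fewer than two)."""
    pairs = list(combinations(titles, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A second-type query word would then be kept as a timeliness query word when this value exceeds the preset similarity threshold.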
Optionally, the step of performing a word segmentation operation on each timeliness query word to obtain a plurality of reference word segmentations includes:
determining a plurality of news titles corresponding to the timeliness query words;
and performing word segmentation operation on each timeliness query word and the corresponding news titles to obtain a plurality of reference word segments.
Optionally, before the step of obtaining, by an information search engine, search result information corresponding to the candidate query term, the method further includes:
removing candidate query words which accord with a preset type from the candidate query words to obtain third type query words;
the step of obtaining the search result information corresponding to the candidate query term through the information search engine includes:
obtaining search result information corresponding to the third type of query words through an information search engine;
the step of determining the candidate query term corresponding to the target result information with the number larger than the preset information number threshold value as the first type query term includes:
and determining the third type query words corresponding to the target result information with the number larger than the preset information number threshold value as the first type query words.
Optionally, the step of comparing the query term with a pre-established time-dependent query word library includes:
performing word segmentation operation on the query word;
judging whether the number of target word segmentations obtained by word segmentation operation is 1 or not;
if so, judging whether the number of edges corresponding to the target word segmentation is larger than a preset edge threshold value or not according to the target graph;
if not, taking each target word segment as a node, connecting every two nodes to form an edge, determining the query graph composed of the plurality of target word segments and containing a plurality of edges as the graph corresponding to the query word, and calculating the proportion of the edge set of the query graph that is covered by the edge set of the target graph;
the step of determining whether the query word is a timeliness query word according to the comparison result comprises the following steps:
when the number of word segments obtained from the query word by the word segmentation operation is 1 and the number of edges corresponding to the target word segment in the target graph is greater than the preset edge threshold, determining that the query word is a timeliness query word;
and when the number of word segments obtained from the query word by the word segmentation operation is not 1 and the proportion is greater than a preset coverage threshold, determining that the query word is a timeliness query word.
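The two branches of the comparison can be sketched as below. This is an illustrative reading, not a definitive implementation. It assumes the target graph is stored as a set of alphabetically sorted `(a, b)` pairs, that segmentation is done elsewhere, and that duplicate segments are collapsed before counting. The function and parameter names are hypothetical.

```python
from itertools import combinations


def is_timeliness_query(segments, target_edges, edge_threshold, coverage_threshold):
    """Decide whether a segmented query word is a timeliness query word."""
    unique = sorted(set(segments))
    if len(unique) == 1:
        # Single segment: count the target-graph edges touching that node.
        word = unique[0]
        degree = sum(1 for edge in target_edges if word in edge)
        return degree > edge_threshold
    # Several segments: build the query graph and measure edge coverage.
    query_edges = set(combinations(unique, 2))
    if not query_edges:
        return False  # empty query
    covered = len(query_edges & target_edges)
    return covered / len(query_edges) > coverage_threshold
```

The coverage is the fraction of query-graph edges that also appear in the target graph, so a query sharing only one segment with the word bank gets no credit in the multi-segment branch.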
An information search apparatus comprising:
the search module is used for searching information according to the query words input by the user;
the judging module is used for judging whether the query word is a target query word or not, wherein the target query word is a query word of which the corresponding information search result has timeliness requirements, and if so, the sequencing module is triggered;
the ranking module is used for increasing the weight of the timeliness factor to a preset value, ranking the searched information based on the preset value and outputting the ranked information, wherein the target query word corresponds to the timeliness factor used for calculating the score of the searched information.
Optionally, the determining module includes:
the comparison unit is used for comparing the query words with a pre-established timeliness query word bank, wherein the timeliness query word bank stores the query words of which the corresponding information search results have timeliness requirements;
and the timeliness determining unit is used for determining whether the query word is the target query word according to the comparison result.
A time-dependent query term recognition apparatus comprising:
the obtaining module is used for obtaining the query words input by the user;
the comparison module is used for comparing the query words with a pre-established timeliness query word bank, wherein the timeliness query word bank stores the query words of which the corresponding information search results have timeliness requirements;
and the timeliness determining module is used for determining whether the query word is a timeliness query word according to the comparison result.
Optionally, the apparatus further includes an establishing module, where the establishing module is configured to establish the time-based query thesaurus, and the establishing module includes:
the calculating unit is used for obtaining a plurality of reference query words whose search quantity within a target time period is greater than a preset search quantity threshold, and calculating a search quantity variation value corresponding to each reference query word, wherein the target time period is a time period of a preset duration from the current time;
the candidate query term determining unit is used for determining the reference query terms of which the search quantity change values are larger than a preset search quantity change threshold value as the candidate query terms;
the information obtaining unit is used for obtaining search result information corresponding to the candidate query words through an information search engine;
the target result information acquisition unit is used for acquiring target result information of which the difference value between the uploading time and the current time is within a first preset range from the search result information;
the timeliness query word determining unit is used for determining candidate query words corresponding to the target result information of which the number is larger than a preset information number threshold value as a first type of query words and determining the first type of query words as timeliness query words;
and the generating unit is used for generating a timeliness query word bank according to the determined timeliness query words.
Optionally, the computing unit includes:
the search quantity variation ratio determining subunit is used for determining the search quantity variation ratio of each obtained reference query word in different time dimensions;
and the search quantity change value determining subunit is used for determining the search quantity change value corresponding to each reference query word according to the search quantity change ratio of each reference query word in different time dimensions.
Optionally, the generating unit includes:
the word segmentation determining subunit is used for performing word segmentation operation on each timeliness query word to obtain a plurality of reference word segments;
the edge determining subunit is used for taking each reference word segmentation as a node, and every two nodes are connected to form an edge;
and the timeliness query word stock generation subunit is used for taking the target graph which is composed of the plurality of reference participles and comprises a plurality of edges as the timeliness query word stock.
Optionally, the apparatus further comprises:
the news title determining module is used for obtaining a plurality of news titles corresponding to the first type of query words through a news search engine before the first type of query words are determined as timeliness query words;
the second type query term determining module is used for determining the first type query terms corresponding to a plurality of news titles of which the difference values between the release time and the current time are within a second preset range or the release time is within a third preset range as second type query terms;
the similarity determining module is used for determining the similarity of a plurality of news titles corresponding to each second type of query word;
the timeliness query term determination unit is specifically configured to:
and determining the second type query words corresponding to the news titles with the similarity exceeding the preset similarity threshold as the timeliness query words.
Optionally, the word segmentation determining subunit is specifically configured to:
determining a plurality of news titles corresponding to the timeliness query words;
and performing word segmentation operation on each timeliness query word and the corresponding news titles to obtain a plurality of reference word segments.
Optionally, the apparatus further comprises:
the third type query term determining module is used for removing candidate query terms which accord with a preset type from the candidate query terms to obtain third type query terms before obtaining search result information corresponding to the candidate query terms through an information search engine;
the information obtaining unit is specifically configured to:
obtaining search result information corresponding to the third type of query words through an information search engine;
the timeliness query term determination unit is specifically configured to:
and determining the third type query words corresponding to the target result information with the number larger than the preset information number threshold value as the first type query words.
Optionally, the comparison module includes:
the word segmentation unit is used for performing word segmentation operation on the query word;
the first judgment unit is used for judging whether the number of the target word segmentation obtained by the word segmentation operation is 1, if so, the second judgment unit is triggered, and if not, the proportion determination unit is triggered;
the second judging unit is configured to judge whether the number of edges corresponding to the target word segmentation is greater than a preset edge threshold value according to the target graph;
the proportion determination unit is configured to take each target word segment as a node, connect every two nodes to form an edge, determine the query graph composed of the plurality of target word segments and containing a plurality of edges as the graph corresponding to the query word, and calculate the proportion of the edge set of the query graph that is covered by the edge set of the target graph;
the timeliness determination module includes:
a first determining unit, configured to determine that the query word is a timeliness query word when the number of word segments obtained from the query word by the word segmentation operation is 1 and the number of edges corresponding to the target word segment in the target graph is greater than a preset edge threshold;
and a second determining unit, configured to determine that the query word is a timeliness query word when the number of word segments obtained from the query word by the word segmentation operation is not 1 and the proportion is greater than a preset coverage threshold.
In yet another aspect, the present invention also provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform any one of the above-mentioned methods.
In yet another aspect, the present invention also provides a computer program product containing instructions, which when run on a computer, causes the computer to perform any of the methods described above.
In the embodiment of the invention, information search is carried out according to the query words input by a user, whether the query words are the target query words of which the corresponding information search results have timeliness requirements is judged, and if so, the weight of the timeliness factor is increased to the preset value, and the searched information is sorted and output based on the preset value, so that the weight of the timeliness factor is increased by judging whether the query words are the target query words, and then the searched information is sorted and output based on the increased weight, and the search requirements of the user on the timeliness information are met.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a schematic flowchart of an information search method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a timeliness query word identification method according to an embodiment of the present invention;
FIG. 3 is a first flowchart of the process of establishing a timeliness query word bank according to an embodiment of the present invention;
FIG. 4 is a second flowchart of the process of establishing a timeliness query word bank according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a timeliness query word bank;
FIG. 6 is a third flowchart of the process of establishing a timeliness query word bank according to an embodiment of the present invention;
FIG. 7 is a fourth flowchart of the process of establishing a timeliness query word bank according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a query graph;
FIG. 9 is a schematic structural diagram of an information search apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a timeliness query word identification apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
In order to solve the problems in the prior art, the embodiment of the invention provides an information searching method and device and a time-dependent query word identification method and device.
First, an information search method provided by an embodiment of the present invention is described below.
As shown in FIG. 1, an information search method provided in an embodiment of the present invention may include:
S101: searching for information according to the query word input by the user.
At present, a user looks for information by entering a query word on a website. After the user inputs the query word, the website can obtain it and search for information according to it. The website may be a video website, a news website, or the like.
S102: judging whether the query word is a target query word; if so, executing step S103; if not, performing no further processing.
In order to determine whether the user has a requirement on timeliness for the searched information, it is necessary to determine whether the query term input by the user is a target query term, where the target query term is a query term for which the corresponding information search result has a requirement on timeliness.
The above-mentioned determining whether the query term is the target query term may be: and determining whether the query words are target query words or not by comparing the query words with a pre-established timeliness query word bank. S102 may include:
comparing the query words with a pre-established timeliness query word bank, wherein the timeliness query word bank stores the query words of which the corresponding information search results have timeliness requirements;
and determining whether the query word is the target query word according to the comparison result.
Because the pre-established timeliness query word bank stores query words whose corresponding information search results have timeliness requirements, whether the query word is a target query word can be determined from the result of comparing the query word with the word bank.
For the specific implementation of determining whether the query word is a target query word, refer to the timeliness query word identification method provided by the present invention; details are not repeated here.
S103: increasing the weight of the timeliness factor to a preset value, and sorting and outputting the retrieved information based on the preset value, wherein the target query word corresponds to the timeliness factor used for calculating the score of the retrieved information.
Once the query word is determined to be a target query word, the user evidently has a timeliness requirement for the retrieved information. Since the target query word corresponds to the timeliness factor used for calculating the score of the retrieved information, the weight of the timeliness factor can be increased to a preset value, and the retrieved information is then sorted and output based on the preset value, so that the output is ordered by how recent the information is and meets the user's requirement.
Sorting and outputting the retrieved information based on the preset value may include: calculating the score of each piece of retrieved information based on the preset value, the weight of the quality factor and the weight of the relevance factor, and sorting and outputting the retrieved information in order of score from high to low.
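The effect of raising the timeliness weight can be illustrated with a short sketch. The text does not give the scoring formula, so a weighted linear sum of the three factors is assumed here. The dictionary keys, the function name, and the example weights are all hypothetical.

```python
def rank_results(results, is_target_query, weights, boosted_timeliness_weight):
    """Sort results by score, raising the timeliness weight for target query words.

    results: list of dicts with 'relevance', 'quality' and 'timeliness' values.
    weights: default weights for the same three keys.
    """
    w = dict(weights)  # copy so the caller's defaults stay untouched
    if is_target_query:
        w["timeliness"] = boosted_timeliness_weight  # the "preset value"

    def score(item):
        # Assumed scoring: weighted linear sum of the three dimension factors.
        return (w["relevance"] * item["relevance"]
                + w["quality"] * item["quality"]
                + w["timeliness"] * item["timeliness"])

    return sorted(results, key=score, reverse=True)
```

With default weights of 0.4, 0.4 and 0.2, an older high-quality video can outrank a newer one; raising the timeliness weight to, say, 0.8 for a target query word reverses that order.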
In the embodiment of the invention, information search is carried out according to the query words input by a user, whether the query words are the target query words of which the corresponding information search results have timeliness requirements is judged, and if so, the weight of the timeliness factor is increased to the preset value, and the searched information is sorted and output based on the preset value, so that the weight of the timeliness factor is increased by judging whether the query words are the target query words, and then the searched information is sorted and output based on the increased weight, and the search requirements of the user on the timeliness information are met.
The following introduces a time-dependent query word recognition method provided by the embodiment of the present invention.
It should be noted that the method for identifying the timeliness query term provided by the embodiment of the present invention may be applied to the field of information search, so as to facilitate information search and meet the search requirement of a user on timeliness information, for example: the method is applied to the information searching method shown in FIG. 1.
As shown in FIG. 2, a timeliness query word identification method according to an embodiment of the present invention may include:
S201: obtaining the query word input by the user.
S202: comparing the query word with a pre-established timeliness query word bank.
S203: determining whether the query word is a timeliness query word according to the comparison result.
The pre-established timeliness query word bank stores query words whose corresponding information search results have timeliness requirements. To identify whether the query word input by the user is a timeliness query word, the query word is therefore compared with the word bank, and the result of the comparison determines whether it is a timeliness query word.
In the embodiment of the invention, whether the query word is the timeliness query word is determined by comparing the query word input by the user with the timeliness query word library established in advance.
Referring to FIG. 3, the process of establishing the timeliness query word bank may be:
S301: obtaining a plurality of reference query words whose search quantity within a target time period is greater than a preset search quantity threshold, and calculating a search quantity variation value corresponding to each reference query word.
Timeliness query words are generally time-dependent, for example a query word corresponding to a recent news event or to a recently updated novel. When a news event occurs, a large number of users search with query words related to the event to obtain information about it; likewise, after a novel is updated, a large number of users search with query words related to the novel. In both cases, the search volume of the query words used rises sharply.
Therefore, to establish the timeliness query word bank, a plurality of reference query words whose search quantity within a target time period is greater than a preset search quantity threshold may be obtained, where the target time period is a time period of a preset duration from the current time. For example, the target time period may be the latest hour: if the current time is 8:00 and the preset duration is 1 hour, the target time period is 8:00-9:00.
After the reference query words are obtained, the search quantity variation value corresponding to each reference query word needs to be calculated to determine whether its search volume has increased.
Calculating the search quantity variation value corresponding to each reference query word may include:
determining the search quantity variation ratio of each obtained reference query term under different time dimensions;
and determining the search quantity variation value corresponding to each reference query term according to the search quantity variation ratio of each reference query term in different time dimensions.
Assuming that the target time period is the latest hour, the different time dimensions may include at least two of: the previous hour, the same hour of the previous day, and the same hour of the previous week. For example, assuming that the current time is 8:00 on March 8 and the preset duration is 1 hour, the target time period is 8:00-9:00; the previous hour is 7:00-8:00 on March 8, the same hour of the previous day is 8:00-9:00 on March 7, and the same hour of the previous week is 8:00-9:00 on March 1.
In one implementation, the search quantity variation value corresponding to each reference query term is calculated by the following formula:
$$imp_{trend} = \sum_i w_i \cdot tr_i$$
where $imp_{trend}$ is the search volume change value corresponding to a reference query word, $i$ is a time dimension, $tr_i$ is the search volume change ratio of the reference query word in time dimension $i$, and $w_i$ is the weight corresponding to time dimension $i$.
Since a change in search volume relative to the previous hour or to the same hour of the previous day may be caused merely by an increase in the number of users online, the weight corresponding to the same hour of the previous week is set to be the highest, which improves the accuracy of the calculated search volume change value corresponding to the reference query word.
S302: determining, among the reference query words, those whose search volume change value is greater than a preset search volume change threshold as candidate query words.
After the search volume change value corresponding to each reference query word is calculated, the reference query words whose search volume change value is greater than the preset search volume change threshold are determined as candidate query words; that is, the reference query words whose search volume has surged are determined as candidate query words.
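The weighted sum and the threshold step above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the weight values, the definition of the change ratio as relative growth over a baseline count, the handling of a zero baseline, and all function and key names are assumptions introduced here.

```python
from typing import Dict, List, Mapping

# Hypothetical weights: the text only requires the previous-week dimension to weigh most.
WEIGHTS: Dict[str, float] = {
    "previous_hour": 0.2,
    "previous_day": 0.3,
    "previous_week": 0.5,
}


def change_ratio(current: int, baseline: int) -> float:
    """Assumed definition of tr_i: relative growth over the baseline count."""
    if baseline <= 0:
        # Assumption: with no baseline traffic, fall back to the raw current count.
        return float(current)
    return (current - baseline) / baseline


def search_volume_change(current: int, baselines: Mapping[str, int],
                         weights: Mapping[str, float] = WEIGHTS) -> float:
    """imp_trend = sum over time dimensions i of w_i * tr_i."""
    return sum(weights[dim] * change_ratio(current, base)
               for dim, base in baselines.items())


def candidate_query_words(stats: Mapping[str, Mapping[str, int]],
                          change_threshold: float) -> List[str]:
    """S302: keep reference query words whose change value exceeds the threshold.

    stats[word] holds the count in the target period under "current" plus one
    baseline count per time dimension.
    """
    candidates = []
    for word, counts in stats.items():
        baselines = {dim: n for dim, n in counts.items() if dim != "current"}
        if search_volume_change(counts["current"], baselines) > change_threshold:
            candidates.append(word)
    return candidates
```

A word whose count tripled against the previous hour and previous week would score well above a word whose count stayed flat, so only the former would pass a threshold such as 1.0.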
S303: obtaining search result information corresponding to the candidate query words through an information search engine.
The determined candidate query words may include non-timeliness query words, that is, candidate query words unrelated to any recent event. The candidate query words therefore need to be filtered, and one way of filtering them is through the information search engine.
In detail, the candidate query term is input into an information search engine to obtain search result information corresponding to the candidate query term, where the information search engine may be: a video search engine or a novel search engine, etc.
In addition, since the number of search result information obtained by the information search engine is large, in order to increase the calculation speed, a first preset number of search result information ranked in the front may be extracted to perform the subsequent steps, wherein the first preset number may be 40 or 80, and the like.
S304: acquiring, from the search result information, target result information whose uploading time differs from the current time by an amount within a first preset range.
If no or few recent results appear in the search result information, the candidate query word has little newly uploaded information and is not a timeliness query word. Therefore, after the search result information corresponding to the candidate query words is obtained, the target result information whose uploading time differs from the current time by an amount within the first preset range is acquired from it.
For example, assume that the current time is 8:00 on March 8, the first preset range is 1 hour, and the candidate query words are A and B. Candidate query words A and B are input into the information search engine, and the obtained query results are shown in the following table:
Candidate query word A | Search result information C | Search result information D | Search result information E
Candidate query word B | Search result information F | Search result information G | Search result information H
The uploading time of each piece of search result information is shown in the following table:
[Table image not reproduced: uploading time of each piece of search result information]
then the target result information of the difference between the uploading time and the current time within 1 hour is obtained as follows: search result information C, search result information D, and search result information E.
S305: determining the candidate query words whose number of pieces of target result information is greater than a preset information number threshold as first-type query words.
As noted above, a candidate query word with no or few recent results is not a timeliness query word. Therefore, after the target result information is obtained, the candidate query words whose number of pieces of target result information is greater than the preset information number threshold, that is, the candidate query words with many recent results, are determined as first-type query words.
For example, continuing the above example, assume that the preset information number threshold is 2. Since the number of pieces of target result information corresponding to candidate query word A is 3 and the number corresponding to candidate query word B is 1, candidate query word A is determined as a first-type query word.
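Steps S304 and S305 can be sketched as below. This is a hedged illustration only: the function name, the data layout and the sample uploading times are hypothetical (the original table of uploading times is not available), and the year in the timestamps is an arbitrary choice.

```python
from datetime import datetime, timedelta
from typing import List, Mapping, Sequence


def first_type_query_words(upload_times: Mapping[str, Sequence[datetime]],
                           now: datetime,
                           first_range: timedelta,
                           count_threshold: int) -> List[str]:
    """Return candidate query words that have enough recently uploaded results."""
    selected = []
    for word, times in upload_times.items():
        # S304: target results are those uploaded within first_range before now.
        fresh = sum(1 for t in times if timedelta(0) <= now - t <= first_range)
        # S305: keep the word only when it has more fresh results than the threshold.
        if fresh > count_threshold:
            selected.append(word)
    return selected


# Hypothetical uploading times; the year 2017 is an arbitrary choice.
now = datetime(2017, 3, 8, 8, 0)
upload_times = {
    "A": [datetime(2017, 3, 8, 7, 10), datetime(2017, 3, 8, 7, 30),
          datetime(2017, 3, 8, 7, 50)],
    "B": [datetime(2017, 3, 8, 7, 40), datetime(2017, 3, 1, 8, 0),
          datetime(2017, 2, 1, 8, 0)],
}
result = first_type_query_words(upload_times, now, timedelta(hours=1), 2)
```

With these sample times, A has three results inside the one-hour range and B has one, so with a threshold of 2 only A is selected, mirroring the counts in the example above.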
S306: determining the first-type query words as timeliness query words.
After the first-type query words are determined, they are determined as timeliness query words.
S307: generating a timeliness query word bank according to the determined timeliness query words.
After the timeliness query words are determined, the timeliness query word bank can be generated from them.
There are multiple ways of generating the timeliness query word bank from the determined timeliness query words:
The first mode is: storing the determined timeliness query words in a database, and determining the database as the timeliness query word bank.
The second mode is: performing a word segmentation operation on the determined timeliness query words, generating a graph model, and determining the graph model as the timeliness query word bank.
In the case of the second mode, referring to fig. 4, which is another flow chart of establishing the timeliness query word bank in the embodiment of the present invention, steps S301 to S306 are the same as those in fig. 3, and S307 may include:
S3071: performing a word segmentation operation on each timeliness query word to obtain a plurality of reference word segments;
S3072: taking each reference word segment as a node, and connecting every two nodes to form an edge;
S3073: taking the target graph, which is composed of the plurality of reference word segments and contains a plurality of edges, as the timeliness query word bank.
Different users may input different query words for the same event. For example, when querying the spring transportation peak event, user A inputs "returning home during spring transportation" while user B inputs "the peak before the spring festival". Therefore, so that query words expressing the same event are stored in the timeliness query word bank, a word segmentation operation is performed on each timeliness query word to obtain a plurality of reference word segments, each reference word segment is taken as a node, every two nodes are connected to form an edge, and the target graph composed of the reference word segments and containing a plurality of edges is taken as the timeliness query word bank.
For example, referring to fig. 5, assume that there are two timeliness query words: timeliness query word M, "spring transportation returning home peak", and timeliness query word N, "spring festival peak";
performing the word segmentation operation on timeliness query words M and N yields the reference word segments "spring transportation", "returning home", "peak" and "spring festival";
taking each reference word segment as a node gives 4 nodes, and connecting every two nodes gives 6 edges in total;
the target graph consisting of the 4 reference word segments and containing the 6 edges is taken as the timeliness query word bank.
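The graph construction in steps S3071 to S3073 can be sketched as follows. The sketch assumes the word segmentation has already been done by some tokenizer, so each timeliness query word arrives as a list of segments; the function name and the representation of an edge as an unordered pair (`frozenset`) are choices made here for illustration.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Sequence, Set, Tuple


def build_target_graph(
    segmented_words: Iterable[Sequence[str]],
) -> Tuple[Set[str], Set[FrozenSet[str]]]:
    """Build the target graph from pre-segmented timeliness query words."""
    # S3071 (assumed done upstream): collect all reference word segments as nodes.
    nodes: Set[str] = set()
    for segments in segmented_words:
        nodes.update(segments)
    # S3072: connect every two nodes; each edge is an unordered pair of segments.
    edges = {frozenset(pair) for pair in combinations(sorted(nodes), 2)}
    # S3073: the (nodes, edges) pair serves as the timeliness query word bank.
    return nodes, edges


query_m = ["spring transportation", "returning home", "peak"]
query_n = ["spring festival", "peak"]
nodes, edges = build_target_graph([query_m, query_n])
```

For the two query words of the example this produces 4 nodes and 6 edges, matching fig. 5 as described in the text.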
In addition, after the word segmentation operation is performed on a timeliness query word, the resulting word segments may include segments of low importance. These can be filtered out, and the remaining key word segments are used as the reference word segments, where a key word segment may be a proper noun, a person's name, a place name, a verbal noun, or the like.
For example, the timeliness query word is "the peak of returning home last year". The word segmentation operation yields segments including "last year", "returning home" and "peak"; low-importance segments such as "last year" can be filtered out, leaving "returning home" and "peak" as the reference word segments.
After the first-type query words are determined, so that the timeliness query word bank contains sufficient timeliness query words, the word bank may be further expanded by using news headlines. Specifically, referring to fig. 6, steps S301-S305 and S3072-S3073 are the same as those in fig. 4, and before step S306 in fig. 6 the method further includes:
S601: obtaining a plurality of news headlines corresponding to the first-type query words through a news search engine.
After the first-type query words are determined, to determine timeliness query words more accurately, the first-type query words may be further filtered through a news search engine: the first-type query words are input into the news search engine to obtain a plurality of news headlines corresponding to them.
In addition, since the number of news headlines returned by the news search engine is large, to increase the calculation speed, only a second preset number of top-ranked news headlines may be extracted for the subsequent steps, where the second preset number may be 10.
S602: determining the first-type query words whose corresponding news headlines have release times within a second preset range or within a third preset range as second-type query words.
If the release times of the obtained news headlines are scattered, for example news headlines H, I and J are obtained, where news headline H was released at 8:00 on March 1, news headline I at 8:00 on January 1, and news headline J at 8:00 on September 1, so that the three release times are spread over different months, this indicates that the first-type query word corresponding to these news headlines is not a query word related to a hot news event;
alternatively, if the release times of the obtained news headlines are early, for example the current time is 8:00 on March 24, 2017 and news headlines H, I and J are obtained, where news headline H was released at 8:00 on March 1, 2014, news headline I at 8:00 on January 1, 2014, and news headline J at 8:00 on September 1, 2014, all in 2014, this also indicates that the first-type query word corresponding to these news headlines is not a query word related to a hot news event. Therefore, after the news headlines corresponding to the first-type query words are obtained, the first-type query words whose corresponding news headlines have release times within the second preset range, or within the third preset range, are determined as second-type query words.
S603: determining the similarity of the plurality of news headlines corresponding to each second-type query word.
After the second-type query words are determined, the news headlines corresponding to a second-type query word may still be unrelated to one another; if they are unrelated, the second-type query word is not a query word of a single news event. Therefore, after the second-type query words are determined, the similarity of the plurality of news headlines corresponding to each second-type query word needs to be determined.
In one implementation, the similarity of the news headlines corresponding to each second-type query word is calculated by the following formula:
$$S = \sum \mathrm{similarity}(D_i, D_j)$$
[Formula image not reproduced: definition of $\mathrm{similarity}(D_i, D_j)$]
where $S$ is the similarity of the plurality of news headlines corresponding to the second-type query word, $D_i$ is the $i$-th news headline, $D_j$ is the $j$-th news headline, $\mathrm{similarity}(D_i, D_j)$ is the similarity between the $i$-th and $j$-th news headlines, and the two vector symbols in the formula (images not reproduced) denote the vector expressions of the $i$-th and $j$-th news headlines after the word segmentation operation.
Step S306 in fig. 6 may include:
determining the second-type query words whose corresponding news headlines have a similarity exceeding a preset similarity threshold as timeliness query words.
After the similarity of the news headlines corresponding to each second-type query word is determined, a high similarity indicates that the news headlines are related and concern the same news event. Therefore, the second-type query words whose corresponding news headlines have a similarity exceeding the preset similarity threshold can be determined as timeliness query words.
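A possible sketch of the headline similarity step is given below. The pairwise similarity formula is not recoverable from the text, so this sketch assumes cosine similarity over bag-of-words count vectors, and assumes the sum runs over each unordered pair of headlines once; both are assumptions, as are the function names. Headlines are passed in already segmented.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Sequence


def cosine_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed pairwise similarity: cosine of bag-of-words count vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def headlines_similarity(headlines: Sequence[Sequence[str]]) -> float:
    """S = sum of similarity(D_i, D_j), here taken over each unordered pair once."""
    return sum(cosine_similarity(a, b) for a, b in combinations(headlines, 2))


def is_timeliness_query_word(headlines: Sequence[Sequence[str]],
                             similarity_threshold: float) -> bool:
    """S306 in fig. 6: keep the query word when S exceeds the preset threshold."""
    return headlines_similarity(headlines) > similarity_threshold
```

Because S is a sum rather than an average, a suitable threshold depends on how many headlines are compared; with the ten headlines mentioned above there are 45 pairs under the pairing assumed here.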
Step S3071 in fig. 6 may include:
determining the plurality of news headlines corresponding to each timeliness query word;
performing a word segmentation operation on each timeliness query word and its corresponding news headlines to obtain a plurality of reference word segments.
Because users' descriptions of the same event may vary widely, the query words they input also vary widely, and a timeliness query word bank generated from only one or a few query words may fail to identify other query words for the event. More descriptions of the event represented by the query words are therefore required.
When a second-type query word is determined as a timeliness query word, the news headlines corresponding to it concern the same news event and can therefore be used as an expansion corpus. After the news headlines corresponding to the timeliness query word are determined, the word segmentation operation is performed on each timeliness query word and its corresponding news headlines to obtain a plurality of reference word segments, from which the timeliness query word bank is subsequently generated in the same manner as in fig. 4; details are not repeated here.
After the candidate query words are determined, to determine the first-type query words more accurately, the candidate query words may be further filtered by matching against query words of a preset type. Specifically, referring to fig. 7, steps S301 to S302, S304, and S306 to S3073 are all the same as those in fig. 6, and before step S303 in fig. 7 the method may further include:
S701: removing the candidate query words that conform to the preset type from the candidate query words to obtain third-type query words.
The search volume of certain types of query words, for example pornographic query words, may also surge because of the particular nature of that type, but query words of such types are obviously not timeliness query words.
Therefore, after the candidate query words are determined, those conforming to the preset type need to be removed to obtain the third-type query words. The candidate query words of the preset type may be identified by fuzzy matching or by any other matching method in the prior art; since these methods are prior art, details are not repeated here.
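Step S701 could be sketched as follows. The text leaves the matching method open, so this example assumes one simple form of fuzzy matching: a candidate is treated as belonging to the preset type if it contains a blocked term or is sufficiently similar to one according to `difflib.SequenceMatcher` from the Python standard library. The blocked-term list, the ratio threshold and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def matches_preset_type(word: str, blocked_terms: Iterable[str],
                        min_ratio: float = 0.8) -> bool:
    """Assumed fuzzy match: substring hit or high character-level similarity."""
    for term in blocked_terms:
        if term in word:
            return True
        if SequenceMatcher(None, word, term).ratio() >= min_ratio:
            return True
    return False


def third_type_query_words(candidates: Iterable[str],
                           blocked_terms: Iterable[str],
                           min_ratio: float = 0.8) -> List[str]:
    """S701: drop candidates of the preset type, keep the rest in order."""
    blocked = list(blocked_terms)
    return [w for w in candidates
            if not matches_preset_type(w, blocked, min_ratio)]
```

The surviving words are the ones passed to the information search engine in step S303 of fig. 7.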
Step S303 of fig. 7 may include:
and obtaining search result information corresponding to the third type of query words through an information search engine.
Step S305 of fig. 7 may include:
and determining the third type query words corresponding to the target result information with the number larger than the preset information number threshold value as the first type query words.
After the third type query term corresponding to the target result information with the number larger than the preset information number threshold is determined as the first type query term, the manner of subsequently generating the time-efficient query word library is the same as that in fig. 6, and is not repeated here.
After the timeliness query word bank is pre-established in the manner shown in fig. 7, in the timeliness query word identification method provided by the embodiment of the present invention, step S202 may include:
performing a word segmentation operation on the query word;
judging whether the number of target word segments obtained by the word segmentation operation is 1;
if so, judging, according to the target graph, whether the number of edges corresponding to the target word segment is greater than a preset edge threshold;
if not, taking each target word segment as a node, connecting every two nodes to form an edge, determining the query graph composed of the plurality of target word segments and containing a plurality of edges as the graph corresponding to the query word, and calculating the proportion of the edge set contained in the query graph that is covered by the edge set contained in the target graph;
step S203 may include:
when the number of target word segments obtained by the word segmentation operation is 1 and the number of edges corresponding to the target word segment in the target graph is greater than the preset edge threshold, determining the query word as a timeliness query word;
when the number of target word segments obtained by the word segmentation operation is not 1 and the proportion is greater than a preset coverage threshold, determining the query word as a timeliness query word.
When the timeliness query word bank is generated as a graph model, a word segmentation operation can be performed on the query word after the query word input by the user is obtained.
In the target graph, each reference word segment obtained by segmenting the timeliness query words is a node, and every two nodes are connected to form an edge. Therefore, if a query word consists of only one target word segment, and the target graph shows that many edges are connected to the node of that target word segment, the query word is a timeliness query word.
Accordingly, after the word segmentation operation is performed on the query word, whether the number of target word segments obtained is 1 is judged; if so, whether the number of edges corresponding to the target word segment is greater than the preset edge threshold is judged according to the target graph, and if it is, the query word is determined to be a timeliness query word.
For example, assume that the pre-established timeliness query word bank is as shown in fig. 5, the query word input by the user is "peak", and the preset edge threshold is 2;
performing the word segmentation operation on "peak" yields the single target word segment "peak", so the number of target word segments is 1. According to the target graph shown in fig. 5, the number of edges corresponding to the target word segment "peak" is 3, which is greater than the preset edge threshold of 2, so the query word "peak" is determined to be a timeliness query word.
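The single-segment check in this example can be sketched as below. The target graph is rebuilt inline from the four reference word segments of fig. 5 as described in the text; the function names and the edge representation (`frozenset` pairs) are illustrative choices.

```python
from itertools import combinations
from typing import FrozenSet, Set

# Target graph of fig. 5 as described in the text: 4 nodes, every pair connected.
SEGMENTS = ["spring transportation", "returning home", "peak", "spring festival"]
TARGET_EDGES: Set[FrozenSet[str]] = {
    frozenset(pair) for pair in combinations(SEGMENTS, 2)
}


def edge_count(segment: str, target_edges: Set[FrozenSet[str]]) -> int:
    """Number of edges in the target graph that touch the given segment."""
    return sum(1 for edge in target_edges if segment in edge)


def is_timely_single_segment(segment: str,
                             target_edges: Set[FrozenSet[str]],
                             edge_threshold: int) -> bool:
    """A one-segment query is timely when its node has more edges than the threshold."""
    return edge_count(segment, target_edges) > edge_threshold


peak_is_timely = is_timely_single_segment("peak", TARGET_EDGES, 2)
```

Here "peak" touches 3 edges, which exceeds the threshold of 2, while a segment absent from the graph touches none and is rejected.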
When the number of target word segments obtained is not 1, each target word segment can be taken as a node and every two nodes connected to form an edge, and the query graph composed of the plurality of target word segments and containing a plurality of edges is determined as the graph corresponding to the query word. If a large proportion of the edge set contained in the query graph is covered by the edge set contained in the target graph, the query word corresponding to the query graph is a timeliness query word.
Therefore, after the query graph is obtained, the proportion of the edge set contained in the query graph that is covered by the edge set contained in the target graph is calculated, and when the proportion is greater than a preset coverage threshold, the query word is determined to be a timeliness query word, where the preset coverage threshold may be 0.5.
For example, assume that the pre-established timeliness query word bank is as shown in fig. 5, the query word input by the user is "spring transportation returning home peak", and the preset coverage threshold is 0.5;
performing the word segmentation operation on "spring transportation returning home peak" yields the target word segments "spring transportation", "returning home" and "peak", so the number of target word segments is not 1; referring to fig. 8, each of the target word segments "spring transportation", "returning home" and "peak" is taken as a node, every two nodes are connected to form an edge, and the resulting query graph, composed of the target word segments and containing a plurality of edges, is determined as the graph corresponding to the query word;
according to fig. 5 and fig. 8, the proportion of the edge set contained in the query graph that is covered by the edge set contained in the target graph is calculated;
the query graph contains 3 edges, forming the edge set:
"spring transportation-peak", "spring transportation-returning home" and "returning home-peak";
the target graph contains 6 edges, forming the edge set:
"spring transportation-peak", "spring transportation-returning home", "returning home-peak", "spring transportation-spring festival", "returning home-spring festival" and "spring festival-peak";
the proportion is 3/6 = 0.5, which is not greater than the preset coverage threshold of 0.5, so the query word "spring transportation returning home peak" is determined not to be a timeliness query word.
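The multi-segment check can be sketched as follows. Following the worked example, the proportion is computed as the number of query-graph edges that also appear in the target graph divided by the number of edges in the target graph (3/6); dividing by the number of query-graph edges instead would be a different reading of the wording and is not what this sketch does. Function names are illustrative.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def complete_edges(segments: Iterable[str]) -> Set[FrozenSet[str]]:
    """Connect every two distinct segments; each edge is an unordered pair."""
    return {frozenset(pair) for pair in combinations(sorted(set(segments)), 2)}


def coverage_ratio(query_segments: Iterable[str],
                   target_edges: Set[FrozenSet[str]]) -> float:
    """Shared edges divided by target-graph edges, as in the worked example."""
    if not target_edges:
        return 0.0
    query_edges = complete_edges(query_segments)
    return len(query_edges & target_edges) / len(target_edges)


def is_timely_multi_segment(query_segments: Iterable[str],
                            target_edges: Set[FrozenSet[str]],
                            coverage_threshold: float = 0.5) -> bool:
    """A multi-segment query is timely when the ratio exceeds the threshold."""
    return coverage_ratio(query_segments, target_edges) > coverage_threshold


target_edges = complete_edges(
    ["spring transportation", "returning home", "peak", "spring festival"])
query = ["spring transportation", "returning home", "peak"]
ratio = coverage_ratio(query, target_edges)
```

For the example query the ratio is 0.5, which does not exceed the threshold of 0.5, so the query word is not classified as a timeliness query word; the comparison is strict.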
In addition, after the word segmentation operation is performed on the query word input by the user, the resulting word segments may include segments of low importance. These can be filtered out, and the remaining key word segments are used as the target word segments, where a key word segment may be a proper noun, a person's name, a place name, a verbal noun, or the like.
Corresponding to the above method embodiment, as shown in fig. 9, an embodiment of the present invention further provides an information search apparatus, which may include:
a search module 901, configured to perform an information search according to a query word input by a user;
a judging module 902, configured to judge whether the query word is a target query word, where a target query word is a query word whose corresponding information search result has a timeliness requirement, and if so, to trigger the sorting module;
a sorting module 903, configured to increase the weight of the timeliness factor to a preset value, and to sort and output the searched information based on the preset value, where the timeliness factor is a factor used for calculating the score of the searched information.
In the embodiment of the invention, an information search is performed according to the query word input by the user, and whether the query word is a target query word whose corresponding information search result has a timeliness requirement is judged. If so, the weight of the timeliness factor is increased to the preset value, and the searched information is sorted and output based on the increased weight, which meets the user's search requirement for timely information.
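The effect of raising the timeliness weight can be illustrated with the sketch below. The scoring formula is not given in this section, so the linear blend of a relevance score and a freshness score, the two weight values, and the `Result` type are all hypothetical; the sketch only shows how a larger timeliness weight changes the output order.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Result:
    title: str
    relevance: float  # hypothetical relevance score in [0, 1]
    freshness: float  # hypothetical timeliness score in [0, 1], higher is newer


DEFAULT_TIMELINESS_WEIGHT = 0.1  # hypothetical default weight
PRESET_TIMELINESS_WEIGHT = 0.6   # hypothetical preset value for target query words


def rank(results: Sequence[Result], is_target_query: bool) -> List[Result]:
    """Sort results by an assumed linear blend of relevance and freshness."""
    w = PRESET_TIMELINESS_WEIGHT if is_target_query else DEFAULT_TIMELINESS_WEIGHT
    return sorted(results,
                  key=lambda r: (1 - w) * r.relevance + w * r.freshness,
                  reverse=True)


results = [Result("old but relevant", 0.9, 0.1), Result("new", 0.6, 0.9)]
normal_order = [r.title for r in rank(results, is_target_query=False)]
timely_order = [r.title for r in rank(results, is_target_query=True)]
```

With the default weight the older, more relevant item ranks first; once the query word is judged to be a target query word and the preset weight applies, the newer item moves to the top.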
In an implementation manner, the determining module 902 may include:
the comparison unit is used for comparing the query words with a pre-established timeliness query word bank, wherein the timeliness query word bank stores the query words of which the corresponding information search results have timeliness requirements;
and the timeliness determining unit is used for determining whether the query word is the target query word according to the comparison result.
With respect to the foregoing method embodiment, as shown in fig. 10, an embodiment of the present invention further provides a device for identifying a time-dependent query term, where the device may include:
an obtaining module 1001, configured to obtain a query term input by a user;
a comparison module 1002, configured to compare the query term with a pre-established timeliness query word library, where a query term whose corresponding information search result has timeliness requirements is stored in the timeliness query word library;
and a timeliness determining module 1003, configured to determine whether the query term is a timeliness query term according to the comparison result.
In the embodiment of the invention, whether the query word is the timeliness query word is determined by comparing the query word input by the user with the timeliness query word library established in advance.
In one implementation manner, the apparatus further includes an establishing module, where the establishing module is configured to establish the time-dependent query thesaurus, and the establishing module may include:
a calculating unit, configured to obtain a plurality of reference query words whose search volume within a target time period is greater than a preset search volume threshold, and to calculate a search volume change value corresponding to each reference query word, where the target time period is a time period of a preset duration from the current time;
the candidate query term determining unit is used for determining the reference query terms of which the search quantity change values are larger than a preset search quantity change threshold value as the candidate query terms;
the information obtaining unit is used for obtaining search result information corresponding to the candidate query words through an information search engine;
the target result information acquisition unit is used for acquiring target result information of which the difference value between the uploading time and the current time is within a first preset range from the search result information;
the timeliness query word determining unit is used for determining candidate query words corresponding to the target result information of which the number is larger than a preset information number threshold value as a first type of query words and determining the first type of query words as timeliness query words;
and the generating unit is used for generating a timeliness query word bank according to the determined timeliness query words.
In one implementation, the computing unit may include:
the search quantity change ratio determining subunit is used for determining the search quantity change ratio of each obtained reference query word in different time dimensions;
and the search quantity change value determining subunit is used for determining the search quantity change value corresponding to each reference query word according to the search quantity change ratio of each reference query word in different time dimensions.
In one implementation, the generating unit includes:
the word segmentation determining subunit is used for performing word segmentation operation on each timeliness query word to obtain a plurality of reference word segments;
the edge determining subunit is used for taking each reference word segmentation as a node, and every two nodes are connected to form an edge;
and the timeliness query word stock generation subunit is used for taking the target graph which is composed of the plurality of reference participles and comprises a plurality of edges as the timeliness query word stock.
In an implementation manner, the time-based query term recognition apparatus provided in the embodiment of the present invention may further include:
the news title determining module is used for obtaining a plurality of news titles corresponding to the first type of query words through a news search engine before the first type of query words are determined as timeliness query words;
the second type query term determining module is used for determining the first type query terms corresponding to a plurality of news titles of which the difference values between the release time and the current time are within a second preset range or the release time is within a third preset range as second type query terms;
the similarity determining module is used for determining the similarity of a plurality of news titles corresponding to each second type of query word;
the timeliness query term determination unit is specifically configured to:
and determining the second type query words corresponding to the news titles with the similarity exceeding the preset similarity threshold as the timeliness query words.
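The similarity check on news titles can be sketched as follows. This passage does not name a similarity measure, so the mean pairwise Jaccard similarity over whitespace tokens used here is a substitute chosen for illustration. The function names and sample titles are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def title_similarity(titles):
    """Mean pairwise Jaccard similarity over the whitespace tokens of the titles."""
    token_sets = [set(t.lower().split()) for t in titles]
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Illustrative titles: the first group covers one event, the second does not.
same_event = ["team wins cup final", "team wins the cup final", "cup final won by team"]
unrelated = ["team wins cup final", "new phone released today", "weather warning issued"]
```

Titles that report the same event share many tokens and score high, while unrelated titles score near zero, which is the intuition behind keeping only second type query words whose titles exceed the preset similarity threshold.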
In an implementation manner, the word segmentation determining subunit may be specifically configured to:
determining a plurality of news titles corresponding to the timeliness query words;
and performing word segmentation operation on each timeliness query word and the corresponding news titles to obtain a plurality of reference word segments.
In one implementation, the timeliness query word recognition apparatus provided in the embodiment of the present invention may further include:
the third type query term determining module is used for removing candidate query terms which accord with a preset type from the candidate query terms to obtain third type query terms before obtaining search result information corresponding to the candidate query terms through an information search engine;
the information obtaining unit is specifically configured to:
obtaining search result information corresponding to the third type of query words through an information search engine;
the timeliness query term determination unit is specifically configured to:
and determining the third type query words corresponding to the target result information with the number larger than the preset information number threshold value as the first type query words.
In one implementation, the comparison module 1002 may include:
the word segmentation unit is used for performing word segmentation operation on the query word;
the first judgment unit is used for judging whether the number of target word segments obtained by the word segmentation operation is 1, triggering the second judgment unit if so, and triggering the proportion determination unit if not;
the second judgment unit is used for judging, according to the target graph, whether the number of edges corresponding to the target word segment is greater than a preset edge threshold;
the proportion determination unit is used for taking each target word segment as a node, connecting every two nodes to form an edge, determining the query graph composed of the plurality of target word segments and containing a plurality of edges as the graph corresponding to the query word, and calculating the proportion of the edge set of the query graph that is covered by the edge set of the target graph;
the timeliness determination module 1002 may include:
a first determining unit, used for determining that the query word is a timeliness query word when the number of word segments obtained from the query word by the word segmentation operation is 1 and the number of edges corresponding to the target word segment in the target graph is greater than the preset edge threshold;
and a second determining unit, used for determining that the query word is a timeliness query word when the number of word segments obtained from the query word by the word segmentation operation is not 1 and the proportion is greater than a preset coverage threshold.
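The two decision branches above can be sketched as a single function. The sketch assumes the target graph is given as a set of undirected edges (each a `frozenset` of two word segments) and that the query word has already been segmented. The function name, threshold values, and sample graph are hypothetical.

```python
from itertools import combinations


def is_timeliness_query(segments, target_edges, edge_threshold, coverage_threshold):
    """Decide timeliness from the target graph, following the two branches described above."""
    segments = set(segments)  # duplicate segments are collapsed (an assumption of this sketch)
    if len(segments) == 1:
        # Single segment: count the edges of the target graph that touch this node.
        (node,) = segments
        degree = sum(1 for edge in target_edges if node in edge)
        return degree > edge_threshold
    # Several segments: build the query graph and measure how much of it is covered.
    query_edges = {frozenset(pair) for pair in combinations(sorted(segments), 2)}
    if not query_edges:
        return False
    covered = len(query_edges & target_edges) / len(query_edges)
    return covered > coverage_threshold


target_edges = {
    frozenset({"world", "cup"}),
    frozenset({"world", "final"}),
    frozenset({"cup", "final"}),
    frozenset({"cup", "draw"}),
}
```

For example, with an edge threshold of 2 and a coverage threshold of 0.5, `is_timeliness_query(["cup"], target_edges, 2, 0.5)` takes the single-segment branch, while `is_timeliness_query(["world", "cup", "final"], target_edges, 2, 0.5)` takes the coverage branch.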
In yet another embodiment, a computer-readable storage medium is provided, having stored thereon instructions, which, when executed on a computer, cause the computer to perform the method of any of the above embodiments.
In a further embodiment provided by the present invention, there is also provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform any of the methods described above.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, wireless, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)).
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and for relevant points reference may be made to the corresponding description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (12)

1. An information search method, comprising:
searching information according to the query words input by the user;
judging whether the query word is a target query word or not, wherein the target query word is a query word of which the corresponding information search result has timeliness requirements;
if so, increasing the weight of the timeliness factor to a preset value, calculating the score of the searched information based on the preset value, the weight of the quality factor and the weight of the correlation factor, and sorting and outputting the searched information in descending order of the obtained scores, wherein the target query word corresponds to the timeliness factor used for calculating the score of the searched information;
the step of judging whether the query term is a target query term comprises:
comparing the query words with a pre-established timeliness query word bank, wherein the timeliness query word bank stores the query words of which the corresponding information search results have timeliness requirements;
determining whether the query word is a target query word according to the comparison result;
the process of establishing the timeliness query word bank comprises the following steps:
obtaining a plurality of reference query words whose search quantity in a target time period is greater than a preset search quantity threshold, and calculating a search quantity change value corresponding to each reference query word, wherein the target time period is a time period that is a preset time length away from the current time;
determining, among the reference query words, those whose search quantity change value is greater than a preset search quantity change threshold as candidate query words;
obtaining search result information corresponding to the candidate query words through an information search engine;
acquiring target result information of which the difference value between uploading time and current time is within a first preset range from the search result information;
determining candidate query words for which the number of pieces of target result information is greater than a preset information number threshold as a first type of query words, and determining the first type of query words as timeliness query words;
generating a timeliness query word bank according to the determined timeliness query words;
the step of generating a timeliness query word bank according to the determined timeliness query words comprises the following steps:
performing word segmentation operation on each timeliness query word to obtain a plurality of reference word segments;
taking each reference word segment as a node, and connecting every two nodes to form an edge;
using a target graph which is composed of the plurality of reference word segments and comprises a plurality of edges as the timeliness query word bank;
the step of performing word segmentation operation on each timeliness query word to obtain a plurality of reference word segments comprises the following steps:
determining a plurality of news titles corresponding to the timeliness query words;
and performing word segmentation operation on each timeliness query word and the corresponding news titles to obtain a plurality of reference word segments.
2. A timeliness query word recognition method, comprising:
obtaining a query word input by a user;
comparing the query words with a pre-established timeliness query word bank, wherein the timeliness query word bank stores the query words of which the corresponding information search results have timeliness requirements;
determining whether the query word is a timeliness query word according to the comparison result;
the process of establishing the timeliness query word bank comprises the following steps:
obtaining a plurality of reference query words whose search quantity in a target time period is greater than a preset search quantity threshold, and calculating a search quantity change value corresponding to each reference query word, wherein the target time period is a time period that is a preset time length away from the current time;
determining, among the reference query words, those whose search quantity change value is greater than a preset search quantity change threshold as candidate query words;
obtaining search result information corresponding to the candidate query words through an information search engine;
acquiring target result information of which the difference value between uploading time and current time is within a first preset range from the search result information;
determining candidate query words for which the number of pieces of target result information is greater than a preset information number threshold as a first type of query words, and determining the first type of query words as timeliness query words;
generating a timeliness query word bank according to the determined timeliness query words;
the step of generating a timeliness query word bank according to the determined timeliness query words comprises the following steps:
performing word segmentation operation on each timeliness query word to obtain a plurality of reference word segments;
taking each reference word segment as a node, and connecting every two nodes to form an edge;
using a target graph which is composed of the plurality of reference word segments and comprises a plurality of edges as the timeliness query word bank;
the step of performing word segmentation operation on each timeliness query word to obtain a plurality of reference word segments comprises the following steps:
determining a plurality of news titles corresponding to the timeliness query words;
and performing word segmentation operation on each timeliness query word and the corresponding news titles to obtain a plurality of reference word segments.
3. The method of claim 2, wherein the step of calculating the search quantity change value corresponding to each reference query word comprises:
determining the search quantity change ratio of each obtained reference query word in different time dimensions;
and determining the search quantity change value corresponding to each reference query word according to the search quantity change ratio of each reference query word in different time dimensions.
4. The method of claim 2, wherein before the step of determining the first type of query words as timeliness query words, the method further comprises:
obtaining a plurality of news titles corresponding to the first type of query words through a news search engine;
determining the first type of query words corresponding to a plurality of news titles whose release time differs from the current time by a value within a second preset range, or whose release time is within a third preset range, as a second type of query words;
determining the similarity of the plurality of news titles corresponding to each second type of query word;
the step of determining the first type of query words as timeliness query words comprises:
and determining the second type of query words corresponding to news titles whose similarity exceeds a preset similarity threshold as the timeliness query words.
5. The method according to any one of claims 2-4, wherein before the step of obtaining, by an information search engine, search result information corresponding to the candidate query term, the method further comprises:
removing candidate query words which accord with a preset type from the candidate query words to obtain third type query words;
the step of obtaining the search result information corresponding to the candidate query term through the information search engine includes:
obtaining search result information corresponding to the third type of query words through an information search engine;
the step of determining the candidate query term corresponding to the target result information with the number larger than the preset information number threshold value as the first type query term includes:
and determining the third type query words corresponding to the target result information with the number larger than the preset information number threshold value as the first type query words.
6. The method of claim 2, wherein the step of comparing the query word with a pre-established timeliness query word bank comprises:
performing word segmentation operation on the query word;
judging whether the number of target word segments obtained by the word segmentation operation is 1;
if so, judging, according to the target graph, whether the number of edges corresponding to the target word segment is greater than a preset edge threshold;
if not, taking each target word segment as a node, connecting every two nodes to form an edge, determining a query graph which is composed of the plurality of target word segments and contains a plurality of edges as the graph corresponding to the query word, and calculating the proportion of the edge set of the query graph that is covered by the edge set of the target graph;
the step of determining whether the query word is a timeliness query word according to the comparison result comprises:
when the number of word segments obtained from the query word by the word segmentation operation is 1 and the number of edges corresponding to the target word segment in the target graph is greater than the preset edge threshold, determining that the query word is a timeliness query word;
and when the number of word segments obtained from the query word by the word segmentation operation is not 1 and the proportion is greater than a preset coverage threshold, determining that the query word is a timeliness query word.
7. An information search apparatus, comprising:
the search module is used for searching information according to the query words input by the user;
the judging module is used for judging whether the query word is a target query word or not, wherein the target query word is a query word of which the corresponding information search result has timeliness requirements, and if so, the sequencing module is triggered;
the ranking module is used for increasing the weight of the timeliness factor to a preset value, calculating the score of the searched information based on the preset value, the weight of the quality factor and the weight of the correlation factor, sorting the searched information in descending order of the obtained scores, and outputting the sorted information, wherein the target query word corresponds to the timeliness factor used for calculating the score of the searched information;
the judging module comprises:
the comparison unit is used for comparing the query words with a pre-established timeliness query word bank, wherein the timeliness query word bank stores the query words of which the corresponding information search results have timeliness requirements;
the timeliness determining unit is used for determining whether the query word is a target query word according to the comparison result;
the establishing module is used for establishing the timeliness query word bank, and comprises:
the calculating unit is used for obtaining a plurality of reference query words whose search quantity in a target time period is greater than a preset search quantity threshold, and calculating a search quantity change value corresponding to each reference query word, wherein the target time period is a time period that is a preset time length away from the current time;
the candidate query term determining unit is used for determining the reference query terms of which the search quantity change values are larger than a preset search quantity change threshold value as the candidate query terms;
the information obtaining unit is used for obtaining search result information corresponding to the candidate query words through an information search engine;
the target result information acquisition unit is used for acquiring target result information of which the difference value between the uploading time and the current time is within a first preset range from the search result information;
the timeliness query word determining unit is used for determining candidate query words corresponding to the target result information of which the number is larger than a preset information number threshold value as a first type of query words and determining the first type of query words as timeliness query words;
the generating unit is used for generating a timeliness query word bank according to the determined timeliness query words;
the generation unit includes:
the word segmentation determining subunit is used for performing word segmentation operation on each timeliness query word to obtain a plurality of reference word segments;
the edge determining subunit is used for taking each reference word segment as a node and connecting every two nodes to form an edge;
the timeliness query word bank generation subunit is used for taking a target graph, which is composed of the plurality of reference word segments and comprises a plurality of edges, as the timeliness query word bank;
the word segmentation determining subunit is specifically configured to:
determining a plurality of news titles corresponding to the timeliness query words;
and performing word segmentation operation on each timeliness query word and the corresponding news titles to obtain a plurality of reference word segments.
8. A timeliness query word recognition apparatus, comprising:
the obtaining module is used for obtaining the query words input by the user;
the comparison module is used for comparing the query words with a pre-established timeliness query word bank, wherein the timeliness query word bank stores the query words of which the corresponding information search results have timeliness requirements;
the timeliness determining module is used for determining whether the query word is a timeliness query word according to the comparison result;
the establishing module is used for establishing the timeliness query word bank, and comprises:
the calculating unit is used for obtaining a plurality of reference query words whose search quantity in a target time period is greater than a preset search quantity threshold, and calculating a search quantity change value corresponding to each reference query word, wherein the target time period is a time period that is a preset time length away from the current time;
the candidate query term determining unit is used for determining the reference query terms of which the search quantity change values are larger than a preset search quantity change threshold value as the candidate query terms;
the information obtaining unit is used for obtaining search result information corresponding to the candidate query words through an information search engine;
the target result information acquisition unit is used for acquiring target result information of which the difference value between the uploading time and the current time is within a first preset range from the search result information;
the timeliness query word determining unit is used for determining candidate query words corresponding to the target result information of which the number is larger than a preset information number threshold value as a first type of query words and determining the first type of query words as timeliness query words;
the generating unit is used for generating a timeliness query word bank according to the determined timeliness query words;
the generation unit includes:
the word segmentation determining subunit is used for performing word segmentation operation on each timeliness query word to obtain a plurality of reference word segments;
the edge determining subunit is used for taking each reference word segment as a node and connecting every two nodes to form an edge;
the timeliness query word bank generation subunit is used for taking a target graph, which is composed of the plurality of reference word segments and comprises a plurality of edges, as the timeliness query word bank;
the word segmentation determining subunit is specifically configured to:
determining a plurality of news titles corresponding to the timeliness query words;
and performing word segmentation operation on each timeliness query word and the corresponding news titles to obtain a plurality of reference word segments.
9. The apparatus of claim 8, wherein the computing unit comprises:
the search quantity change ratio determining subunit is used for determining the search quantity change ratio of each obtained reference query word in different time dimensions;
and the search quantity change value determining subunit is used for determining the search quantity change value corresponding to each reference query word according to the search quantity change ratio of each reference query word in different time dimensions.
10. The apparatus of claim 8, further comprising:
the news title determining module is used for obtaining a plurality of news titles corresponding to the first type of query words through a news search engine before the first type of query words are determined as timeliness query words;
the second type query term determining module is used for determining the first type query terms corresponding to a plurality of news titles of which the difference values between the release time and the current time are within a second preset range or the release time is within a third preset range as second type query terms;
the similarity determining module is used for determining the similarity of a plurality of news titles corresponding to each second type of query word;
the timeliness query term determination unit is specifically configured to:
and determining the second type query words corresponding to the news titles with the similarity exceeding the preset similarity threshold as the timeliness query words.
11. The apparatus according to any one of claims 8-10, further comprising:
the third type query term determining module is used for removing candidate query terms which accord with a preset type from the candidate query terms to obtain third type query terms before obtaining search result information corresponding to the candidate query terms through an information search engine;
the information obtaining unit is specifically configured to:
obtaining search result information corresponding to the third type of query words through an information search engine;
the timeliness query term determination unit is specifically configured to:
and determining the third type query words corresponding to the target result information with the number larger than the preset information number threshold value as the first type query words.
12. The apparatus of claim 8, wherein the comparison module comprises:
the word segmentation unit is used for performing word segmentation operation on the query word;
the first judgment unit is used for judging whether the number of target word segments obtained by the word segmentation operation is 1, triggering the second judgment unit if so, and triggering the proportion determination unit if not;
the second judgment unit is used for judging, according to the target graph, whether the number of edges corresponding to the target word segment is greater than a preset edge threshold;
the proportion determination unit is used for taking each target word segment as a node, connecting every two nodes to form an edge, determining the query graph composed of the plurality of target word segments and containing a plurality of edges as the graph corresponding to the query word, and calculating the proportion of the edge set of the query graph that is covered by the edge set of the target graph;
the timeliness determination module includes:
a first determining unit, used for determining that the query word is a timeliness query word when the number of word segments obtained from the query word by the word segmentation operation is 1 and the number of edges corresponding to the target word segment in the target graph is greater than the preset edge threshold;
and a second determining unit, used for determining that the query word is a timeliness query word when the number of word segments obtained from the query word by the word segmentation operation is not 1 and the proportion is greater than a preset coverage threshold.
CN201710340129.0A 2017-05-15 2017-05-15 Information searching method and device and timeliness query word identification method and device Active CN107180093B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710340129.0A CN107180093B (en) 2017-05-15 2017-05-15 Information searching method and device and timeliness query word identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710340129.0A CN107180093B (en) 2017-05-15 2017-05-15 Information searching method and device and timeliness query word identification method and device

Publications (2)

Publication Number Publication Date
CN107180093A CN107180093A (en) 2017-09-19
CN107180093B true CN107180093B (en) 2020-05-19

Family

ID=59831222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710340129.0A Active CN107180093B (en) 2017-05-15 2017-05-15 Information searching method and device and timeliness query word identification method and device

Country Status (1)

Country Link
CN (1) CN107180093B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241486A (en) * 2018-09-14 2019-01-18 拉扎斯网络科技(上海)有限公司 Data analysing method, device, equipment and computer storage medium
CN111309999B (en) * 2018-12-11 2023-05-16 阿里巴巴集团控股有限公司 Method and device for generating interactive scene content
CN111310069B (en) * 2018-12-11 2023-09-26 阿里巴巴集团控股有限公司 Evaluation method and device for timeliness search
CN111310018B (en) * 2018-12-11 2024-03-01 阿里巴巴集团控股有限公司 Method for determining timeliness search vocabulary and search engine
CN111310017B (en) * 2018-12-11 2023-05-12 阿里巴巴集团控股有限公司 Method and device for generating time-efficient scene content
CN111488516A (en) * 2019-01-28 2020-08-04 北京字节跳动网络技术有限公司 Searching method and device based on aging words
CN110427381A (en) * 2019-08-07 2019-11-08 北京嘉和海森健康科技有限公司 A kind of data processing method and relevant device
CN110489525B (en) * 2019-08-09 2022-02-25 腾讯科技(深圳)有限公司 Search result acquisition method and device, storage medium and electronic device
CN111291171B (en) * 2020-01-21 2023-05-16 南方电网能源发展研究院有限责任公司 Dangerous engineering risk data searching method
CN111881170B (en) * 2020-07-14 2023-10-27 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for mining timeliness query content field
CN112084774B (en) 2020-09-08 2021-07-20 百度在线网络技术(北京)有限公司 Data search method, device, system, equipment and computer readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102073684A (en) * 2010-12-22 2011-05-25 百度在线网络技术(北京)有限公司 Method and device for excavating search log and page search method and device
CN105653705A (en) * 2015-12-30 2016-06-08 北京奇艺世纪科技有限公司 Hot event searching method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8605956B2 (en) * 2009-11-18 2013-12-10 Google Inc. Automatically mining person models of celebrities for visual search applications
CN102004792B (en) * 2010-12-07 2012-10-10 百度在线网络技术(北京)有限公司 Method and system for generating hot-searching word
CN106484671B (en) * 2015-08-25 2019-05-28 北京中搜云商网络技术有限公司 A kind of recognition methods of timeliness inquiry content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant