CN115757923B - Method and device for determining search hotword, computer equipment and storage medium - Google Patents


Info

Publication number
CN115757923B
Authority
CN
China
Prior art keywords
search
platform
browser
hot word
hotword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310024195.2A
Other languages
Chinese (zh)
Other versions
CN115757923A (en)
Inventor
朱建伟 (Zhu Jianwei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Developer Technology Co., Ltd.
Beijing Innovation Lezhi Network Technology Co., Ltd.
Original Assignee
Changsha Developer Technology Co., Ltd.
Beijing Innovation Lezhi Network Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Developer Technology Co., Ltd. and Beijing Innovation Lezhi Network Technology Co., Ltd.
Priority to CN202310024195.2A
Publication of CN115757923A
Application granted
Publication of CN115757923B
Legal status: Active (current)
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a method, a device, a computer device and a storage medium for determining search hotwords. The method comprises: generating search hotwords from the historical search data to obtain a platform search hotword set; generating search hotwords from the jump keyword data sets to obtain a third-party search hotword set; generating search hotwords from the first browser records and the second browser records to obtain a browser search hotword set, wherein the first and second browser records each comprise a platform identifier, a browser cache and a browsing history; and aggregating and then deduplicating the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain a target search hotword set. The method thus determines search hotwords from the present platform, from jumps between third-party platforms and the present platform, and from users' browser records, which makes the search hotwords more comprehensive and provides a basis for improving search efficiency.

Description

Method and device for determining search hotword, computer equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for determining a search hotword, a computer device, and a storage medium.
Background
With the development of computer technology, demand for search services has grown. To improve search efficiency, the search results corresponding to search hotwords are cached in advance. Current methods extract search hotwords only from the historical search data of the organization's own platform (the "present platform"). Because they consider only the present platform, the resulting search hotwords are not comprehensive, which limits search efficiency.
Disclosure of Invention
To address the technical problem that current methods for determining search hotwords consider only the present platform, so that the search hotwords are not comprehensive and search efficiency suffers, a method, a device, a computer device and a storage medium for determining search hotwords are provided.
The application provides a method for determining search hotwords, comprising the following steps:
acquiring the historical search data of each user of the present platform, and generating search hotwords from the historical search data to obtain a platform search hotword set;
acquiring the jump keyword data set sent by each third-party platform, and generating search hotwords from the jump keyword data sets to obtain a third-party search hotword set;
obtaining a first browser record for each user of the present platform and a second browser record for each user of the third-party platforms, and generating search hotwords from the first browser records and the second browser records to obtain a browser search hotword set, wherein the first browser records and the second browser records each comprise a platform identifier, a browser cache and a browsing history;
and aggregating and then deduplicating the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain a target search hotword set.
Further, the step of generating search hotwords from the first browser records and the second browser records to obtain the browser search hotword set comprises:
inputting each single record in each first browser record into a preset classification model for classification prediction to obtain first classification results; selecting, from all the first classification results, those that match a preset classification configuration as a first hit result set; extracting keywords from each single record corresponding to the first hit result set to obtain a first data set; extracting topics from each single record corresponding to the first hit result set to obtain a second data set; and aggregating and then deduplicating the first data set and the second data set to obtain first attribute information;
inputting each single record in each second browser record into the classification model for classification prediction to obtain second classification results; selecting, from all the second classification results, those that match the classification configuration as a second hit result set; extracting keywords from each single record corresponding to the second hit result set to obtain a third data set; extracting topics from each single record corresponding to the second hit result set to obtain a fourth data set; and aggregating and then deduplicating the third data set and the fourth data set to obtain second attribute information;
and generating search hotwords from the first attribute information and the second attribute information to obtain the browser search hotword set.
Further, after the step of aggregating and then deduplicating the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain the target search hotword set, the method further comprises:
taking any one of the third-party platforms as a target platform;
extracting, from the third-party search hotword set, each search hotword corresponding to the target platform as a first search hotword set;
deleting, from the target search hotword set, each search hotword in the first search hotword set to obtain a second search hotword set;
and sending the second search hotword set to the target platform.
Further, the step of sending the second search hotword set to the target platform comprises:
acquiring a service tag set corresponding to the target platform;
labeling each search hotword in the second search hotword set according to a preset label mapping table;
selecting, from the second search hotword set, each search hotword whose label is in the service tag set as a third search hotword set;
and synchronizing the third search hotword set to the target platform.
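The tag filtering in these steps can be sketched as follows. This is a minimal Python sketch, not the patented implementation: it assumes the preset label mapping table is a dictionary from each hotword to a set of labels, and the names `tagged_hotwords` and `label_map` are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def tagged_hotwords(
    second_set: Iterable[str],
    label_map: Dict[str, Set[str]],  # hypothetical preset label mapping table
    service_tags: Set[str],          # service tag set of the target platform
) -> List[str]:
    """Return the third search hotword set: hotwords carrying at least one service tag."""
    third_set = []
    for hotword in second_set:
        labels = label_map.get(hotword, set())  # a hotword missing from the table has no labels
        if labels & service_tags:               # keep it if any of its labels is a service tag
            third_set.append(hotword)
    return third_set
```

Hotwords are kept in their original order, so the synchronized set preserves whatever ranking the second search hotword set already had.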
Further, the step of aggregating and then deduplicating the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain the target search hotword set comprises:
aggregating and then deduplicating the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain a to-be-processed search hotword set;
determining whether each search hotword in the to-be-processed search hotword set appears among the article labels in a preset article database;
taking each search hotword that appears among the article labels in the article database as a hit hotword;
and taking all the hit hotwords as the target search hotword set.
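A minimal Python sketch of the article-label check above, assuming each article in the article database is represented as a dictionary with a `labels` list; the field name and the function name are hypothetical.

```python
from typing import Iterable, List


def filter_by_article_labels(candidates: Iterable[str], articles: Iterable[dict]) -> List[str]:
    """Keep the to-be-processed hotwords that appear among any article's labels."""
    all_labels = set()
    for article in articles:
        all_labels.update(article["labels"])  # assumed field holding the article labels
    # Each candidate found among the article labels is a hit hotword
    return [word for word in candidates if word in all_labels]
```

Collecting the labels into a set first makes each membership test constant time, so the filter scales with the number of labels plus the number of candidates.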
Further, after the step of aggregating and then deduplicating the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain the target search hotword set, the method further comprises:
clearing the data of a preset high-frequency request identifier;
updating the high-frequency request identifier with the target search hotword set;
clearing, from a preset target cache, the associated data of search hotwords that are not contained in the target search hotword set;
and responding to search requests according to the high-frequency request identifier and the target cache.
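The refresh steps above can be sketched in Python as follows. This sketch assumes, for illustration only, that the high-frequency request identifier is a plain set of hotwords and the target cache is a dictionary keyed by hotword; the source does not specify either structure, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Set


def refresh_high_frequency_data(
    high_freq_keys: Set[str],         # stand-in for the high-frequency request identifier
    target_cache: Dict[str, object],  # hotword -> associated data (cached search results)
    target_hotwords: Iterable[str],
) -> None:
    """Reload the identifier and evict cache entries for hotwords no longer in the target set."""
    high_freq_keys.clear()                  # clear the identifier's old data
    high_freq_keys.update(target_hotwords)  # load the new target search hotword set
    # Collect stale keys first, then delete, so the dict is not mutated while iterating
    stale = [key for key in target_cache if key not in high_freq_keys]
    for key in stale:
        del target_cache[key]
```

Entries for hotwords that remain in the target set are kept, so their cached results stay warm across refreshes.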
Further, the step of responding to a search request according to the high-frequency request identifier and the target cache comprises:
acquiring the search request, wherein the search request carries request content;
inputting the request content into the high-frequency request identifier to identify whether the request is a high-frequency request, obtaining an identification result;
if the identification result is no, calling a preset search service with the request content to search a preset service database, obtaining a target search result;
if the identification result is yes, looking up the request content among the search hotwords in the target cache to obtain a lookup result; if the lookup succeeds, taking the search result associated with the matched entry in the target cache as the target search result; if the lookup fails, calling the search service with the request content to search the service database, obtaining the target search result, and storing the request content and the target search result in the target cache as associated data;
and sending the target search result to the calling object corresponding to the search request.
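The request flow above can be sketched in Python as follows. It assumes, as illustrations only, that the high-frequency request identifier is a set membership test on the exact request content, that the target cache is a dictionary, and that the search service is any callable returning a list of results; `respond` and its parameters are hypothetical names.

```python
from typing import Callable, Dict, List, Set


def respond(
    request_content: str,
    high_freq_keys: Set[str],                    # stand-in for the high-frequency request identifier
    target_cache: Dict[str, List[str]],
    search_service: Callable[[str], List[str]],  # queries the service database
) -> List[str]:
    """Return the target search result for one search request."""
    if request_content not in high_freq_keys:
        # Not high-frequency: query the service database directly, without caching
        return search_service(request_content)
    cached = target_cache.get(request_content)
    if cached is not None:
        return cached  # cache hit
    # Cache miss: query, then store request content and result as associated data
    result = search_service(request_content)
    target_cache[request_content] = result
    return result
```

Only high-frequency requests populate the cache, which keeps the cache limited to the target search hotwords.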
The application also provides a device for determining search hotwords, comprising:
a platform search hotword set determining module, configured to acquire the historical search data of each user of the present platform and generate search hotwords from the historical search data to obtain the platform search hotword set;
a third-party search hotword set determining module, configured to acquire the jump keyword data set sent by each third-party platform and generate search hotwords from the jump keyword data sets to obtain a third-party search hotword set;
a browser search hotword set determining module, configured to obtain a first browser record for each user of the present platform and a second browser record for each user of the third-party platforms, and generate search hotwords from the first browser records and the second browser records to obtain a browser search hotword set, wherein the first browser records and the second browser records each comprise a platform identifier, a browser cache and a browsing history;
and a target search hotword set determining module, configured to aggregate and then deduplicate the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain the target search hotword set.
The application also provides a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
acquiring the historical search data of each user of the present platform, and generating search hotwords from the historical search data to obtain a platform search hotword set;
acquiring the jump keyword data set sent by each third-party platform, and generating search hotwords from the jump keyword data sets to obtain a third-party search hotword set;
obtaining a first browser record for each user of the present platform and a second browser record for each user of the third-party platforms, and generating search hotwords from the first browser records and the second browser records to obtain a browser search hotword set, wherein the first browser records and the second browser records each comprise a platform identifier, a browser cache and a browsing history;
and aggregating and then deduplicating the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain a target search hotword set.
The application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
acquiring the historical search data of each user of the present platform, and generating search hotwords from the historical search data to obtain a platform search hotword set;
acquiring the jump keyword data set sent by each third-party platform, and generating search hotwords from the jump keyword data sets to obtain a third-party search hotword set;
obtaining a first browser record for each user of the present platform and a second browser record for each user of the third-party platforms, and generating search hotwords from the first browser records and the second browser records to obtain a browser search hotword set, wherein the first browser records and the second browser records each comprise a platform identifier, a browser cache and a browsing history;
and aggregating and then deduplicating the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain a target search hotword set.
In the method for determining search hotwords, the target search hotword set is obtained by aggregating and then deduplicating the platform search hotword set derived from the historical search data of the present platform, the third-party search hotword set derived from the jump keyword data sets of the third-party platforms, and the browser search hotword set derived from the users' browser records. Search hotwords are therefore determined from the present platform, from jumps between the third-party platforms and the present platform, and from users' browser records, which makes the search hotwords more comprehensive and provides a basis for improving search efficiency.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art may derive other drawings from them without inventive effort.
Wherein:
FIG. 1 is a flow diagram of a method of determining a search hotword in one embodiment;
FIG. 2 is a block diagram of a device for determining a search hotword in one embodiment;
FIG. 3 is a block diagram of a computer device in one embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without inventive effort fall within the scope of the invention.
As shown in FIG. 1, in one embodiment, a method for determining search hotwords is provided. The method specifically comprises the following steps:
S1: acquiring the historical search data of each user of the present platform, and generating search hotwords from the historical search data to obtain a platform search hotword set;
A user of the present platform is a user of the platform operated by the organization itself.
The historical search data is the search data within a preset duration ending at the current time, and comprises a user identifier, a search time and request content. The request content is the keyword combination that a search request searches for; a keyword combination consists of one keyword or several keywords.
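A minimal Python sketch of one historical search record and of the "preset duration ending at the current time" window. The class and field names are illustrative assumptions, and the window is taken as inclusive at both ends, which the source does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class SearchRecord:
    """One item of historical search data (field names are illustrative)."""
    user_id: str
    search_time: datetime
    request_content: str  # one keyword or a space-separated keyword combination


def historical_window(records: List[SearchRecord], now: datetime, duration: timedelta) -> List[SearchRecord]:
    """Keep only records searched within `duration` ending at `now` (inclusive)."""
    start = now - duration
    return [r for r in records if start <= r.search_time <= now]
```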
A search hotword may be used directly as request content, or as part of request content.
The platform search hotword set comprises search hotwords and a platform identifier. The platform identifier may be a platform name, a platform ID, or any other value that uniquely identifies a platform.
Specifically, the historical search data of each user of the present platform may be input by a user, or may be obtained from a preset storage space.
Word segmentation is performed on the historical search data, the word frequency of each resulting phrase is computed, the N highest word frequencies are extracted, the phrases corresponding to the extracted word frequencies are taken as search hotwords, and the N search hotwords form the platform search hotword set, where N is an integer greater than 0.
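The top-N word-frequency step can be sketched in Python as below. This is an illustrative sketch, not the patented implementation: the `tokenize` parameter stands in for a word segmenter, and the default `str.split` only handles space-separated text (Chinese queries would need a real segmenter). The function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def top_n_hotwords(texts: Iterable[str], n: int, tokenize: Callable[[str], List[str]] = str.split) -> List[str]:
    """Count phrase frequencies over all texts and return the n most frequent phrases."""
    if n <= 0:
        raise ValueError("n must be an integer greater than 0")
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))  # word segmentation, then frequency counting
    # most_common orders by count; equal counts keep first-seen order
    return [word for word, _ in counts.most_common(n)]


def platform_hotword_set(texts: Iterable[str], n: int, platform_id: str) -> List[Tuple[str, str]]:
    """Pair each search hotword with the present platform's identifier."""
    return [(word, platform_id) for word in top_n_hotwords(texts, n)]
```

Usage: `platform_hotword_set(["redis cache", "redis cluster", "python cache redis"], 2, "own")` yields `[("redis", "own"), ("cache", "own")]`.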
S2: acquiring the jump keyword data set sent by each third-party platform, and generating search hotwords from the jump keyword data sets to obtain a third-party search hotword set;
A third-party platform is a platform operated by an organization other than the present organization. When a user searches on a third-party platform, the third-party platform calls the present platform to obtain search results; when the user clicks a search result displayed by the third-party platform, the third-party platform jumps to a page of the present platform. At that jump, the keyword combination corresponding to the page is fed back to the present platform and searched there, and this keyword combination is used as jump keyword data.
The third-party search hotword set comprises search hotwords and platform identifier subsets; each search hotword has a platform identifier subset containing at least one platform identifier.
A jump keyword data set contains zero or more items of jump keyword data.
Specifically, the jump keyword data set sent by each third-party platform may be input by a user, or may be obtained from a preset storage space.
Word segmentation is performed on each jump keyword data set, the word frequency of each resulting phrase is computed, the M highest word frequencies are extracted, the phrases corresponding to the extracted word frequencies are taken as search hotwords, and the M search hotwords form the third-party search hotword set, where M is an integer greater than 0.
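A Python sketch of the third-party step, also recording which platforms each hotword came from so that every hotword carries a platform identifier subset. It assumes the jump keyword data sets arrive as a mapping from platform identifier to a list of keyword strings, and uses whitespace splitting as a stand-in segmenter; both are illustrative assumptions, as is the function name.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, FrozenSet, Iterable, List, Mapping


def third_party_hotwords(
    jump_data: Mapping[str, Iterable[str]],  # platform ID -> its jump keyword data set
    m: int,
    tokenize: Callable[[str], List[str]] = str.split,
) -> Dict[str, FrozenSet[str]]:
    """Return the M most frequent phrases, each with the platforms whose jump data contained it."""
    if m <= 0:
        raise ValueError("m must be an integer greater than 0")
    counts = Counter()
    sources = defaultdict(set)  # hotword -> platform identifier subset
    for platform_id, keyword_items in jump_data.items():
        for item in keyword_items:  # a data set may be empty
            for word in tokenize(item):
                counts[word] += 1
                sources[word].add(platform_id)
    return {word: frozenset(sources[word]) for word, _ in counts.most_common(m)}
```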
S3: obtaining a first browser record for each user of the present platform and a second browser record for each user of the third-party platforms, and generating search hotwords from the first browser records and the second browser records to obtain a browser search hotword set, wherein the first browser records and the second browser records each comprise a platform identifier, a browser cache and a browsing history;
The browser search hotword set comprises search hotwords and platform identifier subsets.
The first browser record and the second browser record are both browser records, and each browser record comprises a platform identifier, a browser cache and a browsing history.
The browser cache (browser caching) speeds up browsing: the browser stores recently requested documents on the user's disk, so that when the visitor requests the same page again, the browser can display the documents from the local disk.
The browsing history is the information, temporarily stored on the computer, about websites the browser has visited; how long the history is kept can be changed in the browser's settings. The history shows which websites the user has visited, and it can be listed by time, name, address or alphabetical order, or even by number of visits. Parents, for example, can check the history to see which websites their children have visited at home. The history can be browsed by clicking the history button on the browser toolbar; in Netscape's browser, it can be viewed by selecting "History" under "Tools".
Specifically, the first browser record of each user of the present platform may be input by a user or obtained from a preset storage space, and the second browser record of each user of each third-party platform may likewise be obtained directly or from a preset storage space.
All data of each first browser record and each second browser record is segmented into words, the word frequency of each resulting phrase is computed, the J highest word frequencies are extracted, the phrases corresponding to the extracted word frequencies are taken as search hotwords, and the J search hotwords form the browser search hotword set, where J is an integer greater than 0.
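A Python sketch of the basic browser step. It assumes, for illustration, that a browser record exposes its cache and history as lists of text (for example page titles) and that only that text, not the platform identifier itself, is segmented; `BrowserRecord`, `browser_hotwords` and the whitespace segmenter are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, List


@dataclass
class BrowserRecord:
    platform_id: str
    cache: List[str] = field(default_factory=list)    # e.g. text of cached documents
    history: List[str] = field(default_factory=list)  # e.g. titles of visited pages


def browser_hotwords(
    first_records: Iterable[BrowserRecord],
    second_records: Iterable[BrowserRecord],
    j: int,
) -> Dict[str, FrozenSet[str]]:
    """Return the J most frequent phrases across all records, with their platform identifier subsets."""
    if j <= 0:
        raise ValueError("j must be an integer greater than 0")
    counts = Counter()
    sources = defaultdict(set)
    for record in list(first_records) + list(second_records):
        for text in record.cache + record.history:
            for word in text.split():  # whitespace split stands in for word segmentation
                counts[word] += 1
                sources[word].add(record.platform_id)
    return {word: frozenset(sources[word]) for word, _ in counts.most_common(j)}
```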
S4: aggregating and then deduplicating the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain a target search hotword set.
Specifically, the platform search hotword set, the third-party search hotword set and the browser search hotword set are aggregated into a to-be-processed set, the search hotwords in the to-be-processed set are deduplicated, and the deduplicated set is taken as the target search hotword set.
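A Python sketch of step S4, assuming each of the three sets is represented as a mapping from hotword to a frozenset of platform identifiers. The source only says that duplicates are removed; merging the platform identifiers of duplicate hotwords is an assumption made here, and the function name is hypothetical.

```python
from typing import Dict, FrozenSet, Mapping, Set


def aggregate_and_deduplicate(
    platform_set: Mapping[str, FrozenSet[str]],
    third_party_set: Mapping[str, FrozenSet[str]],
    browser_set: Mapping[str, FrozenSet[str]],
) -> Dict[str, FrozenSet[str]]:
    """Aggregate the three sets in order and keep one entry per hotword."""
    merged: Dict[str, Set[str]] = {}
    for source in (platform_set, third_party_set, browser_set):  # aggregate in sequence
        for hotword, platform_ids in source.items():
            # setdefault keeps a single entry per hotword (deduplication) and
            # unions the platform identifiers of duplicates (an assumption)
            merged.setdefault(hotword, set()).update(platform_ids)
    return {hotword: frozenset(ids) for hotword, ids in merged.items()}
```

Keeping the platform identifiers matters for the later per-platform steps, which look up hotwords by the target platform's identifier.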
In this embodiment, the target search hotword set is obtained by aggregating and then deduplicating the platform search hotword set derived from the historical search data of the present platform, the third-party search hotword set derived from the jump keyword data sets of the third-party platforms, and the browser search hotword set derived from the users' browser records. Search hotwords are therefore determined from jumps between the present platform and the third-party platforms and from users' browser records, which makes the search hotwords more comprehensive and provides a basis for improving search efficiency.
In one embodiment, the step of generating search hotwords from the first browser records and the second browser records to obtain the browser search hotword set comprises:
S31: inputting each single record in each first browser record into a preset classification model for classification prediction to obtain first classification results; selecting, from all the first classification results, those that match a preset classification configuration as a first hit result set; extracting keywords from each single record corresponding to the first hit result set to obtain a first data set; extracting topics from each single record corresponding to the first hit result set to obtain a second data set; and aggregating and then deduplicating the first data set and the second data set to obtain first attribute information;
The classification model is a binary or multi-class classification model trained as a neural network. Its network structure and training method may be chosen from the prior art and are not described in detail here.
Specifically, each single record in each first browser record is input into the preset classification model for classification prediction, and the classification label corresponding to the largest element of the predicted vector is taken as the first classification result; each first classification result is thus the result for one single record. The first classification results that match the preset classification configuration are selected from all the first classification results as the first hit result set, so that only results matching the classifications of the present platform are kept. Keywords are extracted from the single records corresponding to the first hit result set based on a preset keyword extraction configuration, and the extracted keywords form the first data set. The single records corresponding to the first hit result set are input into a preset topic extraction model, and all extracted topics form the second data set. Finally, the first data set and the second data set are aggregated, and the aggregated set is deduplicated to obtain the first attribute information.
The keywords in the first data set are words related to the business of the present platform.
The topic extraction model extracts the topics of a text, where a topic expresses the semantics of the text. The topic extraction model is likewise trained as a neural network.
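The S31 pipeline (argmax classification, hit filtering, keyword and topic extraction, then aggregation and deduplication) can be sketched in Python as below. The classifier, keyword extractor and topic extractor are passed in as plain callables because the source leaves those models unspecified; representing the classification configuration as a set of hit labels, and the function name itself, are assumptions.

```python
from typing import Callable, Iterable, List, Sequence, Set


def attribute_information(
    records: Iterable[str],
    classify: Callable[[str], Sequence[float]],     # returns the predicted score vector
    labels: Sequence[str],                          # label for each vector position
    hit_labels: Set[str],                           # assumed form of the classification configuration
    extract_keywords: Callable[[str], List[str]],
    extract_topics: Callable[[str], List[str]],
) -> List[str]:
    """Return the attribute information for one group of browser records."""
    hits = []
    for record in records:
        scores = classify(record)
        best = max(range(len(scores)), key=scores.__getitem__)  # index of the largest element
        if labels[best] in hit_labels:  # keep records whose class matches the configuration
            hits.append(record)
    keyword_set = [kw for r in hits for kw in extract_keywords(r)]  # first (or third) data set
    topic_set = [t for r in hits for t in extract_topics(r)]        # second (or fourth) data set
    return list(dict.fromkeys(keyword_set + topic_set))  # aggregate, then dedupe in order
```

The same function covers S32 when called with the second browser records, since the source applies the same model and configuration to both.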
S32: inputting each single record in each second browser record into the classification model for classification prediction to obtain second classification results; selecting, from all the second classification results, those that match the classification configuration as a second hit result set; extracting keywords from each single record corresponding to the second hit result set to obtain a third data set; extracting topics from each single record corresponding to the second hit result set to obtain a fourth data set; and aggregating and then deduplicating the third data set and the fourth data set to obtain second attribute information;
Specifically, each single record in each second browser record is input into the classification model for classification prediction, and the classification label corresponding to the largest element of the predicted vector is taken as the second classification result; each second classification result is thus the result for one single record. The second classification results that match the classification configuration are selected from all the second classification results as the second hit result set, so that only results matching the classifications of the present platform are kept. Keywords are extracted from the single records corresponding to the second hit result set based on the preset keyword extraction configuration, and the extracted keywords form the third data set. The single records corresponding to the second hit result set are input into the preset topic extraction model, and all extracted topics form the fourth data set. Finally, the third data set and the fourth data set are aggregated, and the aggregated set is deduplicated to obtain the second attribute information.
S33: generating search hotwords from the first attribute information and the second attribute information to obtain the browser search hotword set.
Specifically, the first attribute information and the second attribute information are segmented into words, the word frequency of each resulting phrase is computed, the J highest word frequencies are extracted, the phrases corresponding to the extracted word frequencies are taken as search hotwords, and the J search hotwords form the browser search hotword set, where J is an integer greater than 0.
This embodiment uses the classification configuration to screen out the single records that match the classifications of the present platform, so that the resulting browser search hotword set contains only hotwords that match those classifications and meets the business requirements of the present platform. Performing both keyword extraction and topic extraction on the screened records further improves the comprehensiveness of the browser search hotword set.
In one embodiment, after the steps of sequentially performing the aggregation and deduplication processing on the platform search hot word set, the third party search hot word set and the browser search hot word set to obtain a target search hot word set, the method further includes:
S51: taking any one of the third-party platforms as a target platform.
Specifically, any one of the third-party platforms is selected as the target platform.
S52: extracting each search hotword corresponding to the target platform from the third-party search hotword set to form a first search hotword set.
Specifically, the platform identifier subsets that contain the platform identifier of the target platform are looked up in the browser search hotword set; the search hotword corresponding to each such subset is taken as a first search hotword, and all first search hotwords form the first search hotword set.
S53: deleting, from the target search hotword set, each search hotword in the first search hotword set to obtain a second search hotword set.
Specifically, each search hotword in the first search hotword set is deleted from the target search hotword set, and the remaining target search hotword set is taken as the second search hotword set.
S54: sending the second search hotword set to the target platform.
In this embodiment, the search hotwords in the target search hotword set that did not come from the target platform's own jump data are sent to the target platform, which enriches the target platform's search hotword library and improves its search performance.
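Steps S52 and S53 reduce to a filter followed by a set difference, sketched below. As assumptions, the "platform identifier subset" of each hotword is modeled as a dict mapping the hotword to the set of platform identifiers it came from, and `exclude_target_platform` is a hypothetical name; the actual transmission in S54 is left out.

```python
from __future__ import annotations


def exclude_target_platform(
    target_set: list[str],
    hotword_platforms: dict[str, set[str]],  # assumed: hotword -> platform identifier subset
    target_platform: str,
) -> list[str]:
    # S52: first search hotword set = hotwords whose identifier subset contains the target platform.
    first_set = {word for word, ids in hotword_platforms.items() if target_platform in ids}
    # S53: second search hotword set = target set minus the first set, keeping the original order.
    return [word for word in target_set if word not in first_set]
```

For example, with `{"redis": {"p1", "p2"}, "kafka": {"p2"}}` and target platform `"p1"`, the target set `["redis", "kafka", "rust"]` becomes `["kafka", "rust"]`, which is what S54 would then send to `p1`.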
In one embodiment, the step of sending the second search hotword set to the target platform includes:
S541: acquiring a service tag set corresponding to the target platform.
Specifically, the service tag set corresponding to the target platform may be entered by a user, read from a preset storage space, or sent by the target platform.
The service tag set includes at least one service tag, and each service tag expresses a business scope of the platform.
S542: labeling each search hotword in the second search hotword set according to a preset tag mapping table.
The tag mapping table maps search hotwords to tags.
Specifically, for each search hotword in the second search hotword set, the corresponding tag is looked up in the tag mapping table and the search hotword is labeled with that tag.
S543: finding, in the second search hotword set, each search hotword whose tag is in the service tag set, to form a third search hotword set.
Specifically, each search hotword in the second search hotword set whose tag belongs to the service tag set is found, and the found search hotwords form the third search hotword set.
S544: synchronizing the target platform according to the third search hotword set.
Optionally, the third search hotword set is sent to the target platform.
Optionally, the search results corresponding to each search hotword in the third search hotword set are sent to the target platform, so that content is pushed quickly. This increases the probability that searches on the target platform surface articles from the present platform, and thus increases traffic to the present platform.
The search results include one or more pieces of result profile data. Each result profile data item includes a result identifier, title, description, highlighting, target address and picture address. A user can open the detail page of a search result by clicking the interface element rendered from its result profile data.
In this embodiment, the search hotwords in the second search hotword set whose tags lie in the service tag set are taken as the third search hotword set, so only hotwords within the target platform's business scope are selected, which raises the success rate of searches performed on the target platform with the third search hotword set. The third search hotword set also enriches the target platform's search hotword library, which helps improve its search performance.
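Steps S542 and S543 can be sketched as a lookup-and-filter pass. As assumptions, the tag mapping table is modeled as a mapping from each search hotword to a single tag, hotwords missing from the table are treated as untagged and dropped, and `select_by_service_tags` is a hypothetical name.

```python
from __future__ import annotations

from collections.abc import Mapping


def select_by_service_tags(
    second_set: list[str],
    tag_map: Mapping[str, str],   # assumed tag mapping table: search hotword -> tag
    service_tags: set[str],       # S541: service tag set of the target platform
) -> list[str]:
    third_set = []
    for word in second_set:
        tag = tag_map.get(word)   # S542: label the hotword; None if the table has no entry
        if tag in service_tags:   # S543: keep hotwords whose tag is one of the service tags
            third_set.append(word)
    return third_set
```

With tags `{"redis": "database", "kafka": "messaging", "yoga": "fitness"}` and service tags `{"database", "messaging"}`, only `redis` and `kafka` reach the third search hotword set synchronized in S544.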
In one embodiment, the step of sequentially performing aggregation and deduplication on the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain the target search hotword set includes:
S41: sequentially performing aggregation and deduplication on the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain a to-be-processed search hotword set.
Specifically, the platform search hotword set, the third-party search hotword set and the browser search hotword set are merged into a to-be-processed set, duplicate search hotwords in that set are removed, and the deduplicated set is taken as the to-be-processed search hotword set.
S42: determining whether each search hotword in the to-be-processed search hotword set appears in the article tags of a preset article database.
Specifically, each search hotword in the to-be-processed search hotword set is checked against the article tags in the preset article database, to determine whether the hotword is associated with any article.
S43: taking each search hotword that appears in the article tags of the article database as a hit hotword.
Specifically, each search hotword found in the article tags of the article database is taken as a hit hotword; that is, every hit hotword is associated with at least one article through its tags.
S44: taking the hit hotwords as the target search hotword set.
In this embodiment, only the search hotwords that appear in the article tags of the article database are taken as hit hotwords and kept in the target search hotword set, so every search hotword in the target set is associated with at least one article, which provides a basis for improving the search success rate.
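Steps S41 to S44 can be sketched as one order-preserving merge followed by a membership filter. As assumptions, the article database's tags are available as a flat set of strings, a hotword "appears in" the tags only on an exact string match, and `build_target_set` is a hypothetical name.

```python
from __future__ import annotations

from collections.abc import Iterable


def build_target_set(
    platform_set: Iterable[str],
    third_party_set: Iterable[str],
    browser_set: Iterable[str],
    article_tags: set[str],  # assumed: every tag in the article database, exact-match strings
) -> list[str]:
    # S41: aggregate the three sets in order, then deduplicate keeping first occurrences.
    pending = dict.fromkeys([*platform_set, *third_party_set, *browser_set])
    # S42-S44: hotwords found among the article tags are hit hotwords; they form the target set.
    return [word for word in pending if word in article_tags]
```

A real article database would more likely be queried per hotword or through a tag index; the set here only illustrates the filtering rule.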
In one embodiment, after the step of sequentially performing aggregation and deduplication on the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain the target search hotword set, the method further includes:
S61: clearing the data of a preset high-frequency request identifier.
Specifically, the data in the preset high-frequency request identifier is cleared so that the identifier is reinitialized.
S62: updating the high-frequency request identifier with the target search hotword set.
Specifically, the target search hotword set is written into the cleared high-frequency request identifier, so that the identifier holds only the search hotwords in the target search hotword set.
S63: in a preset target cache, clearing the associated data whose search hotwords are not in the target search hotword set.
The target cache holds zero or more items of associated data, each consisting of a search hotword and its search results.
S64: responding to search requests according to the high-frequency request identifier and the target cache.
In this embodiment, the high-frequency request identifier and the target cache are updated according to the target search hotword set, which provides a basis for improving search efficiency.
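Steps S61 to S63 can be sketched as an in-place refresh. As assumptions, the high-frequency request identifier is modeled as a plain Python set and the target cache as a dict from search hotword to search results; a production system might use different structures, and `refresh_hot_state` is a hypothetical name.

```python
from __future__ import annotations

from collections.abc import Iterable


def refresh_hot_state(
    identifier: set[str],          # high-frequency request identifier (assumed: a set)
    cache: dict[str, list[str]],   # target cache (assumed): search hotword -> search results
    target_set: Iterable[str],
) -> None:
    identifier.clear()             # S61: reinitialize the identifier
    identifier.update(target_set)  # S62: load the target search hotword set
    # S63: drop associated data for hotwords outside the target set.
    # Snapshot the keys first, because a dict must not change size while being iterated.
    for word in [w for w in cache if w not in identifier]:
        del cache[word]
```

Entries for hotwords that remain hot keep their cached results, so a refresh does not force those searches to hit the search service again.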
In one embodiment, the step of responding to a search request according to the high-frequency request identifier and the target cache includes:
S641: acquiring the search request, which carries request content.
Specifically, the search request sent by the calling object is acquired.
S642: inputting the request content into the high-frequency request identifier to identify whether the request is a high-frequency request, and obtaining an identification result.
Specifically, the request content is looked up among the search hotwords in the high-frequency request identifier; if it is found, the identification result is yes, otherwise the identification result is no.
S643: if the identification result is no, calling a preset search service according to the request content to search a preset service database and obtain a target search result.
Specifically, if the identification result is no, the search request is not a high-frequency request, so the preset search service is called with the request content to search the preset service database, and the result of that search is taken as the target search result.
S644: if the identification result is yes, looking up the request content as a search hotword in the target cache to obtain a lookup result; if the lookup succeeds, taking the search result associated with that hotword in the target cache as the target search result; if the lookup fails, calling the search service with the request content to search the service database and obtain the target search result, and writing the request content and the target search result into the target cache as associated data.
Specifically, if the identification result is yes, the search request is a high-frequency request, so the request content is looked up as a search hotword in the target cache. If a search hotword with identical text is found, the lookup succeeds; otherwise it fails. On success, the result is already cached, so the search result associated with that hotword in the target cache is taken as the target search result. On failure, nothing is cached yet, so the search service is called with the request content to search the service database and obtain the target search result, and the request content and the target search result are written into the target cache as associated data, so that the next identical request can be answered from the target cache.
S645: sending the target search result to the calling object that issued the search request.
Search traffic contains a large number of long-tail requests, each issued only a few times, or even once, per day or over longer periods. If their results were all written into the target cache, they would displace part of the high-frequency content, so repeated searches for high-frequency content would call the search service again, lowering search efficiency and increasing the load on the search service. To solve this, this embodiment uses the high-frequency request identifier to filter out search requests that are not high-frequency, so only the search results for the search hotwords in the identifier are cached in the target cache. This prevents high-frequency entries from being evicted by long-tail or low-frequency entries once the cache is full.
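The request path S641 to S645 can be sketched as follows. As assumptions, the high-frequency request identifier is modeled as a set of hotwords, the target cache as a dict from hotword to results, and `search_service` is a hypothetical callable standing in for the preset search service; returning the result stands in for sending it to the calling object.

```python
from __future__ import annotations

from collections.abc import Callable


def respond(
    content: str,
    identifier: set[str],                        # assumed high-frequency request identifier
    cache: dict[str, list[str]],                 # assumed target cache: hotword -> results
    search_service: Callable[[str], list[str]],  # hypothetical call into the search service
) -> list[str]:
    # S642: the request is high-frequency only if its content is a hotword in the identifier.
    if content not in identifier:
        # S643: long-tail request, answered by the search service and never cached.
        return search_service(content)
    # S644: high-frequency request, answered from the target cache when possible.
    if content in cache:
        return cache[content]
    result = search_service(content)
    cache[content] = result  # store as associated data for the next identical request
    return result            # S645: returned to the calling object
```

Because long-tail requests return before the cache write, only hotwords present in the identifier can ever be added to the cache, which is the eviction-protection property described above.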
As shown in fig. 2, the present application further provides a device for determining a search hotword, where the device includes:
the platform search hotword set determining module 801, configured to acquire historical search data corresponding to each user of the present platform and generate search hotwords from the historical search data to obtain a platform search hotword set;
the third-party search hotword set determining module 802, configured to acquire the jump keyword data set sent by each third-party platform and generate search hotwords from the jump keyword data sets to obtain a third-party search hotword set;
the browser search hotword set determining module 803, configured to acquire a first browser record corresponding to each user of the present platform and a second browser record corresponding to each user of each third-party platform, and generate search hotwords from the first and second browser records to obtain a browser search hotword set, where the first browser record and the second browser record each include a platform identifier, browser cache and browsing history;
the target search hotword set determining module 804, configured to sequentially perform aggregation and deduplication on the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain a target search hotword set.
In this embodiment, the platform search hotword set obtained from the present platform's historical search data, the third-party search hotword set obtained from the third-party platforms' jump keyword data sets, and the browser search hotword set obtained from users' browser records are aggregated and deduplicated in sequence to obtain the target search hotword set. Search hotwords are thus determined from searches on the present platform, from jumps between the present platform and third-party platforms, and from users' browser records, which makes the search hotwords more comprehensive and provides a basis for improving search efficiency.
FIG. 3 illustrates an internal block diagram of a computer device in one embodiment. The computer device may specifically be a terminal or a server. As shown in fig. 3, the computer device includes a processor, a memory, and a network interface connected by a system bus. The memory includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system, and may also store a computer program that, when executed by a processor, causes the processor to implement a method of determining search hotwords. The internal memory may also have stored therein a computer program which, when executed by the processor, causes the processor to perform a method of determining a search hotword. It will be appreciated by those skilled in the art that the structure shown in fig. 3 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is presented comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of:
acquiring historical search data corresponding to each user of the present platform, and generating search hotwords from the historical search data to obtain a platform search hotword set;
acquiring the jump keyword data set sent by each third-party platform, and generating search hotwords from the jump keyword data sets to obtain a third-party search hotword set;
acquiring a first browser record corresponding to each user of the present platform and a second browser record corresponding to each user of the third-party platforms, and generating search hotwords from the first and second browser records to obtain a browser search hotword set, where the first browser record and the second browser record each include a platform identifier, browser cache and browsing history;
and sequentially performing aggregation and deduplication on the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain a target search hotword set.
In this embodiment, the platform search hotword set obtained from the present platform's historical search data, the third-party search hotword set obtained from the third-party platforms' jump keyword data sets, and the browser search hotword set obtained from users' browser records are aggregated and deduplicated in sequence to obtain the target search hotword set. Search hotwords are thus determined from searches on the present platform, from jumps between the present platform and third-party platforms, and from users' browser records, which makes the search hotwords more comprehensive and provides a basis for improving search efficiency.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
acquiring historical search data corresponding to each user of the present platform, and generating search hotwords from the historical search data to obtain a platform search hotword set;
acquiring the jump keyword data set sent by each third-party platform, and generating search hotwords from the jump keyword data sets to obtain a third-party search hotword set;
acquiring a first browser record corresponding to each user of the present platform and a second browser record corresponding to each user of the third-party platforms, and generating search hotwords from the first and second browser records to obtain a browser search hotword set, where the first browser record and the second browser record each include a platform identifier, browser cache and browsing history;
and sequentially performing aggregation and deduplication on the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain a target search hotword set.
In this embodiment, the platform search hotword set obtained from the present platform's historical search data, the third-party search hotword set obtained from the third-party platforms' jump keyword data sets, and the browser search hotword set obtained from users' browser records are aggregated and deduplicated in sequence to obtain the target search hotword set. Search hotwords are thus determined from searches on the present platform, from jumps between the present platform and third-party platforms, and from users' browser records, which makes the search hotwords more comprehensive and provides a basis for improving search efficiency.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, database or other media in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. The volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations are described; however, any combination of these technical features should be considered within the scope of this specification as long as it involves no contradiction.
The above embodiments represent only a few implementations of the present application. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present application, and these fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be determined by the appended claims.

Claims (8)

1. A method of determining a search hotword, the method comprising:
acquiring historical search data corresponding to each user of the present platform, and generating search hotwords from the historical search data to obtain a platform search hotword set;
acquiring the jump keyword data set sent by each third-party platform, where jump keyword data refers to the keyword combination corresponding to a searched page that is fed back to the third-party platform when the third-party platform jumps to that page, and generating search hotwords from the jump keyword data sets to obtain a third-party search hotword set, including: segmenting each jump keyword data set into words, counting the word frequency of each resulting phrase, extracting the M highest word frequencies, taking the phrase corresponding to each extracted word frequency as a search hotword, and taking the M search hotwords as the third-party search hotword set, where M is an integer greater than 0;
acquiring a first browser record corresponding to each user of the present platform and a second browser record corresponding to each user of the third-party platforms, and generating search hotwords from the first and second browser records to obtain a browser search hotword set, where the first browser record and the second browser record each include a platform identifier, browser cache and browsing history;
sequentially performing aggregation and deduplication on the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain a target search hotword set;
after the step of sequentially performing aggregation and deduplication on the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain the target search hotword set, the method further comprises:
clearing the data of a preset high-frequency request identifier;
updating the high-frequency request identifier with the target search hotword set;
in a preset target cache, clearing the associated data whose search hotwords are not in the target search hotword set;
responding to a search request according to the high-frequency request identifier and the target cache, including: acquiring the search request, which carries request content; inputting the request content into the high-frequency request identifier to identify whether the request is a high-frequency request, and obtaining an identification result; if the identification result is no, calling a preset search service according to the request content to search a preset service database and obtain a target search result; if the identification result is yes, looking up the request content as a search hotword in the target cache to obtain a lookup result; if the lookup succeeds, taking the search result associated with that hotword in the target cache as the target search result, and if the lookup fails, calling the search service with the request content to search the service database and obtain the target search result, and writing the request content and the target search result into the target cache as associated data; and sending the target search result to the calling object that issued the search request.
2. The method for determining a search hotword according to claim 1, wherein the step of generating a search hotword from each of the first browser records and each of the second browser records to obtain a browser search hotword set includes:
inputting each single record in each first browser record into a preset classification model for classification prediction to obtain a first classification result; selecting one or more first classification results from all first classification results according to a preset classification configuration as a first hit result set; extracting keywords from each single record corresponding to the first hit result set to obtain a first data set; extracting topics from each single record corresponding to the first hit result set to obtain a second data set; and sequentially performing aggregation and deduplication on the first data set and the second data set to obtain first attribute information;
inputting each single record in each second browser record into the classification model for classification prediction to obtain a second classification result; selecting one or more second classification results from the second classification results according to the classification configuration as a second hit result set; extracting keywords from the single records corresponding to the second hit result set to obtain a third data set; extracting topics from the single records corresponding to the second hit result set to obtain a fourth data set; and sequentially performing aggregation and deduplication on the third data set and the fourth data set to obtain second attribute information;
and generating search hotwords from the first attribute information and the second attribute information to obtain the browser search hotword set.
3. The method for determining the search hotword according to claim 2, wherein after the steps of sequentially performing aggregation and deduplication on the platform search hotword set, the third party search hotword set and the browser search hotword set to obtain a target search hotword set, the method further comprises:
taking any one of the third party platforms as a target platform;
extracting each search hotword corresponding to the target platform from the third-party search hotword set to be used as a first search hotword set;
deleting each search hotword corresponding to the first search hotword set in the target search hotword set to obtain a second search hotword set;
and sending the second search hot word set to the target platform.
4. The method of claim 3, wherein the step of sending the second set of search hotwords to the target platform comprises:
acquiring a service tag set corresponding to the target platform;
labeling each search hotword in the second search hotword set according to a preset tag mapping table;
finding, in the second search hotword set, each search hotword whose tag is in the service tag set, to form a third search hotword set;
and synchronizing the target platform according to the third search hotword set.
5. The method for determining the search hotword according to claim 1, wherein the step of sequentially performing aggregation and deduplication on the platform search hotword set, the third party search hotword set and the browser search hotword set to obtain a target search hotword set includes:
sequentially carrying out aggregation and deduplication processing on the platform searching hot word set, the third party searching hot word set and the browser searching hot word set to obtain a searching hot word set to be processed;
determining whether each search hotword in the to-be-processed search hotword set appears in the article tags of a preset article database;
taking each search hotword that appears in the article tags of the article database as a hit hotword;
and taking each hit hot word as the target search hot word set.
6. A device for determining a search hotword, the device comprising:
the platform searching hot word set determining module is used for acquiring historical searching data corresponding to each platform user, and generating searching hot words according to each historical searching data to obtain the platform searching hot word set;
the third-party search hotword set determining module, configured to acquire the jump keyword data set sent by each third-party platform, where jump keyword data refers to the keyword combination corresponding to a searched page that is fed back to the third-party platform when the third-party platform jumps to that page, and to generate search hotwords from the jump keyword data sets to obtain a third-party search hotword set, including: segmenting each jump keyword data set into words, counting the word frequency of each resulting phrase, extracting the M highest word frequencies, taking the phrase corresponding to each extracted word frequency as a search hotword, and taking the M search hotwords as the third-party search hotword set, where M is an integer greater than 0;
the browser search hotword set determining module, configured to acquire first browser records corresponding to users of the present platform and second browser records corresponding to users of the third-party platforms, and to generate search hotwords from the first and second browser records to obtain a browser search hotword set, where the first browser records and the second browser records each include a platform identifier, browser cache and browsing history;
the target search hotword set determining module, configured to sequentially perform aggregation and deduplication on the platform search hotword set, the third-party search hotword set and the browser search hotword set to obtain a target search hotword set, and then to: clear the data of a preset high-frequency request identifier; update the high-frequency request identifier with the target search hotword set; in a preset target cache, clear the associated data whose search hotwords are not in the target search hotword set; and respond to a search request according to the high-frequency request identifier and the target cache, including: acquiring the search request, which carries request content; inputting the request content into the high-frequency request identifier to identify whether the request is a high-frequency request, and obtaining an identification result; if the identification result is no, calling a preset search service according to the request content to search a preset service database and obtain a target search result; if the identification result is yes, looking up the request content as a search hotword in the target cache to obtain a lookup result; if the lookup succeeds, taking the search result associated with that hotword in the target cache as the target search result, and if the lookup fails, calling the search service with the request content to search the service database and obtain the target search result, and writing the request content and the target search result into the target cache as associated data; and sending the target search result to the calling object that issued the search request.
7. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method of any one of claims 1 to 5.
8. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 5.
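The hot word set update and request-handling flow recited for the device of claim 6 can be sketched as below. This is a minimal Python illustration under stated assumptions, not the patentee's implementation. The names `BrowserRecord`, `browser_hotwords`, `merge_dedup` and `HotwordSearchCache` are hypothetical. Browser hot words are assumed to be the most frequent terms across browser caches and browsing histories; the claim does not say how they are generated. The high-frequency request identifier is modelled as a plain set and the target cache as a dictionary keyed by request content. The search service is any callable that maps request content to a result.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class BrowserRecord:
    # Hypothetical shape for a browser record: the claim names a platform
    # identifier, a browser cache and a browsing history, but not their types.
    platform_id: str
    cache_terms: List[str]
    history_terms: List[str]


def browser_hotwords(records: Iterable[BrowserRecord], top_k: int = 10) -> List[str]:
    """Assumption: browser hot words are the most frequent cache/history terms."""
    counts: Counter = Counter()
    for rec in records:
        counts.update(rec.cache_terms)
        counts.update(rec.history_terms)
    return [term for term, _ in counts.most_common(top_k)]


def merge_dedup(*hotword_sets: Iterable[str]) -> List[str]:
    """Merge the given sets in order and drop duplicates, keeping first occurrences."""
    seen: Set[str] = set()
    merged: List[str] = []
    for words in hotword_sets:
        for word in words:
            if word not in seen:
                seen.add(word)
                merged.append(word)
    return merged


class HotwordSearchCache:
    """Hypothetical stand-in for the high-frequency request identifier plus target cache."""

    def __init__(self, search_service: Callable[[str], str]) -> None:
        self.search_service = search_service  # searches the service database
        self.identifier: Set[str] = set()     # high-frequency request identifier (assumed: a set)
        self.cache: Dict[str, str] = {}       # target cache: request content -> result

    def refresh(self, target_hotwords: Iterable[str]) -> None:
        # Clear the identifier's old data, then load the target hot word set.
        self.identifier.clear()
        self.identifier.update(target_hotwords)
        # Evict cached data for hot words that are no longer in the target set.
        for key in [k for k in self.cache if k not in self.identifier]:
            del self.cache[key]

    def handle(self, request_content: str) -> str:
        # Not high-frequency: call the search service directly, without caching.
        if request_content not in self.identifier:
            return self.search_service(request_content)
        # High-frequency and already cached: serve from the target cache.
        if request_content in self.cache:
            return self.cache[request_content]
        # High-frequency but not cached: search, then store as associated data.
        result = self.search_service(request_content)
        self.cache[request_content] = result
        return result
```

In this sketch, only requests that match a current hot word are ever cached. Each refresh both replaces the identifier's contents and evicts cache entries for words that are no longer hot, so the cache cannot outgrow the target set.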
CN202310024195.2A 2023-01-09 2023-01-09 Method and device for determining search hotword, computer equipment and storage medium Active CN115757923B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310024195.2A CN115757923B (en) 2023-01-09 2023-01-09 Method and device for determining search hotword, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310024195.2A CN115757923B (en) 2023-01-09 2023-01-09 Method and device for determining search hotword, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115757923A CN115757923A (en) 2023-03-07
CN115757923B CN115757923B (en) 2023-05-23

Family

ID=85348353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310024195.2A Active CN115757923B (en) 2023-01-09 2023-01-09 Method and device for determining search hotword, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115757923B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760521A (en) * 2016-02-29 2016-07-13 Baidu Online Network Technology (Beijing) Co., Ltd. Information input method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103164424B (en) * 2011-12-13 2017-05-10 Alibaba Group Holding Limited Method and device for acquiring time-efficient words
CN104572846B (en) * 2014-12-12 2018-10-16 Baidu Online Network Technology (Beijing) Co., Ltd. Hot word recommendation method, device and system
CN105574176A (en) * 2015-12-21 2016-05-11 Beijing Qihoo Technology Co., Ltd. Hot word recommendation method and device combining multiple data sources
CN112000865B (en) * 2020-07-22 2024-01-23 Beijing Dajia Internet Information Technology Co., Ltd. Hotword generation method, device, server and storage medium
CN113596352B (en) * 2021-07-29 2023-07-25 Beijing Dajia Internet Information Technology Co., Ltd. Video processing method, processing device and electronic equipment
CN113918799A (en) * 2021-10-28 2022-01-11 Shenzhen Power Supply Bureau Co., Ltd. Hot search list sorting method based on a digital historical information system

Also Published As

Publication number Publication date
CN115757923A (en) 2023-03-07

Similar Documents

Publication Publication Date Title
US8554759B1 (en) Selection of documents to place in search index
US6985950B1 (en) System for creating a space-efficient document categorizer for training and testing of automatic categorization engines
EP1862916A1 (en) Indexing Documents for Information Retrieval based on additional feedback fields
JP4542195B2 (en) Database query system and method
CN110321408B (en) Searching method and device based on knowledge graph, computer equipment and storage medium
WO2021098648A1 (en) Text recommendation method, apparatus and device, and medium
US8423885B1 (en) Updating search engine document index based on calculated age of changed portions in a document
US10282358B2 (en) Methods of furnishing search results to a plurality of client devices via a search engine system
CN110909120B (en) Resume searching/delivering method, device and system and electronic equipment
CN111339244A (en) Tax policy and regulation inquiry method, computer equipment and storage medium
CN112328548A (en) File retrieval method and computing device
US20150206101A1 (en) System for determining infringement of copyright based on the text reference point and method thereof
US20150339387A1 (en) Method of and system for furnishing a user of a client device with a network resource
CN112685475A (en) Report query method and device, computer equipment and storage medium
CN114222000A (en) Information pushing method and device, computer equipment and storage medium
CN110955855A (en) Information interception method, device and terminal
CN115757923B (en) Method and device for determining search hotword, computer equipment and storage medium
CN107590233A (en) File management method and device
CA2703132A1 (en) Methods and system for information storage enabling fast information retrieval
CN115269765A (en) Account identification method and device, electronic equipment and storage medium
JP3531344B2 (en) Information retrieval device
CN110399451B (en) Full-text search engine caching method, system and device based on nonvolatile memory and readable storage medium
CN107818091B (en) Document processing method and device
CN116010588B (en) Real-time and offline combined document recommendation method, device, equipment and medium
CN115438236B (en) Unified hybrid search method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant