WO2023142448A1 - Method, apparatus, server and readable storage medium for processing hotspot information - Google Patents

Method, apparatus, server and readable storage medium for processing hotspot information

Info

Publication number
WO2023142448A1
WO2023142448A1 PCT/CN2022/113119 CN2022113119W WO2023142448A1 WO 2023142448 A1 WO2023142448 A1 WO 2023142448A1 CN 2022113119 W CN2022113119 W CN 2022113119W WO 2023142448 A1 WO2023142448 A1 WO 2023142448A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
hotspot
item
preset
hot
Prior art date
Application number
PCT/CN2022/113119
Other languages
English (en)
French (fr)
Inventor
张雄伟
陶通
李勇
包勇军
颜伟鹏
周明龙
赫阳
Original Assignee
北京沃东天骏信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司, 北京京东世纪贸易有限公司
Publication of WO2023142448A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present disclosure relates to the technical field of big data analysis, and in particular to a method, device, server and readable storage medium for processing hotspot information.
  • the online mall can push various items to the user.
  • there are many types of items in the online shopping mall, and some items that the user is interested in are usually selected from them and pushed to the user. For example, relevant items are recommended to the user based on current hot topics or hot events on the Internet.
  • the present disclosure provides a hotspot information processing method, device, server and readable storage medium, which are used to solve the problem of high labor cost in the existing item pushing process.
  • an embodiment of the present disclosure provides a method for processing hotspot information, including:
  • a potential item requirement is obtained, and the potential item requirement is used to indicate an item that the user is interested in on the first website.
  • the acquisition of hot items on the first website includes:
  • the historical behavior of the user on the items in the first website including at least one of browsing behavior, searching behavior, ordering behavior and collection behavior;
  • hot items are selected from the items on the first website.
  • the determining the application scenario information corresponding to the hot item includes:
  • the application scene information of the hot item is obtained.
  • the acquiring the application scenario information of the hot items according to the title information and attribute information includes:
  • Scene participle is extracted from the title information, and the scene participle is a word describing the application scene of the hot item;
  • the determining the target hotspot information associated with the application scene information in the hotspot information includes:
  • target hotspot information associated with the application scene information is determined from the hotspot information.
  • the training of the preset model according to the hotspot information and the associated information to obtain the first target model includes:
  • the positive sample and negative sample are used as the training data of the preset model
  • the preset model is trained to obtain a first target model.
  • performing vectorization on the hotspot information to obtain a representation vector of the hotspot information includes:
  • a representation vector of the hotspot information is obtained according to a word segmentation vector corresponding to each hotspot word.
  • the vectorization of the application scenario information to obtain a representation vector of the application scenario information includes:
  • the characterization vector of the application scene information is obtained.
  • the determining the positive samples and negative samples of the preset model according to the distance includes:
  • selecting hotspot information whose distance is less than or equal to a preset distance as a negative sample of the preset model.
  • the training of the preset model according to the training data of the preset model to obtain the first target model includes:
  • the preset model is trained to obtain an initial model
  • selecting hotspot information with a score greater than a preset score threshold from the hotspot information whose distance is less than or equal to the preset distance, and updating it into the positive samples;
  • the initial model is trained according to the updated positive samples and negative samples to obtain the first target model.
  • the acquiring potential item demand according to the target hotspot information and the application scenario information includes:
  • according to the application scenario information, acquiring a set of hotspot information associated with the application scenario information on the second website;
  • an apparatus for processing hotspot information including:
  • An item acquisition module configured to acquire hot items on the first website, and determine application scenario information corresponding to the hot items, where the hot items are items in the first website whose degree of user attention reaches a preset threshold;
  • An information association module configured to obtain hotspot information from a second website, and determine target hotspot information associated with the application scene information in the hotspot information;
  • a demand acquiring module configured to acquire potential item demands according to the target hotspot information and the application scene information, and the potential item demands are used to indicate items that the user is interested in on the first website.
  • the item acquisition module when acquiring hot items on the first website, is specifically configured to:
  • the historical behavior of the user on the items in the first website including at least one of browsing behavior, searching behavior, ordering behavior and collection behavior;
  • the hot item is selected from the items on the first website.
  • the item acquisition module when determining the application scenario information corresponding to the hot item, is specifically configured to:
  • the application scene information of the hot item is obtained.
  • the item acquisition module when acquiring the application scenario information of the hot item according to the title information and attribute information, is specifically configured to:
  • Scene participle is extracted from the title information, and the scene participle is a word describing the application scene of the hot item;
  • the information association module is specifically configured to:
  • target hotspot information associated with the application scene information is determined from the hotspot information.
  • an embodiment of the present disclosure provides a server, including: a processor, and a memory communicatively connected to the processor;
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory to implement the method as described above.
  • an embodiment of the present disclosure provides a readable storage medium, wherein computer instructions are stored in the readable storage medium, and the computer instructions are used to implement the above method when executed by a processor.
  • an embodiment of the present disclosure provides a program product, including computer instructions, which implement the above method when executed by a processor.
  • the hotspot information processing method, device, server, and readable storage medium provided by the embodiments of the present disclosure obtain hot items with a high degree of attention on the first website, obtain hotspot information from other network sites, and find among this hotspot information the target hotspots that are highly correlated with the shopping scene of the hot items. The potential demand for items can then be analyzed without manually monitoring hotspot information on other network sites in real time or selecting hot item collections on the shopping website by subjective guesswork, which reduces labor costs.
  • FIG. 1 is a schematic diagram of a scene of a method for processing hotspot information provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of Embodiment 1 of a method for processing hotspot information provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic flowchart of Embodiment 2 of a method for processing hotspot information provided by an embodiment of the present disclosure
  • FIG. 4 is a schematic structural diagram of an apparatus for processing hotspot information provided by an embodiment of the present disclosure
  • FIG. 5 is a schematic structural diagram of a server provided by an embodiment of the present disclosure.
  • Hotspots: refers to news or information that is popular with the general public, or to places or issues that attract attention in a certain period, such as "social hotspot" or "a certain place has become a tourist hotspot".
  • FIG. 1 is a schematic diagram of a scene of a method for processing hotspot information provided by an embodiment of the present disclosure.
  • the application scenario of the disclosed solution may be a scenario where a user is shopping on a website.
  • users can log in to the website through a mobile terminal 10 (such as a personal computer), browse, click, bookmark and place an order for items displayed on the website.
  • Users pay different attention to different items according to their personal interests. For example, users will pay attention to hot information in current life. If users find that there is a potential demand for some items in the hot information, they may go to the website to place an order for these items.
  • the website can also actively search for hot information in current life, and then find some items that users may need and display them on the page.
  • the embodiments of the present disclosure provide a hotspot information processing method, device, server and readable storage medium. The method uses users' attention to different items on the website to find hot items with a high degree of attention and the application scenario information corresponding to those hot items, then obtains hotspot information from other websites and determines which hotspot information is target hotspot information highly correlated with the application scenario information, and finally analyzes the user's potential item demand based on the target hotspot information and the application scenario information. The whole process removes the workload of manually mining hotspot information, identifies target hotspot information automatically, reduces the cost of manually screening related items, makes the hotspot information more interpretable, and infers the user's potential item demand, improving the user experience on the website.
  • FIG. 2 is a schematic flowchart of Embodiment 1 of a method for processing hotspot information provided by an embodiment of the present disclosure.
  • the method can be applied to a local computer device, and can also be applied to a cloud server in practical applications. As shown in Figure 2, the method may specifically include the following steps:
  • the hot item is an item on the first website whose degree of user attention reaches a preset threshold.
  • the first website may refer to a website for users to browse, search, and place an order, such as some existing shopping websites. These websites usually contain a large amount of item information. Due to the limitation of the display interface, the website generally selects only some items from the massive number of items and displays them on the display interface. If the items displayed on the display interface are not items that the user is interested in, the user needs to browse, search, and so on to find the items of interest.
  • the server may take a period of time (for example, one month) as a time period, and count the degree of attention of each item in each time period, so as to determine which items in the time period are hot items.
  • hot items usually carry item information for the convenience of users to view and understand the item, specifically title information and attribute information, etc. may be included.
  • the title information is usually a textual introduction to the important features of the item
  • the attribute information is a textual introduction to the detailed attributes of the item.
  • the item information of hot items can be as follows in Table 1:
  • the attribute information includes the gross weight of the item, the origin of the item, the identification of the item, the category to which it belongs, and the applicable event.
  • the application scene information can be extracted from title information and attribute information.
  • the application scenario information can be "student_outdoor_return to work, student_outdoor_start of school, student_outdoor_epidemic prevention, student_outdoor_first aid".
  • S202 Acquire hotspot information from the second website, and determine target hotspot information associated with the application scene information in the hotspot information.
  • the second website may be some network sites other than the first website, such as social network sites, news network sites, game forums and so on.
  • the hotspot information is usually a topic that has attracted widespread attention and discussion in the society recently, and the hotspot information usually consists of a simple sentence.
  • the hotspot information can be # ⁇ #, # ⁇ #, # ⁇ #.
  • some hotspot information can be associated with application scene information, and these hotspot information will be used as target hotspot information.
  • the application scenario information and the hotspot information may be vectorized respectively, and then the vector distance between the application scenario information and the hotspot information is obtained, and then according to the vector distance, which hotspot information is associated with the application scenario information is determined.
  • the hotspot information # ⁇ # has an association relationship with the application scenario information "replenishing water_outdoor_warming", while # ⁇ # has no application scenario information related to it.
  • the potential item demand is used to indicate the item that the user is interested in on the first website.
  • the item that the user is interested in may be an outdoor backpack, a mask, and the like.
  • target hotspot information can be associated with application scenario information, and some application scenario information with strong correlation can be found, from which potential item demand can be deduced.
  • application scenario information such as "boys_autumn winter_outdoor_travel" as an example, items such as outdoor backpacks, autumn winter coats, travel tents, etc. can be analyzed from the application scene information.
  • in the present disclosure, the hot items and their corresponding application scene information are obtained from the first website, and the target hotspot information associated with the application scene information is then screened from the second website, so hotspots do not need to be mined and screened manually, which reduces labor cost. At the same time, potential item demand can be deduced from the target hotspot information and the application scenario information, so that items the user is interested in are found accurately, items are pushed accurately, and the purchase conversion rate of items improves.
  • the "acquiring hot items on the first website" in the above step S201 can be specifically implemented through the following steps: acquiring the historical behavior of the user on the items in the first website; and selecting hot items from the items on the first website according to the historical behavior.
  • the historical behavior includes at least one of browsing behavior, searching behavior, order placing behavior and collection behavior.
  • the historical behavior may be the user's recent (for example, one month) behavior on the first website
  • the hot item refers to the item whose historical behavior of the user reaches a certain frequency or more.
  • the preset threshold may be an empirical value. Exemplarily, items that have been browsed more than one million times by all users in the past month may be regarded as hot items.
  • the embodiments of the present disclosure use the historical behavior of users on the website to dig out which items in the website are hot items, avoiding the use of manual mining and artificial subjective guessing of hot items, making hot items interpretable and reducing labor costs. At the same time, it can also improve the mining efficiency of hot items.
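The selection of hot items from historical behavior can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the log format, the behavior names and the function `select_hot_items` are hypothetical, and the "degree of attention" is assumed to be a simple count of tracked behaviors per item within one time period.

```python
from collections import Counter

# Assumed names for the four behaviors listed in the text
# (browsing, searching, ordering, collection).
TRACKED_BEHAVIORS = {"browse", "search", "order", "collect"}


def select_hot_items(behavior_log, preset_threshold):
    """Return the item ids whose tracked-behavior count reaches the threshold.

    behavior_log is a hypothetical list of (user_id, item_id, behavior)
    tuples covering one time period (for example, one month).
    """
    counts = Counter(
        item_id
        for _user_id, item_id, behavior in behavior_log
        if behavior in TRACKED_BEHAVIORS
    )
    return sorted(item for item, n in counts.items() if n >= preset_threshold)


log = [
    ("u1", "mask", "browse"),
    ("u2", "mask", "order"),
    ("u3", "mask", "search"),
    ("u1", "tent", "browse"),
    ("u2", "tent", "share"),  # not a tracked behavior, so it is ignored
]
hot_items = select_hot_items(log, preset_threshold=3)
```

In a real deployment the threshold would be the empirical value mentioned in the text rather than the toy value used here.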
  • the "determining the application scenario information corresponding to the hot item" in the above step S201 can be specifically implemented through the following steps: acquiring the item information of the hot item; acquiring the application scenario information of the hot item according to the title information and attribute information.
  • item information includes title information and attribute information.
  • the attribute information may also include attributes such as applicable seasons, applicable people, and applicable events of the item, and different attributes correspond to different attribute values.
  • the attribute information in the above Table 1 includes applicable events, wherein the attribute values corresponding to the applicable events are epidemic prevention and first aid.
  • item information is divided into title information and attribute information, and application scenario information of popular items is obtained from the title information and attribute information, so that the obtained application scenario information can be more accurate.
  • the above-mentioned step of "obtaining the application scene information of the hot items according to the title information and attribute information" can be realized through the following steps: extracting scene word segmentation from the title information; obtaining the preset attribute in the attribute information and determining the attribute value corresponding to the preset attribute; and combining the scene word segmentation and the attribute value to obtain the application scenario information.
  • the scene participle is the word describing the application scene of the hot item.
  • sequence labeling models include but are not limited to probabilistic graphical models and deep learning models.
  • the probabilistic graphical model can be a conditional random field (CRF) or a hidden Markov model (HMM), and the deep learning model can be a bidirectional long short-term memory network with a conditional random field layer (BiLSTM-CRF).
  • the part-of-speech category corresponding to the sequence tagging model can be defined into four types: applicable event, applicable location, applicable population and applicable time. Through the four parts of speech, the scene participle is extracted from the title information.
  • the scene participle and the attribute value each contain at least one word.
  • the scene participle in Table 2 includes five words.
  • the format of application scenario information can be defined as applicable crowd_applicable place_applicable event. Filling in the corresponding scene participles and attribute values according to this format yields the application scenario information.
  • the scene word corresponding to the applicable crowd is student, the scene word corresponding to the applicable location is outdoor, and the scene words corresponding to the applicable event are return to work, start of school and epidemic prevention. The attribute values corresponding to the applicable event in Table 1 are epidemic prevention and first aid, so the application scenario information obtained by permutation and combination includes: student_outdoor_return to work, student_outdoor_start of school, student_outdoor_epidemic prevention, student_outdoor_first aid.
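The permutation and combination described above can be sketched as a Cartesian product over the three slots of the format applicable crowd_applicable place_applicable event. The helper names `merge_unique` and `build_scene_info` are hypothetical; the input words are the ones given in the example.

```python
from itertools import product


def merge_unique(*word_lists):
    """Merge word lists, keeping first-seen order and dropping duplicates."""
    merged = []
    for words in word_lists:
        for word in words:
            if word not in merged:
                merged.append(word)
    return merged


def build_scene_info(crowds, places, events):
    """Combine every crowd, place and event into crowd_place_event strings."""
    return ["_".join(parts) for parts in product(crowds, places, events)]


# Scene words extracted from the title, and attribute values of the
# "applicable event" attribute, as given in the example.
title_events = ["return to work", "start of school", "epidemic prevention"]
attribute_events = ["epidemic prevention", "first aid"]

events = merge_unique(title_events, attribute_events)
scenes = build_scene_info(["student"], ["outdoor"], events)
```

Merging before combining means that "epidemic prevention", which appears in both the title and the attributes, produces only one scene.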
  • the "determining the target hotspot information associated with the application scenario information from the hotspot information" in the above step S202 can be specifically implemented through the following steps: training the preset model according to the hotspot information and the application scenario information to obtain the first target model; and determining, according to the first target model, the target hotspot information associated with the application scene information from the hotspot information.
  • the preset model may be a scene-based machine learning binary classification model, including but not limited to methods such as logistic regression and naive Bayes.
  • the training data of the preset model may be hotspot information obtained from the second website.
  • data enhancement may also be performed on the hotspot information based on the obtained hotspot information.
  • the relevant information can be topics related to the hotspot information and discussion content under the topic.
  • relevant topics can be discussion information related to this hot information, such as "# ⁇ # Friends ⁇ Let’s share some cold-proof equipment", “Overlord The super cold wave is coming! What should I do if the skin is severely dehydrated?".
  • the hotspot information can be labeled automatically: the hotspot information related to the application scene information is taken as positive samples and the remaining hotspot information as negative samples, for subsequently training the first target model.
  • the first target model can be used to automatically identify, from any information, the hot information related to items, without manually crawling hot information from websites and deciding which hot information is related to items. This reduces labor costs, improves the response speed to hot information, and avoids outdated hot information.
  • the above-mentioned step of "training the preset model to obtain the first target model according to the hotspot information and associated information" can be specifically implemented through the following steps: vectorizing the hotspot information to obtain the representation vector of the hotspot information; vectorizing the application scenario information to obtain the representation vector of the application scenario information; obtaining the distance between the representation vector of the hotspot information and the representation vector of the application scenario information; determining the positive samples and negative samples of the preset model based on the distance; and training the preset model according to the training data of the preset model to obtain the first target model. The positive samples and negative samples are used as the training data of the preset model.
  • the hotspot information may be represented using a semantic vector.
  • a corresponding semantic vector may be generated for the hotspot information based on a vector tool such as fasttext, so as to represent the hotspot information.
  • application scene information can also be represented using semantic vectors.
  • the vector distance between the characterization vector of the hotspot information and the characterization vector of the application scene information may be calculated.
  • the vector distance includes but not limited to a cosine distance, a Euclidean distance, and the like.
  • a distance threshold (for example, 0.95) may be set: hotspot information whose vector distance is greater than the distance threshold is treated as strongly related to the application scene information, and hotspot information whose vector distance is less than or equal to the distance threshold is treated as weakly related to the application scene information.
  • the hotspot information that is strongly related to the application scenario information is a positive sample
  • the hotspot information that is weakly related to the application scenario information is a negative sample.
  • the embodiment of the present disclosure uses the vector distance to select positive and negative samples as the training data of the preset model and trains the first target model, which enables automatic identification of sudden public opinion, greatly improves the shopping website's response speed to time-sensitive hotspot information, and pushes items related to hot information to users to improve the purchase conversion rate of items.
  • the above step of "vectorizing the hotspot information to obtain the representation vector of the hotspot information" can be specifically implemented through the following steps: performing word segmentation on the hotspot information to obtain at least one hotspot word; vectorizing each hotspot word to obtain the word segmentation vector corresponding to each hotspot word; and obtaining the representation vector of the hotspot information according to the word segmentation vector corresponding to each hotspot word.
  • the hotspot information is a short sentence.
  • the hotspot information needs to be segmented to obtain several hotspot words, the word segmentation vector corresponding to each hotspot word is calculated, and finally the word segmentation vectors of the hotspot words are integrated to obtain the representation vector of the hotspot information.
  • word segmentation tools such as jieba can be used to segment the hotspot information, and the semantic vector of each word segment is then used to obtain the representation vector of the hotspot information.
  • $T = \{w_1, w_2, \ldots, w_n\}$
  • $w_n$ is the n-th hot word (n is a positive integer not less than 1).
  • $\mathrm{Vec}_T$ is the representation vector of the hotspot information.
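The integration of word segmentation vectors into one representation vector can be sketched as below. The text does not state how the vectors are integrated, so mean pooling is an assumption here, and the embedding table `toy_embeddings` and the functions `mean_pool` and `represent` are hypothetical stand-ins for vectors that a tool such as fasttext would supply.

```python
def mean_pool(word_vectors):
    """Average equal-length word vectors into a single vector.

    Mean pooling is an assumed integration rule, not one confirmed by the text.
    """
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    count = len(word_vectors)
    dim = len(word_vectors[0])
    return [sum(vec[d] for vec in word_vectors) / count for d in range(dim)]


def represent(hot_words, embeddings):
    """Look up each hot word and pool the vectors that are found."""
    vectors = [embeddings[word] for word in hot_words if word in embeddings]
    return mean_pool(vectors)


# Toy 2-dimensional embeddings, for illustration only.
toy_embeddings = {
    "cold": [1.0, 0.0],
    "wave": [0.0, 1.0],
    "arrives": [1.0, 1.0],
}
vec_t = represent(["cold", "wave", "arrives"], toy_embeddings)
```

Words missing from the embedding table are skipped in this sketch; a production system would need a policy for out-of-vocabulary words.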
  • the above step of "vectorizing the application scenario information to obtain the representation vector of the application scenario information" can be specifically implemented through the following steps: obtaining the scene name of the application scenario information and the item information contained in the application scenario information; vectorizing the scene name to obtain the representation vector of the scene name; vectorizing the item information to obtain the representation vector of the item information; and obtaining the representation vector of the application scenario information according to the representation vector of the scene name and the representation vector of the item information.
  • the application scene information includes at least one piece of item information. Exemplarily, take the following Table 4 as an example.
  • the scene name of the application scene information in Table 4 is student_outdoor_epidemic prevention, and the corresponding three items can be obtained by querying the scene name of the application scene information.
  • the word segmentation set of the scene name and the word segmentation set of the item name can be obtained first; the two sets are then vectorized to obtain the representation vector of the scene name and the representation vector of the item information, which are then combined to obtain the representation vector of the application scene information.
  • the word segmentation set of the scene name is {student, outdoor, epidemic prevention}.
  • a set of word segmentation vectors corresponding to each scene name can be obtained Represents the word segmentation vector of the nth scene name, then the representation vector of the scene name is:
  • the word segmentation set of any item information can be denoted $S_{sku}$, where $S_n$ is the n-th participle of that item information.
  • the word segmentation vector set corresponding to any item information is Exemplarily, the word segmentation set of any item information is derived from the title name of the item and the attribute information of the item. From this, the vector representation of all item information can be obtained as:
  • k means that the application scene information contains k item information, Indicates the set of word segmentation vectors corresponding to the jth item information, Indicates the vector corresponding to the i-th participle corresponding to the j-th product in the shopping scene.
  • the characterization vector of the application scene information can be obtained by combining the characterization vector of the aforementioned scene name and the characterization vector of the item information as
  • the above-mentioned step of "determining the positive samples and negative samples of the preset model according to the distance" can be specifically implemented through the following steps: selecting the hotspot information whose distance is greater than the preset distance as the positive samples of the preset model; and selecting the hotspot information whose distance is less than or equal to the preset distance as the negative samples of the preset model.
  • In this embodiment, taking the cosine distance as an example, the vector distance dis is calculated between Vec_scene and Vec_T (the formula appears as an image in the original publication).
  • In the formula, dis represents the distance between the representation vector of the hotspot information and the representation vector of the application scene information, Vec_scene represents the representation vector of the application scene information, and Vec_T represents the representation vector of the hotspot information.
  • Exemplarily, the preset distance can be set to 0.95. When the distance between the representation vector of the hotspot information and the representation vector of the application scene information is greater than the preset distance, the hotspot information is taken as a positive sample; the remaining hotspot information is taken as negative samples.
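The thresholding rule above can be sketched in a few lines. This is a minimal illustration, assuming the "cosine distance" in the text is the usual cosine measure (larger means closer); the function names and the two-dimensional toy vectors are hypothetical and are not taken from the publication.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors (an assumption)
    return dot / (norm_u * norm_v)

def split_samples(hotspot_vectors, scene_vector, preset_distance=0.95):
    """Label hotspots as positive when the measure exceeds the preset distance."""
    positives, negatives = [], []
    for name, vec in hotspot_vectors.items():
        if cosine_similarity(vec, scene_vector) > preset_distance:
            positives.append(name)
        else:
            negatives.append(name)
    return positives, negatives

# Toy vectors, invented for illustration only.
scene_vector = [1.0, 0.0]
hotspot_vectors = {"cold wave": [0.99, 0.1], "celebrity news": [0.1, 1.0]}
positives, negatives = split_samples(hotspot_vectors, scene_vector)
print(positives, negatives)
```

Hotspots that fall at or below the threshold are not discarded; they become the negative pool that the later scoring step revisits.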
  • Further, in some embodiments, the step of "training the preset model according to the training data of the preset model to obtain the first target model" can be implemented as follows: train the preset model on its positive and negative samples to obtain an initial model; use the initial model to score the hotspot information whose distance is less than or equal to the preset distance, obtaining a score for each such piece of hotspot information; from that hotspot information, select the pieces whose score is greater than the preset score threshold and add them to the positive samples; select the pieces whose score is less than or equal to the preset score threshold and update them into the negative samples; and train the initial model on the updated positive and negative samples to obtain the first target model.
  • In this embodiment, a first version of the initial model is obtained by training the preset model on the positive and negative samples.
  • The initial model can then be used to predict relevance scores for the original negative samples. Those with higher scores are added to the original positive samples, yielding the updated positive samples, and the rest are kept as the updated negative samples. The initial model is then trained on the updated positive and negative samples to obtain the first target model.
  • Exemplarily, the samples whose score exceeds the preset score threshold are added to the original positive samples. The relevance score is the prediction result of the initial model and represents the degree of correlation between the hotspot information and the application scene information.
  • Exemplarily, taking "The strongest cold wave is coming" and the related discussions "#The strongest cold wave is coming# Friends, come share your gear for keeping warm" and "An overlord-level cold wave is here! What to do about severely dehydrated skin?" as hotspot information, all of them are related to the application scene information "keeping warm_hydration" and can be added to the positive samples.
  • In this embodiment, the initial model can be trained iteratively n times: after each round, the positive and negative samples from the previous round are updated, and after n rounds a batch of labeled positive and negative samples is obtained. The value of n can be set empirically. Once the final labeled positive and negative samples are available, the latest version of the model is retrained using the model structure of the initial model, yielding the final first target model.
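The iterative sample update described above resembles a self-training loop. The sketch below shows only the control flow; the "model" is a toy pair of class centroids, which is an assumption made so the example runs with the standard library and is not the classifier used in the publication. All names and numbers are hypothetical.

```python
import math

def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train(positives, negatives):
    """Toy stand-in for training: remember one centroid per class."""
    return centroid(positives), centroid(negatives)

def score(model, x):
    """Relevance score in [0, 1]; higher means closer to the positive centroid."""
    pos_c, neg_c = model
    d_pos, d_neg = math.dist(x, pos_c), math.dist(x, neg_c)
    total = d_pos + d_neg
    return 0.5 if total == 0 else d_neg / total

def self_train(positives, negatives, score_threshold=0.8, n_rounds=3):
    model = train(positives, negatives)  # the "initial model"
    for _ in range(n_rounds):
        promoted = [x for x in negatives if score(model, x) > score_threshold]
        remaining = [x for x in negatives if score(model, x) <= score_threshold]
        if not promoted or not remaining:
            break  # nothing to move, or the negative pool would become empty
        positives = positives + promoted
        negatives = remaining
        model = train(positives, negatives)  # retrain on the updated samples
    return model, positives, negatives

seed_pos = [[1.0, 1.0], [0.9, 1.1]]
seed_neg = [[0.95, 0.9], [-1.0, -1.0], [-0.9, -1.2]]
model, final_pos, final_neg = self_train(seed_pos, seed_neg)
print(len(final_pos), len(final_neg))
```

In this toy run the mislabeled point near the positive cluster is promoted in the first round and the loop then stops, because no remaining negative scores above the threshold.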
  • By updating the positive and negative samples after each training iteration, the embodiment of the present disclosure enhances the generalization ability of the first target model so that it can be applied to a wider range of data scenarios; it can also be used to mine the potential item demand behind hotspot information.
  • In some embodiments, the above step S203 can be implemented as follows: according to the application scene information, acquire on the second website a set of hotspot information associated with the application scene information; train a preset classification model on the hotspot information set to obtain a second target model; and acquire the potential item demand according to the target hotspot information, the application scene information and the second target model.
  • In this embodiment, based on the application scene information, a search can be performed on the second website using search terms to obtain the hotspot information related to each piece of application scene information.
  • Exemplarily, taking the application scene information "boys_autumn and winter_outdoor_travel" as an example, the search term is "boys autumn and winter outdoor travel"; searching the second website with it yields hotspot information related to the application scene information, such as #Mohe Shengge Outdoor Travel Network# and #My Travel Gear#, which is used as training data to train the preset classification model.
  • The training data, that is, the hotspot information, needs to be vectorized to obtain representation vectors, which are then used as the input of the preset classification model to train the second target model.
  • After the second target model is obtained, any hotspot information can be crawled from the second website to predict whether it contains a potential item demand.
  • Exemplarily, the preset classification model may be a machine learning multi-class model, such as a naive Bayes model or a decision tree, or may be based on deep learning, such as the common two-tower model.
  • FIG. 3 is a schematic flowchart of Embodiment 2 of the method for processing hotspot information provided by an embodiment of the present disclosure. As shown in FIG. 3, the method includes the following steps: S301, building a hot event library; S302, identifying related hotspots; S303, inferring the potential item demand based on the hotspot information.
  • In this embodiment, the hot event library can be built from the application scene information contained in the item library of the shopping website and the users' recent behavior logs.
  • Related hotspots refer to hotspot information related to the application scene information.
  • Building the hot event library eliminates the workload of manually mining hotspots and also makes the hotspots interpretable.
  • Related hotspot identification picks out the hotspot information relevant to the shopping website from hotspot information of any data source, effectively reducing manual screening costs.
  • Potential item demand inference automatically associates hotspot information with items, accurately finds the item needs behind users, and improves the item click-through rate and conversion rate of the shopping website.
  • FIG. 4 is a schematic structural diagram of a device for processing hotspot information provided by an embodiment of the present disclosure.
  • the device for processing hotspot information may be integrated on a server, or may be independent from the server and cooperate with the server to implement this solution.
  • the hotspot information processing device 40 includes an item acquisition module 41 , an information association module 42 and a demand acquisition module 43 .
  • the item acquisition module 41 is configured to acquire hot items on the first website, and determine application scenario information corresponding to the hot items.
  • the information association module 42 is configured to obtain hotspot information from the second website, and determine target hotspot information associated with the application scene information in the hotspot information.
  • the demand acquisition module 43 is used to acquire potential item demand according to target hotspot information and application scene information.
  • the hot item is an item whose attention degree of the user reaches a preset threshold in the first website, and the potential item demand is used to indicate the item that the user is interested in in the first website.
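The division into three modules can be pictured as a small pipeline. The sketch below is only an illustration of the data flow between modules 41, 42 and 43; the class name, the callables and the keyword-overlap logic are hypothetical stand-ins for the trained models described in the text.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class HotspotProcessingDevice:
    """Wires three callables in the order modules 41, 42 and 43 are described."""
    acquire_scenes: Callable[[], List[str]]                    # item acquisition module 41
    associate: Callable[[List[str], List[str]], List[str]]     # information association module 42
    infer_demand: Callable[[List[str], List[str]], List[str]]  # demand acquisition module 43

    def run(self, hotspots: List[str]) -> List[str]:
        scenes = self.acquire_scenes()
        targets = self.associate(hotspots, scenes)
        return self.infer_demand(targets, scenes)

# Toy stand-ins: keyword overlap instead of trained models.
device = HotspotProcessingDevice(
    acquire_scenes=lambda: ["outdoor_travel"],
    associate=lambda hotspots, scenes: [
        h for h in hotspots if any(word in h for s in scenes for word in s.split("_"))
    ],
    infer_demand=lambda targets, scenes: ["outdoor backpack"] if targets else [],
)
demand = device.run(["new outdoor travel", "celebrity news"])
print(demand)
```

Passing the modules in as callables mirrors the statement that the division is logical only: the same wiring works whether the three functions live in one process or are deployed separately.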
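  • In some embodiments, the above item acquisition module 41 can be specifically used to:
  • acquire the historical behaviors of users toward items on the first website;
  • select the hot items from the items of the first website according to the number of historical behaviors and the preset threshold.
  • The historical behaviors include at least one of browsing, searching, ordering and favoriting behaviors.
  • In some embodiments, the above item acquisition module 41 can be specifically used to:
  • acquire the item information of the hot item;
  • acquire the application scene information of the hot item according to the title information and the attribute information.
  • The item information includes title information and attribute information.
  • Optionally, in some embodiments, the above item acquisition module 41 can be specifically used to:
  • extract scene words from the title information;
  • acquire the preset attribute in the attribute information, and determine the attribute value corresponding to the preset attribute;
  • combine the scene words with the attribute values to obtain the application scene information.
  • A scene word is a word describing an application scene of the hot item.
  • In some embodiments, the information association module 42 can be specifically used to:
  • train the preset model according to the hotspot information and the application scene information to obtain the first target model;
  • determine, according to the first target model, the target hotspot information associated with the application scene information from the hotspot information.
  • Optionally, in some embodiments, the information association module 42 can be specifically used to:
  • vectorize the hotspot information to obtain the representation vector of the hotspot information;
  • vectorize the application scene information to obtain the representation vector of the application scene information;
  • acquire the distance between the representation vector of the hotspot information and the representation vector of the application scene information;
  • determine the positive samples and negative samples of the preset model according to the distance;
  • train the preset model according to its training data to obtain the first target model.
  • The positive samples and negative samples serve as the training data of the preset model.
  • Optionally, in some embodiments, the information association module 42 can be specifically used to:
  • segment the hotspot information to obtain at least one hotspot word;
  • vectorize each hotspot word to obtain the word vector corresponding to each hotspot word;
  • obtain the representation vector of the hotspot information according to the word vectors of the hotspot words.
  • Optionally, in some embodiments, the information association module 42 can be specifically used to:
  • acquire the scene name of the application scene information and the item information contained in the application scene information;
  • vectorize the scene name to obtain the representation vector of the scene name;
  • vectorize the item information to obtain the representation vector of the item information;
  • obtain the representation vector of the application scene information according to the representation vector of the scene name and the representation vector of the item information.
  • The application scene information includes at least one piece of item information.
  • Optionally, in some embodiments, the information association module 42 can be specifically used to:
  • take the hotspot information whose distance is greater than the preset distance as the positive samples of the preset model;
  • take the hotspot information whose distance is less than or equal to the preset distance as the negative samples of the preset model.
  • Optionally, in some embodiments, the information association module 42 can be specifically used to:
  • train the preset model on its positive and negative samples to obtain the initial model;
  • score, with the initial model, the hotspot information whose distance is less than or equal to the preset distance, obtaining a score for each such piece of hotspot information;
  • select, from that hotspot information, the pieces whose score is greater than the preset score threshold and add them to the positive samples;
  • select, from that hotspot information, the pieces whose score is less than or equal to the preset score threshold and update them into the negative samples;
  • train the initial model on the updated positive and negative samples to obtain the first target model.
  • In some embodiments, the demand acquisition module 43 can be specifically used to:
  • acquire, on the second website and according to the application scene information, the set of hotspot information associated with the application scene information;
  • train the preset classification model on the hotspot information set to obtain the second target model;
  • acquire the potential item demand according to the target hotspot information, the application scene information and the second target model.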
  • It should be noted that the division of the above device into modules is only a division of logical functions; in an actual implementation the modules may be fully or partially integrated into one physical entity or may be physically separate.
  • These modules can all be implemented in the form of software called by a processing element.
  • For example, the item acquisition module can be stored in the memory of the above device in the form of program code, and a processing element of the device calls it and executes the function of the item acquisition module.
  • The implementation of the other modules is similar.
  • A computer program product includes one or more computer instructions.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
  • The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media.
  • An available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)).
  • FIG. 5 is a schematic structural diagram of a server provided by an embodiment of the present disclosure.
  • the server 50 includes: at least one processor 51 , a memory 52 , a bus 53 and a communication interface 54 .
  • the processor 51 , the communication interface 54 and the memory 52 communicate with each other through the bus 53 .
  • the communication interface 54 is used for communicating with other devices. Exemplarily, the communication interface 54 may communicate with the server of the second website, so as to obtain hotspot information from the second website.
  • the processor 51 is configured to execute the computer-executed instructions stored in the memory 52, and may specifically execute relevant steps in the methods described in the above-mentioned embodiments.
  • the processor may be a central processing unit.
  • the one or more processors included in the server may be of the same type, such as one or more CPUs, or may be of different types, such as one or more CPUs and one or more ASICs.
  • The memory 52 is used to store computer-executable instructions.
  • the memory may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.
  • This embodiment also provides a readable storage medium in which computer instructions are stored; when at least one processor of the server executes the computer instructions, the server executes the hotspot information processing method provided by the various implementations described above.
  • This embodiment also provides a program product, the program product includes computer instructions, and the computer instructions are stored in a readable storage medium. At least one processor of the server may read the computer instructions from the readable storage medium, and the at least one processor executes the computer instructions so that the server implements the hotspot information processing method provided in the above-mentioned various implementations.
  • “at least one” means one or more, and “plurality” means two or more.
  • “And/or” describes the association relationship of associated objects, indicating that there may be three types of relationships, for example, A and/or B, which can mean: A exists alone, A and B exist at the same time, and B exists alone, where A, B can be singular or plural.
  • the character “/” generally indicates that the contextual objects are an “or” relationship; in the formula, the character “/” indicates that the contextual objects are a “division” relationship.
  • “At least one of the following” or similar expressions refer to any combination of these items, including any combination of single or plural items.
  • For example, at least one of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b and c can each be single or multiple.
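The enumeration above is simply every non-empty combination of the three items. A minimal sketch that generates the same seven cases (the variable names are illustrative only):

```python
from itertools import combinations

items = ["a", "b", "c"]

# Every non-empty subset, smallest first, joined with "-" as in the text.
at_least_one = [
    "-".join(combo)
    for size in range(1, len(items) + 1)
    for combo in combinations(items, size)
]
print(at_least_one)
```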


Abstract

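The present disclosure provides a method, device, server and readable storage medium for processing hotspot information, and relates to the technical field of digital marketing. The method includes: acquiring hot items of a first website and determining application scene information corresponding to the hot items; acquiring hotspot information from a second website and determining, in the hotspot information, target hotspot information associated with the application scene information; and acquiring a potential item demand according to the target hotspot information and the application scene information. In this technical solution, hot items that receive high attention on the first website are acquired, hotspot information is acquired from other websites, target hotspots highly associated with the shopping scenes of the hot items are found in that hotspot information, and the potential item demand is derived. There is no need to manually monitor hotspot information on other websites in real time or to select a set of hot items on the shopping website by subjective guesswork, which reduces labor costs.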

Description

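Method, device, server and readable storage medium for processing hotspot information
The present disclosure claims priority to Chinese patent application No. 202210092682.8, filed with the China Patent Office on January 26, 2022 and entitled "Method, device, server and readable storage medium for processing hotspot information", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of big data analysis, and in particular to a method, device, server and readable storage medium for processing hotspot information.
Background
With the development of Internet technology, more and more users shop in online malls. While a user browses an online mall, the mall can push all kinds of items to the user. Because an online mall carries many kinds of items, it usually selects some items that users are interested in and pushes those, for example by recommending items related to topics or events that are currently hot on the Internet.
In the prior art, when items are pushed based on hot topics or hot events, hotspot information on the Internet is mainly collected manually in advance, a related set of items is screened out based on the hotspot information, and items are then selected from that set and pushed to users.
However, this approach requires people to monitor hotspot information in real time and to screen items manually. The whole process is time-consuming and laborious and incurs high labor costs.
Summary
The present disclosure provides a method, device, server and readable storage medium for processing hotspot information, to solve the problem of high labor costs in the existing item pushing process.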
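In a first aspect, an embodiment of the present disclosure provides a method for processing hotspot information, including:
acquiring hot items of a first website, and determining application scene information corresponding to the hot items, where a hot item is an item on the first website whose user attention reaches a preset threshold;
acquiring hotspot information from a second website, and determining, in the hotspot information, target hotspot information associated with the application scene information;
acquiring a potential item demand according to the target hotspot information and the application scene information, where the potential item demand indicates the items on the first website that users are interested in.
In a possible design of the first aspect, acquiring the hot items of the first website includes:
acquiring historical behaviors of the users toward items on the first website, where the historical behaviors include at least one of browsing, searching, ordering and favoriting behaviors;
selecting the hot items from the items of the first website according to the number of historical behaviors and the preset threshold.
In another possible design of the first aspect, determining the application scene information corresponding to the hot items includes:
acquiring item information of the hot item, where the item information includes title information and attribute information;
acquiring the application scene information of the hot item according to the title information and the attribute information.
In yet another possible design of the first aspect, acquiring the application scene information of the hot item according to the title information and the attribute information includes:
extracting scene words from the title information, where a scene word is a word describing an application scene of the hot item;
acquiring a preset attribute in the attribute information, and determining the attribute value corresponding to the preset attribute;
combining the scene words with the attribute values to obtain the application scene information.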
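In yet another possible design of the first aspect, determining, in the hotspot information, the target hotspot information associated with the application scene information includes:
training a preset model according to the hotspot information and the application scene information to obtain a first target model;
determining, according to the first target model, the target hotspot information associated with the application scene information from the hotspot information.
In yet another possible design of the first aspect, training the preset model according to the hotspot information and the application scene information to obtain the first target model includes:
vectorizing the hotspot information to obtain a representation vector of the hotspot information;
vectorizing the application scene information to obtain a representation vector of the application scene information;
acquiring the distance between the representation vector of the hotspot information and the representation vector of the application scene information;
determining positive samples and negative samples of the preset model according to the distance, where the positive samples and negative samples serve as training data of the preset model;
training the preset model according to its training data to obtain the first target model.
In yet another possible design of the first aspect, vectorizing the hotspot information to obtain the representation vector of the hotspot information includes:
segmenting the hotspot information to obtain at least one hotspot word;
vectorizing each hotspot word to obtain the word vector corresponding to each hotspot word;
obtaining the representation vector of the hotspot information according to the word vectors of the hotspot words.
In yet another possible design of the first aspect, vectorizing the application scene information to obtain the representation vector of the application scene information includes:
acquiring the scene name of the application scene information and the item information contained in the application scene information, where the application scene information contains at least one piece of item information;
vectorizing the scene name to obtain a representation vector of the scene name;
vectorizing the item information to obtain a representation vector of the item information;
obtaining the representation vector of the application scene information according to the representation vector of the scene name and the representation vector of the item information.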
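In yet another possible design of the first aspect, determining the positive samples and negative samples of the preset model according to the distance includes:
taking the hotspot information whose distance is greater than a preset distance as the positive samples of the preset model;
taking the hotspot information whose distance is less than or equal to the preset distance as the negative samples of the preset model.
In yet another possible design of the first aspect, training the preset model according to its training data to obtain the first target model includes:
training the preset model on its positive and negative samples to obtain an initial model;
scoring, with the initial model, the hotspot information whose distance is less than or equal to the preset distance, to obtain a score for each such piece of hotspot information;
selecting, from the hotspot information whose distance is less than or equal to the preset distance, the hotspot information whose score is greater than a preset score threshold, and adding it to the positive samples;
selecting, from the hotspot information whose distance is less than or equal to the preset distance, the hotspot information whose score is less than or equal to the preset score threshold, and updating it into the negative samples;
training the initial model on the updated positive and negative samples to obtain the first target model.
In yet another possible design of the first aspect, acquiring the potential item demand according to the target hotspot information and the application scene information includes:
acquiring, on the second website and according to the application scene information, a set of hotspot information associated with the application scene information;
training a preset classification model on the hotspot information set to obtain a second target model;
acquiring the potential item demand according to the target hotspot information, the application scene information and the second target model.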
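In a second aspect, an embodiment of the present disclosure provides a device for processing hotspot information, including:
an item acquisition module, configured to acquire hot items of a first website and determine application scene information corresponding to the hot items, where a hot item is an item on the first website whose user attention reaches a preset threshold;
an information association module, configured to acquire hotspot information from a second website and determine, in the hotspot information, target hotspot information associated with the application scene information;
a demand acquisition module, configured to acquire a potential item demand according to the target hotspot information and the application scene information, where the potential item demand indicates the items on the first website that users are interested in.
In a possible design of the second aspect, when acquiring the hot items of the first website, the item acquisition module is specifically configured to:
acquire historical behaviors of the users toward items on the first website, where the historical behaviors include at least one of browsing, searching, ordering and favoriting behaviors;
select the hot items from the items of the first website according to the number of historical behaviors and the preset threshold.
In another possible design of the second aspect, when determining the application scene information corresponding to the hot items, the item acquisition module is specifically configured to:
acquire item information of the hot item, where the item information includes title information and attribute information;
acquire the application scene information of the hot item according to the title information and the attribute information.
In yet another possible design of the second aspect, when acquiring the application scene information of the hot item according to the title information and the attribute information, the item acquisition module is specifically configured to:
extract scene words from the title information, where a scene word is a word describing an application scene of the hot item;
acquire a preset attribute in the attribute information, and determine the attribute value corresponding to the preset attribute;
combine the scene words with the attribute values to obtain the application scene information.
In yet another possible design of the second aspect, when determining, in the hotspot information, the target hotspot information associated with the application scene information, the information association module is specifically configured to:
train a preset model according to the hotspot information and the application scene information to obtain a first target model;
determine, according to the first target model, the target hotspot information associated with the application scene information from the hotspot information.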
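In a third aspect, an embodiment of the present disclosure provides a server, including a processor and a memory communicatively connected to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the method described above.
In a fourth aspect, an embodiment of the present disclosure provides a readable storage medium storing computer instructions which, when executed by a processor, implement the method described above.
In a fifth aspect, an embodiment of the present disclosure provides a program product including computer instructions which, when executed by a processor, implement the method described above.
With the method, device, server and readable storage medium for processing hotspot information provided by the embodiments of the present disclosure, hot items that receive high attention on the first website are acquired, hotspot information is acquired from other websites, target hotspots highly associated with the shopping scenes of the hot items are found in that hotspot information, and the potential item demand is derived. There is no need to manually monitor hotspot information on other websites in real time or to select a set of hot items on the shopping website by subjective guesswork, which reduces labor costs.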
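Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure;
FIG. 1 is a schematic diagram of a scenario of the method for processing hotspot information provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of Embodiment 1 of the method for processing hotspot information provided by an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of Embodiment 2 of the method for processing hotspot information provided by an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of the device for processing hotspot information provided by an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of the server provided by an embodiment of the present disclosure.
The above drawings show specific embodiments of the present disclosure, which are described in more detail below. The drawings and the text are not intended to limit the scope of the disclosed concept in any way, but to explain the concept to those skilled in the art with reference to specific embodiments.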
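Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments herein without creative effort fall within the protection scope of the present disclosure.
First, the terms used in the present disclosure are explained:
Hotspot: news or information that receives wide public attention or popularity, or a place or issue that attracts attention during a certain period, as in "social hotspot" or "a certain place becomes a tourism hotspot".
FIG. 1 is a schematic diagram of a scenario of the method for processing hotspot information provided by an embodiment of the present disclosure. The application scenario of the disclosed solution may be a user shopping on a website. As shown in FIG. 1, a user can log in to the website through a mobile terminal 10 (for example, a personal computer) and browse, click, favorite and order the items displayed on the website. Users pay different degrees of attention to different items according to their personal interests. For example, users follow current hotspot information in daily life; if a user finds that a piece of hotspot information implies a need for certain items, the user may go to the website to order them. To improve the user experience, the website can also proactively search for current hotspot information, find items that users may need, and display them on its pages.
In practice, however, website maintainers have to collect hotspot information manually and then, by subjective guesswork, use that hotspot information to pick items that users may need from the website. The resulting item set is uploaded to the website's server 11, which distributes and pushes the items to users. This approach incurs high labor costs and responds slowly to hotspot information. To reduce labor costs, another approach uses an algorithm to crawl public opinion information from other website servers 12 and feeds it into a prediction model that predicts which public opinion is hot and which is not. Mining hot public opinion with such an algorithm lacks interpretability, so historical experience is hard to reuse; moreover, it only replaces the manual acquisition of hot public opinion and does not solve the subsequent problem of finding the associated items from the hotspot information.
To address the above problems, the embodiments of the present disclosure provide a method, device, server and readable storage medium for processing hotspot information. Using the attention that users pay to different items on a website, the hot items with high attention and the application scene information corresponding to those hot items are found; hotspot information is then acquired from other websites, and the target hotspot information highly associated with the application scene information is determined; finally, the users' potential item demand is derived from the target hotspot information and the application scene information. The whole process eliminates the workload of manually mining hotspot information, identifies target hotspot information automatically, reduces the cost of manually screening associated items, makes the hotspot information more interpretable, and ultimately infers the users' potential item demand, improving the user experience of the website.
The technical solutions of the present disclosure are described in detail below through specific embodiments. It should be noted that the following specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments.
FIG. 2 is a schematic flowchart of Embodiment 1 of the method for processing hotspot information provided by an embodiment of the present disclosure. The method can be applied to a local computer device, and in practice can also be applied to a cloud server. As shown in FIG. 2, the method may specifically include the following steps:
S201. Acquire hot items of the first website, and determine the application scene information corresponding to the hot items.
A hot item is an item on the first website whose user attention reaches a preset threshold.
In this embodiment, the first website may be a website on which users browse, search for and order items, such as existing shopping websites. Such websites usually contain a massive amount of item information. Limited by the display interface, a website generally selects only some of these items to display. If none of the displayed items interest the user, the user has to browse or search further to find items of interest.
When the first website has a large user base and many different users pay attention to a certain item, for example by clicking on, searching for or ordering it, the attention of that item exceeds the preset threshold and the item can be called a hot item. Exemplarily, the server may take a period of time (for example, one month) as a time cycle and count the attention of each item once per cycle to determine which items are hot items in that cycle.
In this embodiment, a hot item usually carries item information so that users can consult and understand the item, specifically title information, attribute information and so on. The title information usually describes the important features of the item in text, while the attribute information describes the detailed attributes of the item in text.
Exemplarily, the item information of a hot item may be as shown in Table 1 below: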
Figure PCTCN2022113119-appb-000001
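Table 1
The attribute information includes the gross weight, place of origin, item identifier, category and applicable events of the item.
In this embodiment, the application scene information can be extracted from the title information and the attribute information. Taking Table 1 as an example, the application scene information can be "student_outdoor_return to work, student_outdoor_back to school, student_outdoor_epidemic prevention, student_outdoor_first aid".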
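S202. Acquire hotspot information from the second website, and determine, in the hotspot information, the target hotspot information associated with the application scene information.
Exemplarily, the second website may be a website other than the first website, such as a social networking site, a news site or a game forum.
In this embodiment, hotspot information is usually a topic that has recently attracted wide public attention and discussion, and usually consists of one short sentence. Exemplarily, the hotspot information may be #The strongest cold wave is coming#, #New outdoor travel#, or #Hawking passes away#.
Some hotspot information can be associated with the application scene information, and such hotspot information is taken as target hotspot information. Specifically, the application scene information and the hotspot information can each be vectorized, the vector distance between them computed, and the vector distance used to determine which hotspot information is associated with the application scene information.
Exemplarily, the hotspot information #The strongest cold wave is coming# is associated with the application scene information "hydration_outdoor_keeping warm", whereas #Hawking passes away# has no related application scene information.
S203. Acquire the potential item demand according to the target hotspot information and the application scene information.
The potential item demand indicates the items on the first website that users are interested in; exemplarily, such items may be outdoor backpacks, masks and so on.
In this embodiment, the target hotspot information can be associated with the application scene information to find the application scene information with strong relevance, from which the potential item demand is inferred. Exemplarily, taking the associated application scene information "boys_autumn and winter_outdoor_travel" as an example, items such as outdoor backpacks, autumn and winter coats and travel tents can be derived from it.
Further, after the potential item demand indicating the items of interest is obtained, those items can be displayed on the pages of the first website for users to view. In this way, items related to current hotspot information can be pushed to users, improving the order success rate.
By acquiring the hot items and their application scene information from the first website and then screening the second website for the target hotspot information associated with the application scene information, the embodiment of the present disclosure removes the need to mine and screen hotspots manually and reduces labor costs. The potential item demand can also be inferred from the target hotspot information and the application scene information, so that items of interest to users are found accurately, items are pushed accurately, and the purchase conversion rate improves.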
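In some embodiments, "acquiring the hot items of the first website" in step S201 can be implemented as follows: acquire the historical behaviors of users toward items on the first website; select the hot items from the items of the first website according to the number of historical behaviors and the preset threshold. The historical behaviors include at least one of browsing, searching, ordering and favoriting behaviors.
In this embodiment, the historical behaviors may be the users' recent (for example, within one month) behaviors on the first website, and a hot item is an item for which the number of historical behaviors reaches a certain frequency. The preset threshold may be an empirical value. Exemplarily, items that all users together have browsed more than one million times in the past month can be taken as hot items.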
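The selection rule can be sketched as a simple count over a behavior log. This is a minimal illustration, assuming a log of (user, item, behavior) tuples and English behavior labels; the log format, the function name and the tiny threshold are hypothetical, chosen only so the example is easy to follow.

```python
from collections import Counter

# The four behavior kinds named in the text; the string labels are assumptions.
TRACKED_BEHAVIORS = {"browse", "search", "order", "favorite"}

def select_hot_items(behavior_log, preset_threshold):
    """Return the item ids whose tracked-behavior count reaches the threshold."""
    counts = Counter(
        item_id
        for _user_id, item_id, behavior in behavior_log
        if behavior in TRACKED_BEHAVIORS
    )
    return sorted(item for item, n in counts.items() if n >= preset_threshold)

log = [
    ("u1", "mask_kit", "browse"),
    ("u2", "mask_kit", "order"),
    ("u3", "mask_kit", "search"),
    ("u1", "tent", "browse"),
    ("u2", "tent", "share"),  # not one of the tracked behaviors, so it is ignored
]
hot_items = select_hot_items(log, preset_threshold=3)
print(hot_items)
```

In a real deployment the log would be restricted to the chosen time cycle (for example, the past month) before counting.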
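By using the users' historical behaviors on the website to mine which items are hot items, the embodiment of the present disclosure avoids manual mining and subjective guessing of hot items, makes the hot items interpretable, and improves mining efficiency while reducing labor costs.
In some embodiments, "determining the application scene information corresponding to the hot items" in step S201 can be implemented as follows: acquire the item information of the hot item; acquire the application scene information of the hot item according to the title information and the attribute information. The item information includes title information and attribute information.
In this embodiment, the attribute information may also include attributes such as the applicable season, applicable audience and applicable events of the item, and different attributes have different attribute values. Exemplarily, referring to Table 1 above, the attribute information includes applicable events, whose attribute values are epidemic prevention and first aid.
By splitting the item information into title information and attribute information and deriving the application scene information of the hot item from both, the embodiment of the present disclosure makes the resulting application scene information more accurate.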
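Further, on the basis of the above embodiment, in some embodiments, the step "acquiring the application scene information of the hot item according to the title information and the attribute information" can be implemented as follows: extract scene words from the title information; acquire the preset attribute in the attribute information and determine the attribute value corresponding to the preset attribute; combine the scene words with the attribute values to obtain the application scene information. A scene word is a word describing an application scene of the hot item.
In this embodiment, for the title information, the scene words can be identified with a sequence labeling model. Exemplarily, sequence labeling models include but are not limited to probabilistic graphical models and deep learning models. The probabilistic graphical model may be a conditional random field (CRF) or a hidden Markov model (HMM), and the deep learning model may be a bidirectional long short-term memory network with a conditional random field layer (BiLSTM-CRF).
The part-of-speech categories of the sequence labeling model can be defined as four types: applicable event, applicable place, applicable audience and applicable time. Scene words are extracted from the title information using these four categories.
Exemplarily, taking the title in Table 1, "AL-NASR/阿尔纳斯 epidemic prevention kit, return-to-work epidemic prevention set, student back-to-school epidemic prevention kit, outdoor sterilization set, portable epidemic prevention supplies, student disinfection package, portable epidemic prevention kit (family pack)", as an example, the extracted scene words are shown in Table 2 below: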
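Scene word             Part of speech
return to work         applicable event
student                applicable audience
back to school         applicable event
outdoor                applicable place
epidemic prevention    applicable event
Table 2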
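In this embodiment, after the scene words and attribute values are extracted, each usually contains at least one word; exemplarily, the scene words in Table 2 comprise five words. Exemplarily, the format of the application scene information can be defined as applicable audience_applicable place_applicable event. The corresponding scene words and attribute values are filled into this format to obtain the application scene information.
Exemplarily, if the scene word for the applicable audience is student, the scene word for the applicable place is outdoor, the scene words for the applicable event are return to work, back to school and epidemic prevention, and the attribute values of the applicable event in Table 1 are epidemic prevention and first aid, then the application scene information obtained by combination includes: student_outdoor_return to work, student_outdoor_back to school, student_outdoor_epidemic prevention, and student_outdoor_first aid.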
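The combination step in this example amounts to a Cartesian product over the three slots, with duplicate events merged. The sketch below reproduces the four scene names from the example; the function name and the de-duplication strategy are assumptions made for illustration.

```python
from itertools import product

def build_scene_info(audiences, places, events, attribute_events):
    """Fill the audience_place_event format with every combination of values."""
    # Merge title-derived events with attribute values, keeping first-seen order
    # and dropping duplicates ("epidemic prevention" appears in both sources).
    merged_events = list(dict.fromkeys([*events, *attribute_events]))
    return ["_".join(combo) for combo in product(audiences, places, merged_events)]

scenes = build_scene_info(
    audiences=["student"],
    places=["outdoor"],
    events=["return to work", "back to school", "epidemic prevention"],
    attribute_events=["epidemic prevention", "first aid"],
)
for scene in scenes:
    print(scene)
```

If a title yielded several audiences or places, the same product would expand to every audience, place and event triple.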
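By extracting the scene words from the title information and the attribute values of the preset attributes, the embodiment of the present disclosure can combine them into more accurate application scene information, which helps find the potential item demand accurately later.
In some embodiments, "determining, in the hotspot information, the target hotspot information associated with the application scene information" in step S202 can be implemented as follows: train a preset model according to the hotspot information and the application scene information to obtain a first target model; determine, according to the first target model, the target hotspot information associated with the application scene information from the hotspot information.
In this embodiment, the preset model may be a scene-based machine learning binary classification model, including but not limited to logistic regression and naive Bayes. The training data of the preset model may be the hotspot information acquired from the second website.
Exemplarily, while acquiring hotspot information, data augmentation can be performed on the basis of the hotspot information already obtained. Specifically, related information is further looked up on the second website for that hotspot information, where the related information may be topics related to the hotspot information and the discussion under those topics. For example, taking #The strongest cold wave is coming# as the hotspot information, the related topics may be discussions related to it, such as "#The strongest cold wave is coming# Friends, come share your gear for keeping warm" and "An overlord-level cold wave is here! What to do about severely dehydrated skin?".
In this embodiment, after the hotspot information is obtained, it can be labeled automatically: the hotspot information related to the application scene information is taken as positive samples and the remaining hotspot information as negative samples, for the subsequent training of the first target model.
By performing data augmentation on the hotspot information, the embodiment of the present disclosure can find the potential item demand corresponding to the hotspot information and accumulate more training data for the subsequent training of the first target model. This improves the prediction performance of the first target model, which can automatically pick out item-related hotspot information from arbitrary information. There is no need to crawl hotspot information from websites manually and decide which of it is item-related, which reduces labor costs, speeds up the response to hotspot information, and keeps hotspot information from going stale.
Further, on the basis of the above embodiment, in some embodiments, the step "training the preset model according to the hotspot information and the application scene information to obtain the first target model" can be implemented as follows: vectorize the hotspot information to obtain the representation vector of the hotspot information; vectorize the application scene information to obtain the representation vector of the application scene information; acquire the distance between the two representation vectors; determine the positive samples and negative samples of the preset model according to the distance; train the preset model according to its training data to obtain the first target model. The positive and negative samples serve as the training data of the preset model.
In this embodiment, the hotspot information can be represented by a semantic vector; exemplarily, a vector tool such as fastText can generate the corresponding semantic vector for the hotspot information. Likewise, the application scene information can also be represented by a semantic vector.
In this embodiment, the vector distance between the representation vector of the hotspot information and the representation vector of the application scene information can be calculated; exemplarily, the vector distance includes but is not limited to the cosine distance and the Euclidean distance. Specifically, taking the cosine distance as an example, a distance threshold (for example, 0.95) can be set; hotspot information whose vector distance is greater than the threshold is regarded as strongly related to the application scene information, and hotspot information whose vector distance is less than or equal to the threshold is regarded as weakly related.
Hotspot information strongly related to the application scene information forms the positive samples, and hotspot information weakly related to it forms the negative samples.
By using the vector distance to select positive and negative samples as the training data of the preset model and training the first target model, the embodiment of the present disclosure can automatically identify sudden public opinion events, greatly improving the shopping website's response speed to time-sensitive hotspot information and pushing items related to hotspot information to users, thereby improving the purchase conversion rate.
Further, on the basis of the above embodiment, in some embodiments, the step "vectorizing the hotspot information to obtain the representation vector of the hotspot information" can be implemented as follows: segment the hotspot information to obtain at least one hotspot word; vectorize each hotspot word to obtain the word vector corresponding to each hotspot word; obtain the representation vector of the hotspot information according to the word vectors of the hotspot words.
In this embodiment, hotspot information is a short sentence. It needs to be segmented into several hotspot words, the word vector of each hotspot word is computed, and the word vectors of all hotspot words are finally integrated to obtain the representation vector of the hotspot information.
Exemplarily, taking the hotspot information #The strongest cold wave is coming# as an example, data augmentation yields the related information "#The strongest cold wave is coming# Friends, come share your gear for keeping warm" and "An overlord-level cold wave is here! What to do about severely dehydrated skin?". The segmentation process is shown in Table 3: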
Figure PCTCN2022113119-appb-000002
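Table 3
During segmentation of the hotspot information, a word segmentation tool such as Jieba can be used; a corresponding semantic vector is then generated for each segmented word, and the representation vector of the hotspot information is finally obtained.
Exemplarily, if a piece of hotspot information T contains several hotspot words, that is, T = {w_1, w_2, ..., w_n}, where w_n is the nth hotspot word (n is a positive integer not less than 1), the following word vectors can be obtained:
Figure PCTCN2022113119-appb-000003
Figure PCTCN2022113119-appb-000004
where
Figure PCTCN2022113119-appb-000005
is the word vector corresponding to the nth hotspot word. The representation vector of the hotspot information is:
Figure PCTCN2022113119-appb-000006
In the above formula, Vec_T is the representation vector of the hotspot information.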
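The formula for Vec_T is available only as an image in this text, so the exact way the word vectors are integrated is not recoverable here. The sketch below assumes element-wise averaging (mean pooling), a common choice for sentence vectors built from word vectors; that choice, the function name and the toy two-dimensional vectors are all assumptions.

```python
def representation_vector(words, word_vectors):
    """Average the vectors of the words that have an entry in word_vectors."""
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        raise ValueError("none of the words has a vector")
    return [sum(col) / len(known) for col in zip(*known)]

# Toy word vectors, invented for illustration; a real system would take them
# from a trained embedding tool such as the one named in the text.
toy_vectors = {
    "strongest": [1.0, 0.0],
    "cold wave": [0.0, 1.0],
    "arrives": [0.5, 0.5],
}
vec_t = representation_vector(["strongest", "cold wave", "arrives"], toy_vectors)
print(vec_t)
```

Words without a vector are skipped rather than treated as zeros, so out-of-vocabulary tokens do not pull the average toward the origin.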
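On the basis of the above embodiment, in some embodiments, the step "vectorizing the application scene information to obtain the representation vector of the application scene information" can be implemented as follows: acquire the scene name of the application scene information and the item information contained in the application scene information; vectorize the scene name to obtain the representation vector of the scene name; vectorize the item information to obtain the representation vector of the item information; obtain the representation vector of the application scene information according to the representation vector of the scene name and the representation vector of the item information. The application scene information contains at least one piece of item information. Exemplarily, taking Table 4 below as an example, the scene name of the application scene information in Table 4 is student_outdoor_epidemic prevention, and the three corresponding items can be found by querying with that scene name.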
Figure PCTCN2022113119-appb-000007
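Table 4
In this embodiment, the word segmentation set of the scene name and the word segmentation set of the item name can be obtained first; both sets are then vectorized to obtain the representation vector of the scene name and the representation vector of the item information, which are then combined to obtain the representation vector of the application scene information.
Exemplarily, if the scene name of the application scene information is student_outdoor_epidemic prevention, the word segmentation set of the scene name is {student, outdoor, epidemic prevention}.
Exemplarily, let S_name denote the word segmentation set of the scene name, S_name = {C_1, C_2, ..., C_n}, where C_n is the nth segmented word of the scene name. The set of word vectors corresponding to the scene name can then be obtained:
Figure PCTCN2022113119-appb-000008
Figure PCTCN2022113119-appb-000009
denotes the word vector of the nth segmented word of the scene name; the representation vector of the scene name is then:
Figure PCTCN2022113119-appb-000010
In the above formula,
Figure PCTCN2022113119-appb-000011
denotes the representation vector of the scene name.
When obtaining the representation vector of the item information, let the word segmentation set of any item information be S_sku:
Figure PCTCN2022113119-appb-000012
where S_n is the nth segmented word of that item information. The set of word vectors corresponding to any item information is
Figure PCTCN2022113119-appb-000013
Exemplarily, the word segmentation set of any item information is derived from the title of the item and the attribute information of the item. From this, the vector representation of all item information can be obtained as:
Figure PCTCN2022113119-appb-000014
In the above formula, k indicates that the application scene information contains k pieces of item information,
Figure PCTCN2022113119-appb-000015
denotes the set of word vectors corresponding to the jth item information, and
Figure PCTCN2022113119-appb-000016
denotes the vector of the ith segmented word of the jth item in the shopping scene.
In this embodiment, combining the representation vector of the scene name with the representation vector of the item information gives the representation vector of the application scene information as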
Figure PCTCN2022113119-appb-000017
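On the basis of the above embodiment, in some embodiments, the step "determining the positive samples and negative samples of the preset model according to the distance" can be implemented as follows: take the hotspot information whose distance is greater than the preset distance as the positive samples of the preset model; take the hotspot information whose distance is less than or equal to the preset distance as the negative samples of the preset model.
In this embodiment, taking the cosine distance as an example, the vector distance is calculated as follows: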
Figure PCTCN2022113119-appb-000018
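In the above formula, dis denotes the distance between the representation vector of the hotspot information and the representation vector of the application scene information, Vec_scene denotes the representation vector of the application scene information, and Vec_T denotes the representation vector of the hotspot information.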
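The formula itself is present only as an image placeholder in this text. A hedged reconstruction, assuming the standard cosine measure between the two vectors named above (the exact form in the original figure is an assumption), would be:

```latex
\[
\mathrm{dis}
  = \cos\!\left(\mathrm{Vec}_{scene},\, \mathrm{Vec}_{T}\right)
  = \frac{\mathrm{Vec}_{scene} \cdot \mathrm{Vec}_{T}}
         {\lVert \mathrm{Vec}_{scene} \rVert \,\lVert \mathrm{Vec}_{T} \rVert}
\]
```

Under this form, larger values of dis mean the two vectors point in more similar directions, which is consistent with treating hotspot information whose dis exceeds the preset value as positive samples.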
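Exemplarily, the preset distance can be set to 0.95. When the distance between the representation vector of the hotspot information and the representation vector of the application scene information is greater than the preset distance, the hotspot information is taken as a positive sample; the remaining hotspot information is taken as negative samples.
Further, in some embodiments, the step "training the preset model according to the training data of the preset model to obtain the first target model" can be implemented as follows: train the preset model on its positive and negative samples to obtain an initial model; use the initial model to score the hotspot information whose distance is less than or equal to the preset distance, obtaining a score for each such piece of hotspot information; from that hotspot information, select the pieces whose score is greater than the preset score threshold and add them to the positive samples; select the pieces whose score is less than or equal to the preset score threshold and update them into the negative samples; and train the initial model on the updated positive and negative samples to obtain the first target model.
In this embodiment, a first version of the initial model is obtained by training the preset model on the positive and negative samples. The initial model can then be used to predict relevance scores for the original negative samples. Those with higher scores are added to the original positive samples, yielding the updated positive samples, and the rest are kept as the updated negative samples. The initial model is then trained on the updated positive and negative samples to obtain the first target model.
Exemplarily, the samples whose score exceeds the preset score threshold are added to the original positive samples. The relevance score is the prediction result of the initial model and represents the degree of correlation between the hotspot information and the application scene information.
Exemplarily, taking "The strongest cold wave is coming" and the related discussions "#The strongest cold wave is coming# Friends, come share your gear for keeping warm" and "An overlord-level cold wave is here! What to do about severely dehydrated skin?" as hotspot information, all of them are related to the application scene information "keeping warm_hydration" and can be added to the positive samples.
In this embodiment, the initial model can be trained iteratively n times: after each round, the positive and negative samples from the previous round are updated, and after n rounds a batch of labeled positive and negative samples is obtained. The value of n can be set empirically. Once the final labeled positive and negative samples are available, the latest version of the model is retrained using the model structure of the initial model, yielding the final first target model.
By updating the positive and negative samples after each training iteration, the embodiment of the present disclosure enhances the generalization ability of the first target model so that it can be applied to a wider range of data scenarios; it can also be used to mine the potential item demand behind hotspot information.
In some embodiments, the above step S203 can be implemented as follows: according to the application scene information, acquire on the second website a set of hotspot information associated with the application scene information; train a preset classification model on the hotspot information set to obtain a second target model; and acquire the potential item demand according to the target hotspot information, the application scene information and the second target model.
In this embodiment, based on the application scene information, a search can be performed on the second website using search terms to obtain the hotspot information related to each piece of application scene information. Exemplarily, taking the application scene information "男生_秋冬季_户外_旅行" (boys_autumn and winter_outdoor_travel) as an example, the search term is "男生秋冬季户外旅行" (boys autumn and winter outdoor travel); searching the second website with it yields hotspot information related to the application scene information, such as #漠河生哥户外旅行网# (Mohe Shengge Outdoor Travel Network) and #我的旅行装备# (My Travel Gear), which is used as training data to train the preset classification model.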
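In the example above, the search term is the scene name with its underscore separators removed. A minimal sketch of that conversion follows; the function name and the optional `joiner` parameter (useful for languages that separate words with spaces) are hypothetical additions for illustration.

```python
def scene_to_search_term(scene_name, joiner=""):
    """Turn an underscore-separated scene name into a search term."""
    parts = [part.strip() for part in scene_name.split("_")]
    return joiner.join(part for part in parts if part)

term_zh = scene_to_search_term("男生_秋冬季_户外_旅行")
term_en = scene_to_search_term("boys_autumn and winter_outdoor_travel", joiner=" ")
print(term_zh)
print(term_en)
```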
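The training data, that is, the hotspot information, needs to be vectorized to obtain representation vectors, which are then used as the input of the preset classification model to train the second target model. After the second target model is obtained, any hotspot information can be crawled from the second website to predict whether it contains a potential item demand.
Exemplarily, the preset classification model may be a machine learning multi-class model, such as a naive Bayes model or a decision tree, or may be based on deep learning, such as the common two-tower model.
FIG. 3 is a schematic flowchart of Embodiment 2 of the method for processing hotspot information provided by an embodiment of the present disclosure. As shown in FIG. 3, the method includes the following steps: S301, building a hot event library; S302, identifying related hotspots; S303, inferring the potential item demand based on the hotspot information.
In this embodiment, the hot event library can be built from the application scene information contained in the item library of the shopping website and the users' recent behavior logs. Related hotspots refer to hotspot information related to the application scene information. Building the hot event library eliminates the workload of manually mining hotspots and also makes the hotspots interpretable. Related hotspot identification picks out the hotspot information relevant to the shopping website from hotspot information of any data source, effectively reducing manual screening costs. Potential item demand inference automatically associates hotspot information with items, accurately finds the item needs behind users, and improves the item click-through rate and conversion rate of the shopping website.
The following are device embodiments of the present disclosure, which can be used to execute the method embodiments of the present disclosure. For details not disclosed in the device embodiments, refer to the method embodiments of the present disclosure.
FIG. 4 is a schematic structural diagram of the device for processing hotspot information provided by an embodiment of the present disclosure. The device may be integrated on a server, or may be independent of the server and cooperate with it to implement this solution. As shown in FIG. 4, the device 40 for processing hotspot information includes an item acquisition module 41, an information association module 42 and a demand acquisition module 43.
The item acquisition module 41 is configured to acquire hot items of the first website and determine the application scene information corresponding to the hot items. The information association module 42 is configured to acquire hotspot information from the second website and determine, in the hotspot information, the target hotspot information associated with the application scene information. The demand acquisition module 43 is configured to acquire the potential item demand according to the target hotspot information and the application scene information.
A hot item is an item on the first website whose user attention reaches a preset threshold, and the potential item demand indicates the items on the first website that users are interested in.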
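In some embodiments, the above item acquisition module 41 can be specifically used to:
acquire the historical behaviors of users toward items on the first website;
select the hot items from the items of the first website according to the number of historical behaviors and the preset threshold.
The historical behaviors include at least one of browsing, searching, ordering and favoriting behaviors.
In some embodiments, the above item acquisition module 41 can be specifically used to:
acquire the item information of the hot item;
acquire the application scene information of the hot item according to the title information and the attribute information.
The item information includes title information and attribute information.
Optionally, in some embodiments, the above item acquisition module 41 can be specifically used to:
extract scene words from the title information;
acquire the preset attribute in the attribute information, and determine the attribute value corresponding to the preset attribute;
combine the scene words with the attribute values to obtain the application scene information.
A scene word is a word describing an application scene of the hot item.
In some embodiments, the information association module 42 can be specifically used to:
train the preset model according to the hotspot information and the application scene information to obtain the first target model;
determine, according to the first target model, the target hotspot information associated with the application scene information from the hotspot information.
Optionally, in some embodiments, the information association module 42 can be specifically used to:
vectorize the hotspot information to obtain the representation vector of the hotspot information;
vectorize the application scene information to obtain the representation vector of the application scene information;
acquire the distance between the representation vector of the hotspot information and the representation vector of the application scene information;
determine the positive samples and negative samples of the preset model according to the distance;
train the preset model according to its training data to obtain the first target model.
The positive samples and negative samples serve as the training data of the preset model.
Optionally, in some embodiments, the information association module 42 can be specifically used to:
segment the hotspot information to obtain at least one hotspot word;
vectorize each hotspot word to obtain the word vector corresponding to each hotspot word;
obtain the representation vector of the hotspot information according to the word vectors of the hotspot words.
Optionally, in some embodiments, the information association module 42 can be specifically used to:
acquire the scene name of the application scene information and the item information contained in the application scene information;
vectorize the scene name to obtain the representation vector of the scene name;
vectorize the item information to obtain the representation vector of the item information;
obtain the representation vector of the application scene information according to the representation vector of the scene name and the representation vector of the item information.
The application scene information contains at least one piece of item information.
Optionally, in some embodiments, the information association module 42 can be specifically used to:
take the hotspot information whose distance is greater than the preset distance as the positive samples of the preset model;
take the hotspot information whose distance is less than or equal to the preset distance as the negative samples of the preset model.
Optionally, in some embodiments, the information association module 42 can be specifically used to:
train the preset model on its positive and negative samples to obtain the initial model;
score, with the initial model, the hotspot information whose distance is less than or equal to the preset distance, obtaining a score for each such piece of hotspot information;
select, from the hotspot information whose distance is less than or equal to the preset distance, the hotspot information whose score is greater than the preset score threshold, and add it to the positive samples;
select, from the hotspot information whose distance is less than or equal to the preset distance, the hotspot information whose score is less than or equal to the preset score threshold, and update it into the negative samples;
train the initial model on the updated positive and negative samples to obtain the first target model.
In some embodiments, the demand acquisition module 43 can be specifically used to:
acquire, on the second website and according to the application scene information, the set of hotspot information associated with the application scene information;
train the preset classification model on the hotspot information set to obtain the second target model;
acquire the potential item demand according to the target hotspot information, the application scene information and the second target model.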
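The device provided by the embodiments of the present disclosure can be used to execute the methods in the above embodiments; its implementation principles and technical effects are similar and are not repeated here.
It should be noted that the division of the above device into modules is only a division of logical functions; in an actual implementation the modules may be fully or partially integrated into one physical entity or may be physically separate. These modules can all be implemented in the form of software called by a processing element. For example, the item acquisition module can be stored in the memory of the above device in the form of program code, and a processing element of the device calls it and executes the function of the item acquisition module. The implementation of the other modules is similar.
The above embodiments may be implemented wholly or partly in software, hardware, firmware or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present disclosure are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. An available medium may be a magnetic medium (for example, floppy disk, hard disk, magnetic tape), an optical medium (for example, DVD), or a semiconductor medium (for example, solid state disk (SSD)).
FIG. 5 is a schematic structural diagram of the server provided by an embodiment of the present disclosure. As shown in FIG. 5, the server 50 includes at least one processor 51, a memory 52, a bus 53 and a communication interface 54. The processor 51, the communication interface 54 and the memory 52 communicate with one another through the bus 53. The communication interface 54 is used to communicate with other devices. Exemplarily, the communication interface 54 can communicate with the server of the second website to acquire hotspot information from the second website. The processor 51 is used to execute the computer-executable instructions stored in the memory 52, and can specifically execute the relevant steps of the methods described in the above embodiments. The processor may be a central processing unit. The one or more processors included in the server may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs. The memory is used to store computer-executable instructions. The memory may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
This embodiment also provides a readable storage medium in which computer instructions are stored; when at least one processor of the server executes the computer instructions, the server executes the method for processing hotspot information provided by the various implementations described above.
This embodiment also provides a program product that includes computer instructions stored in a readable storage medium. At least one processor of the server can read the computer instructions from the readable storage medium, and the at least one processor executes the computer instructions so that the server implements the method for processing hotspot information provided by the various implementations described above.
In the present disclosure, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships are possible; for example, A and/or B can mean: A exists alone, A and B exist at the same time, or B exists alone, where A and B can be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it; in a formula, the character "/" indicates a "division" relationship between them. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b and c can each be single or multiple.
It can be understood that the various numerical labels involved in the embodiments of the present disclosure are only distinctions made for convenience of description and are not used to limit the scope of the embodiments. In the embodiments of the present disclosure, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present disclosure and not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or equivalently replace some or all of the technical features therein; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present disclosure.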

Claims (19)

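  1. A method for processing hotspot information, characterized by comprising:
    acquiring a hot item of a first website, and determining application scenario information corresponding to the hot item, the hot item being an item on the first website whose user attention reaches a preset threshold;
    acquiring hotspot information from a second website, and determining, in the hotspot information, target hotspot information associated with the application scenario information;
    obtaining a potential item demand according to the target hotspot information and the application scenario information, the potential item demand being used to indicate items on the first website that a user is interested in.
  2. The method according to claim 1, characterized in that acquiring the hot item of the first website comprises:
    acquiring historical behaviors of the user toward items on the first website, the historical behaviors including at least one of browsing behavior, search behavior, ordering behavior, and favoriting behavior;
    selecting the hot item from the items of the first website according to the number of historical behaviors and the preset threshold.
  3. The method according to claim 1 or 2, characterized in that determining the application scenario information corresponding to the hot item comprises:
    acquiring item information of the hot item, the item information including title information and attribute information;
    obtaining the application scenario information of the hot item according to the title information and the attribute information.
  4. The method according to claim 3, characterized in that obtaining the application scenario information of the hot item according to the title information and the attribute information comprises:
    extracting a scenario word from the title information, the scenario word being a word that describes an application scenario of the hot item;
    acquiring a preset attribute in the attribute information, and determining an attribute value corresponding to the preset attribute;
    combining the scenario word with the attribute value to obtain the application scenario information.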
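  5. The method according to any one of claims 1-4, characterized in that determining, in the hotspot information, the target hotspot information associated with the application scenario information comprises:
    training a preset model according to the hotspot information and the application scenario information to obtain a first target model;
    determining, from the hotspot information and according to the first target model, the target hotspot information associated with the application scenario information.
  6. The method according to claim 5, characterized in that training the preset model according to the hotspot information and the application scenario information to obtain the first target model comprises:
    vectorizing the hotspot information to obtain a representation vector of the hotspot information;
    vectorizing the application scenario information to obtain a representation vector of the application scenario information;
    obtaining a distance between the representation vector of the hotspot information and the representation vector of the application scenario information;
    determining positive samples and negative samples of the preset model according to the distance, the positive samples and negative samples being used as training data of the preset model;
    training the preset model according to the training data of the preset model to obtain the first target model.
  7. The method according to claim 6, characterized in that vectorizing the hotspot information to obtain the representation vector of the hotspot information comprises:
    segmenting the hotspot information into words to obtain at least one hotspot word;
    vectorizing each hotspot word to obtain a word vector corresponding to each hotspot word;
    obtaining the representation vector of the hotspot information according to the word vector corresponding to each hotspot word.
  8. The method according to claim 6, characterized in that vectorizing the application scenario information to obtain the representation vector of the application scenario information comprises:
    acquiring a scenario name of the application scenario information and item information contained in the application scenario information, the application scenario information containing at least one piece of item information;
    vectorizing the scenario name to obtain a representation vector of the scenario name;
    vectorizing the item information to obtain a representation vector of the item information;
    obtaining the representation vector of the application scenario information according to the representation vector of the scenario name and the representation vector of the item information.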
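  9. The method according to claim 6, characterized in that determining the positive samples and negative samples of the preset model according to the distance comprises:
    acquiring the hotspot information whose distance is greater than a preset distance as the positive samples of the preset model;
    selecting the hotspot information whose distance is less than or equal to the preset distance as the negative samples of the preset model.
  10. The method according to claim 9, characterized in that training the preset model according to the training data of the preset model to obtain the first target model comprises:
    training the preset model according to the positive samples and negative samples of the preset model to obtain an initial model;
    scoring, according to the initial model, the hotspot information whose distance is less than or equal to the preset distance, to obtain a score corresponding to each piece of hotspot information whose distance is less than or equal to the preset distance;
    selecting, from the hotspot information whose distance is less than or equal to the preset distance, the hotspot information whose score is greater than a preset score threshold, and adding it to the positive samples;
    selecting, from the hotspot information whose distance is less than or equal to the preset distance, the hotspot information whose score is less than or equal to the preset score threshold, and adding it to the negative samples;
    training the initial model according to the updated positive samples and negative samples to obtain the first target model.
  11. The method according to any one of claims 1-10, characterized in that obtaining the potential item demand according to the target hotspot information and the application scenario information comprises:
    acquiring, from the second website and according to the application scenario information, a hotspot information set associated with the application scenario information;
    training a preset classification model according to the hotspot information set to obtain a second target model;
    obtaining the potential item demand according to the target hotspot information, the application scenario information, and the second target model.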
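  12. An apparatus for processing hotspot information, characterized by comprising:
    an item acquisition module, configured to acquire a hot item of a first website and determine application scenario information corresponding to the hot item, the hot item being an item on the first website whose user attention reaches a preset threshold;
    an information association module, configured to acquire hotspot information from a second website and determine, in the hotspot information, target hotspot information associated with the application scenario information;
    a demand acquisition module, configured to obtain a potential item demand according to the target hotspot information and the application scenario information, the potential item demand being used to indicate items on the first website that a user is interested in.
  13. The apparatus according to claim 12, characterized in that, when acquiring the hot item of the first website, the item acquisition module is specifically configured to:
    acquire historical behaviors of the user toward items on the first website, the historical behaviors including at least one of browsing behavior, search behavior, ordering behavior, and favoriting behavior;
    select the hot item from the items of the first website according to the number of historical behaviors and the preset threshold.
  14. The apparatus according to claim 12 or 13, characterized in that, when determining the application scenario information corresponding to the hot item, the item acquisition module is specifically configured to:
    acquire item information of the hot item, the item information including title information and attribute information;
    obtain the application scenario information of the hot item according to the title information and the attribute information.
  15. The apparatus according to claim 14, characterized in that, when obtaining the application scenario information of the hot item according to the title information and the attribute information, the item acquisition module is specifically configured to:
    extract a scenario word from the title information, the scenario word being a word that describes an application scenario of the hot item;
    acquire a preset attribute in the attribute information, and determine an attribute value corresponding to the preset attribute;
    combine the scenario word with the attribute value to obtain the application scenario information.
  16. The apparatus according to any one of claims 12-15, characterized in that, when determining, in the hotspot information, the target hotspot information associated with the application scenario information, the information association module is specifically configured to:
    train a preset model according to the hotspot information and the application scenario information to obtain a first target model;
    determine, from the hotspot information and according to the first target model, the target hotspot information associated with the application scenario information.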
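  17. A server, characterized by comprising: a processor, and a memory communicatively connected to the processor;
    the memory stores computer-executable instructions;
    the processor executes the computer-executable instructions stored in the memory to implement the method according to any one of claims 1-11.
  18. A readable storage medium, characterized in that the readable storage medium stores computer instructions which, when executed by a processor, are used to implement the method according to any one of claims 1-11.
  19. A program product comprising computer instructions, characterized in that the computer instructions, when executed by a processor, implement the method according to any one of claims 1-11.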
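
The vectorization and distance steps recited in claims 6 and 7 can be illustrated with a short sketch. Everything specific in it is an assumption made for the example: the toy `TOKEN_VECTORS` table, whitespace splitting as a stand-in for word segmentation, averaging as the way word vectors are combined, and Euclidean distance as the distance measure. The claims do not fix any of these choices.

```python
import math

# Hypothetical toy embedding table; a real system would use trained word vectors.
TOKEN_VECTORS = {
    "camping": [1.0, 0.0],
    "tent": [0.8, 0.2],
    "concert": [0.0, 1.0],
}


def tokenize(text):
    # Whitespace splitting stands in for a real word segmenter.
    return text.lower().split()


def representation_vector(text, table=TOKEN_VECTORS):
    """Average the vectors of the known words in text (averaging is an assumption)."""
    vectors = [table[token] for token in tokenize(text) if token in table]
    if not vectors:
        raise ValueError("text contains no word with a known vector")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def distance(u, v):
    """Euclidean distance between two representation vectors."""
    return math.dist(u, v)


hotspot_vec = representation_vector("camping tent")
scenario_vec = representation_vector("concert")
print(distance(hotspot_vec, scenario_vec))
```

The resulting distance is the quantity that is compared with the preset distance when the positive and negative samples are chosen.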
PCT/CN2022/113119 2022-01-26 2022-08-17 Hotspot information processing method, apparatus, server, and readable storage medium WO2023142448A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210092682.8A CN116541587A (zh) 2022-01-26 2022-01-26 Hotspot information processing method, apparatus, server, and readable storage medium
CN202210092682.8 2022-01-26

Publications (1)

Publication Number Publication Date
WO2023142448A1 (zh) 2023-08-03

Family

ID=87449370

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/113119 WO2023142448A1 (zh) 2022-01-26 2022-08-17 Hotspot information processing method, apparatus, server, and readable storage medium

Country Status (2)

Country Link
CN (1) CN116541587A (zh)
WO (1) WO2023142448A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
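CN112925986A (zh) * 2021-04-08 2021-06-08 国网电子商务有限公司 Commodity object recommendation method, apparatus, electronic device, and storage medium
WO2021169218A1 (zh) * 2020-02-26 2021-09-02 平安科技(深圳)有限公司 Data push method, system, electronic apparatus, and storage medium
CN113743973A (zh) * 2020-11-30 2021-12-03 北京沃东天骏信息技术有限公司 Method and apparatus for analyzing market hotspot trends
CN113744011A (zh) * 2020-06-17 2021-12-03 北京沃东天骏信息技术有限公司 Item matching method and item matching apparatus
CN113821718A (zh) * 2021-02-01 2021-12-21 北京沃东天骏信息技术有限公司 Item information push method and apparatus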

Also Published As

Publication number Publication date
CN116541587A (zh) 2023-08-04

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22923252

Country of ref document: EP

Kind code of ref document: A1