CN112182396A

CN112182396A - Information pushing method based on user behaviors

Info

Publication number: CN112182396A
Application number: CN202011084095.1A
Authority: CN
Inventors: 罗列异; 黄吉琦; 任益斌; 张轲; 程韶曦; 金松; 张帆; 王迪先
Original assignee: Zhejiang Xinlan Network Media Co ltd
Current assignee: Zhejiang Xinlan Network Media Co ltd
Priority date: 2020-10-12
Filing date: 2020-10-12
Publication date: 2021-01-05

Abstract

The invention discloses an information pushing method based on user behaviors, comprising the following steps: classifying text data through an LDA text clustering model; analyzing all text data belonging to the same category to obtain a text label for each category; labeling picture data and video data by means of a standard celebrity picture library; receiving user behavior data sent by a user side; calculating and analyzing the user behavior data to obtain a user behavior calculation result and storing it in the user configuration information of the user; calculating a preference result for the user from all updated user behavior calculation results; and recommending targeted content to the user according to the preference result. Because the text data are first classified by the LDA text clustering model and all text data in each category are then analyzed to obtain a text label for that category, a news information provider can accurately identify the preferences of a user.

Description

Information pushing method based on user behaviors

Technical Field

The invention relates to an information pushing method based on user behaviors.

Background

With the development of internet technology, more and more people prefer to obtain information from the network, and news information APPs of all kinds keep emerging. To improve user experience, many news information APPs collect the operation behaviors of a user and infer the user's preferences, so that content the user is likely to enjoy can be recommended in a targeted manner.

When news data is stored in the database of a news information APP, it must be classified and labeled. Text data in particular is generally classified by an LDA text clustering model. Working in an unsupervised manner, the LDA text clustering model divides massive text data into a user-defined number of categories, which solves the problem that a model cannot be trained when the data volume is too large to label manually. However, the categories produced by the LDA text clustering model carry no meaningful Chinese label, so the information provider cannot tell what the text data in each category is about.

Disclosure of Invention

The invention provides an information pushing method based on user behaviors, which adopts the following technical scheme:

an information pushing method based on user behaviors comprises the following steps:

acquiring a plurality of unmarked news data, wherein the news data comprises text data, picture data and video data;

cleaning news data;

classifying the text data through an LDA text clustering model;

analyzing all classified text data belonging to the same category to obtain a text label of each category;

matching the picture data through a labeled standard celebrity picture library to label the picture data;

extracting frames from the video data, and matching the extracted frame pictures of the video data through a labeled standard celebrity picture library to label the video data;

storing the classified and labeled news data into a system for a user to browse, and receiving user behavior data sent by a user side;

calculating and analyzing the user behavior data to obtain a user behavior calculation result, and storing the user behavior calculation result into user configuration information of the user, wherein all the user behavior calculation results of the user are stored in the user configuration information;

and calculating the preference result of the user according to all updated user behavior calculation results, and recommending targeted content to the user according to the preference result.

Furthermore, an effective calculation period is set, the preference result is updated periodically, and all the user behavior calculation results within the effective calculation period are recalculated to obtain a new preference result.

Furthermore, the calculation results of the user behaviors are divided into a plurality of statistical categories, and different effective calculation periods are set for the calculation results of different statistical categories.

Further, when the preference result is calculated, different calculation weights are set according to the sequence of the generation time of the user behavior result.

Further, when calculating the preference result of the user, the closer the generation time of the user behavior result is to the current time, the larger the corresponding calculation weight is.

Further, if the text data is currently browsed by the user, the category of the text data currently browsed by the user is obtained, and a plurality of text data are selected from other text data of the category and pushed to the user.

Further, a specific method for selecting a plurality of text data from other text data of the category and pushing the selected text data to the user is as follows:

acquiring other text data under the category from the system;

sorting the acquired text data by heat;

and pushing a plurality of text data ranked at the top to the user.

Alternatively, a specific method for selecting a plurality of text data from other text data of the category and pushing the selected text data to the user is as follows:

acquiring other text data under the category from the system;

sorting the acquired text data by heat to obtain a first ranking;

sorting the acquired text data by release time to obtain a second ranking;

setting a calculation weight for the text data in the first ranking according to a first rule;

setting a calculation weight for the text data in the second ranking according to a second rule;

calculating the comprehensive weight of each text data;

re-sorting the text data under the category by comprehensive weight to obtain a third ranking;

and pushing a plurality of text data ranked at the top in the third ranking to the user.

Further, text data is classified through an LDA text clustering model, and the classification quantity is set according to needs.

Further, the number of classifications is set according to the total amount of text data.

The advantage of the information pushing method based on user behaviors is that, after the text data are classified through the LDA text clustering model, all the classified text data belonging to the same category are analyzed to obtain a text label for each category, so that a news information provider can accurately identify the preferences of a user.

Drawings

Fig. 1 is a schematic diagram of an information pushing method based on user behavior according to the present invention.

Detailed Description

The invention is described in detail below with reference to the figures and the embodiments.

Fig. 1 shows an information pushing method based on user behaviors according to the present invention, which mainly includes the following steps:

S1: acquiring a plurality of unlabeled news data, wherein the news data comprises text data, picture data and video data.

S2: cleaning the news data.

S3: classifying the text data through an LDA text clustering model.

S4: analyzing all classified text data belonging to the same category to obtain the text label of each category.

S5: matching the picture data against the labeled standard celebrity picture library to label the picture data.

S6: extracting frames from the video data, and matching the extracted frame pictures against the labeled standard celebrity picture library to label the video data.

S7: storing the classified and labeled news data into a system for a user to browse, and receiving user behavior data sent by the user side.

S8: calculating and analyzing the user behavior data to obtain a user behavior calculation result, and storing the user behavior calculation result into user configuration information of the user, wherein all the user behavior calculation results of the user are stored in the user configuration information.

S9: calculating the preference result of the user according to all updated user behavior calculation results, and recommending targeted content to the user according to the preference result.

Through these steps, the preference of the user is calculated, and suitable content is recommended to the user in a targeted manner according to that preference. The steps are described in detail below.

For step S1: acquiring a plurality of unlabeled news data, wherein the news data comprises text data, picture data and video data.

First, news data is acquired from multiple sources. Generally, the news data is acquired over the internet from a plurality of information acquisition ports, such as green wave and microblog. These data include text data, picture data and video data.

For step S2: cleaning the news data.

This step cleans the acquired news data, for example by removing watermarks from the text and removing descriptions that some data providers add to the text.
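The text-cleaning part of this step might be sketched as follows. This is only an illustration: the source does not say which provider descriptions are removed or how, so the patterns in `PROVIDER_NOTE_PATTERNS` and the function name `clean_text` are hypothetical.

```python
import re

# Hypothetical patterns for provider-added notes; the source does not list them.
PROVIDER_NOTE_PATTERNS = [
    re.compile(r"^\s*Source:.*$", re.MULTILINE),  # a leading "Source: ..." line
    re.compile(r"\(Editor: [^)]*\)"),             # an inline "(Editor: ...)" note
]


def clean_text(text: str) -> str:
    """Remove provider-added notes, then collapse the leftover whitespace."""
    for pattern in PROVIDER_NOTE_PATTERNS:
        text = pattern.sub("", text)
    # Collapse runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()
```

In practice the pattern list would be maintained per data provider, since each provider adds its own kind of note.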

For step S3: classifying the text data through an LDA text clustering model.

The text data is classified through an LDA text clustering model. Before classification, the number of categories is set as required. Preferably, to avoid placing too much text data in a single category, the number of categories is set according to the total amount of text data: the number of categories is proportional to the total amount of text data, that is, the larger the amount of text data, the larger the number of categories.
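One way to make the number of categories proportional to the total amount of text data is sketched below. The ratio `docs_per_category`, the lower bound `min_categories` and the function name are assumptions for illustration; the source gives no concrete values.

```python
import math


def choose_num_categories(total_docs: int,
                          docs_per_category: int = 500,
                          min_categories: int = 2) -> int:
    """Return a category count that grows in proportion to the corpus size.

    docs_per_category is an assumed target size for one category.
    """
    if total_docs < 0:
        raise ValueError("total_docs must be non-negative")
    return max(min_categories, math.ceil(total_docs / docs_per_category))
```

The returned value would then be passed as the topic count of whichever LDA implementation is used, for example the `n_components` parameter of scikit-learn's `LatentDirichletAllocation`; the source does not name a library.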

It can be understood that when the text data is classified by the LDA text clustering model, negative data in the text data can be eliminated according to the setting.

For step S4: analyzing all classified text data belonging to the same category to obtain the text label of each category.

The LDA text clustering model in step S3 classifies the text data, but the resulting categories carry only meaningless distinguishing codes. These codes do not reveal what the text data in each category is about, for example whether it concerns military affairs, culture or politics. In step S4, for each category produced by the LDA text clustering model, intelligent semantic analysis is performed on the text data in that category to obtain a text label for the category. After step S4, each previously unlabeled category has a text label that distinguishes it from the other categories. With these text labels, the subject of the text data in each category can be identified quickly.
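The source does not describe how the intelligent semantic analysis works. As a simple stand-in, the sketch below labels each category with its most distinctive terms, scoring each term by its frequency in the category and discounting terms that occur in many categories. The function name, the scoring formula and the assumption that documents are already tokenized are all illustrative.

```python
import math
from collections import Counter


def label_categories(docs_by_code: dict[int, list[list[str]]],
                     top_k: int = 1) -> dict[int, list[str]]:
    """Map each category code to its top_k most distinctive terms."""
    # Term counts inside each category.
    term_counts = {
        code: Counter(term for doc in docs for term in doc)
        for code, docs in docs_by_code.items()
    }
    # Number of categories in which each term occurs.
    category_freq = Counter(
        term for counts in term_counts.values() for term in counts
    )
    n_categories = len(term_counts)
    labels = {}
    for code, counts in term_counts.items():
        scores = {
            term: count * math.log((1 + n_categories) / category_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        labels[code] = ranked[:top_k]
    return labels
```

For example, a category whose documents mention "army" far more often than any other category does would receive "army" as its label, replacing the meaningless code.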

For step S5: matching the picture data against the labeled standard celebrity picture library to label the picture data.

Picture data is labeled mainly by matching against the labeled standard celebrity picture library: the persons in the picture data are identified, and the corresponding labels are attached to the picture data.
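The matching itself is not specified in the source. A minimal sketch, assuming each picture and each library entry has already been reduced to a feature vector by some face-recognition model, is a nearest-match lookup by cosine similarity. The vector representation, the threshold of 0.8 and all names are assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_celebrity(face_vector: list[float],
                    library: dict[str, list[float]],
                    threshold: float = 0.8):
    """Return the label of the best library match, or None if none reaches the threshold."""
    best_label, best_score = None, threshold
    for label, reference in library.items():
        score = cosine_similarity(face_vector, reference)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label
```

Returning `None` for weak matches keeps pictures of people outside the library unlabeled rather than mislabeled.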

For step S6: extracting frames from the video data, and matching the extracted frame pictures against the labeled standard celebrity picture library to label the video data.

Video data is labeled in a similar way to picture data. First, frames are extracted from the video data to obtain frame pictures; the frame pictures are then matched against the labeled standard celebrity picture library, the persons in them are identified, and the corresponding labels are attached to the video data.
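The frame-extraction flow can be sketched as sampling every `step`-th frame and collecting the labels found in the sampled frames. The fixed sampling interval and the `match_frame` callback, which stands in for the library matching, are assumptions; the source does not state how frames are chosen.

```python
def sample_frame_indices(total_frames: int, step: int) -> list[int]:
    """Indices of the frames to extract: every step-th frame, starting at 0."""
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(0, total_frames, step))


def label_video(frames: list, step: int, match_frame) -> set:
    """Union of the labels that match_frame returns for the sampled frames.

    match_frame is a hypothetical callback returning a label or None.
    """
    labels = set()
    for index in sample_frame_indices(len(frames), step):
        label = match_frame(frames[index])
        if label is not None:
            labels.add(label)
    return labels
```

A smaller `step` finds more of the people who appear in the video at the cost of more matching work.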

For step S7: storing the classified and labeled news data into a system for a user to browse, and receiving user behavior data sent by the user side.

After classification and labeling, the news data is imported into the system. The user browses the data through the APP on the user side, and the system automatically collects the user behavior data sent by the user side. Such behavior data includes, but is not limited to, clicking on news data, forwarding, commenting, dwell time and dragging the video.

For step S8: calculating and analyzing the user behavior data to obtain a user behavior calculation result, and storing the user behavior calculation result into user configuration information of the user, wherein all the user behavior calculation results of the user are stored in the user configuration information.

Different weights are assigned to the various behaviors to indicate how much the user likes the news item on which each behavior is performed. The received user behavior data is analyzed and calculated to obtain a user behavior calculation result, which is then stored in the user configuration information of the user. Each user corresponds to one piece of user configuration information, in which all user behavior calculation results of that user are stored. The user configuration information thus reflects all operation behaviors of the user.
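A minimal sketch of this step follows. The weight values in `BEHAVIOR_WEIGHTS` are hypothetical, since the source only states that different behaviors receive different weights, and the class and function names are likewise illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical weights; the source gives no concrete values.
BEHAVIOR_WEIGHTS = {"click": 1.0, "comment": 3.0, "forward": 4.0}


@dataclass
class BehaviorResult:
    category: str     # category label of the news item acted on
    score: float      # weight of the behavior
    timestamp: float  # generation time, in seconds since the epoch


@dataclass
class UserProfile:
    """User configuration information: holds all results for one user."""
    user_id: str
    results: list[BehaviorResult] = field(default_factory=list)


def record_behavior(profile: UserProfile, behavior: str,
                    category: str, timestamp: float) -> BehaviorResult:
    """Turn one behavior into a calculation result and store it in the profile."""
    result = BehaviorResult(category, BEHAVIOR_WEIGHTS[behavior], timestamp)
    profile.results.append(result)
    return result
```

Keeping the timestamp on each result lets later steps discard or down-weight old behaviors.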

For step S9: calculating the preference result of the user according to all updated user behavior calculation results, and recommending targeted content to the user according to the preference result.

When a new user behavior calculation result is added to the user configuration information, the preference result of the user is calculated from all the updated user behavior calculation results. This preference result reflects the user's preferences, and the system can use it to accurately recommend content of interest to the user.
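The source does not give a formula for the preference result. One plausible reading, sketched here as an assumption, is to sum the behavior scores per category and normalize them into shares. The `(category, score)` tuple format and the function name are illustrative.

```python
from collections import defaultdict


def compute_preference(results) -> dict:
    """Map each category to its share of the total behavior score.

    results is an iterable of (category, score) pairs.
    """
    totals = defaultdict(float)
    for category, score in results:
        totals[category] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: score / grand_total for category, score in totals.items()}
```

The category with the largest share would then drive which content is recommended first.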

In a preferred embodiment, an effective calculation period is set, the preference result is updated periodically, and each update recalculates all the user behavior calculation results within the effective calculation period to obtain a new preference result.

It will be appreciated that a user's preferences are time-limited: over time, the user's interests may shift. Because all user behavior calculation results of the user are stored in the user configuration information, a preference result calculated from all stored results is biased by outdated behaviors.

Therefore, preferably, an effective calculation period is set and the user preference is updated at regular intervals: the user behavior calculation results stored in the user configuration information that fall outside the effective calculation period are eliminated, and the preference result is recalculated. In the present embodiment, the effective calculation period is set to three months. It will be appreciated that the effective calculation period may be adjusted as desired.
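Eliminating results that fall outside the effective calculation period might look like the sketch below. Treating three months as 90 days and representing each result as a `(category, score, created_at)` tuple are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# "Three months" approximated as 90 days (an assumption).
EFFECTIVE_PERIOD = timedelta(days=90)


def prune_expired(results: list, now: datetime,
                  period: timedelta = EFFECTIVE_PERIOD) -> list:
    """Keep only the (category, score, created_at) results inside the period."""
    cutoff = now - period
    return [result for result in results if result[2] >= cutoff]
```

The preference result is then recalculated from the pruned list only.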

As a preferred embodiment, the calculation results of the user behavior are divided into a plurality of statistical categories, and different effective calculation periods are set for the calculation results of different statistical categories.

It will be appreciated that the time it takes a user to lose interest differs between categories. Different effective calculation periods are therefore set for different categories of news data, with longer effective calculation periods for categories in which interest lasts longer.
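Per-category periods can be sketched as a lookup table with a default. The category names and durations below are hypothetical; the source only says that longer-lasting interests get longer periods.

```python
from datetime import datetime, timedelta

DEFAULT_PERIOD = timedelta(days=90)
# Hypothetical per-category periods.
PERIOD_BY_CATEGORY = {
    "breaking_news": timedelta(days=7),
    "hobby": timedelta(days=180),
}


def is_effective(category: str, created_at: datetime, now: datetime) -> bool:
    """True if a result of this category is still inside its effective period."""
    period = PERIOD_BY_CATEGORY.get(category, DEFAULT_PERIOD)
    return now - created_at <= period
```

Categories missing from the table fall back to the default period.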

As a preferred embodiment, when calculating the preference result of the user, the closer the generation time of the user behavior result is to the current time, the larger the corresponding calculation weight.
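The source requires only that more recent results receive larger weights. Exponential decay with a half-life, shown below, is one common choice that satisfies this; the decay form, the 30-day half-life and the `(category, score, age_days)` tuple format are assumptions.

```python
def recency_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Weight 1.0 for a brand-new result, halving every half_life_days."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 0.5 ** (age_days / half_life_days)


def weighted_preference(results) -> dict:
    """Sum score * recency weight per category.

    results is an iterable of (category, score, age_days) tuples.
    """
    totals = {}
    for category, score, age_days in results:
        weighted = score * recency_weight(age_days)
        totals[category] = totals.get(category, 0.0) + weighted
    return totals
```

With these values, a behavior scored 4.0 a month ago counts for less than one scored 3.0 today.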

As a preferred embodiment, if the user is currently browsing text data, the category of that text data is acquired, and a plurality of text data are selected from the other text data of the category and pushed to the user.

As a preferred embodiment, a specific method for selecting a plurality of text data from the other text data in the category and pushing them to the user is as follows: acquiring the other text data under the category from the system; sorting the acquired text data by heat; and pushing a plurality of the top-ranked text data to the user. That is, text data with higher popularity is pushed to the user preferentially.
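This heat-based selection can be sketched as follows, assuming each text data item is a dictionary with hypothetical `id` and `heat` fields and that the item currently being browsed is excluded.

```python
def top_by_heat(articles: list, current_id: str, n: int = 3) -> list:
    """Return the n hottest articles in the category, excluding the current one."""
    others = [a for a in articles if a["id"] != current_id]
    return sorted(others, key=lambda a: a["heat"], reverse=True)[:n]
```

How heat is measured (for example by views or shares) is not stated in the source.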

As an optional implementation, a specific method for selecting a plurality of text data from the other text data in the category and pushing them to the user is as follows. The other text data under the category is acquired from the system. The acquired text data is sorted by heat to obtain a first ranking, and sorted by release time to obtain a second ranking. A calculation weight is set for each text data in the first ranking according to a first rule, under which the higher the heat of the text data, the larger the calculation weight. A calculation weight is set for each text data in the second ranking according to a second rule, under which the closer the release time of the text data is to the current time, the larger the calculation weight. The comprehensive weight of each text data is then calculated: each text data has one heat calculation weight and one release-time calculation weight, and the two are added to obtain the comprehensive weight. The text data under the category is re-sorted by comprehensive weight to obtain a third ranking, and a plurality of the top-ranked text data in the third ranking are pushed to the user. That is, when selecting text data to push to a user, not only the heat of the text data but also its release time is taken into consideration. It is understood that the calculation weights can be set as required.
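The combined ranking can be sketched as below. The first and second rules are not specified beyond their direction, so this example assumes a simple rank-based rule: in a ranking of N items, the top item gets weight N and the last gets weight 1. The field names `id`, `heat` and `published_at` are hypothetical.

```python
def rank_weights(items: list, key) -> dict:
    """Sort items by key, descending; the top item gets weight len(items), the last gets 1."""
    ordered = sorted(items, key=key, reverse=True)
    return {item["id"]: len(ordered) - pos for pos, item in enumerate(ordered)}


def push_candidates(articles: list, n: int = 2) -> list:
    """Return the ids of the n articles with the highest comprehensive weight."""
    # First ranking: by heat. Second ranking: by release time (newest first).
    heat_w = rank_weights(articles, key=lambda a: a["heat"])
    time_w = rank_weights(articles, key=lambda a: a["published_at"])
    # Comprehensive weight: the sum of the two calculation weights.
    combined = {a["id"]: heat_w[a["id"]] + time_w[a["id"]] for a in articles}
    # Third ranking: by comprehensive weight.
    third = sorted(articles, key=lambda a: combined[a["id"]], reverse=True)
    return [a["id"] for a in third[:n]]
```

Under this rule a fairly hot, very recent article can outrank the hottest article if that one is old.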

The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It should be understood by those skilled in the art that the above embodiments do not limit the present invention in any way, and all technical solutions obtained by using equivalent alternatives or equivalent variations fall within the scope of the present invention.

Claims

1. An information pushing method based on user behaviors is characterized by comprising the following steps:

cleaning the news data;

classifying the text data through an LDA text clustering model;

analyzing all the classified text data belonging to the same category to obtain a text label of each category;

calculating and analyzing the user behavior data to obtain a user behavior calculation result, and storing the user behavior calculation result into user configuration information of a user, wherein all the user behavior calculation results of the user are stored in the user configuration information;

and calculating the preference result of the user according to the updated user behavior calculation results, and recommending targeted content to the user according to the preference result.

2. The information pushing method based on user behavior as claimed in claim 1,

setting an effective calculation period, updating the preference result periodically, and recalculating all the user behavior calculation results within the effective calculation period to obtain a new preference result.

3. The information pushing method based on user behavior as claimed in claim 2,

dividing the user behavior calculation results into a plurality of statistical categories, and setting different effective calculation periods for the calculation results of different statistical categories.

4. The information pushing method based on user behavior as claimed in claim 1,

when the preference result is calculated, setting different calculation weights according to the generation time of the user behavior result.

5. The information pushing method based on user behavior as claimed in claim 1,

when the preference result of the user is calculated, the closer the generation time of the user behavior result is to the current time, the larger the corresponding calculation weight is.

6. The information pushing method based on user behavior as claimed in claim 1,

if the user is currently browsing text data, the category of the text data being browsed is obtained, and a plurality of text data are selected from other text data of the category and pushed to the user.

7. The information pushing method based on user behavior as claimed in claim 6,

the specific method for selecting a plurality of text data from other text data of the category and pushing the selected text data to the user comprises the following steps:

acquiring other text data under the category from the system;

sequencing the acquired text data according to the heat;

and pushing a plurality of the text data ranked at the top to the user.

8. The information pushing method based on user behavior as claimed in claim 6,

acquiring other text data under the category from the system;

sequencing the acquired text data according to the heat degree to obtain a first sequence;

calculating the comprehensive weight of each piece of text data;

pushing a number of the text data ranked top in the third ranking to the user.

9. The information pushing method based on user behavior as claimed in claim 1,

classifying the text data through an LDA text clustering model, and setting the number of classifications as required.

10. The information pushing method based on user behavior as claimed in claim 9,

setting the number of classifications according to the total amount of the text data.