CN110119477B - Information pushing method, device and storage medium - Google Patents

Information pushing method, device and storage medium

Info

Publication number
CN110119477B
Authority
CN
China
Prior art keywords
user
sample
vector
information
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910398975.7A
Other languages
Chinese (zh)
Other versions
CN110119477A (en)
Inventor
孙振龙
胡澜涛
陈磊
张博
刘祺
梁铭霏
丘志杰
饶君
刘毅
王良栋
刘书凯
商甜甜
苏舟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910398975.7A
Publication of CN110119477A
Application granted
Publication of CN110119477B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the invention disclose an information pushing method, an information pushing device, and a storage medium. In the embodiments, user data of a target user can be acquired; feature extraction is performed on the user data to obtain user features; low-dimensional dense vectorization is performed on the user features and the information to be pushed to obtain a user vector and an information vector; the similarity between the information vector and the user vector is calculated; and the information to be pushed whose similarity satisfies a preset condition is pushed to the target user. The scheme can improve the accuracy of information pushing.

Description

Information pushing method, device and storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to an information pushing method, an apparatus, and a storage medium.
Background
With the development and popularization of information technology, people are immersed in a vast ocean of information. Information resources on the Internet expand exponentially, and these massive information resources are heterogeneous, multi-sourced, and distributed. It is therefore important to be able to recommend content of interest to a user in a targeted manner according to the user's information, and thus provide personalized services to the user.
However, existing recommendation methods mainly recommend content to a target crowd in a directional manner: content of interest is recommended to users according to targeting conditions (such as the user's gender, age, region, user level, or interest category). Alternatively, in a re-marketing manner, content is continuously recommended to a user according to the content the user has browsed. However, the directional recommendation method can obtain only a few targeting conditions, and the manually defined target crowd is not necessarily the crowd that actually likes the recommended content. The re-marketing method takes only users who have already browsed the content as target users, so the target crowd is limited.
Disclosure of Invention
The embodiment of the invention provides an information pushing method, an information pushing device and a storage medium, which can improve the accuracy of information pushing.
The embodiment of the invention provides an information pushing method, which comprises the following steps:
acquiring user data of a target user;
extracting the characteristics of the user data to obtain user characteristics;
performing low-dimensional dense vectorization on the user characteristics and the information to be pushed to obtain a user vector and an information vector;
calculating the similarity of the information vector and the user vector;
And pushing the information to be pushed, the similarity of which meets the preset condition, to the target user.
Correspondingly, the embodiment of the invention also provides an information pushing device, which comprises:
an acquisition unit configured to acquire user data of a target user;
the feature extraction unit is used for extracting the features of the user data to obtain user features;
the vectorization unit is used for carrying out low-dimensional dense vectorization on the user characteristics and the information to be pushed to obtain a user vector and an information vector respectively;
a calculation unit configured to calculate a similarity between the information vector and the user vector;
and the pushing unit is used for pushing the information to be pushed, the similarity of which meets the preset condition, to the target user.
Optionally, in some embodiments, the feature extraction unit may be specifically configured to unify the features of the user data into a preset format, count values of each feature parameter in the preset format, where the feature parameters include feature weights, and normalize the feature weights according to the counted values to obtain the user features.
Optionally, in some embodiments, the user features include a plurality of features of the user, the information to be pushed includes an information identifier, and the vectorizing unit may include a user vectorization subunit and an information vectorization subunit, as follows:
And the user vectorization subunit is used for carrying out low-dimensional dense vectorization on each feature in the user features according to the vectorization model to obtain a plurality of feature vectors, and carrying out weighted summation on the feature vectors according to the weight of each feature of the user to obtain the user vectors.
And the information vectorization subunit is used for searching the low-dimensional dense vector corresponding to the information identifier in the vectorization model to obtain an information vector.
Optionally, in some embodiments, the information pushing device may further include an acquisition unit and a training unit, as follows:
the acquisition unit is used for acquiring the user data samples and the user behavior data samples.
The training unit is used for training a preset model by using the user data sample and the user behavior data sample to obtain a vectorization model.
Optionally, in some embodiments, the training unit may include a feature extraction subunit, a sample formatting subunit, and a training subunit, as follows:
and the characteristic extraction subunit is used for carrying out characteristic extraction on the user data sample to obtain a user characteristic sample.
The sample formatting subunit is configured to format the user behavior data sample to obtain a formatted sample.
And the training subunit is used for training a preset model by using the user characteristic sample and the formatted sample to obtain a vectorization model.
Optionally, in some embodiments, the training subunit may be specifically configured to perform random low-dimensional dense vectorization on the user feature sample and the formatted sample by using a preset model to obtain a user vector sample and an information vector sample respectively; predicting the user behavior by using the user vector sample and the information vector sample to obtain a sample predicted value; and converging a preset model according to the sample predicted value and the sample true value to obtain a vectorization model.
Optionally, in some embodiments, the training subunit may specifically use an embedding layer to randomly generate a low-dimensional dense vector corresponding to each feature sample of the user to obtain feature vector samples, perform weighted summation on the feature vector samples according to the weight of each feature sample of the user to obtain a user vector sample, randomly generate a low-dimensional dense vector corresponding to the information sample identifier by using the embedding layer to obtain an information vector sample, then compute a vector dot product of the user vector sample and the information vector sample to obtain a sample vector value, map the sample vector value to a preset function interval to obtain a sample predicted value, and converge the preset model according to the sample predicted value and the sample true value to obtain the vectorization model.
Optionally, in some embodiments, the training unit may be specifically configured to divide the user behavior data sample into a positive behavior data sample and a negative behavior data sample, to obtain a user behavior data sample set, where the positive behavior data sample is a sample for executing a behavior in the user behavior data sample, and the negative behavior data sample is a sample other than executing the behavior in the user behavior data sample; and training a preset model by using the user data sample and the plurality of positive behavior data samples corresponding to the same negative behavior data sample to obtain a vectorization model.
In addition, the embodiment of the invention also provides a storage medium, which stores a plurality of instructions, wherein the instructions are suitable for being loaded by a processor to execute the steps in any information pushing method provided by the embodiment of the invention.
In the embodiment of the invention, the user data of the target user can be acquired; feature extraction is then performed on the user data to obtain user features; low-dimensional dense vectorization is performed on the user features and the information to be pushed to obtain a user vector and an information vector; the similarity between the information vector and the user vector is calculated; and the information to be pushed whose similarity meets the preset condition is pushed to the target user. The scheme can improve the accuracy of information pushing.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1a is a schematic view of a scenario of an information push method provided by an embodiment of the present invention;
FIG. 1b is a flowchart of an information push method according to an embodiment of the present invention;
FIG. 1c is a block diagram of a vectorized model provided by an embodiment of the present invention;
FIG. 1d is a block diagram of a preset model provided by an embodiment of the present invention;
FIG. 1e is a further block diagram of a vectorized model provided by an embodiment of the present invention;
FIG. 2a is a frame diagram of information push provided by an embodiment of the present invention;
FIG. 2b is a further block diagram of a default model provided by an embodiment of the present invention;
FIG. 2c is another flowchart of an information pushing method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an information pushing device according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a network device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The embodiment of the invention provides an information pushing method, an information pushing device and a storage medium. The information pushing device can be integrated in a network device, and the network device can be a server, a terminal, or other equipment.
For example, referring to fig. 1a, the network device integrated with the information pushing apparatus may first obtain the user data of a target user, then perform feature extraction on the user data to obtain user features, then perform low-dimensional dense vectorization (for example, generate low-dimensional dense vectors by using a vectorization model) on the user features and the information to be pushed (for example, popular content IP such as "screen X-pass" or "wave X-ball") to obtain a user vector and an information vector, then calculate the similarity between the information vector and the user vector, and then push the information to be pushed whose similarity meets a preset condition to the target user.
According to the scheme, the user data is utilized to conduct feature extraction, and then the degree of interest of the user on the information is obtained by analyzing the similarity between the extracted user features and the information to be pushed, and the information is pushed according to the degree of interest of the user, so that the accuracy of information pushing is effectively improved.
Each of these will be described in detail below. Note that the order in which the following embodiments are described is not intended to limit the preferred order of the embodiments.
The embodiment will be described from the perspective of an information pushing device, which may be specifically integrated in a network device, where the network device may be a server or a device such as a terminal; the terminal may include a mobile phone, a tablet computer, a notebook computer, a personal computer (Personal Computer, PC), and the like.
An information pushing method, comprising: obtaining user data of a target user, extracting features of the user data to obtain user features, carrying out low-dimensional dense vectorization on the user features and information to be pushed to obtain user vectors and information vectors, calculating the similarity between the information vectors and the user vectors, and pushing the information to be pushed, wherein the similarity meets preset conditions, to the target user.
As shown in fig. 1b, the specific flow of the information pushing method may be as follows:
101. user data of a target user is obtained.
For example, the information pushing device may itself collect the user data of the target user, or another device, such as a terminal, may collect the user data of the target user and provide it to the information pushing device; that is, the information pushing device may either acquire the user data of the target user directly or receive the user data of the target user sent by another device.
The target user refers to an object to which information needs to be pushed; for example, if information that Zhang San is interested in needs to be pushed to Zhang San, the target user is Zhang San. The user data refers to data that may reflect personal information of the user, such as the user's age, gender, interests, tags, and region.
102. And extracting the characteristics of the user data to obtain the user characteristics.
For example, the features of the user data may be unified into a preset format, the values of the feature parameters in the preset format are counted, the feature parameters include feature weights, and then the feature weights are normalized according to the counted values, so as to obtain the user features.
The setting mode of the preset format may be various, for example, may be flexibly set according to the actual application requirement, or may be preset and stored in the network device. In addition, the preset format may be built in the network device, or may be stored in a memory and transmitted to the network device, or the like.
For example, to facilitate subsequent processing, the features may be unified from the acquired user data into the format "user identification | feature group name \t feature ID:weight<space>feature ID:weight ... | feature group name \t feature ID:weight<space>feature ID:weight ...". Global statistics are then performed on the unified features, such as the number of occurrences of each feature, the weight maximum, the weight minimum, the weight average, and the weight standard deviation. The feature weights are then normalized according to the counted values to obtain the user features. The feature normalization may be performed by using the L1 norm (i.e., the sum of absolute values, also called the Manhattan distance), the L2 norm (i.e., the Euclidean norm), a z-score, and the like.
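As an illustration of the normalization step just described, the following is a minimal Python sketch (the function name and the toy weights are illustrative, not taken from the patent) of normalizing a feature group's weights by the L1 norm, the L2 norm, or a z-score:

```python
import math

def normalize_weights(weights, method="l1"):
    """Normalize a list of feature weights; a sketch of the step described above."""
    if method == "l1":        # divide by the sum of absolute values (Manhattan norm)
        total = sum(abs(w) for w in weights)
        return [w / total for w in weights] if total else weights
    if method == "l2":        # divide by the Euclidean norm
        norm = math.sqrt(sum(w * w for w in weights))
        return [w / norm for w in weights] if norm else weights
    if method == "zscore":    # subtract the mean, divide by the standard deviation
        mean = sum(weights) / len(weights)
        std = math.sqrt(sum((w - mean) ** 2 for w in weights) / len(weights))
        return [(w - mean) / std for w in weights] if std else [0.0] * len(weights)
    raise ValueError(f"unknown method: {method}")

# e.g. the raw weights 80 and 20 of one feature group become 0.8 and 0.2 under L1
print(normalize_weights([80, 20], method="l1"))  # [0.8, 0.2]
```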
103. And carrying out low-dimensional dense vectorization on the user characteristics and the information to be pushed to obtain a user vector and an information vector.
For example, the user features may include multiple features of the user, the information to be pushed may include an information identifier, specifically, each feature in the user features may be subjected to low-dimensional dense vectorization according to a vectorization model to obtain multiple feature vectors, then the feature vectors are weighted and summed according to the weight of each feature of the user to obtain a user vector, and then the low-dimensional dense vector corresponding to the information identifier is searched in the vectorization model to obtain an information vector.
In order to accelerate the model training and prediction process and reduce the memory consumption, compress the original high-dimensional feature vector into a lower-dimensional feature vector without losing the expression capability of the original feature as much as possible, the feature hashing method may be used to compress the high-dimensional feature vector of the user feature into a low-dimensional feature vector, that is, "perform low-dimensional dense vectorization on each feature in the user feature according to the vectorization model, to obtain a plurality of feature vectors" may include:
and carrying out hash transformation on each feature in the user features by using a vectorization model to obtain a plurality of hash features, and searching feature vectors corresponding to the hash features in the vectorization model to obtain a plurality of feature vectors.
Alternatively, the vectorization model may be trained from user data samples and user behavior data samples. It can be trained by other equipment and then provided to the information pushing device, or the information pushing device can train it itself. That is, before the step of "performing low-dimensional dense vectorization on each feature in the user features according to the vectorization model", the information pushing method may further include:
and acquiring a user data sample and a user behavior data sample, and training a preset model by using the user data sample and the user behavior data sample to obtain a vectorization model, wherein the vectorization model framework can be shown as a figure 1 c. For example, the following may be specifically mentioned:
A. user data samples and user behavior data samples are obtained.
For example, multiple sets of user data samples and user behavior data samples may be collected as training data; the training data may be obtained from a database or a network, such as a Hadoop distributed file system (Hadoop Distributed File System, HDFS), and then preprocessed, for example by performing feature extraction on the user data samples and formatting the user behavior data samples, so as to obtain data that meets the input criteria of the preset model.
Behavior data is an observational record of a subject's behavior and of the environment in which the behavior occurs; user behavior data mainly refers to various data about the user's behavior, such as browsing, clicking, and posting by the user on websites and in mobile phone software (Apps), and the environment in which that behavior occurs. Behavior data typically has three dimensions: time, frequency, and outcome. The time dimension is mainly concerned with the time period and duration in which the behavior occurs; the duration reflects the process of the behavior, recording its start and end times. The frequency dimension is mainly concerned with the number and trend of certain specific behaviors, where the number is strongly positively correlated with the user's interests. The outcome dimension is mainly concerned with whether a transaction is completed, whether content is forwarded or liked, and the like, and is used for judging the result of the user's clicking and browsing.
B. And extracting the characteristics of the user data sample to obtain a user characteristic sample.
For example, the features of the user data sample may be unified into a preset format, the values of the feature parameters in the preset format are counted, the feature parameters include feature weights, and then the feature weights are normalized according to the counted values, so as to obtain the user feature sample.
The setting mode of the preset format may be various, for example, may be flexibly set according to the actual application requirement, or may be preset and stored in the network device. In addition, the preset format may be built in the network device, or may be stored in a memory and transmitted to the network device, or the like.
For example, the features may be unified from the acquired user data samples into the format "user identification | feature group name \t feature ID:weight<space>feature ID:weight ... | feature group name \t feature ID:weight<space>feature ID:weight ...". Global statistics are then performed on the unified features, such as the number of occurrences, the weight maximum, the weight minimum, the weight average, and the weight standard deviation. The feature weights are then normalized according to the counted values to obtain the user feature samples. The feature normalization may be performed by using the L1 norm, the L2 norm, a z-score, or the like.
C. And formatting the user behavior data sample to obtain a formatted sample.
For example, the formatted samples may be built from a behavior log, where a formatted sample may be in the format (user identification, information identification, label). The information identification refers to the identification of the object on which the user performed the behavior. The user behavior data samples may be divided into positive behavior data samples and negative behavior data samples, and the label in the formatted sample may then be represented by 1 or 0, where 1 represents a positive example, i.e., a positive behavior data sample, and 0 represents a negative example, i.e., a negative behavior data sample. For example, users who operated on the information may be selected as positive behavior data samples, which are then up-sampled to obtain formatted samples, and users who did not operate on the information, i.e., users other than those who operated on the information, may be selected as negative behavior data samples. The negative behavior data samples may be randomly sampled according to a preset condition, for example, that the user's recent behavior can be acquired and the number of features in the user data is greater than a preset threshold, so as to obtain the formatted samples.
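A minimal sketch of this sample construction follows. All names and parameters are assumptions for illustration (the feature-count threshold of 20 echoes the later embodiment); negatives are drawn at random from eligible users who did not act on the information:

```python
import random

def build_samples(click_log, all_users, user_feature_counts,
                  min_features=20, neg_per_pos=1):
    """Build (user_id, info_id, label) training triples from a behavior log.
    click_log: iterable of (user_id, info_id) pairs for users who acted on the info.
    Positives get label 1; negatives are randomly sampled from the remaining
    eligible users (enough features; recent activity is assumed already filtered)."""
    samples, clicked = [], set()
    for user_id, info_id in click_log:
        samples.append((user_id, info_id, 1))          # positive behavior data sample
        clicked.add((user_id, info_id))
    eligible = [u for u in all_users if user_feature_counts.get(u, 0) >= min_features]
    if not eligible:
        return samples
    for _, info_id in click_log:
        for _ in range(neg_per_pos):
            neg_user = random.choice(eligible)
            if (neg_user, info_id) not in clicked:     # user did not act on this info
                samples.append((neg_user, info_id, 0)) # negative behavior data sample
    return samples

log = [("u1", "ip9"), ("u2", "ip9")]
samples = build_samples(log, all_users=["u1", "u2", "u3", "u4"],
                        user_feature_counts={"u1": 30, "u2": 25, "u3": 40, "u4": 50})
```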
D. Training a preset model by using the user characteristic sample and the formatted sample to obtain a vectorization model.
For example, a preset model may be specifically adopted to perform random low-dimensional dense vectorization on the user feature sample and the formatted sample, so as to obtain a user vector sample and an information vector sample respectively; predicting the user behavior by using the user vector sample and the information vector sample to obtain a sample predicted value; and converging a preset model according to the sample predicted value and the sample true value to obtain a vectorization model.
For example, to improve the training efficiency of the model and reduce memory consumption, the high-dimensional feature vectors of the user feature samples and the formatted samples may be compressed into low-dimensional feature vectors. For example, the preset model may include an input layer, an embedding layer, an intermediate layer and an output layer, with the structure shown in fig. 1d, and the embedding layer of the preset model may be used to randomly generate user vector samples and information vector samples of preset dimensions. That is, "performing random low-dimensional dense vectorization on the user feature samples and the formatted samples by using the preset model to obtain user vector samples and information vector samples" may include:
Randomly generating a low-dimensional dense vector corresponding to each characteristic sample of the user by using the embedded layer to obtain a characteristic vector sample; carrying out weighted summation on the feature vector samples according to the weight of each feature sample of the user to obtain user vector samples; and randomly generating a low-dimensional dense vector corresponding to the information sample identifier by using the embedded layer to obtain an information vector sample.
For example, to obtain an accurate predicted value, a vector dot product may first be computed from the user vector sample and the information vector sample, and the resulting value mapped to between 0 and 1 by a sigmoid function. That is, "predicting the user behavior by using the user vector sample and the information vector sample to obtain a sample predicted value" may include:
performing vector dot product by using the user vector sample and the information vector sample to obtain a sample vector value; and mapping the sample vector value to a preset function interval to obtain a sample predicted value.
The cross entropy of the sample predicted value and the sample true value, for example the logistic loss, is then calculated; the embedding layer is adjusted by back-propagating the gradient so as to achieve convergence, and the adjusted parameters are stored in the embedding layer, finally yielding the vectorization model.
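A compact sketch of one such training step, under the assumption of plain SGD on the logistic loss (the patent does not name a specific optimizer), might look as follows. Table sizes are toy values, and feature ids are assumed distinct within one sample:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 32
feat_emb = rng.normal(0, 0.01, size=(1000, DIM))  # feature embedding table (toy size)
info_emb = rng.normal(0, 0.01, size=(50, DIM))    # one vector per information identifier

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(feat_ids, feat_weights, info_id, label, lr=0.05):
    """One SGD step: weighted-sum user vector, dot product with the info vector,
    sigmoid to (0, 1), logistic (cross-entropy) loss, gradients back to embeddings."""
    w = np.asarray(feat_weights)[:, None]
    user_vec = (feat_emb[feat_ids] * w).sum(axis=0)   # user vector sample
    pred = sigmoid(user_vec @ info_emb[info_id])      # sample predicted value
    grad = pred - label                               # d(loss)/d(logit) for log loss
    info_grad = grad * user_vec
    feat_grad = grad * info_emb[info_id]
    info_emb[info_id] -= lr * info_grad               # adjust the embedding layer
    feat_emb[feat_ids] -= lr * w * feat_grad          # back-propagate to feature vectors
    return -(label * np.log(pred + 1e-12) + (1 - label) * np.log(1 - pred + 1e-12))

loss = train_step([3, 17], [1.0, 0.8], info_id=5, label=1)
```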
To further improve the training efficiency of the model and save disk space, the model may be trained in a way that shares negative examples, for example, a plurality of positive behavior data samples sharing one negative behavior data sample, as shown in fig. 1e. Because the negative examples are randomly generated, there is no need to generate negative examples separately for each attribute: multiple models can share one set of negative examples, the shared negative examples are split after being read (split count = number of attributes), and the split negative examples are used in turn to train the attributes, thereby reducing Input/Output (IO) operations and storage. The training of the preset model by using the user data sample and the user behavior data sample to obtain a vectorization model may include:
dividing the user behavior data sample into a positive behavior data sample and a negative behavior data sample, and obtaining a user behavior data sample set, wherein the positive behavior data sample is a sample for executing behaviors in the user behavior data sample, and the negative behavior data sample is a sample except for executing the behaviors in the user behavior data sample; and training a preset model by using the user data sample and the plurality of positive behavior data samples corresponding to the same negative behavior data sample to obtain a vectorization model.
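The sharing scheme can be sketched as follows; `attributes` stands for the per-attribute models mentioned above, and all names and sizes are illustrative:

```python
def split_shared_negatives(shared_negatives, attributes):
    """Distribute one shared pool of randomly sampled negative examples across
    the models of several attributes: split count = number of attributes, and
    each attribute's model trains on its own slice, so the negatives are
    generated and stored only once (reducing IO operations and storage)."""
    n = len(attributes)
    chunk = len(shared_negatives) // n
    return {attr: shared_negatives[i * chunk:(i + 1) * chunk]
            for i, attr in enumerate(attributes)}

slices = split_shared_negatives(list(range(90)), ["ip_a", "ip_b", "ip_c"])
# each of the three attribute models receives 30 of the 90 shared negatives
```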
In order to verify the accuracy of the model, the user data samples can be divided into a validation set and a test set; training is performed using the validation set and the user behavior data samples, and the trained model is then scored using the test set to obtain the model evaluation index AUC (Area Under the Curve).
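For reference, AUC can be computed from labels and predicted scores with the rank-sum formulation; a minimal sketch (ties between scores are ignored for brevity):

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation.
    labels are 1/0 sample true values; scores are model predicted values."""
    pairs = sorted(zip(scores, labels))               # sort ascending by score
    rank_sum = sum(i + 1 for i, (_, y) in enumerate(pairs) if y == 1)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(auc([1, 0, 1, 0], [0.9, 0.3, 0.8, 0.4]))  # 1.0 for perfectly separated scores
```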
In order to verify the correctness of the model, the top features of the information also need to be calculated. For example, the vectorization model is used to calculate the top features of the information, and the correctness of the model is judged according to the calculation result.
104. And calculating the similarity of the information vector and the user vector.
There are various ways to calculate the similarity between the information vector and the user vector; for example, the cosine value of the information vector and the user vector may be calculated to obtain the similarity between the information vector and the user vector.
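A one-function sketch of the cosine-similarity calculation described here:

```python
import numpy as np

def cosine_similarity(info_vec, user_vec):
    """Cosine of the angle between the information vector and the user vector."""
    denom = np.linalg.norm(info_vec) * np.linalg.norm(user_vec)
    return float(info_vec @ user_vec / denom) if denom else 0.0
```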
105. And pushing the information to be pushed, the similarity of which meets the preset condition, to the target user.
For example, the information to be pushed, the similarity of which meets the preset condition, may be stored in a key-value (KV) storage system, and the information to be pushed is pushed to the user when the user waits for refreshing.
The preset conditions may be set in various manners, for example, may be flexibly set according to the actual application requirement, or may be preset and stored in the network device. In addition, the preset condition may be built in the network device, or may be stored in a memory and transmitted to the network device, or the like. The predetermined condition may be that the similarity is greater than a predetermined threshold, for example, the similarity is greater than 80%, and so on. The preset threshold value can be flexibly set according to the requirements of practical application, and can be preset and stored in the network equipment.
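Combining the similarity calculation with the threshold and storage steps, the selection logic might be sketched as follows; the KV system is stood in for by a plain dict, and the threshold and top-N values echo the examples in the text rather than fixed requirements:

```python
import numpy as np

def select_for_push(candidates, user_vec, threshold=0.8, top_n=20):
    """candidates: dict mapping info_id -> information vector (np.ndarray).
    Keep the information whose cosine similarity with the user vector exceeds
    the threshold, best first; the returned dict stands in for the KV store
    consulted when the user refreshes."""
    scored = []
    for info_id, vec in candidates.items():
        denom = np.linalg.norm(vec) * np.linalg.norm(user_vec)
        sim = float(vec @ user_vec / denom) if denom else 0.0
        if sim > threshold:                     # preset condition on similarity
            scored.append((info_id, sim))
    scored.sort(key=lambda t: t[1], reverse=True)
    return dict(scored[:top_n])                 # write these to the KV system
```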
As can be seen from the foregoing, in this embodiment, user data of a target user may be obtained first, then feature extraction is performed on the user data to obtain user features, then low-dimensional dense vectorization is performed on the user features and information to be pushed to obtain a user vector and an information vector, then similarity between the information vector and the user vector is calculated, and then the information to be pushed, whose similarity satisfies a preset condition, is pushed to the target user; according to the scheme, the user data is utilized to conduct feature extraction, and then the degree of interest of the user on the information is obtained by analyzing the similarity between the extracted user features and the information to be pushed, and the information is pushed according to the degree of interest of the user, so that the accuracy of information pushing can be effectively improved compared with the scheme that the information is pushed only by means of clicking behaviors of the user.
The method described in the previous embodiment is described in further detail below by way of example.
In this embodiment, the information pushing device is specifically integrated in a network device, the information to be pushed is specifically a popular content intellectual property (IP), and the vectorization model is specifically an IP vectorization (Intellectual Property to Vector, IP2Vec) model; a specific framework of the information pushing may be as illustrated in fig. 2a.
Firstly, training the IP2Vec model is needed, which can be specifically as follows:
(1) The network device obtains a user data sample and a user behavior data sample.
For example, multiple sets of user data samples and user behavior data samples may be collected as training data, for example, training data may be obtained from a Hadoop distributed file system, and then feature extraction and sample construction may be performed on the training data to obtain data meeting input criteria of a preset model.
(2) The network equipment trains a preset model by utilizing the user data sample and the user behavior data sample to obtain an IP2Vec model; for example, the following may be specifically mentioned:
A. and extracting the characteristics of the user data sample to obtain a user characteristic sample.
For example, the features of the user data sample may be unified into a preset format, the values of the feature parameters in the preset format are counted, the feature parameters include feature weights, and then the feature weights are normalized according to the counted values, so as to obtain the user feature sample.
For example, the features may be unified from the acquired user data samples into the format "user identification | feature group name \t feature ID:weight<space>feature ID:weight ...", where the feature groups may be age, gender, interest, tag, and so on, for example: "uin\t sex_1 age_2 country_China province_Henan city_Zhengzhou bizcategory_1009:80 bizcategory_2004:20 usercategory_1009:90 usercategory_2004:10 fobbiz_11111111:100 fobbiz_222222:30". Global statistics are then performed on the unified features, such as the number of occurrences, the weight maximum, the weight minimum, the weight average, and the weight standard deviation. The feature weights are then normalized to obtain the user features; the normalization may use the L1 norm, the L2 norm, a z-score, or the like. For example, the normalized user feature is "uin\t sex_1 age_2 country_China province_Henan city_Zhengzhou bizcategory_1009:0.8 bizcategory_2004:0.2 usercategory_1009:0.9 usercategory_2004:0.1 fobbiz_11111111:1.0 fobbiz_222222:0.3".
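To make the record layout concrete, here is a hedged sketch of parsing such a line into (feature, weight) pairs; it assumes the simplified single-group layout of the example above (a tab after uin, space-separated "feature:weight" tokens, weight 1.0 for unweighted categorical features), which may differ from the exact delimiters of a production system:

```python
def parse_user_features(line):
    """Parse one 'uin \t feature records' line into (uin, [(feature, weight)])."""
    uin, _, rest = line.partition("\t")
    pairs = []
    for token in rest.split():
        if ":" in token:
            feat, _, w = token.rpartition(":")  # split on the last colon
            pairs.append((feat, float(w)))
        else:
            pairs.append((token, 1.0))          # categorical feature without a weight
    return uin, pairs

uin, feats = parse_user_features(
    "10001\tsex_1 age_2 bizcategory_1009:0.8 bizcategory_2004:0.2")
```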
B. The network device formats the user behavior data sample to obtain a formatted sample.
For example, a formatted sample may be constructed from a behavior log, in the format (uin, ip_id, label), where uin is the user identifier, ip_id is the IP identifier, and label is the label, with 1 representing a positive example and 0 a negative example. For example, users who clicked on the IP may be taken as positive examples, and negative examples may be randomly sampled from users who never clicked on the IP, are active (had exposure in the last two weeks), and have more than 20 features. For example, a formatted sample may be "10001\t 1\t sex_1 age_2 country_China province_Henan city_Zhengzhou bizcategory_1009:1.0 bizcategory_2004:0.17508813161 usercategory_1009:1.0 usercategory_2004:0.17508813161 fobbiz_11111111:1.0 fobbiz_222222:0.5868896552".
C. And the network equipment trains the preset model by utilizing the user characteristic sample and the formatted sample to obtain an IP2Vec model.
For example, a preset model may be specifically adopted to perform random low-dimensional dense vectorization on the user feature sample and the formatted sample, so as to obtain a user vector sample and an information vector sample; predicting the user behavior by using the user vector sample and the information vector sample to obtain a sample predicted value; and converging a preset model according to the sample predicted value and the sample true value to obtain an IP2Vec model.
For example, to improve the training efficiency of the model and reduce memory consumption, the high-dimensional feature vectors of the user feature samples and the formatted samples may be compressed into low-dimensional feature vectors. For example, the preset model may include an input layer, an embedding layer, an intermediate layer and an output layer, with the structure shown in fig. 2b, and the embedding layer may be used to randomly generate user vector samples and information vector samples of preset dimensions. For example: input layer: the user feature dimension is 10 million and the ip_id dimension is 1000. Embedding layer: the embedding dimension is a 32-dimensional float type. Intermediate layer: the user vector sample (user embedding) dimension is a 32-dimensional float type, and the information vector sample (IP embedding) dimension is a 32-dimensional float type. Output layer: the sample predicted value is a 1-dimensional float type. A low-dimensional dense vector (such as a 32-dimensional float vector) corresponding to each feature sample of the user can be randomly generated by the embedding layer; for example, a hash transformation is performed on each feature in the user feature sample to obtain a plurality of hash features (feature_hash), in the form "feature_hash:weight<space>feature_hash:weight ...", and the corresponding feature vector samples (feature_embedding) are obtained, in the form "feature_embedding:weight<space>feature_embedding:weight ...". The feature vector samples are weighted and summed according to the weight of each feature sample of the user to obtain the user vector sample (user embedding), and a low-dimensional dense vector corresponding to the information sample identifier (ip_id) is randomly generated by the embedding layer to obtain the information vector sample (IP embedding). A vector dot product is made of the user vector sample (user embedding) and the information vector sample (IP embedding), and the value obtained by the dot product is mapped to between 0 and 1 through a sigmoid function to obtain the output, i.e., the sample predicted value. Then, the cross entropy, for example the logistic loss, is calculated from the output and the label (i.e., the sample true value); the embedding layer is adjusted by back-propagating the gradient so as to achieve convergence, and the adjusted parameters are stored in the embedding layer, finally yielding the IP2Vec model.
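Putting these pieces together, the following is a toy end-to-end sketch of a model with the layer shapes just listed (hash-compressed user features, 32-dimensional float embeddings for users and IPs, sigmoid output). The class name, bucket count, and random initialization are assumptions; a real IP2Vec model would load trained embedding parameters instead:

```python
import zlib
import numpy as np

class IP2VecSketch:
    """Toy version of the four-layer structure above: input (sparse user
    features + ip_id), embedding layer, intermediate user/IP embeddings,
    sigmoid output. The real feature space is ~10 million dimensions;
    hashing compresses it into `buckets` embedding rows."""

    def __init__(self, buckets=2**20, num_ips=1000, dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.feat_emb = rng.normal(0, 0.01, (buckets, dim)).astype(np.float32)
        self.ip_emb = rng.normal(0, 0.01, (num_ips, dim)).astype(np.float32)
        self.buckets = buckets

    def user_embedding(self, features):
        """features: (feature_id, weight) pairs -> 32-dim user embedding."""
        vec = np.zeros(self.feat_emb.shape[1], dtype=np.float32)
        for feat_id, weight in features:
            row = zlib.crc32(feat_id.encode()) % self.buckets  # feature_hash
            vec += weight * self.feat_emb[row]                 # weighted sum
        return vec

    def predict(self, features, ip_id):
        """Output layer: sigmoid of the user/IP dot product, a value in (0, 1)."""
        score = self.user_embedding(features) @ self.ip_emb[ip_id]
        return 1.0 / (1.0 + np.exp(-score))

model = IP2VecSketch()
p = model.predict([("sex_1", 1.0), ("bizcategory_1009", 0.8)], ip_id=7)
```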
To further improve the training efficiency of the model and save disk space, the model may be trained in a way that shares negative examples, for example, a plurality of positive behavior data samples sharing one negative behavior data sample. Because the negative examples are randomly generated, there is no need to generate negative examples separately for each attribute: multiple models can share one set of negative examples, the shared negative examples are split after being read (split count = number of attributes), and the split negative examples are used in turn to train the attributes, thereby reducing Input/Output (IO) operations and storage. The training of the preset model by using the user data sample and the user behavior data sample to obtain a vectorization model may include:
dividing the user behavior data sample into a positive behavior data sample and a negative behavior data sample, and obtaining a user behavior data sample set, wherein the positive behavior data sample is a sample for executing behaviors in the user behavior data sample, and the negative behavior data sample is a sample except for executing the behaviors in the user behavior data sample; and training a preset model by using the user data sample and the plurality of positive behavior data samples corresponding to the same negative behavior data sample to obtain a vectorization model.
In order to verify the accuracy of the model, the user data samples can be divided into a validation set and a test set; training is performed using the validation set and the user behavior data samples, and the trained model is then scored using the test set to obtain the model evaluation index AUC (Area Under the Curve). It should be noted that the complexity of the training algorithm is (number of attributes) x (sample size of each attribute) x (average number of features), i.e., 5000 x 41 x 171, and 20 hours of training time are required on a single 16-core machine. With the shared-negative-example approach, the training time is improved by a factor of 4, the storage is reduced to 1/count(attribute) of the original, and the AUC shows no significant change.
In order to verify the correctness of the model, the top features of an IP also need to be calculated. For example, calculating the top features of the IP "aircraft carrier" yields domestic aircraft carrier, J-20 fighter, the Liaoning warship, electromagnetic catapult, J-11D, H-6K bomber, Su-30, nuclear power, intercontinental ballistic missile, aircraft, and the like, showing that the top features found after training are more accurate and the content is richer.
And secondly, through the trained IP2Vec model, the interested IP can be pushed to the target user, and a specific flow chart can be seen in FIG. 2c.
As shown in fig. 2c, a specific flow of an information pushing method may be as follows:
201. the network device obtains user data of the target user.
For example, the user data of the target user may be acquired by the network device. For example, the age, sex, interests, tags, and territories of the user are obtained.
202. And the network equipment performs feature extraction on the user data to obtain user features.
For example, the network device may specifically unify the features of the user data into a preset format, count the values of each feature parameter in the preset format, where the feature parameters include feature weights, and normalize the feature weights according to the counted values to obtain the user features.
For example, to facilitate subsequent processing, the features may be unified from the acquired user data into the format "user identification | feature group name \t feature ID:weight<space>feature ID:weight ... | feature group name \t feature ID:weight<space>feature ID:weight ...", for example: "uin\t sex_1 age_22 country_China province_Guangdong bizcategory_1018:90 bizcategory_2014:30 usercategory_1018:80 usercategory_2014:10 fobbiz_11111111:100 fobbiz_222222:40". Global statistics are then performed on the unified features, such as the number of occurrences, the weight maximum, the weight minimum, the weight average, and the weight standard deviation. The feature weights are then normalized to obtain the user features. The feature normalization may be performed by using the L1 norm (i.e., the sum of absolute values, also called the Manhattan distance), the L2 norm (i.e., the Euclidean norm), a z-score, and the like. For example, the normalized user feature is "uin\t sex_1 age_22 country_China province_Guangdong bizcategory_1018:0.9 bizcategory_2014:0.3 usercategory_1018:0.8 usercategory_2014:0.1 fobbiz_11111111:1.0 fobbiz_222222:0.4".
203. And the network equipment performs low-dimensional dense vectorization on each feature in the user features according to the vectorization model to obtain a plurality of feature vectors.
For example, the user features include multiple features of the user, and the network device may specifically perform a hash transformation on each feature in the user features by using the vectorization model to obtain multiple hash features, in the form "feature_hashid:weight<space>feature_hashid:weight ...", and look up the feature vectors (embeddings) corresponding to the multiple hash features (feature_hashid) in the vectorization model to obtain multiple feature vectors, in the form "feature_embedding:weight<space>feature_embedding:weight ...".
204. The network device performs weighted summation on the feature vector according to the weight of each feature of the user to obtain the user vector.
For example, the network device may specifically obtain the weight of each feature of the user, and perform weighted summation on the feature vectors according to the weight of each feature to obtain the user vector (user embedding).
205. And the network equipment searches the low-dimensional dense vector corresponding to the IP identifier in the IP2Vec model to obtain an information vector.
For example, the IP may include an IP identifier; the network device may obtain the ID of the IP from the network, input the ip_id into the IP2Vec model, and directly look up the information vector (IP embedding) corresponding to the ip_id in the embedding layer of the IP2Vec model.
206. The network device calculates a similarity of the information vector and the user vector.
For example, the cosine values of the information vector (IP embedding) and the user vector (user embedding) may be calculated, to obtain the similarity between the information vector and the user vector.
207. And the network equipment pushes the information to be pushed, the similarity of which meets the preset condition, to the target user.
For example, the IPs whose similarity to the target user is greater than a preset threshold, for example greater than 80%, are stored in a key-value (KV) storage system; that is, the IPs with the highest similarity are intercepted, for example the top 20 entries, and when the user refreshes, the content of these IPs is pushed to the user.
In addition, it should be noted that recommending IPs of interest to users with this scheme improved user click-through rates in the XX application. For example, in the XX application video recommendation service: main timeline (TL) video exposure click-through rate +0.58%; XX application top5 click-through rate +0.29%; video-interest-triggered main TL article click-through rate +0.44%; XX application new-user top5 click-through rate +3.15%; XX application new-user top15 click-through rate +1.79%. In the XX application image-text recommendation service: per-user mp article reading duration (minutes) +0.54; per-user mp effective reading duration +0.53; per-user mp article topic count +0.37%; per-user mp article recommended tag count +0.48%; per-user mp article recommended category count +0.15%; and so on.
As can be seen from the foregoing, in this embodiment, user data of a target user may be obtained first, then feature extraction is performed on the user data to obtain user features, then low-dimensional dense vectorization is performed on the user features and information to be pushed to obtain a user vector and an information vector, then similarity between the information vector and the user vector is calculated, and then the information to be pushed, whose similarity satisfies a preset condition, is pushed to the target user; according to the scheme, the user data is utilized to conduct feature extraction, then the similarity between the extracted user features and the information to be pushed is analyzed to obtain the interest degree of the user on the information, and the information is pushed according to the interest degree of the user, so that compared with the scheme of pushing the information only by means of clicking behaviors of the user, the accuracy of information pushing can be effectively improved, meanwhile, due to the fact that negative examples are randomly sampled, the scheme adopts a mode that a plurality of positive examples share one negative example, and therefore disk space can be saved and training efficiency can be improved.
In order to better implement the above method, correspondingly, the embodiment of the invention also provides an information pushing device, which can be specifically integrated in a network device, wherein the network device can be a server or a terminal and other devices.
For example, as shown in fig. 3, the information pushing apparatus may include an acquisition unit 301, a feature extraction unit 302, a vectorization unit 303, a calculation unit 304, and a pushing unit 305, as follows:
(1) An acquisition unit 301;
an acquisition unit 301, configured to acquire user data of a target user.
For example, the information pushing device may specifically acquire the user data of the target user, or may further provide the acquired user data of the target user to the acquiring unit 301 after acquiring the user data of the target user by other devices, such as a terminal, that is, the acquiring unit 301 specifically acquires the user data of the target user, or may receive the user data of the target user sent by other devices.
(2) A feature extraction unit 302;
the feature extraction unit 302 is configured to perform feature extraction on the user data to obtain a user feature.
For example, the feature extraction unit 302 may be specifically configured to unify the features of the user data into a preset format, count values of various feature parameters in the preset format, where the feature parameters include feature weights, and normalize the feature weights according to the counted values to obtain the user features. The feature extraction process may be specifically referred to the previous embodiments, and will not be described herein.
(3) A vectorization unit 303;
and the vectorization unit 303 is configured to perform low-dimensional dense vectorization on the user feature and the information to be pushed, so as to obtain a user vector and an information vector respectively.
Optionally, in some embodiments, the user features may include a plurality of features of the user, the information to be pushed may include an information identifier, and the vectorizing unit may include a user vectorization subunit and an information vectorization subunit, as follows:
and the user vectorization subunit is used for carrying out low-dimensional dense vectorization on each feature in the user features according to the vectorization model to obtain a plurality of feature vectors, and carrying out weighted summation on the feature vectors according to the weight of each feature of the user to obtain the user vectors.
For example, in order to accelerate the model training and prediction process, reduce memory consumption, and compress the original high-dimensional feature vector into a lower-dimensional feature vector without losing the expressive power of the original features as far as possible, the user vectorization subunit may specifically be configured to perform a hash transformation on each feature in the user features by using the vectorization model to obtain a plurality of hash features, and look up the feature vectors corresponding to the hash features in the vectorization model to obtain a plurality of feature vectors.
And the information vectorization subunit is used for searching the low-dimensional dense vector corresponding to the information identifier in the vectorization model to obtain an information vector.
Alternatively, the vectorization model may be trained from multiple sets of user data samples and user behavior data samples. The information pushing device can be provided for the information pushing device after training by other equipment, or the information pushing device can perform training by itself; that is, the information pushing device may further include an acquisition unit and a training unit, as follows:
an obtaining unit 306, configured to obtain a user data sample and a user behavior data sample.
For example, the obtaining unit 306 may specifically collect a plurality of sets of user data samples and user behavior data samples as training data, for example, may obtain training data from a database or a network, for example, may obtain training data from a Hadoop distributed file system (Hadoop Distributed File System, HDFS), and then perform preprocessing on the training data, for example, may perform feature extraction on the user data samples, format the user behavior data samples, and so on, so as to obtain data that meets an input standard of a preset model.
The training unit is used for training the preset model by utilizing the user data sample and the user behavior data sample to obtain a vectorization model.
Optionally, in some embodiments, the training unit may include a feature extraction subunit, a sample formatting subunit, and a training subunit, as follows:
and the feature extraction subunit is used for carrying out feature extraction on the user data sample to obtain a user feature sample.
For example, the feature extraction subunit may be specifically configured to unify the features of the user data sample into a preset format, count values of each feature parameter in the preset format, where the feature parameter includes a feature weight, and normalize the feature weight according to the counted value to obtain the user feature sample.
And the sample formatting subunit is used for formatting the user behavior data sample to obtain a formatted sample.
For example, the sample formatting subunit may specifically construct formatted samples based on the behavior log, where the formatted samples may be in the format of (user identification, information identification, tag). Wherein, the user behavior data sample can be divided into a positive behavior data sample and a negative behavior data sample, and then the label in the formatted sample can be represented by 1 or 0, wherein 1 represents a positive example, namely the positive behavior data sample, and 0 represents a negative example, namely the negative behavior data sample. For example, a user who operates the information may be selected as a positive behavior data sample, and then the positive behavior data sample is up-sampled to obtain a formatted sample, and a user who does not operate the information may be selected as a negative behavior data sample, i.e., a user other than the user who operates the information is selected as a negative behavior data sample. The negative behavior data sample may be randomly sampled according to a preset condition, for example, the preset condition may be that a user who can acquire recent behavior and the feature number of the user data is greater than a preset threshold value, so as to obtain a formatted sample.
And the training subunit is used for training the preset model by utilizing the user characteristic sample and the formatted sample to obtain a vectorization model.
For example, the training subunit may be specifically configured to perform random low-dimensional dense vectorization on the user feature sample and the formatted sample by using a preset model to obtain a user vector sample and an information vector sample; predicting the user behavior by using the user vector sample and the information vector sample to obtain a sample predicted value; and converging a preset model according to the sample predicted value and the sample true value to obtain a vectorization model.
Optionally, in order to improve the training efficiency of the model and reduce memory consumption, the training subunit may compress the high-dimensional feature vectors of the user feature samples and the formatted samples into low-dimensional feature vectors. Specifically, it may randomly generate, by using an embedding layer, a low-dimensional dense vector corresponding to each feature sample of the user to obtain feature vector samples; perform a weighted sum over the feature vector samples according to the weight of each feature sample of the user to obtain a user vector sample; randomly generate, by using the embedding layer, a low-dimensional dense vector corresponding to the information sample identifier to obtain an information vector sample; compute a vector dot product of the user vector sample and the information vector sample to obtain a sample vector value; map the sample vector value to a preset function interval to obtain a sample predicted value; and converge the preset model according to the sample predicted value and the sample true value to obtain the vectorization model.
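A compact numpy sketch of this step is given below: randomly initialized low-dimensional dense vectors stand in for the embedding layer, a weighted sum produces the user vector sample, a vector dot product and a sigmoid mapping produce the sample predicted value, and a simple log-loss gradient step stands in for the convergence procedure. The dimensionality, learning rate, and update rule are illustrative choices, not the only way to converge the preset model.

import numpy as np

rng = np.random.default_rng(0)
DIM = 8          # assumed low dimension of the dense vectors
feat_emb = {}    # feature name -> low-dimensional dense vector (embedding layer)
item_emb = {}    # information sample identifier -> low-dimensional dense vector

def embed(table, key):
    # Randomly generate a dense vector the first time a key is seen.
    if key not in table:
        table[key] = rng.normal(scale=0.1, size=DIM)
    return table[key]

def predict(user_features, info_id):
    # user_features: dict mapping feature name -> normalized weight.
    user_vec = sum(w * embed(feat_emb, f) for f, w in user_features.items())
    score = user_vec @ embed(item_emb, info_id)   # sample vector value (dot product)
    return 1.0 / (1.0 + np.exp(-score))           # map into the interval (0, 1)

def train_step(user_features, info_id, label, lr=0.05):
    # One log-loss gradient step pulling the prediction toward the sample true value.
    p = predict(user_features, info_id)
    err = p - label
    u = sum(w * embed(feat_emb, f) for f, w in user_features.items())
    v = embed(item_emb, info_id)
    item_emb[info_id] = v - lr * err * u
    for f, w in user_features.items():
        feat_emb[f] = feat_emb[f] - lr * err * w * v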
To further improve the training efficiency of the model and save disk space, the model may be trained with shared negative examples, that is, a plurality of positive behavior data samples share one negative behavior data sample. Because the negative examples are randomly generated, they do not need to be generated separately for each attribute: multiple models can share one negative example. The shared negative example is split after being read, the number of splits being equal to the number of attributes, and the split negative examples are used in turn to train each attribute, which reduces Input/Output (IO) operations and storage. The training unit is specifically configured to divide the user behavior data samples into positive behavior data samples and negative behavior data samples to obtain a user behavior data sample set, where a positive behavior data sample is a sample in which the behavior was executed and a negative behavior data sample is a sample other than those in which the behavior was executed; and to train a preset model by using the user data samples and a plurality of positive behavior data samples corresponding to the same negative behavior data sample, so as to obtain the vectorization model.
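A sketch of this splitting, under the assumption that the shared negative batch is a plain list of formatted negative samples and that each attribute corresponds to a named sub-model:

def split_shared_negatives(shared_negatives, attributes):
    # Number of splits == number of attributes; each attribute trains on one slice.
    n = len(attributes)
    per = len(shared_negatives) // n
    return {attr: shared_negatives[i * per:(i + 1) * per]
            for i, attr in enumerate(attributes)}

# Example: split_shared_negatives(list(range(9)), ["age", "gender", "interest"])
# returns {"age": [0, 1, 2], "gender": [3, 4, 5], "interest": [6, 7, 8]}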
In order to verify the accuracy of the model, the user data samples can be divided into a validation set and a test set: the model is trained with the validation set and the user behavior data samples, and the trained model is then scored with the test set to obtain the model evaluation index Area Under the Curve (AUC).
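For illustration, such a held-out evaluation might look like the following, using scikit-learn's roc_auc_score; the 80/20 split and the sample layout are assumptions of the sketch.

from sklearn.metrics import roc_auc_score

def evaluate_auc(samples, predict_fn, split=0.8):
    # samples: list of (user_features, info_id, label); predict_fn scores a pair.
    cut = int(len(samples) * split)
    test = samples[cut:]                        # held-out test set
    y_true = [label for _, _, label in test]
    y_score = [predict_fn(feats, info) for feats, info, _ in test]
    return roc_auc_score(y_true, y_score)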
To verify the correctness of the model, the information features also need to be calculated. For example, the vectorization model is used to calculate the hot (popularity) features of the information, and the correctness of the model is judged from the calculation results.
(4) A calculation unit 304;
a calculating unit 304, configured to calculate a similarity between the information vector and the user vector.
For example, the calculating unit may be specifically configured to calculate the cosine value of the information vector and the user vector, so as to obtain the similarity between the information vector and the user vector.
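A minimal sketch of this cosine computation over the low-dimensional dense vectors:

import numpy as np

def cosine_similarity(info_vec, user_vec):
    # Cosine of the angle between the information vector and the user vector.
    denom = np.linalg.norm(info_vec) * np.linalg.norm(user_vec)
    return float(info_vec @ user_vec / denom) if denom else 0.0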
(5) A pushing unit 305;
and the pushing unit 305 is configured to push the information to be pushed whose similarity meets the preset condition to the target user.
For example, the pushing unit may be specifically configured to store the information to be pushed whose similarity meets the preset condition in a key-value (KV) storage system and push it to the user when the user refreshes.
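An illustrative sketch of this staging-and-refresh flow, with a plain dict standing in for the KV storage system and a 0.8 similarity threshold assumed purely for the example:

kv_store = {}   # stand-in for the key-value (KV) storage system

def stage_push(user_id, candidates, threshold=0.8):
    # candidates: list of (info_id, similarity); keep those meeting the condition.
    kv_store[user_id] = [info for info, sim in candidates if sim >= threshold]

def on_refresh(user_id):
    # Deliver the staged information when the user refreshes.
    return kv_store.pop(user_id, [])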
In specific implementation, the above units may be implemented as independent entities, or combined arbitrarily and implemented as one or several entities; for the specific implementation of each unit, reference may be made to the foregoing method embodiments, which are not repeated here.
As can be seen from the foregoing, in this embodiment, the acquiring unit 301 may acquire the user data of the target user; the feature extraction unit 302 performs feature extraction on the user data to obtain user features; the vectorization unit 303 performs low-dimensional dense vectorization on the user features and the information to be pushed to obtain a user vector and an information vector; the calculation unit 304 calculates the similarity between the information vector and the user vector; and the pushing unit 305 pushes the information to be pushed whose similarity satisfies the preset condition to the target user. Because this scheme extracts features from the user data, derives the user's degree of interest in the information by analyzing the similarity between the extracted user features and the information to be pushed, and pushes information according to that degree of interest, it can effectively improve the accuracy of information pushing compared with schemes that push information based only on users' click behavior.
In addition, the embodiment of the present invention further provides a network device, as shown in fig. 4, which shows a schematic structural diagram of the network device according to the embodiment of the present invention, specifically:
the network device may include a processor 401 with one or more processing cores, a memory 402 with one or more computer-readable storage media, a power supply 403, an input unit 404, and other components. Those skilled in the art will appreciate that the network device structure shown in fig. 4 does not limit the network device, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Wherein:
the processor 401 is a control center of the network device, connects various parts of the entire network device using various interfaces and lines, performs various functions of the network device and processes data by running or executing software programs and/or modules stored in the memory 402, and calling data stored in the memory 402. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly processes an operating system, a user interface, an application program, etc., and the modem processor mainly processes wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the network device, and the like. In addition, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The network device further comprises a power supply 403 for supplying power to the various components. Preferably, the power supply 403 may be logically connected to the processor 401 through a power management system, so that charging, discharging, and power consumption management are implemented through the power management system. The power supply 403 may also include one or more of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and other components.
The network device may also include an input unit 404, which input unit 404 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the network device may further include a display unit or the like, which is not described herein. In this embodiment, the processor 401 in the network device loads executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 401 executes the application programs stored in the memory 402, so as to implement various functions as follows:
obtaining user data of a target user; performing feature extraction on the user data to obtain user features; performing low-dimensional dense vectorization on the user features and the information to be pushed to obtain a user vector and an information vector; calculating the similarity between the information vector and the user vector; and pushing, to the target user, the information to be pushed whose similarity meets the preset condition.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
As can be seen from the foregoing, in this embodiment, user data of a target user may first be obtained; feature extraction is then performed on the user data to obtain user features; low-dimensional dense vectorization is performed on the user features and the information to be pushed to obtain a user vector and an information vector; the similarity between the information vector and the user vector is calculated; and the information to be pushed whose similarity satisfies the preset condition is pushed to the target user. Because this scheme extracts features from the user data, derives the user's degree of interest in the information by analyzing the similarity between the extracted user features and the information to be pushed, and pushes information according to that degree of interest, it can effectively improve the accuracy of information pushing compared with schemes that push information based only on users' click behavior.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present invention further provides a storage medium storing a plurality of instructions capable of being loaded by a processor to perform any of the steps in the information pushing method provided by the embodiment of the present invention. For example, the instructions may perform the steps of:
obtaining user data of a target user; performing feature extraction on the user data to obtain user features; performing low-dimensional dense vectorization on the user features and the information to be pushed to obtain a user vector and an information vector; calculating the similarity between the information vector and the user vector; and pushing, to the target user, the information to be pushed whose similarity meets the preset condition.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Wherein the storage medium may include: a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disc, and the like.
Since the instructions stored in the storage medium can execute the steps of any information pushing method provided by the embodiments of the present invention, the beneficial effects that can be achieved by any information pushing method provided by the embodiments of the present invention can likewise be achieved; for details, reference may be made to the foregoing embodiments, which are not repeated here.
The foregoing has described in detail the information pushing method, apparatus, and storage medium provided by the embodiments of the present invention. Specific examples have been used herein to explain the principles and implementations of the present invention, and the descriptions of the above embodiments are intended only to help understand the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope according to the ideas of the present invention. In summary, the contents of this description should not be construed as limiting the present invention.

Claims (14)

1. An information pushing method is characterized by comprising the following steps:
acquiring user data of a target user;
extracting the characteristics of the user data to obtain user characteristics;
carrying out low-dimensional dense vectorization on the user features and the information to be pushed by adopting a vectorization model to obtain a user vector and an information vector, wherein the vectorization model is a model obtained by training a preset model with positive behavior samples and negative behavior samples, a plurality of positive behavior samples share the same negative behavior sample, the negative behavior sample is split into at least one split negative behavior sample, the number of splits of the negative behavior sample is the same as the number of attributes of the preset model, and the split negative behavior samples are used in sequence to train the preset model on each attribute;
calculating the similarity of the information vector and the user vector;
and pushing the information to be pushed, the similarity of which meets the preset condition, to the target user.
2. The method of claim 1, wherein the user features include a plurality of features of a user, the information to be pushed includes an information identifier, and the performing low-dimensional dense vectorization on the user features and the information to be pushed to obtain a user vector and an information vector includes:
performing low-dimensional dense vectorization on each feature in the user features according to the vectorization model to obtain a plurality of feature vectors;
carrying out weighted summation on the feature vectors according to the weight of each feature of the user to obtain user vectors;
and searching the low-dimensional dense vector corresponding to the information identifier in the vectorization model to obtain an information vector.
3. The method of claim 2, wherein performing low-dimensional dense vectorization on each feature in the user features according to the vectorization model to obtain a plurality of feature vectors comprises:
carrying out hash transformation on each feature in the user features by using a vectorization model to obtain a plurality of hash features;
and searching feature vectors corresponding to the hash features in the vectorization model to obtain a plurality of feature vectors.
4. The method of claim 2, further comprising, before performing low-dimensional dense vectorization on each feature in the user features according to the vectorization model:
acquiring a user data sample and a user behavior data sample;
and training a preset model by using the user data sample and the user behavior data sample to obtain a vectorization model.
5. The method of claim 4, wherein training a preset model by using the user data sample and the user behavior data sample to obtain a vectorization model comprises:
performing feature extraction on the user data sample to obtain a user feature sample;
formatting the user behavior data sample to obtain a formatted sample;
training a preset model by using the user feature sample and the formatted sample to obtain a vectorization model.
6. The method of claim 5, wherein the formatted sample comprises a sample true value, and training a preset model by using the user feature sample and the formatted sample to obtain a vectorization model comprises:
carrying out random low-dimensional dense vectorization on the user feature sample and the formatted sample by adopting a preset model to respectively obtain a user vector sample and an information vector sample;
predicting the user behavior by using the user vector sample and the information vector sample to obtain a sample predicted value;
and converging a preset model according to the sample predicted value and the sample true value to obtain a vectorization model.
7. The method of claim 6, wherein the formatted sample comprises an information sample identifier, the preset model comprises an embedding layer, and performing random low-dimensional dense vectorization on the user feature sample and the formatted sample by adopting the preset model comprises:
randomly generating, by using the embedding layer, a low-dimensional dense vector corresponding to each feature sample of the user to obtain feature vector samples;
carrying out weighted summation on the feature vector samples according to the weight of each feature sample of the user to obtain user vector samples;
and randomly generating, by using the embedding layer, a low-dimensional dense vector corresponding to the information sample identifier to obtain an information vector sample.
8. The method of claim 6, wherein predicting the user behavior by using the user vector sample and the information vector sample to obtain a sample predicted value comprises:
performing a vector dot product of the user vector sample and the information vector sample to obtain a sample vector value;
and mapping the sample vector value to a preset function interval to obtain a sample predicted value.
9. The method of claim 4, wherein training a preset model by using the user data sample and the user behavior data sample to obtain a vectorization model comprises:
dividing the user behavior data sample into a positive behavior data sample and a negative behavior data sample to obtain a user behavior data sample set, wherein the positive behavior data sample is a sample in which the behavior is executed, and the negative behavior data sample is a sample other than the samples in which the behavior is executed;
and training a preset model by using the user data sample and the plurality of positive behavior data samples corresponding to the same negative behavior data sample to obtain a vectorization model.
10. The method according to any one of claims 1 to 9, wherein the extracting features from the user data to obtain user features includes:
unifying the features of the user data into a preset format;
counting the values of all feature parameters in the preset format, wherein the feature parameters comprise feature weights;
and normalizing the feature weights according to the counted values to obtain the user features.
11. The method according to any one of claims 1 to 9, wherein said calculating the similarity of the information vector and the user vector comprises:
and calculating cosine values of the information vector and the user vector to obtain the similarity of the information vector and the user vector.
12. An information pushing apparatus, characterized by comprising:
an acquisition unit configured to acquire user data of a target user;
a feature extraction unit configured to perform feature extraction on the user data to obtain user features;
a vectorization unit configured to perform low-dimensional dense vectorization on the user features and the information to be pushed by using a vectorization model to respectively obtain a user vector and an information vector, wherein the vectorization model is a model obtained by training a preset model with positive behavior samples and negative behavior samples, a plurality of positive behavior samples share the same negative behavior sample, the negative behavior sample is split into at least one split negative behavior sample, the number of splits of the negative behavior sample is the same as the number of attributes of the preset model, and the split negative behavior samples are used in sequence to train the preset model on each attribute;
a calculation unit configured to calculate the similarity between the information vector and the user vector;
and a pushing unit configured to push the information to be pushed whose similarity meets the preset condition to the target user.
13. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the information pushing method of any of claims 1 to 11.
14. A network device comprising a processor and a memory, the memory storing an application, the processor being configured to run the application in the memory to perform the steps in the information push method of any of claims 1 to 11.
CN201910398975.7A 2019-05-14 2019-05-14 Information pushing method, device and storage medium Active CN110119477B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910398975.7A CN110119477B (en) 2019-05-14 2019-05-14 Information pushing method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910398975.7A CN110119477B (en) 2019-05-14 2019-05-14 Information pushing method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110119477A CN110119477A (en) 2019-08-13
CN110119477B true CN110119477B (en) 2024-02-27

Family

ID=67522334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910398975.7A Active CN110119477B (en) 2019-05-14 2019-05-14 Information pushing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110119477B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159563B (en) * 2019-12-31 2024-02-09 广州市百果园信息技术有限公司 Method, device, equipment and storage medium for determining user interest point information
CN111325237B (en) * 2020-01-21 2024-01-05 中国科学院深圳先进技术研究院 Image recognition method based on attention interaction mechanism
CN111401980B (en) * 2020-02-19 2021-07-02 北京值得买科技股份有限公司 Method and device for improving sample sequencing diversity
CN111553759A (en) * 2020-03-25 2020-08-18 平安科技(深圳)有限公司 Product information pushing method, device, equipment and storage medium
JP7204903B2 (en) 2020-03-31 2023-01-16 バイドゥ オンライン ネットワーク テクノロジー(ペキン) カンパニー リミテッド INFORMATION PUSH METHOD, DEVICE, DEVICE AND STORAGE MEDIUM
CN111475721B (en) * 2020-03-31 2023-12-29 百度在线网络技术(北京)有限公司 Information pushing method, device, equipment and storage medium
CN113010777B (en) * 2021-03-05 2022-10-14 腾讯科技(深圳)有限公司 Data pushing method, device, equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140279753A1 (en) * 2013-03-13 2014-09-18 Dstillery, Inc. Methods and system for providing simultaneous multi-task ensemble learning
CN105956146A (en) * 2016-05-12 2016-09-21 腾讯科技(深圳)有限公司 Article information recommending method and device
CN106649774A (en) * 2016-12-27 2017-05-10 北京百度网讯科技有限公司 Artificial intelligence-based object pushing method and apparatus
CN107305677A (en) * 2016-04-25 2017-10-31 北京京东尚科信息技术有限公司 Product information method for pushing and device
CN109345302A (en) * 2018-09-27 2019-02-15 腾讯科技(深圳)有限公司 Machine learning model training method, device, storage medium and computer equipment
CN109492156A (en) * 2018-10-24 2019-03-19 宿州元化信息科技有限公司 A kind of Literature pushing method and device
CN109522483A (en) * 2018-11-14 2019-03-26 北京百度网讯科技有限公司 Method and apparatus for pushed information


Also Published As

Publication number Publication date
CN110119477A (en) 2019-08-13

Similar Documents

Publication Publication Date Title
CN110119477B (en) Information pushing method, device and storage medium
CN109086303B (en) Intelligent conversation method, device and terminal based on machine reading understanding
CN108717408B (en) Sensitive word real-time monitoring method, electronic equipment, storage medium and system
US11645517B2 (en) Information processing method and terminal, and computer storage medium
CN109033408B (en) Information pushing method and device, computer readable storage medium and electronic equipment
CN111611478B (en) Information recommendation method and device and electronic equipment
CN110909165A (en) Data processing method, device, medium and electronic equipment
CN112104642B (en) Abnormal account number determination method and related device
KR20180137932A (en) Method and system for expansion to daily life language by using word vectorization technique based on social network content
CN111405030B (en) Message pushing method and device, electronic equipment and storage medium
CN111885399A (en) Content distribution method, content distribution device, electronic equipment and storage medium
CN110363427A (en) Model quality evaluation method and apparatus
CN112905897B (en) Similar user determination method, vector conversion model, device, medium and equipment
CN112131430A (en) Video clustering method and device, storage medium and electronic equipment
CN113641797A (en) Data processing method, device, equipment, storage medium and computer program product
CN115730125A (en) Object identification method and device, computer equipment and storage medium
CN115131052A (en) Data processing method, computer equipment and storage medium
CN113015010A (en) Push parameter determination method, device, equipment and computer readable storage medium
CN110516153B (en) Intelligent video pushing method and device, storage medium and electronic device
CN111966885B (en) User portrait construction method and device
CN112115354A (en) Information processing method, information processing apparatus, server, and storage medium
CN113836390A (en) Resource recommendation method and device, computer equipment and storage medium
CN113392920B (en) Method, apparatus, device, medium, and program product for generating cheating prediction model
CN114357242A (en) Training evaluation method and device based on recall model, equipment and storage medium
CN116415624A (en) Model training method and device, and content recommendation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant