WO2021027362A1

WO2021027362A1 - Information pushing method and apparatus based on data analysis, computer device, and storage medium

Info

Publication number: WO2021027362A1
Application number: PCT/CN2020/092856
Authority: WO
Inventors: 卢显锋
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-08-13
Filing date: 2020-05-28
Publication date: 2021-02-18
Also published as: CN110688553A

Abstract

An information pushing method and apparatus based on data analysis, a computer device, and a storage medium. The method relates to an artificial intelligence technology, and is applied to the field of prediction models in intelligent decisions. The method comprises: collecting behavior data of a user in a web crawler manner (S110); performing feature engineering processing on the behavior data in a one-hot coding and normalization manner to obtain target data (S120); inputting the target data into a pre-trained potential user mining model to output a potential user prediction value, the potential user prediction value being used for representing the possibility that the user belongs to a potential user (S130); and comparing the potential user prediction value with a preset threshold to determine the potential user and push information to the potential user (S140). By implementing the method, the accuracy of mining potential users to be insured can be improved, advertisements can be effectively pushed, and the cost of obtaining user information by an enterprise is reduced.

Description

Information push method, device, computer equipment and storage medium based on data analysis

This application claims priority to the Chinese patent application No. 201910745385.7, filed with the Chinese Patent Office on August 13, 2019 and entitled "Information push method, device, computer equipment and storage medium based on data analysis", the entire content of which is incorporated in this application by reference.

Technical field

This application relates to the field of artificial intelligence technology, and in particular to an information push method, device, computer equipment and storage medium based on data analysis.

Background technique

With the development of technology and the economy, living standards are improving and people expect an ever higher quality of life. Cars have gradually become an indispensable part of people's lives, and car insurance provides protection for both cars and their owners. Existing auto insurance customers usually learn about auto insurance through channels such as 4S shops or auto maintenance shops and then purchase it. However, the inventor realized that this way of acquiring customers relies on a single channel and typically reaches only customers who already have a firm need for auto insurance, so information about potential customers cannot be obtained. Online customers who are willing to purchase insurance are usually mined from their browsing history, but this mining method has low accuracy and high cost, and it is difficult to identify genuine potential users.

Summary of the invention

The embodiments of the present application provide an information push method, device, computer equipment, and storage medium based on data analysis, aiming to solve the problem of low accuracy in mining online customers willing to apply for insurance.

In a first aspect, an embodiment of the present application provides an information push method based on data analysis, which includes: collecting user behavior data through a web crawler; performing feature engineering processing on the behavior data by means of one-hot encoding and normalization to obtain target data; inputting the target data into a pre-trained potential user mining model to output a potential user predicted value, the potential user predicted value being used to characterize the possibility that the user is a potential user; and comparing the potential user predicted value with a preset threshold to determine the potential user and push information to the potential user.

In a second aspect, an embodiment of the present application also provides an information push device based on data analysis, which includes: a crawler unit for collecting user behavior data through a web crawler; a feature engineering unit for performing feature engineering processing on the behavior data by means of one-hot encoding and normalization to obtain target data; a prediction unit for inputting the target data into a pre-trained potential user mining model to output a potential user predicted value, the potential user predicted value being used to characterize the possibility that the user is a potential user; and a pushing unit for comparing the potential user predicted value with a preset threshold to determine the potential user and push information to the potential user.

In a third aspect, an embodiment of the present application also provides a computer device, which includes a memory and a processor, a computer program being stored on the memory. When the processor executes the computer program, it implements: collecting user behavior data through a web crawler; performing feature engineering processing on the behavior data by means of one-hot encoding and normalization to obtain target data; inputting the target data into a pre-trained potential user mining model to output a potential user predicted value, the potential user predicted value being used to characterize the possibility that the user is a potential user; and comparing the potential user predicted value with a preset threshold to determine the potential user and push information to the potential user.

In a fourth aspect, an embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements: collecting user behavior data through a web crawler; performing feature engineering processing on the behavior data by means of one-hot encoding and normalization to obtain target data; inputting the target data into a pre-trained potential user mining model to output a potential user predicted value, the potential user predicted value being used to characterize the possibility that the user is a potential user; and comparing the potential user predicted value with a preset threshold to determine the potential user and push information to the potential user. Optionally, the computer-readable storage medium may be a non-volatile computer-readable storage medium.

The embodiments of the application collect user behavior data, process the data through feature engineering, use the potential user mining model to predict from the processed data which users are potential users, and push advertisements to those potential users. This can improve the accuracy of mining potential insurance applicants, push advertisements effectively, and reduce the cost of obtaining user information.

Description of the drawings

FIG. 1 is a schematic diagram of an application scenario of an information push method based on data analysis provided by an embodiment of the application;

FIG. 2 is a schematic flowchart of an information push method based on data analysis provided by an embodiment of the application;

FIG. 3 is a schematic diagram of a sub-flow of an information push method based on data analysis provided by an embodiment of the application;

FIG. 4 is a schematic diagram of a sub-flow of the information push method based on data analysis provided by an embodiment of the application;

FIG. 5 is a schematic diagram of a sub-flow of an information pushing method based on data analysis provided by an embodiment of the application;

FIG. 6 is a schematic flowchart of an information push method based on data analysis provided by another embodiment of the application;

FIG. 7 is a schematic block diagram of an information pushing device based on data analysis provided by an embodiment of the application;

FIG. 8 is a schematic block diagram of specific units of an information push device based on data analysis provided by an embodiment of the application;

FIG. 9 is a schematic block diagram of an information pushing device based on data analysis provided by another embodiment of the application; and

FIG. 10 is a schematic block diagram of a computer device provided by an embodiment of the application.

Detailed description

The technical solutions in the embodiments of the present application will be described below in conjunction with the drawings in the embodiments of the present application.

It should be understood that the term "and/or" used in the description of this application and the appended claims refers to any combination of one or more of the associated listed items and all possible combinations, and includes these combinations.

Please refer to FIG. 1 and FIG. 2. FIG. 1 is a schematic diagram of an application scenario of an information push method based on data analysis provided by an embodiment of the application. FIG. 2 is a schematic flowchart of that method. The potential user mining is applied to the terminal 10 and is realized through the interaction between the terminal 10 and the server 20.

As shown in FIG. 2, the method includes the following steps S110-S140.

S110. Collect user behavior data through a web crawler.

In one embodiment, the user's behavior data refers to data recorded by the network when the user performs some behavior on it, for example, when the user searches for compulsory traffic insurance on Taobao. A web crawler is a program or script that automatically crawls information on the World Wide Web according to certain rules. Specifically, some specific webpages are first selected as start pages, and webpages are crawled from the start pages by means of the web crawler. After crawling is complete, the large number of crawled webpages is filtered to obtain the target webpages. A target webpage is a webpage that users will browse. Finally, the behavior data of users browsing the target webpage is obtained from the preset database of the target webpage.

In an embodiment, as shown in FIG. 3, the step S110 may include steps: S111-S113.

S111. Crawling a preset webpage by means of a web crawler.

Specifically, a web crawler is a program that automatically captures information on the World Wide Web in accordance with certain rules, and mainly comprises three parts: collection, storage, and processing. First, the URL of a representative webpage is selected as the initial URL, and data is fetched from the server starting from that URL. The preset webpage is this representative webpage. The initial URL is chosen from the customer's point of view: since customers usually use a search engine to look for car insurance information, the search result page for compulsory traffic insurance on Baidu, or the corresponding search result page on Taobao, can be used as the initial URL. The crawled webpages are then stored, parsed, and filtered. The page crawled from the initial URL contains new URLs; it is parsed to extract them, and the URLs related to insurance are selected. For example, the URL of an insurance FAQ page is placed in the queue of URLs waiting to be crawled, and the remaining irrelevant URLs are discarded. Finally, the URL of the next webpage to crawl is taken from the queue, and the above process is repeated until the entire network has been traversed.
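The queue-and-filter loop of step S111 can be sketched as follows. This is a minimal illustration under stated assumptions, not the crawler of the application: the link graph is a hypothetical in-memory dictionary standing in for fetched and parsed pages, all URLs are invented, and relevance is approximated by a keyword test on the URL.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real crawler would fetch each page and parse its HTML instead.
LINK_GRAPH = {
    "https://search.example/q=compulsory-traffic-insurance": [
        "https://insurer-a.example/insurance/faq",
        "https://news.example/sports",
    ],
    "https://insurer-a.example/insurance/faq": [
        "https://insurer-a.example/insurance/quote",
        "https://search.example/q=compulsory-traffic-insurance",
    ],
    "https://insurer-a.example/insurance/quote": [],
    "https://news.example/sports": ["https://news.example/weather"],
}

# Assumed relevance rule: a URL is insurance-related if it contains a keyword.
KEYWORDS = ("insurance",)


def is_relevant(url):
    return any(keyword in url for keyword in KEYWORDS)


def crawl(initial_url, link_graph):
    """Crawl breadth-first, enqueueing only new, insurance-related URLs."""
    queue = deque([initial_url])   # URLs waiting to be crawled
    seen = {initial_url}           # avoids crawling the same URL twice
    crawled = []
    while queue:
        url = queue.popleft()
        crawled.append(url)        # "storage" step, reduced to a list here
        for link in link_graph.get(url, []):
            if link not in seen and is_relevant(link):
                seen.add(link)
                queue.append(link)  # irrelevant links are simply discarded
    return crawled


pages = crawl("https://search.example/q=compulsory-traffic-insurance", LINK_GRAPH)
print(pages)
```

The loop ends when the queue is empty, which corresponds to having traversed every reachable relevant page.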

S112. Filter the crawled webpages according to a preset webpage index to obtain a target webpage.

Specifically, since the crawled webpages include a large number of worthless webpages, they must be filtered further so that valuable webpages, that is, webpages that users are likely to browse, are selected as target webpages. The crawled webpages are evaluated and filtered according to the preset webpage index to obtain the target webpages. The preset webpage index is the webpage index provided by a data sharing platform based on the massive search and browsing behavior data of major search engines. It is a value obtained by a series of evaluations of a website's browsing data (view volume, browsing duration, and number of views). For example, the official website of Insurance Company A has a webpage index of 89. The preset webpage index of each crawled webpage is obtained, the crawled webpages are sorted from high to low by that index, and the top ten webpages are selected as the target webpages. It is understandable that another number of webpages may also be selected as target webpages.
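The ranking and selection in step S112 amounts to a sort followed by a cut-off. The sketch below assumes the webpage index has already been retrieved into a dictionary; apart from the value 89 quoted above, the sites and index values are hypothetical.

```python
def select_target_pages(page_index, top_n=10):
    """Sort pages by webpage index, highest first, and keep the top_n."""
    ranked = sorted(page_index.items(), key=lambda item: item[1], reverse=True)
    return [url for url, _ in ranked[:top_n]]


# Hypothetical index values; only the 89 for Insurance Company A is from the text.
page_index = {
    "insurer-a.example": 89,
    "forum.example": 42,
    "insurer-b.example": 76,
    "blog.example": 15,
}

targets = select_target_pages(page_index, top_n=2)
print(targets)
```

The `top_n` parameter reflects the remark that a number other than ten may be chosen.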

S113. Obtain user behavior data from a preset database according to the target webpage.

Specifically, the preset database is the database that stores the target webpage and all data related to it. After the target webpage is obtained by screening, the interface of the target webpage is called according to its URL (the interface is provided with the consent of the operator of the target webpage) to retrieve the webpage log of the target webpage from the preset database. The retrieved webpage log is then parsed to obtain the user's behavior data, which includes user information, user browsing records, and the user's IP address.

S120: Perform feature engineering processing on the behavior data by means of one-hot encoding and normalization to obtain target data.

In one embodiment, feature engineering refers to the process of transforming original data into target data for a model. Commonly used feature engineering methods include timestamp processing, decomposition of category attributes, binning/partitioning, feature crossing, feature selection, feature scaling, and feature extraction. Behavior data falls mainly into two categories. One is numerical behavior data, such as car age, browsing time, and annual income. The other is non-numerical behavior data, such as adding to favorites, commenting, following, and adding to a shopping cart. Specifically, non-numerical behavior data is converted into target data for model input by decomposing category attributes, and numerical behavior data is converted into target data for model input by feature scaling.

In an embodiment, as shown in FIG. 4, the step S120 may include steps S121-S122.

S121: Perform one-hot encoding on the non-numerical behavior data to obtain target data.

Specifically, for non-numerical features, feature engineering is performed by decomposing category attributes, that is, by encoding the behavior data with one-hot encoding. One-hot encoding uses an N-bit status register to encode N states: each state has its own register bit, and only one bit is valid at any time. For example, the gender attribute has the values male and female. After one-hot encoding, the target data for "male" is [1,0] and the target data for "female" is [0,1]. Likewise, for whether the user has added the webpage to favorites, the target data for "favorite" is [1,0] and the target data for "not favorite" is [0,1].
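A minimal sketch of the encoding in step S121 is given below. The function name and the fixed category order are illustrative assumptions; the category order determines which bit is set, and here it is chosen to reproduce the gender and favorite examples above.

```python
def one_hot(value, categories):
    """Return a list with a single 1 at the position of value in categories."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


# Category orders are assumptions chosen to match the examples in the text.
GENDER = ["male", "female"]
FAVORITE = ["favorite", "not favorite"]

print(one_hot("male", GENDER))            # [1, 0]
print(one_hot("female", GENDER))          # [0, 1]
print(one_hot("not favorite", FAVORITE))  # [0, 1]
```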

S122: Normalize the numerical behavior data according to a preset formula to obtain target data.

Specifically, for numerical features, feature engineering is performed by feature scaling. Some numerical features, such as annual income, span a much wider range of values than others, such as age. To prevent features from differing greatly in magnitude, the feature values need to be scaled to the same range. Specifically, a preset formula is used to normalize the numerical behavior data, and the preset formula is as follows:

X′=(X-minX)/(maxX-minX)

Here, X′ is the normalized feature value, X is the current user's value of the feature, minX is the minimum value of that feature, and maxX is the maximum value of that feature. For example, if the maximum annual income is 500,000, the minimum annual income is 60,000, and the current user's annual income is 100,000, then normalization gives (100,000-60,000)/(500,000-60,000), which is approximately 0.09, a normalized feature value in the range of 0 to 1.
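The preset formula is ordinary min-max scaling, and the annual income example can be checked with a few lines. The function name and the guard against a zero range are illustrative additions, not part of the application.

```python
def min_max_normalize(x, min_x, max_x):
    """Scale x into [0, 1] using X' = (X - minX) / (maxX - minX)."""
    if max_x == min_x:
        raise ValueError("maxX and minX must differ")
    return (x - min_x) / (max_x - min_x)


# Annual income example from the text: min 60,000, max 500,000, user 100,000.
income = min_max_normalize(100_000, 60_000, 500_000)
print(round(income, 2))  # 0.09
```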

S130. Input the target data into a pre-trained potential user mining model to output a potential user prediction value, where the potential user prediction value is used to characterize the possibility that the user belongs to a potential user.

In one embodiment, the potential user mining model is constructed using the gradient boosting decision tree algorithm (Gradient Boosting Decision Tree). A gradient boosting decision tree is an ensemble decision tree algorithm in which multiple decision trees are connected in series: each decision tree learns from the residual of the previous decision tree, the residual is obtained from the gradient, and all the decision trees together form the gradient boosting decision tree.

For example, suppose potential users are predicted from two features, user age and user annual income. Users A, B, C, and D are aged 18, 26, 36, and 41, and have annual incomes of 0, 300,000, 100,000, and 500,000 respectively. The first decision tree splits on age at 30, placing A and B in the under-30 category and C and D in the over-30 category. The predicted values of A, B, C, and D as potential users are 0.1, 0.3, 0.6, and 0.8 respectively. The residual within a category is the difference between each predicted value and the category average: the average for A and B is 0.2, so the residuals of A and B are -0.1 and 0.1, and the average for C and D is 0.7, so the residuals of C and D are -0.1 and 0.1. The next decision tree is fitted to the residuals of the previous tree. It splits on annual income at 150,000, placing A and C below 150,000 and B and D above. The residual value obtained by this tree for A and C is 0, that is, (-0.1+0.1)/2=0, and the residual value for B and D is also 0, so the residuals of all users are finally 0. The final predicted values of A, B, C, and D are 0, 0.4, 0.5, and 0.9 respectively, each being the sum of the predicted value and the residual. The core of the prediction is that each tree learns the residual of the sum of the conclusions of all previous trees.

The potential user mining model has been pre-trained and is run on the Spark platform to predict from the target data. Spark is a fast, general-purpose computing engine designed for large-scale data processing. The Spark platform includes the algorithm component Spark MLlib (Machine Learning Library), whose algorithm library contains the gradient boosting decision tree algorithm. Spark MLlib provides an interface to this algorithm for predicting from the target data.

In an embodiment, as shown in FIG. 5, the step S130 may include steps: S131-S132.

S131. Construct a target sample according to the target data.

Specifically, a target sample is a sample for model input composed of target data and a label. Target samples are divided into positive samples and negative samples: the label value of a positive sample is 1 and the label value of a negative sample is 0. For example, a positive sample may be a customer whose annual income is greater than or equal to 100,000, and a negative sample may be a customer who has not purchased a car. If the customer's annual income is 100,000, the target sample is (0.09, 1), where 0.09 is the feature value and 1 is the label value. If the customer has not purchased a car, the target sample is (0, 0).

S132. Input the target sample into the gradient boosting decision tree model to iteratively update and output the predicted value of the potential user.

Specifically, the potential user mining model adopts the gradient boosting decision tree algorithm, which proceeds through multiple rounds of iteration. Each round produces one decision tree, which is fitted on the basis of the loss function of the decision tree from the previous round. Finally, the conclusions of all decision trees are added up to obtain the predicted value. The formulas of the gradient boosting decision tree algorithm are as follows:

F_m(x) = F_{m-1}(x) + T(x; \theta_m)

L[y, F(x)] = [y - F(x)]^2

Here, F_m(x) represents the model, T(x; θ_m) represents a decision tree, θ_m is the decision tree parameter, m is the number of decision trees, L is the loss function, x is the sample feature, and y is the sample label. The sample feature and the sample label constitute the target sample, the label value is 0 or 1, and i is the number of samples. T is a CART decision tree, a typical binary decision tree that can be used for classification or regression. Specifically, the model is first initialized by setting F_0(x) = 0. The loss function is then calculated from the target samples, the model is updated according to the loss function, and iteration continues until it ends, giving the final model. Finally, the predicted values of the decision trees in the model are summed and averaged to obtain the predicted value of the potential user.
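The iteration described above can be sketched in plain Python. This is an illustration under several assumptions, not the trained model of the application: each T is a depth-1 regression tree (a stump) rather than a full CART tree, the learning rate is 1, the training data is hypothetical, and the final prediction is taken as the plain sum of tree outputs, following the additive update F_m(x) = F_{m-1}(x) + T(x; θ_m) literally. With squared loss, fitting each new tree to the residuals y - F_{m-1}(x) is the standard way to apply that update.

```python
def fit_stump(rows, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    for j in range(len(rows[0])):
        values = sorted({row[j] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [r for row, r in zip(rows, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(rows, residuals) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    if best is None:  # no split possible: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, row):
    j, threshold, left_mean, right_mean = stump
    return left_mean if row[j] <= threshold else right_mean


def fit_gbdt(rows, labels, n_trees):
    """Build the model additively, starting from F_0(x) = 0."""
    trees = []
    current = [0.0] * len(rows)
    for _ in range(n_trees):
        # For squared loss, the residual y - F(x) is what the next tree fits.
        residuals = [y - f for y, f in zip(labels, current)]
        stump = fit_stump(rows, residuals)
        trees.append(stump)
        current = [f + stump_predict(stump, row) for f, row in zip(current, rows)]
    return trees


def predict(trees, row):
    return sum(stump_predict(tree, row) for tree in trees)


def squared_loss(trees, rows, labels):
    return sum((y - predict(trees, row)) ** 2 for row, y in zip(rows, labels))


# Hypothetical target samples: one normalized feature per row, label 0 or 1.
rows = [[0.0], [0.3], [0.6], [1.0]]
labels = [0, 1, 1, 0]

losses = [squared_loss(fit_gbdt(rows, labels, m), rows, labels) for m in range(4)]
print(losses)
```

Because no single split separates these labels, the loss falls over several rounds instead of reaching zero after the first tree, which shows each tree correcting what the previous ones left unexplained.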

S140: Compare the predicted value of the potential user with a preset threshold to determine the potential user and push information to the potential user.

In one embodiment, after the predicted value of the potential user is obtained, it is compared with a preset threshold. If the predicted value is greater than the preset threshold, the user is determined to be a potential user; if the predicted value is less than the preset threshold, the user is determined to be a non-potential user. For example, if the preset threshold is 0.6 and the predicted value of the user is 0.8, the predicted value is greater than the preset threshold and the user is determined to be a potential user. Once the potential users have been identified, advertisements are pushed to them. The advertisements pushed can be insurance information, auto insurance product information, insurance application links, and so on. Specifically, the list of potential users and the advertisement link are sent to the operator of the target webpage, and the operator pushes the advertisement link according to the user's IP address when a potential user logs in and browses the webpage.
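The comparison in step S140 can be sketched as a simple filter. The user identifiers and scores are hypothetical. The text does not say how a predicted value exactly equal to the threshold is treated; this sketch assumes it does not count as a potential user.

```python
def select_potential_users(predicted_values, threshold=0.6):
    """Return the users whose predicted value is strictly above the threshold."""
    return [user for user, value in predicted_values.items() if value > threshold]


# Hypothetical predicted values; 0.6 and 0.8 mirror the example in the text.
predicted_values = {"user_1": 0.8, "user_2": 0.35, "user_3": 0.6}

potential_users = select_potential_users(predicted_values, threshold=0.6)
print(potential_users)  # ['user_1']
```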

In one embodiment, as shown in FIG. 6, after the step S140, it further includes steps: S150-S160.

S150. Obtain a feedback result of the advertisement push.

In one embodiment, the feedback result indicates whether the potential user has opened the advertisement link pushed by the target webpage. If the user opens the pushed advertisement link, the feedback is positive; if the user does not open it, the feedback is negative. Specifically, the feedback result is obtained from the target webpage, where it is stored in the preset database of the target webpage operator in the form of a webpage log. The interface is therefore called to retrieve the webpage log from the preset database of the target webpage, and the log is parsed. A regular expression with the URL of the pushed advertisement link as its rule string is then used to filter out, from the webpage log, the browsing records of the advertisement link; these browsing records are the feedback result.
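The regular-expression filtering in step S150 can be sketched as follows. The log line layout, IP addresses, and advertisement URL are all hypothetical, since the application does not specify the webpage log format. The URL is escaped so that characters such as "." and "?" are matched literally.

```python
import re


def find_ad_views(log_lines, ad_url):
    """Return the log lines that record a visit to the pushed advertisement link."""
    pattern = re.compile(re.escape(ad_url))  # the ad URL is the rule string
    return [line for line in log_lines if pattern.search(line)]


# Hypothetical log lines in an assumed "IP METHOD URL" layout.
AD_URL = "https://ads.example/auto-insurance?id=42"
log_lines = [
    "203.0.113.7 GET https://ads.example/auto-insurance?id=42",
    "203.0.113.8 GET https://insurer-a.example/insurance/faq",
    "203.0.113.9 GET https://ads.example/auto-insurance?id=42",
    "203.0.113.9 GET https://adsXexample/auto-insurance?id=42",
]

ad_views = find_ad_views(log_lines, AD_URL)
print(len(ad_views))  # 2
```

The last log line shows why escaping matters: without `re.escape`, the unescaped "." would match the "X" and, combined with other metacharacters, could produce false matches.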

S160. Prompt optimization of the potential user mining model via email according to the feedback result.

In one embodiment, whether the potential user mining model needs to be optimized is judged mainly by the conversion rate. The conversion rate is the ratio of the number of potential users who viewed the pushed advertisement link to the number of all potential users. The more potential users view the advertisement link, the higher the conversion rate. Specifically, the actual conversion rate is compared with the expected conversion rate. If the actual conversion rate is greater than the expected conversion rate, the potential user mining model converts well and does not need to be optimized; if the actual conversion rate is less than the expected conversion rate, the model converts poorly and needs to be optimized. A reminder email is generated according to the feedback result and sent to the email address of the model manager as a prompt to optimize the model.
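The decision in step S160 reduces to one ratio and one comparison, sketched below with hypothetical counts and a hypothetical expected conversion rate. Sending the reminder email is left out; only the decision that would trigger it is shown.

```python
def conversion_rate(viewed_count, potential_count):
    """Ratio of potential users who viewed the ad link to all potential users."""
    if potential_count == 0:
        raise ValueError("there are no potential users")
    return viewed_count / potential_count


def needs_optimization(actual_rate, expected_rate):
    """The model is flagged when the actual rate falls below the expected rate."""
    return actual_rate < expected_rate


# Hypothetical figures: 30 of 200 potential users opened the link.
actual = conversion_rate(viewed_count=30, potential_count=200)
print(actual)                            # 0.15
print(needs_optimization(actual, 0.20))  # True
```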

The embodiment of the application provides an information push method based on data analysis, which collects user behavior data through a web crawler; performs feature engineering processing on the behavior data by means of one-hot encoding and normalization to obtain target data; inputs the target data into a pre-trained potential user mining model to output a potential user predicted value, the potential user predicted value being used to characterize the possibility that the user is a potential user; and compares the potential user predicted value with a preset threshold to determine the potential user and push information to the potential user. In this way, potential insurance applicants can be mined, advertisements can be pushed effectively, and the cost of obtaining user information can be reduced.

FIG. 7 is a schematic block diagram of an information push device 200 based on data analysis provided by an embodiment of the present application. As shown in FIG. 7, corresponding to the above information pushing method based on data analysis, the present application also provides an information pushing device 200 based on data analysis. The data analysis-based information pushing device 200 includes a unit for executing the above-mentioned data analysis-based information pushing method, and the device can be configured in a desktop computer, a tablet computer, a laptop computer, and other terminals. Specifically, referring to FIG. 7, the information pushing device 200 based on data analysis includes: a crawler unit 210, a feature engineering unit 220, a prediction unit 230, and a pushing unit 240.

The crawler unit 210 is used to collect user behavior data by way of web crawlers.

In an embodiment, as shown in FIG. 8, the crawler unit 210 includes: a crawler subunit 211, a screening unit 212 and an acquisition subunit 213.

The crawler subunit 211 is used for crawling a preset webpage by way of a web crawler.

The screening unit 212 is used for screening the crawled webpages according to a preset webpage index to obtain a target webpage.

Specifically, since the crawled webpages include a large number of worthless webpages, they must be filtered further so that valuable webpages, that is, webpages that users are likely to browse, are selected as target webpages. The crawled webpages are evaluated and filtered according to the preset webpage index to obtain the target webpages. The preset webpage index is the webpage index provided by a data sharing platform based on the massive search behavior data of Internet users on major search engines. The preset webpage index of each crawled webpage is obtained, the crawled webpages are sorted from high to low by that index, and the top ten webpages are selected as the target webpages. It is understandable that another number of webpages may also be selected as target webpages.

The obtaining subunit 213 is configured to obtain user behavior data from a preset database according to the target webpage.

The feature engineering unit 220 is configured to perform feature engineering processing on the behavior data through one-hot encoding and normalization to obtain target data.

In an embodiment, as shown in FIG. 8, the feature engineering unit 220 includes: an encoding unit 221 and a normalization unit 222.

The encoding unit 221 is configured to perform one-hot encoding on the non-numerical behavior data to obtain target data.

The normalization unit 222 is configured to normalize the numerical behavior data according to a preset formula to obtain target data.

X′=(X-minX)/(maxX-minX)

The prediction unit 230 is configured to input the target data into a pre-trained potential user mining model to output a potential user prediction value, and the potential user prediction value is used to characterize the possibility that the user belongs to a potential user.

In one embodiment, the potential user mining model is constructed using the gradient boosting decision tree algorithm (Gradient Boosting Decision Tree). A gradient boosting decision tree is an ensemble decision tree algorithm in which multiple decision trees are connected in series: each decision tree learns from the residual of the previous decision tree, the residual is obtained from the gradient, and all the decision trees together form the gradient boosting decision tree.

For example, suppose potential users are predicted from two features, user age and user annual income. Users A, B, C, and D are aged 18, 26, 36, and 41, and have annual incomes of 0, 300,000, 100,000, and 500,000 respectively. The first decision tree splits on age at 30, placing A and B in the under-30 category and C and D in the over-30 category. The predicted values of A, B, C, and D as potential users are 0.1, 0.3, 0.6, and 0.8 respectively. The residual within a category is the difference between each predicted value and the category average: the average for A and B is 0.2, so the residuals of A and B are -0.1 and 0.1, and the average for C and D is 0.7, so the residuals of C and D are -0.1 and 0.1. The next decision tree is fitted to the residuals of the previous tree. It splits on annual income at 150,000, placing A and C below 150,000 and B and D above. The residual value obtained by this tree for A and C is 0, that is, (-0.1+0.1)/2=0, and the residual value for B and D is also 0, so the residuals of all users are finally 0. The final predicted values of A, B, C, and D are 0, 0.4, 0.5, and 0.9 respectively, each being the sum of the predicted value and the residual. The core of the prediction is that each tree learns the residual of the sum of the conclusions of all previous trees.

The potential user mining model has been pre-trained and is run on the Spark platform to predict from the target data. Spark is a fast, general-purpose computing engine designed for large-scale data processing. The Spark platform includes the algorithm component Spark MLlib (Machine Learning Library), whose algorithm library contains the gradient boosting decision tree algorithm. Spark MLlib provides an interface to this algorithm for predicting from the target data.

In an embodiment, as shown in FIG. 8, the prediction unit 230 includes: a construction unit 231 and a prediction subunit 232.

The construction unit 231 is configured to construct a target sample according to the target data.

Specifically, a target sample is a sample for model input composed of target data and a label. Target samples are divided into positive samples and negative samples; the label value of a positive sample is 1, and the label value of a negative sample is 0. For example, a positive sample may be an annual income greater than or equal to 100,000, and a negative sample may be not having purchased a car. If the customer's annual income is 100,000, the target sample is (0.09, 1); if the customer has not purchased a car, the target sample is (0, 0).

The prediction subunit 232 is configured to input the target sample into the gradient boosting decision tree model to iteratively update and output the predicted value of the potential user.

$$F_m(x) = F_{m-1}(x) + T(x; \theta_m)$$

$$L[y, F(x)] = [y - F(x)]^2$$
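Reading the loss as the squared error between the label y and the model output F(x) (an assumption about the intended formula), the link between the gradient and the residual can be written out as follows. The symbol r_m and the sample index i are introduced here only for illustration.

```latex
\begin{aligned}
-\left.\frac{\partial L[y, F(x)]}{\partial F(x)}\right|_{F = F_{m-1}}
  &= 2\,\bigl[y - F_{m-1}(x)\bigr] \\
r_m(x) &= y - F_{m-1}(x) \\
\theta_m &= \arg\min_{\theta} \sum_i \bigl[r_m(x_i) - T(x_i; \theta)\bigr]^2 \\
F_m(x) &= F_{m-1}(x) + T(x; \theta_m)
\end{aligned}
```

Under this reading the negative gradient is proportional to the residual, which is why fitting the m-th tree to the residual of the previous model is a gradient step.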

The pushing unit 240 is configured to compare the potential user prediction value with a preset threshold to determine the potential user and push information to the potential user.

In one embodiment, after the potential user prediction value is obtained, it is compared with a preset threshold. If the prediction value is greater than the preset threshold, the user is determined to be a potential user; if the prediction value is less than the preset threshold, the user is determined to be a non-potential user. For example, if the preset threshold is 0.6 and the prediction value is 0.8, the prediction value is greater than the preset threshold, so the user is determined to be a potential user. After the potential users are obtained, advertisements are pushed to them. The pushed advertisements may be insurance information, auto insurance product information, or insurance links. Specifically, the list of potential users and the advertisement link are sent to the operator of the target webpage, and the operator pushes the advertisement link according to the user's IP address when a potential user logs in and browses the webpage.
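The threshold comparison can be sketched as below. The function name `select_potential_users`, the user identifiers, and the scores are hypothetical. The text does not say how a prediction value exactly equal to the threshold is handled; this sketch assumes such a user is not selected.

```python
def select_potential_users(predictions, threshold=0.6):
    """Return the IDs of users whose prediction value exceeds the threshold.

    Assumption: a value equal to the threshold is treated as non-potential,
    since the source only covers "greater than" and "less than".
    """
    return [uid for uid, score in predictions.items() if score > threshold]


# Hypothetical prediction values keyed by user ID.
scores = {"user_1": 0.8, "user_2": 0.35, "user_3": 0.6}
potential = select_potential_users(scores, threshold=0.6)
print(potential)
```

The resulting list is what would be passed, together with the advertisement link, to the operator of the target webpage.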

In an embodiment, as shown in FIG. 9, the information pushing device 200 based on data analysis further includes: an acquiring unit 250 and a prompting unit 260.

The acquiring unit 250 is configured to obtain the feedback result of the advertisement push.

The prompt unit 260 is configured to send, by email and according to the feedback result, a prompt to optimize the potential user mining model.

In an implementation, whether the potential user mining model needs to be optimized is judged mainly by the conversion rate. The conversion rate is the ratio of the number of potential users who viewed the pushed advertisement link to the number of all potential users. The more potential users who view the advertisement link, the higher the conversion rate. Specifically, the actual conversion rate is compared with the expected conversion rate. If the actual conversion rate is greater than the expected conversion rate, the potential user mining model has a good conversion effect and does not need to be optimized; if the actual conversion rate is less than the expected conversion rate, the conversion effect of the potential user mining model is poor and the model needs to be optimized.
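A minimal sketch of this check follows. The function names, the feedback mapping, and the expected rate of 0.6 are hypothetical. The case where the actual rate equals the expected rate is not covered by the text; the sketch assumes no optimization is needed in that case.

```python
def conversion_rate(feedback):
    """Share of potential users who opened the pushed advertisement link.

    `feedback` maps a user ID to True (link opened) or False (not opened).
    """
    if not feedback:
        return 0.0
    opened = sum(1 for was_opened in feedback.values() if was_opened)
    return opened / len(feedback)


def needs_optimization(actual_rate, expected_rate):
    """Assumption: equal rates do not trigger optimization."""
    return actual_rate < expected_rate


# Hypothetical feedback results for four potential users.
feedback = {"u1": True, "u2": False, "u3": True, "u4": False}
rate = conversion_rate(feedback)
print(rate, needs_optimization(rate, expected_rate=0.6))
```

When `needs_optimization` returns True, the prompt unit would send the email prompt described above.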

The embodiment of the application provides an information push device based on data analysis, which collects user behavior data through a web crawler; performs feature engineering processing on the behavior data through one-hot encoding and normalization to obtain target data; inputs the target data into a pre-trained potential user mining model to output a potential user prediction value, the potential user prediction value being used to characterize the possibility that the user is a potential user; and compares the potential user prediction value with a preset threshold to determine potential users and push information to them. In this way, potential insured users can be mined, advertisements can be pushed effectively, and the cost of obtaining user information can be reduced.

It should be noted that those skilled in the art can clearly understand that, for the specific implementation of the above data analysis-based information push device 200 and each of its units, reference may be made to the corresponding description in the foregoing method embodiment; for convenience and brevity, it is not repeated here.

The above-mentioned information pushing device based on data analysis can be implemented in the form of a computer program, and the computer program can be run on the computer device as shown in FIG. 10.

Please refer to FIG. 10, which is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a terminal, where the terminal may be an electronic device with communication functions such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, or a wearable device. As shown in FIG. 10, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.

The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions. When the program instructions are executed, the processor 502 can execute an information push method based on data analysis.

The processor 502 is used to provide calculation and control capabilities to support the operation of the entire computer device 500.

The internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503. When the computer program 5032 is executed by the processor 502, the processor 502 can execute an information push method based on data analysis.

The network interface 505 is used for network communication with other devices. Those skilled in the art can understand that the structure shown in FIG. 10 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 500 to which the solution of the present application is applied. The specific computer device 500 may include more or fewer components than shown in the figure, or combine certain components, or have a different component arrangement.

The processor 502 is configured to run the computer program 5032 stored in the memory to implement the following steps: collecting user behavior data by means of a web crawler; performing feature engineering processing on the behavior data by one-hot encoding and normalization to obtain target data; inputting the target data into a pre-trained potential user mining model to output a potential user prediction value, which is used to characterize the possibility that the user is a potential user; and comparing the potential user prediction value with a preset threshold to determine the potential user and push information to the potential user.

In one embodiment, when the processor 502 implements the step of collecting user behavior data by means of a web crawler, it specifically implements the following steps: crawling preset webpages by means of a web crawler; filtering the crawled webpages according to a preset webpage index to obtain a target webpage; and obtaining the user's behavior data from a preset database according to the target webpage.

In an embodiment, when the processor 502 implements the step of performing feature engineering processing on the behavior data through one-hot encoding and normalization to obtain target data, it specifically implements the following steps: performing one-hot encoding on the non-numeric behavior data to obtain target data; and normalizing the numeric behavior data according to a preset formula to obtain target data.
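The two feature engineering steps can be sketched as follows. The preset normalization formula is not given in this passage, so the sketch assumes min-max scaling; the category list, the income range of 0 to 500,000, and the function names `one_hot` and `min_max` are likewise hypothetical.

```python
def one_hot(value, categories):
    """Encode a non-numeric value as a 0/1 vector over a fixed category list."""
    return [1 if value == category else 0 for category in categories]


def min_max(value, lower, upper):
    """Scale a numeric value into [0, 1]. Min-max scaling is an assumption here."""
    if upper == lower:
        return 0.0
    return (value - lower) / (upper - lower)


# Hypothetical behavior record with one non-numeric and one numeric field.
gender_categories = ["male", "female"]
record = {"gender": "female", "annual_income": 100_000}

target_data = one_hot(record["gender"], gender_categories) + [
    min_max(record["annual_income"], lower=0, upper=500_000)
]
print(target_data)
```

The resulting vector is the kind of target data that would be paired with a label to form a target sample.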

In an embodiment, when the processor 502 implements the step of inputting the target data into a pre-trained potential user mining model to output a potential user prediction value, the potential user prediction value being used to characterize the possibility that the user is a potential user, it specifically implements the following steps: constructing a target sample according to the target data; and inputting the target sample into the gradient boosting decision tree model to iteratively update and output the potential user prediction value.

In one embodiment, after the processor 502 implements the step of comparing the potential user prediction value with a preset threshold to determine the potential user and push information to the potential user, the processor 502 further implements the following steps: obtaining the feedback result of the advertisement push; and sending, by email and according to the feedback result, a prompt to optimize the potential user mining model.

It should be understood that, in this embodiment of the application, the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor 502 may also be other general-purpose processors, digital signal processors (Digital Signal Processors, DSPs), Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. Among them, the general-purpose processor may be a microprocessor or the processor may also be any conventional processor.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by computer programs instructing relevant hardware. The computer program includes program instructions, and the computer program can be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the process steps of the foregoing method embodiments.

Therefore, this application also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program, where the computer program includes program instructions. When the program instructions are executed by a processor, the processor executes the following steps: collecting user behavior data through a web crawler; performing feature engineering processing on the behavior data through one-hot encoding and normalization to obtain target data; inputting the target data into a pre-trained potential user mining model to output a potential user prediction value, which is used to characterize the possibility that the user is a potential user; and comparing the potential user prediction value with a preset threshold to determine potential users and push information to the potential users. Optionally, the computer-readable storage medium may be a non-volatile storage medium or a volatile storage medium.

In an embodiment, when the processor executes the program instructions to implement the step of collecting user behavior data by means of a web crawler, it specifically implements the following steps: crawling preset webpages by means of a web crawler; filtering the crawled webpages according to a preset webpage index to obtain the target webpage; and obtaining user behavior data from the preset database according to the target webpage.

In an embodiment, when the processor executes the program instructions to implement the step of performing feature engineering processing on the behavior data by one-hot encoding and normalization to obtain the target data, it specifically implements the following steps: performing one-hot encoding on the non-numeric behavior data to obtain target data; and normalizing the numeric behavior data according to a preset formula to obtain target data.

In an embodiment, when the processor executes the program instructions to implement the step of inputting the target data into a pre-trained potential user mining model to output a potential user prediction value, the potential user prediction value being used to characterize the possibility that the user is a potential user, it specifically implements the following steps: constructing a target sample according to the target data; and inputting the target sample into the gradient boosting decision tree model to iteratively update and output the potential user prediction value.

In an embodiment, after the processor executes the program instructions to implement the step of comparing the potential user prediction value with a preset threshold to determine the potential user and push information to the potential user, it further implements the following steps: obtaining the feedback result of the advertisement push; and sending, by email and according to the feedback result, a prompt to optimize the potential user mining model.

The storage medium may be a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium that can store program code.

A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of each unit is only a logical function division, and there may be other division methods in actual implementation. For example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented.

The steps in the method of the embodiment of the present application can be adjusted, merged, and deleted in order according to actual needs. The units in the devices in the embodiments of the present application may be combined, divided, and deleted according to actual needs. In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a storage medium. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the existing technology, or in whole or in part, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a terminal, a network device, or the like) execute all or part of the steps of the method described in each embodiment of the present application.

The above are only specific implementations of this application, but the protection scope of this application is not limited to them. Anyone familiar with the technical field can easily think of various equivalent modifications or replacements within the technical scope disclosed in this application, and these modifications or replacements shall be covered within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

An information push method based on data analysis, which includes:

Collect user behavior data through web crawlers;

Perform feature engineering processing on the behavior data by means of one-hot encoding and normalization to obtain target data;

Inputting the target data into a pre-trained potential user mining model to output a potential user prediction value, where the potential user prediction value is used to characterize the possibility that the user belongs to a potential user;

Compare the predicted value of the potential user with a preset threshold to determine the potential user and push information to the potential user.
The method for pushing information based on data analysis according to claim 1, wherein said collecting user behavior data by means of web crawlers comprises:

Crawl preset webpages by means of web crawlers;

Filter the crawled webpages according to the preset webpage index to obtain the target webpage;

The user's behavior data is obtained from a preset database according to the target webpage.
The information push method based on data analysis according to claim 1, wherein said performing feature engineering processing on said behavior data in a way of one-hot encoding and normalization to obtain target data comprises:

Performing one-hot encoding on the non-numerical behavior data to obtain target data;

The numerical behavior data is normalized according to a preset formula to obtain target data.
The method for pushing information based on data analysis according to claim 1, wherein said inputting said target data into a pre-trained potential user mining model to output a potential user prediction value, said potential user prediction value being used for characterizing the possibility that the user is a potential user, comprises:

Construct a target sample according to the target data;

The target sample is input into the gradient boosting decision tree model to iteratively update the predicted value of the potential user.
The method for pushing information based on data analysis according to claim 1, wherein after comparing the predicted value of the potential user with a preset threshold to determine the potential user and pushing information to the potential user, the method further comprises:

Obtaining the feedback result of the advertisement push;

According to the feedback result, a prompt to optimize the potential user mining model is sent by email.
The method for pushing information based on data analysis according to claim 4, wherein said inputting said target sample into a gradient boosting decision tree model to iteratively update and output the predicted value of potential users comprises:

Initialize the decision tree model, and calculate the loss function according to the target sample;

Update the decision tree model according to the loss function, and continue to iterate the decision tree model until the iteration ends to obtain the final decision tree model;

The predicted value of each decision tree in the decision tree model is summed and averaged to obtain the predicted value of potential users.
The information pushing method based on data analysis according to claim 5, wherein the feedback result is used to indicate whether the potential user has opened the advertisement link pushed by the target webpage;

Wherein, the feedback result includes positive feedback or negative feedback, the positive feedback is used to indicate that the user has opened the advertisement link pushed by the target webpage, and the negative feedback is used to indicate that the user has not opened the advertisement link pushed by the target webpage.
An information push device based on data analysis, which includes:

Crawler unit, used to collect user behavior data through web crawlers;

The feature engineering unit is used to perform feature engineering processing on the behavior data through one-hot encoding and normalization to obtain target data;

A prediction unit, configured to input the target data into a pre-trained potential user mining model to output a potential user prediction value, the potential user prediction value being used to characterize the possibility that the user belongs to a potential user;

The pushing unit is configured to compare the predicted value of the potential user with a preset threshold to determine the potential user and push information to the potential user.
A computer device, wherein the computer device includes a memory and a processor, a computer program is stored on the memory, and the processor implements the following steps when the processor executes the computer program:

Collect user behavior data through web crawlers;

Perform feature engineering processing on the behavior data by means of one-hot encoding and normalization to obtain target data;

Inputting the target data into a pre-trained potential user mining model to output a potential user prediction value, where the potential user prediction value is used to characterize the possibility that the user belongs to a potential user;

Compare the predicted value of the potential user with a preset threshold to determine the potential user and push information to the potential user.
The computer device according to claim 9, wherein when the processor executes the collection of user behavior data by means of a web crawler, the following steps are specifically executed:

Crawl preset webpages by means of web crawlers;

Filter the crawled webpages according to the preset webpage index to obtain the target webpage;

The user's behavior data is obtained from a preset database according to the target webpage.
The computer device according to claim 9, wherein when the processor executes the feature engineering processing on the behavior data by one-hot encoding and normalization to obtain the target data, the following steps are specifically executed:

Performing one-hot encoding on the non-numerical behavior data to obtain target data;

The numerical behavior data is normalized according to a preset formula to obtain target data.
The computer device according to claim 9, wherein when the processor executes the inputting of the target data into a pre-trained potential user mining model to output a potential user prediction value, the potential user prediction value being used for characterizing the possibility that the user is a potential user, the following steps are specifically performed:

Construct a target sample according to the target data;

The target sample is input into the gradient boosting decision tree model to iteratively update the predicted value of the potential user.
The computer device according to claim 9, wherein after the processor executes the comparison between the predicted value of the potential user and a preset threshold to determine the potential user and pushes information to the potential user, the processor further executes the following steps:

Obtaining the feedback result of the advertisement push;

According to the feedback result, a prompt to optimize the potential user mining model is sent by email.
The computer device according to claim 12, wherein when the processor executes the input of the target sample into the gradient boosting decision tree model to iteratively update the predicted value of the potential user, the following steps are specifically executed:

Initialize the decision tree model, and calculate the loss function according to the target sample;

Update the decision tree model according to the loss function, and continue to iterate the decision tree model until the iteration ends to obtain the final decision tree model;

The predicted value of each decision tree in the decision tree model is summed and averaged to obtain the predicted value of potential users.
The computer device according to claim 13, wherein the feedback result is used to indicate whether the potential user has opened the advertisement link pushed by the target webpage;

Wherein, the feedback result includes positive feedback or negative feedback, the positive feedback is used to indicate that the user has opened the advertisement link pushed by the target webpage, and the negative feedback is used to indicate that the user has not opened the advertisement link pushed by the target webpage.
A computer-readable storage medium, wherein the storage medium stores a computer program, and when the computer program is executed by a processor, the following steps are implemented:

Collect user behavior data through web crawlers;

Perform feature engineering processing on the behavior data by means of one-hot encoding and normalization to obtain target data;

Inputting the target data into a pre-trained potential user mining model to output a potential user prediction value, where the potential user prediction value is used to characterize the possibility that the user belongs to a potential user;

Compare the predicted value of the potential user with a preset threshold to determine the potential user and push information to the potential user.
The computer-readable storage medium according to claim 16, wherein when the user's behavior data is collected by means of a web crawler, the computer program is executed by the processor to implement the following steps:

Crawl preset webpages by means of web crawlers;

Filter the crawled webpages according to the preset webpage index to obtain the target webpage;

The user's behavior data is obtained from a preset database according to the target webpage.
The computer-readable storage medium according to claim 16, wherein, when the feature engineering processing is performed on the behavior data by one-hot encoding and normalization to obtain the target data, the computer program is executed by the processor to implement the following steps:

Performing one-hot encoding on the non-numerical behavior data to obtain target data;

The numerical behavior data is normalized according to a preset formula to obtain target data.
The computer-readable storage medium according to claim 16, wherein, when the target data is input into a pre-trained potential user mining model to output a potential user prediction value, the potential user prediction value being used to characterize the possibility that the user is a potential user, the computer program is executed by the processor to implement the following steps:

Construct a target sample according to the target data;

The target sample is input into the gradient boosting decision tree model to iteratively update the predicted value of the potential user.
The computer-readable storage medium according to claim 16, wherein, after the predicted value of the potential user is compared with a preset threshold to determine the potential user and information is pushed to the potential user, the computer program is further executed by the processor to implement the following steps:

Obtaining the feedback result of the advertisement push;

According to the feedback result, a prompt to optimize the potential user mining model is sent by email.