WO2022156589A1 - Method and device for determining a live broadcast click-through rate - Google Patents

Method and device for determining a live broadcast click-through rate

Info

Publication number
WO2022156589A1
Authority
WO
WIPO (PCT)
Prior art keywords
live broadcast
user
data
click
model
Prior art date
Application number
PCT/CN2022/071797
Other languages
English (en)
Chinese (zh)
Inventor
王艺斐
王晶晶
Original Assignee
北京沃东天骏信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司, 北京京东世纪贸易有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/219 Managing data history or versioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2308 Concurrency control
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping
    • G06Q10/0838 Historical data

Definitions

  • the present disclosure relates to the field of computer technology, and in particular, to a method and device for determining the click-through rate of a live broadcast.
  • the embodiments of the present disclosure provide a method and apparatus for determining the click-through rate of live broadcast, which can improve the prediction accuracy of the click-through rate of live broadcast, thereby improving the accuracy of push, and reducing the possibility of insufficient stock or slow sales.
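  • a method for determining a live broadcast click-through rate, comprising:
  • acquiring multiple historical user data and multiple historical live broadcast data;
  • determining, according to a sequence generation model, the multiple historical user data, and the generation time of the multiple historical user data, the user behavior sequence corresponding to the multiple historical user data;
  • determining user attribute features according to the multiple historical user data, and determining live broadcast features according to the multiple historical live broadcast data;
  • training a click-through rate prediction model according to the user behavior sequence, the user attribute features, and the live broadcast features;
  • determining, according to the trained click-through rate prediction model, the click-through rate of the target user on the target live broadcast data.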
  • determining the user behavior sequence corresponding to the multiple historical user data according to the sequence generation model, the multiple historical user data, and the generation time of the multiple historical user data including:
  • the user behavior feature and the generation time corresponding to the user behavior feature are used as the input of the sequence generation model, and the weight value corresponding to each of the user behavior features is determined according to the output of the sequence generation model;
  • the user behavior sequence is generated according to the user behavior feature and the weight value.
  • determining the weight corresponding to each of the user behavior features according to the output of the sequence generation model including:
  • the output of the sequence generation model is normalized to obtain a weight value corresponding to each of the user behavior features.
  • the training of the click-through rate prediction model according to the user behavior sequence, the user attribute characteristics and the live broadcast characteristics includes:
  • the user behavior sequence is input into the ARMA model, and the user dynamic feature is determined according to the output of the ARMA model;
  • the user dynamic feature, the user attribute feature and the live broadcast feature are used as the input of the click-through rate prediction model to train the click-through rate prediction model.
  • the method further includes:
  • the live broadcast data is pushed to the target user.
  • the method further includes:
  • the inventory corresponding to the target live broadcast data is determined, and inventory management is performed according to the inventory.
  • the sequence generation model is a random forest model
  • the click-through rate prediction model is an XGBOOST model.
  • a device for determining a click-through rate of a live broadcast including:
  • the acquisition module is used to acquire multiple historical user data and multiple historical live broadcast data
  • a sequence generation module configured to determine the user behavior sequence corresponding to the multiple historical user data according to the sequence generation model, the multiple historical user data, and the generation time of the multiple historical user data;
  • a feature generation module configured to determine user attribute features according to the plurality of historical user data, and determine live broadcast features according to the plurality of historical live broadcast data;
  • a model training module used for training the click-through rate prediction model according to the user behavior sequence, the user attribute characteristics and the live broadcast characteristics
  • the data processing module is used to determine the click rate of the target user on the target live broadcast data according to the trained click rate prediction model.
  • an electronic device for determining the click-through rate of a live broadcast, including:
  • one or more processors
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the method for determining the click-through rate of a live broadcast provided by the present disclosure.
  • a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, implements the method for determining the click-through rate of a live broadcast provided by the present disclosure.
  • the above-mentioned embodiments have the following advantages or beneficial effects: because a model trained on time-based user data is used to determine the live broadcast click-through rate, the technical problems of inaccurate push caused by subjective prediction, and of insufficient stock or unsalable goods, are overcome. This achieves the technical effect of improving the prediction accuracy of the live broadcast click-through rate, thereby improving the accuracy of push and reducing the possibility of insufficient stock or slow sales.
  • FIG. 1 is an exemplary system architecture diagram of a method for determining a live click rate or an apparatus for determining a live click rate that is suitable for application in an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of a main process of a method for determining a live broadcast click-through rate according to an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of a detailed flow of a method for determining a live broadcast click-through rate according to an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of main modules of a device for determining a live click rate according to an embodiment of the present disclosure
  • FIG. 5 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present disclosure.
  • The ARMA model is an important method for studying time series. The family includes the autoregressive model (AR model), the moving average model (MA model), and the autoregressive moving average model (ARMA model).
  • Truncation (cut-off) refers to the property that the autocorrelation function (ACF) or partial autocorrelation function (PACF) of the time series is 0 after a certain order.
  • Tailing refers to the property that the autocorrelation function (ACF) or partial autocorrelation function (PACF) of the time series is not all zero after a certain order.
  • The Akaike Information Criterion (AIC) is a standard for measuring the goodness of fit of a statistical model. Usually, the smaller the AIC value, the better the model.
  • BIC: Bayesian Information Criterion.
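  • For reference, the standard textbook forms of the model and criteria named above are written out here. The symbols (c, \varphi_i, \theta_j, \varepsilon_t, k, n, \hat{L}) are notation assumed for this note and do not appear in the disclosure.

```latex
% ARMA(p, q): p autoregressive terms and q moving-average terms
X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}

% Information criteria: k estimated parameters, n observations,
% \hat{L} the maximized likelihood of the fitted model
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

  • Setting q = 0 gives the AR(p) model and setting p = 0 gives the MA(q) model. For both criteria a smaller value indicates a better trade-off between fit and model size.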
  • FIG. 1 shows an exemplary system architecture diagram of a method for determining a click-through rate of live broadcast or an apparatus for determining a click-through rate of a live broadcast that is suitable for use in an embodiment of the present disclosure.
  • the exemplary system architecture of the method or the apparatus for determining the live click rate includes:
  • the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105.
  • the network 104 is a medium used to provide a communication link between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various communication client applications may be installed on the terminal devices 101, 102, and 103, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, social platform software, and the like.
  • the terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like.
  • the server 105 may be a server that provides various services, for example, a background management server that provides support for shopping websites browsed by the terminal devices 101, 102, and 103.
  • the background management server may analyze and process the received user feature query request and other data, and feed back the processing results (e.g., user features) to the terminal devices 101, 102, and 103.
  • the method for determining the click-through rate of live broadcast is generally performed by the server 105, and accordingly, the device for determining the click-through rate of the live broadcast is generally set in the server 105.
  • terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.
  • FIG. 2 is a schematic diagram of the main process of a method for determining a live broadcast click-through rate according to an embodiment of the present disclosure. As shown in FIG. 2 , the method for determining a live broadcast click-through rate of the present disclosure includes:
  • Step S201 acquiring multiple historical user data and multiple historical live broadcast data.
  • a plurality of historical user data and a plurality of historical live broadcast data are obtained based on the historical data of the platform.
  • the historical user data may include user information such as the user's age, gender, purchasing ability, occupation, and preferences; it may also include data on user operations such as browsing, commenting, adding to favorites, adding to cart, placing orders, and sharing, together with the time at which each piece of operation data was generated. The historical live broadcast data may include the live broadcast's brand, lottery, time, interaction, anchor, and commodity information.
  • Step S202 according to the sequence generation model, the multiple historical user data and the generation time of the multiple historical user data, determine the user behavior sequence corresponding to the multiple historical user data.
  • from the multiple historical user data, the user behavior features are determined based on the user operation data, and the generation time corresponding to each user behavior feature is determined based on the time at which the user operation data was generated. The user behavior features and their corresponding generation times are input to the sequence generation model, which outputs a feature score for each user behavior feature.
  • the feature scores of the user behavior features are normalized to obtain the weight value corresponding to each user behavior feature; the user behavior sequence is then generated as the weighted sum of the user behavior feature data and the corresponding weight values.
  • for example, the user behavior sequence contains 12 elements, that is, one user behavior score for each of the past 12 months, which characterizes the user's behavior over the past year.
  • Step S203 Determine user attribute characteristics according to the plurality of historical user data, and determine live broadcast characteristics according to the plurality of historical live broadcast data.
  • the user attribute feature is determined based on the user information data therein, and the live broadcast feature is determined based on the plurality of historical live broadcast data therein.
  • Step S204 train a click-through rate prediction model according to the user behavior sequence, the user attribute feature, and the live broadcast feature.
  • the user behavior sequence obtained in step S202 is input into the ARMA model, the ARMA model is trained, and the ARMA model parameters are output as user dynamic features.
  • the user dynamic characteristics, the user attribute characteristics obtained in step S203, and the live broadcast characteristics are input into the click-through rate prediction model, the click-through rate prediction model is trained, and the trained click-through rate prediction model is output.
  • the CTR prediction model is the XGBOOST model.
  • Step S205 according to the trained click-through rate prediction model, determine the click-through rate of the target user on the target live broadcast data.
  • the target user data and a plurality of live broadcast data to be pushed are acquired, and the click rate of the target user on the target live broadcast data is determined according to the target user data and the trained click rate prediction model.
  • the click-through rate may include data of regular clicks, browsing, favorites, add-ons, and ordering.
  • according to the click-through rate, live broadcast data is pushed to the target user; or, according to the click-through rate, the inventory corresponding to the target live broadcast data is determined, and inventory management is carried out according to that inventory so that stock can be replenished in time, supply warehouses allocated, or demand warehouses supported.
  • the steps of determining the user behavior sequence corresponding to the multiple historical user data, determining the live broadcast features according to the multiple historical live broadcast data, training the click-through rate prediction model, and determining the target user's click-through rate on the target live broadcast data according to the trained model make it possible to adapt to periodic changes in user behavior, optimize the performance of the prediction model, make full use of live broadcast resources, accurately predict the live broadcast click-through rate, accurately push live broadcasts to users, and manage inventory reasonably.
  • FIG. 3 is a schematic diagram of a detailed process of a method for determining a live broadcast click-through rate according to an embodiment of the present disclosure. As shown in FIG. 3 , the method for determining a live broadcast click-through rate of the present disclosure includes:
  • step S301 a database of live broadcast is constructed.
  • a live broadcast database is constructed based on the existing historical data of the platform, and historical user data and historical live broadcast data are obtained from the historical data of the platform.
  • Historical user data can include multiple pieces of data. Taking the historical user data of an e-commerce platform as an example:
  • historical user data can include data such as the user's age, gender, purchasing ability, occupation, and preferences.
  • historical user data can include data on user operations such as browsing live broadcasts, browsing products, adding to cart, placing orders, sharing, and commenting.
  • historical user data can include the operation time corresponding to each piece of operation data.
  • the historical live broadcast data can also include multiple pieces of data. Taking an e-commerce platform as an example, the historical live broadcast data may include data such as the live broadcast's brand, lottery, time, interaction, anchor, and merchandise.
  • the platform can obtain relevant data on a regular basis, and update and save the obtained relevant data to the database.
  • Step S302 constructing a user attribute feature of the live broadcast.
  • from the live broadcast database constructed in step S301, the information data of the historical user data is extracted, and the user attribute features are determined based on it.
  • the historical user data may include data such as the user's age, gender, occupation, preference category, purchasing ability, geographic location, and consumption time.
  • the attribute characteristics may include user age characteristics, user gender characteristics, user occupation characteristics, user preference category characteristics, user purchasing ability characteristics, user geographic location characteristics, user consumption time characteristics, and the like.
  • step S303 a live broadcast feature of the live broadcast is constructed.
  • from the live broadcast database constructed in step S301, the information data of the historical live broadcast data is extracted, and the live broadcast features are determined based on it.
  • the historical live broadcast data may include data such as the brand, lottery, time, interaction, anchor, and product of the live broadcast, and the live broadcast features determined from it may include brand features, lottery features, time features, anchor features, product features, and additional features.
  • brand features include the number of brands in the live broadcast, the number of fans of the live broadcast brand, etc.; lottery features include whether the live broadcast holds a lottery, the number of lottery draws, etc.; time features include whether the live broadcast falls on a weekend, the live broadcast time period, etc.
  • anchor features include the number of anchors, the number of fans of the anchor, the type of anchor, the type of goods the anchor sells, etc.
  • product features include the number of products, the average product price, the product type, etc.; additional features include whether a celebrity appears in the live broadcast room, whether the live broadcast uses linked microphones (co-streaming), etc.
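  • As a minimal sketch of how the two time features named above could be derived, the following Python function maps a start time to a weekend flag and a time period. The function name, the dictionary keys, and the four time-period buckets are assumptions made for illustration; the disclosure does not define them.

```python
from datetime import datetime


def live_time_features(start):
    """Derive two illustrative live broadcast time features from a start time."""
    hour = start.hour
    # Hypothetical time-period buckets; the disclosure does not specify any.
    if 6 <= hour < 12:
        period = "morning"
    elif 12 <= hour < 18:
        period = "afternoon"
    elif 18 <= hour < 24:
        period = "evening"
    else:
        period = "night"
    return {
        # weekday() counts Monday as 0, so 5 and 6 are Saturday and Sunday.
        "is_weekend": start.weekday() >= 5,
        "time_period": period,
    }


# 15 January 2022 was a Saturday.
features = live_time_features(datetime(2022, 1, 15, 20, 30))
```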
  • the user attribute features obtained in step S302 and the live broadcast features obtained in step S303 include discrete features and continuous features.
  • Discrete features, for example the user's age feature, gender feature, and geographic location feature, need embedding processing to be converted into continuous features.
  • Embedding can extract features from the original data and reduce their dimension through matrix multiplication. Continuous features are inherently continuous and therefore do not require this processing.
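  • The matrix-multiplication view of embedding can be sketched in plain Python: a discrete value is one-hot encoded and multiplied by an embedding matrix, which selects one dense row. The vocabulary, the matrix values, and all function names below are hypothetical; in practice the matrix entries would be learned.

```python
def one_hot(index, size):
    """One-hot row vector with a 1.0 at the given index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def matvec(vector, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    columns = len(matrix[0])
    return [
        sum(vector[r] * matrix[r][c] for r in range(len(matrix)))
        for c in range(columns)
    ]


# Hypothetical vocabulary for one discrete feature (geographic location).
vocab = {"north": 0, "south": 1, "east": 2, "west": 3}
# Hypothetical 4 x 2 embedding matrix: 4 categories reduced to 2 dimensions.
embedding = [
    [0.1, 0.9],
    [0.8, 0.2],
    [0.4, 0.4],
    [0.7, 0.6],
]


def embed(category):
    """Map a discrete category to its dense, lower-dimensional vector."""
    return matvec(one_hot(vocab[category], len(vocab)), embedding)


dense = embed("south")
```

  • Multiplying by a one-hot vector is equivalent to looking up the matching row, which is why the result for "south" equals the second row of the matrix.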
  • the processed continuous user attribute features and live broadcast features are input into the XGBOOST model for training, and the feature score of each feature is output.
  • the features retained after screening by feature score are used as the final user attribute features and live broadcast features.
  • XGBOOST can alleviate errors caused by feature sparseness and correlation, effectively remove redundant features, and improve feature quality.
  • using the feature crossing capability of the XGBOOST model itself, features can be screened in Python with the model's feature importance function. Because the product viewing rate in the live broadcast room and the product order rate in the live broadcast room have different emphases, the feature importance output by model training differs between the two metrics.
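  • A minimal sketch of score-based feature screening follows. It assumes the importance scores have already been produced by a trained tree model; the feature names, the score values, and the `keep_ratio` parameter are hypothetical and chosen only to make the example easy to check.

```python
def screen_features(scores, keep_ratio=0.5):
    """Keep the highest-scoring features; keep_ratio is a hypothetical knob."""
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, round(len(ranked) * keep_ratio))
    return ranked[:keep]


# Hypothetical importance scores for the product viewing rate metric.
view_rate_scores = {
    "anchor_fans": 0.35,
    "is_weekend": 0.05,
    "avg_price": 0.25,
    "user_age": 0.20,
    "has_lottery": 0.15,
}
selected = screen_features(view_rate_scores, keep_ratio=0.6)
```

  • Running the same screening on scores from a model trained for the order rate metric would generally keep a different subset, which matches the observation that the two metrics weight features differently.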
  • Step S304 constructing the user behavior characteristics of the live broadcast.
  • from the live broadcast database constructed in step S301, the operation data of the historical user data is extracted, and the user behavior features are determined based on it.
  • the historical user data may include data on user operations such as browsing live broadcasts, browsing products, adding to cart, placing orders, and sharing; the user behavior features determined from it may include a live-broadcast browsing feature, a product browsing feature, an add-to-cart feature, an ordering feature, a sharing feature, etc.
  • Step S305 constructing a user behavior sequence of the live broadcast.
  • from the live broadcast database constructed in step S301, the operation time data corresponding to the operation data of the historical user data is extracted, and the generation time of each user behavior feature is determined based on it.
  • the user behavior features constructed in step S304 and the generation time corresponding to each feature are input into the sequence generation model for training; after the information gain is calculated, a feature score is output for each user behavior feature. The feature scores are normalized to obtain the weight of each user behavior feature, and the user behavior sequence is obtained as the weighted sum of the user behavior feature data and those weights.
  • the weight represents the importance of each feature; the sequence is a vector composed of the values of multiple features within a predetermined time period.
  • the sequence generation model may be a random forest model.
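  • The scoring and normalization steps can be illustrated with a single-split information gain, which is a simplified stand-in for the scores a random forest would aggregate over many trees. The toy data, the threshold, and all names below are assumptions made for illustration.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * math.log2(p)
    return result


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the samples at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted


def normalize(scores):
    """Scale non-negative scores so that the resulting weights sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {name: score / total for name, score in scores.items()}


# Hypothetical toy data: per-user counts and a binary "clicked" label.
clicked = [0, 0, 1, 1]
browse_counts = [1, 2, 8, 9]  # a split at 5 separates the labels perfectly
share_counts = [3, 7, 4, 6]   # a split at 5 leaves both sides mixed

scores = {
    "browse": information_gain(browse_counts, clicked, threshold=5),
    "share": information_gain(share_counts, clicked, threshold=5),
}
weights = normalize(scores)
```

  • In this toy case the browsing feature receives all of the weight because it alone separates the labels; real data would spread the weight across features.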
  • the user behavior features constructed in step S304 include the live-broadcast browsing feature, the product browsing feature, the add-to-cart feature, the ordering feature, and the sharing feature; the operation time data corresponding to the operation data of the historical user data is extracted from the live broadcast database constructed in step S301 to determine the generation time of each user behavior feature, that is, the generation times of the live-broadcast browsing, product browsing, add-to-cart, ordering, and sharing features.
  • the feature of user browsing live broadcast is represented by A
  • the feature of user browsing products is represented by B
  • the feature of user add-on purchase is represented by C
  • the feature of user placing an order is represented by D
  • the feature of user sharing is represented by E.
  • a live broadcast database is constructed from the platform's historical data for the past year, and historical user data and historical live broadcast data are obtained from it. Based on the acquired historical user data and historical live broadcast data, user behavior features A, B, C, D, and E are constructed for the past year, and these features and their corresponding generation times are input into the random forest model for training. The information gain is calculated and the feature scores of A, B, C, D, and E are output; the scores are normalized to obtain the weights W_A, W_B, W_C, W_D, and W_E; and the data of A, B, C, D, and E are weighted by W_A, W_B, W_C, W_D, and W_E and summed to obtain the user behavior sequence.
  • the data of A, B, C, D, and E may be the number of times the user browses live broadcasts, browses products, adds to cart, places orders, and shares.
  • the weighted summation proceeds month by month: the January data of A, B, C, D, and E are weighted by W_A, W_B, W_C, W_D, and W_E and summed to give the user's January behavior score; the data for February through December are weighted and summed in the same way to give the behavior scores for those months; the 12 monthly scores are then combined into the user behavior sequence.
  • the user behavior sequence contains 12 elements, one user behavior score per month.
  • the product viewing rate in the live broadcast room and the product order rate in the live broadcast room have different emphases: the viewing rate focuses on page views, while the order rate focuses on order volume. Therefore, the feature importance output by model training differs between the two metrics.
  • for the product viewing rate metric, the data of A, B, C, D, and E are input into the random forest model, and the output weights [W_A, W_B, W_C, W_D, W_E] are [0.3, 0.2, 0.2, 0.1, 0.2]; for the product order rate metric, the data of A, B, C, D, and E are input into the random forest model, and the output weights [W_A, W_B, W_C, W_D, W_E] are [0.1, 0.2, 0.2, 0.3, 0.2].
  • for the viewing rate metric, the weights W_A and W_B of A and B are greater than the weight W_D of D, indicating that the live-broadcast browsing feature and the product browsing feature are more important for the product viewing rate in the live broadcast room.
  • for the order rate metric, the weight W_D of D has the largest share, indicating that the ordering feature is more important for the product order rate in the live broadcast room.
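  • The month-by-month weighted summation can be sketched as follows, using the two weight vectors quoted above. The monthly counts are hypothetical and kept identical for most months so that the arithmetic is easy to verify by hand.

```python
FEATURES = ["A", "B", "C", "D", "E"]
# Weight vectors quoted in the text for the two metrics.
VIEW_WEIGHTS = dict(zip(FEATURES, [0.3, 0.2, 0.2, 0.1, 0.2]))
ORDER_WEIGHTS = dict(zip(FEATURES, [0.1, 0.2, 0.2, 0.3, 0.2]))


def behavior_sequence(monthly_counts, weights):
    """Return one weighted behavior score per month, in calendar order."""
    return [
        sum(weights[f] * counts[f] for f in FEATURES)
        for counts in monthly_counts
    ]


# Hypothetical counts for one user over 12 months.
monthly_counts = [{"A": 10, "B": 20, "C": 5, "D": 0, "E": 5} for _ in range(12)]
# December: the user only placed orders.
monthly_counts[11] = {"A": 0, "B": 0, "C": 0, "D": 10, "E": 0}

view_sequence = behavior_sequence(monthly_counts, VIEW_WEIGHTS)
order_sequence = behavior_sequence(monthly_counts, ORDER_WEIGHTS)
```

  • With these numbers the January score under the viewing rate weights is 0.3*10 + 0.2*20 + 0.2*5 + 0.1*0 + 0.2*5 = 9, and the order-only month scores higher under the order rate weights than under the viewing rate weights.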
  • Step S306 constructing a user dynamic feature of the live broadcast.
  • the user behavior sequence is input into the ARMA model, and the stationarity of the user behavior sequence is checked with the ADF test. Based on the results of the ADF test, it is judged whether the user behavior sequence is stationary; if not, differencing is applied until the sequence is stationary. After confirming that the user behavior sequence is stationary, the autocorrelation coefficient a (ACF) and the partial autocorrelation coefficient b (PACF) of the sequence are calculated, and the ARMA model is identified according to a and b.
  • identifying the ARMA model includes: if the autocorrelation coefficient a is tailing and the partial autocorrelation coefficient b is truncated at order p, the ARMA model is an AR(p) model; if a is truncated at order q and b is tailing, the ARMA model is an MA(q) model; if a is tailing and b is also tailing, the ARMA model is an ARMA(p, q) model. Based on the identified model, the orders p and q are determined using the AIC and BIC criteria. The orders p and q represent the autocorrelation characteristics of the sequence itself, especially periodic behavior. Once p and q are determined, the model parameters of the ARMA model can be obtained and combined into a feature vector, which is the user dynamic feature.
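  • The differencing step, the sample autocorrelation, and the identification rule can be sketched in plain Python. The ADF test itself is not implemented here; a statistics library would normally supply it (for example, `adfuller` in statsmodels). The function names and the toy series are assumptions, and the cut-off flags passed to `identify` would in practice come from inspecting the ACF and PACF.

```python
def difference(series):
    """First-order differencing: x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    if variance == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / variance
        for k in range(max_lag + 1)
    ]


def identify(acf_cuts_off, pacf_cuts_off):
    """Identification rule of thumb: which of ACF and PACF cuts off."""
    if pacf_cuts_off and not acf_cuts_off:
        return "AR(p)"
    if acf_cuts_off and not pacf_cuts_off:
        return "MA(q)"
    if not acf_cuts_off and not pacf_cuts_off:
        return "ARMA(p, q)"
    return "undetermined"


# A toy series with a quadratic trend becomes constant after two differences.
trend = [1, 3, 6, 10, 15, 21]
once = difference(trend)
twice = difference(once)
lag_correlations = acf(once, 1)
```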
  • User dynamic features are sequence features, which themselves are continuous.
  • deep learning models such as RNN, LSTM, and temporal convolutional network can be used to construct user dynamic features.
  • the ARMA model is used to abstract the user behavior sequence, and its model parameters are used to construct the user dynamic characteristics of the live broadcast.
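  • As a simplified illustration of turning model parameters into a feature vector, the sketch below fits a pure AR(p) model by solving the Yule-Walker equations with the Levinson-Durbin recursion and picks p by an approximate AIC. This is a swapped-in simplification: it covers only the AR part, whereas a full ARMA fit would normally rely on a statistics library. The series values and all names are hypothetical.

```python
import math


def autocovariances(series, max_lag):
    """Biased sample autocovariances for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    c = [x - mean for x in series]
    return [
        sum(c[t] * c[t + k] for t in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


def levinson_durbin(gamma, order):
    """Solve the Yule-Walker equations; returns (AR coefficients, innovation variance)."""
    phi = []
    sigma2 = gamma[0]
    for k in range(1, order + 1):
        acc = gamma[k] - sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))
        reflection = acc / sigma2
        phi = [phi[j] - reflection * phi[k - 2 - j] for j in range(k - 1)] + [reflection]
        sigma2 *= 1 - reflection ** 2
    return phi, sigma2


def select_order(series, max_order):
    """Pick the AR order with the smallest approximate AIC."""
    n = len(series)
    gamma = autocovariances(series, max_order)
    best = None
    for p in range(1, max_order + 1):
        phi, sigma2 = levinson_durbin(gamma, p)
        # Gaussian approximation with constant terms dropped.
        aic = n * math.log(sigma2) + 2 * p
        if best is None or aic < best[0]:
            best = (aic, p, phi)
    return best[1], best[2]


# Hypothetical 12-month user behavior sequence.
series = [9.0, 11.5, 8.2, 12.1, 9.4, 10.8, 8.9, 12.6, 9.1, 11.2, 8.4, 12.3]
order, dynamic_feature = select_order(series, max_order=3)
```

  • Here `dynamic_feature` is the list of fitted AR coefficients, which plays the role of the parameter vector described above.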
  • Step S307 the click rate prediction model is trained.
  • feature samples for the live broadcast click-through rate are constructed from the user dynamic features, the user attribute features, and the live broadcast features.
  • the feature samples are divided into a training set and a test set: 80% of the feature samples are selected as the training set and used to train the model, yielding the trained click-through rate prediction model; the remaining 20% form the test set, which is used to test the trained click-through rate prediction model.
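  • A minimal sketch of the 80/20 split follows. Shuffling before the split and the fixed seed are assumptions added for reproducibility; the disclosure only states the proportions.

```python
import random


def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and split it into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical feature samples, represented here by their indices.
samples = list(range(100))
train_set, test_set = split_samples(samples)
```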
  • the training set feature samples of the live click rate are input into the XGBOOST model, the weak classifiers are trained in an iterative loop, the multiple weak classifiers are iteratively integrated into a combined classifier, and the trained click rate prediction model is obtained.
  • the click-through rate prediction model is used to predict the click-through rate of the live broadcast, wherein the click-through rate may include data such as regular clicks, browsing, favorites, add-to-cart actions, ordering, and sharing.
  • the XGBOOST model (eXtreme Gradient Boosting) is a boosting tree model. Iterative training is performed on the input training set feature samples: the weak classifier of each iteration is learned step by step, and the weights of the samples in the training set are updated according to the coefficients of the weak classifiers; each new weak classifier fits the residuals between the results of the previous weak classifiers and the training set samples, and the multiple weak classifiers are iteratively integrated into a strong classifier to obtain a prediction model.
  • XGBOOST has good learning performance.
  • the XGBOOST model can obtain a prediction model with excellent performance in a short time while consuming few computing resources. It combines the advantages of sparsity awareness and cross-validation.
  • models such as LR, random forest, GBDT, BP neural network and other algorithms can be used for model training.
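  • The residual-fitting idea behind boosting in step S307 can be illustrated with a toy example. This is not XGBOOST itself (which also uses regularization and second-order gradient information); it is a simplified sketch with one-split regression stumps on a single feature under squared error, and all names and parameter values are assumptions.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump (a weak learner) to the residuals."""
    overall = sum(residuals) / len(residuals)
    best = (float("inf"), xs[0], overall, overall)
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict(model, x):
    """Sum the base value and the scaled outputs of all weak learners."""
    base, learning_rate, stumps = model
    total = base
    for threshold, left_value, right_value in stumps:
        total += learning_rate * (left_value if x <= threshold else right_value)
    return total


def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Each round fits a new stump to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    stumps = []
    for _ in range(rounds):
        model = (base, learning_rate, stumps)
        residuals = [y - predict(model, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return (base, learning_rate, stumps)


def mean_squared_error(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

  • Each added stump corrects part of the error left by the previous ones, so the training error shrinks as rounds are added; a production system would rely on the XGBOOST library rather than a hand-written loop.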
  • Step S308: the click-through rate prediction model is verified.
  • the test set feature samples of the live broadcast click rate are input into the combined classifier obtained by training in step S307, that is, the trained click rate prediction model, and the user's live broadcast click rate is output. The error of the click rate prediction model is calculated according to the test result, it is judged whether the model error is higher than the error standard value, and the click-through rate prediction model is modified according to the model error so that it meets the requirements.
  • the prediction accuracy of the prediction model is greatly improved, and the click-through rate of the live broadcast can be accurately predicted.
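  • The verification in step S308 can be sketched as follows. The text does not name an error metric, so log loss is used here purely as an assumption, as are the function names and the example error standard value.

```python
import math


def log_loss(labels, probabilities, eps=1e-15):
    """Average binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def needs_revision(labels, probabilities, error_standard):
    """True when the model error on the test set exceeds the error standard value."""
    return log_loss(labels, probabilities) > error_standard


# Hypothetical test-set labels (clicked or not) and model outputs.
revise = needs_revision([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.1], error_standard=0.3)
```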
  • Step S309: the model is used.
  • the obtained target user data and a plurality of live broadcast data to be pushed are input into the click-through rate prediction model, which outputs the click-through rate of the target user on the target live broadcast data. According to the click rate, the live broadcast data can be pushed to the target user; or, according to the click rate, the inventory amount corresponding to the target live broadcast data can be determined, and inventory management can be carried out according to the inventory amount, for example, allocating inventory directly from the origin, or allocating inventory from the supply warehouse to the demand warehouse.
  • the click-through rate may include data of regular clicks, browsing, favorites, add-to-cart actions, and ordering. If the click-through rate for ordering a certain product is high, the warehouse increases the inventory of that product; if a user has a high click-through rate for browsing, favoriting, or adding a certain product to the cart, the live broadcast of that product, or of similar products, is recommended to the user.
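  • How the predicted click-through rates might drive pushing and inventory decisions in step S309 can be sketched as below. The ranking rule, the replenishment formula, and every name and number are hypothetical; the text only states that pushes and inventory are determined according to the click rate.

```python
import math


def select_broadcasts_to_push(predicted_ctr, top_k=3):
    """Rank candidate live broadcasts by predicted click-through rate, highest first."""
    ranked = sorted(predicted_ctr.items(), key=lambda item: item[1], reverse=True)
    return [broadcast_id for broadcast_id, _ in ranked[:top_k]]


def replenishment_quantity(order_ctr, expected_viewers, current_stock):
    """Units to add so stock covers the expected orders (assumed rule, not from the text)."""
    expected_orders = math.ceil(order_ctr * expected_viewers)
    return max(0, expected_orders - current_stock)


# Hypothetical model outputs for one target user.
candidates = {"live_a": 0.12, "live_b": 0.31, "live_c": 0.05, "live_d": 0.27}
to_push = select_broadcasts_to_push(candidates, top_k=2)
```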
  • through the steps of constructing a live broadcast database, constructing live broadcast user attribute features, constructing live broadcast features, constructing live broadcast user behavior features, constructing a live broadcast user behavior sequence, constructing live broadcast user dynamic features, training the click-through rate prediction model, verifying the click-through rate prediction model, and using the model, the method can adapt to cyclical changes in user behavior, optimize the performance of the live broadcast data prediction model, make full use of live broadcast resources, accurately predict the live broadcast click rate, accurately push live broadcasts to users, and manage inventory reasonably.
  • FIG. 4 is a schematic diagram of the main modules of the apparatus for determining the click-through rate of live broadcast according to an embodiment of the present disclosure.
  • the apparatus 400 for determining the click-through rate of live broadcast of the present disclosure includes:
  • the obtaining module 401 is configured to obtain multiple historical user data and multiple historical live broadcast data.
  • the obtaining module 401 obtains a plurality of historical user data and a plurality of historical live broadcast data based on the historical data of the platform. The historical user data can include data on users' browsing, comments, favorites, add-to-cart actions, ordering, sharing, etc., as well as the time when each piece of user operation data was generated; the historical live broadcast data can include information such as the live broadcast brand, lottery, time, interaction, anchor, and commodities.
  • the sequence generation module 402 is configured to determine the user behavior sequence corresponding to the multiple historical user data according to the sequence generation model, the multiple historical user data, and the generation time of the multiple historical user data.
  • the user behavior features are determined based on the user operation data in the historical user data, and the generation time corresponding to each user behavior feature is determined based on the time when the user operation data was generated. The sequence generation module 402 inputs the user behavior features and their corresponding generation times into the sequence generation model, which outputs a feature score for each user behavior feature.
  • the sequence generation module 402 normalizes the feature scores of the user behavior features to obtain a weight value corresponding to each user behavior feature; based on the data of the user behavior features and the weight values corresponding to the user behavior features, a weighted sum is computed to generate the user behavior sequence.
  • the user behavior sequence contains 12 elements, that is, the user behavior scores for 12 months, which characterize the user's behavior over the past year.
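  • The normalization and weighted sum performed by the sequence generation module 402 can be sketched as follows. Dividing each score by the sum of all scores is an assumed normalization (the text does not specify one), and the feature names and values are hypothetical.

```python
def normalize_scores(feature_scores):
    """Turn feature scores into weights that sum to 1 (assumed sum-normalization)."""
    total = sum(feature_scores.values())
    return {name: score / total for name, score in feature_scores.items()}


def behavior_sequence(monthly_data, weights):
    """Weighted sum of the behavior features for each month."""
    months = len(next(iter(monthly_data.values())))
    return [
        sum(weights[name] * monthly_data[name][m] for name in weights)
        for m in range(months)
    ]


# Hypothetical feature scores and 12 months of behavior counts for one user.
scores = {"browse": 1.0, "favorite": 1.0, "order": 2.0}
counts = {"browse": [4] * 12, "favorite": [0] * 12, "order": [2] * 12}
sequence = behavior_sequence(counts, normalize_scores(scores))
```

  • The resulting 12-element list is the input that the ARMA model abstracts into user dynamic features.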
  • the feature generation module 403 is configured to determine user attribute features according to the multiple historical user data, and determine the live broadcast feature according to the multiple historical live broadcast data.
  • the feature generating module 403 determines the user attribute feature based on the user information data therein, and determines the live broadcast feature based on the plurality of historical live broadcast data therein.
  • the model training module 404 is configured to train the click rate prediction model according to the user behavior sequence, the user attribute feature and the live broadcast feature.
  • the model training module 404 inputs the user behavior sequence into the ARMA model, trains the ARMA model, and outputs the ARMA model parameters as user dynamic features.
  • the model training module 404 inputs the user dynamic features, the user attribute features and the live broadcast features obtained by the feature generation module 403 into the click-through rate prediction model, trains the click-through rate prediction model, and outputs the trained click-through rate prediction model.
  • the CTR prediction model is the XGBOOST model.
  • the data processing module 405 is configured to determine the click rate of the target user on the target live broadcast data according to the trained click rate prediction model.
  • the target user data and a plurality of live broadcast data to be pushed are obtained, and the data processing module 405 determines the click rate of the target user on the target live broadcast data according to the target user data and the trained click rate prediction model.
  • the click-through rate may include data of regular clicks, browsing, favorites, add-to-cart actions, and ordering.
  • according to the click rate, live broadcast data is pushed to the target user; or, according to the click rate, the inventory amount corresponding to the target live broadcast data is determined, and inventory management is carried out according to the inventory amount, so as to increase inventory in time or allocate inventory from supply warehouses to support demand warehouses.
  • through modules such as the acquisition module, the sequence generation module, the feature generation module, the model training module, and the data processing module, the apparatus can adapt to periodic changes in user behavior, optimize the performance of the live broadcast data prediction model, and make full use of live broadcast resources, so that it can accurately predict the click-through rate of live broadcasts, accurately push live broadcasts to users, and manage inventory reasonably.
  • FIG. 5 is a schematic structural diagram of a computer system suitable for implementing the terminal device according to the embodiment of the present disclosure.
  • the computer system 500 of the terminal device according to the embodiment of the present disclosure includes:
  • a central processing unit (CPU) 501 can execute various appropriate actions and processes according to a program stored in a read only memory (ROM) 502 or a program loaded into a random access memory (RAM) 503 from a storage section 508 .
  • in the RAM 503, various programs and data necessary for the operation of the system 500 are also stored.
  • the CPU 501 , the ROM 502 , and the RAM 503 are connected to each other through a bus 504 .
  • An input/output (I/O) interface 505 is also connected to bus 504 .
  • the following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, etc.; an output section 507 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 508 including a hard disk, etc.; and a communication section 509 including a network interface card such as a LAN card, a modem, and the like. The communication section 509 performs communication processing via a network such as the Internet.
  • a drive 510 is also connected to the I/O interface 505 as needed.
  • a removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is mounted on the drive 510 as needed so that a computer program read therefrom is installed into the storage section 508 as needed.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication portion 509 and/or installed from the removable medium 511 .
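  • when the computer program is executed by the central processing unit (CPU) 501, the functions defined in the system of the present disclosure are performed.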
  • the computer-readable medium shown in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), fiber optics, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • the modules involved in the embodiments of the present disclosure may be implemented in software or hardware.
  • the described modules can also be set in the processor, for example, it can be described as: a processor includes an acquisition module, a sequence generation module, a feature generation module, a model training module, and a data processing module. Wherein, the names of these modules do not constitute a limitation on the module itself under certain circumstances.
  • the acquisition module can also be described as "a module for acquiring live broadcast data from a live broadcast platform".
  • the present disclosure also provides a computer-readable medium.
  • the computer-readable medium may be included in the device described in the above-mentioned embodiments, or it may exist alone without being assembled into the device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by a device, the device is caused to: acquire a plurality of historical user data and a plurality of historical live broadcast data; determine, according to the sequence generation model, the plurality of historical user data, and the generation time of the plurality of historical user data, the user behavior sequence corresponding to the plurality of historical user data; determine user attribute features according to the plurality of historical user data, and determine live broadcast features according to the plurality of historical live broadcast data; train the click-through rate prediction model according to the user behavior sequence, the user attribute features, and the live broadcast features; and determine, according to the trained click-through rate prediction model, the click-through rate of the target user on the target live broadcast data.
  • the prediction accuracy of the click-through rate of the live broadcast can be improved, so that the push accuracy can be improved, and the possibility of insufficient stock or slow sales can be reduced.

Abstract

The present disclosure relates to the technical field of computers and concerns a method and a device for determining a live broadcast click-through rate. A specific embodiment of the method comprises: acquiring a plurality of pieces of historical user data and a plurality of pieces of historical live broadcast data; determining, according to a sequence generation model, the plurality of pieces of historical user data, and the generation time of the plurality of pieces of historical user data, a user behavior sequence corresponding to the plurality of pieces of historical user data; determining user attribute features according to the plurality of pieces of historical user data, and determining live broadcast features according to the plurality of pieces of historical live broadcast data; training a click-through rate prediction model according to the user behavior sequence, the user attribute features, and the live broadcast features; and determining, according to the trained click-through rate prediction model, a click-through rate of a target user for target live broadcast data. The embodiment makes it possible to improve the prediction accuracy of a live broadcast click-through rate, thereby improving push accuracy and reducing the possibility of phenomena such as insufficient stock or unsalable stock.
PCT/CN2022/071797 2021-01-21 2022-01-13 Method and device for determining live broadcast click-through rate WO2022156589A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110081300.7 2021-01-21
CN202110081300.7A CN113778979A (zh) 2021-01-21 2021-01-21 Method and device for determining live broadcast click-through rate

Publications (1)

Publication Number Publication Date
WO2022156589A1 true WO2022156589A1 (fr) 2022-07-28

Family

ID=78835536

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071797 WO2022156589A1 (fr) 2021-01-21 2022-01-13 Method and device for determining live broadcast click-through rate

Country Status (2)

Country Link
CN (1) CN113778979A (fr)
WO (1) WO2022156589A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117579872A (zh) * 2024-01-15 2024-02-20 北京永泰万德信息工程技术有限公司 Live broadcast push method and system for a live broadcast display screen

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113778979A (zh) * 2021-01-21 2021-12-10 北京沃东天骏信息技术有限公司 Method and device for determining live broadcast click-through rate

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
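CN109992710A (zh) * 2019-02-13 2019-07-09 网易传媒科技(北京)有限公司 Click-through rate estimation method, system, medium, and computing device
CN110929206A (zh) * 2019-11-20 2020-03-27 腾讯科技(深圳)有限公司 Click-through rate estimation method, apparatus, computer-readable storage medium, and device
CN111046294A (zh) * 2019-12-27 2020-04-21 支付宝(杭州)信息技术有限公司 Click-through rate prediction method, recommendation method, model, apparatus, and device
US20200285937A1 (en) * 2017-10-11 2020-09-10 Beijing Sankuai Online Technology Co., Ltd Consumption capacity prediction
CN113778979A (zh) * 2021-01-21 2021-12-10 北京沃东天骏信息技术有限公司 Method and device for determining live broadcast click-through rate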

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
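CN117579872A (zh) * 2024-01-15 2024-02-20 北京永泰万德信息工程技术有限公司 Live broadcast push method and system for a live broadcast display screen
CN117579872B (zh) * 2024-01-15 2024-04-30 北京永泰万德信息工程技术有限公司 Live broadcast push method and system for a live broadcast display screen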

Also Published As

Publication number Publication date
CN113778979A (zh) 2021-12-10

Similar Documents

Publication Publication Date Title
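WO2017035970A1 (fr) Method and apparatus for selectively distributing information
US10776816B2 (en) System and method for building a targeted audience for an online advertising campaign
US11127032B2 (en) Optimizing and predicting campaign attributes
US20140358694A1 (en) Social media pricing engine
CN111095330B (zh) Machine learning method and system for predicting online user interactions
WO2022156589A1 (fr) Method and device for determining live broadcast click-through rate
CN110866040B (zh) User profile generation method, apparatus, and system
US20210192549A1 (en) Generating analytics tools using a personalized market share
CN112598472A (zh) Product recommendation method, apparatus, system, medium, and program product
CN109978594B (zh) Order processing method, apparatus, and medium
CN110197317B (zh) Target user determination method and apparatus, electronic device, and storage medium
CN112150184A (zh) Click-through rate estimation method and system, computer system, and computer-readable medium
CN110555747A (zh) Method and apparatus for determining target users
CN110490682B (zh) Method and apparatus for analyzing commodity attributes
CN109299351B (zh) Content recommendation method and apparatus, electronic device, and computer-readable medium
CN113495991A (zh) Recommendation method and apparatus
CN110796505B (zh) Business object recommendation method and apparatus
US20120265588A1 (en) System and method for recommending new connections in an advertising exchange
US20150339693A1 (en) Determination of initial value for automated delivery of news items
CN107357847B (zh) Data processing method and apparatus
CN114549125A (zh) Item recommendation method and apparatus, electronic device, and computer-readable storage medium
CN111506643B (zh) Method, apparatus, and system for generating information
CN113159877A (zh) Data processing method, apparatus, system, and computer-readable storage medium
CN113792952A (zh) Method and apparatus for generating a model
CN113822734A (zh) Method and apparatus for generating information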

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 22742068; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 22742068; Country of ref document: EP; Kind code of ref document: A1