CN114463052A - User attention index generation method, device, equipment and storage medium - Google Patents


Info

Publication number
CN114463052A
Authority
CN
China
Legal status
Pending
Application number
CN202210058505.8A
Other languages
Chinese (zh)
Inventor
李想
胡勇
甘孟壮
Current Assignee
Chezhi Interconnection Beijing Technology Co ltd
Original Assignee
Chezhi Interconnection Beijing Technology Co ltd
Application filed by Chezhi Interconnection Beijing Technology Co ltd
Priority to CN202210058505.8A
Publication of CN114463052A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for generating a user attention index, comprising the following steps: performing feature classification on user attention data to obtain user attention sample data of the selected feature classifications within a set time interval; training a model on the user attention data of the selected feature classifications, according to their user attention sample data, to obtain the feature weight of each selected feature classification; and calculating each user's attention score from the feature weights of the selected feature classifications, and screening a confidence interval of user attention according to the scores. Because the weights of all feature classifications are learned from a large number of user feature classifications and the corresponding user attention sample data, the feature weights better reflect actual user behavior. The attention scores can then be used to analyze the main features of low-attention users and lost (churned) users, so that service intervention strategies can be formulated in a targeted manner.

Description

User attention index generation method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of data analysis, and in particular to a method, an apparatus, an electronic device and a storage medium for calculating a user attention index using a LightGBM tree model.
Background
In internet-based automobile sales, the user attention index measures how much attention users pay to an automobile product. It gives a direct measure of the popularity and quality of different products and provides useful guidance for product operation and promotion. The index is mainly used to compare the market acceptance of a product against competing products across different car series, to analyze the strength of competitive relationships, and to help enterprises formulate targeted promotion strategies.
In the prior art, user attention index calculation has the following main shortcomings: feature selection relies on manual rules and is therefore inaccurate; feature weights are assigned unreasonably; low-attention users cannot be identified; and the factors causing low attention cannot be analyzed. Together these reduce the accuracy of the calculated user attention index.
Therefore, an effective method for generating a user attention index is needed to solve the problems in the prior art.
Disclosure of Invention
To this end, the present invention provides a user attention index generation method, apparatus, electronic device and storage medium in an effort to solve or at least alleviate at least one of the problems presented above.
According to one aspect of the present invention, a method for generating a user attention index is provided. The method uses a LightGBM tree model to perform feature analysis on user attention data, calculates the weights of all features, weights the user attention data by the corresponding feature weights, and obtains a user attention score and a ranking for each user. The method comprises the steps of: performing feature classification on the user attention data and acquiring user attention sample data of the selected feature classifications within a set time interval, the sample data comprising positive sample data and negative sample data; training a model on the user attention data of the selected feature classifications, according to their user attention sample data, to obtain the feature weight of each selected feature classification; and calculating, from the feature weight of each selected feature classification, the user attention score of each user's user attention sample data, and screening a confidence interval of user attention according to the user attention scores.
Optionally, in the method for generating a user attention index according to the present invention, the step of performing feature classification on the user attention data to obtain user attention sample data of the selected feature classifications within a set time interval comprises: acquiring user attention data within the set time interval, the user attention data comprising the number of visits, visit duration, number of searches and number of comparisons for attention-related information; performing feature classification on the user attention data within the set time interval, where the classification is made according to the database items that correspond to the user's accesses; obtaining one or more selected user attention feature classifications from the feature classification of the user attention data; and acquiring, for the one or more selected feature classifications, user attention sample data within the set time interval from the corresponding database.
Optionally, in the method for generating a user attention index according to the present invention, the step of acquiring user attention sample data from the database corresponding to the selected feature classifications within a set time interval comprises: acquiring the user attention sample data of the selected feature classifications within the set time interval; setting sample labels for the user attention sample data; acquiring, on the basis of the number of visits and the corresponding visit times in a user's sample data, the number of searches and the corresponding search times in that sample data; setting a time threshold and determining the minimum time interval between a visit time and a search time in a given user's sample data; if this minimum interval is less than the set time threshold, labeling the user's sample data as positive sample data; and if it is greater than the set time threshold, labeling the user's sample data as negative sample data.
Optionally, in the method for generating a user attention index according to the present invention, the step of training a model on the user attention data of the selected feature classifications, according to their user attention sample data, to obtain the feature weight of each selected feature classification comprises: designating which of the selected feature classifications are continuous and which are discrete, and configuring the label type of the user attention sample data of the selected feature classifications, the label types being a positive sample label (for positive sample data) and a negative sample label (for negative sample data); training a LightGBM tree model on the user attention sample data of the selected feature classifications; monitoring the training loss of the LightGBM tree model and checking the model evaluation metric AUC on a validation set; saving the trained LightGBM tree model and outputting an importance index for each selected feature classification; and normalizing the importance indexes of all selected feature classifications to obtain the feature weight of each selected feature classification.
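The final step above, turning the importance indexes into feature weights, can be sketched as follows. This is a minimal Python sketch that assumes sum-to-one normalization (the text does not say which normalization is used); the feature names and importance values are hypothetical.

```python
from typing import Dict


def normalize_importance(importance: Dict[str, float]) -> Dict[str, float]:
    """Scale raw feature importances so that they sum to 1 (assumed sum normalization)."""
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("importances must have a positive sum")
    return {name: value / total for name, value in importance.items()}


# Hypothetical raw importances for three feature classifications.
raw = {"quote_page_visits": 120.0, "series_home_visits": 60.0, "forum_visits": 20.0}
weights = normalize_importance(raw)
print(weights)  # {'quote_page_visits': 0.6, 'series_home_visits': 0.3, 'forum_visits': 0.1}
```

Sum normalization keeps the relative ordering of the importances and makes the weights directly usable as coefficients in a linear score.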
Optionally, in the method for generating a user attention index according to the present invention, the step of training the LightGBM tree model on the user attention sample data of the selected feature classifications comprises: configuring the LightGBM model parameters, which include the learning rate, the number of iterations, the regularization coefficient and the sampling rate; inputting the user attention sample data of the selected feature classifications under the configured parameters; and training the LightGBM tree model on that sample data according to the designated continuous and discrete feature classifications and the label type of the user attention sample data.
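A minimal sketch of this training step with the open-source `lightgbm` Python package follows. The parameter values, the function name `train_attention_model` and the use of gain-based importance are assumptions for illustration; the text only names the kinds of parameters. `lightgbm` is imported inside the function, so the configuration can be inspected without the package installed.

```python
from typing import List, Sequence

# Hypothetical values; the text only names learning rate, iterations,
# regularization coefficient and sampling rate.
PARAMS = {
    "objective": "binary",                # positive vs. negative sample labels
    "metric": ["binary_logloss", "auc"],  # training loss and validation AUC
    "learning_rate": 0.05,
    "lambda_l2": 1.0,                     # L2 regularization coefficient
    "bagging_fraction": 0.8,              # row sampling rate
    "bagging_freq": 1,                    # bagging is only active when > 0
    "feature_fraction": 0.8,              # column sampling rate
    "verbose": -1,
}
NUM_BOOST_ROUND = 500  # number of iterations (hypothetical)


def train_attention_model(X_train, y_train, X_valid, y_valid,
                          feature_names: List[str], discrete_features: Sequence[str]):
    """Train a binary LightGBM model; return (booster, {feature: gain importance})."""
    import lightgbm as lgb  # imported lazily; requires the lightgbm package

    # Discrete feature classifications are declared as categorical;
    # all remaining features are treated as continuous.
    train_set = lgb.Dataset(X_train, label=y_train, feature_name=feature_names,
                            categorical_feature=list(discrete_features))
    valid_set = lgb.Dataset(X_valid, label=y_valid, reference=train_set)
    booster = lgb.train(
        PARAMS,
        train_set,
        num_boost_round=NUM_BOOST_ROUND,
        valid_sets=[train_set, valid_set],
        valid_names=["train", "valid"],
        callbacks=[lgb.log_evaluation(period=50)],  # print loss/AUC every 50 rounds
    )
    gains = booster.feature_importance(importance_type="gain")
    return booster, dict(zip(booster.feature_name(), gains.tolist()))
```

Logging `binary_logloss` and `auc` for both sets corresponds to observing the training loss and the validation AUC; the returned importance dictionary is what the normalization step would turn into feature weights.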
Optionally, in the method for generating a user attention index according to the present invention, the step of calculating the user attention score of each user's user attention sample data according to the feature weight of each selected feature classification, and screening the confidence interval of user attention according to the scores, comprises: acquiring each user's user attention sample data for the selected feature classifications together with the corresponding feature weights; linearly weighting each user's sample data by the feature weights to obtain the user attention score of each user's sample data; calculating the standard deviation and variance of the user attention scores of each user's sample data; and setting a minimum variance threshold according to that standard deviation and variance, and deleting a user's sample data for the selected feature classifications when the variance of that user's attention score is greater than the set minimum variance threshold.
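The scoring and screening steps can be sketched as below. The sketch assumes, which the text does not state, that each user's sample data is a list of daily feature values (for example the 30 days of the embodiment), so that a per-user variance can be computed over daily scores. The text derives the threshold from the standard deviation and variance without giving a formula, so here it is simply a parameter; all names and values are hypothetical.

```python
from statistics import mean, pstdev, pvariance
from typing import Dict, List

Features = Dict[str, float]  # feature classification -> value for one day


def daily_scores(days: List[Features], weights: Dict[str, float]) -> List[float]:
    """Linear weighting: score = sum_j weight_j * value_j, computed for each day."""
    return [sum(weights[f] * day.get(f, 0.0) for f in weights) for day in days]


def score_and_filter(samples: Dict[str, List[Features]], weights: Dict[str, float],
                     variance_threshold: float) -> Dict[str, dict]:
    """Per-user mean score, std and variance; drop users whose variance exceeds the threshold."""
    kept = {}
    for user, days in samples.items():
        scores = daily_scores(days, weights)
        var = pvariance(scores)
        if var > variance_threshold:
            continue  # unstable user: discard this user's sample data
        kept[user] = {"score": mean(scores), "std": pstdev(scores), "variance": var}
    return kept


weights = {"quote": 0.5, "forum": 0.25, "video": 0.25}
samples = {
    "u1": [{"quote": 4, "forum": 2, "video": 2}, {"quote": 4, "forum": 2, "video": 2}],
    "u2": [{"quote": 8}, {"quote": 0}],  # daily scores 4.0 and 0.0, variance 4.0
}
result = score_and_filter(samples, weights, variance_threshold=1.0)
print(result)  # {'u1': {'score': 3.0, 'std': 0.0, 'variance': 0.0}}
```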
Optionally, the method for generating a user attention index according to the present invention further comprises: obtaining, according to the user attention scores of each user's sample data, the feature weights of the selected feature classifications for lost users; and, according to those feature weights, intervening on the user attention sample data of the feature classifications that carry high feature weights for lost users.
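Assuming the feature weights for lost users have already been obtained, choosing which feature classifications to intervene on reduces to ranking them by weight. A minimal sketch; the function name, feature names and `top_k` value are hypothetical:

```python
from typing import Dict, List


def intervention_targets(lost_user_weights: Dict[str, float], top_k: int = 3) -> List[str]:
    """Return the top_k feature classifications with the highest weight for lost users."""
    ranked = sorted(lost_user_weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]


lost_weights = {"quote_page_visits": 0.1, "forum_visits": 0.5,
                "dealer_visits": 0.3, "video_visits": 0.1}
print(intervention_targets(lost_weights, top_k=2))  # ['forum_visits', 'dealer_visits']
```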
According to another aspect of the present invention, an apparatus for generating a user attention index is disclosed. The apparatus uses a LightGBM tree model to perform feature analysis on user attention data, calculates the weights of all features, weights the user attention data by the corresponding feature weights, and obtains a user attention score and a ranking for each user. The apparatus comprises:
a sample data acquisition module, configured to perform feature classification on the user attention data and acquire user attention sample data of the selected feature classifications within a set time interval, the sample data comprising positive sample data and negative sample data;
a feature weight calculation module, configured to train a model on the user attention data of the selected feature classifications, according to their user attention sample data, to obtain the feature weight of each selected feature classification;
and a user attention calculation module, configured to calculate the user attention score of each user's user attention sample data according to the feature weight of each selected feature classification, and to screen the confidence interval of user attention according to the user attention scores.
According to yet another aspect of the present invention, there is provided a computing device comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the user attention index generation methods described above.
According to yet another aspect of the present invention, there is provided a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the user attention index generation methods described above.
In the user attention index generation scheme of the present invention, feature classification is performed on the user attention data to obtain user attention sample data of the selected feature classifications within a set time interval; a model is trained on this sample data to obtain the feature weight of each selected feature classification; and the user attention score of each user's sample data is calculated from these feature weights, with the confidence interval of user attention screened according to the scores. Because the weights of all feature classifications are learned from a large number of user feature classifications and the corresponding sample data, they better reflect actual user behavior, and the attention scores can be used to analyze the main features of low-attention users and lost users, so that service intervention strategies can be formulated in a targeted manner.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a schematic diagram of the configuration of a computing device 100 according to one embodiment of the invention;
FIG. 2 shows a flow diagram of a user attention index generation method 200 according to one embodiment of the invention; and
FIG. 3 shows a schematic structural diagram of a user attention index generation apparatus 300 according to one embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In real-world applications, calculating a user attention index generally involves two aspects: selecting the calculation dimensions and calculating the attention scores. The calculation dimensions usually include features such as page visits, the number of users leaving messages, follows of a car series, comments, posts, word-of-mouth reviews, messages and likes. These features are selected manually, based on business experience combined with historical data, in a manner resembling an unsupervised approach. When calculating attention scores, the feature weights are generally specified by experts from an analysis of historical user feedback data, and the users' data in the different dimensions are then combined with the corresponding weights by some weighting method. The whole process demands considerable business experience, and the accuracy of the attention scores depends heavily on the accuracy of the manual choices.
Applied to products, attention scores directly reflect a product's popularity and its competitive performance against rival products. With an evaluation method built on a set of general features, analyzing which feature scores are low can reveal the general reasons behind a product's low attention, which is of some help to the business. However, two problems remain: 1. the range of features associated with low attention can be identified, but their degree of influence cannot be quantified; 2. although the negatively affecting features are located, no targeted, actionable approach for subsequently raising attention is offered.
Therefore, the invention uses a learning algorithm to calculate the importance of different user features and the users' attention scores. The whole data pipeline requires no manual intervention; the behavioral features of specific groups, such as lost users, can be analyzed automatically; interactive business content can be selected as features; the influence of each feature on user churn is quantified; and the results can directly guide adjustments to product operation and promotion strategies.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, computing device 100 typically includes system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level one cache 110 and a level two cache 112, a processor core 114, and registers 116. The example processor core 114 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, application 122 may be arranged to operate with program data 124 on an operating system. In some embodiments, the computing device 100 is configured to execute a user attention index generation method 200, where the method 200 is capable of performing feature analysis on user attention data using a LightGBM tree model, calculating weights of all features, performing weighting by using the feature weights and the user attention data corresponding to the feature weights, and obtaining a user attention score and a ranking of a user, and the program data 124 includes instructions for executing the method 200.
Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via the bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communications with one or more other computing devices 162 over a network communication link via one or more communication ports 164. In this embodiment, the sample data acquisition module may perform feature classification on the user attention data and acquire user attention sample data of the selected feature classifications within a set time interval.
A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures or program modules in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or private-wired network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media. In some embodiments, one or more programs are stored in a computer-readable medium, including instructions for performing certain methods, such as the user attention index generation method 200 performed by the computing device 100 according to embodiments of the present invention.
Computing device 100 may be implemented as part of a small-form-factor portable (or mobile) electronic device such as a cellular telephone, a personal digital assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application-specific device, or a hybrid device that includes any of the above functions. Computing device 100 may also be implemented as a personal computer, including both desktop and notebook computer configurations.
FIG. 2 shows a flow diagram of a user attention index generation method 200 according to one embodiment of the invention. As shown in FIG. 2, the method 200 uses a LightGBM tree model to perform feature analysis on user attention data, calculates the weights of all features, weights the user attention data by the corresponding feature weights, and obtains a user attention score and a ranking for each user. The method 200 starts at step S210, in which feature classification is performed on the user attention data and user attention sample data of the selected feature classifications within a set time interval is acquired, the sample data comprising positive sample data and negative sample data.
Specifically, taking online automobile sales as an example: as users browse automobile web pages, the back-end server records in real time each user's visits to the product library, the car community, car series home pages, news videos and so on. The recorded access data can be classified, and from this classification one can determine which automobile information a user frequently browses, searches and follows. In particular, among the items offered on each automobile web page, one can obtain which items the user browsed and for how long, which items the user searched and how many times, and so on; examples include the number of visits to the automobile product library, to the car community, to car series home pages, to news videos, to vehicle configurations and to price quotations, the number of searches of the product library, and the number of visits to dealers. This access information can serve as user attention sample data for training the corresponding model, from which user attention is obtained.
Specifically, in an embodiment of the present application, the step of performing feature classification on the user attention data to obtain user attention sample data of a selected feature classification within a set time interval includes:
acquiring user attention data within a set time interval, the user attention data comprising the number of visits, visit duration, number of searches and number of comparisons for attention-related information; specifically, in the embodiment of this application a typical time interval, such as one month or one quarter, is generally selected. Acquiring the user attention data within one interval reflects users' attention during that interval more faithfully, and analyzing user attention across several intervals within a period makes it possible to explore how user attention changes and how people's vehicle preferences trend over time.
performing feature classification on the user attention data within the set time interval, where the classification is made according to the database items corresponding to the user's accesses; specifically, feature classification groups the user attention data by the type of web page the user requested. For example, if a user visits or searches the price quotation page, the user attention data on that page may be classified as one feature, and if the user visits or searches the vehicle model page, the data on that page may be classified as another feature. The feature classifications used are chosen from many candidates: choosing too many produces a large amount of invalid data and makes the computation cumbersome, while choosing too few cannot fully reflect users' real attention, so it becomes impossible to analyze which vehicles users are genuinely interested in or how that interest changes. A suitable number of feature classifications therefore needs to be selected.
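One way to picture this classification step is to map each recorded page type (database item) to a feature classification and count visits and searches per user. The mapping `PAGE_TO_FEATURE`, the record layout and the counter names below are hypothetical and serve only as an illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical mapping from page type (database item) to feature classification.
PAGE_TO_FEATURE = {
    "quote": "quote_page",
    "model": "vehicle_model_page",
    "forum": "car_community",
}

Record = Tuple[str, str, str]  # (user_id, page_type, action); action is "visit" or "search"


def classify_records(records: Iterable[Record]) -> Dict[str, Counter]:
    """Count visits/searches per user and feature classification; unmapped page types are skipped."""
    per_user: Dict[str, Counter] = {}
    for user, page_type, action in records:
        feature = PAGE_TO_FEATURE.get(page_type)
        if feature is None:
            continue  # page type not among the selected feature classifications
        per_user.setdefault(user, Counter())[f"{feature}_{action}_count"] += 1
    return per_user


records = [
    ("u1", "quote", "visit"), ("u1", "quote", "search"), ("u1", "quote", "visit"),
    ("u2", "forum", "visit"), ("u2", "unknown", "visit"),
]
features = classify_records(records)
print(dict(features["u1"]))  # {'quote_page_visit_count': 2, 'quote_page_search_count': 1}
```

Keeping the mapping explicit makes it easy to widen or narrow the set of feature classifications, which is exactly the trade-off described above.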
obtaining one or more selected user attention feature classifications from the feature classification of the user attention data; specifically, a vehicle trading website has thousands of pages, and the finer the feature classification, the more page categories there are, so the database may hold thousands of feature classifications of user attention data. Selecting among them without summarizing or filtering could lead to an unscientific choice of user feature classifications; in the embodiment of this application, the number of selected feature classifications is generally between 30 and 50.
and acquiring, for the one or more selected user attention feature classifications, user attention sample data within the set time interval from the corresponding database. Specifically, once a feature classification has been selected, the user attention data under it can be retrieved from the database; this data covers all user visits, browsing and searches related to that feature classification. To make the selected data more targeted, the user attention data within a time interval is generally used as the user attention sample data. The interval can be one day: analyzing one day of data shows the time periods during which users pay attention to a given feature classification. Analyzing one month of data yields a day-by-day curve of user attention to that feature classification, and analyzing one year of data yields a month-by-month curve. Dividing time into intervals in this way makes it possible to analyze the changes, patterns and trends of user attention data within an interval.
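The interval-based extraction can be sketched as follows: keep only the events inside a set window (30 days here, matching the embodiment) and bucket one feature classification's events by calendar day to obtain its day-by-day curve. The event layout and function names are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

Event = Tuple[str, str, datetime]  # (user_id, feature_classification, timestamp)


def window_events(events: Iterable[Event], end: datetime, days: int = 30) -> List[Event]:
    """Keep events whose timestamp lies in the half-open window [end - days, end)."""
    start = end - timedelta(days=days)
    return [e for e in events if start <= e[2] < end]


def daily_curve(events: Iterable[Event], feature: str) -> Counter:
    """Count events of one feature classification per calendar day (the day-by-day curve)."""
    return Counter(ts.date() for _, f, ts in events if f == feature)


end = datetime(2022, 1, 31)
events = [
    ("u1", "quote_page", datetime(2022, 1, 30, 9)),
    ("u2", "quote_page", datetime(2022, 1, 30, 20)),
    ("u1", "quote_page", datetime(2021, 12, 1)),  # outside the 30-day window
]
in_window = window_events(events, end)
curve = daily_curve(in_window, "quote_page")
```

Grouping by `ts.hour` or by `(ts.year, ts.month)` instead of `ts.date()` would give the intra-day and month-by-month views mentioned above.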
Specifically, in an embodiment of the present application, the step of obtaining, from the database, user attention sample data corresponding to the selected feature classification within a set time interval includes:
acquiring user attention sample data of the selected feature classification within a set time interval;
setting a sample label for the user attention sample data;
acquiring, from a user's attention sample data, the number of searches and the corresponding search times, according to the number of accesses and the corresponding access times in that sample data;
setting a time threshold, and determining the minimum time interval between the access times and the search times in the user attention sample data of a given user;
if the minimum time interval between the access time and the search time is less than the set time threshold, setting the sample data of the user as positive sample data;
and if the minimum time interval between the access time and the search time is greater than the set time threshold, setting the sample data of the user as negative sample data.
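The labeling rule above can be sketched as below. The function name `label_user` is hypothetical; the handling of users with no searches (negative) and of a gap exactly equal to the threshold (negative) are assumptions, since the text only specifies the "less than" and "greater than" cases.

```python
from datetime import datetime, timedelta

def label_user(access_times, search_times, threshold):
    """Return 1 (positive sample) or 0 (negative sample) for one user's data.

    The label depends on the minimum gap between any access time and any
    search time, compared against `threshold` (a timedelta).
    """
    if not access_times or not search_times:
        # Assumption: with no access or no search there is no gap, so label negative.
        return 0
    min_gap = min(abs(a - s) for a in access_times for s in search_times)
    # The text defines "less than" as positive and "greater than" as negative;
    # a gap exactly equal to the threshold is assigned to the negative class here.
    return 1 if min_gap < threshold else 0

t0 = datetime(2022, 1, 1, 10, 0)
print(label_user([t0], [t0 + timedelta(minutes=3)], timedelta(minutes=5)))  # 1
```

The pairwise comparison is quadratic in the number of events; for long histories, sorting both lists and scanning them with two pointers finds the same minimum gap more cheaply.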
Specifically, in an embodiment of the present application, the selected user attention sample data may be as shown in table 1 below:
Table 1: [table image not reproduced]
As can be seen from Table 1, 44 feature classifications are selected in the present application and the set time interval is 30 days; that is, user attention sample data of the 44 feature classifications over 30 consecutive days is obtained.
In step S220, model training is performed on the user attention data of the selected feature classifications according to their user attention sample data, to obtain the feature weight of each selected feature classification.
Specifically, once the user attention sample data of the selected feature classifications within the set time interval is obtained, model training on that data yields the feature weight of each feature classification, and these feature weights can be applied to attention index calculation for different vehicle series.
Specifically, in an embodiment of the present application, the step of performing model training on the user attention data of the selected feature classification according to the user attention sample data of the selected feature classification, and obtaining the feature weight of each selected feature classification includes:
designating, according to the user attention sample data of the selected feature classifications, which selected feature classifications are continuous and which are discrete, and configuring the label type of the user attention sample data of the selected feature classifications, wherein the label type comprises a positive sample label, applied to positive sample data, and a negative sample label, applied to negative sample data;
training user attention sample data of the selected feature classification by using a LightGBM tree model;
monitoring the training loss of the LightGBM tree model, and checking the AUC evaluation metric of the model on a validation set;
saving the trained LightGBM tree model, and outputting the importance index of each selected feature classification;
and normalizing the importance indexes of all the selected feature classifications to obtain the feature weight of each selected feature classification.
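The final normalization step can be sketched as follows. The text does not say which normalization is used; this sketch assumes sum-to-one scaling (min-max scaling would be an alternative), and the name `normalize_importance` is hypothetical.

```python
def normalize_importance(importance):
    """Scale non-negative importance values so that they sum to 1 (assumed normalization)."""
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("importance values must have a positive sum")
    return {name: value / total for name, value in importance.items()}

weights = normalize_importance({"price": 30.0, "review": 10.0, "dealer": 60.0})
print(weights)  # {'price': 0.3, 'review': 0.1, 'dealer': 0.6}
```

Sum-to-one weights keep attention scores comparable across vehicle series even when the raw importance values come from models with different scales.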
Specifically, in an embodiment of the present application, the step of training the user attention sample data of the selected feature classification by using the LightGBM tree model includes:
configuring LightGBM model parameters, including the learning rate, the number of iterations, the regularization coefficient, and the sampling rate of the LightGBM model;
inputting the user attention sample data of the selected feature classifications according to the configured LightGBM model parameters;
and training the LightGBM tree model on the user attention sample data of the selected feature classifications according to the designated continuous and discrete feature classifications and the label type of the user attention sample data.
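The parameter configuration and training steps above might be expressed with LightGBM's Python package roughly as follows. This is a sketch, not the applicant's code: the parameter values are hypothetical placeholders, gain-based importance is an assumed choice of "importance index", and discrete feature classifications are assumed to be passed through `categorical_feature` as non-negative integer codes while continuous ones stay numeric. LightGBM is imported inside the function so the configuration can be inspected without it installed.

```python
# Hypothetical values; only the parameter names are LightGBM's own.
PARAMS = {
    "objective": "binary",                # positive vs negative sample labels
    "metric": ["binary_logloss", "auc"],  # training loss and validation AUC
    "learning_rate": 0.05,                # learning rate
    "num_iterations": 500,                # number of boosting iterations
    "lambda_l2": 1.0,                     # L2 regularization coefficient
    "bagging_fraction": 0.8,              # row sampling rate ...
    "bagging_freq": 1,                    # ... only active when bagging_freq > 0
    "feature_fraction": 0.8,              # column sampling rate
    "verbose": -1,
}

def train_and_rank(X_train, y_train, X_valid, y_valid,
                   feature_names, categorical_names, params=PARAMS):
    """Train a binary LightGBM model; return (gain importance per feature, eval history)."""
    import lightgbm as lgb  # lazy import so PARAMS is usable without LightGBM

    train_set = lgb.Dataset(X_train, label=y_train, feature_name=feature_names,
                            categorical_feature=categorical_names)
    valid_set = lgb.Dataset(X_valid, label=y_valid, reference=train_set)
    history = {}  # filled as history["train"]["binary_logloss"], history["valid"]["auc"]
    booster = lgb.train(
        params,
        train_set,
        valid_sets=[train_set, valid_set],
        valid_names=["train", "valid"],
        callbacks=[lgb.record_evaluation(history)],
    )
    gains = booster.feature_importance(importance_type="gain")
    return dict(zip(feature_names, gains)), history
```

Inspecting `history` covers the "monitor training loss, check validation AUC" step; the returned importances still need to be normalized into feature weights.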
Specifically, the feature weights calculated through the above steps for the feature classifications in Table 1 are shown in Table 2 below.
Table 2: [table image not reproduced]
In step S230, a user attention score is calculated for the user attention sample data of each user according to the feature weight of each selected feature classification, and confidence interval screening is performed on the user attention according to the user attention score.
Specifically, once the feature weight of each feature classification is obtained, the user attention score is calculated from the statistical count of each feature classification: it is a linear weighting of the statistical counts by the corresponding feature weights.
Specifically, in an embodiment of the present application, the step of calculating a user attention score of user attention sample data of each user according to the feature weight of each selected feature classification, and performing confidence interval screening on the user attention according to the user attention score includes:
acquiring user attention sample data of the selected feature classification of each user and corresponding feature weight of the selected feature classification;
carrying out linear weighting on the user attention sample data and the feature weight of the selected feature classification of each user to obtain a user attention score of the user attention sample data of each user;
calculating the standard deviation and the variance of the user attention scores of each user's user attention sample data;
and setting a minimum variance threshold according to the standard deviation and the variance of the user attention scores, and deleting the user attention sample data of the selected feature classifications of a user when the variance of that user's attention score is greater than the set minimum variance threshold.
Specifically, assume that the feature weight of each feature classification is denoted $\beta_k$ and the corresponding user attention sample data is denoted $x_k$. The user attention score $y$ of the user is then:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$
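The linear weighting in the formula can be sketched directly. The function name `attention_score` is hypothetical, and treating a classification the user never touched as $x_k = 0$ is an assumption.

```python
def attention_score(weights, features, intercept=0.0):
    """Compute y = beta_0 + sum_k beta_k * x_k for one user.

    weights:   feature weight per feature classification (beta_k)
    features:  the user's statistical count per feature classification (x_k);
               classifications the user never touched count as 0 (assumption)
    intercept: beta_0
    """
    return intercept + sum(beta * features.get(name, 0.0) for name, beta in weights.items())

weights = {"price": 0.3, "review": 0.1, "dealer": 0.6}
print(attention_score(weights, {"price": 10, "dealer": 2}))  # 4.2
```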
In practice, much of the data is unreliable. For example, a user may stay on a feature classification for too short a time, access it too many times, or spend too little total time on the website. Such excessive or insufficient behavior, even when genuine, does not reflect the user's real intention well, so it is rejected. To this end, the standard deviation and variance of each user's attention scores are calculated from those scores; a minimum variance threshold is then set, and when the variance of a user's attention score is greater than this threshold, that user's attention sample data of the selected feature classifications is deleted.
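A minimal sketch of the variance screening follows. The text does not say over what the per-user variance is taken; this sketch assumes each user has one attention score per day of the time interval. The threshold is passed in directly (how it is derived from the standard deviation and variance is left open), and `screen_users` is a hypothetical name.

```python
from statistics import pvariance

def screen_users(daily_scores, max_variance):
    """Split users into kept and dropped according to the variance of their scores.

    daily_scores: user_id -> list of attention scores, one per day (assumed layout)
    max_variance: the threshold the text calls the "minimum variance threshold";
                  users whose score variance exceeds it are dropped
    """
    kept, dropped = {}, []
    for user, scores in daily_scores.items():
        # Population variance; a single score (or none) is treated as zero variance.
        var = pvariance(scores) if len(scores) > 1 else 0.0
        if var > max_variance:
            dropped.append(user)
        else:
            kept[user] = scores
    return kept, dropped

print(screen_users({"a": [1, 1, 1], "b": [0, 10, 0, 10]}, 1.0))  # ({'a': [1, 1, 1]}, ['b'])
```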
Specifically, in an embodiment of the present application, the user attention index generation method further includes:
obtaining the feature weights of the selected feature classifications for lost (churned) users according to the user attention scores of each user's user attention sample data;
and, according to those feature weights, intervening in the feature classifications with high feature weights among the selected feature classifications of the lost users.
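Choosing which feature classifications to intervene in can be sketched as a simple ranking. Taking the top `k` classifications by feature weight is an assumption about what "high feature weight" means, and `intervention_targets` is a hypothetical name.

```python
def intervention_targets(weights, k=3):
    """Return the k feature classifications with the highest feature weight."""
    # Sort by weight descending, breaking ties by name so the result is deterministic.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]

print(intervention_targets({"price": 0.3, "review": 0.1, "dealer": 0.6}, k=2))  # ['dealer', 'price']
```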
Specifically, the method yields the user attention scores for the company's products and for competitors' products: scores for each automobile product per user, and each user's score for each competitor product. From these, the user's attention share between the company's products and competitors' products can be derived. By analyzing the feature weights together with these scores, the company can specifically raise user attention in the feature classifications with high feature weights, reduce the attention that those high-weight classifications draw to competitors' products, and thereby increase user attention to its own products. For example, the goal of raising user attention can be reached by placing targeted promotional advertisements.
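The attention share between the company's product and competitors' products can be sketched as a ratio of scores. Defining the share as the own-product score divided by the total score across own and competitor products is an assumption; `attention_share` is a hypothetical name.

```python
def attention_share(own_score, competitor_scores):
    """Fraction of a user's total attention score that goes to the company's own product.

    competitor_scores: competitor product -> the same user's attention score for it
    """
    total = own_score + sum(competitor_scores.values())
    # A user with no attention anywhere gets a share of 0 rather than a division error.
    return own_score / total if total > 0 else 0.0

print(attention_share(6.0, {"competitor_b": 3.0, "competitor_c": 1.0}))  # 0.6
```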
Meanwhile, the method can also be used to analyze lost users, identify the main feature classifications that lead to user loss, and reduce user loss by improving user attention in those feature classifications.
The user attention index generation method performs feature classification on user attention data and obtains user attention sample data of the selected feature classifications within a set time interval; performs model training on the user attention data of the selected feature classifications according to that sample data to obtain the feature weight of each selected feature classification; and calculates the user attention score of each user's sample data according to these feature weights, screening the user attention by confidence interval according to the score. By training on a large number of user feature classifications and the corresponding user attention sample data, the invention obtains weights for all feature classifications that better reflect actual user behavior, and the attention scores can be used to analyze the main features of low-attention users and lost users, so that service intervention strategies can be formulated in a targeted manner.
It should be understood that, although the steps in the flowchart of Fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited to the order shown, and they may be performed in other orders. Moreover, at least some of the steps in Fig. 2 may include multiple sub-steps or stages that are not necessarily completed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 3, a user attention index generating apparatus 300 is provided, the apparatus 300 comprising a sample data acquisition module, a feature weight calculation module, and a user attention calculation module.
The sample data acquisition module is used for carrying out feature classification on the user attention data and acquiring user attention sample data of the selected feature classification in a set time interval, wherein the sample data comprises positive sample data and negative sample data;
the feature weight calculation module is used for performing model training on the user attention data of the selected feature classification according to the user attention sample data of the selected feature classification, to obtain the feature weight of each selected feature classification;
and the user attention calculation module is used for calculating the user attention score of the user attention sample data of each user according to the feature weight of each selected feature classification, and screening the confidence interval of the user attention according to the user attention score.
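The three-module structure of apparatus 300 can be sketched as a small pipeline. The class and field names are hypothetical, each module's internals are injected as plain functions, and the confidence interval screening step is omitted here for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Samples = Dict[str, Dict[str, float]]  # user_id -> {feature classification: count}

@dataclass
class AttentionIndexApparatus:
    """Pipeline mirroring the three modules of apparatus 300 (internals injected)."""
    acquire_samples: Callable[[], Tuple[Samples, Dict[str, int]]]           # samples, labels
    compute_weights: Callable[[Samples, Dict[str, int]], Dict[str, float]]  # feature weights
    score_user: Callable[[Dict[str, float], Dict[str, float]], float]       # one user's score

    def run(self) -> Dict[str, float]:
        samples, labels = self.acquire_samples()         # sample data acquisition module
        weights = self.compute_weights(samples, labels)  # feature weight calculation module
        # User attention calculation module: one score per user.
        return {user: self.score_user(weights, feats) for user, feats in samples.items()}

app = AttentionIndexApparatus(
    acquire_samples=lambda: ({"u1": {"price": 2.0}}, {"u1": 1}),
    compute_weights=lambda samples, labels: {"price": 0.5},
    score_user=lambda w, f: sum(w[k] * f.get(k, 0.0) for k in w),
)
print(app.run())  # {'u1': 1.0}
```

Injecting the module functions keeps each module replaceable, which matches the later note that modules may be combined or split across devices.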
Specifically, in another embodiment of the present application, the sample data acquisition module is configured to: acquire user attention data within a set time interval, where the user attention data includes the number of accesses, access duration, number of searches, and number of comparisons for information the user pays attention to; perform feature classification on the user attention data within the set time interval, where the feature classification includes classification according to the database items corresponding to user accesses; obtain one or more selected user attention feature classifications according to the feature classification of the user attention data; and acquire, according to the one or more user attention feature classifications, user attention sample data within the set time interval from the database corresponding to the selected feature classifications.
Specifically, in another embodiment of the present application, the sample data acquisition module is configured to: acquire user attention sample data of a selected feature classification within a set time interval; set a sample label for the user attention sample data; acquire, from a user's attention sample data, the number of searches and the corresponding search times according to the number of accesses and the corresponding access times; set a time threshold and determine the minimum time interval between the access times and the search times in a given user's attention sample data; if that minimum interval is less than the set time threshold, set the user's sample data as positive sample data; and if it is greater than the set time threshold, set the user's sample data as negative sample data.
Specifically, in another embodiment of the present application, the feature weight calculation module is configured to: designate continuous and discrete feature classifications among the selected feature classifications according to their user attention sample data, and configure the label type of that sample data, where the label type includes a positive sample label applied to positive sample data and a negative sample label applied to negative sample data; train on the user attention sample data of the selected feature classifications using a LightGBM tree model; monitor the training loss of the LightGBM tree model and check the AUC evaluation metric on a validation set; save the trained LightGBM tree model and output the importance index of each selected feature classification; and normalize the importance indexes of all selected feature classifications to obtain the feature weight of each selected feature classification.
Specifically, in another embodiment of the present application, the feature weight calculation module is configured to: configure LightGBM tree model parameters, including the learning rate, number of iterations, regularization coefficient, and sampling rate of the LightGBM tree model; input the user attention sample data of the selected feature classifications according to the configured LightGBM model parameters; and train the LightGBM tree model on that sample data according to the designated continuous and discrete feature classifications and the label type of the sample data.
Specifically, in another embodiment of the present application, the user attention calculation module is configured to: obtain each user's user attention sample data of the selected feature classifications and the corresponding feature weights; linearly weight each user's sample data by the feature weights to obtain the user attention score of each user's sample data; calculate the standard deviation and variance of each user's attention scores; and set a minimum variance threshold according to that standard deviation and variance, deleting a user's attention sample data of the selected feature classifications when the variance of that user's attention score is greater than the set minimum variance threshold.
The user attention index generation device performs feature classification on user attention data through the sample data acquisition module and acquires user attention sample data of the selected feature classifications within a set time interval; performs model training through the feature weight calculation module, according to that sample data, to obtain the feature weight of each selected feature classification; and, through the user attention calculation module, calculates the user attention score of each user's sample data according to these feature weights and screens the user attention by confidence interval according to the score. By training on a large number of user feature classifications and the corresponding user attention sample data, the invention obtains weights for all feature classifications that better reflect actual user behavior, and the attention scores can be used to analyze the main features of low-attention users and lost users, so that service intervention strategies can be formulated in a targeted manner.
It should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Moreover, those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments, not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Additionally, some of the embodiments are described herein as a method or combination of method elements that can be implemented by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Furthermore, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by that element for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

Claims (10)

1. A user attention index generation method, which uses a LightGBM tree model to perform feature analysis on user attention data, calculates the weights of all features, performs weighting with the feature weights and the corresponding user attention data, and obtains a user attention score and ranking for each user, the method comprising the following steps:
carrying out feature classification on the user attention data, and acquiring user attention sample data of the selected feature classification in a set time interval, wherein the sample data comprises positive sample data and negative sample data;
according to the user attention sample data of the selected feature classification, performing model training on the user attention data of the selected feature classification to obtain the feature weight of each selected feature classification;
and calculating the user attention degree score of the user attention degree sample data of each user according to the feature weight of each selected feature classification, and screening the confidence interval of the user attention degree according to the user attention degree score.
2. The method of claim 1, wherein the step of performing feature classification on the user attention data to obtain user attention sample data of the selected feature classification within a set time interval comprises:
acquiring user attention data within a set time interval, wherein the user attention data comprises the number of accesses, access duration, number of searches, and number of comparisons for information the user pays attention to;
according to the user attention data in the set time interval, carrying out feature classification on the user attention data, wherein the feature classification comprises classification according to the user access corresponding database items;
according to the feature classification of the user attention data, one or more selected user attention feature classifications are obtained;
and acquiring user attention sample data in a database corresponding to the selected feature classification within a set time interval according to the one or more user attention feature classifications.
3. The method of claim 2, wherein the step of obtaining user attention sample data in the database corresponding to the selected feature classification within a set time interval comprises:
acquiring user attention sample data of the selected feature classification in a set time interval;
setting a sample label of the user attention sample data;
acquiring, from a user's attention sample data, the number of searches and the corresponding search times, according to the number of accesses and the corresponding access times in that sample data;
setting a time threshold, and determining the minimum time interval between the access times and the search times in the user attention sample data of a given user;
if the minimum time interval between the access time and the search time is less than the set time threshold, setting the sample data of the user as positive sample data;
and if the minimum time interval between the access time and the search time is greater than the set time threshold, setting the sample data of the user as negative sample data.
4. The method according to claim 1, wherein the step of performing model training on the user attention data of the selected feature classification according to the user attention sample data of the selected feature classification, and obtaining the feature weight of each selected feature classification comprises:
designating, according to the user attention sample data of the selected feature classifications, which selected feature classifications are continuous and which are discrete, and configuring the label type of the user attention sample data of the selected feature classifications, wherein the label type comprises a positive sample label, applied to positive sample data, and a negative sample label, applied to negative sample data;
training user attention sample data of the selected feature classification by using a LightGBM tree model;
monitoring the training loss of the LightGBM tree model, and checking the AUC evaluation metric of the model on a validation set;
saving the trained LightGBM tree model, and outputting the importance index of each selected feature classification;
and normalizing the importance indexes of all the selected feature classifications to obtain the feature weight of each selected feature classification.
5. The method of claim 4, wherein the training of user attention sample data for the selected feature classification using the LightGBM tree model comprises:
configuring LightGBM model parameters, wherein the LightGBM model parameters comprise the learning rate, the number of iterations, the regularization coefficient, and the sampling rate of the LightGBM model;
inputting user attention sample data of the selected feature classification according to the configured LightGBM model parameters;
and training the LightGBM tree model on the user attention sample data of the selected feature classifications according to the designated continuous and discrete feature classifications and the label type of the user attention sample data.
6. The method of claim 1, wherein the step of calculating a user attention score of the user attention sample data of each user according to the feature weight of each selected feature classification, and performing confidence interval screening on the user attention according to the user attention score comprises:
acquiring user attention sample data of the selected feature classification of each user and corresponding feature weight of the selected feature classification;
carrying out linear weighting on the user attention sample data and the feature weight of the selected feature classification of each user to obtain a user attention score of the user attention sample data of each user;
calculating the standard deviation and the variance of the user attention scores of each user's user attention sample data;
and setting a minimum variance threshold according to the standard deviation and the variance of the user attention scores, and deleting the user attention sample data of the selected feature classifications of a user when the variance of that user's attention score is greater than the set minimum variance threshold.
7. The method of claim 1, wherein the method steps further comprise:
obtaining the feature weight of the selected feature classification of lost users according to the user attention score of the user attention sample data of each user;
and, according to the feature weight of the selected feature classification of the lost users, intervening in the feature classifications with high feature weights among the selected feature classifications of the lost users.
8. A user attention index generation device, which uses a LightGBM tree model to perform feature analysis on user attention data, calculates the weights of all features, performs weighting with the feature weights and the corresponding user attention data, and obtains a user attention score and ranking for each user, the device comprising:
the sample data acquisition module is used for carrying out feature classification on the user attention data and acquiring user attention sample data of the selected feature classification in a set time interval, wherein the sample data comprises positive sample data and negative sample data;
the feature weight calculation module is used for performing model training on the user attention data of the selected feature classification according to the user attention sample data of the selected feature classification, to obtain the feature weight of each selected feature classification;
and the user attention calculation module is used for calculating the user attention score of the user attention sample data of each user according to the feature weight of each selected feature classification, and screening the confidence interval of the user attention according to the user attention score.
9. An electronic device, comprising:
one or more processors; and
a memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-7.
10. A computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a computing device, cause the computing device to perform any of the methods of claims 1-7.
CN202210058505.8A 2022-01-12 2022-01-12 User attention index generation method, device, equipment and storage medium Pending CN114463052A

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN202210058505.8A | CN114463052A | 2022-01-12 | 2022-01-12 | User attention index generation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN202210058505.8A | CN114463052A | 2022-01-12 | 2022-01-12 | User attention index generation method, device, equipment and storage medium

Publications (1)

Publication Number | Publication Date
CN114463052A | 2022-05-10

Family

ID=81408672

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202210058505.8A | Pending | CN114463052A | 2022-01-12 | 2022-01-12 | User attention index generation method, device, equipment and storage medium

Country Status (1)

Country | Link
CN (1) | CN114463052A

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN117635190A * | 2023-11-27 | 2024-03-01 | 河北数港科技有限公司 | Log data analysis method and system
CN117635190B * | 2023-11-27 | 2024-05-14 | 河北数港科技有限公司 | Log data analysis method and system

Similar Documents

Publication | Title
US8738436B2 | Click through rate prediction system and method
Chen et al. | Predicting the influence of users’ posted information for eWOM advertising in social networks
US8103650B1 | Generating targeted paid search campaigns
US20190311395A1 | Estimating click-through rate
WO2018157625A1 | Reinforcement learning-based method for learning to rank and server
CN111177538B | User interest label construction method based on unsupervised weight calculation
CN104866969A | Personal credit data processing method and device
US20140324528A1 | Computerized System and Method for Determining an Action's Relevance to a Transaction
CN110990695A | Recommendation system content recall method and device
WO2015124024A1 | Method and device for promoting exposure rate of information, method and device for determining value of search word
CN112487283A | Method and device for training model, electronic equipment and readable storage medium
CN113343091A | Industrial and enterprise oriented science and technology service recommendation calculation method, medium and program
CN113407854A | Application recommendation method, device and equipment and computer readable storage medium
JP5061999B2 | Analysis apparatus, analysis method, and analysis program
CN114463052A | User attention index generation method, device, equipment and storage medium
US10346856B1 | Personality aggregation and web browsing
CN116775882B | Intelligent government affair message processing method and equipment
CN112055038A | Method for generating click rate estimation model and method for predicting click probability
CN107766537B | Position searching and sorting method and computing device
CN117217808B | Intelligent analysis and prediction method for activity invitation capability
CN110209944B | Stock analyst recommendation method and device, computer equipment and storage medium
CN115794898B | Financial information recommendation method and device, electronic equipment and storage medium
US20130332440A1 | Refinements in Document Analysis
US20200311761A1 | System and method for analyzing the effectiveness and influence of digital online content
CN111445280A | Model generation method, restaurant ranking method, system, device and medium

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination