CN113946569A - User portrait construction method - Google Patents

User portrait construction method

Info

Publication number
CN113946569A
Authority
CN
China
Prior art keywords
user
data
label
behavior
construction method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110987465.0A
Other languages
Chinese (zh)
Inventor
陈凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Krypton Cell Network Technology Co ltd
Original Assignee
Wuhan Krypton Cell Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Krypton Cell Network Technology Co ltd
Priority to CN202110987465.0A
Publication of CN113946569A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a user portrait construction method comprising the following steps: acquiring a large amount of user behavior data; establishing a fact label library from the collected behavior data; training a label model on a plurality of fact label libraries using logistic regression; matching the similarity between the user and the label model library through behavior weights to construct a user portrait; and continuously correcting and adjusting the user portrait using a time decay factor. The beneficial effects of the invention are as follows: a mathematical model based on Newton's law of cooling is used to capture how the correlation between a user's historical behavior and current behavior weakens over time, and once a time decay function has been established the user's label attributes are continuously corrected; the supervised learning method is based on logistic regression, which classifies by likelihood, so that the maximum correlation of the data can be obtained.

Description

User portrait construction method
Technical Field
The invention relates to the technical field of software management, in particular to a user portrait construction method.
Background
At present, the data industry is growing exponentially. The characteristic attributes of a user or product are described by collecting data along dimensions such as social attributes, consumption habits, and preference characteristics; these characteristics are then analyzed and aggregated to mine potentially valuable information, so that an overall picture of the user is abstracted. This can be regarded as the foundation of enterprise big-data applications and is a precondition for targeted advertising and personalized recommendation.
1. For example, a Chinese patent discloses a user portrait construction method, device, electronic device, and readable storage medium (application number: CN 201911291414.3). The method first obtains a preset application scene corresponding to the user portrait to be constructed and generates at least one dimension label according to that scene, where the dimension label indicates the user information required by different application scenes. It then acquires the user information corresponding to each dimension label through a plurality of preset information acquisition channels, and finally constructs the user portrait from that information. By acquiring user information comprehensively through multiple preset channels and building the portrait from it, the method improves the accuracy of the user portrait.
2. For example, another Chinese patent discloses a portrait construction method based on data self-learning (application number: CN202110476312.X). The method defines an algorithm, then publishes it and grants the corresponding entity permission to use it; defines labels for an entity and binds each label to its algorithm; groups the labels under an entity and specifies the label list combination for each group; binds the entity to data sets and specifies the association conditions among the data sets; and constructs an entity portrait task. This approach expresses the relationship between entity and portrait more intuitively, controls label generation and portrait construction at a finer granularity, and allows the algorithm implementation to be adjusted more flexibly through dynamic adjustment of threshold and input parameters, which makes the algorithm reusable. In addition, the accuracy of the labels can be fed back dynamically through secondary correlation analysis of the groups and labels, which provides a basis for adjusting the algorithm parameters.
The prior art uses labeled user information for iteration and correction to a greater or lesser extent, but still suffers from the following problems:
In the disclosed techniques, user information is collected on a large scale to build the user's label library, and users are grouped to construct user portraits, but the handling of hot and cold labels is neglected. Some user labels may strengthen or weaken as the user's preferences and maturity change, so the labels should be learned continuously to achieve automatic correction.
In the disclosed prior art, the weight-based classification algorithm is not adjusted in a timely manner. The importance of a word is directly proportional to the number of times it appears in an article and inversely proportional to the number of times it appears in the whole document set. The relationship between labels and users reflects, to some extent, the relationship among the labels themselves. This patent classifies based on the weights of a correlation coefficient matrix, which greatly strengthens the direct correlation between labels; the larger the number of users and labels, the more pronounced the correlation between any two labels becomes.
Therefore, it is necessary to provide a user portrait construction method that addresses the above problems.
Disclosure of Invention
In view of the above-mentioned shortcomings in the prior art, the present invention provides a user portrait construction method to solve the above-mentioned problems.
A user portrait construction method comprises the following steps:
S1, acquiring user behavior data;
S2, establishing a fact label library according to the collected behavior data;
S3, training a label model through the fact label library by using logistic regression;
S4, matching the similarity between the user and the label model library through behavior weights to construct a user portrait;
S5, continuously correcting and adjusting the user portrait by using a time decay factor.
In step S1, buried points (event-tracking points) for user behavior are embedded in the software in advance, and the behavior granularity is subdivided according to the number of times and the duration recorded at the different buried points.
The step of acquiring the user behavior data in step S1 includes:
(1) collecting the operation habits and behavior paths of mobile-phone users through buried points embedded in the client;
(2) sending the buried point data to the cloud (cloud server) without the user being aware of it;
(3) the cloud server receives the buried point data and persists it in an analytical database (such as ClickHouse).
In step S2, the fact label library is built from the data collected in step S1.
The step of establishing the fact label library includes:
(1) building a label library (hereinafter the dw library) from the persisted buried point data;
(2) cleaning the buried point data (removing misoperation data, meaningless data, and violation data);
(3) selecting data features and generating a decision tree according to a decision tree regression algorithm.
The step of training the label model in step S3 is:
(1) using machine learning so that a computer learns to handle the problem as a person would and gives an answer;
(2) training the label model using the logistic regression of a linear support vector machine; the method is based on binary classification, which suits this problem well, and belongs to supervised learning in machine learning (ML);
(3) training the fine-grained labels through logistic regression into a label model that can be matched in step S4.
The step of constructing the user portrait in step S4 is:
(1) first, users are grouped. When labels are used mainly for business purposes, pushing is rarely based on a single label; in most cases several labels must be combined to satisfy the business definition of a crowd. Grouping users is equivalent to creating a crowd template for pushing to crowds in different scenes;
(2) in the process of constructing the portrait, users with certain attributes are selected as data samples, and their data features are extracted to train the model;
(3) once the user data features are determined, they are matched against the label model: for a given data set, a dividing line can be found in the sample space that separates the two classes of samples and is as far as possible from the nearest training data points.
The step of correcting and adjusting the user portrait in step S5 is: the user portrait is adjusted by estimating the time decay factor. The time decay coefficient differs from label to label; some labels are not affected by time at all, and for them the decay factor need not be considered in the calculation.
The buried points cover clicking, browsing, and quitting.
The data to be cleaned includes misoperation data, meaningless data, and violation data.
Compared with the prior art, the invention has the beneficial effects that:
1. the weight classification of the invention is based on the TF-IDF algorithm, so that the relationship between the user and a label T is tighter;
2. a mathematical model based on Newton's law of cooling is applied to capture how the correlation between a user's historical behavior and current behavior weakens over time; a time decay function is established and the user's label attributes are continuously corrected;
3. the supervised learning method is based on logistic regression, which classifies by likelihood, so that the maximum correlation of the data can be obtained.
Drawings
Fig. 1 is a flowchart of the user portrait construction method of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The embodiments of the invention will be described in detail below with reference to the drawings, but the invention can be implemented in many different ways as defined and covered by the claims.
As shown in fig. 1, a method for constructing a user portrait includes the steps of:
S1, acquiring user behavior data;
S2, establishing a fact label library according to the collected behavior data;
S3, training a label model through the fact label library by using logistic regression;
S4, matching the similarity between the user and the label model library through behavior weights to construct a user portrait;
S5, continuously correcting and adjusting the user portrait by using a time decay factor.
In step S1, buried points (event-tracking points) for user behavior are embedded in the software in advance, and the behavior granularity is subdivided according to the number of times and the duration recorded at the different buried points. For example, when a page is viewed, behavior data lasting 5 seconds and behavior data lasting 30 seconds should be categorized differently.
The step of acquiring the user behavior data in step S1 includes:
(1) collecting the operation habits and behavior paths of mobile-phone users through buried points embedded in the client;
(2) sending the buried point data to the cloud (cloud server) without the user being aware of it;
(3) the cloud server receives the buried point data and persists it in an analytical database (such as ClickHouse).
In step S2, the fact label library is built from the data collected in step S1. For example, behavior data with very long or very short durations may be filtered out: after judgment by the various models, behavior data lasting 1 second may be classified as an accidental touch by the user, and behavior data whose duration exceeds a certain threshold may be classified as meaningless and excluded from model calculation.
The step of establishing the fact label library includes:
(1) building a label library (hereinafter the dw library) from the persisted buried point data;
(2) cleaning the buried point data (removing misoperation data, meaningless data, and violation data);
(3) selecting data features and generating a decision tree according to a decision tree regression algorithm.
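The duration-based filtering described for step S2 can be sketched as follows. This is only an illustration: the `Event` record, the field names, and both thresholds are assumptions, since the text gives no concrete values beyond the 1-second accidental-touch example.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the text only mentions "1 second" for accidental
# touches and "a certain threshold" for meaninglessly long events.
MISTOUCH_MAX_SECONDS = 1.0
MEANINGLESS_MIN_SECONDS = 3600.0


@dataclass
class Event:
    user_id: str
    action: str        # e.g. "click", "browse", "quit"
    duration_s: float  # time spent on the behavior, in seconds


def clean_events(events):
    """Keep only events whose duration lies inside the plausible range."""
    kept = []
    for ev in events:
        if ev.duration_s <= MISTOUCH_MAX_SECONDS:
            continue  # treated as an accidental touch
        if ev.duration_s > MEANINGLESS_MIN_SECONDS:
            continue  # too long to be meaningful
        kept.append(ev)
    return kept
```

In a real pipeline the thresholds would probably differ per buried point and be tuned from data rather than fixed as constants.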
The step of training the label model in step S3 is:
(1) using machine learning so that a computer learns to handle the problem as a person would and gives an answer;
(2) training the label model using the logistic regression of a linear support vector machine; the method is based on binary classification, which suits this problem well, and belongs to supervised learning in machine learning (ML);
(3) training the fine-grained labels through logistic regression into a label model that can be matched in step S4.
The step of constructing the user portrait in step S4 is:
(1) first, users are grouped. When labels are used mainly for business purposes, pushing is rarely based on a single label; in most cases several labels must be combined to satisfy the business definition of a crowd. Grouping users is equivalent to creating a crowd template for pushing to crowds in different scenes;
(2) in the process of constructing the portrait, users with certain attributes are selected as data samples, and their data features are extracted to train the model;
(3) once the user data features are determined, they are matched against the label model: for a given data set, a dividing line can be found in the sample space that separates the two classes of samples and is as far as possible from the nearest training data points.
The step of correcting and adjusting the user portrait in step S5 is: the user portrait is adjusted by estimating the time decay factor. The time decay coefficient differs from label to label; some labels are not affected by time at all, and for them the decay factor need not be considered in the calculation.
The buried points cover clicking, browsing, and quitting.
The data to be cleaned includes misoperation data, meaningless data, and violation data.
Compared with the prior art, the invention has the beneficial effects that:
1. the weight classification based on the TF-IDF algorithm makes the relationship between the user and a label T tighter;
2. a mathematical model based on Newton's law of cooling is used to capture how the correlation between a user's historical behavior and current behavior weakens over time, and once a time decay function has been established the user's label attributes are continuously corrected;
3. the supervised learning method based on logistic regression, which classifies by probability, can obtain the maximum correlation of the data. (In practice a person's speed is not constant, so the speed at different moments cannot be read off from a single straight line.)
TF-IDF (term frequency-inverse document frequency) is a commonly used weighting technique in information retrieval and data mining. TF stands for term frequency and IDF stands for inverse document frequency.
TF-IDF is a statistical method for evaluating how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to the frequency with which it appears across the corpus. Various forms of TF-IDF weighting are often applied by search engines as a measure or rating of the degree of relevance between a document and a user query. In addition to TF-IDF, search engines on the internet use ranking methods based on link analysis to determine the order in which documents appear in search results.
The main idea of TF-IDF is: if a word or phrase appears in an article with a high frequency TF and rarely appears in other articles, the word or phrase is considered to have a good classification capability and is suitable for classification.
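As a rough illustration of the weighting idea, the sketch below computes TF-IDF over tokenized documents. It uses one common variant (tf = count / document length, idf = log(N / df)); the patent does not say which variant it uses, and treating each user as a "document" and each label as a "term" is an assumption made here for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. tf = count / len(doc) and
    idf = log(N / df), one common variant among several.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights


# Hypothetical example: each inner list is the label history of one user.
users = [
    ["sports", "sports", "news"],
    ["news", "finance"],
    ["news", "music"],
]
weights = tf_idf(users)
```

Here "news" appears for every user, so its weight is zero for all of them, while "sports" gets a positive weight for the first user because it is frequent there and absent elsewhere.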
The method greatly improves the accuracy and flexibility of the algorithm and, in particular, handles hot and cold labels. The application uses recorded information which, compared with user information such as web browsing records, social network relationships, and news or advertisement click records, is static and stable, small in data volume, and rich in information, so the user portrait information constructed by the method can be defined and identified more accurately. Further, the application label information comprises application installation label information and/or application activity label information, and the user is labeled along different dimensions, so that more accurate user portrait information is constructed. Further, application installation label information and/or application activity label information based on application themes is provided, which yields richer differentiating label information and allows the applications to be classified better.
The working process is as follows:
A method of constructing a user portrait comprises the following steps:
S1, acquiring a large amount of user behavior data.
S2, building a fact label library according to the collected behavior data.
S3, training a label model through a plurality of fact label libraries by using logistic regression.
S4, matching the similarity between the user and the label model library through behavior weights to construct the user portrait.
S5, continuously correcting and adjusting the user portrait by using a time decay factor.
The specific steps of acquiring the user behavior data in step S1 are as follows:
(1) collecting the operation habits and behavior paths of mobile-phone users through buried points embedded in the client;
(2) sending the buried point data to the cloud (cloud server) without the user being aware of it;
(3) the cloud server receives the buried point data and persists it in an analytical database (such as ClickHouse).
Analytical database: in a transaction processing (OLTP) scenario, such as adding items to a shopping cart, placing an order, or making a payment in e-commerce, a large number of in-place insert, update, and delete operations are required. In a data analysis (OLAP) scenario, by contrast, data is typically imported in batches and then explored flexibly along any dimension, examined with BI tools, used to produce reports, and so on. After the data has been written once, analysts mine and analyze it from many angles until they discover information such as business value and business trends. This is a process of trial and error, constant adjustment, and continuous optimization, in which data is read far more often than it is written. The underlying database therefore needs to be designed specifically for this access pattern rather than blindly adopting the architecture of a conventional database.
The step of building a fact label library according to the collected behavior data in step S2 comprises:
(1) building a label library (hereinafter the dw library) from the persisted buried point data;
(2) cleaning the buried point data (removing misoperation data, meaningless data, and violation data);
(3) selecting data features and generating a decision tree according to a decision tree regression algorithm.
Decision tree: the construction of a decision tree algorithm is divided into three parts: feature selection, tree generation, and pruning.
Feature selection: the feature that maximizes the information gain is selected; that is, the more certain the classification becomes after splitting on a feature, the better that feature is.
Tree generation: the tree is built iteratively using algorithms such as ID3 and C4.5. Note that the tree at this stage tends to be over-fitted, because each selection is only a locally optimal solution.
Pruning: pruning prevents over-fitting. According to a global cost function, if cutting off a branch makes the cost function smaller, the branch is pruned.
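The feature-selection step can be illustrated with a small information-gain calculation in the style of ID3. This is a sketch only: the row format (a list of dicts) and the feature names are hypothetical, and it covers the choice of a single split rather than full tree generation or pruning.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one feature."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """Pick the feature with the largest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy data: "a" separates the classes perfectly, "b" does not.
rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = [0, 0, 1, 1]
chosen = best_feature(rows, labels, ["a", "b"])
```

C4.5 differs from this sketch in that it ranks features by gain ratio rather than raw information gain.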
The step of training the label model in step S3 is as follows. Because rule-based judgment or manual classification cannot handle users with missing data or users who fall outside the rules, machine learning is used so that a computer learns to handle the problem as a person would and gives an answer. The label model can be trained using the logistic regression of a linear support vector machine; the method is based on binary classification, which suits this problem well, and is a form of supervised learning in machine learning (ML). The fine-grained labels are trained through logistic regression into a label model that is matched in step S4.
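A minimal sketch of binary logistic regression for a single label, trained by batch gradient descent, is given below. The two features (say, a normalized browse time and a click rate), the learning rate, and the number of epochs are illustrative assumptions and are not taken from the patent.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that a user with features x carries the label."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical training set: y = 1 means the user carries the label.
X = [[0.1, 0.0], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
y = [0, 0, 1, 1]
w, b = train_logistic(X, y)
```

One such binary model per fine-grained label would give the kind of label model library that step S4 matches against, under the assumption that labels are trained independently.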
To keep the model complete, training can adopt a conservative definition: as long as a client has a record on any label, the client is assigned to that class of users. Since the target Y of a sample does not necessarily have a value for every product, the value may also be null.
Since the most effective way to improve the model's KS value is to extend the data dimensions, that is, feature engineering, the problem of features X coming from multiple data sources is bound to arise.
Linear regression algorithm: the purpose of regression is to predict a numerical target value. The most direct way is to write a formula that computes the target value from the inputs; this formula is called the regression equation, and the process of finding its regression coefficients is regression.
Linear regression means that the output is obtained by multiplying each input term by a constant and adding the results together.
One problem with linear regression is that under-fitting is likely, because it seeks the unbiased estimate with minimum mean squared error. To reduce the mean squared error of the prediction, some bias can be introduced into the estimate; one such method is locally weighted linear regression (LWLR). If the data has more features than sample points, the input matrix X is not of full rank, and a non-full-rank matrix causes problems when the inverse is computed. To solve this problem, ridge regression, the lasso method, or forward stepwise regression may be used.
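For reference, the standard closed forms behind this discussion can be written as below. The notation (design matrix X, target vector y, coefficient vector w, ridge penalty λ) is assumed here and does not appear in the original text. The ordinary least-squares solution needs the inverse of XᵀX, which does not exist when X is not of full column rank; ridge regression adds λI so that the matrix is always invertible for λ > 0.

```latex
\hat{w}_{\mathrm{OLS}} = (X^{\top} X)^{-1} X^{\top} y
\qquad
\hat{w}_{\mathrm{ridge}} = (X^{\top} X + \lambda I)^{-1} X^{\top} y, \quad \lambda > 0
```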
The step of constructing the user portrait in step S4 is:
First, users are grouped. When labels are used mainly for business purposes, pushing is rarely based on a single label; in most cases several labels must be combined to satisfy the business definition of a crowd. Grouping users is equivalent to creating a crowd template for pushing to crowds in different scenes.
In the process of constructing the portrait, users with certain attributes are used as data samples, and their data features are extracted to train the model. Once the user data features are determined, they are matched against the label model: for a given data set, a dividing line can be found in the sample space that separates the two classes of samples and is as far as possible from the nearest training data points.
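The dividing line that is farthest from the nearest training points is the maximum-margin separator of a linear support vector machine. In the standard hard-margin formulation, with feature vectors x_i and class labels y_i in {-1, +1} (notation assumed here, not given in the original), it is the solution of the problem below, and the resulting margin width is 2/‖w‖.

```latex
\min_{w,\,b} \ \tfrac{1}{2}\,\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, n
```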
Finally, in step S5, the user portrait is corrected and adjusted as follows: the time decay factor is estimated and used to adjust the user portrait. The heat of some label models in our library may gradually cool as time passes. For example, a piece of news may have the highest "temperature" today, but over time its "temperature" will gradually fall to that of ordinary news. The time decay coefficient differs from label to label; some labels are not affected by time at all, and for them the decay factor need not be considered in the calculation.
Time decay factor: the time decay factor represents the gradual cooling of a label's heat over time and is derived from Newton's law of cooling, whose formula is as follows:
$$\frac{dT(t)}{dt} = -k\,\bigl(T(t) - H\bigr)$$
where T(t) is the current temperature; dT(t)/dt is the rate at which the temperature of the object falls; k is the cooling coefficient; and H is the room (ambient) temperature.
The law states that the cooling rate of an object is proportional to the difference between its current temperature and room temperature. In the news domain, a piece of news may have the highest "temperature" today, but over time its "temperature" will gradually fall to that of ordinary news.
Solving Newton's law of cooling, and taking H = 0 so that the heat decays toward zero, gives the following equation:
$$T(t) = T(t_0)\,e^{-k\,(t - t_0)}$$
where T(t) is the current temperature; T(t_0) is the original temperature; k is the cooling coefficient; and t - t_0 is the interval time.
In words: current temperature = original temperature × exp(-cooling coefficient × interval time). Applied to labels this means: current weight = original weight × exp(-cooling coefficient × interval time).
For example: set the weight of a user's preference on the day of the action to 1 and the weight on the 10th day, that is, after an interval of 9 days, to 0.2. Substituting these known values into the formula and solving the exponential equation yields the cooling coefficient, and hence the time decay factor.
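The worked example above (weight 1 on the day of the action, 0.2 after a 9-day interval) can be checked numerically. The sketch below solves the exponential decay relation for the cooling coefficient and then applies it; the function names are illustrative and are not taken from the patent.

```python
import math


def cooling_coefficient(initial_weight, later_weight, interval_days):
    """Solve later = initial * exp(-k * interval) for k."""
    return math.log(initial_weight / later_weight) / interval_days


def decayed_weight(initial_weight, k, interval_days):
    """Weight remaining after interval_days under exponential decay."""
    return initial_weight * math.exp(-k * interval_days)


k = cooling_coefficient(1.0, 0.2, 9)  # ln(5) / 9
print(round(k, 4))                    # 0.1788
```

With this coefficient, a label weight of 1 falls to about 0.2 after 9 days. A label that should not be affected by time can simply use k = 0, which leaves the weight unchanged.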
Example:
With userID1 as the basic unit for identifying a user, the user is required to fill in basic information during registration, such as sex, age, region, school, and interest tags. Because this information is entered by the user, its truthfulness may vary from user to user; it should be used as basic data and then supplemented and corrected, where the correction may be made by a human reviewer, by decision tree judgment, and so on.
There are two types of correction method:
Frequentist correction: the parameters are regarded as fixed values that exist objectively, although they are unknown. The parameter values can therefore be estimated by optimizing a likelihood function or a similar criterion.
Bayesian correction: the parameters are regarded as unobserved random variables that may themselves follow a distribution. The parameters can therefore be assumed to follow a prior distribution, and their posterior distribution can be calculated from the observed data.
Once the data has been obtained and a model class has been chosen, the actual parameters are still unknown. Since the current sample has already been observed, a set of parameters is estimated that maximizes the probability of the observed result (the optimization goal). Because all samples in a sample set form a whole, their probabilities are multiplied together (the multiplication principle from combinatorics) to obtain the objective function, and the estimation and correction are then carried out on this basis.
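Written out, the objective described here is the likelihood function of maximum likelihood estimation. With samples x_1, ..., x_n assumed independent and model parameters θ (notation assumed here, not given in the original), the product of per-sample probabilities is maximized; in practice the logarithm is usually maximized instead, which turns the product into a sum without changing the maximizer.

```latex
L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta),
\qquad
\hat{\theta} = \arg\max_{\theta} \, \sum_{i=1}^{n} \log p(x_i \mid \theta)
```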
The method comprises the steps of collecting and dividing labels according to a user behavior path reported by a client, selecting the client to report buried point data through cloud server communication in the implementation case, writing the data into a CilckHouse column type storage server by the cloud server for persistence, and discarding simple meaningless data (for example, exceeding a normal service range value, and not performing warehousing) while writing.
Logistic regression is applied to the classified label data to refine the label granularity continuously. The model then learns from the basic information and behaviors of userID1 and collaboratively computes similarity labels for users who share the same behaviors. A weight model must be introduced to correct the labels continuously; because the time attenuation factor differs from label to label, it should be adjusted dynamically. In this way the portrait of each user is produced.
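The disclosure does not give a formula for the weight model. One common choice, shown here purely as an assumption, is exponential decay with a per-label half-life, so that older observations of a label contribute less to its current score. The label names and half-life values are hypothetical.

```python
# Hypothetical per-label half-lives in days; a shorter half-life means the
# label loses weight faster. In practice these would be tuned dynamically.
HALF_LIFE_DAYS = {"interest:sports": 30.0, "interest:news": 7.0}
DEFAULT_HALF_LIFE_DAYS = 14.0


def decayed_weight(base_weight, age_days, half_life_days):
    """Exponential decay: the weight halves every `half_life_days`."""
    return base_weight * 0.5 ** (age_days / half_life_days)


def label_scores(observations, now_day):
    """Sum decayed weights per label.

    observations: iterable of (label, base_weight, observed_day).
    """
    scores = {}
    for label, base_weight, day in observations:
        half_life = HALF_LIFE_DAYS.get(label, DEFAULT_HALF_LIFE_DAYS)
        weight = decayed_weight(base_weight, now_day - day, half_life)
        scores[label] = scores.get(label, 0.0) + weight
    return scores


observations = [
    ("interest:sports", 1.0, 0),   # 30 days old: one half-life
    ("interest:sports", 1.0, 30),  # observed today: no decay
    ("interest:news", 1.0, 23),    # 7 days old: one half-life
]
scores = label_scores(observations, now_day=30)
```

Keeping the half-life in a per-label table reflects the statement that the attenuation factor differs between labels and can be adjusted without touching the scoring code.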
The above description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. All equivalent structures or equivalent process modifications made using the contents of this specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, fall within the scope of the present invention.

Claims (10)

1. A user portrait construction method, characterized in that the method comprises the following steps:
S1, acquiring user behavior data;
S2, establishing a fact label library according to the collected behavior data;
S3, training a label model from the fact label library by using logistic regression;
S4, matching the similarity between the user and the label model library through behavior weights to construct a user portrait;
S5, continuously correcting and adjusting the user portrait by using a time attenuation factor.
2. A user portrait construction method as claimed in claim 1, wherein: in step S1, user behavior buried points are embedded in the software in advance, and the behavior granularity is subdivided according to the counts and durations of the different behavior buried points.
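Subdividing behavior granularity by count and duration might look like the sketch below. The tuple layout, the bucket names, and the thresholds are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def aggregate_behavior(events):
    """Aggregate buried point events per user and buried point.

    events: iterable of (user_id, buried_point, duration_seconds).
    Returns {(user_id, buried_point): {"count": int, "total_seconds": float}}.
    """
    stats = defaultdict(lambda: {"count": 0, "total_seconds": 0.0})
    for user_id, point, duration in events:
        entry = stats[(user_id, point)]
        entry["count"] += 1
        entry["total_seconds"] += duration
    return dict(stats)


def granularity_bucket(count, total_seconds):
    """Map a count and a total duration to a coarse bucket (assumed thresholds)."""
    if count >= 10 or total_seconds >= 600:
        return "heavy"
    if count >= 3 or total_seconds >= 60:
        return "regular"
    return "light"


events = [
    ("u1", "browse", 40.0),
    ("u1", "browse", 35.0),
    ("u1", "click", 1.0),
]
stats = aggregate_behavior(events)
buckets = {key: granularity_bucket(**value) for key, value in stats.items()}
```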
3. A user portrait construction method as claimed in claim 1, wherein: the step of acquiring the user behavior data in step S1 includes:
(1) embedding buried points in the client based on the operation habits and behavior paths of mobile phone users;
(2) sending the buried point data to the cloud without the user perceiving it;
(3) receiving the buried point data at the cloud and persisting it in an analytical database.
4. A user portrait construction method as claimed in claim 1, wherein: the fact label library in step S2 is built from the data collected in step S1.
5. A user portrait construction method as claimed in claim 1, wherein: the step of establishing the fact label library comprises:
(1) building a label library from the persisted buried point data;
(2) cleaning the buried point data;
(3) selecting data features and generating a decision tree according to a decision tree regression algorithm.
6. A user portrait construction method as claimed in claim 1, wherein: the step of training the label model in step S3 comprises:
(1) using machine learning, a process that enables a computer to learn to handle a problem and give an answer as a person would;
(2) performing model training on the labels by using logistic regression of a linear support vector machine;
(3) training the small-granularity labels through logistic regression into a label model that can be matched in step S4.
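A minimal sketch of training one label model is given below. It uses plain logistic regression fitted by batch gradient descent, not the support vector machine variant named in item (2), and the two features (a scaled browse count and a scaled average duration), the toy data, and the hyperparameters are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability that the label applies to feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log loss wrt z
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical training set: [scaled browse count, scaled average duration],
# with y = 1 when the label (e.g. "interested in sports") was confirmed.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.6], [0.9, 0.9]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(X, y)
p_new_user = predict_proba(w, b, [0.85, 0.8])  # score for an unseen user
```

The returned weights and bias form the label model; step S4 can then score any user's feature vector against it with `predict_proba`.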
7. A user portrait construction method as claimed in claim 1, wherein: the step of constructing the user portrait in step S4 comprises:
(1) grouping users;
(2) in the process of constructing the portrait, selecting users with certain attributes as data samples and extracting their data features to train a model;
(3) once the user data features to be matched against the label model are determined, finding, for a given data set, a dividing line in the sample space that separates the two classes of samples and lies furthest from the closest training data points.
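The dividing line described in item (3) corresponds to the maximum-margin separator of a linear support vector machine. Assuming linearly separable training points $x_i$ with labels $y_i \in \{-1, +1\}$ (notation introduced here for illustration), the standard hard-margin formulation is:

```latex
\[
\min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w^{\top} x_i + b) \ge 1, \qquad i = 1, \dots, n.
\]
```

The resulting line $w^{\top} x + b = 0$ lies at distance $1/\lVert w \rVert$ from the closest training points, so minimizing $\lVert w \rVert$ maximizes that distance.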
8. A user portrait construction method as claimed in claim 1, wherein: in step S5, the step of correcting and adjusting the user portrait is: predicting the time attenuation factor to adjust the user portrait.
9. A user portrait construction method as claimed in claim 2, wherein: the buried points comprise clicking, browsing and quitting.
10. A user portrait construction method as claimed in claim 1, wherein: the data to be cleaned comprise misoperation data, meaningless data and violation data.
CN202110987465.0A 2021-08-26 2021-08-26 User portrait construction method Pending CN113946569A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110987465.0A CN113946569A (en) 2021-08-26 2021-08-26 User portrait construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110987465.0A CN113946569A (en) 2021-08-26 2021-08-26 User portrait construction method

Publications (1)

Publication Number Publication Date
CN113946569A true CN113946569A (en) 2022-01-18

Family

ID=79327553

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110987465.0A Pending CN113946569A (en) 2021-08-26 2021-08-26 User portrait construction method

Country Status (1)

Country Link
CN (1) CN113946569A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114090854A (en) * 2022-01-24 2022-02-25 佰聆数据股份有限公司 Intelligent label weight updating method and system based on information entropy and computer equipment
CN114428666A (en) * 2022-01-27 2022-05-03 中国铁道科学研究院集团有限公司电子计算技术研究所 Intelligent elastic expansion method and system based on CPU and memory occupancy rate

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114090854A (en) * 2022-01-24 2022-02-25 佰聆数据股份有限公司 Intelligent label weight updating method and system based on information entropy and computer equipment
CN114090854B (en) * 2022-01-24 2022-04-19 佰聆数据股份有限公司 Intelligent label weight updating method and system based on information entropy and computer equipment
CN114428666A (en) * 2022-01-27 2022-05-03 中国铁道科学研究院集团有限公司电子计算技术研究所 Intelligent elastic expansion method and system based on CPU and memory occupancy rate

Similar Documents

Publication Publication Date Title
CN110263265B (en) User tag generation method, device, storage medium and computer equipment
Karimi et al. News recommender systems–Survey and roads ahead
Bagher et al. User trends modeling for a content-based recommender system
CN108154395B (en) Big data-based customer network behavior portrait method
Salehi et al. Personalized recommendation of learning material using sequential pattern mining and attribute based collaborative filtering
Agarwal et al. Statistical methods for recommender systems
Shin et al. Context-aware recommendation by aggregating user context
Shi et al. Local representative-based matrix factorization for cold-start recommendation
CN103731738A (en) Video recommendation method and device based on user group behavioral analysis
WO2002010954A2 (en) Collaborative filtering
Kim et al. Recommendation system for sharing economy based on multidimensional trust model
CN113946569A (en) User portrait construction method
Zhang et al. A dynamic trust based two-layer neighbor selection scheme towards online recommender systems
CN112632405A (en) Recommendation method, device, equipment and storage medium
Hong-Xia An improved collaborative filtering recommendation algorithm
Bhattacharya et al. Intent-aware contextual recommendation system
CN113869931A (en) Advertisement putting strategy determining method and device, computer equipment and storage medium
Zhong et al. Design of a personalized recommendation system for learning resources based on collaborative filtering
Gisselbrecht et al. Whichstreams: A dynamic approach for focused data capture from large social media
Duan et al. A hybrid intelligent service recommendation by latent semantics and explicit ratings
Moniz et al. A framework for recommendation of highly popular news lacking social feedback
Yan et al. Dynamic clustering based contextual combinatorial multi-armed bandit for online recommendation
Zhang et al. Incorporating temporal dynamics into LDA for one-class collaborative filtering
Ficel et al. A graph-based recommendation approach for highly interactive platforms
CN109299368B (en) Method and system for intelligent and personalized recommendation of environmental information resources AI

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination