CN110096499B - User object identification method and system based on behavior time series big data - Google Patents
User object identification method and system based on behavior time series big data
- Publication number
- CN110096499B CN201910284112.7A
- Authority
- CN
- China
- Prior art keywords
- data
- user
- feature
- characteristic
- behavior
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2477—Temporal data queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Quality & Reliability (AREA)
- Fuzzy Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a user object identification method based on behavior time series big data, comprising the following steps: acquiring historical data and data to be identified, and cleaning both according to cleaning rules; generating records with a unified structure from the cleaned data; selecting features according to the data feature types and constructing a feature set; constructing a feature vector or a feature vector group according to the feature set; generating a similarity discrimination matrix or a machine learning discrimination model from the feature vectors or the feature vector groups of the historical data, respectively; and performing user identification on the feature vector generated from the data to be identified, according to the similarity discrimination matrix or the machine learning discrimination model, to obtain an identification result. The invention enables accurate identity recognition on data in which the user identity information is hidden or polluted.
Description
Technical Field
The invention relates to the field of computer-based behavior recognition, and in particular to a user object identification method and system based on behavior time series big data.
Background
With the development of Internet communication technology and the continuous change of social forms, the achievements and thinking of "Internet+" have penetrated deeply into people's daily lives. More and more traditional living habits have turned into virtual online behaviors, which Internet operators collect and store as various kinds of behavior data, such as online shopping behavior, web browsing behavior, and audio and video playback behavior. In addition, with the rise of data mining and machine learning, identity recognition technologies based on the big data of network users have also developed, for example the recognition of identity types such as a user's gender, occupation and shopping preferences, based on the user profiles built by major e-commerce platforms.
In the prior art, identity recognition technologies based on non-image data, such as Web data, are mostly limited to recognizing user identity types, so a large amount of non-image user behavior data cannot be used to identify individual users accurately. Now that data analysis and machine learning technologies have matured, pinpointing a specific user by deeply mining that user's behavior data would greatly extend identity recognition beyond its current near-total reliance on image data.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a user object identification method based on behavior time series big data. The invention enables accurate identity recognition on data in which the user identity information is hidden or polluted.
The purpose of the invention can be realized by the following technical scheme:
a user object identification method based on behavior time series big data comprises the following steps:
acquiring historical data and data to be identified, and cleaning the historical data and the data to be identified according to a cleaning rule;
generating a record with a uniform structure according to the cleaned data;
selecting characteristics according to the data characteristic types, and constructing characteristic sets;
constructing a feature vector or a feature vector group according to the feature set;
generating a similarity discrimination matrix or a machine learning discrimination model from the feature vectors or the feature vector groups of the historical data, respectively;
and carrying out user identification on the feature vector generated by the data to be identified according to the similarity discrimination matrix or the machine learning discrimination model to obtain an identification result.
Specifically, the historical data comes from a static database and is used to generate the discrimination model; the data to be identified is newly generated data that serves as the target data, and its source may be a static database or a dynamic data stream.
The data cleaning rules include user filtering, field filtering and time filtering. User filtering removes invalid users according to data quantity and data source; field filtering removes data attributes that are unrelated to behavior; time filtering keeps only the data within a specified time interval, ensures that the data is in chronological order, and removes data whose time order is corrupted.
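The three cleaning rules can be sketched as follows. This is a minimal illustration, not the patented implementation: the row layout (dicts with `user` and `time` keys), the `min_rows` threshold and the function name `clean_records` are assumptions made for the example, and the sketch restores chronological order by sorting rather than by discarding out-of-order rows.

```python
from collections import Counter


def clean_records(rows, keep_fields, start, end, min_rows=2):
    """Apply time, field and user filtering to raw rows (a list of dicts)."""
    # Time filtering: keep only rows inside the half-open interval [start, end).
    in_window = [r for r in rows if start <= r["time"] < end]
    # Field filtering: keep the user, the time and the behavior-related fields only.
    projected = [
        {k: r[k] for k in ("user", "time", *keep_fields)} for r in in_window
    ]
    # User filtering: drop users with fewer than min_rows rows (assumed rule).
    counts = Counter(r["user"] for r in projected)
    kept = [r for r in projected if counts[r["user"]] >= min_rows]
    # Restore increasing timestamp order (a simplification of the time rule).
    return sorted(kept, key=lambda r: r["time"])


rows = [
    {"user": "a", "time": 5, "song": "s1", "ip": "x"},
    {"user": "a", "time": 3, "song": "s2", "ip": "x"},
    {"user": "b", "time": 4, "song": "s3", "ip": "y"},
    {"user": "a", "time": 99, "song": "s4", "ip": "x"},
]
cleaned = clean_records(rows, ["song"], start=0, end=10)
```

Here user `b` is dropped for having too little data, the `ip` attribute is removed as unrelated to behavior, and the row at time 99 falls outside the interval.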
Specifically, the step of generating a record with a unified structure according to the cleaned data includes:
digitizing the behavior fields of the data, and adding a timestamp (Timestamp) and a user tag (UserId) to each piece of data. Digitizing means mapping behavior types to integer values within a specified range; the timestamp is an integer giving the number of seconds between a specified reference date and the time at which the record was acquired; the user tag is an integer number assigned to the user. The unified record generated from each piece of data is expressed as:
$$\mathrm{Record} = \langle \mathrm{UserId}, \mathrm{Timestamp}, \mathrm{Operation}_1, \mathrm{Operation}_2, \ldots, \mathrm{Operation}_n \rangle$$
where each Operation represents an operation behavior of the user captured in the record.
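As an illustration of the unified record, the following sketch maps raw values to integers and computes the timestamp in seconds. The names `IdMapper`, `to_record`, `EPOCH` and the row key `when` are hypothetical, and the reference date is an arbitrary choice; the source only states that some specified date is used.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    user_id: int       # UserId: integer number assigned to the user
    timestamp: int     # Timestamp: seconds since the reference date
    operations: tuple  # Operation_1 .. Operation_n as integer codes


class IdMapper:
    """Assigns consecutive integers to values in first-seen order."""

    def __init__(self):
        self._ids = {}

    def __call__(self, value):
        return self._ids.setdefault(value, len(self._ids))


EPOCH = datetime(2019, 1, 1)  # assumed reference date


def to_record(row, behavior_fields, user_ids, op_ids):
    """Convert one cleaned row (a dict) into a unified Record."""
    seconds = int((row["when"] - EPOCH).total_seconds())
    ops = tuple(op_ids[field](row[field]) for field in behavior_fields)
    return Record(user_ids(row["user"]), seconds, ops)


user_ids = IdMapper()
op_ids = {"curSong": IdMapper(), "rTag": IdMapper()}
row = {"user": "alice", "when": datetime(2019, 1, 1, 0, 1, 0),
       "curSong": "songA", "rTag": "like"}
rec = to_record(row, ["curSong", "rTag"], user_ids, op_ids)
```

Using one mapper per behavior field keeps the integer ranges of different fields independent, which is one possible reading of "within a specified range".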
Specifically, the data feature types are divided into behavior features and time series features of behaviors. The i-th feature is denoted $f_i$.
A behavior feature takes the specific type of a behavior itself as a feature. The i-th behavior feature is denoted $g_i$, where $g_i = \mathrm{Type}(\mathrm{Operation}_a)$ denotes the specific type of $\mathrm{Operation}_a$ taken as a feature.
A time series feature of behaviors takes the switching between different behavior types as a feature. The i-th time series feature is denoted $h_i$, where $h_i = (\mathrm{Type}(\mathrm{Operation}_a) \rightarrow \mathrm{Type}(\mathrm{Operation}_b))$ denotes the switching behavior between the specific types of two operations $\mathrm{Operation}_a$ and $\mathrm{Operation}_b$.
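The two feature types can be sketched as below. The function names are hypothetical, and the sketch assumes that a switch is formed by two consecutive operations in a time-ordered sequence; the source leaves open which pairs of operations are considered (one embodiment pairs the current song with the next song inside a single record).

```python
def behavior_features(ops):
    """Each behavior type is itself a feature: g = Type(Operation_a)."""
    return [("type", op) for op in ops]


def transition_features(ops):
    """Each switch between consecutive behaviors is a feature:
    h = (Type(Operation_a) -> Type(Operation_b))."""
    return [("switch", a, b) for a, b in zip(ops, ops[1:])]


ops = [1, 2, 2, 3]  # integer-coded behaviors in time order
g = behavior_features(ops)
h = transition_features(ops)
```

Tagging each feature with its kind keeps behavior features and switching features distinct when both are pooled into one feature set.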
Specifically, features are selected according to their frequency heat or TF-IDF heat within the data interval.
The frequency heat of a feature is the normalized number of occurrences of the feature within the specified time interval. The frequency heat of feature $f_i$ is calculated as
$$Ph_{f_i} = \frac{n_{f_i}}{\sum_{k} n_{f_k}}$$
where $n_{f_i}$ denotes the number of occurrences of $f_i$ and $\sum_{k} n_{f_k}$ denotes the number of occurrences of all features.
The TF-IDF heat of a feature takes the TF-IDF value of the feature within the data interval as its heat. The TF-IDF heat of feature $f_i$ is calculated as
$$Ph_{f_i} = \mathrm{TF\text{-}IDF}_{f_i} = \mathrm{TF}_{f_i} \times \mathrm{IDF}_{f_i}$$
where $\mathrm{TF}_{f_i} = \frac{n_{f_i}}{\sum_{k} n_{f_k}}$ represents the frequency of the feature within the user's data interval, and $\mathrm{IDF}_{f_i} = \log \frac{D}{D_j}$ is the logarithm of the ratio of the total number of users $D$ to the number of users $D_j$ whose data contains the feature, which measures the general importance of the feature within the data interval.
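Both heat measures can be sketched in a few lines. The function names are hypothetical, and the natural logarithm is an assumption, since the source does not state the base.

```python
import math
from collections import Counter


def frequency_heat(features):
    """Frequency heat: occurrences of each feature divided by all occurrences."""
    counts = Counter(features)
    total = sum(counts.values())
    return {f: n / total for f, n in counts.items()}


def tfidf_heat(features_by_user):
    """TF-IDF heat per user: TF within the user's data times log(D / D_j)."""
    n_users = len(features_by_user)           # D: total number of users
    users_with = Counter()                    # D_j: users containing each feature
    for feats in features_by_user.values():
        users_with.update(set(feats))
    heat = {}
    for user, feats in features_by_user.items():
        tf = frequency_heat(feats)
        heat[user] = {
            f: tf[f] * math.log(n_users / users_with[f]) for f in tf
        }
    return heat


freq = frequency_heat(["a", "a", "b", "c"])
tfidf = tfidf_heat({"u1": ["a", "a", "b"], "u2": ["a", "c"]})
```

In this toy data feature `a` occurs for every user, so its IDF is zero and it contributes nothing to distinguishing users, while `b` and `c` keep a positive heat.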
The heat of all features f is calculated for each user, and the Top-K features with the highest heat are selected as the feature set of that user. The feature set of the i-th user is expressed as $\mathrm{featureU}_i = \{f_{i,1}, f_{i,2}, \ldots, f_{i,K}\}$.
The features of the u users are combined to form a feature set with a consistent scale:
$$\mathrm{Feature} = \mathrm{featureU}_1 \cup \mathrm{featureU}_2 \cup \ldots \cup \mathrm{featureU}_u$$
The combined feature set has m feature elements in total, i.e. $\mathrm{Feature} = \{f_1, f_2, \ldots, f_m\}$.
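The selection and merging step can be sketched as follows, assuming each user's heat values are already available as a dict. The function names and the tie-breaking rule (by the feature's string form) are assumptions; the source does not say how ties in heat are resolved.

```python
def top_k_features(heat, k):
    """Return the k features with the highest heat (ties broken by name)."""
    ranked = sorted(heat.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return [f for f, _ in ranked[:k]]


def merged_feature_set(heat_by_user, k):
    """Union of every user's Top-K features, in a fixed order of m elements."""
    merged = set()
    for heat in heat_by_user.values():
        merged.update(top_k_features(heat, k))
    return sorted(merged, key=str)


heat_by_user = {
    "u1": {"a": 0.5, "b": 0.3, "c": 0.2},
    "u2": {"c": 0.6, "d": 0.4},
}
feature_set = merged_feature_set(heat_by_user, k=2)
```

Returning the union as a sorted list fixes the component order, so every user's vector later has the same m dimensions in the same positions.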
Specifically, constructing a user Feature vector or a Feature vector group according to the Feature set Feature comprises:
using the extracted m overall features as components to form a user feature vector, where each component value is the heat value of the corresponding feature for user u, namely
$$\mathrm{UserVector} = \langle Ph_{u,1}, Ph_{u,2}, \ldots, Ph_{u,m} \rangle$$
where Ph represents a heat value.
When the data volume is enough, the data of each user is divided into a plurality of sections according to the time section, a feature vector is generated for each section respectively, and all the feature vectors form a feature vector group.
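A sketch of building one feature vector per time segment is given below. It assumes equal-width time windows between a user's first and last timestamp and uses frequency heat as the component value; both choices, and all function names, are assumptions for illustration.

```python
from collections import Counter


def frequency_heat(features):
    """Occurrences of each feature divided by all occurrences."""
    counts = Counter(features)
    total = sum(counts.values())
    return {f: n / total for f, n in counts.items()}


def user_vector(heat, feature_set):
    """One component per feature in the shared set; 0.0 if the feature is absent."""
    return [heat.get(f, 0.0) for f in feature_set]


def split_by_time(records, p):
    """Split (timestamp, feature) pairs into p equal-width time segments."""
    t0 = min(t for t, _ in records)
    t1 = max(t for t, _ in records)
    width = (t1 - t0) / p or 1  # avoid a zero width when all times are equal
    segments = [[] for _ in range(p)]
    for t, f in records:
        idx = min(int((t - t0) / width), p - 1)  # last instant joins last segment
        segments[idx].append(f)
    return segments


def vector_group(records, feature_set, p):
    """Feature vector group: one vector per time segment of a single user."""
    return [
        user_vector(frequency_heat(seg), feature_set)
        for seg in split_by_time(records, p)
    ]


records = [(0, "a"), (1, "b"), (5, "a"), (10, "c")]
group = vector_group(records, ["a", "b", "c"], p=2)
```

With `p=1` the group degenerates to the single per-user vector described above.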
Specifically, in the step of generating the similarity discrimination matrix or the machine learning discrimination model according to the feature vector or the feature vector group of the historical data:
if a single feature vector is generated for each user, the feature vectors of all users form a u × m discrimination matrix:
$$\mathrm{SimMatrix} = \langle \mathrm{UserVector}_1; \mathrm{UserVector}_2; \ldots; \mathrm{UserVector}_u \rangle$$
if a feature vector group is generated for each user, each component of each feature vector is used as an attribute input and the corresponding user number UserId as the label for machine learning training, yielding a machine learning discrimination model Π. The machine learning method may be KNN, decision tree, random forest, Bayes or GBDT.
Specifically, in the step of performing user identification on the feature vector generated from the data to be identified, according to the similarity discrimination matrix or the machine learning discrimination model, to obtain the identification result:
if the similarity discrimination matrix is used, a similarity measure is computed between the feature vector to be identified, IdentifyVector, and each feature vector in the similarity discrimination matrix SimMatrix, and the Top-N users with the most similar features are selected as candidate identification results. The similarity measure may be Euclidean distance, cosine similarity or the Pearson correlation coefficient.
If the machine learning discrimination model Π is used, each component of the feature vector to be identified, IdentifyVector, is used as an attribute input of Π, and the user numbers corresponding to the Top-N values of the model's output (hit probability) are taken as candidate identification results.
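The matrix-based branch can be sketched with cosine similarity, one of the three measures named above. Representing SimMatrix as a dict from UserId to UserVector, and the function names, are assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_n_users(identify_vector, sim_matrix, n):
    """Rank users by similarity to identify_vector and return the Top-N UserIds."""
    ranked = sorted(
        sim_matrix.items(),
        key=lambda kv: -cosine(identify_vector, kv[1]),
    )
    return [user_id for user_id, _ in ranked[:n]]


sim_matrix = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.7, 0.7]}
candidates = top_n_users([0.9, 0.1], sim_matrix, n=2)
```

For Euclidean distance the ranking would be ascending rather than descending, since a smaller distance means greater similarity.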
Another object of the present invention is to provide a user object recognition system based on behavior time series big data.
The other purpose of the invention can be realized by the following technical scheme:
a user object identification system based on behavior time series big data comprises a data acquisition module, a feature construction module, a model generation module and a user identification module;
the data acquisition module acquires user data from a data source, cleans the data according to a specific filtering rule, retains meaningful data and forms user behavior records which are arranged in an increasing order according to timestamps;
the characteristic construction module is used for calculating the distribution characteristics of the user behaviors according to the collected data and constructing characteristic vectors with consistent scales, wherein each characteristic element value of the characteristic vector is the heat degree of the corresponding characteristic behavior;
the model generation module constructs a characteristic matrix according to the user historical characteristic behavior data or trains according to the user historical characteristic behavior data to obtain a machine learning discrimination model;
the user identification module drives the user historical behavior data feature matrix to perform similarity measurement by the data to be identified to obtain an identification result, or drives a machine learning discrimination model by the data to be identified to obtain the identification result.
The working steps of the user object identification system are as follows:
the data acquisition module acquires user historical data from a historical data source, cleans the data according to a specific rule, only retains meaningful data, and generates a data structure with a unified structure;
the feature construction module carries out heat calculation on the generated data according to the behavior or behavior time sequence, selects the behavior or behavior sequence with the highest heat as a feature set, and constructs a feature vector for each user according to the feature set;
the model generation module generates a user similarity discrimination matrix from the user feature vectors, or obtains a machine learning discrimination model Π by training on the feature vector groups;
the data acquisition module acquires the data to be identified from the corresponding data source and generates a feature vector to be identified according to the feature set extracted by the feature construction module; the user identification module then identifies the user by applying the corresponding discrimination matrix or model to the feature vector to be identified, and selects the Top-N users as the identification result.
Compared with the prior art, the invention has the following beneficial effects:
the invention can flexibly choose between the similarity discrimination matrix and the machine learning discrimination model according to the volume of user data, and it introduces the time series features of user behavior into the identification process, which can effectively improve the accuracy of user identification.
Drawings
Fig. 1 is a flowchart of a method for identifying a user object based on behavior time series big data.
FIG. 2 is a flow chart of selecting features according to feature types and constructing feature sets according to embodiments of the present invention.
FIG. 3 is a flow chart of the identification based on the similarity metric matrix in the embodiment of the present invention.
FIG. 4 is a flow chart of machine learning model based recognition in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Examples
Fig. 1 is a flowchart of a user object identification method based on behavior time series big data, which includes the steps of:
s1, acquiring historical data and data to be identified, and cleaning the historical data and the data to be identified according to cleaning rules;
In the data acquisition module of this embodiment, both the historical data and the data to be identified are acquired from a static database, using a data set of users' music playback records.
A user historical data set is obtained from the static database, in which each piece of source data is recorded as $\mathrm{Source} = \langle \mathrm{Date}, \mathrm{Time}, \mathrm{User}, \mathrm{curSong}, \mathrm{Artist}, \mathrm{sTag}, \mathrm{rTag}, \mathrm{nextSong} \rangle$. Date, Time and User represent the date and time of the record and the corresponding user; curSong and nextSong represent the music the user is listening to and the music listened to next; Artist and sTag represent the singer and the type of the music; rTag represents the user's operation on the music, such as collecting, sharing, commenting or liking. The fields curSong, sTag, rTag and nextSong contain the behavior information.
S2, generating records with uniform structures according to the cleaned data;
The data is formatted according to the rules: the behavior information curSong, sTag, rTag and nextSong is retained and mapped to numerical types, denoted $\mathrm{Operation}_1, \mathrm{Operation}_2, \mathrm{Operation}_3, \mathrm{Operation}_4$ respectively; a timestamp is added and the users are numbered, giving regular data in which each record is $\mathrm{data} = \langle \mathrm{UserId}, \mathrm{Operation}_1, \mathrm{Operation}_2, \mathrm{Operation}_3, \mathrm{Operation}_4, \mathrm{Timestamp} \rangle$.
S3, selecting features according to the data feature types, and constructing feature sets, wherein the specific flow is shown in FIG. 2;
in this embodiment, the behavior analysis module uses the user behavior directly as the feature during feature extraction, and uses the feature frequency heat as the selection criterion.
For each user, all types contained in $\mathrm{Operation}_1, \mathrm{Operation}_2, \mathrm{Operation}_3, \mathrm{Operation}_4$ are taken as features, denoted f, and the frequency heat of each type for the user is calculated as
$$Ph_{f_i} = \frac{n_{f_i}}{\sum_{k} n_{f_k}}$$
The Top-K features by heat are selected from all f of each user, and the feature set of the i-th user is expressed as $\mathrm{featureU}_i = \{f_{i,1}, f_{i,2}, \ldots, f_{i,K}\}$. The features of the u users are combined to form a feature set with a consistent scale, $\mathrm{Feature} = \mathrm{featureU}_1 \cup \mathrm{featureU}_2 \cup \ldots \cup \mathrm{featureU}_u$, which has m feature elements in total, i.e. $\mathrm{Feature} = \{f_1, f_2, \ldots, f_m\}$.
S4, constructing a feature vector or a feature vector group according to the feature set;
In this embodiment, the frequency heat values of the m features are calculated for each user to form that user's feature vector:
$$\mathrm{UserVector} = \langle Ph_{u,1}, Ph_{u,2}, \ldots, Ph_{u,m} \rangle$$
S5, respectively generating a similar discrimination matrix or a machine learning discrimination model according to the feature vector or the feature vector group of the historical data;
The UserVectors of all users form a u × m similarity discrimination matrix:
$$\mathrm{SimMatrix} = \langle \mathrm{UserVector}_1; \mathrm{UserVector}_2; \ldots; \mathrm{UserVector}_u \rangle$$
and S6, carrying out user identification on the feature vector generated by the data to be identified according to the similarity discrimination matrix or the machine learning discrimination model to obtain an identification result.
The flowcharts of the recognition methods based on the similarity discrimination matrix and on the machine learning model are shown in Fig. 3 and Fig. 4, respectively. In this embodiment, cosine similarity is used as the similarity measure. Given the feature vector to be identified, IdentifyVector, the cosine similarity between IdentifyVector and each feature vector UserVector in SimMatrix is calculated as
$$\cos\theta = \frac{\mathrm{IdentifyVector} \cdot \mathrm{UserVector}}{\lVert \mathrm{IdentifyVector} \rVert \, \lVert \mathrm{UserVector} \rVert}$$
and the Top-N users with the largest $\cos\theta$ are taken as the recognition result.
A user object identification system based on behavior time series big data comprises a data acquisition module, a feature construction module, a model generation module and a user identification module;
the data acquisition module acquires user data from a data source, cleans the data according to a specific filtering rule, retains meaningful data and forms user behavior records which are arranged in an increasing order according to timestamps;
the characteristic construction module is used for calculating the distribution characteristics of the user behaviors according to the collected data and constructing characteristic vectors with consistent scales, wherein each characteristic element value of the characteristic vector is the heat degree of the corresponding characteristic behavior;
the model generation module constructs a characteristic matrix according to the user historical characteristic behavior data or trains according to the user historical characteristic behavior data to obtain a machine learning discrimination model;
the user identification module drives the user historical behavior data feature matrix to perform similarity measurement by the data to be identified to obtain an identification result, or drives a machine learning discrimination model by the data to be identified to obtain the identification result.
In the data acquisition module of the embodiment, historical data and data to be identified are both acquired from a static database by using a music playing record data set of a certain user;
the behavior analysis module uses a user time sequence behavior sequence as a feature during feature extraction, and uses TF-IDF heat of the feature as a selection standard;
the user analysis module divides the data sections and calculates the feature vector group of the user.
The model generation module establishes a machine learning model;
and the user identification module identifies the user by using the established Decision Tree.
In this embodiment, the specific workflow of the system is as follows:
A user historical data set is obtained from the static database, in which each piece of source data is recorded as $\mathrm{Source} = \langle \mathrm{Date}, \mathrm{Time}, \mathrm{User}, \mathrm{curSong}, \mathrm{Artist}, \mathrm{sTag}, \mathrm{rTag}, \mathrm{nextSong} \rangle$. Date, Time and User represent the date and time of the record and the corresponding user; curSong and nextSong represent the music the user is listening to and the music listened to next; Artist and sTag represent the singer and the type of the music; rTag represents the user's operation on the music, such as collecting, sharing, commenting or liking. The fields curSong, sTag, rTag and nextSong contain the behavior information.
The data is formatted according to the rules: the behavior information curSong, sTag, rTag and nextSong is retained and mapped to numerical types, denoted $\mathrm{Operation}_1, \mathrm{Operation}_2, \mathrm{Operation}_3, \mathrm{Operation}_4$ respectively; a timestamp is added and the users are numbered, giving regular data in which each record is $\mathrm{data} = \langle \mathrm{UserId}, \mathrm{Operation}_1, \mathrm{Operation}_2, \mathrm{Operation}_3, \mathrm{Operation}_4, \mathrm{Timestamp} \rangle$.
For each user, all switching pairs $\langle \mathrm{Operation}_1, \mathrm{Operation}_4 \rangle$ between $\mathrm{Operation}_1$ types and $\mathrm{Operation}_4$ types are taken as features, denoted f, and the TF-IDF heat of each f for the user is calculated as $Ph_{f_i} = \mathrm{TF\text{-}IDF}_{f_i} = \mathrm{TF}_{f_i} \times \mathrm{IDF}_{f_i}$, where $\mathrm{TF}_{f_i}$ represents the frequency of the feature within the user's data interval and $\mathrm{IDF}_{f_i}$ represents the logarithm of the ratio of the total number of users $D$ to the number of users $D_j$ whose data contains the feature.
The Top-K features by heat are selected from the f of each user, and the feature set of the i-th user is expressed as $\mathrm{featureU}_i = \{f_{i,1}, f_{i,2}, \ldots, f_{i,K}\}$. The features of the u users are combined to form a feature set with a consistent scale, $\mathrm{Feature} = \mathrm{featureU}_1 \cup \mathrm{featureU}_2 \cup \ldots \cup \mathrm{featureU}_u$, which has m feature elements in total, i.e. $\mathrm{Feature} = \{f_1, f_2, \ldots, f_m\}$.
And dividing the data of each user into P sections according to the time sections.
The frequency heat values of the m features are calculated for each segment of each user to form a feature vector, where $\mathrm{UserVector}_p = \langle Ph_{u,1}, Ph_{u,2}, \ldots, Ph_{u,m} \rangle$ represents the feature vector of the p-th segment of the user.
Each feature vector of every user is used as an attribute input, with the user number UserId as the label, to train a Decision Tree with suitably chosen training parameters. Training yields a Decision Tree discrimination model.
The feature vector to be identified, IdentifyVector, is then used as the attribute input of the Decision Tree, and the user numbers UserId corresponding to the Top-N values of the output (hit probability) are taken as candidate identification results.
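The model-based branch can be illustrated without external libraries by swapping in KNN, one of the methods the description lists, in place of the Decision Tree used in this embodiment. This is a stand-in sketch only: the function name `knn_top_n`, the Euclidean metric and the use of the neighbor vote share as the "hit probability" are assumptions.

```python
import math
from collections import Counter


def knn_top_n(identify_vector, training, k, n):
    """Return up to n (UserId, vote share) pairs from the k nearest vectors.

    training is a list of (UserId, feature_vector) pairs, one per time
    segment, i.e. the labelled feature vector groups of all users.
    """
    nearest = sorted(
        training,
        key=lambda item: math.dist(identify_vector, item[1]),
    )[:k]
    votes = Counter(user_id for user_id, _ in nearest)
    # The share of neighbors voting for a user plays the role of hit probability.
    return [(uid, count / len(nearest)) for uid, count in votes.most_common(n)]


training = [
    (1, [1.0, 0.0]), (1, [0.9, 0.1]),
    (2, [0.0, 1.0]), (2, [0.1, 0.9]),
    (3, [0.5, 0.5]),
]
candidates = knn_top_n([0.8, 0.2], training, k=3, n=2)
```

A decision tree or random forest from a machine learning library would fill the same role, with its per-class probability output ranked in the same way.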
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (7)
1. A user object identification method based on behavior time series big data is characterized by comprising the following steps:
acquiring historical data and data to be identified, and cleaning the historical data and the data to be identified according to a cleaning rule;
generating a record with a uniform structure according to the cleaned data;
selecting features according to the data feature types, wherein the data feature types are divided into behavior features and time series features of behaviors; the i-th feature is denoted $f_i$;
a behavior feature takes the specific type of a behavior itself as a feature; the i-th behavior feature is denoted $g_i$, where $g_i = \mathrm{Type}(\mathrm{Operation}_a)$ denotes the specific type of $\mathrm{Operation}_a$ taken as a feature;
a time series feature of behaviors takes the switching between different behavior types as a feature; the i-th time series feature is denoted $h_i$, where $h_i = (\mathrm{Type}(\mathrm{Operation}_a) \rightarrow \mathrm{Type}(\mathrm{Operation}_b))$ denotes the switching behavior between the specific types of two operations $\mathrm{Operation}_a$ and $\mathrm{Operation}_b$;
constructing a feature set:
the heat is calculated for all features f of each user, and the Top-K features with the highest heat are selected as the feature set of the user, the feature set of the i-th user being expressed as $\mathrm{featureU}_i = \{f_{i,1}, f_{i,2}, \ldots, f_{i,K}\}$;
combining the features of the u users to form a feature set with a consistent scale:
$$\mathrm{Feature} = \mathrm{featureU}_1 \cup \mathrm{featureU}_2 \cup \ldots \cup \mathrm{featureU}_u$$
the combined feature set has m feature elements in total, i.e. $\mathrm{Feature} = \{f_1, f_2, \ldots, f_m\}$;
constructing a user feature vector or a feature vector group according to the feature set Feature, comprising:
using the extracted m overall features as components to form a user feature vector, wherein each component value is the heat value of the corresponding feature for user u, namely
$$\mathrm{UserVector} = \langle Ph_{u,1}, Ph_{u,2}, \ldots, Ph_{u,m} \rangle$$
wherein Ph represents a heat value;
when the data volume is enough, dividing the data of each user into a plurality of sections according to time sections, respectively generating a characteristic vector for each section, and forming a characteristic vector group by all the characteristic vectors;
respectively generating a similar discrimination matrix or a machine learning discrimination model according to the feature vector or the feature vector group of the historical data;
and carrying out user identification on the feature vector generated by the data to be identified according to the similarity discrimination matrix or the machine learning discrimination model to obtain an identification result.
2. The method for identifying the user object based on the behavioral time series big data according to claim 1, wherein the data cleansing rule comprises user filtering, field filtering and time filtering.
3. The method for identifying the user object based on the behavior time series big data as claimed in claim 1, wherein the step of generating the record with a uniform structure according to the cleaned data comprises:
digitizing the behavior fields of the data, adding a timestamp and a user tag to each piece of data, and expressing the unified record generated from each piece of data as:
$$\mathrm{Record} = \langle \mathrm{UserId}, \mathrm{Timestamp}, \mathrm{Operation}_1, \mathrm{Operation}_2, \ldots, \mathrm{Operation}_n \rangle$$
wherein Operation represents an operation behavior of the user captured in the record, UserId represents the user tag, and Timestamp represents the timestamp.
4. The method for identifying the user object based on the behavior time series big data according to claim 1, wherein the features are selected according to their frequency heat or TF-IDF heat within the data interval;
the frequency heat of a feature is the normalized number of occurrences of the feature within the specified time interval, and the frequency heat of feature $f_i$ is calculated as
$$Ph_{f_i} = \frac{n_{f_i}}{\sum_{k} n_{f_k}}$$
wherein $n_{f_i}$ denotes the number of occurrences of $f_i$ and $\sum_{k} n_{f_k}$ denotes the number of occurrences of all features;
the TF-IDF heat of a feature takes the TF-IDF value of the feature within the data interval as its heat, and the TF-IDF heat of feature $f_i$ is calculated as
$$Ph_{f_i} = \mathrm{TF\text{-}IDF}_{f_i} = \mathrm{TF}_{f_i} \times \mathrm{IDF}_{f_i}$$
wherein $\mathrm{TF}_{f_i}$ represents the frequency of the feature within the user's data interval, and $\mathrm{IDF}_{f_i}$ is the logarithm of the ratio of the total number of users $D$ to the number of users $D_j$ whose data contains the feature, which measures the general importance of the feature within the data interval.
5. The method according to claim 1, wherein in the step of generating a similarity discrimination matrix or a machine learning discrimination model based on the feature vectors or feature vector groups of the historical data,
if a single feature vector is generated for each user, the feature vectors of all users form a u × m discrimination matrix:
$$\mathrm{SimMatrix} = \langle \mathrm{UserVector}_1; \mathrm{UserVector}_2; \ldots; \mathrm{UserVector}_u \rangle$$
if a feature vector group is generated for each user, each component of each feature vector is used as an attribute input and the corresponding user number as the label for machine learning training, yielding a machine learning discrimination model Π; the machine learning method may be KNN, decision tree, random forest, Bayes or GBDT.
6. The method according to claim 1, wherein, in the step of performing user identification on the feature vector generated from the data to be identified, according to the similarity discrimination matrix or the machine learning discrimination model, to obtain the identification result,
if the similarity discrimination matrix is used, the similarity between the feature vector to be identified and each feature vector in the similarity discrimination matrix is measured, and the top N users with the most similar features are selected as candidate identification results; the similarity measure is the Euclidean distance, the cosine similarity, or the Pearson correlation coefficient;
and if machine learning discrimination model II is used, each feature component of the feature vector to be identified is used as an attribute input of model II, and the user numbers corresponding to the top N output values calculated by model II are taken as candidate identification results.
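The matrix branch of claim 6 can be sketched as a top-N lookup. This example uses cosine similarity, one of the three measures the claim lists; the function names, the sample matrix, and the query vector are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def top_n_users(query, user_ids, sim_matrix, n):
    """Return the n user ids whose stored vectors are most similar to the query."""
    scored = [(cosine_similarity(query, row), uid)
              for uid, row in zip(user_ids, sim_matrix)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [uid for _, uid in scored[:n]]

# Hypothetical 3 x 3 discrimination matrix and a vector to be identified.
user_ids = ["u1", "u2", "u3"]
sim_matrix = [
    [0.25, 0.50, 0.25],
    [1.00, 0.00, 0.00],
    [0.50, 0.50, 0.00],
]
query = [0.2, 0.6, 0.2]
candidates = top_n_users(query, user_ids, sim_matrix, n=2)
```

With the Euclidean distance, the ranking would be ascending rather than descending, because a smaller distance means a closer match.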
7. A system for implementing the user object identification method based on behavior time series big data according to any one of claims 1-6, wherein the system comprises a data acquisition module, a feature construction module, a model generation module and a user identification module;
the data acquisition module acquires user data from a data source, cleans the data according to specific filtering rules, retains the meaningful data, and forms user behavior records arranged in ascending order of timestamp;
the feature construction module calculates the distribution features of user behavior from the collected data and constructs feature vectors of consistent scale, wherein the value of each element of a feature vector is the heat of the corresponding feature behavior;
the model generation module constructs a feature matrix from the users' historical feature behavior data, or trains on the users' historical feature behavior data to obtain a machine learning discrimination model;
the user identification module measures the similarity between the data to be identified and the feature matrix of the users' historical behavior data to obtain an identification result, or feeds the data to be identified to the machine learning discrimination model to obtain the identification result.
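As an illustration of the data acquisition module, the following sketch cleans raw rows and orders them by timestamp. It assumes each raw row is an (Operation, UserId, Timestamp) tuple; the `Record` class, the `acquire` function, and the filtering rule (drop rows with an empty operation) are hypothetical stand-ins for the "specific filtering rules" the claim leaves open.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    operation: str
    user_id: str
    timestamp: int

def acquire(raw_rows, is_meaningful):
    """Keep rows accepted by the filter rule, then sort by ascending timestamp."""
    records = [Record(*row) for row in raw_rows if is_meaningful(row)]
    return sorted(records, key=lambda r: r.timestamp)

# Hypothetical raw rows in arrival order.
raw = [
    ("search", "u1", 30),
    ("", "u1", 10),       # empty operation: dropped by the example rule
    ("login", "u1", 5),
]
records = acquire(raw, is_meaningful=lambda row: bool(row[0]))
```

The sorted records would then be passed to the feature construction module, which computes the heat values for each user.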
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910284112.7A CN110096499B (en) | 2019-04-10 | 2019-04-10 | User object identification method and system based on behavior time series big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910284112.7A CN110096499B (en) | 2019-04-10 | 2019-04-10 | User object identification method and system based on behavior time series big data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110096499A (en) | 2019-08-06 |
CN110096499B (en) | 2021-08-10 |
Family
ID=67444601
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910284112.7A Active CN110096499B (en) | 2019-04-10 | 2019-04-10 | User object identification method and system based on behavior time series big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110096499B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110795570B (en) * | 2019-10-11 | 2022-06-17 | 上海上湖信息技术有限公司 | Method and device for extracting user time sequence behavior characteristics |
CN111461180A (en) * | 2020-03-12 | 2020-07-28 | 平安科技(深圳)有限公司 | Sample classification method and device, computer equipment and storage medium |
WO2021243534A1 (en) * | 2020-06-02 | 2021-12-09 | 深圳市欢太科技有限公司 | Behavior control method and apparatus and storage medium |
CN112381112B (en) * | 2020-10-16 | 2023-11-07 | 华南理工大学 | User identity recognition method and system based on multi-mode item set of user data |
CN113743103A (en) * | 2021-08-20 | 2021-12-03 | 南京星云数字技术有限公司 | Comment user identity identification method and device, computer equipment and storage medium |
CN116578910B (en) * | 2023-07-13 | 2023-09-15 | 成都航空职业技术学院 | Training action recognition method and system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103646197A (en) * | 2013-12-12 | 2014-03-19 | 中国石油大学(华东) | User credibility authentication system and method based on user behaviors |
CN104102819A (en) * | 2014-06-27 | 2014-10-15 | 北京奇艺世纪科技有限公司 | Determining method and device for user natural attributes |
CN105577431A (en) * | 2015-12-11 | 2016-05-11 | 青岛云成互动网络有限公司 | User information identification and classification method based on internet application and system thereof |
CN108197190A (en) * | 2017-12-26 | 2018-06-22 | 北京秒针信息咨询有限公司 | A kind of method and apparatus of user's identification |
CN109583472A (en) * | 2018-10-30 | 2019-04-05 | 中国科学院计算技术研究所 | A kind of web log user identification method and system |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103744957A (en) * | 2014-01-06 | 2014-04-23 | 同济大学 | Sequence mode mining method based on Web user time attributes |
US11164089B2 (en) * | 2015-10-12 | 2021-11-02 | International Business Machines Corporation | Transaction data analysis |
CN105306495B (en) * | 2015-11-30 | 2018-06-19 | 百度在线网络技术(北京)有限公司 | user identification method and device |
US9983859B2 (en) * | 2016-04-29 | 2018-05-29 | Intuit Inc. | Method and system for developing and deploying data science transformations from a development computing environment into a production computing environment |
CN106911668B (en) * | 2017-01-10 | 2020-07-14 | 同济大学 | Identity authentication method and system based on user behavior model |
CN107515915B (en) * | 2017-08-18 | 2020-02-18 | 晶赞广告(上海)有限公司 | User identification association method based on user behavior data |
CN108280482B (en) * | 2018-01-30 | 2020-10-16 | 广州小鹏汽车科技有限公司 | Driver identification method, device and system based on user behaviors |
CN108388969A (en) * | 2018-03-21 | 2018-08-10 | 北京理工大学 | Inside threat personage's Risk Forecast Method based on personal behavior temporal aspect |
Non-Patent Citations (1)
Title |
---|
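Research on user identity authentication technology for mobile intelligent terminals based on behavior sequences (基于行为序列的移动智能终端用户身份认证技术研究); Xu Qihan (徐启寒); China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 2019-01-15 (No. 12); I138-103 * |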
Also Published As
Publication number | Publication date |
---|---|
CN110096499A (en) | 2019-08-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110096499B (en) | User object identification method and system based on behavior time series big data | |
CN108363804B (en) | Local model weighted fusion Top-N movie recommendation method based on user clustering | |
Bertin-Mahieux et al. | Automatic tagging of audio: The state-of-the-art | |
CN109408665A (en) | A kind of information recommendation method and device, storage medium | |
JP6435426B1 (en) | Information analysis apparatus, information analysis method, and information analysis program | |
CN109511015B (en) | Multimedia resource recommendation method, device, storage medium and equipment | |
CN105849763A (en) | Systems and methods for dynamically determining influencers in a social data network using weighted analysis | |
CN105426514A (en) | Personalized mobile APP recommendation method | |
JP6033697B2 (en) | Image evaluation device | |
US9245035B2 (en) | Information processing system, information processing method, program, and non-transitory information storage medium | |
CN103886081A (en) | Information sending method and system | |
KR20120101233A (en) | Method for providing sentiment information and method and system for providing contents recommendation using sentiment information | |
CN111177559B (en) | Text travel service recommendation method and device, electronic equipment and storage medium | |
JP5895052B2 (en) | Information analysis system and information analysis method | |
CN114238573B (en) | Text countercheck sample-based information pushing method and device | |
CN111651678B (en) | Personalized recommendation method based on knowledge graph | |
Wu et al. | An incremental community detection method for social tagging systems using locality-sensitive hashing | |
CN108363748B (en) | Topic portrait system and topic portrait method based on knowledge | |
CN113239159B (en) | Cross-modal retrieval method for video and text based on relational inference network | |
EP3340073A1 (en) | Systems and methods for processing of user content interaction | |
CN110958472A (en) | Video click rate rating prediction method and device, electronic equipment and storage medium | |
JP2009116457A (en) | Method and device for analyzing internet site information | |
JP2012168986A (en) | Method of providing selected content items to user | |
CN115456676A (en) | Game advertisement visual delivery data analysis management method and system | |
Bunga et al. | From implicit preferences to ratings: video games recommendation based on collaborative filtering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||