CN108629633A - A kind of method and system for establishing user's portrait based on big data - Google Patents
A kind of method and system for establishing user's portrait based on big data
- Publication number
- CN108629633A (application number CN201810438144.3A)
- Authority
- CN
- China
- Prior art keywords
- sample
- data
- user
- portrait
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
- G06Q30/0257—User requested
Landscapes
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Engineering & Computer Science (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- Game Theory and Decision Science (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and system for establishing a user portrait based on big data, belonging to the technical field of big data applications. The method comprises the following steps: S1: build the user portrait label system. S2: preprocess the data. S3: label samples automatically. S4: handle imbalance in the user data samples. S5: feature engineering. S6: model training, in which multi-class classification models and binary classification models are used in combination. S7: model optimization. The method and system can improve the accuracy of user portraits, which supports building personalized intelligent recommendation systems, precision marketing and precisely targeted advertising, and therefore have good application value.
Description
Technical field
The present invention relates to the technical field of big data applications, and specifically provides a method and system for establishing a user portrait based on big data.
Background art
With the arrival of the big data era, more and more user data is being integrated and the volume of information keeps growing. How to make effective use of the accumulated data to obtain more accurate and more valuable information, present that information through effective labeling, and thereby establish an accurate user portrait is a problem the big data field now faces. In the prior art, user portraits are mostly built by manual labeling, which is labor-intensive; the resulting labels vary widely with the subjective preferences of the person labeling, and their accuracy is questionable.
Summary of the invention
The technical task of the present invention is, in view of the above problems, to provide a method for establishing a user portrait based on big data that can improve user portrait accuracy and thereby support building personalized intelligent recommendation systems, precision marketing and precisely targeted advertising.
A further technical task of the present invention is to provide a system for establishing a user portrait based on big data.
To achieve the above object, the present invention provides the following technical solution:
A method for establishing a user portrait based on big data, the method comprising the following steps:
S1: Build the user portrait label system
The user data is normalized into a target, effective label system. Labels are divided into structured and unstructured labels: structured labels have a clear hierarchy with parent-child category relationships and are regular, while unstructured labels have no hierarchical relationship and are scattered;
S2: Preprocess the data;
S3: Automatic sample labeling
Samples are labeled automatically using semi-supervised learning;
S4: Handling imbalance in the user data samples
At the data level, samples are oversampled or undersampled; at the algorithm level, cost-sensitive learning and ensemble learning are applied;
S5: Feature engineering
Once the sample set has been built, features are extracted from the samples and classified according to their specific data types;
S6: Model training
Multi-class classification models and binary classification models are used in combination;
S7: Model optimization
Analyze whether the model is overfitting or underfitting, and optimize the model accordingly.
A user portrait is a labeled user model abstracted from the user's data; in other words, the user data is turned into labels by refining the data and tagging it. After data preprocessing, samples are extracted and the user portrait labels are modeled. Methods such as machine learning, deep learning (including deep reinforcement learning), transfer learning and natural language processing are combined to continuously optimize the results and improve user portrait accuracy.
Model training requires training many models and then finding the best one. Take the classification of user interest data as an example: the number of categories is large and the categories have parent-child hierarchical relationships, so multi-class classification with a single model cannot meet the requirement, and a hierarchical model can be used instead. In this classification process, both the dependencies between nodes at different levels of the category tree and the classification problem within each level must be considered. The model is built as a top-down hierarchical classification tree, which satisfies the dependencies between hierarchy nodes. The classification problem within a level can be implemented with either one multi-class classification model or several binary classification models. The structure with several binary classification models may be chosen for the following reasons: it makes categories easy to extend, allows several people to edit categories, lets a single category be extended and optimized without affecting the others, and handles category overlap by assigning one sample to several categories. When the category division does not cover all categories, care must be taken that samples from outside the domain are not misclassified. With many categories, however, this structure brings a larger workload: for n subordinate categories it needs n-1 more models than hierarchical multi-class classification. Therefore, depending on the actual business and data situation, a structure that combines multi-class classification models and binary classification models is generally adopted.
Preferably, the structured labels in step S1 include user attribute labels, short-term interest labels and long-term interest labels.
People's psychological activity is complex and sometimes contradictory, so their behavior is even more complex, and the behavioral data they generate is complex and scattered. This complex, scattered data needs to be normalized into a target, effective label system.
User attribute labels include, for example, the user's name, gender and height; their weight at the algorithm level is high.
Short-term interest labels include data such as browsing a product, for example a bed, which the user may never pay attention to again after buying it; at the algorithm model level their weight can decay rapidly over time.
Long-term interest labels include, for example, entertainment / crosstalk / Guo Degang special shows.
Unstructured labels are huge in number and widely scattered, and are used as personalized labels.
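The three structured label types and their different weighting behavior can be sketched as follows. This is only an illustration: the `Label` class, the exponential decay formula and the half-life values are assumptions, since the text states only that attribute labels carry a high weight and that short-term interest weights decay quickly over time.

```python
from dataclasses import dataclass

# Hypothetical half-lives in days (assumed values, not from the patent).
# None means the weight never decays.
HALF_LIFE_DAYS = {"attribute": None, "long_term": 180.0, "short_term": 3.0}


@dataclass
class Label:
    path: tuple         # hierarchy from parent to child category
    kind: str           # "attribute", "short_term" or "long_term"
    base_weight: float  # weight at the moment the label was last observed

    def weight(self, age_days: float) -> float:
        """Exponentially decay the weight; attribute labels keep full weight."""
        half_life = HALF_LIFE_DAYS[self.kind]
        if half_life is None:
            return self.base_weight
        return self.base_weight * 0.5 ** (age_days / half_life)


gender = Label(("profile", "gender"), "attribute", 1.0)
bed = Label(("goods", "furniture", "bed"), "short_term", 1.0)
crosstalk = Label(("entertainment", "crosstalk"), "long_term", 1.0)
```

With these assumed half-lives, the short-term "bed" label loses half its weight after three days, while the long-term interest fades slowly and the attribute label does not fade at all.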
Preferably, the data preprocessing in step S2 includes collecting user data and cleaning the collected data.
The user data includes user behavior data, such as browsing behavior; generated structured data, such as product libraries and libraries of browsed web pages; and knowledge data, such as category systems and data dictionaries. To obtain accurate data, the collected data is cleaned, which includes filtering out invalid data and noisy data, anti-cheating measures, and handling unstructured data.
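A minimal cleaning pass over collected behavior events might look like the sketch below. The event fields (`user_id`, `item_id`, `timestamp`) and the volume-based anti-cheating rule are hypothetical; the text names the cleaning goals but does not prescribe concrete rules.

```python
def clean_events(events, max_events_per_user=1000):
    """Drop invalid rows, exact duplicates, and users with suspicious volume."""
    required = ("user_id", "item_id", "timestamp")
    valid = [e for e in events if all(e.get(k) not in (None, "") for k in required)]

    # Remove exact duplicates while keeping the first occurrence.
    seen, deduped = set(), []
    for e in valid:
        key = (e["user_id"], e["item_id"], e["timestamp"])
        if key not in seen:
            seen.add(key)
            deduped.append(e)

    # Crude anti-cheating rule (an assumption): discard every event of a user
    # whose event count exceeds the threshold, since it may be automated traffic.
    counts = {}
    for e in deduped:
        counts[e["user_id"]] = counts.get(e["user_id"], 0) + 1
    return [e for e in deduped if counts[e["user_id"]] <= max_events_per_user]
```

In practice the threshold would be tuned per data source, and richer signals than raw event counts would normally be used to detect cheating.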
Preferably, the semi-supervised automatic sample labeling in step S3 uses a small number of labeled samples to train on and classify a large number of unlabeled samples, and adds the samples with higher confidence to the training set.
The present invention uses Tri-training and CoForest. Tri-training divides the training data into 3 parts and trains 3 models; CoForest uses n classifiers and relies on random forest to ensure diversity among the classifiers. Both Tri-training and CoForest can introduce an error-rate convergence condition to control noisy samples, and both use voting among multiple classifiers to reduce the number of noisy samples that are added.
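The voting idea behind Tri-training can be sketched in a few lines. This is a deliberately simplified single round: it omits the error-rate convergence check, uses a toy nearest-centroid classifier as a stand-in for a real model, and assumes the labeled list is shuffled so that each of the three parts contains every class. All function names are illustrative.

```python
def train_centroids(samples):
    """Toy nearest-centroid model: mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))


def tri_training_round(labeled, unlabeled):
    """Split the labeled data into 3 parts, train 3 models, then give each
    model the unlabeled samples on which the other two models agree."""
    parts = [list(labeled[i::3]) for i in range(3)]
    models = [train_centroids(p) for p in parts]
    for i in range(3):
        j, k = [m for m in range(3) if m != i]
        for x in unlabeled:
            label_j, label_k = predict(models[j], x), predict(models[k], x)
            if label_j == label_k:  # agreement acts as the confidence signal
                parts[i].append((x, label_j))
    return [train_centroids(p) for p in parts]
```

A full implementation would repeat the round until no model changes and would accept new samples only while the estimated error rate keeps falling.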
Preferably, the oversampling in step S4 improves the classification performance on the minority class by adding positive samples, and the undersampling removes negative samples; the cost-sensitive processing increases the weight of positive samples and decreases the weight of negative samples, and the ensemble learning divides the negative samples into several parts, trains each part together with the positive samples, and obtains several models.
Simply duplicating positive samples already counts as oversampling; its drawback is that it easily causes overfitting, so Gaussian noise can be added to positive samples at random or new synthetic samples can be generated, for which the SMOTE algorithm can be used. Undersampling by randomly removing negative samples is a fairly simple operation; in actual classification, because negative samples are numerous, random undersampling at a certain ratio can work fairly well in practice. Other sampling methods include Tomek links, NearMiss and One-Sided Selection; the best one can be chosen by testing actual results.
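The data-level options described above can be sketched with the standard library alone. The function names, the noise level `sigma` and the sampling `ratio` are assumptions. `smote_like` is a simplification of SMOTE: it interpolates toward the single nearest positive neighbour rather than a random one of the k nearest, and it needs at least two positive samples.

```python
import random


def oversample_with_noise(positives, target, sigma=0.01, rng=None):
    """Copy positives with small Gaussian jitter until `target` samples exist."""
    rng = rng or random.Random(0)
    out = list(positives)
    while len(out) < target:
        base = rng.choice(positives)
        out.append([v + rng.gauss(0.0, sigma) for v in base])
    return out


def smote_like(positives, n_new, rng=None):
    """Create synthetic positives on the segment between a sample and its
    nearest positive neighbour (simplified SMOTE)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(positives)
        others = [p for p in positives if p is not a]
        b = min(others, key=lambda p: sum((x - y) ** 2 for x, y in zip(a, p)))
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


def undersample(negatives, ratio, n_positive, rng=None):
    """Randomly keep at most `ratio * n_positive` negatives."""
    rng = rng or random.Random(0)
    keep = min(len(negatives), int(ratio * n_positive))
    return rng.sample(negatives, keep)
```

Which variant works best depends on the data, which matches the text's advice to compare the methods by their actual results.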
Preferably, the feature extraction in step S5 also performs feature selection to find a feature subset with which to train the model.
For feature extraction from text, the words in the text can be extracted as features, such as "China". For other extended text features, browsing data can be used to obtain the corresponding visited URLs as features, or text similarity can be computed so that similar items are added as features. Domain-specific features are the domain features that correspond to text keywords, such as region, product type and product attributes. For topic features, a topic model such as LDA is used, and the parameters of the corresponding topic distribution are taken as the features.
Commonly used feature selection methods are:
TF-IDF: TF, term frequency, measures how well a word describes the content of a document. It is the term count normalized by document length, which prevents a bias toward long documents. IDF, inverse document frequency, measures how well a word distinguishes between documents. The larger the TF-IDF value, the stronger the feature's ability to discriminate between categories.
Chi-square test: the larger the chi-square value, the better the word discriminates between the two categories.
Mutual information: mutual information is generally used to measure the correlation between two words; in feature selection it can be used to compute how well a feature discriminates a category.
Information gain: information gain is the difference in information entropy before and after a feature appears, and indicates the importance of that feature for classification.
When the training set is fairly large, the methods above produce similar results in practice. After the weights have been computed, the features are sorted by weight, and there are generally two ways to select them as needed: take the K features with the largest weights, or take the features whose weight exceeds some threshold.
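TF-IDF weighting followed by the two selection rules (top K, or above a threshold) can be sketched as follows. The unsmoothed IDF form `log(N / df)` is one common variant chosen here as an assumption, since the text gives no formula, and the function names are illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # TF is the count normalized by document length.
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def select_top_k(weights, k):
    """Keep the k terms with the largest weight."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:k]]


def select_above(weights, threshold):
    """Keep every term whose weight exceeds the threshold."""
    return [term for term, w in weights.items() if w > threshold]
```

A term that appears in every document gets an IDF of zero under this variant, so it is never selected by the threshold rule with a positive threshold.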
Preferably, for underfitting in step S7, where the accuracy on both the training set and the test set is low, data cleaning is carried out, effective features are added, and a more complex model is substituted.
For underfitting, in addition to data cleaning and adding effective features, the penalty coefficient of the regularization term can be reduced and model-fusion voting can be used. Substituting a more complex model means, for example, changing a linear model into a nonlinear model.
Preferably, for overfitting in step S7, where the accuracy on the training set is high but the accuracy on the test set is low, training sample data is added and a simpler model is substituted.
For overfitting, the penalty coefficient of the regularization term can also be increased and the number of training iterations reduced. Substituting a simpler model means, for example, changing a nonlinear model into a linear model.
A system for establishing a user portrait based on big data includes a user portrait label system construction module, a data preprocessing module, an automatic sample labeling module, a sample imbalance processing module, a feature engineering module, a model training module and a model optimization module. The user portrait label system construction module builds the user portrait label system; the data preprocessing module preprocesses the data; the automatic sample labeling module labels samples automatically; the sample imbalance processing module handles sample imbalance; the feature engineering module extracts features from the samples and classifies them according to their specific data types; the model training module trains the models; and the model optimization module optimizes the models.
Compared with the prior art, the method of the present invention for establishing a user portrait based on big data has the following notable beneficial effect: the complete scheme it provides for establishing an accurate user portrait can build user portraits quickly, efficiently and accurately and improves the accuracy of data labels. Personalized intelligent recommendation systems, precision marketing and precisely targeted advertising can be built on this basis, making full use of user data, so the method has good application value.
Detailed description of the embodiments
The method and system of the present invention for establishing a user portrait based on big data are described in further detail below with reference to an embodiment.
Embodiment
The method of the present invention for establishing a user portrait based on big data includes the following steps:
S1: Build the user portrait label system
The user data is normalized into a target, effective label system, and labels are divided into structured and unstructured labels. Structured labels have a clear hierarchy with parent-child category relationships, are regular, and can take the shape of a tree or a forest. Depending on the actual situation they are subdivided into user attribute labels, short-term interest labels and long-term interest labels. User attribute labels include, for example, the user's name, gender and height; their weight at the algorithm level is high. Short-term interest labels include data such as browsing a product, for example a bed, which the user may never pay attention to again after buying it; at the algorithm model level their weight can decay rapidly over time. Long-term interest labels include, for example, entertainment / crosstalk / Guo Degang special shows.
Unstructured labels have no hierarchical relationship; they are huge in number and widely scattered, and are used as personalized labels.
S2: Preprocess the data
Data preprocessing includes collecting user data and cleaning the collected data. The user data includes user behavior data, such as browsing behavior; generated structured data, such as product libraries and libraries of browsed web pages; and knowledge data, such as category systems and data dictionaries. To obtain accurate data, the collected data is cleaned, which includes filtering out invalid data and noisy data, anti-cheating measures, and handling unstructured data.
S3: Automatic sample labeling
Labeling samples is a heavy workload, especially in a big data environment, and manual labeling also suffers from drawbacks such as subjectivity and randomness. The present invention therefore labels samples automatically with semi-supervised learning: a small number of labeled samples is used to train on and classify the large number of unlabeled samples, and the samples with higher confidence are added to the training set.
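The loop of "train on a few labeled samples, classify the unlabeled pool, keep only confident predictions" can be sketched as plain self-training. The one-dimensional nearest-centroid model, the distance-based confidence score and the `threshold` value are all assumptions made to keep the example small.

```python
def centroids(samples):
    """Mean feature value per class for one-dimensional samples."""
    groups = {}
    for x, y in samples:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}


def predict_with_confidence(model, x):
    """Return (label, confidence); closer to one centroid means more confident."""
    dists = {y: abs(x - c) for y, c in model.items()}
    label = min(dists, key=dists.get)
    total = sum(dists.values())
    conf = 1.0 - dists[label] / total if total else 1.0
    return label, conf


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Repeatedly move confidently classified samples into the training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = centroids(labeled)
        scored = [(x, *predict_with_confidence(model, x)) for x in pool]
        accepted = [(x, y) for x, y, conf in scored if conf >= threshold]
        if not accepted:
            break
        labeled.extend(accepted)
        accepted_x = {x for x, _ in accepted}
        pool = [x for x in pool if x not in accepted_x]
    return centroids(labeled), pool
```

Samples that never reach the confidence threshold stay in the returned pool, so they can be routed to manual labeling instead of polluting the training set.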
S4: Handling imbalance in the user data samples
The sampled data may be imbalanced, and the number of positive samples is usually far smaller than the number of negative samples. The present invention addresses this at two levels.
The first is the data level, where samples can be oversampled or undersampled. Oversampling improves the classification performance on the minority class by adding positive samples. Simply duplicating positive samples already counts as oversampling; its drawback is that it easily causes overfitting, so Gaussian noise can be added to positive samples at random or new synthetic samples can be generated, for which the SMOTE algorithm can be used. The simplest way to undersample is to remove negative samples at random. The advantage is that the operation is simple, and in some real classification tasks, because negative samples are numerous, random undersampling at a certain ratio works fairly well in practice.
The second is the algorithm level, where the main methods are cost-sensitive learning and ensemble learning. The weight of the penalty term can be adjusted in the loss function, for example by increasing the weight of positive samples and decreasing the weight of negative samples; this is cost sensitivity. In the ensemble learning approach, the negative samples are, for example, divided into several parts, each part is trained together with the positive samples to obtain several models, and the final result is obtained by voting.
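The two algorithm-level ideas can be sketched as follows. The inverse-frequency weighting scheme is one common choice and is an assumption here; the text says only that positive weights go up and negative weights go down. All function names are illustrative.

```python
import math


def class_weights(n_pos, n_neg):
    """Inverse-frequency weights so both classes contribute equally to the loss."""
    total = n_pos + n_neg
    return {1: total / (2.0 * n_pos), 0: total / (2.0 * n_neg)}


def weighted_log_loss(y_true, p_pred, weights, eps=1e-12):
    """Binary cross-entropy with each sample scaled by its class weight."""
    loss = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        loss -= weights[y] * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return loss / len(y_true)


def split_negatives(positives, negatives, n_parts):
    """Build one training set per model: all positives plus one slice of negatives."""
    return [positives + negatives[i::n_parts] for i in range(n_parts)]


def majority_vote(predictions):
    """Combine the 0/1 outputs of the individual models."""
    return int(sum(predictions) * 2 > len(predictions))
```

With 10 positives and 90 negatives, a missed positive costs nine times as much as a missed negative under these weights, and each of the split training sets is far closer to balanced than the original data.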
S5: Feature engineering
Once the sample set has been built, features are extracted from the samples and classified according to their specific data types. For feature extraction from text, the words in the text can be extracted as features, such as "China". For other extended text features, browsing data can be used to obtain the corresponding visited URLs as features, or text similarity can be computed so that similar items are added as features. Domain-specific features are the domain features that correspond to text keywords, such as region, product type and product attributes. For topic features, a topic model such as LDA is used, and the parameters of the corresponding topic distribution are taken as the features.
Feature extraction also requires feature selection to find a feature subset with which to train the model. Commonly used feature selection methods are:
TF-IDF: TF, term frequency, measures how well a word describes the content of a document. It is the term count normalized by document length, which prevents a bias toward long documents. IDF, inverse document frequency, measures how well a word distinguishes between documents. The larger the TF-IDF value, the stronger the feature's ability to discriminate between categories.
Chi-square test: the larger the chi-square value, the better the word discriminates between the two categories.
Mutual information: mutual information is generally used to measure the correlation between two words; in feature selection it can be used to compute how well a feature discriminates a category.
Information gain: information gain is the difference in information entropy before and after a feature appears, and indicates the importance of that feature for classification.
When the training set is fairly large, the methods above produce similar results in practice. After the weights have been computed, the features are sorted by weight, and there are generally two ways to select them as needed: take the K features with the largest weights, or take the features whose weight exceeds some threshold.
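The chi-square and information gain scores can both be computed from a 2x2 table of document counts for one term and one category. The sketch below uses the standard textbook formulas; the argument names `a`, `b`, `c`, `d` are illustrative and the patent itself gives no formulas.

```python
import math


def chi_square(a, b, c, d):
    """Chi-square statistic for a term/category contingency table.

    a: in-category docs containing the term, b: other docs containing it,
    c: in-category docs without the term,    d: other docs without it.
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(a, b, c, d):
    """Category entropy minus the entropy left once term presence is known."""
    n = a + b + c + d
    before = entropy([a + c, b + d])
    after = (a + b) / n * entropy([a, b]) + (c + d) / n * entropy([c, d])
    return before - after
```

A term that appears in exactly the documents of one category gets the maximum score under both measures, while a term spread evenly across categories scores zero.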
S6: Model training
Model training requires training many models and then finding the best one. Take the classification of user interest data as an example: the number of categories is large and the categories have parent-child hierarchical relationships, so multi-class classification with a single model cannot meet the requirement, and a hierarchical model can be used instead. In this classification process, both the dependencies between nodes at different levels of the category tree and the classification problem within each level must be considered. The model is built as a top-down hierarchical classification tree, which satisfies the dependencies between hierarchy nodes. The classification problem within a level can be implemented with either one multi-class classification model or several binary classification models. The structure with several binary classification models may be chosen for the following reasons: it makes categories easy to extend, allows several people to edit categories, lets a single category be extended and optimized without affecting the others, and handles category overlap by assigning one sample to several categories. When the category division does not cover all categories, care must be taken that samples from outside the domain are not misclassified. With many categories, however, this structure brings a larger workload: for n subordinate categories it needs n-1 more models than hierarchical multi-class classification. Therefore, depending on the actual business and data situation, a structure that combines multi-class classification models and binary classification models is generally adopted.
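A top-down category tree with one binary classifier per category can be sketched as below. The `Node` class and the keyword-based `accepts` callables are hypothetical placeholders for trained binary models. The two counting helpers illustrate the n versus 1 model count per parent that the text refers to.

```python
class Node:
    def __init__(self, name, accepts, children=()):
        self.name = name
        self.accepts = accepts          # binary classifier: sample -> bool
        self.children = list(children)


def classify(node, sample, path=()):
    """Top-down descent: a child is tested only after its parent accepted."""
    path = path + (node.name,)
    hits = [child for child in node.children if child.accepts(sample)]
    if not hits:
        return [path]                   # stop here; nothing below matches
    results = []
    for child in hits:                  # several hits: cross-category sample
        results.extend(classify(child, sample, path))
    return results


def count_binary_models(node):
    """One binary model for every category below the root."""
    return sum(1 + count_binary_models(c) for c in node.children)


def count_multiclass_models(node):
    """One multi-class model for every node that has children."""
    own = 1 if node.children else 0
    return own + sum(count_multiclass_models(c) for c in node.children)


def has(*words):
    """Hypothetical stand-in for a trained binary model: keyword match."""
    return lambda sample: any(w in sample for w in words)


tree = Node("root", lambda s: True, [
    Node("entertainment", has("crosstalk", "film"), [
        Node("crosstalk", has("crosstalk")),
        Node("film", has("film")),
    ]),
    Node("goods", has("bed"), [Node("furniture", has("bed"))]),
])
```

Because every child is an independent yes/no decision, a sample can land in several categories at once, and a sample that no child accepts stays at the parent instead of being forced into a wrong category.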
S7: Model optimization
After the model has been trained, its actual performance must be checked against the expected requirements. If it falls short, the model must be optimized: analyze whether it is overfitting or underfitting and optimize accordingly.
For underfitting, the accuracy on both the training set and the test set is relatively low, and the trained model has not yet learned the underlying relationships well. The training samples are optimized first: they may still contain noisy samples that have not been cleaned out, so data cleaning must continue. In addition, effective features are added, the penalty coefficient of the regularization term is lowered, a relatively more complex model is substituted (for example, a linear model is changed into a nonlinear model), and model-fusion voting is used.
For overfitting, the accuracy on the training set is relatively high during training but the accuracy on the test set is relatively low. The remedies are to add training sample data, raise the penalty coefficient of the regularization term, reduce the number of training iterations, and substitute a relatively simpler model, for example by replacing a nonlinear model with a linear model.
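The diagnosis rule above can be written as a small helper. The thresholds `target` and `max_gap` are illustrative assumptions; the text only distinguishes "low on both sets" from "high on training, low on test".

```python
def diagnose(train_acc, test_acc, target=0.85, max_gap=0.10):
    """Classify the fit and list the remedies named in the text.

    `target` and `max_gap` are assumed thresholds, not values from the patent.
    """
    if train_acc < target and test_acc < target:
        return "underfitting", [
            "continue data cleaning",
            "add effective features",
            "lower the regularization penalty",
            "switch to a more complex model",
            "use model-fusion voting",
        ]
    if train_acc - test_acc > max_gap:
        return "overfitting", [
            "add training samples",
            "raise the regularization penalty",
            "reduce training iterations",
            "switch to a simpler model",
        ]
    return "acceptable", []
```

For instance, `diagnose(0.99, 0.70)` reports overfitting, while `diagnose(0.60, 0.58)` reports underfitting.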
The embodiment described above is only a preferred specific implementation of the present invention. The usual variations and substitutions made by those skilled in the art within the scope of the technical solution of the present invention shall all fall within the scope of protection of the present invention.
Claims (9)
1. A method for establishing a user portrait based on big data, characterized in that the method comprises the following steps:
S1: Build the user portrait label system
The user data is normalized into a target, effective label system. Labels are divided into structured and unstructured labels: structured labels have a clear hierarchy with parent-child category relationships and are regular, while unstructured labels have no hierarchical relationship and are scattered;
S2: Preprocess the data;
S3: Automatic sample labeling
Samples are labeled automatically using semi-supervised learning;
S4: Handling imbalance in the user data samples
At the data level, samples are oversampled or undersampled; at the algorithm level, cost-sensitive learning and ensemble learning are applied;
S5: Feature engineering
Once the sample set has been built, features are extracted from the samples and classified according to their specific data types;
S6: Model training
Multi-class classification models and binary classification models are used in combination;
S7: Model optimization
Analyze whether the model is overfitting or underfitting, and optimize the model accordingly.
2. The method for establishing a user portrait based on big data according to claim 1, characterized in that the structured labels in step S1 include user attribute labels, short-term interest labels and long-term interest labels.
3. The method for establishing a user portrait based on big data according to claim 1 or 2, characterized in that the data preprocessing in step S2 includes collecting user data and cleaning the collected data.
4. The method for establishing a user portrait based on big data according to claim 3, characterized in that the semi-supervised automatic sample labeling in step S3 uses a small number of labeled samples to train on and classify a large number of unlabeled samples, and adds the samples with higher confidence to the training set.
5. The method for establishing a user portrait based on big data according to claim 4, characterized in that the oversampling in step S4 improves the classification performance on the minority class by adding positive samples, and the undersampling removes negative samples; the cost-sensitive processing increases the weight of positive samples and decreases the weight of negative samples, and the ensemble learning divides the negative samples into several parts, trains each part together with the positive samples, and obtains several models.
6. The method for establishing a user portrait based on big data according to claim 5, characterized in that the feature extraction in step S5 performs feature selection to find a feature subset with which to train the model.
7. The method for establishing a user portrait based on big data according to claim 6, characterized in that, for underfitting in step S7, where the accuracy on both the training set and the test set is low, data cleaning is carried out, effective features are added, and a more complex model is substituted.
8. The method for establishing a user portrait based on big data according to claim 7, characterized in that, for overfitting in step S7, where the accuracy on the training set is high and the accuracy on the test set is low, training sample data is added and a simpler model is substituted.
9. A system for establishing a user portrait based on big data, characterized in that it includes a user portrait label system construction module, a data preprocessing module, an automatic sample labeling module, a sample imbalance processing module, a feature engineering module, a model training module and a model optimization module, wherein the user portrait label system construction module builds the user portrait label system, the data preprocessing module preprocesses the data, the automatic sample labeling module labels samples automatically, the sample imbalance processing module handles sample imbalance, the feature engineering module extracts features from the samples and classifies them according to their specific data types, the model training module trains the models, and the model optimization module optimizes the models.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810438144.3A CN108629633A (en) | 2018-05-09 | 2018-05-09 | A kind of method and system for establishing user's portrait based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810438144.3A CN108629633A (en) | 2018-05-09 | 2018-05-09 | A kind of method and system for establishing user's portrait based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108629633A true CN108629633A (en) | 2018-10-09 |
Family
ID=63692337
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810438144.3A Pending CN108629633A (en) | 2018-05-09 | 2018-05-09 | A kind of method and system for establishing user's portrait based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108629633A (en) |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109359137A (en) * | 2018-10-16 | 2019-02-19 | 大连理工大学 | Based on user's growth of Feature Selection and semi-supervised learning portrait construction method |
CN109636047A (en) * | 2018-12-17 | 2019-04-16 | 江苏满运软件科技有限公司 | User activity prediction model training method, system, equipment and storage medium |
CN109766000A (en) * | 2018-12-25 | 2019-05-17 | 重庆和贯科技有限公司 | A kind of wisdom education system and method based on virtual reality |
CN109785034A (en) * | 2018-11-13 | 2019-05-21 | 北京码牛科技有限公司 | User's portrait generation method, device, electronic equipment and computer-readable medium |
CN109919469A (en) * | 2019-02-27 | 2019-06-21 | 浪潮软件集团有限公司 | A kind of holography science data processing method |
CN110111143A (en) * | 2019-04-28 | 2019-08-09 | 上海二三四五移动科技有限公司 | A kind of control method and control device for establishing mobile end subscriber portrait |
CN110225009A (en) * | 2019-05-27 | 2019-09-10 | 四川大学 | It is a kind of that user's detection method is acted on behalf of based on communication behavior portrait |
CN110245296A (en) * | 2019-06-14 | 2019-09-17 | 浙江华坤道威数据科技有限公司 | A kind of PAS user's portrait analysis system and its method based on big data |
CN111429184A (en) * | 2020-03-27 | 2020-07-17 | 北京睿科伦智能科技有限公司 | User portrait extraction method based on text information |
WO2020147259A1 (en) * | 2019-01-16 | 2020-07-23 | 平安科技(深圳)有限公司 | User portait method and apparatus, readable storage medium, and terminal device |
CN111723257A (en) * | 2020-06-24 | 2020-09-29 | 山东建筑大学 | User portrait drawing method and system based on water usage law |
CN111753026A (en) * | 2020-06-28 | 2020-10-09 | 中国银行股份有限公司 | User portrait generation system, method, device, equipment and medium |
CN111932035A (en) * | 2020-09-22 | 2020-11-13 | 南京福佑在线电子商务有限公司 | Data processing method and device based on multiple models and classified modeling method |
CN111985553A (en) * | 2020-08-18 | 2020-11-24 | 北京云从科技有限公司 | Feature construction method and device, machine readable medium and equipment |
CN112001739A (en) * | 2019-05-27 | 2020-11-27 | 广东小天才科技有限公司 | Method and system for generating user learning portrait |
CN112667714A (en) * | 2021-03-17 | 2021-04-16 | 腾讯科技(深圳)有限公司 | User portrait optimization method and device based on deep learning and storage medium |
CN112883931A (en) * | 2021-03-29 | 2021-06-01 | 动者科技(杭州)有限责任公司 | Real-time true and false motion judgment method based on long and short term memory network |
CN112883930A (en) * | 2021-03-29 | 2021-06-01 | 动者科技(杭州)有限责任公司 | Real-time true and false motion judgment method based on full-connection network |
CN113032556A (en) * | 2019-12-25 | 2021-06-25 | 厦门铠甲网络股份有限公司 | Method for forming user portrait based on natural language processing |
CN113191825A (en) * | 2021-05-26 | 2021-07-30 | 上海悟景信息科技有限公司 | Client portrait model modeling method, system and equipment based on artificial intelligence |
WO2021179544A1 (en) * | 2020-03-12 | 2021-09-16 | 平安科技(深圳)有限公司 | Sample classification method and apparatus, computer device, and storage medium |
WO2021218336A1 (en) * | 2020-04-30 | 2021-11-04 | 深圳壹账通智能科技有限公司 | User information discrimination method and apparatus, and device and computer readable storage medium |
CN113763093A (en) * | 2020-11-12 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | User portrait-based item recommendation method and device |
CN114119058A (en) * | 2021-08-10 | 2022-03-01 | 国家电网有限公司 | User portrait model construction method and device and storage medium |
CN117252647A (en) * | 2023-09-21 | 2023-12-19 | 深圳市聚海通达科技有限公司 | Digitalized integrated marketing service system |
CN111461180B (en) * | 2020-03-12 | 2024-07-09 | 平安科技(深圳)有限公司 | Sample classification method, device, computer equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105718938A (en) * | 2015-12-30 | 2016-06-29 | 深圳先进技术研究院 | Encoding method of user label and encoding apparatus of user label |
CN106503015A (en) * | 2015-09-07 | 2017-03-15 | 国家计算机网络与信息安全管理中心 | A kind of method for building user's portrait |
CN107633036A (en) * | 2017-09-08 | 2018-01-26 | 广州汪汪信息技术有限公司 | A kind of microblog users portrait method, electronic equipment, storage medium, system |
CN107688831A (en) * | 2017-09-04 | 2018-02-13 | 五邑大学 | A kind of unbalanced data sorting technique based on cluster down-sampling |
- 2018-05-09: CN application CN201810438144.3A, patent/CN108629633A/en, active, Pending
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109359137B (en) * | 2018-10-16 | 2021-03-26 | 大连理工大学 | User growth portrait construction method based on feature screening and semi-supervised learning |
CN109359137A (en) * | 2018-10-16 | 2019-02-19 | 大连理工大学 | Based on user's growth of Feature Selection and semi-supervised learning portrait construction method |
CN109785034A (en) * | 2018-11-13 | 2019-05-21 | 北京码牛科技有限公司 | User's portrait generation method, device, electronic equipment and computer-readable medium |
CN109636047A (en) * | 2018-12-17 | 2019-04-16 | 江苏满运软件科技有限公司 | User activity prediction model training method, system, equipment and storage medium |
CN109766000A (en) * | 2018-12-25 | 2019-05-17 | 重庆和贯科技有限公司 | A kind of wisdom education system and method based on virtual reality |
WO2020147259A1 (en) * | 2019-01-16 | 2020-07-23 | 平安科技(深圳)有限公司 | User portait method and apparatus, readable storage medium, and terminal device |
CN109919469A (en) * | 2019-02-27 | 2019-06-21 | 浪潮软件集团有限公司 | A kind of holography science data processing method |
CN110111143A (en) * | 2019-04-28 | 2019-08-09 | 上海二三四五移动科技有限公司 | A kind of control method and control device for establishing mobile end subscriber portrait |
CN110225009B (en) * | 2019-05-27 | 2020-06-05 | 四川大学 | Proxy user detection method based on communication behavior portrait |
CN110225009A (en) * | 2019-05-27 | 2019-09-10 | 四川大学 | It is a kind of that user's detection method is acted on behalf of based on communication behavior portrait |
CN112001739A (en) * | 2019-05-27 | 2020-11-27 | 广东小天才科技有限公司 | Method and system for generating user learning portrait |
CN110245296A (en) * | 2019-06-14 | 2019-09-17 | 浙江华坤道威数据科技有限公司 | A kind of PAS user's portrait analysis system and its method based on big data |
CN113032556A (en) * | 2019-12-25 | 2021-06-25 | 厦门铠甲网络股份有限公司 | Method for forming user portrait based on natural language processing |
CN111461180B (en) * | 2020-03-12 | 2024-07-09 | 平安科技(深圳)有限公司 | Sample classification method, device, computer equipment and storage medium |
WO2021179544A1 (en) * | 2020-03-12 | 2021-09-16 | 平安科技(深圳)有限公司 | Sample classification method and apparatus, computer device, and storage medium |
CN111429184A (en) * | 2020-03-27 | 2020-07-17 | 北京睿科伦智能科技有限公司 | User portrait extraction method based on text information |
WO2021218336A1 (en) * | 2020-04-30 | 2021-11-04 | 深圳壹账通智能科技有限公司 | User information discrimination method and apparatus, and device and computer readable storage medium |
CN111723257B (en) * | 2020-06-24 | 2023-05-02 | 山东建筑大学 | User portrayal method and system based on water usage rule |
CN111723257A (en) * | 2020-06-24 | 2020-09-29 | 山东建筑大学 | User portrait drawing method and system based on water usage law |
CN111753026B (en) * | 2020-06-28 | 2023-09-12 | 中国银行股份有限公司 | User portrait generation system, method, device, equipment and medium |
CN111753026A (en) * | 2020-06-28 | 2020-10-09 | 中国银行股份有限公司 | User portrait generation system, method, device, equipment and medium |
CN111985553A (en) * | 2020-08-18 | 2020-11-24 | 北京云从科技有限公司 | Feature construction method and device, machine readable medium and equipment |
CN111932035B (en) * | 2020-09-22 | 2021-01-08 | 南京福佑在线电子商务有限公司 | Data processing method and device based on multiple models and classified modeling method |
CN111932035A (en) * | 2020-09-22 | 2020-11-13 | 南京福佑在线电子商务有限公司 | Data processing method and device based on multiple models and classified modeling method |
CN113763093A (en) * | 2020-11-12 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | User portrait-based item recommendation method and device |
CN112667714A (en) * | 2021-03-17 | 2021-04-16 | 腾讯科技(深圳)有限公司 | User portrait optimization method and device based on deep learning and storage medium |
CN112883930A (en) * | 2021-03-29 | 2021-06-01 | 动者科技(杭州)有限责任公司 | Real-time true and false motion judgment method based on full-connection network |
CN112883931A (en) * | 2021-03-29 | 2021-06-01 | 动者科技(杭州)有限责任公司 | Real-time true and false motion judgment method based on long and short term memory network |
CN113191825A (en) * | 2021-05-26 | 2021-07-30 | 上海悟景信息科技有限公司 | Client portrait model modeling method, system and equipment based on artificial intelligence |
CN114119058A (en) * | 2021-08-10 | 2022-03-01 | 国家电网有限公司 | User portrait model construction method and device and storage medium |
CN114119058B (en) * | 2021-08-10 | 2023-09-26 | 国家电网有限公司 | User portrait model construction method, device and storage medium |
CN117252647A (en) * | 2023-09-21 | 2023-12-19 | 深圳市聚海通达科技有限公司 | Digitalized integrated marketing service system |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108629633A (en) | A kind of method and system for establishing user's portrait based on big data | |
CN110633409B (en) | Automobile news event extraction method integrating rules and deep learning | |
CN103559262B (en) | Community-based author and scientific paper commending system thereof and recommend method | |
CN101794311B (en) | Fuzzy data mining based automatic classification method of Chinese web pages | |
CN102708096B (en) | Network intelligence public sentiment monitoring system based on semantics and work method thereof | |
CN110162706A (en) | A kind of personalized recommendation method and system based on interaction data cluster | |
CN102789498B (en) | Method and system for carrying out sentiment classification on Chinese comment text on basis of ensemble learning | |
CN105930411A (en) | Classifier training method, classifier and sentiment classification system | |
CN106250513A (en) | A kind of event personalization sorting technique based on event modeling and system | |
CN103914478A (en) | Webpage training method and system and webpage prediction method and system | |
CN101609450A (en) | Web page classification method based on training set | |
CN106326212A (en) | Method for analyzing implicit type discourse relation based on hierarchical depth semantics | |
CN103226578A (en) | Method for identifying websites and finely classifying web pages in medical field | |
CN103823824A (en) | Method and system for automatically constructing text classification corpus by aid of internet | |
CN108897778A (en) | A kind of image labeling method based on multi-source big data analysis | |
CN110134765A (en) | A kind of dining room user comment analysis system and method based on sentiment analysis | |
CN110442728A (en) | Sentiment dictionary construction method based on word2vec automobile product field | |
CN106815310A (en) | A kind of hierarchy clustering method and system to magnanimity document sets | |
CN105740382A (en) | Aspect classification method for short comment texts | |
CN111191099B (en) | User activity type identification method based on social media | |
CN103886020A (en) | Quick search method of real estate information | |
CN106777193A (en) | A kind of method for writing specific contribution automatically | |
TWI828928B (en) | Highly scalable, multi-label text classification methods and devices | |
CN108021715A (en) | Isomery tag fusion system based on semantic structure signature analysis | |
CN109063045A (en) | A kind of financial service method and financial service terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181009 |