CN108629633A - Method and system for establishing a user portrait based on big data - Google Patents

A method and system for establishing a user portrait based on big data

Info

Publication number
CN108629633A
Authority
CN
China
Prior art keywords
sample
data
user
portrait
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810438144.3A
Other languages
Chinese (zh)
Inventor
张铁舰
付安龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Co Ltd
Original Assignee
Inspur Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Software Co Ltd
Priority to CN201810438144.3A
Publication of CN108629633A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201: Market modelling; Market analysis; Collecting market data
    • G06Q30/0241: Advertisements
    • G06Q30/0251: Targeted advertisements
    • G06Q30/0257: User requested

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and system for establishing a user portrait based on big data, belonging to the technical field of big data applications. The method comprises the following steps. S1: build a user portrait label system. S2: preprocess the data. S3: label samples automatically. S4: handle imbalance in the user data samples. S5: feature engineering. S6: model training, using a combination of multi-class and binary classification models. S7: model optimization. The method and system can improve the accuracy of user portraits, so that personalized intelligent recommendation systems, precision marketing and targeted advertising can be built on them, and therefore have good application value.

Description

A method and system for establishing a user portrait based on big data
Technical Field
The present invention relates to the technical field of big data applications, and specifically provides a method and system for establishing a user portrait based on big data.
Background
With the arrival of the big data era, more and more user data is being integrated and the volume of information keeps growing. How to make effective use of the accumulated data to obtain more accurate and more valuable information, present that information through labels, and thereby establish an accurate user portrait is a problem the big data field currently faces. In the prior art, user portraits rely mostly on manual labeling, which is labor-intensive; the resulting labels vary widely with the subjective preferences of the person doing the labeling, so their accuracy is questionable.
Summary of the Invention
The technical task of the present invention is, in view of the above problems, to provide a method for establishing a user portrait based on big data that can improve the accuracy of user portraits, so that personalized intelligent recommendation systems, precision marketing and targeted advertising can be built on them.
A further technical task of the present invention is to provide a system for establishing a user portrait based on big data.
To achieve the above objects, the present invention provides the following technical solutions:
A method for establishing a user portrait based on big data, the method comprising the following steps:
S1: Build the user portrait label system
User data is normalized into a label system that is effective for the target application. Labels are divided into structured and unstructured labels: structured labels have a clear hierarchy of parent-child categories and are regular, whereas unstructured labels have no hierarchy and are scattered;
S2: Preprocess the data;
S3: Automatic sample labeling
Samples are labeled automatically using semi-supervised learning;
S4: Handle imbalance in the user data samples
At the data level, samples are oversampled or undersampled; at the algorithm level, cost-sensitive learning and ensemble learning are applied;
S5: Feature engineering
Once the sample set has been built, features are extracted from the samples and classified according to their specific data type;
S6: Model training
A combination of multi-class and binary classification models is used;
S7: Model optimization
The model is analyzed to determine whether it is overfitting or underfitting, and is optimized accordingly.
A user portrait is a labeled user model abstracted from the user's data; in other words, it is the labeling of user data, that is, refining the data and attaching labels to it. After data preprocessing, samples are extracted and the user portrait labels are modeled. The results are continually optimized by combining machine learning, deep learning (including deep reinforcement learning), transfer learning, natural language processing and similar methods, which improves the accuracy of the user portrait.
Model training involves training many models and then selecting the best. Take the classification of user interest data as an example: the categories are numerous and have parent-child hierarchical relationships, so a single multi-class model cannot meet the need and a hierarchical model can be used instead. In this classification process, both the dependencies between nodes at different levels of the category tree and the classification problem within each level must be considered. The model is built as a top-down hierarchical classification tree, which satisfies the dependencies between nodes at different levels. The classification problem within a level can be handled either by one multi-class model or by multiple binary models. Multiple binary models may be chosen for the following reasons: categories are easy to extend; several people can edit the categories; a single category can be extended and optimized without affecting the others; and, where categories overlap, one sample can be assigned to multiple categories. When the category division does not cover all categories, care must be taken that out-of-domain samples are not misclassified. However, when there are many categories this structure brings a larger workload: for n sub-categories it needs n-1 more models than a single multi-class classifier at that level. Therefore, depending on the actual business and data conditions, a structure that combines multi-class and binary models is generally adopted.
Preferably, the structured labels in step S1 include user attribute labels, short-term interest labels and long-term interest labels.
Human psychology is complex and sometimes contradictory, so people's behavior is complex, and the behavioral data it generates is correspondingly complex and scattered. This complex, scattered data needs to be normalized into a label system that is effective for the target application.
User attribute labels include, for example, the user's name, gender and height, and carry a high weight at the algorithm level.
Short-term interest labels cover data such as browsing for a bed: once the purchase has been made, the user may never look at the item again, so the weight at the algorithm-model level decays rapidly over time.
Long-term interest labels include, for example, entertainment - crosstalk - Guo Degang special shows.
Unstructured labels are very numerous and widely scattered, and are used as personalized labels.
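The rapid decay of short-term interest weights can be illustrated with a small sketch. The patent does not specify a decay formula; the exponential half-life form below, the function name `decayed_weight` and the half-life values are all assumptions made for illustration.

```python
def decayed_weight(base_weight: float, age_days: float, half_life_days: float) -> float:
    """Exponentially decay a label weight (assumed formula, not from the patent).

    half_life_days is a hypothetical tuning parameter: the age at which
    the weight has dropped to half of base_weight.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return base_weight * 0.5 ** (age_days / half_life_days)


# A short-term interest (browsing for a bed) fades within weeks,
# while a long-term interest keeps most of its weight.
short_term = decayed_weight(1.0, age_days=14, half_life_days=7)
long_term = decayed_weight(1.0, age_days=14, half_life_days=365)
```

With these assumed half-lives, the two-week-old short-term label keeps a quarter of its weight, while the long-term label is almost unchanged.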
Preferably, the data preprocessing in step S2 includes collecting user data and cleaning the collected data.
The user data includes user behavior data, such as browsing behavior; generated structured data, such as product libraries and libraries of browsed web pages; and knowledge data, such as a category system and data dictionaries. To obtain accurate data, the collected data is cleaned; the cleaning covers filtering of invalid and noisy data, anti-cheating, unstructured data, and so on.
Preferably, the semi-supervised automatic sample labeling in step S3 uses a small number of labeled samples to train on and classify a large number of unlabeled samples, and adds the samples classified with higher confidence to the training set.
The present invention uses Tri-training and CoForest. Tri-training divides the training data into 3 parts and trains 3 models; CoForest uses n classifiers, with random forest ensuring diversity among them. Both Tri-training and CoForest can introduce an error-rate convergence condition to control noisy samples, and can reduce the addition of noisy samples through voting among multiple classifiers.
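The voting idea behind Tri-training can be sketched as follows. This is a simplified illustration, not the patented procedure: the three training subsets are drawn by stratified bootstrap resampling (an assumption), a toy nearest-centroid classifier stands in for a real model, and the error-rate convergence check is omitted. All names are hypothetical.

```python
import random
from collections import defaultdict


class NearestCentroid:
    """Tiny stand-in classifier: predicts the class whose mean is closest."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for xi, yi in zip(X, y):
            groups[yi].append(xi)
        self.centroids = {
            label: [sum(col) / len(pts) for col in zip(*pts)]
            for label, pts in groups.items()
        }
        return self

    def predict(self, x):
        return min(
            self.centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])),
        )


def stratified_bootstrap(X, y, rng):
    """Resample with replacement inside each class so no class disappears."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    Xb, yb = [], []
    for label in sorted(by_class):
        pts = by_class[label]
        for _ in pts:
            Xb.append(rng.choice(pts))
            yb.append(label)
    return Xb, yb


def tri_training_round(X_lab, y_lab, X_unlab, seed=0):
    """One labeling round: a sample is pseudo-labeled for model i
    only when the other two models agree on its label."""
    rng = random.Random(seed)
    models = [
        NearestCentroid().fit(*stratified_bootstrap(X_lab, y_lab, rng))
        for _ in range(3)
    ]
    additions = [[] for _ in range(3)]
    for i in range(3):
        j, k = [m for m in range(3) if m != i]
        for x in X_unlab:
            label = models[j].predict(x)
            if label == models[k].predict(x):
                additions[i].append((x, label))
    return additions
```

In a full loop, each model would be retrained on its original data plus its `additions`, and the round would repeat until no model changes.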
Preferably, in step S4, oversampling improves classification performance on the minority class by adding positive samples, and undersampling removes negative samples; cost-sensitive learning increases the weight of positive samples and decreases the weight of negative samples; and ensemble learning splits the negative samples into several parts, trains each part together with the positive samples, and obtains multiple models.
Simply duplicating positive samples counts as oversampling, but it easily leads to overfitting. Instead, Gaussian noise can be added randomly to the positive samples, or new synthetic samples can be generated, for example with the SMOTE algorithm. Undersampling by randomly removing negative samples is simple to carry out; in practical classification tasks, because negative samples are numerous, random undersampling at a fixed ratio often works fairly well. Other sampling methods include Tomek links, NearMiss and One-Sided Selection; the best one can be chosen by testing actual performance.
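A minimal sketch of the two data-level options described above: oversampling positives with added Gaussian noise, and random undersampling of negatives at a fixed ratio. The function names, `sigma` and `ratio` are hypothetical parameters chosen for illustration; SMOTE itself is not implemented here.

```python
import random


def oversample_with_noise(positives, target, sigma=0.05, seed=0):
    """Grow the positive set to `target` samples by copying random
    positives and perturbing each feature with Gaussian noise."""
    rng = random.Random(seed)
    out = list(positives)
    while len(out) < target:
        base = rng.choice(positives)
        out.append(tuple(v + rng.gauss(0.0, sigma) for v in base))
    return out


def random_undersample(negatives, ratio, seed=0):
    """Keep a random fraction `ratio` of the negative samples."""
    rng = random.Random(seed)
    keep = max(1, round(len(negatives) * ratio))
    return rng.sample(negatives, keep)


positives = [(1.0, 2.0), (2.0, 3.0)]
negatives = [(float(i), float(-i)) for i in range(100)]
balanced_pos = oversample_with_noise(positives, target=6)
balanced_neg = random_undersample(negatives, ratio=0.2)
```

The noise scale should be small relative to the spread of the features; otherwise the synthetic positives may drift into the negative region.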
Preferably, feature selection is carried out during feature extraction in step S5 to find a feature subset for training the model.
For text, word features can be extracted, such as "China". For other text-type extended features, browsing data can be used to obtain the corresponding browsed URLs as features, or text similarity can be computed and similar terms added as extended features. Domain-specific features are the domain features corresponding to the text keywords, such as region, product type and product attributes. For topic features, topic models such as TopicModel and LDA are used, and the resulting topic distribution parameters serve as features.
Commonly used feature selection methods are:
TF-IDF: TF (term frequency) measures how well a word describes the content of a document. It is the term count normalized by document length, to prevent a bias towards long documents. IDF (inverse document frequency) measures how well the word distinguishes between documents. The larger the TF-IDF value, the stronger the feature's ability to discriminate between categories.
Chi-square test: the larger the chi-square value, the more strongly the word discriminates between the two classes.
Mutual information: mutual information is generally used to measure the correlation between two words; in feature selection it can be used to compute how well a feature discriminates a category.
Information gain: information gain is the difference in information entropy before and after a feature appears, and indicates the feature's importance for classification.
When the training set is large, the above methods perform similarly in practice. After the weights are computed they are sorted, and there are generally two ways to select features as needed: take the K features with the largest weights, or take the features whose weight exceeds some threshold.
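The TF-IDF weighting and the two selection modes (top K, or above a threshold) can be sketched as below. The patent gives no formula, so the common definitions tf = count / document length and idf = log(N / df) are assumed; the function names and the toy documents are illustrative only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document,
    using tf = count / len(doc) and idf = log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return weights


def top_k(scores, k):
    """Select the k terms with the largest weights (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def above_threshold(scores, threshold):
    """Select every term whose weight exceeds the threshold."""
    return sorted(term for term, s in scores.items() if s > threshold)


docs = [["phone", "case", "phone"], ["phone", "charger"], ["sofa", "bed"]]
weights = tf_idf(docs)
```

In the first toy document, "case" outranks "phone" even though "phone" occurs twice, because "phone" also appears in another document and so has a lower IDF.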
Preferably, for underfitting in step S7, where accuracy is low on both the training set and the test set, the data is cleaned, effective features are added, and a more complex model is substituted.
For underfitting, besides data cleaning and adding effective features, the penalty coefficient of the regularization term can be reduced and model fusion with voting can be used. Substituting a more complex model means, for example, replacing a linear model with a nonlinear one.
Preferably, for overfitting in step S7, where accuracy is high on the training set but low on the test set, the training sample data is increased and a simpler model is substituted.
For overfitting, the penalty coefficient of the regularization term can also be increased and the number of training iterations reduced. Substituting a simpler model means, for example, replacing a nonlinear model with a linear one.
A system for establishing a user portrait based on big data comprises a user portrait label system construction module, a data preprocessing module, an automatic sample labeling module, a sample imbalance processing module, a feature engineering module, a model training module and a model optimization module. The user portrait label system construction module builds the user portrait label system; the data preprocessing module preprocesses data; the automatic sample labeling module labels samples automatically; the sample imbalance processing module handles sample imbalance; the feature engineering module extracts features from the samples and classifies them according to their specific data type; the model training module trains the model; and the model optimization module optimizes the model.
Compared with the prior art, the method of the present invention for establishing a user portrait based on big data has the following notable benefits: the complete scheme it provides can build user portraits quickly, efficiently and accurately and improves the accuracy of data labels. Personalized intelligent recommendation systems, precision marketing and targeted advertising can be built on this basis, making full use of user data, so the method has good application value.
Detailed Description
The method and system of the present invention for establishing a user portrait based on big data are described in further detail below with reference to an embodiment.
Embodiment
The method of the present invention for establishing a user portrait based on big data includes the following steps:
S1: Build the user portrait label system
User data is normalized into a label system that is effective for the target application, and labels are divided into structured and unstructured labels. Structured labels have a clear hierarchy of parent-child categories, are regular, and can be represented as a tree or forest. According to actual conditions they are subdivided into user attribute labels, short-term interest labels and long-term interest labels. User attribute labels include, for example, the user's name, gender and height, and carry a high weight at the algorithm level. Short-term interest labels cover data such as browsing for a bed: once the purchase has been made, the user may never look at the item again, so the weight at the algorithm-model level decays rapidly over time. Long-term interest labels include, for example, entertainment - crosstalk - Guo Degang special shows.
Unstructured labels have no hierarchy; they are scattered and very numerous, and are used as personalized labels.
S2: Preprocess the data
Data preprocessing includes collecting user data and cleaning the collected data. The user data includes user behavior data, such as browsing behavior; generated structured data, such as product libraries and libraries of browsed web pages; and knowledge data, such as a category system and data dictionaries. To obtain accurate data, the collected data is cleaned; the cleaning covers filtering of invalid and noisy data, anti-cheating, unstructured data, and so on.
S3: Automatic sample labeling
Labeling samples is a heavy workload, especially in a big data environment, and manual labeling also suffers from subjectivity and arbitrariness. The present invention therefore labels samples automatically with semi-supervised learning: a small number of labeled samples is used to train on and classify a large number of unlabeled samples, and the samples classified with higher confidence are added to the training set.
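The confidence filter at the heart of this step can be sketched as follows. The 0.9 threshold, the function names and the logistic toy model are assumptions for illustration; the patent does not state how confidence is measured or which cut-off is used.

```python
import math


def toy_predict_proba(score):
    """Hypothetical stand-in for a trained model: logistic function of a 1-D score."""
    p1 = 1.0 / (1.0 + math.exp(-score))
    return {1: p1, 0: 1.0 - p1}


def select_confident(unlabeled, predict_proba, threshold=0.9):
    """Split unlabeled samples into pseudo-labeled ones (confidence at or
    above the threshold) and ones left unlabeled for a later round."""
    accepted, remaining = [], []
    for x in unlabeled:
        probs = predict_proba(x)
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            accepted.append((x, label))
        else:
            remaining.append(x)
    return accepted, remaining


accepted, remaining = select_confident([3.0, 0.2, -4.0], toy_predict_proba)
```

The accepted pairs would be appended to the training set and the model retrained; the ambiguous sample (score 0.2) stays unlabeled.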
S4: Handle imbalance in the user data samples
The samples drawn may be imbalanced; typically the number of positive samples is far smaller than the number of negative samples. The present invention addresses this at two levels.
The first is the data level, where samples can be oversampled or undersampled. Oversampling improves classification performance on the minority class by adding positive samples. Simply duplicating positive samples counts as oversampling, but it easily leads to overfitting, so Gaussian noise can be added randomly to the positive samples, or new synthetic samples can be generated, for example with the SMOTE algorithm. The simplest undersampling method is to remove negative samples at random; its advantage is that it is simple to carry out, and in some practical classification tasks, because negative samples are numerous, random undersampling at a fixed ratio works fairly well.
The second is the algorithm level, where the main methods are cost-sensitive learning and ensemble learning. The weights of the penalty terms in the loss function can be adjusted, for example by increasing the weight of positive samples and decreasing the weight of negative samples; this is cost-sensitive learning. In ensemble learning, for example, the negative samples are split into several parts, each part is trained together with the positive samples to obtain multiple models, and the final result is obtained by voting.
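Cost-sensitive weighting of the loss function can be illustrated with a class-weighted log loss. The patent does not name a particular loss or weighting rule; binary cross-entropy and the inverse-frequency rule n / (2 * class count) are assumptions used here only as an example.

```python
import math


def balanced_weights(y_true):
    """Inverse-frequency class weights (assumed rule): n / (2 * class count)."""
    n = len(y_true)
    n_pos = sum(y_true)
    n_neg = n - n_pos
    return n / (2 * n_pos), n / (2 * n_neg)


def weighted_log_loss(y_true, p_pred, pos_weight=1.0, neg_weight=1.0):
    """Binary cross-entropy where each sample is weighted by its class."""
    eps = 1e-12
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        w = pos_weight if y == 1 else neg_weight
        total += -w * (math.log(p) if y == 1 else math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


y = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # one positive among ten samples
p = [0.1] * 10                       # a model that always predicts "negative"
pos_w, neg_w = balanced_weights(y)
plain = weighted_log_loss(y, p)
weighted = weighted_log_loss(y, p, pos_w, neg_w)
```

With the weights applied, missing the single positive sample dominates the loss, so a model that ignores the minority class is penalized much more heavily than under the unweighted loss.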
S5: Feature engineering
Once the sample set has been built, features are extracted from the samples and classified according to their specific data type. For text, word features can be extracted, such as "China". For other text-type extended features, browsing data can be used to obtain the corresponding browsed URLs as features, or text similarity can be computed and similar terms added as extended features. Domain-specific features are the domain features corresponding to the text keywords, such as region, product type and product attributes. For topic features, topic models such as TopicModel and LDA are used, and the resulting topic distribution parameters serve as features.
Feature extraction also requires feature selection, to find a feature subset for training the model. Commonly used feature selection methods are:
TF-IDF: TF (term frequency) measures how well a word describes the content of a document. It is the term count normalized by document length, to prevent a bias towards long documents. IDF (inverse document frequency) measures how well the word distinguishes between documents. The larger the TF-IDF value, the stronger the feature's ability to discriminate between categories.
Chi-square test: the larger the chi-square value, the more strongly the word discriminates between the two classes.
Mutual information: mutual information is generally used to measure the correlation between two words; in feature selection it can be used to compute how well a feature discriminates a category.
Information gain: information gain is the difference in information entropy before and after a feature appears, and indicates the feature's importance for classification.
When the training set is large, the above methods perform similarly in practice. After the weights are computed they are sorted, and there are generally two ways to select features as needed: take the K features with the largest weights, or take the features whose weight exceeds some threshold.
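The chi-square and information-gain scores listed above can both be computed from a 2x2 table of document counts for one word and one category. The sketch below uses the standard textbook formulas, which the patent does not spell out; the argument names a, b, c, d are a labeling convention introduced here.

```python
import math


def chi_square(a, b, c, d):
    """Chi-square statistic for a word/category 2x2 table.

    a: documents in the category containing the word
    b: documents outside the category containing the word
    c: documents in the category without the word
    d: documents outside the category without the word
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def information_gain(a, b, c, d):
    """Class entropy minus the entropy remaining once the word's presence is known."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    prior = entropy([a + c, b + d])
    with_word = entropy([a, b])
    without_word = entropy([c, d])
    return prior - ((a + b) / n) * with_word - ((c + d) / n) * without_word
```

A word that appears only inside the category scores high on both measures, while a word spread evenly across both classes scores zero on both.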
S6: Model training
Model training involves training many models and then selecting the best. Take the classification of user interest data as an example: the categories are numerous and have parent-child hierarchical relationships, so a single multi-class model cannot meet the need and a hierarchical model can be used instead. In this classification process, both the dependencies between nodes at different levels of the category tree and the classification problem within each level must be considered. The model is built as a top-down hierarchical classification tree, which satisfies the dependencies between nodes at different levels. The classification problem within a level can be handled either by one multi-class model or by multiple binary models. Multiple binary models may be chosen for the following reasons: categories are easy to extend; several people can edit the categories; a single category can be extended and optimized without affecting the others; and, where categories overlap, one sample can be assigned to multiple categories. When the category division does not cover all categories, care must be taken that out-of-domain samples are not misclassified. However, when there are many categories this structure brings a larger workload: for n sub-categories it needs n-1 more models than a single multi-class classifier at that level. Therefore, depending on the actual business and data conditions, a structure that combines multi-class and binary models is generally adopted.
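The top-down tree with one binary classifier per category can be sketched as below. The category tree, the keyword sets and the keyword-matching `binary_accepts` function are hypothetical stand-ins for trained binary models; they only illustrate how a sample descends the tree, can land in several categories, and is left unassigned when no classifier accepts it.

```python
# Hypothetical category tree: parent -> list of child categories.
TREE = {
    "root": ["entertainment", "shopping"],
    "entertainment": ["crosstalk", "film"],
    "shopping": ["furniture"],
}

# Hypothetical keyword sets standing in for trained binary classifiers.
KEYWORDS = {
    "entertainment": {"show", "movie", "crosstalk"},
    "crosstalk": {"crosstalk"},
    "film": {"movie"},
    "shopping": {"buy", "price"},
    "furniture": {"bed", "sofa"},
}


def binary_accepts(category, tokens):
    """Stand-in for a per-category binary model: accept on any keyword match."""
    return bool(KEYWORDS[category] & set(tokens))


def classify(tokens, node="root", path=()):
    """Walk the tree top-down; a child is explored only if its own binary
    classifier accepts the sample, so parent-child dependencies are respected."""
    results = []
    for child in TREE.get(node, []):
        if binary_accepts(child, tokens):
            child_path = path + (child,)
            deeper = classify(tokens, child, child_path)
            results.extend(deeper if deeper else [child_path])
    return results
```

Adding a category here means adding one tree entry and one binary model, without retraining the others, which is the extensibility argument made above; the cost is one model per category.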
S7: Model optimization
After the model is trained, its actual performance must be checked against the expected requirements. A model that falls short needs to be optimized: it is analyzed to determine whether it is overfitting or underfitting, so that it can be optimized in a targeted way.
For underfitting, accuracy is relatively low on both the training set and the test set, and the model has not learned the underlying relationships well. The training samples are optimized first: they may still contain noisy samples that have not been cleaned out, so data cleaning must continue. Effective features are added, the penalty coefficient of the regularization term is lowered, and a relatively more complex model is substituted, for example by replacing a linear model with a nonlinear one or by using model fusion with voting.
For overfitting, accuracy on the training set is relatively high during training but accuracy on the test set is relatively low. This is handled by adding training sample data, raising the penalty coefficient of the regularization term, reducing the number of training iterations, and substituting a relatively simpler model, for example by replacing a nonlinear model with a linear one.
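The diagnosis described in this step can be expressed as a simple rule on training and test accuracy. The thresholds `target` and `max_gap` are hypothetical values chosen for illustration; the patent only says "low" and "high". The suggested actions are the ones listed in the text.

```python
def diagnose_fit(train_acc, test_acc, target=0.85, max_gap=0.10):
    """Classify a trained model as underfitting, overfitting or ok,
    and return the remedies the description lists for that case."""
    if train_acc < target and test_acc < target:
        return "underfitting", [
            "continue data cleaning",
            "add effective features",
            "lower the regularization penalty coefficient",
            "switch to a more complex model or fuse models by voting",
        ]
    if train_acc - test_acc > max_gap:
        return "overfitting", [
            "add training samples",
            "raise the regularization penalty coefficient",
            "reduce the number of training iterations",
            "switch to a simpler model",
        ]
    return "ok", []


status, actions = diagnose_fit(train_acc=0.99, test_acc=0.70)
```

In practice the thresholds depend on the task, and accuracy may be a poor metric on imbalanced data, so they should be tuned rather than taken from this sketch.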
The embodiment described above is merely a preferred specific implementation of the present invention; ordinary variations and substitutions made by those skilled in the art within the scope of the technical solution of the present invention shall all fall within the scope of protection of the present invention.

Claims (9)

1. A method for establishing a user portrait based on big data, characterized in that the method comprises the following steps:
S1: Build the user portrait label system
User data is normalized into a label system that is effective for the target application. Labels are divided into structured and unstructured labels: structured labels have a clear hierarchy of parent-child categories and are regular, whereas unstructured labels have no hierarchy and are scattered;
S2: Preprocess the data;
S3: Automatic sample labeling
Samples are labeled automatically using semi-supervised learning;
S4: Handle imbalance in the user data samples
At the data level, samples are oversampled or undersampled; at the algorithm level, cost-sensitive learning and ensemble learning are applied;
S5: Feature engineering
Once the sample set has been built, features are extracted from the samples and classified according to their specific data type;
S6: Model training
A combination of multi-class and binary classification models is used;
S7: Model optimization
The model is analyzed to determine whether it is overfitting or underfitting, and is optimized accordingly.
2. The method for establishing a user portrait based on big data according to claim 1, characterized in that the structured labels in step S1 include user attribute labels, short-term interest labels and long-term interest labels.
3. The method for establishing a user portrait based on big data according to claim 1 or 2, characterized in that the data preprocessing in step S2 includes collecting user data and cleaning the collected data.
4. The method for establishing a user portrait based on big data according to claim 3, characterized in that the semi-supervised automatic sample labeling in step S3 uses a small number of labeled samples to train on and classify a large number of unlabeled samples, and adds the samples classified with higher confidence to the training set.
5. The method for establishing a user portrait based on big data according to claim 4, characterized in that, in step S4, oversampling improves classification performance on the minority class by adding positive samples, and undersampling removes negative samples; cost-sensitive learning increases the weight of positive samples and decreases the weight of negative samples; and ensemble learning splits the negative samples into several parts and trains each part together with the positive samples to obtain multiple models.
6. The method for establishing a user portrait based on big data according to claim 5, characterized in that feature selection is carried out during feature extraction in step S5 to find a feature subset for training the model.
7. The method for establishing a user portrait based on big data according to claim 6, characterized in that, for underfitting in step S7, where accuracy is low on both the training set and the test set, the data is cleaned, effective features are added, and a more complex model is substituted.
8. The method for establishing a user portrait based on big data according to claim 7, characterized in that, for overfitting in step S7, where accuracy is high on the training set but low on the test set, the training sample data is increased and a simpler model is substituted.
9. A system for establishing a user portrait based on big data, characterized in that it comprises a user portrait label system construction module, a data preprocessing module, an automatic sample labeling module, a sample imbalance processing module, a feature engineering module, a model training module and a model optimization module; the user portrait label system construction module builds the user portrait label system; the data preprocessing module preprocesses data; the automatic sample labeling module labels samples automatically; the sample imbalance processing module handles sample imbalance; the feature engineering module extracts features from the samples and classifies them according to their specific data type; the model training module trains the model; and the model optimization module optimizes the model.
CN201810438144.3A 2018-05-09 2018-05-09 A kind of method and system for establishing user's portrait based on big data Pending CN108629633A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810438144.3A CN108629633A (en) 2018-05-09 2018-05-09 A kind of method and system for establishing user's portrait based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810438144.3A CN108629633A (en) 2018-05-09 2018-05-09 A kind of method and system for establishing user's portrait based on big data

Publications (1)

Publication Number Publication Date
CN108629633A true CN108629633A (en) 2018-10-09

Family

ID=63692337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810438144.3A Pending CN108629633A (en) 2018-05-09 2018-05-09 A kind of method and system for establishing user's portrait based on big data

Country Status (1)

Country Link
CN (1) CN108629633A (en)

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359137A (en) * 2018-10-16 2019-02-19 大连理工大学 Based on user's growth of Feature Selection and semi-supervised learning portrait construction method
CN109636047A (en) * 2018-12-17 2019-04-16 江苏满运软件科技有限公司 User activity prediction model training method, system, equipment and storage medium
CN109766000A (en) * 2018-12-25 2019-05-17 重庆和贯科技有限公司 A kind of wisdom education system and method based on virtual reality
CN109785034A (en) * 2018-11-13 2019-05-21 北京码牛科技有限公司 User's portrait generation method, device, electronic equipment and computer-readable medium
CN109919469A (en) * 2019-02-27 2019-06-21 浪潮软件集团有限公司 A kind of holography science data processing method
CN110111143A (en) * 2019-04-28 2019-08-09 上海二三四五移动科技有限公司 A kind of control method and control device for establishing mobile end subscriber portrait
CN110225009A (en) * 2019-05-27 2019-09-10 四川大学 It is a kind of that user's detection method is acted on behalf of based on communication behavior portrait
CN110245296A (en) * 2019-06-14 2019-09-17 浙江华坤道威数据科技有限公司 A kind of PAS user's portrait analysis system and its method based on big data
CN111429184A (en) * 2020-03-27 2020-07-17 北京睿科伦智能科技有限公司 User portrait extraction method based on text information
WO2020147259A1 (en) * 2019-01-16 2020-07-23 平安科技(深圳)有限公司 User portait method and apparatus, readable storage medium, and terminal device
CN111723257A (en) * 2020-06-24 2020-09-29 山东建筑大学 User portrait drawing method and system based on water usage law
CN111753026A (en) * 2020-06-28 2020-10-09 中国银行股份有限公司 User portrait generation system, method, device, equipment and medium
CN111932035A (en) * 2020-09-22 2020-11-13 南京福佑在线电子商务有限公司 Data processing method and device based on multiple models and classified modeling method
CN111985553A (en) * 2020-08-18 2020-11-24 北京云从科技有限公司 Feature construction method and device, machine readable medium and equipment
CN112001739A (en) * 2019-05-27 2020-11-27 广东小天才科技有限公司 Method and system for generating user learning portrait
CN112667714A (en) * 2021-03-17 2021-04-16 腾讯科技(深圳)有限公司 User portrait optimization method and device based on deep learning and storage medium
CN112883931A (en) * 2021-03-29 2021-06-01 动者科技(杭州)有限责任公司 Real-time true and false motion judgment method based on long and short term memory network
CN112883930A (en) * 2021-03-29 2021-06-01 动者科技(杭州)有限责任公司 Real-time true and false motion judgment method based on full-connection network
CN113032556A (en) * 2019-12-25 2021-06-25 厦门铠甲网络股份有限公司 Method for forming user portrait based on natural language processing
CN113191825A (en) * 2021-05-26 2021-07-30 上海悟景信息科技有限公司 Client portrait model modeling method, system and equipment based on artificial intelligence
WO2021179544A1 (en) * 2020-03-12 2021-09-16 平安科技(深圳)有限公司 Sample classification method and apparatus, computer device, and storage medium
WO2021218336A1 (en) * 2020-04-30 2021-11-04 深圳壹账通智能科技有限公司 User information discrimination method and apparatus, and device and computer readable storage medium
CN113763093A (en) * 2020-11-12 2021-12-07 北京沃东天骏信息技术有限公司 User portrait-based item recommendation method and device
CN114119058A (en) * 2021-08-10 2022-03-01 国家电网有限公司 User portrait model construction method and device and storage medium
CN117252647A (en) * 2023-09-21 2023-12-19 深圳市聚海通达科技有限公司 Digitalized integrated marketing service system
CN111461180B (en) * 2020-03-12 2024-07-09 平安科技(深圳)有限公司 Sample classification method, device, computer equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503015A (en) * 2015-09-07 2017-03-15 国家计算机网络与信息安全管理中心 A kind of method for building user's portrait
CN105718938A (en) * 2015-12-30 2016-06-29 深圳先进技术研究院 Encoding method of user label and encoding apparatus of user label
CN107688831A (en) * 2017-09-04 2018-02-13 五邑大学 A kind of unbalanced data sorting technique based on cluster down-sampling
CN107633036A (en) * 2017-09-08 2018-01-26 广州汪汪信息技术有限公司 A kind of microblog users portrait method, electronic equipment, storage medium, system

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359137B (en) * 2018-10-16 2021-03-26 大连理工大学 User growth portrait construction method based on feature screening and semi-supervised learning
CN109359137A (en) * 2018-10-16 2019-02-19 大连理工大学 Based on user's growth of Feature Selection and semi-supervised learning portrait construction method
CN109785034A (en) * 2018-11-13 2019-05-21 北京码牛科技有限公司 User's portrait generation method, device, electronic equipment and computer-readable medium
CN109636047A (en) * 2018-12-17 2019-04-16 江苏满运软件科技有限公司 User activity prediction model training method, system, equipment and storage medium
CN109766000A (en) * 2018-12-25 2019-05-17 重庆和贯科技有限公司 A kind of wisdom education system and method based on virtual reality
WO2020147259A1 (en) * 2019-01-16 2020-07-23 平安科技(深圳)有限公司 User portait method and apparatus, readable storage medium, and terminal device
CN109919469A (en) * 2019-02-27 2019-06-21 浪潮软件集团有限公司 A kind of holography science data processing method
CN110111143A (en) * 2019-04-28 2019-08-09 上海二三四五移动科技有限公司 A kind of control method and control device for establishing mobile end subscriber portrait
CN110225009B (en) * 2019-05-27 2020-06-05 四川大学 Proxy user detection method based on communication behavior portrait
CN110225009A (en) * 2019-05-27 2019-09-10 四川大学 It is a kind of that user's detection method is acted on behalf of based on communication behavior portrait
CN112001739A (en) * 2019-05-27 2020-11-27 广东小天才科技有限公司 Method and system for generating user learning portrait
CN110245296A (en) * 2019-06-14 2019-09-17 浙江华坤道威数据科技有限公司 A kind of PAS user's portrait analysis system and its method based on big data
CN113032556A (en) * 2019-12-25 2021-06-25 厦门铠甲网络股份有限公司 Method for forming user portrait based on natural language processing
CN111461180B (en) * 2020-03-12 2024-07-09 平安科技(深圳)有限公司 Sample classification method, device, computer equipment and storage medium
WO2021179544A1 (en) * 2020-03-12 2021-09-16 平安科技(深圳)有限公司 Sample classification method and apparatus, computer device, and storage medium
CN111429184A (en) * 2020-03-27 2020-07-17 北京睿科伦智能科技有限公司 User portrait extraction method based on text information
WO2021218336A1 (en) * 2020-04-30 2021-11-04 深圳壹账通智能科技有限公司 User information discrimination method and apparatus, and device and computer readable storage medium
CN111723257B (en) * 2020-06-24 2023-05-02 山东建筑大学 User portrayal method and system based on water usage rule
CN111723257A (en) * 2020-06-24 2020-09-29 山东建筑大学 User portrait drawing method and system based on water usage law
CN111753026B (en) * 2020-06-28 2023-09-12 中国银行股份有限公司 User portrait generation system, method, device, equipment and medium
CN111753026A (en) * 2020-06-28 2020-10-09 中国银行股份有限公司 User portrait generation system, method, device, equipment and medium
CN111985553A (en) * 2020-08-18 2020-11-24 北京云从科技有限公司 Feature construction method and device, machine readable medium and equipment
CN111932035B (en) * 2020-09-22 2021-01-08 南京福佑在线电子商务有限公司 Data processing method and device based on multiple models and classified modeling method
CN111932035A (en) * 2020-09-22 2020-11-13 南京福佑在线电子商务有限公司 Data processing method and device based on multiple models and classified modeling method
CN113763093A (en) * 2020-11-12 2021-12-07 北京沃东天骏信息技术有限公司 User portrait-based item recommendation method and device
CN112667714A (en) * 2021-03-17 2021-04-16 腾讯科技(深圳)有限公司 User portrait optimization method and device based on deep learning and storage medium
CN112883930A (en) * 2021-03-29 2021-06-01 动者科技(杭州)有限责任公司 Real-time true and false motion judgment method based on full-connection network
CN112883931A (en) * 2021-03-29 2021-06-01 动者科技(杭州)有限责任公司 Real-time true and false motion judgment method based on long and short term memory network
CN113191825A (en) * 2021-05-26 2021-07-30 上海悟景信息科技有限公司 Client portrait model modeling method, system and equipment based on artificial intelligence
CN114119058A (en) * 2021-08-10 2022-03-01 国家电网有限公司 User portrait model construction method and device and storage medium
CN114119058B (en) * 2021-08-10 2023-09-26 国家电网有限公司 User portrait model construction method, device and storage medium
CN117252647A (en) * 2023-09-21 2023-12-19 深圳市聚海通达科技有限公司 Digitalized integrated marketing service system

Similar Documents

Publication Publication Date Title
CN108629633A (en) A kind of method and system for establishing user's portrait based on big data
CN110633409B (en) Automobile news event extraction method integrating rules and deep learning
CN103559262B (en) Community-based author and scientific paper commending system thereof and recommend method
CN101794311B (en) Fuzzy data mining based automatic classification method of Chinese web pages
CN102708096B (en) Network intelligence public sentiment monitoring system based on semantics and work method thereof
CN110162706A (en) A kind of personalized recommendation method and system based on interaction data cluster
CN102789498B (en) Method and system for carrying out sentiment classification on Chinese comment text on basis of ensemble learning
CN105930411A (en) Classifier training method, classifier and sentiment classification system
CN106250513A (en) A kind of event personalization sorting technique based on event modeling and system
CN103914478A (en) Webpage training method and system and webpage prediction method and system
CN101609450A (en) Web page classification method based on training set
CN106326212A (en) Method for analyzing implicit type discourse relation based on hierarchical depth semantics
CN103226578A (en) Method for identifying websites and finely classifying web pages in medical field
CN103823824A (en) Method and system for automatically constructing text classification corpus by aid of internet
CN108897778A (en) A kind of image labeling method based on multi-source big data analysis
CN110134765A (en) A kind of dining room user comment analysis system and method based on sentiment analysis
CN110442728A (en) Sentiment dictionary construction method based on word2vec automobile product field
CN106815310A (en) A kind of hierarchy clustering method and system to magnanimity document sets
CN105740382A (en) Aspect classification method for short comment texts
CN111191099B (en) User activity type identification method based on social media
CN103886020A (en) Quick search method of real estate information
CN106777193A (en) A kind of method for writing specific contribution automatically
TWI828928B (en) Highly scalable, multi-label text classification methods and devices
CN108021715A (en) Isomery tag fusion system based on semantic structure signature analysis
CN109063045A (en) A kind of financial service method and financial service terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20181009)