CN108256548A - A kind of user's portrait depicting method and system based on Emoji service conditions - Google Patents
- Publication number
- CN108256548A CN108256548A CN201711261393.1A CN201711261393A CN108256548A CN 108256548 A CN108256548 A CN 108256548A CN 201711261393 A CN201711261393 A CN 201711261393A CN 108256548 A CN108256548 A CN 108256548A
- Authority
- CN
- China
- Prior art keywords
- emoji
- user
- portrait
- model
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
The present invention provides a user portrait depicting method and system based on Emoji usage. The steps of the method include: obtaining the portrait information and input text data of a batch of users; extracting the raw Emoji data from the text data with a regular expression; deriving each user's Emoji usage features from the raw Emoji data; dividing the users into a training set and a test set; training models with the Emoji usage features of the training-set users as independent variables and their portrait information as the dependent variable; and applying the trained models to the test set and selecting the model with the best evaluation score as the final user portrait depicting model. The invention depicts user portraits from the Emoji in user text alone, without analyzing the sensitive content the user types, and can therefore protect user privacy.
Description
Technical field
The invention belongs to the field of software technology, and specifically relates to a user portrait depicting method and system based on Emoji usage.
Background technology
In the big-data era, user portrait depicting refers to the process by which an enterprise analyzes user data to sketch out a labeled model of each user. The core of depicting a user portrait is attaching "labels" to the user and reasonably constructing a virtual representation of the user. A "label" can be any kind of user information, such as gender, age, or religious belief. User portraits are widely used in industry. On the one hand, they help enterprises build data assets and mine the value of their data, and they also enable data trading and promote data circulation. On the other hand, they help enterprises gain market insight and estimate market size, so as to fully understand user demand and achieve accurate product positioning and precision marketing. For example, an online advertising platform can use user portraits to recommend the advertisements that best match each user's preferences, thereby maximizing the ad click-through rate and increasing advertising revenue.
There are many popular user portrait depicting techniques, such as analyzing the text a user types into an application, analyzing the photos a user uploads, analyzing a user's screen name, and analyzing how a user interacts with web pages and applications. Among these, most enterprises still build user portraits by analyzing user-input text. They apply natural language processing, information retrieval, and related techniques to the text and extract informative, discriminative text features. Then, using users whose personal attributes are known, with those attributes as the dependent variable and their text features as the independent variables, they perform supervised or semi-supervised learning with machine-learning or deep-learning methods and finally obtain a model that depicts user portraits. For a new user, the enterprise only needs to extract the corresponding text features and feed them into the user portrait model, which outputs the predicted user attributes.
However, these mainstream text-analysis approaches to depicting user portraits have drawbacks. First, they intrude heavily on user privacy, because user-input text often contains sensitive content such as addresses, email addresses, telephone numbers, and financial information. Second, they do not generalize across languages. Most mainstream text-analysis methods for user portraits are built on English text. In recent years, work in natural language processing has found that techniques developed on English text are difficult to transfer to other languages and perform poorly there. For example, English separates words with spaces, so natural-language-processing techniques such as Bag of Words and unigram features can be applied easily. Languages such as Japanese and Chinese, however, do not use spaces to separate words, so the process of extracting text features is cumbersome and the results are poor. Even when advanced word-segmentation techniques segment the text successfully, the mainstream natural-language-processing techniques still cannot depict user portraits with high quality.
The development and popularity of Emoji brings new ideas for text-based user portrait depicting. First, Emoji are vivid, face no language barrier, and are deeply loved by users around the world; the "Face with Tears of Joy" emoji was even chosen as the Oxford Dictionaries Word of the Year in 2015. Second, as a universal language, Emoji are officially defined by Unicode and can easily be filtered out of and extracted from text. Compared with depicting user portraits from text content, using Emoji usage has significant advantages. On the one hand, analyzing Emoji does not touch the user's sensitive information, which greatly improves the protection of user privacy. On the other hand, Emoji are simple to extract, require no complex word-segmentation techniques, and are widely used in every country. This means that a multinational enterprise no longer needs to design a complicated, brand-new user portrait depicting technique for every language it covers; instead, it can use Emoji directly to train a single, simple user portrait depicting technique that suits users in all countries. The present invention therefore proposes a user portrait depicting method based on Emoji usage.
Invention content
To address the problems of depicting user portraits from text content described above, the purpose of the present invention is to propose a user portrait depicting method and system based on Emoji usage, which depicts user portraits from the Emoji in user text, without analyzing the sensitive content the user types, and can therefore protect user privacy.
In order to achieve the above objectives, the technical solution adopted by the present invention is as follows:
A user portrait depicting method based on Emoji usage, whose steps include:
obtaining the portrait information and input text data of a batch of users;
extracting the raw Emoji data from the text data with a regular expression;
deriving each user's Emoji usage features from the raw Emoji data;
dividing the users into a training set and a test set;
training models with the Emoji usage features of the training-set users as independent variables and their portrait information as the dependent variable;
applying the trained models to the test set and selecting the model with the best evaluation score as the final user portrait depicting model.
Further, the portrait information includes information such as the user's age and gender.
Further, the regular expression is constructed from the Emoji codes officially defined by Unicode.
Further, the Emoji usage features include Emoji frequency features, Emoji preference features, and Emoji emotional-intent features.
Further, the Emoji frequency features include the proportion of texts that use Emoji, the number of Emoji used in a sentence, and the pattern in which multiple Emoji are used within one sentence.
Further, the Emoji preference features include the color choice of Emoji, the choice of which Emoji to use, and the choice of which Emoji to use consecutively.
Further, the Emoji emotional-intent features include the usage of Emoji with positive and negative sentiment orientations.
Further, the algorithms used for training include classifier algorithms for predicting categorical attributes such as gender, and regression algorithms for predicting numerical attributes such as age.
Further, the evaluation indexes include indexes suited to classifier algorithms, such as accuracy, and indexes suited to regression algorithms, such as mean squared error.
Further, after the evaluation index is determined, parameter selection is performed for each algorithm used, in order to find the group of parameters that makes the algorithm perform best on the evaluation index. Parameter selection randomly divides the training set into k equal parts and selects parameters by k-fold cross-validation, whose steps include:
for each group of parameters, training the model each time on k-1 of the parts;
testing the model on the remaining part, so that the model is trained and tested k times;
combining the k results of each group of parameters and selecting the group with the best average performance.
A user portrait depicting system based on Emoji usage includes a memory and a processor. The memory stores a computer program configured to be executed by the processor, and the program includes instructions for each step of the above method.
Compared with the prior art, the positive effects of the present invention are as follows:
The method of the present invention depicts user portraits mainly from the Emoji in a user's text, without analyzing the sensitive content of the text the user types, and thus greatly protects user privacy. In addition, because Emoji are a globally universal and highly popular language, the method generalizes well: a multinational enterprise whose users span the world no longer needs to design a dedicated user portrait depicting technique for every complex language.
Description of the drawings
Fig. 1 is a flow chart of the user portrait depicting method based on Emoji usage of the present invention.
Specific embodiment
To make the above features and advantages of the present invention clearer and easier to understand, specific embodiments are described in detail below with reference to the accompanying drawing.
This embodiment provides a user portrait depicting method based on Emoji usage which, as shown in Fig. 1, can be divided into the following two parts.
First, the core of the method is to extract, from each user's Emoji usage, features that discriminate between user portraits. This part consists of two steps:
1. Text preprocessing
The collected user text data must first be cleaned. A regular expression that matches Emoji in text is constructed from the Emoji codes officially defined by Unicode and applied to every text, filtering out all content that is not Emoji and yielding the raw Emoji data used by each user.
2. Feature definition
Quantitative Emoji usage features usable by machine-learning and deep-learning algorithms are constructed from the raw Emoji data, specifically:
1) Emoji frequency features
At the coarsest granularity, the proportion of texts that use Emoji can be obtained while the Emoji are being filtered out of the text. At a finer granularity, for each sentence the number of Emoji used in it can be extracted, along with the various patterns in which multiple Emoji appear, for example whether several Emoji are used consecutively within one sentence.
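A minimal sketch of such frequency features follows; the feature names and the emoji ranges are assumptions of this sketch, not part of the invention's specification.

```python
import re

_EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")  # illustrative ranges

def frequency_features(sentences):
    """Coarse- and fine-grained emoji-frequency features for one user."""
    counts = [len(_EMOJI.findall(s)) for s in sentences]
    n = len(sentences) or 1
    return {
        "emoji_text_ratio": sum(c > 0 for c in counts) / n,   # texts using emoji
        "emoji_per_sentence": sum(counts) / n,                # emoji per sentence
        "multi_emoji_ratio": sum(c > 1 for c in counts) / n,  # multi-emoji pattern
    }

print(frequency_features(["so fun \U0001F602\U0001F602", "ok", "nice \U0001F44D"]))
```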
2) Emoji preference features
Emoji preference refers to the different choices of Emoji that different users make in text; for example, women are fonder of bright Emoji than men. From each user's raw Emoji data, the number of times the user used each Emoji can be derived and serves as that user's preference features. Going deeper, which Emoji each user most likes to use consecutively can also be considered as a feature.
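For illustration, per-emoji usage counts over a fixed emoji vocabulary might be computed as follows; the vocabulary and the helper regex are assumptions of this sketch.

```python
import re
from collections import Counter

_EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")  # illustrative ranges

def preference_features(texts, vocabulary):
    """How many times a user employed each emoji in a fixed vocabulary."""
    counts = Counter(ch for t in texts for ch in _EMOJI.findall(t))
    return [counts.get(e, 0) for e in vocabulary]

vocab = ["\U0001F602", "\U0001F60D", "\U0001F622"]
print(preference_features(["lol \U0001F602", "\U0001F602\U0001F60D wow"], vocab))
```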
3) Emoji emotional-intent features
When people communicate offline, they enrich their language with facial expressions, body movements, and the like, letting the other party understand their intentions more easily and accurately. When communicating online, such cues are missing, and users turn to Emoji to serve in their place. Given that sociology and psychology have found that different kinds of people use facial expressions differently, for example that women use them more often than men, the present invention also uses the emotional intent of Emoji usage as a feature for distinguishing users. First, sentiment-analysis tools such as LIWC are used to analyze the official Unicode definition of each Emoji and obtain its sentiment label, dividing the Emoji into those with positive, negative, or no sentiment tendency. Finally, each user's usage of positive and negative Emoji can be derived as that user's Emoji emotional-intent features.
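The emotional-intent features can be sketched with a toy sentiment lexicon; the invention derives the real labels from the Unicode definitions with tools such as LIWC, for which the hard-coded dictionary below merely stands in.

```python
# Toy lexicon standing in for LIWC-derived labels (an assumption of this sketch).
SENTIMENT = {
    "\U0001F602": "positive",  # face with tears of joy
    "\U0001F60D": "positive",  # smiling face with heart-eyes
    "\U0001F622": "negative",  # crying face
    "\U0001F621": "negative",  # pouting face
}

def emotion_features(emoji_used):
    """Fractions of a user's emoji carrying positive or negative sentiment."""
    n = len(emoji_used) or 1
    pos = sum(SENTIMENT.get(e) == "positive" for e in emoji_used)
    neg = sum(SENTIMENT.get(e) == "negative" for e in emoji_used)
    return {"positive_ratio": pos / n, "negative_ratio": neg / n}

print(emotion_features(["\U0001F602", "\U0001F622", "\U0001F602", "\U0001F916"]))
```

Emoji absent from the lexicon (the robot face above) simply count as having no sentiment tendency.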
Second, a user portrait depicting model can be trained on the above Emoji usage features. The specific steps of the training process are as follows:
1. Data set division
Training the model requires a batch of users whose attribute information (such as age and gender) is known. Their Emoji usage features are extracted, and the users are divided into a training set and a test set in some ratio (such as 5:1). The attribute information and Emoji usage features of the training-set users are used to train the model. Then the Emoji usage features of the test-set users are fed into the model, which outputs a predicted attribute for every user in the test set; comparing these predictions with the users' real attributes and computing a specific evaluation index measures the model's performance.
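A minimal sketch of the division described above, using only the standard library; the fixed seed and helper name are choices of this sketch.

```python
import random

def split_users(users, test_fraction=1 / 6, seed=0):
    """Randomly divide users into a training set and a test set (about 5:1)."""
    rng = random.Random(seed)
    shuffled = list(users)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_users(range(12))
print(len(train), len(test))  # 10 training users, 2 test users
```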
2. Evaluation index selection
Before model training, a suitable evaluation index must be determined. If a classifier algorithm is used to predict categorical attributes such as gender, an index such as accuracy can be chosen. Accuracy is the proportion of samples predicted correctly out of all samples. Of course, different application scenarios may call for different evaluation indexes. For example, for an advertiser targeting the male market, the target users are male, and what the advertiser cares about is how well the algorithm can identify male users; the evaluation index of the model is then the proportion of male users predicted correctly. If a regression algorithm is used to predict numerical attributes such as age, mean squared error (MSE) can be chosen as the evaluation index. Mean squared error is the average squared difference between the true and predicted values of the attribute; the larger it is, the further the model's predictions are from the true values.
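The two evaluation indexes named above can be written out directly; this is a sketch, and libraries such as scikit-learn provide equivalent functions.

```python
def accuracy(y_true, y_pred):
    """Proportion of samples predicted correctly (for classifier algorithms)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared gap between true and predicted values (for regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(accuracy(["M", "F", "M"], ["M", "F", "F"]))  # 2 of 3 predictions correct
print(mean_squared_error([20, 30], [22, 27]))      # (2**2 + 3**2) / 2 = 6.5
```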
3. Algorithm selection
Depicting a user portrait is a multi-dimensional problem; much information, such as the user's gender and age, can be depicted. To depict a categorical attribute such as gender, a classifier algorithm can be chosen; for gender, the algorithm outputs one of the two classes, male or female. To depict a numerical attribute such as age, a regression algorithm can be chosen; for age, the algorithm outputs a numerical value. After the evaluation index is determined, parameter selection is carried out for every algorithm chosen, to find the group of parameters that makes the algorithm perform best on the evaluation index. During parameter selection, the training set is randomly divided into k equal parts and parameters are selected by k-fold cross-validation. Specifically, for each group of parameters, the model is trained each time on k-1 of the parts and tested on the remaining part, so that it is trained and tested k times. For each algorithm, the k results of every group of parameters are combined and the group with the best average performance is selected. This yields the best parameter group of each algorithm and thus the best model based on that algorithm. These best models are then applied to the test set: the Emoji usage features of the test-set users are fed into each model, the model outputs predictions, the predictions and the users' true portrait information are used to compute the evaluation index, and the model that performs best on the evaluation index is selected as the final user portrait model.
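The k-fold parameter-selection loop described above might look like the following sketch, where `train_fn` and `score_fn` are placeholders (assumptions of this sketch) for whichever classifier or regressor and evaluation index are chosen; the score is treated as higher-is-better.

```python
import random

def kfold_select(train_data, param_grid, train_fn, score_fn, k=5, seed=0):
    """Return the parameter group with the best mean score over k folds."""
    rng = random.Random(seed)
    data = list(train_data)
    rng.shuffle(data)
    folds = [data[i::k] for i in range(k)]           # k roughly equal parts
    best_params, best_score = None, float("-inf")
    for params in param_grid:
        fold_scores = []
        for i in range(k):                           # train and test k times
            held_out = folds[i]
            rest = [x for j, f in enumerate(folds) if j != i for x in f]
            model = train_fn(params, rest)
            fold_scores.append(score_fn(model, held_out))
        mean_score = sum(fold_scores) / k
        if mean_score > best_score:                  # best average performance
            best_params, best_score = params, mean_score
    return best_params
```

For a regression task, `score_fn` could return the negated mean squared error so that the same higher-is-better comparison applies.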
With the user portrait model obtained from the above training, the Emoji usage features of a user to be profiled are fed into the model, and its output is the model's prediction of that user's portrait.
The above embodiments merely illustrate, and do not limit, the technical solutions of the present invention. Those of ordinary skill in the art may modify or equivalently replace the technical solutions of the present invention without departing from its spirit and scope; the protection scope of the present invention shall be defined by the claims.
Claims (10)
- 1. A user portrait depicting method based on Emoji usage, whose steps include: obtaining the portrait information and input text data of a batch of users; extracting the raw Emoji data from the text data with a regular expression; deriving each user's Emoji usage features from the raw Emoji data; dividing the users into a training set and a test set; training models with the Emoji usage features of the training-set users as independent variables and their portrait information as the dependent variable; and applying the trained models to the test set and selecting the model with the best evaluation score as the final user portrait depicting model.
- 2. The method according to claim 1, characterized in that the regular expression is constructed from the Emoji codes officially defined by Unicode.
- 3. The method according to claim 1, characterized in that the Emoji usage features include Emoji frequency features, Emoji preference features, and Emoji emotional-intent features.
- 4. The method according to claim 3, characterized in that the Emoji frequency features include the proportion of texts that use Emoji, the number of Emoji used in a sentence, and the pattern in which multiple Emoji are used within one sentence.
- 5. The method according to claim 3, characterized in that the Emoji preference features include the color choice of Emoji, the choice of which Emoji to use, and the choice of which Emoji to use consecutively.
- 6. The method according to claim 3, characterized in that the Emoji emotional-intent features include the usage of Emoji with positive and negative sentiment orientations.
- 7. The method according to claim 1, characterized in that the algorithms used for training include classifier algorithms for predicting categorical attributes and regression algorithms for predicting numerical attributes, wherein the categorical attributes include gender and the numerical attributes include age.
- 8. The method according to claim 7, characterized in that the evaluation indexes include: indexes suited to classifier algorithms, including accuracy; and indexes suited to regression algorithms, including mean squared error.
- 9. The method according to claim 7 or 8, characterized in that, after the evaluation index is determined, parameter selection is performed for each algorithm used, to find the group of parameters that makes the algorithm perform best on the evaluation index; parameter selection randomly divides the training set into k equal parts and selects parameters by k-fold cross-validation, whose steps include: for each group of parameters, training the model each time on k-1 of the parts; testing the model on the remaining part, so that the model is trained and tested k times; and combining the k results of each group of parameters and selecting the group with the best average performance.
- 10. A user portrait depicting system based on Emoji usage, including a memory and a processor, wherein the memory stores a computer program configured to be executed by the processor, and the program includes instructions for each step of the method of any of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711261393.1A CN108256548A (en) | 2017-12-04 | 2017-12-04 | A kind of user's portrait depicting method and system based on Emoji service conditions |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711261393.1A CN108256548A (en) | 2017-12-04 | 2017-12-04 | A kind of user's portrait depicting method and system based on Emoji service conditions |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108256548A true CN108256548A (en) | 2018-07-06 |
Family
ID=62722364
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711261393.1A Pending CN108256548A (en) | 2017-12-04 | 2017-12-04 | A kind of user's portrait depicting method and system based on Emoji service conditions |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108256548A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110929285A (en) * | 2019-12-10 | 2020-03-27 | Alipay (Hangzhou) Information Technology Co., Ltd. | Method and device for processing private data |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101419527A (en) * | 2007-10-19 | 2009-04-29 | Ricoh Co., Ltd. | Information processing, outputting and forming device, and user property judgement method |
CN105160016A (en) * | 2015-09-25 | 2015-12-16 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and device for acquiring user attributes |
CN105701074A (en) * | 2016-01-04 | 2016-06-22 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Character processing method and apparatus |
WO2016113967A1 (en) * | 2015-01-14 | 2016-07-21 | Sony Corporation | Information processing system, and control method |
CN106708983A (en) * | 2016-12-09 | 2017-05-24 | Zhujian Intelligent Technology (Shanghai) Co., Ltd. | Dialogue interactive information-based user portrait construction system and method |
CN107423442A (en) * | 2017-08-07 | 2017-12-01 | Flamingo Network (Guangzhou) Co., Ltd. | Application recommendation method and system based on user portrait behavior analysis, storage medium and computer equipment |
- 2017-12-04: CN application CN201711261393.1A filed; published as CN108256548A (en); status Pending
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180706 |