CN110197389A - User identification method and device - Google Patents

Publication number
CN110197389A
Authority
CN
China
Prior art keywords: user, information, vector, social, propagation
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910161169.8A
Other languages
Chinese (zh)
Inventor
邝展豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910161169.8A
Publication of CN110197389A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities

Abstract

The present invention relates to a user identification method and device. The method comprises: obtaining social behavior information of a user, the social behavior information including user corpus information, user social relationship information, and user operation information; obtaining a target text vector corresponding to the current field; determining a text feature of the user according to the user corpus information and the target text vector; inputting the user social relationship information into a preset propagation model to obtain a group propagation feature of the user; inputting the user operation information into a preset prediction model to obtain a behavior feature of the user; and fusing the text feature, the group propagation feature, and the behavior feature to obtain a user identification result. In different fields, the invention can combine multi-dimensional user features derived from the user's social behavior information to identify, before the user carries out a specific activity, whether the user is involved in operations related to that field.

Description

User identification method and device
Technical field
The present invention relates to the technical field of deep learning, and in particular to a user identification method and device.
Background
As e-commerce scalpers become increasingly rampant, e-commerce platforms and brand owners suffer growing losses. Existing anti-scalper solutions in e-commerce basically identify scalpers by order aggregation: the platform detects whether a large number of orders for the same item converge on the same logistics area. Identifying scalpers by order aggregation has the following drawbacks. First, it lags behind the event: the platform can only aggregate the problematic orders after the scalper runners have placed them successfully, and cannot identify a scalper attack in advance, at the moment the order is placed, so the best opportunity to prevent losses is missed. Second, the judgment dimension is single: a typical e-commerce platform lacks user profile features, so it can only identify scalper orders by geographic area and cannot characterize scalper runners from the user's perspective. Relying only on the aggregation of orders for the same item to the same logistics region easily misjudges normal orders as scalper orders, giving low accuracy and a high false-positive rate.
Summary of the invention
The technical problem to be solved by the present invention is to provide a user identification method and device that can, in different fields, combine multi-dimensional user features derived from the user's social behavior information to identify, before the user carries out a specific activity, whether the user is involved in operations related to that field.
To solve the above technical problem, in one aspect the present invention provides a user identification method, the method comprising:
obtaining social behavior information of a user, wherein the social behavior information includes user corpus information, user social relationship information, and user operation information;
obtaining a target text vector corresponding to the current field;
determining a text feature of the user according to the user corpus information and the target text vector;
inputting the user social relationship information into a preset propagation model to obtain a group propagation feature of the user;
inputting the user operation information into a preset prediction model to obtain a behavior feature of the user;
fusing the text feature, the group propagation feature, and the behavior feature to obtain a user identification result.
In another aspect, the present invention provides a user identification device, the device comprising:
a user information acquisition module, configured to obtain social behavior information of a user, wherein the social behavior information includes user corpus information, user social relationship information, and user operation information;
a target vector acquisition module, configured to obtain a target text vector corresponding to the current field;
a text feature determination module, configured to determine a text feature of the user according to the user corpus information and the target text vector;
a group propagation feature determination module, configured to input the user social relationship information into a preset propagation model to obtain a group propagation feature of the user;
a behavior feature determination module, configured to input the user operation information into a preset prediction model to obtain a behavior feature of the user;
a feature fusion module, configured to fuse the text feature, the group propagation feature, and the behavior feature to obtain a user identification result.
Implementing the embodiments of the present invention has the following beneficial effects:
The present invention derives user features from the user's social behavior information, which includes user corpus information, user social relationship information, and user operation information. For the current application field, a target text vector corresponding to that field is obtained, and the text feature of the user is determined according to the user corpus information and the target text vector. The user social relationship information is input into a preset propagation model to obtain the group propagation feature of the user. The user operation information is input into a preset prediction model to obtain the behavior feature of the user. The text feature, the group propagation feature, and the behavior feature are fused to obtain the user identification result. For different fields, the invention can use the user's social behavior information to identify, before the user carries out a concrete operation, whether the user is involved in operations related to that field, so that the relevant personnel can take countermeasures according to the identification result. It solves the prior-art problem of a single judgment dimension: multi-dimensional features based on the user's social behavior information are used to build a user profile, so identification accuracy is high.
Brief description of the drawings
Fig. 1 is a schematic diagram of an application scenario provided by an embodiment of the present invention;
Fig. 2 is a flow chart of a user identification method provided by an embodiment of the present invention;
Fig. 3 is a flow chart of a method for generating a target text vector provided by an embodiment of the present invention;
Fig. 4 is a flow chart of a method for calculating the text feature of a user provided by an embodiment of the present invention;
Fig. 5 is a flow chart of a method for obtaining a group propagation feature provided by an embodiment of the present invention;
Fig. 6 is a flow chart of a method for obtaining user behavior features provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of a multi-modal information fusion neural network model provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of the LSTM network model provided by an embodiment of the present invention;
Fig. 9 is a schematic diagram of an LSTM-based text classification model provided by an embodiment of the present invention;
Fig. 10 is a schematic diagram of a user identification device provided by an embodiment of the present invention;
Fig. 11 is a schematic diagram of the text feature determination module provided by an embodiment of the present invention;
Fig. 12 is a schematic diagram of the target vector generation module provided by an embodiment of the present invention;
Fig. 13 is a schematic diagram of the group propagation feature determination module provided by an embodiment of the present invention;
Fig. 14 is a schematic diagram of the behavior feature determination module provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In the description of the present invention, it should be understood that the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more of that feature. Moreover, the terms "first", "second", and the like are used to distinguish similar objects, not to describe a particular order or sequence. Data used in this way are interchangeable where appropriate, so that the embodiments of the present invention described here can be implemented in orders other than those illustrated or described.
The terms used in the embodiments of the present invention are first explained as follows:
Scalper (literally "ox"): an illegal intermediary, specifically one who monopolizes and resells restricted participation rights or goods outside legitimate sales channels for profit.
Scalper head (literally "ox head"): the organizer who initiates crowdsourced stockpiling by scalpers.
Scalper runner (literally "beef cattle"): the scalper attacker who actually places the purchase orders.
Order report (literally "declaration form"): the cash settlement report between a scalper head and a scalper runner, in effect a report of cash paid in and out.
RNN: Recurrent Neural Network, an artificial neural network whose node connections form directed cycles. Its internal state can exhibit dynamic temporal behavior, and it can use its internal memory to process input sequences of arbitrary length.
LSTM: Long Short-Term Memory network, a kind of recurrent neural network suited to processing and predicting important events separated by relatively long intervals and delays in a time series.
Attention: also called the attention mechanism, a technique that lets a model focus on the important information and learn it thoroughly.
Referring to Fig. 1, which shows a schematic diagram of an application scenario provided by an embodiment of the present invention, the scenario includes several user terminals 110 and a server 120. The user terminals 110 include, but are not limited to, smartphones, tablet computers, laptops, and desktop computers. A user can log in to a related application (APP) or website through a user terminal 110 to carry out network activities. When the user sends a network service request to the server 120 through the user terminal 110, the server 120 responds to the request and, at the same time, obtains the user's historical social behavior information according to the user's login account, analyzes that information, and obtains a behavior identification result for the user. When the behavior identification result satisfies the condition of the network service request, the server 120 delivers the corresponding service information to the user's terminal 110 so that the user can complete the network activity. When the behavior identification result does not satisfy the service request condition, the server 120 refuses to provide the network service to the user's terminal 110, so that the user cannot carry out the network activity.
Referring to Fig. 2, which shows a user identification method applicable on the server side, the method comprises:
S210. Obtain the social behavior information of a user, wherein the social behavior information includes user corpus information, user social relationship information, and user operation information.
In embodiments of the present invention, obtaining the social behavior information of the user is triggered by the user's network request. Specifically, when the user sends a service request to the server, the server obtains the current user's social behavior information according to the current user's account information.
The social behavior information here includes user corpus information, user social relationship information, and user operation information. User corpus information may include textual expressions related to the user, such as the names of the social groups the user has joined, the text of the user's chat content, and the articles the user has published. User social relationship information may include the types of social groups the user has joined, the user's activity level and member rank within those groups, and the user's social friend relationships. User operation information may include the links the user clicks, the page information the user browses, and the number of times the user clicks to browse related content. The social behavior information obtained above serves as the basis for identifying the user.
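The three kinds of social behavior information listed above can be pictured as one record per user. The following is a minimal sketch only; the class name and every field name are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SocialBehaviorInfo:
    """Illustrative container for a user's social behavior information.

    All field names are assumptions made for this sketch.
    """
    # User corpus information: group names, chat text, published articles.
    corpus: List[str] = field(default_factory=list)
    # User social relationship information: joined groups and friends.
    social_groups: List[str] = field(default_factory=list)
    friends: List[str] = field(default_factory=list)
    # User operation information: text of pages reached through clicked links.
    clicked_pages: List[str] = field(default_factory=list)


info = SocialBehaviorInfo(
    corpus=["limited sneaker group", "who can place ten orders today"],
    social_groups=["group-001"],
    friends=["account-42"],
    clicked_pages=["order report form"],
)
```

Keeping the three information types as separate fields mirrors the way the later steps consume them separately.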
S220. Obtain a target text vector corresponding to the current field.
For different fields, embodiments of the present invention can use prior data of each field as a basis. For each field, corpus information related to that field can be collected and a corresponding text vector generated for later reference. The method for generating the target text vector is shown in Fig. 3 and comprises:
S310. Obtain source corpus information corresponding to the current field.
Once the application field currently involved has been determined, the source corpus information corresponding to that field is obtained. The source corpus information of a field consists of representative characters, words, or sentences that specifically characterize the field, accumulated by the relevant personnel from earlier experience.
S320. Segment the source corpus information and generate a word vector for each word in the source corpus information.
Because the obtained source corpus information is a continuous passage of text, it must first be segmented. Any existing method for segmenting a corpus can be applied in this embodiment, such as Trie-tree storage with the longest-match principle, HMM-based segmentation, or probabilistic segmentation models.
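The longest-match principle mentioned above can be illustrated with a forward maximum matching sketch. This is an assumption-laden simplification: a Python set stands in for the Trie tree, and the vocabulary and the function name `forward_max_match` are hypothetical.

```python
def forward_max_match(text, vocabulary, max_word_len=4):
    """Greedy longest-match segmentation.

    At each position, take the longest dictionary word that starts there;
    if no dictionary word matches, emit a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: a single character
        longest = min(max_word_len, len(text) - i)
        for length in range(longest, 1, -1):
            candidate = text[i:i + length]
            if candidate in vocabulary:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical vocabulary; a real system would load a segmentation dictionary.
vocab = {"user", "username", "name", "id"}
tokens = forward_max_match("usernameid", vocab, max_word_len=8)
unknown = forward_max_match("abid", vocab)
```

Here `tokens` prefers "username" over the shorter "user", which is the point of the longest-match rule, and `unknown` shows the single-character fallback for text outside the vocabulary.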
After the source corpus has been segmented, each resulting word is represented as a vector in embedding form, that is, as a word vector. This can be understood as mapping, or embedding, a word from text space into a numerical vector space by some method. The word vectors in this embodiment can be produced with word2vec.
S330. Superimpose the word vectors of all words in the source corpus information to obtain the target text vector.
The word vectors of the words obtained in step S320 are superimposed (summed element-wise) to obtain the target text vector $\mathrm{embedding}_{\text{target}}$ corresponding to the field.
Because the source corpus information of each field is accumulated from experience, it needs to be updated over time according to the latest situation, so that the current corpus information characterizes the field comprehensively and accurately.
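Steps S310 to S330 can be sketched as follows, assuming the superposition is an element-wise sum. The toy three-dimensional embeddings stand in for word vectors that would normally come from a trained word2vec model; the words, values, and the name `build_target_vector` are hypothetical.

```python
import numpy as np

# Hypothetical word vectors; in practice these would come from word2vec.
toy_embeddings = {
    "bulk": np.array([1.0, 0.0, 0.5]),
    "resale": np.array([0.0, 1.0, 0.5]),
    "order": np.array([0.5, 0.5, 0.0]),
}


def build_target_vector(source_tokens, embeddings):
    """Sum the word vectors of all known tokens in the source corpus."""
    vectors = [embeddings[t] for t in source_tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token of the source corpus has a word vector")
    return np.sum(vectors, axis=0)


# Tokens without a word vector ("unknown") are skipped in this sketch.
target_vector = build_target_vector(
    ["bulk", "resale", "order", "unknown"], toy_embeddings
)
```

When the source corpus is updated, the target vector only needs to be recomputed from the new token list.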
S230. Determine the text feature of the user according to the user corpus information and the target text vector.
Referring to Fig. 4, which shows a method for calculating the text feature of a user, the method comprises:
S410. Segment the user corpus information and generate a word vector for each word in the user corpus information.
The obtained user corpus information is segmented and the corresponding word vectors are generated; for the specific process, see step S320.
S420. Calculate the similarity between the word vector of each word in the user corpus information and the target text vector.
In principle, any method that can compute the similarity of two vectors can be applied to the similarity calculation in this embodiment, for example:
1. To reduce the number of parameters to train, the cosine formula can be used directly to compute the similarity between the target text vector and the word vector of each word in the user corpus;
2. A simple neural network that takes a and b as input and outputs the similarity c;
3. Obtaining the similarity through a matrix transformation.
This embodiment may use the cosine formula to calculate the similarity; the specific formula is as follows:
$$\alpha_i = \frac{w_i \cdot \mathrm{embedding}_{\text{target}}}{\lVert w_i \rVert \, \lVert \mathrm{embedding}_{\text{target}} \rVert}$$
With the above formula, the similarity $\alpha_i$ between the word vector $w_i$ of each word in the user corpus and the target text vector $\mathrm{embedding}_{\text{target}}$ is calculated.
S430. Use each similarity as the weight of the corresponding word vector and calculate the text feature of the user.
The similarity between each word vector and the target text vector computed in step S420 is used as the weight of that word vector, and all word vectors in the user corpus are weighted as follows:
$$\mathrm{embedding}_{\text{user}} = \sum_{i=1}^{n} \alpha_i \, w_i$$
where n is the number of word vectors obtained from the user corpus, and $\mathrm{embedding}_{\text{user}}$ is the user text feature finally obtained from the user corpus information.
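Steps S420 and S430 can be sketched in a few lines, assuming the weighting is a plain weighted sum of the word vectors with cosine similarities as weights. The function names and the two-dimensional example vectors are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def user_text_feature(word_vectors, target_vector):
    """Weight each user word vector by its similarity to the target vector and sum."""
    word_vectors = np.asarray(word_vectors, dtype=float)
    weights = np.array([cosine_similarity(w, target_vector) for w in word_vectors])
    return weights @ word_vectors, weights


target = np.array([1.0, 0.0])
user_words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feature, weights = user_text_feature(user_words, target)
```

In this example the second word vector is orthogonal to the target and contributes nothing, so words unrelated to the field are suppressed in the resulting text feature.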
S240. Input the user social relationship information into a preset propagation model to obtain the group propagation feature of the user.
Referring to Fig. 5, which shows a method for obtaining a group propagation feature, the method comprises:
S510. Obtain user information that carries a label and user information that does not carry the label.
S520. Using a label propagation algorithm, train the propagation model on the labeled user information and the unlabeled user information.
S530. Input the user social relationship into the propagation model, which generates the vector corresponding to the user social relationship.
The similarity between the current user and the target group is judged through label propagation. Because only a small number of users carry labels, these labeled users are used to spread out to more users who are potentially related to the target group, and a semi-supervised learning method is used here to let the labels propagate.
Semi-supervised learning, as the name suggests, has only a small amount of labeled data and tries to learn useful information from that small labeled set together with a large amount of unlabeled data. It rests on three assumptions: 1) the smoothness assumption: similar data have the same label; 2) the cluster assumption: data in the same cluster have the same label; 3) the manifold assumption: data on the same manifold structure have the same label.
The core idea of the label propagation (LP) algorithm is simple. The algorithm is graph-based, so a graph must first be built over all the data: each node of the graph is a data point, covering both labeled and unlabeled data, and the edge between node i and node j represents their similarity. Labels are propagated along the edges between nodes; the larger the weight of an edge, the more similar the two nodes and the more easily a label passes across. The class of a node is taken to be the class with the highest probability. In brief, the steps are: 1) perform propagation; 2) reset the labels of the labeled samples; 3) repeat steps 1) and 2) until the label matrix F converges. As the labeled data keep spreading their labels, the final class boundaries avoid high-density regions and settle in the low-density gaps, which amounts to the labeled samples of each class carving out their own sphere of influence.
The user social relationship includes the social groups the user participates in, the user's friend relationships, and the user's level of social activity. From the user's basic information, it is possible to crawl whether the user participates in a target group and how active the user is in it; activity in a group chat can be judged from the member rank in the group. A user relationship graph is built from the social relationships, and the label propagation algorithm then yields the vector representation of the current user in the propagation model.
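The three label propagation steps described above (propagate, reset the labeled samples, repeat until F converges) can be sketched as follows. This is a minimal illustration using a row-normalized similarity matrix; the four-user chain graph, the label coding (1 for scalper, 0 for normal, -1 for unlabeled), and the function name are assumptions for the example, not details from the patent.

```python
import numpy as np


def label_propagation(weights, labels, num_classes, max_iter=1000, tol=1e-6):
    """Propagate labels over a similarity graph.

    weights: (n, n) symmetric edge-weight matrix; every node needs an edge.
    labels:  length-n integer array, class index or -1 for unlabeled nodes.
    Returns F, an (n, num_classes) matrix of class probabilities per node.
    """
    transition = weights / weights.sum(axis=1, keepdims=True)
    labeled = labels >= 0
    F = np.full((weights.shape[0], num_classes), 1.0 / num_classes)
    F[labeled] = np.eye(num_classes)[labels[labeled]]
    clamp = F[labeled].copy()
    for _ in range(max_iter):
        F_new = transition @ F      # 1) propagate along the edges
        F_new[labeled] = clamp      # 2) reset the labeled samples
        converged = np.abs(F_new - F).max() < tol
        F = F_new
        if converged:               # 3) stop once F no longer changes
            break
    return F


# Chain of four users: 0 - 1 - 2 - 3, with unit edge weights.
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
labels = np.array([1, -1, -1, 0])   # user 0 is labeled scalper, user 3 normal
F = label_propagation(W, labels, num_classes=2)
predicted = F.argmax(axis=1)
```

Each row of `F` can serve as the user's vector representation in the propagation model; the unlabeled user adjacent to the labeled scalper ends up with the higher scalper probability.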
S250. Input the user operation information into a preset prediction model to obtain the behavior feature of the user.
Referring to Fig. 6, which shows a method for obtaining user behavior features, the method comprises:
S610. Within a preset period, when a click-and-jump operation by the user is detected, obtain the page information of the page reached after the jump.
S620. Input the page information into the prediction model, which outputs a prediction for the page information.
The prediction result here is a value between 0 and 1, namely the probability that the input page information is not the target page information. A threshold can be set to determine the final prediction. In this embodiment, for example, the threshold is set to 0.5: when the predicted probability is less than 0.5, the input page information is judged to be target information; when the predicted probability is greater than or equal to 0.5, the input page information is judged not to be target information.
S630. Record the number of times, within the preset period, that the page information reached after the user's click-and-jump operations is predicted to be target information.
The probability of being predicted as a target page and the number of such predictions within the preset period are combined to determine the behavior feature information of the user.
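The thresholding and counting in steps S620 and S630 can be sketched as follows. The patent does not say how the probability and the count are combined, so returning the pair (count, mean target probability) is an assumption of this sketch, as is the function name.

```python
def behavior_feature(not_target_probs, threshold=0.5):
    """Summarize one period of page predictions.

    not_target_probs: for each visited page, the predicted probability that
    the page is NOT target information (the convention used in S620).
    Returns (number of pages judged to be target, mean target probability
    of those pages). Combining them this way is an illustrative assumption.
    """
    hits = [p for p in not_target_probs if p < threshold]
    count = len(hits)
    mean_target_prob = sum(1.0 - p for p in hits) / count if count else 0.0
    return count, mean_target_prob


# Four pages visited in the period; the first and third fall below 0.5.
count, mean_prob = behavior_feature([0.1, 0.7, 0.3, 0.5])
```

A page with probability exactly 0.5 is not counted, matching the "greater than or equal to 0.5" rule in the text.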
The user behavior feature here mainly refers to whether the user exhibits certain specific operations. The page information reached by the user's click-and-jump operations is compared with specific target information by similarity prediction, and this embodiment uses the open-source tool fastText as the classifier. The fastText model takes a sequence of words as input and outputs the probability that the sequence belongs to each class. The architecture of the fastText model is very similar to the CBOW model in word2vec; the difference is that fastText predicts a label, whereas CBOW predicts the middle word. fastText also adds N-gram features. In the sentence "I love her", the bag-of-words features are "I", "love", and "her". These are the same features as in the sentence "she loves me" (in the original Chinese, both sentences consist of the same three words in a different order). If 2-grams are added, the first sentence also has the features "I-love" and "love-her", so the two sentences can be told apart. Because fastText uses vectors to represent word N-grams and thereby takes local word order into account, it is better suited to the application scenario of this embodiment.
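The effect of word N-grams described above can be shown without any library. The sketch below uses placeholder tokens "A", "loves", "B" so that the two sentences share exactly the same words in a different order; the helper names are hypothetical, and the code only illustrates the feature sets, not fastText's internal hashing.

```python
def bag_of_words(tokens):
    """Order-insensitive feature set."""
    return set(tokens)


def word_ngrams(tokens, n=2):
    """Contiguous word n-grams, joined with '-'."""
    return ["-".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence_a = ["A", "loves", "B"]
sentence_b = ["B", "loves", "A"]

same_bag = bag_of_words(sentence_a) == bag_of_words(sentence_b)
bigrams_a = word_ngrams(sentence_a)
bigrams_b = word_ngrams(sentence_b)
```

With bag-of-words features alone the two sentences are indistinguishable, while their 2-gram features differ, which is why adding N-grams lets a classifier take local word order into account.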
S260. Fuse the text feature, the group propagation feature, and the behavior feature to obtain the user identification result.
Once the text feature, group propagation feature, and behavior feature of the user have been obtained, these features need to be fused to obtain the final identification result.
This embodiment provides a neural network model that can fuse multi-modal information, shown in detail in Fig. 7. The user profile features in the figure include the user behavior feature of this embodiment and, in addition, statistical features of the user such as gender, age, region, login frequency, and volume-inflating (fake traffic) behavior of the user's device.
If the above feature information were merely superimposed linearly to obtain the identification result, much of the original information would be lost. In Fig. 7, a deep neural network extracts the features of each modality of the user and fuses them into a single feature vector, the non-linearity of the neural network is used to extract higher-order information across the modal features, and the identification result is finally produced by the output layer.
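The fusion idea described above (one branch per modality, concatenation into a single feature vector, non-linear layers, then an output layer) can be sketched with a small untrained network. The layer sizes, random weights, input values, and function names below are all assumptions for illustration; the actual architecture of Fig. 7 is not specified in the text.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(dims, branch_size=4, hidden_size=8, seed=0):
    """Random, untrained weights: one projection per modality plus shared layers."""
    rng = np.random.default_rng(seed)
    params = {name: rng.normal(0.0, 0.5, size=(branch_size, d)) for name, d in dims.items()}
    params["hidden"] = rng.normal(0.0, 0.5, size=(hidden_size, branch_size * len(dims)))
    params["out"] = rng.normal(0.0, 0.5, size=hidden_size)
    return params


def fraud_score(features, params):
    """Project each modality, concatenate, apply a non-linear layer, output a score in (0, 1)."""
    branches = [relu(params[name] @ vec) for name, vec in features.items()]
    fused = np.concatenate(branches)            # single fused feature vector
    hidden = relu(params["hidden"] @ fused)     # non-linear interaction across modalities
    return float(sigmoid(params["out"] @ hidden))


# Hypothetical feature values for the three modalities.
features = {
    "text": np.array([1.7, 0.7, 0.2]),
    "group": np.array([0.67, 0.33]),
    "behavior": np.array([2.0, 0.8]),
}
params = init_params({name: vec.shape[0] for name, vec in features.items()})
score = fraud_score(features, params)
```

Because the weights are random, the score itself is meaningless here; a real model would be trained on labeled users before its output could be read as a degree of suspicion.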
The present invention can be applied to the field of anti-scalping in e-commerce. The input of the model in Fig. 7 is the multi-dimensional features of the user, and the output of the model is a scalper fraud score. The scalper fraud score indicates how strongly the current user is suspected of being a scalper: the higher the score, the stronger the suspicion.
In an e-commerce anti-scalping service, the scalper fraud score service is offered to callers in SaaS form. The caller only needs to provide the relevant user information, and the SaaS service returns the corresponding scalper fraud score, which assesses the degree to which the user is a scalper.
First, corpus information related to the scalper field is collected to obtain the target text vector corresponding to the scalper field, which is used to calculate similarity with the user information that is obtained.
From the user's basic information, it is possible to crawl whether the user participates in scalper groups. Scalper groups can be identified by crawling and analyzing group chat content and group names. Other text information of the user can likewise be obtained by crawling. The text feature of the user is determined from the obtained user corpus information and the corpus information of the scalper field.
Users who carry a scalper label are used to spread out to more potential scalper users. Specifically, the label is diffused through the social relationship information, which finally reveals which users the scalper label has been propagated to. Scalpers generally cluster in social groups, and the closeness between a user and scalper groups is judged through device diffusion and label propagation.
User operation information here mainly refers to whether the user actually takes part in specific stages of scalper activity, such as performing a scalper order report operation. When the user clicks a related link, the page information after the jump is obtained and input into a trained order report classifier, which gives the probability that the input page information is not scalper order report information. The number of times the user is judged to have performed a scalper order report operation is used as a supplement to determine whether the user performs such operations. For example, the more times a user is judged to have performed scalper order report operations within the preset period, the more likely the user is a scalper.
The present invention judges whether a user is a scalper runner along multiple social network dimensions: whether the user participates in scalper communities in the social network and how actively, whether scalper order report behavior occurs in the social network and how frequently, whether the user uses order-brushing software, whether the user is suspected of account farming, and so on.
The user features calculated above, together with the other basic features obtained for the user, are input into the model shown in Fig. 7, which finally yields the scalper fraud score of the current user.
In the above embodiment, the user corpus information is processed by calculating the similarity between each user word vector and the target text vector, using the similarity as the weight of the corresponding word vector, and computing the weighted combination of the word vectors, which is input into the identification model as the text feature of the user. There is another way to process the user corpus information: performing text classification on the user's corpus information with an LSTM model.
In natural language processing, an RNN (recurrent neural network) model is generally used to analyze and classify text. However, chat content is generally long, and an RNN struggles to compress the gist of an entire chat. A long short-term memory (LSTM) model, an improvement on the RNN, is therefore trained instead. LSTM uses gates to control the influence of earlier outputs on later ones, so it can link the words of a sentence well, extract the gist of a long text, and improve the accuracy of the classifier. The LSTM network model is shown in Fig. 8.
The LSTM-based text classification model is shown in Fig. 9. Text classification of the user corpus is carried out as follows:
A large amount of corpus information collected in advance is manually labeled: 0 means the dialogue is unrelated to scalping and 1 means it is related. Deep learning handles text classification using word-vector representations; a distributed word-vector representation both reduces dimensionality and carries semantic information. The most common distributed representation is word2vec, which is trained without supervision and yields dense word vectors that carry semantic information. The obtained user corpus information is segmented, and the word vector of each word is fed into the LSTM in order. The output of the LSTM is the representation of the passage and includes the sequential information of the sentence. This output is finally fed into the text classification model, which classifies the current sentence and judges whether it is related to the scalper field.
In the present invention, when a user sends a service request to the server, the server obtains the user's social behavior information, derives multi-dimensional user features from it, and finally judges through feature fusion whether the user meets the service request condition. When the user does not meet the condition, the server refuses to provide the corresponding service. Specifically, in the e-commerce anti-scalper ("anti-ox") field, when a user sends an order request to the server, the server obtains the user's social behavior information and finally obtains a scalper fraud score. When the score is greater than a threshold, the current user is judged to be a scalper and the order request is rejected, so that the scalper cannot place the order. Based on social data, the present invention captures the user's participation in scalping in the social network, so whether the user is suspected of scalper fraud can be determined before the user places an order. Scalper attacks are thus identified in advance, at the best moment to prevent loss.
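The threshold decision described above can be sketched as follows. The function name `handle_order_request`, the response fields, and the default threshold of 0.8 are hypothetical; the passage only states that the request is rejected when the fraud score is greater than some threshold.

```python
from typing import Dict, Union


def handle_order_request(
    user_id: str,
    score: float,
    threshold: float = 0.8,   # assumed value; the text only says "some threshold"
) -> Dict[str, Union[str, bool]]:
    """Reject the order request when the fraud score exceeds the threshold."""
    if score > threshold:
        return {
            "user_id": user_id,
            "accepted": False,
            "reason": "fraud score above threshold",
        }
    return {"user_id": user_id, "accepted": True, "reason": "ok"}
```

Because the score is computed from social behavior collected beforehand, this check can run at the moment the order request arrives, before any order is created.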
Second, the present invention analyzes multiple social-dimension features of the user from the user's social behavior information to build a user profile, so the identification accuracy is high.
In addition, when processing user corpus information, the word vectors of the user corpus are generally superposed directly to express the user's text feature. In a specific domain, however, it is preferable to extract the information relevant to that domain, so the present invention provides an improved attention mechanism: similarity is first computed against the target vector of the domain, and the user word vectors are then summed with those similarities as weights to obtain the user's text feature, which extracts more domain-relevant information. This differs from the attention of sequence models. Ordinary attention focuses on context information and computes the similarity of each embedding from the contextual state, so its drawback is that once the user corpus is fixed, the computed similarities are the same whether the domain is the scalper ("ox") domain or any other. With the method provided by the present invention, the similarities differ from domain to domain even for the same user corpus. In the scalper domain this amounts to adding prior knowledge of the domain, so that the computed similarity focuses more on the scalper domain.
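A minimal sketch of this similarity-weighted sum is given below. Cosine similarity is assumed as the similarity measure, since this passage does not name one, and the function names `cosine` and `text_feature` are introduced only for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_feature(
    word_vectors: Sequence[Sequence[float]],
    target_vector: Sequence[float],
) -> List[float]:
    """Sum the user's word vectors, each weighted by its similarity to the target vector."""
    feature = [0.0] * len(target_vector)
    for vec in word_vectors:
        weight = cosine(vec, target_vector)
        for i, x in enumerate(vec):
            feature[i] += weight * x
    return feature
```

With the same two word vectors `[[1.0, 0.0], [0.0, 1.0]]`, a target vector of `[1.0, 0.0]` keeps only the first word and a target of `[0.0, 1.0]` keeps only the second, which is the domain dependence the paragraph describes.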
An embodiment of the present invention also provides a user identification device. Referring to Figure 10, the device includes:
A user information acquisition module 1010, configured to obtain the social behavior information of a user, wherein the social behavior information includes: user corpus information, user social relationship information, and user operation information.
A target vector acquisition module 1020, configured to obtain a target text vector corresponding to the current domain.
A text feature determination module 1030, configured to determine the text feature of the user according to the user corpus information and the target text vector.
A group propagation feature determination module 1040, configured to input the user social relationship information into a preset propagation model to obtain the group propagation feature of the user.
A behavior feature determination module 1050, configured to input the user operation information into a preset prediction model to obtain the behavior feature of the user.
A feature fusion module 1060, configured to fuse the text feature, the group propagation feature, and the behavior feature to obtain a user identification result.
Referring to Figure 11, the text feature determination module 1030 includes:
A first word segmentation module 1110, configured to segment the user corpus information and generate the word vector of each word in the user corpus information.
A similarity calculation module 1120, configured to calculate the similarity between the word vector of each word in the user corpus information and the target text vector.
A text feature calculation module 1130, configured to calculate the text feature of the user using the similarity as the weight of the corresponding word vector.
Referring to Figure 12, the device further includes a target vector generation module, which includes:
A source corpus acquisition module 1210, configured to obtain source corpus information corresponding to the current domain.
A second word segmentation module 1220, configured to segment the source corpus information and generate the word vector of each word in the source corpus information.
A vector superposition module 1230, configured to superpose the word vectors of the words in the source corpus information to obtain the target text vector.
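The superposition performed by this module can be sketched as an element-wise sum. The sketch assumes the source corpus has already been segmented into tokens and that a lookup table of word vectors (for example from a Word2vec model) is available. The function name `target_text_vector` and the choice to skip unknown words are assumptions.

```python
from typing import Dict, List, Sequence


def target_text_vector(
    source_tokens: Sequence[str],
    embeddings: Dict[str, Sequence[float]],
) -> List[float]:
    """Element-wise sum of the word vectors of all known source-corpus tokens."""
    if not embeddings:
        raise ValueError("embeddings must not be empty")
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for token in source_tokens:
        vec = embeddings.get(token)
        if vec is None:
            continue  # assumption: words without a vector are skipped
        for i, x in enumerate(vec):
            total[i] += x
    return total
```

A different domain only needs a different source corpus; the rest of the pipeline stays the same.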
Referring to Figure 13, the group propagation feature determination module 1040 includes:
A first acquisition module 1310, configured to obtain user information that carries a label and user information that does not carry the label.
A propagation model training module 1320, configured to train the propagation model with a label propagation algorithm, according to the user information carrying the label and the user information not carrying the label.
A propagation vector calculation module 1330, configured to input the user social relationship into the propagation model and generate, through the propagation model, a vector corresponding to the user social relationship.
Wherein, the user social relationship includes the social groups the user participates in, the user's friend relationships, and the user's social activity level.
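To illustrate how labels can spread from labeled to unlabeled users, the sketch below runs a basic label propagation over a friendship graph. It is a simplified version that produces one score per user rather than the vector described above. The function name `propagate_labels`, the neutral starting value of 0.5, and the fixed iteration count are assumptions.

```python
from typing import Dict, List


def propagate_labels(
    graph: Dict[str, List[str]],
    seeds: Dict[str, float],
    iterations: int = 20,
) -> Dict[str, float]:
    """Spread seed labels (1.0 = labeled scalper, 0.0 = labeled normal) over a graph.

    `graph` maps every user to the list of that user's neighbors; every neighbor
    must also appear as a key. Labeled users keep their label at every step, and
    each unlabeled user takes the mean score of its neighbors.
    """
    scores = {node: seeds.get(node, 0.5) for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            if node in seeds:
                updated[node] = seeds[node]          # clamp labeled users
            elif neighbors:
                updated[node] = sum(scores[m] for m in neighbors) / len(neighbors)
            else:
                updated[node] = scores[node]         # isolated user: unchanged
        scores = updated
    return scores
```

On a chain `a - b - c - d` with `a` labeled 1.0 and `d` labeled 0.0, the scores of `b` and `c` approach 2/3 and 1/3, so users closer to a labeled scalper receive higher scores.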
Referring to Figure 14, the behavior feature determination module 1050 includes:
An operation detection module 1410, configured to obtain, within a predetermined period, the page information after the jump when a click-and-jump operation of the user is detected.
A prediction module 1420, configured to input the page information into the prediction model and output the prediction result for the page information.
A count recording module 1430, configured to record the number of times, within the predetermined period, that the page information reached after the user's click-and-jump is predicted to be target information.
The device provided in the above embodiment can perform the method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to performing that method. For technical details not described in detail in the above embodiment, refer to the method provided by any embodiment of the present invention.
This embodiment also provides a computer-readable storage medium storing computer-executable instructions, which are loaded by a processor to execute any of the above methods of this embodiment.
This embodiment also provides a device including a processor and a memory, wherein the memory stores a computer program adapted to be loaded by the processor to execute any of the above methods of this embodiment.
This specification provides the method operation steps as described in the embodiments or flowcharts, but more or fewer operation steps may be included on the basis of routine or non-inventive effort. The steps and order listed in the embodiments are only one of many possible execution orders and do not represent the only one. When an actual system or terminal product executes, the steps may be executed sequentially or in parallel according to the embodiments or the methods shown in the drawings (for example, in a parallel-processor or multi-threaded environment).
The structures shown in this embodiment are only the partial structures relevant to the solution of the present application and do not limit the devices to which the solution is applied. A specific device may include more or fewer components than shown, combine certain components, or have a different arrangement of components. It should be understood that the methods, devices, and the like disclosed in this embodiment may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation. Multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or unit modules.
Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Those skilled in the art will further appreciate that the example units and algorithm steps described in connection with the embodiments disclosed in this specification can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A user identification method, characterized by comprising:
obtaining social behavior information of a user, wherein the social behavior information includes: user corpus information, user social relationship information, and user operation information;
obtaining a target text vector corresponding to the current domain;
determining a text feature of the user according to the user corpus information and the target text vector;
inputting the user social relationship information into a preset propagation model to obtain a group propagation feature of the user;
inputting the user operation information into a preset prediction model to obtain a behavior feature of the user;
fusing the text feature, the group propagation feature, and the behavior feature to obtain a user identification result.
2. The user identification method according to claim 1, characterized in that determining the text feature of the user according to the user corpus information and the target text vector comprises:
segmenting the user corpus information and generating the word vector of each word in the user corpus information;
calculating the similarity between the word vector of each word in the user corpus information and the target text vector;
calculating the text feature of the user using the similarity as the weight of the corresponding word vector.
3. The user identification method according to claim 2, characterized in that the method for generating the target text vector comprises:
obtaining source corpus information corresponding to the current domain;
segmenting the source corpus information and generating the word vector of each word in the source corpus information;
superposing the word vectors of the words in the source corpus information to obtain the target text vector.
4. The user identification method according to claim 1, characterized in that inputting the user social relationship information into the preset propagation model to obtain the group propagation feature of the user comprises:
obtaining user information that carries a label and user information that does not carry the label;
training the propagation model with a label propagation algorithm, according to the user information carrying the label and the user information not carrying the label;
inputting the user social relationship into the propagation model, and generating, through the propagation model, a vector corresponding to the user social relationship;
wherein the user social relationship includes the social groups the user participates in, the user's friend relationships, and the user's social activity level.
5. The user identification method according to claim 1, characterized in that inputting the user operation information into the preset prediction model to obtain the behavior feature of the user comprises:
within a predetermined period, obtaining the page information after the jump when a click-and-jump operation of the user is detected;
inputting the page information into the prediction model and outputting the prediction result for the page information;
recording the number of times, within the predetermined period, that the page information reached after the user's click-and-jump is predicted to be target information.
6. A user identification device, characterized by comprising:
a user information acquisition module, configured to obtain social behavior information of a user, wherein the social behavior information includes: user corpus information, user social relationship information, and user operation information;
a target vector acquisition module, configured to obtain a target text vector corresponding to the current domain;
a text feature determination module, configured to determine a text feature of the user according to the user corpus information and the target text vector;
a group propagation feature determination module, configured to input the user social relationship information into a preset propagation model to obtain a group propagation feature of the user;
a behavior feature determination module, configured to input the user operation information into a preset prediction model to obtain a behavior feature of the user;
a feature fusion module, configured to fuse the text feature, the group propagation feature, and the behavior feature to obtain a user identification result.
7. The user identification device according to claim 6, characterized in that the text feature determination module comprises:
a first word segmentation module, configured to segment the user corpus information and generate the word vector of each word in the user corpus information;
a similarity calculation module, configured to calculate the similarity between the word vector of each word in the user corpus information and the target text vector;
a text feature calculation module, configured to calculate the text feature of the user using the similarity as the weight of the corresponding word vector.
8. The user identification device according to claim 7, characterized in that the device further comprises a target vector generation module, comprising:
a source corpus acquisition module, configured to obtain source corpus information corresponding to the current domain;
a second word segmentation module, configured to segment the source corpus information and generate the word vector of each word in the source corpus information;
a vector superposition module, configured to superpose the word vectors of the words in the source corpus information to obtain the target text vector.
9. The user identification device according to claim 6, characterized in that the group propagation feature determination module comprises:
a first acquisition module, configured to obtain user information that carries a label and user information that does not carry the label;
a propagation model training module, configured to train the propagation model with a label propagation algorithm, according to the user information carrying the label and the user information not carrying the label;
a propagation vector calculation module, configured to input the user social relationship into the propagation model and generate, through the propagation model, a vector corresponding to the user social relationship;
wherein the user social relationship includes the social groups the user participates in, the user's friend relationships, and the user's social activity level.
10. The user identification device according to claim 6, characterized in that the behavior feature determination module comprises:
an operation detection module, configured to obtain, within a predetermined period, the page information after the jump when a click-and-jump operation of the user is detected;
a prediction module, configured to input the page information into the prediction model and output the prediction result for the page information;
a count recording module, configured to record the number of times, within the predetermined period, that the page information reached after the user's click-and-jump is predicted to be target information.
CN201910161169.8A 2019-03-04 2019-03-04 A kind of user identification method and device Pending CN110197389A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910161169.8A CN110197389A (en) 2019-03-04 2019-03-04 A kind of user identification method and device

Publications (1)

Publication Number Publication Date
CN110197389A true CN110197389A (en) 2019-09-03

Family

ID=67751725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910161169.8A Pending CN110197389A (en) 2019-03-04 2019-03-04 A kind of user identification method and device

Country Status (1)

Country Link
CN (1) CN110197389A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136226A (en) * 2011-11-25 2013-06-05 深圳市腾讯计算机系统有限公司 Method and device capable of searching user
CN103577549A (en) * 2013-10-16 2014-02-12 复旦大学 Crowd portrayal system and method based on microblog label
CN106484764A (en) * 2016-08-30 2017-03-08 江苏名通信息科技有限公司 User's similarity calculating method based on crowd portrayal technology
CN107330709A (en) * 2016-04-29 2017-11-07 阿里巴巴集团控股有限公司 Determine the method and device of destination object
CN108932669A (en) * 2018-06-27 2018-12-04 北京工业大学 A kind of abnormal account detection method based on supervised analytic hierarchy process (AHP)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781407A (en) * 2019-10-21 2020-02-11 腾讯科技(深圳)有限公司 User label generation method and device and computer readable storage medium
CN111368552A (en) * 2020-02-26 2020-07-03 北京市公安局 Network user group division method and device for specific field
CN111737456A (en) * 2020-05-15 2020-10-02 恩亿科(北京)数据科技有限公司 Corpus information processing method and apparatus
CN113204622A (en) * 2021-05-25 2021-08-03 广州三星通信技术研究有限公司 Electronic device and information processing method thereof
CN113361198A (en) * 2021-06-09 2021-09-07 南京大学 Public and private information mining-based crowdsourcing test report fusion method
CN113361198B (en) * 2021-06-09 2023-11-03 南京大学 Crowd-sourced test report fusion method based on public and private information mining
CN114422207A (en) * 2021-12-30 2022-04-29 中国人民解放军战略支援部队信息工程大学 Multi-mode-based C & C communication flow detection method and device
CN114422207B (en) * 2021-12-30 2023-06-02 中国人民解放军战略支援部队信息工程大学 C & C communication flow detection method and device based on multiple modes
CN116127204A (en) * 2023-04-17 2023-05-16 中国科学技术大学 Multi-view user portrayal method, multi-view user portrayal system, apparatus, and medium

Similar Documents

Publication Publication Date Title
CN110197389A (en) A kind of user identification method and device
US11494648B2 (en) Method and system for detecting fake news based on multi-task learning model
US20220156175A1 (en) Mapping of test cases to test data for computer software testing
CN106030571A (en) Dynamically modifying elements of user interface based on knowledge graph
CN105574067A (en) Item recommendation device and item recommendation method
CN110287314B (en) Long text reliability assessment method and system based on unsupervised clustering
CN112231570B (en) Recommendation system support attack detection method, device, equipment and storage medium
CN107844533A (en) A kind of intelligent Answer System and analysis method
CN109213843A (en) A kind of detection method and device of rubbish text information
CN106354818B (en) Social media-based dynamic user attribute extraction method
Safara et al. An author gender detection method using whale optimization algorithm and artificial neural network
CN106537387B (en) Retrieval/storage image associated with event
CN111125360B (en) Emotion analysis method and device in game field and model training method and device thereof
CN108509793A (en) A kind of user's anomaly detection method and device based on User action log data
Edwards et al. Identifying wildlife observations on twitter
WO2023108980A1 (en) Information push method and device based on text adversarial sample
Olabenjo Applying naive bayes classification to google play apps categorization
CN110516210A (en) The calculation method and device of text similarity
CN110110218A (en) A kind of Identity Association method and terminal
Yuan et al. Perceiving more truth: A dilated-block-based convolutional network for rumor identification
Raja et al. Fake news detection on social networks using Machine learning techniques
CN107688594B (en) The identifying system and method for risk case based on social information
Shrivastava et al. A research on fake news detection using machine learning algorithm
CN110489552B (en) Microblog user suicide risk detection method and device
Liu et al. A network-based CNN model to identify the hidden information in text data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination