Summary of the invention
The embodiments of this specification provide a method, an apparatus, and an electronic device for identifying user identity. They perform semantic recognition of a user's identity and thereby improve the identity recognition rate.
In a first aspect, the embodiments of this specification provide a method for identifying identity, the method including:
Obtaining, based on a social network, n remarks that other users have made about a user to be identified, where n >= 2;
Obtaining a user vector of the user to be identified according to the n remarks;
Performing semantic recognition based on identity vectors and the user vector to identify the identity of the user to be identified.
Optionally, the method further includes:
Obtaining m remarks made about a user whose identity has been confirmed, where m >= 2;
Obtaining the identity vector of the confirmed identity according to the m remarks.
Optionally, obtaining the user vector of the user to be identified according to the n remarks includes:
Preprocessing the n remarks to remove remarks that carry no actual meaning;
Segmenting the preprocessed remarks into words and converting each word into a word vector;
Obtaining the user vector of the user to be identified based on all the word vectors.
Optionally, before the user vector of the user to be identified is obtained according to the n remarks, the method further includes:
Obtaining identity keywords from the n remarks;
Determining whether the ratio of k to n falls within a first preset range, where k is the number of remarks in which the identity keyword appears;
If the ratio of k to n falls within the first preset range, confirming that the identity of the user to be identified is the identity keyword;
If the ratio of k to n does not fall within the first preset range, obtaining the user vector of the user to be identified according to the n remarks.
Optionally, performing semantic recognition based on the identity vectors and the user vector to identify the identity of the user to be identified includes:
Obtaining target identity vectors, namely identity vectors whose similarity to the user vector exceeds a set threshold;
Obtaining the identity of the user to be identified based on the target identities corresponding to the target identity vectors.
Optionally, obtaining the identity of the user to be identified based on the target identity corresponding to the target identity vector includes:
Taking the target identity corresponding to the target identity vector as the identity of the user to be identified; or,
Determining whether the proportion of users whose confirmed identity is the target identity, within the group to which the user to be identified belongs, falls within a second preset range, and if so, determining that the identity of the user to be identified is the target identity.
Optionally, the method further includes:
After the identity of the user to be identified is obtained, obtaining a target user whom the user to be identified has annotated with a predetermined keyword, the predetermined keyword being a keyword that indicates a common identity;
Confirming that the identity of the target user is the same as that of the user to be identified.
Optionally, the method further includes:
Determining whether the ratio of the number of users whose identity is a common identity to the total number of users in the group to which the user to be identified belongs falls within a third preset range;
If so, confirming that the identity of the user to be identified is the common identity.
In a second aspect, the embodiments of this specification provide an apparatus for identifying identity, including:
An acquiring unit, configured to obtain, based on a social network, n remarks made about a user to be identified, where n >= 2;
A converting unit, configured to obtain a user vector of the user to be identified according to the n remarks;
A recognition unit, configured to perform semantic recognition based on identity vectors and the user vector to identify the identity of the user to be identified.
Optionally, the apparatus further includes:
A creating unit, configured to obtain m remarks made about a user whose identity has been confirmed, where m >= 2, and to obtain the identity vector of the confirmed identity according to the m remarks.
Optionally, the acquiring unit is configured to:
Preprocess the n remarks to remove remarks that carry no actual meaning;
Segment the preprocessed remarks into words and convert each word into a word vector;
Obtain the user vector of the user to be identified based on all the word vectors.
Optionally, the apparatus further includes:
A matching unit, configured to, before the user vector of the user to be identified is obtained according to the n remarks: obtain identity keywords from the n remarks; determine whether the ratio of k to n falls within a first preset range, where k is the number of remarks in which the identity keyword appears; if it does, confirm that the identity of the user to be identified is the identity keyword; if it does not, obtain the user vector of the user to be identified according to the n remarks.
Optionally, the recognition unit is configured to:
Obtain target identity vectors whose similarity to the user vector exceeds a set threshold;
Obtain the identity of the user to be identified based on the target identities corresponding to the target identity vectors.
Optionally, the recognition unit is configured to:
Take the target identity corresponding to the target identity vector as the identity of the user to be identified; or,
Determine whether the proportion of users whose confirmed identity is the target identity, within the group to which the user to be identified belongs, falls within a second preset range, and if so, determine that the identity of the user to be identified is the target identity.
Optionally, the apparatus further includes:
An expanding unit, configured to, after the identity of the user to be identified is obtained, obtain a target user whom the user to be identified has annotated with a predetermined keyword, the predetermined keyword being a keyword that indicates a common identity, and to confirm that the identity of the target user is the same as that of the user to be identified.
Optionally, the apparatus further includes:
An expanding unit, configured to determine whether the ratio of the number of users whose identity is a common identity to the total number of users in the group to which the user to be identified belongs falls within a third preset range, and if so, to confirm that the identity of the user to be identified is the common identity.
In a third aspect, the embodiments of this specification further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
Obtaining, based on a social network, n remarks made about a user to be identified, where n >= 2;
Obtaining a user vector of the user to be identified according to the n remarks;
Performing semantic recognition based on identity vectors and the user vector to identify the identity of the user to be identified.
In a fourth aspect, the embodiments of this specification provide an electronic device including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
Obtaining, based on a social network, n remarks made about a user to be identified, where n >= 2;
Obtaining a user vector of the user to be identified according to the n remarks;
Performing semantic recognition based on identity vectors and the user vector to identify the identity of the user to be identified.
The one or more technical solutions in the embodiments of this specification have at least the following technical effects:
The embodiments of this specification provide a method for identifying identity. Based on a social network, multiple remarks made about a user to be identified are obtained; a user vector of the user to be identified is obtained from these remarks; and semantic recognition is then performed based on the identity vector of each identity and the user vector of the user to be identified. This significantly increases the chance of recognizing a user's identity from remarks, particularly when the remarks do not contain any preset identity keyword. It thereby addresses the low recognition rate of prior-art identity recognition based on keyword matching and improves the recognition rate.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of this specification clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this specification without creative effort fall within the protection scope of this specification.
The main implementation principles of the technical solutions in the embodiments of this specification, their specific implementations, and the beneficial effects they can achieve are explained in detail below with reference to the accompanying drawings.
Referring to FIG. 1, a method for identifying identity provided by an embodiment of this specification includes:
S110: Obtaining, based on a social network, n remarks made about a user to be identified, where n >= 2;
S120: Obtaining a user vector of the user to be identified according to the n remarks;
S130: Performing semantic recognition based on identity vectors and the user vector to identify the identity of the user to be identified.
In social networks such as WeChat, Weibo, and DingTalk, users often add one another as friends and attach remarks (contact notes) to their friends to tell them apart. Many of these remarks relate to the user's identity, for example "Happy-Strong Xiao Liu", "Lianjia Xiao Li", or "Hengda Xiaomei". The embodiments of the present application perform identity recognition based on the remarks a user receives in social networks. First, S110 is performed: based on a social network, n remarks made about the user to be identified are obtained.
Specifically, multiple remarks that multiple users have made about the user to be identified can be obtained from one or more social platforms. Across different social platforms, whether two accounts belong to the same user can be determined from the users' basic information: for example, accounts registered with the same mobile phone number, e-mail address, or account may be regarded as the same user, and accounts that are associated with each other may also be regarded as the same user. The more remarks other users have made about a user, the more accurate the resulting identity recognition, so as many of the user's remarks as possible should be obtained; that is, the larger n is, the better.
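A minimal Python sketch of the cross-platform merging step, assuming each account is keyed by a hypothetical (platform, account_id) pair and carries an optional phone number and e-mail address. Accounts that share either identifier are grouped with a union-find structure, and every remark attached to any account in the target's group is collected. Explicit account links, which the text also treats as the same user, could be added as extra union calls; that is not shown here.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure used to merge accounts that refer to one person."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merge_accounts(accounts):
    """Group accounts that share a phone number or e-mail address.

    accounts: dict mapping a hypothetical (platform, account_id) key to
              {"phone": str or None, "email": str or None}.
    Returns a list of sets; each set is treated as one real user.
    """
    uf = UnionFind()
    owner_of = {}  # (field, value) -> first account seen with that identifier
    for key, info in accounts.items():
        uf.find(key)  # register the account even if it has no identifiers
        for field in ("phone", "email"):
            value = info.get(field)
            if not value:
                continue
            ident = (field, value)
            if ident in owner_of:
                uf.union(owner_of[ident], key)
            else:
                owner_of[ident] = key
    groups = defaultdict(set)
    for key in accounts:
        groups[uf.find(key)].add(key)
    return list(groups.values())


def collect_remarks(accounts, remarks, target):
    """Return all remarks written about any account of the same user as target.

    remarks: list of (remarked_account_key, remark_text) pairs.
    """
    for group in merge_accounts(accounts):
        if target in group:
            return [text for key, text in remarks if key in group]
    return []
```

Collecting remarks per merged user, rather than per account, is what lets n grow across platforms as the text recommends.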
After the n remarks of the user to be identified are obtained, S120 and S130 can be executed to obtain the identity of the user to be identified; alternatively, the identity can first be sought through identity-keyword matching. Specifically, the remarks can be matched against an identity keyword dictionary to obtain the identity keywords in the n remarks. It is then determined whether the ratio of k to n falls within a first preset range, where k is the number of remarks that contain the identity keyword. If it does, the identity of the user to be identified is confirmed to be that identity keyword; if it does not, the user vector of the user to be identified is obtained according to the n remarks.
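The keyword pre-check can be sketched as follows, assuming a hypothetical identity keyword list, plain substring matching, and a first preset range given as a closed interval [low, high]. When several keywords qualify, this sketch returns the one with the largest k/n; that tie-break is an assumption, since the text does not specify one.

```python
from typing import Iterable, List, Optional, Tuple


def match_identity_keyword(
    remarks: List[str],
    keywords: Iterable[str],
    low: float,
    high: float = 1.0,
) -> Optional[Tuple[str, float]]:
    """Return (keyword, k/n) for a keyword whose hit ratio lies in [low, high].

    k is the number of remarks containing the keyword and n the number of
    remarks. None means no keyword qualified, so the caller falls back to
    building the user vector (S120) and semantic recognition (S130).
    """
    n = len(remarks)
    if n < 2:
        raise ValueError("the method assumes n >= 2 remarks")
    best = None
    for kw in keywords:
        k = sum(1 for remark in remarks if kw in remark)
        ratio = k / n
        # keep the qualifying keyword with the highest ratio (assumed tie-break)
        if low <= ratio <= high and (best is None or ratio > best[1]):
            best = (kw, ratio)
    return best
```

For Chinese remarks the same substring test works directly; a real dictionary would likely also need synonym handling, which is not shown.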
In S120, to obtain the user vector of the user to be identified, each obtained remark can be segmented into words, and each word can be converted into a word vector, for example with the word2vec algorithm, yielding a word vector V_word for each word in the remarks. The user vector of the user to be identified is then obtained from all of that user's word vectors.
Specifically, the word vectors of all remarks about the user to be identified can be combined to obtain the user vector V_candidate_person of the user to be identified:
$$V_{\mathrm{candidate\_person}} = f(V_{\mathrm{word}})$$
For the combination method f(V_word), all word vectors of the user to be identified may be summed dimension by dimension and averaged. Alternatively, f(V_word) can first take this dimension-wise average as a reference vector, remove the k (k >= 1; unrelated to the keyword count k above) word vectors farthest from the reference vector, and then average the remaining word vectors dimension by dimension to obtain the user vector. Screening out the farthest word vectors removes words that clearly deviate from the user's identity and improves the accuracy of the resulting user vector.
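Both variants of f(V_word) can be sketched in plain Python, assuming word vectors are equal-length lists of floats. The text does not say how "farthest" is measured; Euclidean distance to the reference vector is used here as an assumption.

```python
import math
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Dimension-wise average of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def user_vector(word_vectors: List[Vector], drop: int = 0) -> Vector:
    """f(V_word): plain average, or trimmed average when drop >= 1.

    With drop = k >= 1, the plain average serves as the reference vector,
    the k word vectors farthest from it (Euclidean distance, an assumption)
    are discarded, and the remaining vectors are averaged again.
    """
    reference = mean_vector(word_vectors)
    if drop <= 0:
        return reference
    if drop >= len(word_vectors):
        raise ValueError("drop must be smaller than the number of word vectors")
    # sort by distance to the reference and cut off the `drop` farthest vectors
    kept = sorted(word_vectors, key=lambda v: euclidean(v, reference))[:-drop]
    return mean_vector(kept)
```

In this sketch an outlier such as a nickname unrelated to work pulls the plain average away from the identity, while the trimmed variant discards it.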
In practice, the obtained remarks may include entries with no actual meaning. When the user vector is obtained, the remarks can therefore be preprocessed first to remove meaningless remarks, such as empty remarks, garbled remarks, and symbol-only remarks. In addition, remarks obtained from different social networks may use different formats, so to ease subsequent recognition the preprocessed remarks can also be normalized to a common encoding. After preprocessing and/or encoding normalization, the processed remarks are segmented and converted into vectors, so that the user vector is obtained more quickly and accurately.
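A minimal preprocessing sketch using only the Python standard library. The filtering rules are assumptions standing in for "no actual meaning": an empty string, a string containing the Unicode replacement character U+FFFD (a common trace of mis-decoded text), or a string with no letter or digit. "Encoding normalization" is approximated with Unicode NFKC, which folds full-width letters and digits into their ordinary forms.

```python
import re
import unicodedata
from typing import Iterable, List


def normalize_remark(text: str) -> str:
    """NFKC-normalize, then collapse runs of whitespace and trim the ends."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def is_meaningful(text: str) -> bool:
    """Assumed filter: reject empty, garbled, or symbol-only remarks."""
    if not text:
        return False
    if "\ufffd" in text:  # replacement char suggests a decoding failure
        return False
    # CJK characters count as alphanumeric in Python, so Chinese remarks pass
    return any(ch.isalnum() for ch in text)


def preprocess(remarks: Iterable[str]) -> List[str]:
    cleaned = (normalize_remark(r) for r in remarks)
    return [r for r in cleaned if is_meaningful(r)]
```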
After the user vector is obtained, S130 is executed: semantic recognition is performed based on the identity vectors and the user vector to identify the identity of the user to be identified. The identity referred to in this embodiment may be any kind of information, such as occupation, position, native place, or age; the identification method is illustrated below using occupation as an example. Each identity corresponds to one identity vector, so an identity vector must be established for each identity before identification. It can be established in any of the following ways.
Mode one: convert an identity keyword into the identity vector. For example, if an identity keyword is "lawyer", the word2vec algorithm can convert "lawyer" into its identity vector.
Mode two: obtain m remarks (m >= 2) made about users whose identity has been confirmed, and obtain the identity vector of the confirmed identity from these m remarks. Because the many remarks made about users with a confirmed identity cover information related to that identity broadly and comprehensively, an identity vector built from them is more accurate. Specifically, a user vector can be obtained from the m remarks for each user with the confirmed identity, and the identity vector can then be obtained from those user vectors; that is, the user vectors of all users with a given confirmed identity are combined into the occupation vector V_job_i. For example, the combination may sum the user vectors of all users with the confirmed identity dimension by dimension and take the average:
$$V_{\mathrm{job\_i}} = g(V_{\mathrm{candidate\_person}}), \quad V_{\mathrm{candidate\_person}} \in \mathrm{job\_i}$$
where job_i denotes a confirmed identity and V_candidate_person denotes a user vector obtained from remarks.
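Mode two with g taken as the dimension-wise mean can be sketched as follows, assuming the user vectors of confirmed users have already been built from their remarks and are supplied as hypothetical (identity_label, user_vector) pairs.

```python
from typing import Dict, Iterable, List, Tuple

Vector = List[float]


def identity_vectors(confirmed_users: Iterable[Tuple[str, Vector]]) -> Dict[str, Vector]:
    """g(V_candidate_person): average the user vectors of each confirmed identity.

    confirmed_users yields (identity_label, user_vector) pairs, one per user
    whose identity (for example, occupation job_i) is already confirmed.
    """
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for label, vec in confirmed_users:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    # divide each accumulated sum by the number of users holding that identity
    return {label: [s / counts[label] for s in sums[label]] for label in sums}
```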
Further, the user vectors of users whose identity has been confirmed can be used to update the identity vector of that identity continuously, so that the identity vector becomes increasingly accurate and keeps moving closer to how users of each identity are actually described.
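One way to realize the continuous update, assuming the identity vector is the mean of confirmed users' vectors as in mode two, is an incremental (running) mean: each newly confirmed user is folded in without revisiting earlier users. This is a sketch of one possible update rule; the text does not fix a specific one, and a mode-three vector would need its g term updated and then recombined with the keyword vector.

```python
from typing import List, Tuple

Vector = List[float]


def update_identity_vector(identity_vec: Vector, count: int, new_user_vec: Vector) -> Tuple[Vector, int]:
    """Fold one newly confirmed user vector into a running-mean identity vector.

    identity_vec is assumed to be the mean of `count` user vectors so far;
    the result is the mean of count + 1 vectors, together with the new count.
    """
    new_count = count + 1
    updated = [m + (x - m) / new_count for m, x in zip(identity_vec, new_user_vec)]
    return updated, new_count
```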
Mode three: establish the identity vector from both the identity keyword and the remarks made about users with a confirmed identity. Specifically, obtain the word vector V_candidate_key of the identity keyword, obtain the user vectors V_candidate_person from the multiple remarks made about users with the confirmed identity, and then linearly add the identity keyword vector and the user vectors of the confirmed users to obtain the identity vector V_job_i:
$$V_{\mathrm{job\_i}} = V_{\mathrm{candidate\_key}} + \lambda \cdot g(V_{\mathrm{candidate\_person}}), \quad V_{\mathrm{candidate\_person}} \in \mathrm{job\_i}$$
where λ is a weight that can be obtained through experimental training. Likewise, the keyword vector V_candidate_key can be obtained by summing and averaging the word vectors of all keywords corresponding to one identity.
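The mode-three combination can be sketched directly from the formula, assuming g is the dimension-wise mean of confirmed users' vectors, V_candidate_key is the mean of the identity's keyword word vectors, and the weight λ (passed as `lam`) has already been chosen experimentally.

```python
from typing import List

Vector = List[float]


def average(vectors: List[Vector]) -> Vector:
    """Dimension-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def identity_vector_mode3(keyword_vectors: List[Vector],
                          confirmed_user_vectors: List[Vector],
                          lam: float) -> Vector:
    """V_job_i = V_candidate_key + lam * g(V_candidate_person)."""
    v_key = average(keyword_vectors)          # V_candidate_key
    g = average(confirmed_user_vectors)       # g(V_candidate_person)
    return [k + lam * u for k, u in zip(v_key, g)]
```

Setting `lam` to 0 reduces this sketch to mode one (keyword only), which gives a simple baseline when tuning the weight.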
When identity recognition is performed based on the identity vectors established in any of the above ways and the user vector of the user to be identified, the target identity vectors, namely identity vectors whose similarity to the user vector exceeds a set threshold, can be obtained first; the identity of the user to be identified is then obtained based on the target identities corresponding to these target identity vectors.
Specifically, the cosine between the user vector V_candidate_person of the user to be identified and the identity vector V_job can be computed to obtain their similarity ρ. When ρ exceeds the set threshold θ:
$$\rho = \cos(V_{\mathrm{job}}, V_{\mathrm{candidate\_person}}) > \theta$$
the identity vector is taken as a target identity vector, and the identity of the user to be identified is obtained based on the corresponding target identity. It should be noted that this embodiment does not limit how the similarity between the user vector and the identity vector is calculated: besides cosine similarity, Euclidean distance, the Tanimoto coefficient, Manhattan distance, and similar measures can also be used.
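The threshold test ρ = cos(V_job, V_candidate_person) > θ can be sketched as below, assuming identity vectors are held in a hypothetical dict from identity label to vector. Ranking the hits by similarity is an added convenience, not something the text requires; a zero vector is treated as having similarity 0.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def target_identities(user_vec: Vector,
                      identity_vecs: Dict[str, Vector],
                      theta: float) -> List[Tuple[str, float]]:
    """Return (identity, rho) pairs with rho > theta, highest rho first."""
    scored = [(label, cosine(vec, user_vec)) for label, vec in identity_vecs.items()]
    hits = [(label, rho) for label, rho in scored if rho > theta]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Swapping `cosine` for another measure, such as a distance turned into a similarity, only changes the scoring line; the thresholding logic stays the same.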
When the identity of the user to be identified is obtained based on the target identity, the target identity can be confirmed directly as the identity of the user to be identified, or the result can be further refined using the social network topology of the user to be identified.
Social network topology
(1) Certain occupations, such as intermediary (agent) occupations, have a higher popularity φ than ordinary people, where φ is the number D_in of people who have saved the user as a contact plus the number D_out of contacts the user has saved, i.e., the user's number of "friends" in the social network. For such occupations, a further condition φ > γ for some threshold γ can be added on top of ρ = cos(V_job, V_candidate_person) > θ. That is, in addition to obtaining the target identity, when the target identity is of a preset type, the popularity φ of the user to be identified is obtained, and the target identity is confirmed as the identity of the user to be identified only when φ > γ. This additional check on the user's popularity makes the identified identity more accurate.
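The popularity check could look like the following sketch, assuming D_in and D_out have already been counted and that the set of preset-type identities (the hypothetical `high_heat_identities` argument) and the threshold γ (`gamma`) are supplied by the caller. Identities outside the preset type pass through unchanged, since the text applies φ > γ only to the preset type.

```python
from typing import Optional, Set


def confirm_with_heat(target_identity: str,
                      d_in: int,
                      d_out: int,
                      high_heat_identities: Set[str],
                      gamma: int) -> Optional[str]:
    """Apply the extra phi > gamma condition to preset-type identities.

    phi = D_in + D_out. Returns the identity if it is confirmed, else None.
    """
    phi = d_in + d_out
    if target_identity in high_heat_identities and phi <= gamma:
        return None  # e.g. an "agent" match rejected for too few contacts
    return target_identity
```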
(2) As a member of a social network, a user inevitably forms various social relationships with other users, such as peer, mentor-apprentice, and superior-subordinate relationships. These relationships can be used to optimize identity recognition:
i. After the identity of the user to be identified is obtained, obtain target users whom the user to be identified has annotated with a predetermined keyword indicating a common identity, such as "colleague", "master", or "assistant", and confirm that the identity of each such target user is the same as that of the user to be identified. For example, after semantic recognition as in the above embodiment identifies the occupation of user A as λ (here λ simply labels an occupation), if A explicitly labels B as "colleague", B's occupation is naturally the same as A's, so B's occupation can also be confirmed as λ. As the number of people with the same occupation as A who label B as a colleague grows, the confidence that B's occupation is λ increases accordingly; alternatively, B's occupation can be confirmed as λ only once several people with that occupation have labeled B as "colleague".
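The colleague-based propagation in i can be sketched as a vote count, assuming annotations arrive as hypothetical (annotator, annotated_user, remark_text) triples and the common-identity keywords are a small illustrative list. `min_votes=1` matches the single-annotation case; a larger value corresponds to the stricter variant that waits for several annotators with the same occupation.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical keyword list indicating a common identity.
COMMON_IDENTITY_KEYWORDS = ("colleague", "master", "assistant")


def propagate_identity(identities: Dict[str, str],
                       annotations: List[Tuple[str, str, str]],
                       min_votes: int = 1,
                       keywords: Sequence[str] = COMMON_IDENTITY_KEYWORDS) -> Dict[str, str]:
    """Infer identities for users labeled with a common-identity keyword.

    identities: user -> identity already recognized.
    annotations: (annotator, annotated_user, remark_text) triples.
    Returns identities newly inferred for users not already in `identities`.
    """
    votes: Dict[str, Counter] = defaultdict(Counter)
    for annotator, target, remark in annotations:
        if annotator in identities and target not in identities:
            if any(kw in remark for kw in keywords):
                votes[target][identities[annotator]] += 1
    inferred = {}
    for target, counter in votes.items():
        identity, count = counter.most_common(1)[0]  # most-voted identity
        if count >= min_votes:
            inferred[target] = identity
    return inferred
```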
ii. Determine whether the proportion of users whose confirmed identity is the target identity, within the group to which the user to be identified belongs, falls within a second preset range; if so, determine that the identity of the user to be identified is the target identity. For example, suppose the second preset range is expressed as a count of users, ">= 3", and Xiao Ming belongs to the chat group "Class 3 of the 2010 intake". Recognition based on Xiao Ming's user vector and the identity vectors yields the target identity "doctor". It is then checked whether other users in this chat group also have the identity "doctor" and whether their number is >= 3; if so, Xiao Ming's identity is determined to be "doctor". Checking how many users in the group of the user to be identified share the identity improves the accuracy of identity recognition.
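A sketch of check ii that follows the worked example, where the second preset range is a count (>= 3, the `min_count` default here). The ratio over the other group members is returned as well, in case a proportional range is preferred; member names and the `confirmed` mapping are hypothetical inputs.

```python
from typing import Dict, List, Tuple


def confirm_by_group(user: str,
                     target_identity: str,
                     group_members: List[str],
                     confirmed: Dict[str, str],
                     min_count: int = 3) -> Tuple[bool, int, float]:
    """Confirm target_identity if enough other group members already hold it.

    confirmed: member -> confirmed identity.
    Returns (confirmed?, count, ratio), ratio = count / number of other members.
    """
    others = [m for m in group_members if m != user]
    count = sum(1 for m in others if confirmed.get(m) == target_identity)
    ratio = count / len(others) if others else 0.0
    return count >= min_count, count, ratio
```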
iii. Before identity recognition is performed, and in a manner similar to clustering, determine whether the ratio between the number of users whose identity is a common identity and the total number of users in the group to which the user to be identified belongs (for example, a network platform or a chat group) falls within a third preset range (i.e., exceeds a certain ratio). If so, users in this group are very likely to have that common identity, and the identity of the user to be identified can be confirmed as the common identity. Alternatively, after the identity λ of the user to be identified is recognized, the identities of other users are obtained; if identity λ accounts for a relatively high proportion of the people in a group to which the user to be identified belongs, such as a social subnetwork, the other people in that subnetwork whose identity has not been recognized can be considered, with reasonable confidence, to also have identity λ.
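Check iii and the subnet fill-in can be sketched together, assuming the third preset range is [low, 1] over the group's total member count and that `identities` (a hypothetical input) holds only the members whose identity is already known. Members with unknown identity count toward the total, which makes the ratio conservative.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def dominant_group_identity(group_members: List[str],
                            identities: Dict[str, str],
                            low: float) -> Optional[Tuple[str, float]]:
    """Return (identity, ratio) if the most common identity reaches share `low`."""
    total = len(group_members)
    if total == 0:
        return None
    counts = Counter(identities[m] for m in group_members if m in identities)
    if not counts:
        return None
    identity, k = counts.most_common(1)[0]
    ratio = k / total  # share of the whole group, unknown members included
    return (identity, ratio) if ratio >= low else None


def fill_group(group_members: List[str],
               identities: Dict[str, str],
               low: float) -> Dict[str, str]:
    """Assign the dominant identity to members whose identity is still unknown."""
    hit = dominant_group_identity(group_members, identities, low)
    if hit is None:
        return {}
    identity, _ = hit
    return {m: identity for m in group_members if m not in identities}
```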
Through points i and iii above, identities are propagated along the social network topology, which increases the coverage of the identity recognition method as a whole.
Based on the identity recognition method provided by the above embodiment, this embodiment correspondingly provides an apparatus for identifying identity. Referring to FIG. 2, the apparatus includes:
An acquiring unit 21, configured to obtain, based on a social network, n remarks made about a user to be identified, where n >= 2;
A converting unit 22, configured to obtain the user vector of the user to be identified according to the n remarks;
A recognition unit 23, configured to perform semantic recognition based on identity vectors and the user vector to identify the identity of the user to be identified.
As an optional implementation, the apparatus further includes a creating unit 24, configured to obtain m remarks (m >= 2) made about users whose identity has been confirmed, and to obtain the identity vector of the confirmed identity according to the m remarks.
As an optional implementation, the acquiring unit 21 is configured to: preprocess the n remarks to remove remarks that carry no actual meaning; segment the preprocessed remarks into words and convert each word into a word vector; and obtain the user vector of the user to be identified based on all the word vectors.
As an optional implementation, the apparatus further includes:
A matching unit 25, configured to, before the user vector of the user to be identified is obtained according to the n remarks: obtain identity keywords from the n remarks; determine whether the ratio of k to n falls within a first preset range, where k is the number of remarks in which the identity keyword appears; if it does, confirm that the identity of the user to be identified is the identity keyword; if it does not, obtain the user vector of the user to be identified according to the n remarks.
As an optional implementation, the recognition unit 23 is configured to:
Obtain target identity vectors whose similarity to the user vector exceeds a set threshold;
Obtain the identity of the user to be identified based on the target identities corresponding to the target identity vectors. Optionally, the recognition unit is further configured to: take the target identity corresponding to the target identity vector as the identity of the user to be identified; or determine whether the proportion of users whose confirmed identity is the target identity, within the group to which the user to be identified belongs, falls within a second preset range, and if so, determine that the identity of the user to be identified is the target identity.
As an optional implementation, the apparatus further includes:
An expanding unit 26, configured to, after the identity of the user to be identified is obtained, obtain a target user whom the user to be identified has annotated with a predetermined keyword, the predetermined keyword being a keyword that indicates a common identity, and to confirm that the identity of the target user is the same as that of the user to be identified.
As an optional implementation, the expanding unit 26 may also be configured to determine whether the ratio of the number of users whose identity is a common identity to the total number of users in the group to which the user to be identified belongs falls within a third preset range, and if so, to confirm that the identity of the user to be identified is the common identity.
For the apparatus in the above embodiment, the specific manner in which each unit performs its operations has been described in detail in the method embodiment and is not repeated here.
FIG. 3 is a block diagram of an electronic device 700 for implementing the identity recognition method according to an exemplary embodiment. For example, the electronic device 700 may be a computer, a database console, a tablet device, a personal digital assistant, or the like.
Referring to FIG. 3, the electronic device 700 may include one or more of the following components: a processing component 702, a memory 704, a power supply component 706, a multimedia component 708, an input/output (I/O) interface 710, and a communication component 712.
The processing component 702 generally controls the overall operations of the electronic device 700, such as operations associated with display, data communication, and recording. The processing component 702 may include one or more processors 720 to execute instructions so as to complete all or some of the steps of the above method. In addition, the processing component 702 may include one or more modules that facilitate interaction between the processing component 702 and other components.
The memory 704 is configured to store various types of data to support operations on the device 700. Examples of such data include instructions of any application or method operating on the electronic device 700, contact data, phonebook data, messages, pictures, videos, and so on. The memory 704 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power supply component 706 supplies power to the various components of the electronic device 700. It may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 700.
The I/O interface 710 provides an interface between the processing component 702 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The communication component 712 is configured to facilitate wired or wireless communication between the electronic device 700 and other devices. The electronic device 700 can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, or a combination thereof. In one exemplary embodiment, the communication component 712 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 712 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 704 including instructions, which can be executed by the processor 720 of the electronic device 700 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium is provided such that, when the instructions in the storage medium are executed by a processor of a mobile terminal, the electronic device is enabled to perform a method for identifying identity, the method including: obtaining, based on a social network, n remarks made about a user to be identified, where n >= 2; obtaining the user vector of the user to be identified according to the n remarks; and performing semantic recognition based on identity vectors and the user vector to identify the identity of the user to be identified.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims. The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.