CN109962907B - User identity recognition method based on big data and terminal equipment - Google Patents

User identity recognition method based on big data and terminal equipment

Info

Publication number
CN109962907B
Authority
CN
China
Prior art keywords
user
website
registration
preset
dimensional vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910039490.9A
Other languages
Chinese (zh)
Other versions
CN109962907A (en
Inventor
唐晟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai filed Critical OneConnect Financial Technology Co Ltd Shanghai
Priority to CN201910039490.9A priority Critical patent/CN109962907B/en
Publication of CN109962907A publication Critical patent/CN109962907A/en
Application granted granted Critical
Publication of CN109962907B publication Critical patent/CN109962907B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00: Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/1066: Session management
    • H04L65/1073: Registration or de-registration
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a user identity recognition method based on big data and a terminal device. The method comprises the following steps: for any website in a preset website list, acquiring registration information of all users of the website, wherein the registration information of each user comprises n registration items; for each user, acquiring an n-dimensional vector corresponding to the user according to the n registration items of the user; acquiring an n-dimensional vector corresponding to a first user; performing relevance matching between the n-dimensional vector corresponding to the first user and the n-dimensional vector corresponding to each user of the website in turn, so as to obtain the relevance between the first user and each user of the website; and, if a user whose relevance to the first user is higher than a preset value exists in the website, sending the identifier of the website to the first user. By acquiring the registration information of the users of each website in the preset website list and establishing a large database, the method determines, from the registration information commonly used by a user, all websites in the preset website list on which the user has registered, so that the user can manage the registered websites.

Description

User identity recognition method based on big data and terminal equipment
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a user identity recognition method based on big data and terminal equipment.
Background
In an age of rapid information growth, almost every piece of software or website has its own account system, and many functions can be used only after the user registers an account and logs in with it. The account systems of different websites or software are isolated from one another and share no data or information, so the accounts that one user holds on several websites or in several pieces of software are not explicitly associated, which makes account management difficult for the user.
For example, existing software or website registration processes usually require the user to bind a mobile phone number, or let the user use the mobile phone number directly as the account name or user name. When a user who has registered on several websites or in several pieces of software needs to change the mobile phone number, the registration information on each of them must be modified in turn. In practice, the user usually updates the registration information of frequently used software or websites and forgets or neglects that of rarely used ones, so that the latter can no longer be logged in to. The prior art lacks a method for identifying and managing the various accounts of the same user.
Disclosure of Invention
In view of the above, the embodiment of the invention provides a user identity recognition method and terminal equipment based on big data, so as to solve the problem that various network accounts of the same user cannot be recognized and uniformly managed in the prior art.
A first aspect of an embodiment of the present invention provides a method for identifying a user identity based on big data, including:
for any website in a preset website list, calling an API of the website and acquiring registration information of all users of the website through a web crawler, wherein, for any user, the registration information of the user comprises n registration items, n being greater than or equal to 2;
for any user, acquiring an n-dimensional vector corresponding to the user according to the n registration items of the user, wherein the value of the i-th element of the n-dimensional vector is the character string corresponding to the i-th registration item of the user, i being greater than or equal to 1 and less than or equal to n;
acquiring n registration items of a first user, and acquiring an n-dimensional vector corresponding to the first user according to the n registration items of the first user;
for any website in the preset website list, performing relevance matching between the n-dimensional vector corresponding to the first user and the n-dimensional vector corresponding to each user of the website in turn, so as to obtain the relevance between the first user and each user of the website in turn;
and if a user whose relevance to the first user is higher than a preset value exists in the website, sending the identifier of the website to the first user.
A second aspect of embodiments of the present invention provides a computer-readable storage medium storing computer-readable instructions that when executed by a processor perform the steps of:
for any website in a preset website list, calling an API of the website and acquiring registration information of all users of the website through a web crawler, wherein, for any user, the registration information of the user comprises n registration items, n being greater than or equal to 2;
for any user, acquiring an n-dimensional vector corresponding to the user according to the n registration items of the user, wherein the value of the i-th element of the n-dimensional vector is the character string corresponding to the i-th registration item of the user, i being greater than or equal to 1 and less than or equal to n;
acquiring n registration items of a first user, and acquiring an n-dimensional vector corresponding to the first user according to the n registration items of the first user;
for any website in the preset website list, performing relevance matching between the n-dimensional vector corresponding to the first user and the n-dimensional vector corresponding to each user of the website in turn, so as to obtain the relevance between the first user and each user of the website in turn;
and if a user whose relevance to the first user is higher than a preset value exists in the website, sending the identifier of the website to the first user.
A third aspect of an embodiment of the present invention provides a terminal device, including a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer readable instructions:
for any website in a preset website list, calling an API of the website and acquiring registration information of all users of the website through a web crawler, wherein, for any user, the registration information of the user comprises n registration items, n being greater than or equal to 2;
for any user, acquiring an n-dimensional vector corresponding to the user according to the n registration items of the user, wherein the value of the i-th element of the n-dimensional vector is the character string corresponding to the i-th registration item of the user, i being greater than or equal to 1 and less than or equal to n;
acquiring n registration items of a first user, and acquiring an n-dimensional vector corresponding to the first user according to the n registration items of the first user;
for any website in the preset website list, performing relevance matching between the n-dimensional vector corresponding to the first user and the n-dimensional vector corresponding to each user of the website in turn, so as to obtain the relevance between the first user and each user of the website in turn;
and if a user whose relevance to the first user is higher than a preset value exists in the website, sending the identifier of the website to the first user.
The invention provides a user identity recognition method based on big data and a terminal device. A large database is established by acquiring the registration information of the users of each website in a preset website list, and relevance matching is performed between the registration information commonly used by a user and the registration information of each user of the websites in the preset website list, so that all websites in the preset website list on which the user has registered are determined and the user can manage the registered websites.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a user identification method based on big data provided by an embodiment of the invention;
FIG. 2 is a block diagram of a user identification device based on big data according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a terminal device for user identification based on big data according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
The embodiment of the invention provides a user identity recognition method based on big data. Referring to fig. 1, the method includes:
S101, for any website in a preset website list, calling an API of the website and acquiring registration information of all users of the website through a web crawler, wherein, for any user, the registration information of the user comprises n registration items, and n is greater than or equal to 2.
Existing websites, for example social networking sites, provide a variety of open APIs, including but not limited to APIs for retrieving user information and APIs for retrieving the content posted by users. For a given website, the API of the website is called and the registration information of its users is crawled by a web crawler. The registration information comprises several registration items, such as the user's mobile phone number, e-mail address, user name, address, and interests and hobbies.
A preset website list is established, which contains the website information of a plurality of websites together with an identifier of each website, such as its name. The user registration information of each website is acquired in turn, and a large database of user registration information is established.
S102, for any user, acquiring an n-dimensional vector corresponding to the user according to the n registration items of the user, wherein the value of the i-th element of the n-dimensional vector is the character string corresponding to the i-th registration item of the user, and i is greater than or equal to 1 and less than or equal to n.
For example, suppose the registration information of any user includes 5 registration items: the user's mobile phone number, e-mail address, user name, address, and interests. In this embodiment of the present invention, a 5-dimensional vector is created for each user. The value of the first element is the character string corresponding to the user's mobile phone number, the value of the second element is the character string corresponding to the user's e-mail address, the value of the third element is the character string corresponding to the user name, the value of the fourth element is the character string corresponding to the user's address, and the value of the fifth element is the character string corresponding to content such as the user's interests or self-introduction.
For any two users in the large database, elements at the same position of their n-dimensional vectors correspond to the same registration item.
Further, if the registration information of a user lacks one or more of the n registration items, the corresponding elements of the n-dimensional vector are set to null.
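As an illustration of step S102, the construction of the n-dimensional vector can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the item names in `REGISTRATION_ITEMS`, the dictionary input format and the function name `to_vector` are assumptions introduced here, and null is represented by `None`.

```python
# Hypothetical fixed ordering of registration items (n = 5, following the example above).
REGISTRATION_ITEMS = ["phone", "email", "username", "address", "interests"]


def to_vector(registration: dict) -> list:
    """Map a user's registration info to an n-dimensional vector of strings.

    The i-th element holds the string of the i-th registration item;
    items missing from the registration info are stored as None (null).
    """
    return [registration.get(item) for item in REGISTRATION_ITEMS]


# A user who supplied only a phone number and a user name.
vec = to_vector({"phone": "13800000000", "username": "ss"})
```

Because every vector uses the same item ordering, elements at the same position of any two vectors can be compared directly.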
S103, acquiring n registration items of a first user, and acquiring an n-dimensional vector corresponding to the first user according to the n registration items of the first user.
Specifically, a prompt may be sent to the first user instructing the first user to input commonly used registration items such as a mobile phone number, an e-mail address and a user name. The n-dimensional vector corresponding to the first user is created as described in step S102, which is not repeated here.
S104, for any website in the preset website list, performing relevance matching between the n-dimensional vector corresponding to the first user and the n-dimensional vector corresponding to each user of the website in turn, so as to obtain the relevance between the first user and each user of the website in turn.
Specifically, for any website in the preset website list, the relevance between the first user and each user of the website is calculated as follows:
for any user of the website, calculating in turn the similarity of the two character strings located at the same position in the n-dimensional vector corresponding to the user and in the n-dimensional vector corresponding to the first user, so as to obtain n similarity values;
for any user m of the website, calculating the relevance between the first user and the user m from the preset weight of each of the n registration items according to the following formula:
[Formula image BDA0001947044140000061; not reproduced in this text]
wherein S is the relevance between the first user and the user m, w_i is the preset weight corresponding to the i-th of the n registration items, and the symbol shown in image BDA0001947044140000062 is the similarity of the i-th registration item in the n-dimensional vectors corresponding to the first user and the user m.
It should be noted that, in the embodiment of the present invention, the registration items play different roles in identifying a user. If the mobile phone numbers of two users are identical, the two users can be judged to be the same user, whereas identical interests do not allow that conclusion. A weight is therefore preset for each registration item; for example, the preset weight of the mobile phone number item is higher than the preset weight of the interests item.
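The weighting idea can be sketched in Python as below. Since the relevance formula is available only as an image, the sketch assumes the simplest reading, a weighted sum S = Σ w_i · s_i over the n per-item similarities; the function name `relevance` and the values in `WEIGHTS` are hypothetical and chosen only for illustration.

```python
def relevance(similarities, weights):
    """Assumed form of the relevance: S = sum_i w_i * s_i.

    similarities: n per-item similarity values, each in [0, 1]
    weights:      n preset weights, one per registration item
    """
    if len(similarities) != len(weights):
        raise ValueError("one weight is needed per registration item")
    return sum(w * s for w, s in zip(weights, similarities))


# Hypothetical weights for (phone, email, username, address, interests):
# the phone number dominates, interests contribute little.
WEIGHTS = [0.4, 0.3, 0.15, 0.1, 0.05]

same_phone_only = relevance([1.0, 0.0, 0.0, 0.0, 0.0], WEIGHTS)
same_interests_only = relevance([0.0, 0.0, 0.0, 0.0, 1.0], WEIGHTS)
```

With weights of this kind, a matching phone number alone contributes far more to S than matching interests alone, which is the behaviour the paragraph above describes.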
Furthermore, depending on the lengths of the character strings, the embodiment of the invention provides two methods for obtaining the similarity of a registration item.
In the first method, if the length of the character string of the i-th registration item in the n-dimensional vector corresponding to the first user and the length of the character string of the i-th registration item in the n-dimensional vector corresponding to the user m are both less than or equal to a first preset length, the similarity of the i-th registration item in the n-dimensional vectors corresponding to the first user and the user m (the symbol shown in image BDA0001947044140000063) is calculated according to the following formula:
[Formula image BDA0001947044140000064; not reproduced in this text]
wherein |X| is the length of the character string of the i-th registration item of the first user, |Y| is the length of the character string of the i-th registration item of the user m, and |X ∩ Y| is the length of the character string obtained by intersecting, in character order, the character string of the i-th registration item of the first user with the character string of the i-th registration item of the user m.
When the length of the character string of the i-th registration item in the n-dimensional vector corresponding to the first user and the length of the character string of the i-th registration item in the n-dimensional vector corresponding to the user m are both less than or equal to the first preset length, the two character strings to be compared are short. The corresponding registration items are generally user names, mobile phone numbers, e-mail addresses and the like, and calculating the similarity of the two character strings by this method is fast.
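A possible reading of the first method is sketched below in Python. The exact formula is available only as an image, so two assumptions are made here and should be treated as such: the "intersection in character order" is taken to be the longest common subsequence of the two strings, and its length is normalized Jaccard-style as |X ∩ Y| / (|X| + |Y| − |X ∩ Y|). The function names are hypothetical.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of the longest common subsequence (order-preserving overlap)."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def short_string_similarity(x, y) -> float:
    """Assumed normalization: |X∩Y| / (|X| + |Y| - |X∩Y|); null or empty gives 0."""
    if not x or not y:
        return 0.0
    common = lcs_length(x, y)
    return common / (len(x) + len(y) - common)


# Two phone numbers that differ only in the last digit.
sim = short_string_similarity("13800000000", "13800000001")
```

The dynamic-programming table costs O(|X|·|Y|) time, which is acceptable for strings no longer than the first preset length and is the reason a different method is used for long strings.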
When the two character strings to be compared are long, for example the character strings of registration items such as the user's interests or self-introduction, matching them by the above method is computationally expensive. In that case the similarity of the two character strings is calculated by the second method, namely:
if the length of the character string of the i-th registration item in the n-dimensional vector corresponding to the first user and the length of the character string of the i-th registration item in the n-dimensional vector corresponding to the user m are both greater than the first preset length, extracting a keywords of the i-th registration item of the first user and a keywords of the i-th registration item of the user m, respectively, through the term frequency-inverse document frequency (TF-IDF) algorithm, wherein a is greater than or equal to 2;
acquiring a first word vector corresponding to the a keywords of the i-th registration item of the first user and a second word vector corresponding to the a keywords of the i-th registration item of the user m, respectively;
and obtaining the similarity of the i-th registration item in the n-dimensional vectors corresponding to the first user and the user m by calculating the cosine similarity of the first word vector and the second word vector.
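The second method can be sketched in Python as follows. This is an illustrative sketch under several assumptions that are not in the source: text is tokenized on whitespace (Chinese text would need a word segmenter), a smoothed IDF is used, and the "word vector" of each string is taken to be the TF-IDF weights of its top a keywords rather than a learned embedding. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf(tokens, corpus):
    """TF-IDF score of every distinct token; corpus is a list of token lists."""
    counts = Counter(tokens)
    total = len(tokens)
    scores = {}
    for word, count in counts.items():
        doc_freq = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + len(corpus)) / (1 + doc_freq)) + 1.0  # smoothed IDF
        scores[word] = (count / total) * idf
    return scores


def top_keywords(tokens, corpus, a=2):
    """Keep the a highest-scoring keywords (ties broken alphabetically)."""
    scores = tfidf(tokens, corpus)
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return {w: scores[w] for w in ranked[:a]}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def long_string_similarity(x, y, corpus, a=2):
    """Similarity of two long registration-item strings via keyword vectors."""
    tx, ty = x.lower().split(), y.lower().split()
    return cosine(top_keywords(tx, corpus, a), top_keywords(ty, corpus, a))


corpus = [["hiking", "and", "photography"], ["cooking", "and", "travel"]]
same = long_string_similarity("hiking and photography", "Hiking and photography", corpus)
different = long_string_similarity("hiking photography", "cooking travel", corpus)
```

Reducing each string to a keywords keeps the comparison cost independent of the string length, which is the motivation the text gives for switching methods.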
S105, if a user whose relevance to the first user is higher than the preset value exists in the website, sending the identifier of the website to the first user.
After the identifier of the website is sent to the first user, the first user further confirms whether he or she is a registered user of the website, so that the user can identify and manage the websites on which he or she has registered.
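Steps S104 and S105 together amount to a threshold test per website, which can be sketched in Python as below. The data layout (a dictionary from website identifier to a list of user vectors), the function names and the simple `exact_match_score` stand-in for the relevance calculation are assumptions made for this sketch only.

```python
def exact_match_score(u, v):
    """Stand-in relevance: fraction of positions holding equal, non-null strings."""
    return sum(1 for a, b in zip(u, v) if a is not None and a == b) / len(u)


def sites_to_report(first_user_vec, site_users, score, threshold):
    """Return identifiers of websites that hold at least one user whose
    relevance to the first user is higher than the preset threshold."""
    matched = []
    for site_id, user_vectors in site_users.items():
        if any(score(first_user_vec, vec) > threshold for vec in user_vectors):
            matched.append(site_id)
    return matched


# Hypothetical database: website identifier -> vectors (phone, email) of its users.
site_users = {
    "site_a": [["138", "a@x.com"], ["139", "b@x.com"]],
    "site_b": [["137", "c@x.com"]],
}
reported = sites_to_report(["138", "a@x.com"], site_users, exact_match_score, 0.8)
```

Any scoring function with the same signature can be passed as `score`, so the weighted relevance described in step S104 can replace the stand-in without changing the loop.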
Still further, the same user may be registered on multiple social networking sites. Such a user may post the same content to several social networking sites at the same time, and may attach to the content posted on one social networking site a URL (Uniform Resource Locator) of the content posted on another social networking site.
Based on the above, the embodiment of the invention provides another user identity recognition method based on big data, which comprises the following steps: if the preset website list comprises a plurality of social networking sites and the first user is a registered user of a first social networking site, crawling, through a web crawler, the content posted by the first user on the first social networking site within a preset time period; if the content contains a Uniform Resource Locator (URL), acquiring the content corresponding to the URL through the URL; and if the content corresponding to the URL is content posted by a third user on a second social networking site and the content posted by the first user on the first social networking site is the same as the content corresponding to the URL, pushing the third user to the first user to instruct the first user to confirm whether the third user and the first user are the same user.
For example, within a preset time period, a user SS posts content on a microblog (referred to here, for ease of understanding, as the first social networking site). Within the same period, the same content is posted on two other social networking sites, referred to here as the second and third social networking sites: on the second social networking site it is posted by a user with the user name SSA, and on the third social networking site by a user with the user name or nickname SSX. It can then be preliminarily determined that user SS of the first social networking site, user SSA of the second social networking site and user SSX of the third social networking site are the same user. User SSA of the second social networking site and user SSX of the third social networking site are then pushed to the first user SS, who confirms whether SSA and SSX are the same user as the first user.
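The URL-based check can be sketched in Python as follows. The sketch makes assumptions that go beyond the text: posts are plain strings, the crawler is represented by a hypothetical callable `fetch_post(url)` returning an `(author, text)` pair or `None`, and two posts count as "the same content" when their text is equal after URLs are removed and whitespace is normalized.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")


def strip_urls(text: str) -> str:
    """Remove URLs and collapse whitespace so that post bodies can be compared."""
    return " ".join(URL_PATTERN.sub(" ", text).split())


def find_cross_post_candidates(first_user_posts, fetch_post):
    """Return authors on other sites whose linked post matches a post of the first user.

    fetch_post(url) -> (author, text) or None; a hypothetical stand-in for the crawler.
    """
    candidates = set()
    for post in first_user_posts:
        body = strip_urls(post)
        for url in URL_PATTERN.findall(post):
            fetched = fetch_post(url)
            if fetched is None:
                continue
            author, text = fetched
            if strip_urls(text) == body:
                candidates.add(author)
    return sorted(candidates)


# Fake crawler backed by a dictionary, for illustration only.
store = {
    "https://site2.example/p/1": ("SSA", "Great day at the lake"),
    "https://site3.example/p/9": ("other", "Something else entirely"),
}
posts = [
    "Great day at the lake https://site2.example/p/1",
    "Lunch photos https://site3.example/p/9",
]
candidates = find_cross_post_candidates(posts, store.get)
```

The returned accounts are only candidates: as the text states, they are pushed to the first user, who confirms whether they belong to the same person.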
On the basis of the above method, since every user of a social networking site has an avatar, in order to further identify the user, the method provided by the embodiment of the invention further comprises: acquiring the avatar of the first user and the avatar of the third user, performing face recognition on both avatars, and judging that the first user and the third user are the same user if the two avatars show the same person.
Further, if a user posts the same content on at least two social networking sites at the same time, the geographic locations from which the content is posted are generally the same. Based on this, in order to further identify the user, the method provided by the embodiment of the invention further includes: acquiring a first geographic location corresponding to the first user when posting the content on the first social networking site and a second geographic location corresponding to the third user when posting the content on the second social networking site; and, if the distance on the map between the first geographic location and the second geographic location is smaller than a preset distance, judging that the first user and the third user are the same user.
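The distance test can be sketched in Python as below. The text does not say how the distance on the map is measured; this sketch assumes locations are (latitude, longitude) pairs in degrees and uses the haversine great-circle distance, with a hypothetical preset distance of 1 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def posted_from_same_place(loc1, loc2, max_km=1.0):
    """True if the two posting locations are closer than the preset distance."""
    return haversine_km(*loc1, *loc2) < max_km


shanghai_a = (31.2304, 121.4737)
shanghai_b = (31.2310, 121.4740)
beijing = (39.9042, 116.4074)
near = posted_from_same_place(shanghai_a, shanghai_b)
far = posted_from_same_place(shanghai_a, beijing)
```

The preset distance trades off false matches against missed matches, since location data attached to posts is often coarse.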
On the basis of the embodiment, the embodiment of the invention also provides a social friend recommending method based on the user identification of big data, which comprises the following steps: if the first user and the fourth user are in a friend relationship on the first social network site, the third user and the fifth user are in a non-friend relationship on the second social network site, the first user and the third user are the same user, and the fourth user and the fifth user are the same user, pushing the fifth user to the third user on the second social network site.
For example, existing social networks recommend friends according to the number of common friends. However, a user A and a user B who merely have friends in common may not know each other at all, so such recommendations are inaccurate and the user experience is poor. The social friend recommendation method based on big-data user identity recognition provided by the embodiment of the invention recommends social friends accurately and improves the user's experience on the social network.
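The recommendation rule can be sketched in Python as follows. The data structures are assumptions made for this sketch: friend lists are dictionaries from an account to a set of accounts, and `same_user` maps an account on the first social networking site to the account of the same person on the second site, as established by the identity recognition described above.

```python
def recommend_friends(friends_site1, friends_site2, same_user):
    """Recommend on site 2 the friends a user already has on site 1.

    friends_site1 / friends_site2: account -> set of friend accounts on that site
    same_user: site-1 account -> site-2 account of the same person
    Returns: site-2 account -> sorted list of site-2 accounts to push to it.
    """
    recommendations = {}
    for user1, friends in friends_site1.items():
        user2 = same_user.get(user1)
        if user2 is None:
            continue  # this person has no recognized account on site 2
        for friend1 in friends:
            friend2 = same_user.get(friend1)
            if friend2 is None or friend2 == user2:
                continue
            if friend2 not in friends_site2.get(user2, set()):
                recommendations.setdefault(user2, set()).add(friend2)
    return {user: sorted(recs) for user, recs in recommendations.items()}


# First and fourth users are friends on site 1; third and fifth users are their
# site-2 accounts and are not yet friends there.
friends_site1 = {"first": {"fourth"}, "fourth": {"first"}}
friends_site2 = {"third": set(), "fifth": set()}
same_user = {"first": "third", "fourth": "fifth"}
recs = recommend_friends(friends_site1, friends_site2, same_user)
```

Unlike a common-friends heuristic, this rule only recommends accounts whose owners are already friends elsewhere.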
The embodiment of the invention provides a user identity recognition method based on big data. The registration information of the users of each website in a preset website list is acquired and a large database is established, and all websites in the preset website list on which a user has registered are determined according to the registration information commonly used by the user, so that the user can manage the registered websites.
FIG. 2 is a schematic diagram of a user identity recognition device based on big data according to an embodiment of the present invention. With reference to FIG. 2, the device includes a first obtaining unit 21, a second obtaining unit 22, a third obtaining unit 23 and a pushing unit 24;
the first obtaining unit 21 is configured to, for any website in a preset website list, call an API of the website and obtain registration information of all users of the website through a web crawler, wherein, for any user, the registration information of the user includes n registration items, n being greater than or equal to 2;
the second obtaining unit 22 is configured to obtain, for any user, an n-dimensional vector corresponding to the user according to the n registration items of the user, wherein the value of the i-th element of the n-dimensional vector is the character string corresponding to the i-th registration item of the user, i being greater than or equal to 1 and less than or equal to n;
the second obtaining unit 22 is further configured to obtain n registration items of a first user, and obtain an n-dimensional vector corresponding to the first user according to the n registration items of the first user;
the third obtaining unit 23 is configured to, for any website in the preset website list, perform relevance matching between the n-dimensional vector corresponding to the first user and the n-dimensional vector corresponding to each user of the website in turn, so as to obtain the relevance between the first user and each user of the website in turn;
the pushing unit 24 is configured to send the identifier of the website to the first user if a user whose relevance to the first user is higher than a preset value exists in the website.
Further, the third obtaining unit 23 is specifically configured to:
for any user of the website, calculate in turn the similarity of the two character strings located at the same position in the n-dimensional vector corresponding to the user and in the n-dimensional vector corresponding to the first user, so as to obtain n similarity values;
for any user m of the website, calculate the relevance between the first user and the user m from the preset weight of each of the n registration items according to the following formula:
[Formula image BDA0001947044140000101; not reproduced in this text]
wherein S is the relevance between the first user and the user m, w_i is the preset weight corresponding to the i-th of the n registration items, and the symbol shown in image BDA0001947044140000102 is the similarity of the i-th registration item in the n-dimensional vectors corresponding to the first user and the user m.
Further, the third obtaining unit 23 is further configured to:
if the length of the character string of the i-th registration item in the n-dimensional vector corresponding to the first user and the length of the character string of the i-th registration item in the n-dimensional vector corresponding to the user m are both greater than a first preset length, extract a keywords of the i-th registration item of the first user and a keywords of the i-th registration item of the user m, respectively, through the term frequency-inverse document frequency (TF-IDF) algorithm, wherein a is greater than or equal to 2;
acquire a first word vector corresponding to the a keywords of the i-th registration item of the first user and a second word vector corresponding to the a keywords of the i-th registration item of the user m, respectively;
and obtain the similarity of the i-th registration item in the n-dimensional vectors corresponding to the first user and the user m by calculating the cosine similarity of the first word vector and the second word vector.
Further, the third obtaining unit 23 is further configured to:
if the length of the character string of the i-th registration item in the n-dimensional vector corresponding to the first user and the length of the character string of the i-th registration item in the n-dimensional vector corresponding to the user m are both less than or equal to the first preset length, calculate the similarity of the i-th registration item in the n-dimensional vectors corresponding to the first user and the user m (the symbol shown in image BDA0001947044140000111) according to the following formula:
[Formula image BDA0001947044140000112; not reproduced in this text]
wherein |X| is the length of the character string of the i-th registration item of the first user, |Y| is the length of the character string of the i-th registration item of the user m, and |X ∩ Y| is the length of the character string obtained by intersecting, in character order, the character string of the i-th registration item of the first user with the character string of the i-th registration item of the user m.
Further, the apparatus further comprises a fourth obtaining unit 25 for: if the preset website list comprises a plurality of social networking sites and the first user is a registered user of the first social networking site, crawling the content published by the first user in the first social networking site within a preset time through a web crawler; if the content contains the Uniform Resource Locator (URL), acquiring the content corresponding to the URL through the URL; if the content corresponding to the URL is the content posted by the third user on the second social network site and the content posted by the first user on the first social network site is the same as the content corresponding to the URL, pushing the third user to the first user by the pushing unit 24 to instruct the first user to confirm whether the third user and the first user are the same user.
Further, the fourth obtaining unit 25 is further configured to: acquire the avatars (profile pictures) of the first user and the third user, perform face recognition on the two avatars, and determine that the first user and the third user are the same user if the avatars show the same person; or,
acquire a first geographic position of the first user when the content was published on the first social networking site and a second geographic position of the third user when the content was published on the second social networking site;
and, if the distance between the first geographic position and the second geographic position on the map is smaller than a preset distance, determine that the first user and the third user are the same user.
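As an illustration of the geographic check, the distance between the two posting positions could be computed with the haversine great-circle formula and compared with the preset distance. The source only speaks of "the distance on the map", so the choice of haversine, the kilometre unit, and the function names below are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def posted_nearby(pos1, pos2, preset_distance_km):
    """True if the two posting positions are closer than the preset distance."""
    return haversine_km(*pos1, *pos2) < preset_distance_km
```

For example, `posted_nearby((31.23, 121.47), (31.24, 121.48), 5)` compares two points roughly one and a half kilometres apart against a 5 km preset distance.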
Further, if the first user and the fourth user are in a friend relationship on the first social networking site, the third user and the fifth user are in a non-friend relationship on the second social networking site, the first user and the third user are the same user, and the fourth user and the fifth user are the same user, the fifth user is pushed to the third user on the second social networking site through the pushing unit 24.
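The friend-pushing rule above can be sketched as a small set computation. The data structures are hypothetical: `friends_site1` and `friends_site2` map each account to its set of friends on the respective site, and `same_user` maps an account on the first site to the account on the second site that has been judged to belong to the same person.

```python
def suggest_friends(friends_site1, friends_site2, same_user):
    """Return {account on site 2: set of accounts on site 2 to push to it}.

    A friend pair on site 1 is carried over to site 2 when both accounts
    have a known counterpart there and the counterparts are not yet friends.
    """
    suggestions = {}
    for user1, friends in friends_site1.items():
        user2 = same_user.get(user1)
        if user2 is None:
            continue  # no known counterpart on the second site
        existing = friends_site2.get(user2, set())
        for friend1 in friends:
            friend2 = same_user.get(friend1)
            if friend2 is not None and friend2 != user2 and friend2 not in existing:
                suggestions.setdefault(user2, set()).add(friend2)
    return suggestions
```

With the first and fourth users friends on the first site, and the third and fifth users as their counterparts on the second site, the function returns the fifth user as a suggestion for the third user.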
The embodiment of the invention provides a big-data-based user identification device. A large database is built by acquiring the registration information of the users of each website in a preset website list, and relevance matching is performed in sequence between the user's commonly used registration information and the registration information of each user in the database, so that the websites in the preset website list on which the user has registered are determined and the user can manage the websites on which he or she is registered.
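The overall matching flow can be sketched as follows. The sketch assumes that the relevance between two users is the weighted sum of the per-item similarities using the preset weights; the exact formula is given only as an image in the source, so this form, the function names, and the placeholder `exact_match` similarity are assumptions.

```python
def relevance(first_vec, other_vec, weights, item_similarity):
    """Assumed relevance: weighted sum of per-item similarities."""
    return sum(
        w * item_similarity(a, b)
        for w, a, b in zip(weights, first_vec, other_vec)
    )


def sites_to_report(first_vec, site_users, weights, threshold, item_similarity):
    """Return identifiers of websites holding a user whose relevance exceeds the threshold.

    site_users maps a website identifier to the list of n-dimensional
    vectors (lists of registration-item strings) of that website's users.
    """
    matched = []
    for site_id, user_vectors in site_users.items():
        if any(
            relevance(first_vec, vec, weights, item_similarity) > threshold
            for vec in user_vectors
        ):
            matched.append(site_id)
    return matched


def exact_match(a, b):
    """Placeholder per-item similarity: 1.0 for identical strings, else 0.0."""
    return 1.0 if a == b else 0.0
```

Passing the per-item similarity as a parameter lets the short-string and long-string comparisons be swapped in per registration item without changing the matching loop.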
Fig. 3 is a schematic diagram of a terminal device for big-data-based user identification according to an embodiment of the present invention. As shown in fig. 3, the terminal device 3 of this embodiment includes: a processor 30, a memory 31 and a computer program 32 stored in said memory 31 and executable on said processor 30, for example a user identification program based on big data. The processor 30, when executing the computer program 32, implements the steps of the embodiments of the big data based user identification method described above, such as steps 101 to 105 shown in fig. 1. Alternatively, the processor 30 may perform the functions of the modules/units of the apparatus embodiments described above, such as the functions of the modules 21 to 25 shown in fig. 2, when executing the computer program 32.
Illustratively, the computer program 32 may be partitioned into one or more modules/units that are stored in the memory 31 and executed by the processor 30 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program 32 in the terminal device 3.
The terminal device 3 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The big data based user identification terminal device may include, but is not limited to, a processor 30 and a memory 31. It will be appreciated by those skilled in the art that fig. 3 is merely an example of the big data based user identification terminal device 3 and does not constitute a limitation of the terminal device 3, which may comprise more or fewer components than shown, may combine certain components, or may use different components; e.g. the big data based user identification terminal device may further comprise input and output devices, network access devices, buses, etc.
The processor 30 may be a central processing unit (Central Processing Unit, CPU), or may be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 31 may be an internal storage unit of the terminal device 3, such as a hard disk or a memory of the terminal device 3. The memory 31 may be an external storage device of the terminal device 3, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal device 3. Further, the memory 31 may also include both an internal storage unit and an external storage device of the terminal device 3. The memory 31 is used for storing the computer program and other programs and data required for the big data based user identification terminal device. The memory 31 may also be used for temporarily storing data that has been output or is to be output.
The embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the steps of the user identification method based on big data in any embodiment when being executed by a processor.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied, in essence or in whole or in part, in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, an optical disk, or other various media capable of storing program code.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (10)

1. The user identification method based on big data is characterized by comprising the following steps:
for any website in a preset website list, calling an API of the website and acquiring registration information of all users of the website through a web crawler, wherein, for any user, the registration information of the user comprises n registration items, and n is greater than or equal to 2;
aiming at any user, according to n registration items of the user, acquiring an n-dimensional vector corresponding to the user, wherein the value of the ith bit of the n-dimensional vector is a character string corresponding to the ith registration item of the user, i is more than or equal to 1 and less than or equal to n;
acquiring n registration items of a first user, and acquiring n-dimensional vectors corresponding to the first user according to the n registration items of the first user;
for any website in the preset website list, performing relevance matching with n-dimensional vectors corresponding to each user in the website sequentially through n-dimensional vectors corresponding to the first user, and obtaining relevance of each user in the first user and the website sequentially;
and if the user with the correlation degree higher than the preset value exists in the website, the identification of the website is sent to the first user, so that the user can identify and manage the website.
2. The method for identifying a user based on big data as claimed in claim 1, wherein the sequentially obtaining the relevance of the first user and each user in the website comprises:
sequentially calculating the similarity of two character strings positioned at the same position in an n-dimensional vector corresponding to the user and an n-dimensional vector corresponding to the first user aiming at any user in the website to obtain n similarity values;
for any user m in the website, calculating the relevance between the first user and the user m from the preset weight of each of the n registration items according to the following formula:
Figure FDA0004027733970000011
wherein S is the relevance between the first user and the user m, w_i is the preset weight corresponding to the i-th of the n registration items, and the quantity shown in Figure FDA0004027733970000021 is the similarity of the i-th registration item in the n-dimensional vectors corresponding to the first user and the user m.
3. The big data based user identification method of claim 2, further comprising:
if the character string length of the i-th registration item in the n-dimensional vector corresponding to the first user and the character string length of the i-th registration item in the n-dimensional vector corresponding to the user m are both larger than a first preset length, respectively extracting a key words of the i-th registration item of the first user and a key words of the i-th registration item of the user m through a term frequency-inverse document frequency (TF-IDF) algorithm, wherein a is greater than or equal to 2;
respectively acquiring a first word vector corresponding to the a key words of the i-th registration item of the first user and a second word vector corresponding to the a key words of the i-th registration item of the user m;
and obtaining the similarity of the ith registration item in the n-dimensional vector corresponding to the first user and the user m by calculating the cosine similarity of the first word vector and the second word vector.
4. The big data based user identification method of claim 2, further comprising:
if the character string length of the i-th registration item in the n-dimensional vector corresponding to the first user and the character string length of the i-th registration item in the n-dimensional vector corresponding to the user m are both smaller than or equal to the first preset length, calculating the similarity (Figure FDA0004027733970000022) of the i-th registration item in the n-dimensional vectors corresponding to the first user and the user m according to the following formula:
Figure FDA0004027733970000023
wherein |X| is the length of the character string of the i-th registration item of the first user, |Y| is the length of the character string of the i-th registration item of the user m, and |X∩Y| is the length of the character string obtained by intersecting, in character order, the character strings of the i-th registration items of the first user and the user m.
5. The big data based user identification method of any of claims 1-4, further comprising:
if the preset website list comprises a plurality of social networking sites and the first user is a registered user of the first social networking site, crawling the content published by the first user in the first social networking site within a preset time through a web crawler;
if the content contains the Uniform Resource Locator (URL), acquiring the content corresponding to the URL through the URL;
if the content corresponding to the URL is the content posted by the third user on the second social network site and the content posted by the first user on the first social network site is the same as the content corresponding to the URL, pushing the third user to the first user to instruct the first user to confirm whether the third user and the first user are the same user or not.
6. The big data based user identification method of claim 5, further comprising:
acquiring the avatars (profile pictures) of the first user and the third user, performing face recognition on the two avatars, and determining that the first user and the third user are the same user if the avatars show the same person; or acquiring a first geographic position of the first user when the first user publishes the content on the first social network site and a second geographic position of the third user when the third user publishes the content on the second social network site;
and if the distance between the first geographic position and the second geographic position on the map is smaller than the preset distance, judging that the first user and the third user are the same user.
7. The big data based user identification method of claim 6, further comprising:
if the first user and the fourth user are in a friend relationship on the first social network site, the third user and the fifth user are in a non-friend relationship on the second social network site, the first user and the third user are the same user, and the fourth user and the fifth user are the same user, pushing the fifth user to the third user on the second social network site.
8. A big data based user identification device, the device comprising: the device comprises a first acquisition unit, a second acquisition unit, a third acquisition unit and a pushing unit;
the first acquisition unit is used for calling an API (application program interface) of any website in a preset website list, acquiring registration information of all users of the website through a web crawler, wherein the registration information of the users comprises n registration items for any user, and n is more than or equal to 2;
the second obtaining unit is configured to obtain, for any user, according to n registration items of the user, an n-dimensional vector corresponding to the user, where a value of an ith bit of the n-dimensional vector is a string corresponding to the ith registration item of the user, where i is greater than or equal to 1 and less than or equal to n;
the second obtaining unit is further configured to obtain n registration items of a first user, and obtain an n-dimensional vector corresponding to the first user according to the n registration items of the first user;
the third obtaining unit is configured to, for any website in the preset website list, sequentially perform correlation matching with n-dimensional vectors corresponding to each user in the website through n-dimensional vectors corresponding to the first user, and sequentially obtain correlation between the first user and each user in the website;
the pushing unit is configured to send, if a user with a relevance to the first user being higher than a preset value exists in the website, an identifier of the website to the first user, so that the user identifies and manages the website.
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 7.
10. A terminal device, characterized in that it comprises a memory, a processor, on which a computer program is stored which is executable on the processor, the processor executing the computer program to carry out the steps of the method according to any one of claims 1 to 7.
CN201910039490.9A 2019-01-16 2019-01-16 User identity recognition method based on big data and terminal equipment Active CN109962907B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910039490.9A CN109962907B (en) 2019-01-16 2019-01-16 User identity recognition method based on big data and terminal equipment


Publications (2)

Publication Number Publication Date
CN109962907A CN109962907A (en) 2019-07-02
CN109962907B true CN109962907B (en) 2023-04-21

Family

ID=67023437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910039490.9A Active CN109962907B (en) 2019-01-16 2019-01-16 User identity recognition method based on big data and terminal equipment

Country Status (1)

Country Link
CN (1) CN109962907B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111314496B (en) * 2020-05-15 2020-08-11 太平金融科技服务(上海)有限公司 Registration request intercepting method and device, computer equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750464A (en) * 2012-02-06 2012-10-24 青岛印象派信息技术有限公司 Verification code method based on user identification

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102073822A (en) * 2011-01-30 2011-05-25 北京搜狗科技发展有限公司 Method and system for preventing user information from leaking
CN102624643B (en) * 2011-08-05 2016-06-15 小米科技有限责任公司 A kind of method expanding contact people
CN102316167B (en) * 2011-09-26 2013-11-06 中国科学院计算机网络信息中心 Website recommending method, system thereof and network server
WO2015139500A1 (en) * 2014-03-18 2015-09-24 北京奇虎科技有限公司 Website analyzing and identifying method and device
CN106202297A (en) * 2016-06-30 2016-12-07 北京奇虎科技有限公司 Identify the method and device of user interest
CN108829838B (en) * 2018-06-19 2021-11-26 彭建超 Batch processing method of account information and server
CN109063142B (en) * 2018-08-06 2021-03-05 网宿科技股份有限公司 Webpage resource pushing method, server and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
China's CDM Policies and Their Development Implications:Major Concerns for CDM Implementation.《中国人口·资源与环境(英文版)》.2006,(第02期), *

Also Published As

Publication number Publication date
CN109962907A (en) 2019-07-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant