CN110752958A - User behavior analysis method, device, equipment and storage medium - Google Patents

Info

Publication number
CN110752958A
Authority
CN
China
Prior art keywords
user
data
target
application program
identity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911048809.0A
Other languages
Chinese (zh)
Inventor
陈大伟
汪明玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201911048809.0A
Publication of CN110752958A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the disclosure provides a user behavior analysis method, a device, equipment and a storage medium, wherein the method comprises the following steps: receiving target data in at least one target application program sent by each user terminal in a plurality of user terminals; performing data processing on the target data in multiple dimensions to obtain the matching degree of the target data in each dimension; determining a plurality of identity identifications belonging to the same user in each target application program according to the matching degree of the target data on each dimension; and determining user behavior data of the user corresponding to the plurality of identity identifications on each target application program according to the plurality of identity identifications. The embodiment of the disclosure can solve the problem that the user behavior analysis method in the prior art cannot sufficiently analyze the characteristics of the user.

Description

User behavior analysis method, device, equipment and storage medium
Technical Field
The embodiment of the disclosure relates to the technical field of data processing, and in particular, to a user behavior analysis method, device, equipment and storage medium.
Background
With the continuous development of network and information technology, a wide variety of platforms have come into use. To improve a platform's functions, provide better service to its users and protect the platform's interests, various information about the users needs to be known and analyzed.
At present, content application programs pay close attention to original-content users and their text data. An original-content user may be active in several application programs at the same time, that is, the user may adopt different identity identifications in different application programs. These identity identifications are generally not associated with one another, which hinders supervision of the application programs and makes it much harder to understand and analyze the users' information. Establishing an association relationship among the multiple identity identifications of the same user across platforms is therefore important for associating those identity identifications and, in turn, for analyzing the user's behavior.
However, existing user behavior analysis methods mainly construct user behavior data from user metadata, such as gender, age, occupation, zodiac sign, height, weight, shopping type, brand preference and/or income. Such methods cannot sufficiently analyze the characteristics of the user.
Disclosure of Invention
The embodiment of the disclosure provides a user behavior analysis method, a user behavior analysis device and a storage medium, so as to solve the problem that the user behavior analysis method in the prior art cannot sufficiently analyze the characteristics of a user.
In a first aspect, an embodiment of the present disclosure provides a user behavior analysis method, including:
receiving target data in at least one target application program sent by each user terminal in a plurality of user terminals;
performing data processing on the target data in multiple dimensions to obtain the matching degree of the target data in each dimension;
determining a plurality of identity identifications belonging to the same user in each target application program according to the matching degree of the target data on each dimension;
and determining user behavior data of the user corresponding to the plurality of identity identifications on each target application program according to the plurality of identity identifications.
In a second aspect, an embodiment of the present disclosure provides a user behavior analysis apparatus, including:
the target data receiving module is used for receiving target data in at least one target application program sent by each user terminal in the plurality of user terminals;
the first matching degree determining module is used for carrying out data processing on the target data on multiple dimensions to obtain the matching degree of the target data on each dimension;
the first user identity identification determining module is used for determining a plurality of identity identifications belonging to the same user in each target application program according to the matching degree of the target data on each dimension;
and the first user behavior data determining module is used for determining the user behavior data of the user corresponding to the plurality of identity identifications on each target application program according to the plurality of identity identifications.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor, a memory, and a communication interface;
the communication interface is used for communicating with each user terminal;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the user behavior analysis method as described above in the first aspect and in various possible designs of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium, where computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the user behavior analysis method according to the first aspect and various possible designs of the first aspect is implemented.
In the user behavior analysis method, device, equipment and storage medium provided by the embodiment of the disclosure, target data in at least one target application program, sent by each user terminal in a plurality of user terminals, is first received and analyzed. The target data is subjected to data processing in multiple dimensions to obtain the matching degree of the target data in each dimension, so that the characteristics of the user can be fully extracted through the Internet. Then, according to the matching degree of the target data in each dimension, a plurality of identity identifications belonging to the same user in each target application program are determined, which amounts to clustering the users across the target application programs. Finally, according to the plurality of identity identifications, user behavior data of the corresponding user on each target application program is determined, so as to supervise and manage users on the application program or platform. Because the matching degrees in multiple dimensions are combined when determining the identity identifications belonging to the same user, the characteristics of the user are fully taken into account. By associating the multiple identity identifications of the same user across platforms or application programs, user behavior can be analyzed promptly and accurately, and each platform or user can be better supervised and managed.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic structural diagram of a user behavior analysis system provided in an embodiment of the present disclosure;
Fig. 2 is a schematic flow chart of a user behavior analysis method according to an embodiment of the present disclosure;
Fig. 3 is a schematic flow chart of a user behavior analysis method according to another embodiment of the present disclosure;
Fig. 4 is a schematic flow chart of a user behavior analysis method according to another embodiment of the present disclosure;
Fig. 5 is a schematic flow chart of a user behavior analysis method according to yet another embodiment of the present disclosure;
Fig. 6 is a schematic flow chart of a user behavior analysis method according to another embodiment of the present disclosure;
Fig. 7 is a block diagram of the structure of the user behavior analysis apparatus according to the embodiment of the present disclosure;
Fig. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will understand them to mean "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
At present, content application programs pay close attention to original-content users and their text data. An original-content user may be active in several application programs at the same time, that is, the user may adopt different identity identifications in different application programs. These identity identifications are generally not associated with one another, which hinders supervision of the application programs and makes it much harder to understand and analyze the users' information. To address this, the prior art proposes establishing user identity association in order to analyze user behavior. However, the prior art mainly constructs user behavior data from user metadata, such as gender, age, occupation, zodiac sign, height, weight, shopping type, brand preference and/or income. Such an approach can neither make full use of Internet information nor fully reflect the characteristics of the user, so application program supervision remains difficult and users' information remains hard to understand and analyze. The embodiment of the present disclosure provides a user behavior analysis method to solve the above problem.
Referring to Fig. 1, Fig. 1 is a schematic structural diagram of a user behavior analysis system according to an embodiment of the present disclosure. The system includes user terminals 10 and a terminal device (or server) 20. A plurality of users upload target data in at least one target application program through their own user terminals; one user may upload the same or different target data in one or more target application programs. The terminal device receives the target data in at least one target application program sent by each of the plurality of user terminals and stores it in a memory. The terminal device then sends a data processing instruction to its processor or to a server, so that the processor or server processes the stored target data: it performs data processing on the target data in a plurality of dimensions to obtain the matching degree of the target data in each dimension, determines a plurality of identity identifications belonging to the same user in each target application program according to those matching degrees, and determines, according to the plurality of identity identifications, the identity identification association relationship of the corresponding user across the target application programs. In this way the characteristics of the user are fully reflected when analyzing the behavior of the user corresponding to the plurality of identity identifications.
The user behavior analysis method can be realized in the following ways:
Referring to Fig. 2, Fig. 2 is a schematic flowchart of a user behavior analysis method according to an embodiment of the present disclosure. The method of the embodiment of the present disclosure may be applied in a terminal device or a server, that is, the execution subject may be the terminal device or the server. The user behavior analysis method comprises the following steps:
S101, receiving target data in at least one target application program sent by each user terminal in a plurality of user terminals.
In the embodiment of the present disclosure, the terminal device may receive, through the communication interface, target data in at least one target application (platform) sent by each of the plurality of user terminals, store the target data in a memory of the terminal device, and then send a processing instruction to its own processor or server, so that the processor or server may analyze and process the target data in the memory and the target application.
The target application program is not limited and may be an application program on any existing platform. The user terminal is likewise not limited and may be any terminal capable of uploading data, such as a mobile phone, a computer or a tablet. Each user terminal may upload target data in one or more target application programs, so statistics and analysis can be carried out over the plurality of target application programs corresponding to the plurality of user terminals. In the following, user behavior analysis is described by taking any one of the plurality of target application programs as an example.
S102, carrying out data processing on the target data in multiple dimensions to obtain the matching degree of the target data in each dimension.
In the embodiment of the present disclosure, target data in at least one target application program sent by each user terminal is subjected to data processing through multiple dimensions, where the target data may include published data, a user avatar, and a user nickname, and a specific process of the data processing may be different in each of the dimensions, for example, the multiple dimensions may include a published data dimension, an avatar dimension, and a nickname dimension, and may further include a communication number dimension.
The matching degree of the target data sent by the user terminal of each user can be obtained in each dimension. Here the matching degree is the degree of matching between the target data in one target application program sent by a user A through user terminal a and (i) the target data in other target application programs sent by the same user through the user terminal, and (ii) the target data in at least one target application program sent by other users (possibly one other user B, or several other users B, C, D and so on) through their corresponding user terminals.
S103, determining a plurality of identity identifications belonging to the same user in each target application program according to the matching degree of the target data in each dimension.
In the embodiment of the present disclosure, after the matching degree of the target data sent by each user terminal in each dimension is obtained, the matching degrees in the individual dimensions need to be either weighted or screened layer by layer, dimension by dimension, in order to accurately determine the characteristics of the user and so obtain a plurality of identity identifications belonging to the same user across all the target application programs.
For example, suppose three items of target data are uploaded in target application program A (target data 1, target data 2 and target data 3) and four items in target application program B (target data 4, target data 5, target data 6 and target data 7). Among target data 4 to 7, the items uploaded through a user terminal by the same user who uploaded target data 1 to 3 are identified. This yields the multiple identity identifications of the same user in target application program A and target application program B, that is, it determines which target data user A uploaded on target application program A and on target application program B under different user names, thereby associating the identity identifications of the same user.
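The two ways of combining per-dimension matching degrees mentioned above (weighting, or screening layer by layer) can be illustrated with a minimal Python sketch. The dimension names, the weights and the thresholds are assumptions made for illustration only; the source does not give concrete values, and the function names are hypothetical.

```python
# Hypothetical weights per dimension; the source does not specify any values.
DEFAULT_WEIGHTS = {"published": 0.6, "avatar": 0.2, "nickname": 0.2}


def weighted_match(scores, weights=None):
    """Combine per-dimension matching degrees (each assumed to lie in [0, 1])
    into one score, normalizing by the weights of the dimensions present."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    total = sum(weights[d] for d in scores if d in weights)
    if total == 0:
        return 0.0
    return sum(weights[d] * s for d, s in scores.items() if d in weights) / total


def layered_match(scores, thresholds):
    """Layer-by-layer screening: check the dimensions in the given order and
    reject the candidate pair as soon as one dimension misses its threshold."""
    for dim, minimum in thresholds:
        if scores.get(dim, 0.0) < minimum:
            return False
    return True


pair_scores = {"published": 1.0, "avatar": 0.5, "nickname": 0.0}
combined = weighted_match(pair_scores)
passes = layered_match(pair_scores, [("published", 0.5), ("avatar", 0.4)])
```

Weighting lets a strong match in one dimension compensate for a weak one, whereas layered screening requires every listed dimension to pass; which of the two suits a given platform is a design choice the source leaves open.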
And S104, determining user behavior data of the user corresponding to the plurality of identity identifications on each target application program according to the plurality of identity identifications.
In the embodiment of the disclosure, after the plurality of identity identifications belonging to the same user in each target application program are obtained, an identity identification relationship is established among them. The identity identification relationship of the same user across different target application programs is taken as the user behavior data, and the user's behavior on the different application programs is analyzed through this relationship. The user behavior here may be at least one of a sharing behavior and a forwarding behavior.
In practical applications, taking sharing as an example, when user A shares content published on one application program to another application program, an authorized terminal device or server may obtain the user ID in the application program before sharing and the user ID in the other application program after sharing, thereby associating the platform where user A is located with the platform shared to. Indirect association is also possible: if the user associated with user A of platform A (a platform can be regarded as an application program) on platform B is user A′, and the user associated with user A′ of platform B on platform C is user A″, then the identity identification association relationship among user A, user A′ and user A″ can be established across platform A, platform B and platform C. The platform uses this relationship to manage and analyze user behavior, so that users on the application program or platform can be supervised and managed.
Specifically, published data (articles, pictures, emoticons and the like) and basic data (avatars, nicknames and the like) of a user are collected on a plurality of platforms, and the matching degrees of the user data (the target data, including the published data and the basic data) across the platforms are obtained in a plurality of different dimensions (for example, the degree of coincidence of the texts the user has published on the platforms). The matching degrees in these dimensions are then combined to identify the identity identifications of the same user on each platform. The identity identification association relationship of the user among the platforms is then established, which makes it possible to analyze the behavior of the same user on the platforms.
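The indirect association described above (user A on platform A, user A′ on platform B, user A″ on platform C) is a transitive relationship, and one way to maintain it is a union-find structure over (platform, user ID) pairs. The sketch below is an illustration under that assumption, not the method prescribed by the source; the class name `IdentityLinker` and the sample IDs are hypothetical.

```python
from collections import defaultdict


class IdentityLinker:
    """Union-find over (platform, user_id) pairs: direct links imply
    indirect (transitive) associations between identities."""

    def __init__(self):
        self._parent = {}

    def _find(self, node):
        # Register unseen identities as their own group, then walk to the
        # root while shortening the path (path halving).
        self._parent.setdefault(node, node)
        while self._parent[node] != node:
            self._parent[node] = self._parent[self._parent[node]]
            node = self._parent[node]
        return node

    def link(self, first, second):
        """Record that two identities belong to the same user."""
        root_first, root_second = self._find(first), self._find(second)
        if root_first != root_second:
            self._parent[root_second] = root_first

    def same_user(self, first, second):
        return self._find(first) == self._find(second)

    def groups(self):
        """Return the sets of identities currently attributed to one user."""
        result = defaultdict(set)
        for node in list(self._parent):
            result[self._find(node)].add(node)
        return list(result.values())


linker = IdentityLinker()
linker.link(("platform_A", "a"), ("platform_B", "a_prime"))         # direct link A to B
linker.link(("platform_B", "a_prime"), ("platform_C", "a_second"))  # direct link B to C
indirect = linker.same_user(("platform_A", "a"), ("platform_C", "a_second"))
```

After the two direct links, the identities on platform A and platform C fall into the same group even though they were never linked directly.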
In the embodiment of the disclosure, first, target data in at least one target application program sent by each user terminal in a plurality of user terminals is received, the target data is analyzed, the target data is subjected to data processing in a plurality of dimensions, a matching degree of the target data in each dimension is obtained, characteristics of a user can be sufficiently extracted through the internet, then, according to the matching degree of the target data in each dimension, a plurality of identity identifiers belonging to the same user in each target application program are determined, clustering is performed on each user in each target application program, and then, according to the plurality of identity identifiers, an identity identifier association relationship of the user corresponding to the plurality of identity identifiers on each target application program is determined for analyzing user behaviors corresponding to the plurality of identity identifiers, and supervision and management of the user on an application program or a platform are realized.
According to the method and the device for analyzing the user behavior, the target data are subjected to data processing in multiple dimensions to obtain the matching degree of the target data in each dimension, then multiple identity identifications belonging to the same user in each target application program are determined according to the matching degree of the target data in each dimension, the characteristics of the user can be fully combined, the user behavior can be accurately analyzed, the identity identification association of the same user of each platform is realized, the user behavior can be timely and accurately analyzed, and each platform or user can be better supervised and managed.
In practical applications, the behavior of different users on each platform is analyzed and the identity identification association relationship of the same user on different platforms is established. The user's behavior can then be analyzed further according to this association relationship, and each platform or user can be better supervised and managed according to the resulting user behavior data.
In order to obtain the matching degree of the target data in each dimension, referring to fig. 3, fig. 3 is a schematic flow chart of a user behavior analysis method according to another embodiment of the present disclosure, and the embodiment of the present disclosure details S102 on the basis of the foregoing disclosed embodiment. The data processing of the target data in multiple dimensions to obtain the matching degree of the target data in each dimension includes:
S201, determining coincidence data and/or coincidence rate of the published data according to published data in the target data, and taking the coincidence data and/or coincidence rate as matching degree of the target data on the published data dimension.
In the embodiment of the disclosure, according to published data in the target data, coincidence data and/or coincidence rate of the published data are determined by counting coincidence of the published data on each target application program. Wherein, determining the coincidence data and/or coincidence rate of the published data can be achieved in two ways:
Method one: suppose a user has published several items of published data in a certain target application program. Taking that target application program as an example, for each item published there it is checked whether the same published data also appears among at least two items of published data on other target application programs. The items for which a repetition is found are counted, and the accumulated count is taken as the coincidence data of the user in the target application program. The ratio of the coincidence data to the number of items of published data is taken as the coincidence rate. The coincidence data, the coincidence rate, or a weighted value of the two is taken as the matching degree of the target data in the published data dimension.
Specifically, take published data 1, published data 2 and published data 3 published by user A on target application program A as an example. The text data of all users on target application program B is obtained. For each item published by user A on target application program A, starting with published data 1, it is checked whether the item coincides with the text data of any user on target application program B. If so, the coincidence data for that item is 1 and the identity identification of the coinciding user on target application program B is recorded. The same is done for the remaining items, and the per-item coincidence data is accumulated to give the total coincidence data for all of user A's published data.
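The counting in method one can be sketched in Python as follows. The sketch assumes that two texts "coincide" when they are equal after lowercasing and whitespace normalization; the source does not define the comparison, so this rule, the function names and the sample data are all illustrative assumptions.

```python
from collections import Counter


def normalize(text):
    """Assumed comparison rule: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def coincidence_stats(user_posts, other_platform_posts):
    """user_posts: texts published by one user on target application A.
    other_platform_posts: mapping of user ID on application B to that user's texts.
    Returns (coincidence data, coincidence rate, per-user coincidence counts)."""
    # Index every text on application B by its normalized form.
    index = {}
    for user_id, posts in other_platform_posts.items():
        for post in posts:
            index.setdefault(normalize(post), set()).add(user_id)

    coincidence = 0
    matched_users = Counter()
    for post in user_posts:
        owners = index.get(normalize(post))
        if owners:
            coincidence += 1             # this item counts 1 toward the total
            matched_users.update(owners)  # record who it coincided with
    rate = coincidence / len(user_posts) if user_posts else 0.0
    return coincidence, rate, matched_users


posts_on_a = ["Hello  World", "foo", "bar"]
posts_on_b = {"b1": ["hello world", "bar"], "b2": ["baz"]}
count, rate, matched = coincidence_stats(posts_on_a, posts_on_b)
```

In this sample, two of the three items published on application A reappear under user `b1` on application B, so `b1` is the candidate for the same user.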
The second method comprises the following steps: referring to fig. 4, fig. 4 is a schematic flow chart of a user behavior analysis method according to another embodiment of the present disclosure, and the embodiment of the present disclosure explains S201 in detail on the basis of the foregoing embodiment. The published data comprises at least one item of articles, pictures and videos, and the number of the published data is at least one; determining coincidence data and/or coincidence rate of the published data according to published data in the target data, including:
S301, determining a reference application program from the target application programs, wherein the reference application program is any one of the target application programs;
S302, clustering at least one published data corresponding to each user in the reference application program with at least one published data corresponding to each user in other target application programs in each target application program to obtain a clustering result of each published data corresponding to each user in the reference application program, wherein the clustering result of each published data comprises the identity identifications of a plurality of target users in other target application programs in each target application program;
S303, counting, according to the clustering results corresponding to each user in the reference application program, the number of times the identity identification of each target user among the identity identifications of the target users appears in those clustering results;
S304, taking the number of times as the coincidence data of at least one published data corresponding to each user on the reference application program;
S305, calculating the ratio of the coincidence data to the number of items of published data published on the reference application program, through the user terminal, by the user to whom the coincidence data corresponds, and taking the ratio as the coincidence rate of at least one published data corresponding to each user on the reference application program.
In this embodiment of the present disclosure, any one of the target application programs may be used as a reference application program, and a specific process of determining coincidence data and/or coincidence rate of published data, taking a plurality of target data sent by a certain user through a user terminal of the user on the reference application program as an example, is as follows:
firstly, clustering at least one publication data published by the user on a reference application program and at least one publication data published by each user on other target application programs, finding out publication data similar to the publication data published by each user on other target application programs on each publication data of the user on the reference application program, and taking all target data similar to each type on the reference application program and other target application programs as a clustering identifier, wherein the clustering identifier is used for representing an identity identification group of the users corresponding to the similar publication data, and one user on one target application program corresponds to the identity identification of one user. And then acquiring the identity of the user which appears most from the identity group of each user, taking the user corresponding to the identity of the user which appears most as a user similar to the user, and taking the number of times of the identity of the user which appears most as the coincidence data of at least one published data corresponding to the user on a reference application program, wherein the coincidence data refers to the coincidence data of all published data of the user acquired on the reference application program, and taking the ratio of the coincidence data to the total number of all published data of the user acquired on the reference application program as the coincidence rate. The coincidence data or the coincidence rate or the weighted value of the coincidence data and the coincidence rate can be used as the matching degree of the target data on the published data dimension.
Specifically, for example, whether users are the same user, or are suspected to be the same user (a determination that may be combined with other content), can be decided by checking whether the number of coincidences (the coincidence data) and/or the coincidence rate (the ratio of the number of coincidences to the total number of published items) of the published data is greater than or equal to a preset threshold; if so, the users are determined to be the same. The number of coincidences can be obtained as follows: take each item (or some of the items) published by user A on platform A, for example 100 articles, and cluster each of the 100 articles with the content published on platform B to obtain a clustering result for each article. Each clustering result contains the identities of several users on platform B, so the number of times the identity of each platform B user appears across the clustering results is taken as that user's number of coincidences.
S202, according to the user avatars in the target data, obtaining the similarity of the feature values of the user avatars between users through locality-sensitive hash calculation, and taking the similarity of the feature values of the user avatars as the matching degree of the target data in the avatar dimension;
S203, according to the nicknames in the target data, obtaining the similarity of the corresponding nicknames between users, and taking the similarity of the nicknames as the matching degree of the target data in the nickname dimension.
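The coincidence data and coincidence rate described above can be illustrated with a minimal sketch. It assumes the clustering step has already run and produced, for each item published by user A, the list of platform B user identities found in the same cluster; the function name `coincidence_stats`, the sample identities, and the choice to count a platform B user at most once per item are illustrative assumptions, not details given in the disclosure.

```python
from collections import Counter


def coincidence_stats(cluster_results):
    """Return {platform_b_user_id: (coincidence_count, coincidence_rate)}.

    cluster_results holds one entry per item that user A published on the
    reference platform; each entry lists the platform-B user IDs whose
    content fell into the same cluster as that item.
    """
    total_items = len(cluster_results)
    if total_items == 0:
        return {}
    counts = Counter()
    for matched_ids in cluster_results:
        # Assumption: count a platform-B user at most once per item of user A.
        counts.update(set(matched_ids))
    return {uid: (n, n / total_items) for uid, n in counts.items()}


# Hypothetical clustering output for four items published by user A.
results = [["b1", "b2"], ["b1"], [], ["b1", "b3"]]
stats = coincidence_stats(results)
# The identity that appears most often is the candidate "similar" user.
best = max(stats, key=lambda uid: stats[uid][0])
```

Either element of each tuple, or a weighted combination of the two, could then be compared against a preset threshold as the matching degree in the published data dimension.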
In the embodiment of the disclosure, the same user is identified by using the coincidence of the text published by the user in combination with other basic data of the user. The basic data includes at least one of an avatar, a nickname, and a communication number.
For the avatar dimension, the feature values of the user avatars are obtained through locality-sensitive hashing, and the similarity between the feature values is used as the matching degree. For the nickname dimension, the similarity between the user nicknames is obtained through a recognition technique, and the similarity between the user nicknames is used as the matching degree.
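A rough sketch of the two similarity measures follows. The disclosure does not specify which locality-sensitive hash or which nickname recognition technique is used, so both choices here are assumptions: a simple mean-threshold bit signature over grayscale pixel values stands in for the avatar hash, and `difflib.SequenceMatcher` from the Python standard library stands in for the nickname comparison. All function names are hypothetical.

```python
from difflib import SequenceMatcher


def average_hash(pixels):
    """Bit signature of a small grayscale image given as a flat list of values.

    Stand-in for the avatar feature value: each bit records whether a pixel
    is at least as bright as the image mean, so similar images tend to
    produce signatures that differ in only a few bits.
    """
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p >= mean else 0 for p in pixels)


def hash_similarity(h1, h2):
    """Fraction of matching bits (1.0 means identical signatures)."""
    if len(h1) != len(h2):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(h1, h2)) / len(h1)


def nickname_similarity(a, b):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical 2x2 grayscale avatars flattened to four pixel values.
avatar_a = average_hash([10, 200, 30, 220])
avatar_b = average_hash([250, 240, 20, 245])
avatar_match = hash_similarity(avatar_a, avatar_b)
nickname_match = nickname_similarity("Alice", "alice")
```

In practice the avatar would first be resized to a fixed small grid so that every signature has the same length; that preprocessing is omitted here.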
By analyzing the target data in multiple dimensions, the characteristics of the user can be fully extracted, which makes it convenient to establish the association relationship among the multiple identities of the same user across multiple platforms and, in turn, to monitor, manage, and analyze the platforms.
Determining the multiple identities that belong to the same user in each target application program can be implemented in the following two modes:
Mode one is shown in Fig. 5. Fig. 5 is a schematic flow chart of a user behavior analysis method according to still another embodiment of the present disclosure. On the basis of the foregoing embodiments, for example the embodiment described in Fig. 4, this embodiment describes S103 in detail. Determining, according to the matching degree of the target data in each dimension, a plurality of identities belonging to the same user in each target application program includes:
S401, if the coincidence data is used as the matching degree of the target data in the published data dimension, and the coincidence data is greater than a first preset threshold, taking the identity of the user on the other target application programs corresponding to the coincidence data and the identity of the user on the reference application program corresponding to the coincidence data as a plurality of identities of the same user.
In the embodiment of the disclosure, for the published data dimension, if the coincidence data is selected as the matching degree of the target data in the published data dimension, it is determined whether the coincidence data is greater than the first preset threshold. If it is, the user on the target application program (that is, on platform B) whose coincidence data is greater than the first preset threshold is taken to be the same target user as user A. Alternatively, if the identities of several users have coincidence data greater than the first preset threshold, the user on platform B corresponding to the largest coincidence data among them is taken to be the same target user as user A. For example, if the users on platform B whose coincidence data with user A is greater than the first preset threshold are identified by user identity 1 and user identity 2, and the coincidence data corresponding to user identity 1 is greater than that corresponding to user identity 2, then the user corresponding to user identity 1 is the same target user as user A.
S402, if the coincidence rate is used as the matching degree of the target data on the dimension of the published data, and the coincidence rate is larger than a second preset threshold value, taking the identity of the user on other target application programs in each target application program corresponding to the coincidence rate and the identity of the user on the reference application program corresponding to the coincidence rate as a plurality of identities of the same user.
In the embodiment of the present disclosure, if the coincidence rate is used as the matching degree of the target data in the published data dimension, it is determined whether the coincidence rate is greater than the second preset threshold. If it is, the user on the target application program (that is, on platform B) whose coincidence rate is greater than the second preset threshold is taken to be the same target user as user A. Alternatively, if the identities of several users have coincidence rates greater than the second preset threshold, the user on platform B corresponding to the largest coincidence rate among them is taken to be the same target user as user A. For example, if the users on platform B whose coincidence rate with user A is greater than the second preset threshold are identified by user identity 1 and user identity 2, and the coincidence rate corresponding to user identity 1 is greater than that corresponding to user identity 2, then the user corresponding to user identity 1 is the same target user as user A.
And S403, if the coincidence data is less than or equal to a first preset threshold value and/or the coincidence rate is less than or equal to a second preset threshold value, and the similarity of the feature values of the user avatar is greater than a third preset threshold value, taking the identity of the user on the reference application program corresponding to the similarity of the feature values of the user avatar and the identity of the user on other target application programs in each target application program corresponding to the similarity of the feature values of the user avatar as a plurality of identity of the same user.
In this embodiment of the disclosure, for the avatar dimension, if the coincidence data is less than or equal to the first preset threshold and/or the coincidence rate is less than or equal to the second preset threshold, and the similarity of the feature values of the user avatar is greater than the third preset threshold, the user on the target application program (that is, on platform B) whose avatar feature value similarity is greater than the third preset threshold is taken to be the same target user as user A. Alternatively, if the identities of several users have avatar feature value similarities greater than the third preset threshold, the user on platform B corresponding to the largest similarity among them is taken to be the same target user as user A. For example, if the users on platform B whose avatar feature value similarity with user A is greater than the third preset threshold are identified by user identity 1 and user identity 2, and the similarity corresponding to user identity 1 is greater than that corresponding to user identity 2, then the user corresponding to user identity 1 is the same target user as user A.
S404, if the coincidence data is smaller than or equal to a first preset threshold and/or the coincidence rate is smaller than or equal to a second preset threshold, the similarity of the feature values of the user head portrait is smaller than or equal to a third preset threshold, and the similarity of the user nickname is larger than a fourth preset threshold, taking the identity of the user on other target application programs in each target application program corresponding to the similarity of the user nickname and the identity of the user on the reference application program corresponding to the similarity of the user nickname as a plurality of identities of the same user.
In this embodiment of the disclosure, for the nickname dimension, if the coincidence data is less than or equal to the first preset threshold and/or the coincidence rate is less than or equal to the second preset threshold, the similarity of the feature values of the user avatar is less than or equal to the third preset threshold, and the similarity of the user nickname is greater than the fourth preset threshold, the user on the target application program (that is, on platform B) whose nickname similarity is greater than the fourth preset threshold is taken to be the same target user as user A. Alternatively, if the identities of several users have nickname similarities greater than the fourth preset threshold, the user on platform B corresponding to the largest nickname similarity among them is taken to be the same target user as user A. For example, if the users on platform B whose nickname similarity with user A is greater than the fourth preset threshold are identified by user identity 1 and user identity 2, and the nickname similarity corresponding to user identity 1 is greater than that corresponding to user identity 2, then the user corresponding to user identity 1 is the same target user as user A.
In other words, mode one works by layer-by-layer screening. For example, candidate users are first screened out by the coincidence of the text data, and the target user on platform B who is the same as user A is then further determined by the matching degree of the basic data. The multiple identities of the same user on each platform or application program are thereby associated, so that user behavior can be analyzed accurately and each platform or user can be better supervised and managed.
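The layer-by-layer screening of S401 to S404 can be sketched as a cascade that falls through to the next dimension only when no candidate clears the current threshold. This is a simplified illustration under stated assumptions: the function name, the dictionary keys, and the threshold values are hypothetical; the sketch checks either the coincidence data or the coincidence rate in the first layer (the disclosure allows "and/or"); and it uses the variant in which the candidate with the largest value above the threshold is selected.

```python
def match_by_screening(candidates, t1, t2, t3, t4, use_rate=False):
    """Return the platform-B user ID matched to user A, or None.

    candidates maps each platform-B user ID to its matching degrees against
    user A: {"count": ..., "rate": ..., "avatar": ..., "nickname": ...}.
    t1..t4 are the first to fourth preset thresholds.
    """
    first_layer = ("rate", t2) if use_rate else ("count", t1)
    layers = [first_layer, ("avatar", t3), ("nickname", t4)]
    for key, threshold in layers:
        passing = {uid: c[key] for uid, c in candidates.items()
                   if c[key] > threshold}
        if passing:
            # Several candidates may pass; keep the one with the largest value.
            return max(passing, key=passing.get)
    return None


# Hypothetical matching degrees of two platform-B users against user A.
candidates = {
    "b1": {"count": 2, "rate": 0.02, "avatar": 0.95, "nickname": 0.40},
    "b2": {"count": 1, "rate": 0.01, "avatar": 0.97, "nickname": 0.90},
}
by_avatar = match_by_screening(candidates, t1=5, t2=0.5, t3=0.9, t4=0.8)
by_count = match_by_screening(candidates, t1=1, t2=0.5, t3=0.9, t4=0.8)
```

With the first threshold set to 5, neither candidate passes the published data layer, so the decision falls to the avatar layer; lowering it to 1 lets the published data layer decide.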
Mode two is shown in Fig. 6. Fig. 6 is a schematic flow chart of a user behavior analysis method according to another embodiment of the present disclosure. On the basis of the foregoing embodiments, for example the embodiment described in Fig. 4, this embodiment describes S103 in detail. Determining, according to the matching degree of the target data in each dimension, a plurality of identities belonging to the same user in each target application program includes:
S501, if the coincidence rate corresponding to the target data is used as the matching degree of the target data in the published data dimension, performing weighted fusion on the coincidence rate, the similarity of the feature values of the user avatar, and the similarity of the user nickname to obtain a weighted matching degree;
S502, taking the identity of the user on the other target application programs corresponding to the maximum weighted matching degree and the identity of the user on the reference application program corresponding to the maximum weighted matching degree as a plurality of identities of the same user.
In the embodiment of the disclosure, the coincidence rate corresponding to the target data is selected as the matching degree of the target data in the published data dimension, and the matching degree between user A and each user on platform B is obtained by weighting the data of the dimensions, so that the platform B user with the highest matching degree is selected as the target user. Alternatively, the target user on platform B corresponding to user A is obtained by further clustering the data of the above dimensions, that is, by clustering in the user dimension; this implementation can meet the processing requirements of large-scale data.
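The weighted fusion of S501 and S502 can be sketched as a weighted sum over the three dimensions followed by a maximum. The weights below are placeholder values chosen only for illustration; the disclosure does not state how the weights are set, and the function name and dictionary keys are hypothetical.

```python
def weighted_match(candidates, weights=(0.5, 0.3, 0.2)):
    """Return (best_user_id, weighted_matching_degree), or None if empty.

    candidates maps each platform-B user ID to its coincidence rate, avatar
    similarity, and nickname similarity against user A. The weights apply
    to those three values in that order and are illustrative assumptions.
    """
    if not candidates:
        return None
    w_rate, w_avatar, w_nickname = weights
    scores = {
        uid: w_rate * c["rate"] + w_avatar * c["avatar"]
        + w_nickname * c["nickname"]
        for uid, c in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical per-dimension matching degrees against user A.
candidates = {
    "b1": {"rate": 0.8, "avatar": 0.5, "nickname": 0.5},
    "b2": {"rate": 0.2, "avatar": 0.9, "nickname": 0.9},
}
best_uid, best_score = weighted_match(candidates)
```

Unlike the layered screening of mode one, every dimension contributes to every decision here, so a strong avatar or nickname match can partly offset a low coincidence rate.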
In the embodiment of the disclosure, the published data (articles, pictures, expressions, and the like) and basic data (avatars, nicknames, communication numbers, and the like) of users are collected on a plurality of platforms, and the matching degrees of the user data on the plurality of platforms (here, the target data, including the published data and the basic data) are obtained in a plurality of different dimensions, for example the degree of coincidence of the text published by a user on the plurality of platforms. The identities of the same user on each platform are then identified by combining the matching degrees of the dimensions. Further, the association relationship between the identities of users across the platforms is established, the user behavior is analyzed through the association relationship among the identities of the same user, and the supervision and management of users on the application programs or platforms is realized.
Specifically, as to how the user behavior data of the user on each target application program is determined according to the plurality of identities, in an embodiment, S104 is described in detail on the basis of the above embodiments. Determining, according to the plurality of identities, the user behavior data of the user corresponding to the plurality of identities on each target application program includes:
establishing and storing an association table according to the plurality of identity identifications; wherein, the mapping relation among a plurality of identity identifications is stored in the association table; and determining user behavior data of the users corresponding to the plurality of identity identifications on each target application program according to the mapping relation.
In the embodiment of the present disclosure, one user on one platform corresponds to one user identity, and the mapping relationship between user identities may be the identity association relationship of the user. An association table is established for the identities of the same user on different target application programs; the association table stores the mapping relationship among the multiple identities of each user, and the user behavior data can be determined according to the mapping relationship. The behavior of the user can be further analyzed according to the identity association relationship, and each platform or user can be better supervised and managed according to the analyzed user behavior data. For example, if the associated user of user A on platform B is user A', and the associated user, on platform C, of user A' on platform B is user A'', the user identity association relationship among platform A, platform B, and platform C can be established.
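One possible in-memory form of the association table is sketched below. The disclosure only says that the table stores the mapping relationship among the identities; the class name `AssociationTable`, the representation of an identity as a `(platform, user_id)` tuple, and the per-identity behavior log are all assumptions made for illustration.

```python
class AssociationTable:
    """Maps every known identity to the set of identities of the same user."""

    def __init__(self):
        self._groups = {}  # (platform, user_id) -> shared set of identities

    def link(self, a, b):
        """Record that identities a and b belong to the same user."""
        merged = self.identities(a) | self.identities(b)
        for identity in merged:
            self._groups[identity] = merged

    def identities(self, identity):
        """Return all identities associated with the given identity."""
        return set(self._groups.get(identity, {identity}))


def collect_behavior(table, identity, logs):
    """Gather behavior records for every identity linked to the given one."""
    return {ident: logs.get(ident, []) for ident in table.identities(identity)}


# Hypothetical identities: A on platform A, A' on platform B, A'' on platform C.
table = AssociationTable()
table.link(("A", "a"), ("B", "a'"))
table.link(("B", "a'"), ("C", "a''"))

logs = {("A", "a"): ["post"], ("C", "a''"): ["comment", "like"]}
behavior = collect_behavior(table, ("A", "a"), logs)
```

Linking A with A' and then A' with A'' is enough to associate all three identities, which mirrors the platform A, B, and C example in the text.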
Fig. 7 is a block diagram of a user behavior analysis apparatus according to an embodiment of the present disclosure, which corresponds to the user behavior analysis method according to the embodiment of the present disclosure. For ease of illustration, only portions that are relevant to embodiments of the present disclosure are shown. Referring to fig. 7, the user behavior analysis device 70 includes: a target data receiving module 701, a first matching degree determining module 702, an identity identification determining module 703 of a first user, and a first user behavior data determining module 704; a target data receiving module 701, configured to receive target data in at least one target application sent by each user terminal in the multiple user terminals; a first matching degree determining module 702, configured to perform data processing on the target data in multiple dimensions, to obtain a matching degree of the target data in each dimension; an identity identifier determining module 703 of the first user, configured to determine, according to the matching degree of the target data in each dimension, multiple identity identifiers that belong to the same user in each target application program; a first user behavior data determining module 704, configured to determine, according to the multiple identity identifications, user behavior data of the users corresponding to the multiple identity identifications on each target application.
The target data receiving module 701, the first matching degree determining module 702, the first user identity determining module 703 and the first user behavior data determining module 704 provided in the embodiment of the present disclosure are configured to perform data processing on target data in multiple dimensions, obtain a matching degree of the target data in each dimension, and then determine multiple identities belonging to the same user in each target application according to the matching degree of the target data in each dimension, so as to sufficiently combine features of the user and further accurately analyze user behaviors, that is, by implementing multiple identity associations of the same user of each platform or application, user behaviors can be timely and accurately analyzed, and further, each platform or user can be better supervised and managed.
The apparatus provided in the embodiment of the present disclosure may be used to implement the technical solutions of the above method embodiments, and the implementation principles and technical effects are similar, which are not described herein again in the embodiment of the present disclosure.
In one embodiment of the present disclosure, the target data includes publication data, a user avatar, and a user nickname, and the plurality of dimensions includes a publication data dimension, an avatar dimension, and a nickname dimension.
In an embodiment of the present disclosure, on the basis of the above-described embodiments, for example the embodiment of Fig. 7, the first matching degree determining module 702 is described in detail. The first matching degree determining module 702 includes: a first matching degree determining unit, configured to determine the coincidence data and/or coincidence rate of the published data according to the published data in the target data, and to take the coincidence data and/or coincidence rate as the matching degree of the target data in the published data dimension; a second matching degree determining unit, configured to obtain, according to the user avatars in the target data, the similarity of the feature values of the user avatars between users through locality-sensitive hash calculation, and to use the similarity of the feature values of the user avatars as the matching degree of the target data in the avatar dimension; and a third matching degree determining unit, configured to obtain the similarity of the corresponding nicknames between users according to the nicknames in the target data, and to take the similarity of the nicknames as the matching degree of the target data in the nickname dimension.
In an embodiment of the present disclosure, on the basis of the above-mentioned disclosed embodiment, for example, on the basis of the embodiment of fig. 7, the embodiment of the present disclosure describes the first matching degree determining unit in detail. The published data comprises at least one item of articles, pictures and videos, and the number of the published data is at least one; the first matching degree determining unit is specifically configured to:
determining a reference application program from each target application program, wherein the reference application program is any one of the target application programs; clustering at least one published data corresponding to each user on the reference application program with at least one published data corresponding to each user on other target application programs in each target application program respectively to obtain a clustering result of each published data corresponding to each user on the reference application program, wherein the clustering result of each published data comprises the identity of a plurality of target users on other target application programs in each target application program; counting the times of the appearance of the identity of each target user in the identity identifications of the target users in each clustering result corresponding to each user in the reference application program according to each clustering result corresponding to each user in the reference application program; taking the times as coincidence data of at least one published data corresponding to each user on the reference application program; and making a ratio of the coincidence data to the number of at least one published data published by the user on the reference application program through the user terminal corresponding to the coincidence data, and taking the ratio as the coincidence rate of at least one published data corresponding to each user on the reference application program.
In an embodiment of the present disclosure, on the basis of the above-mentioned disclosed embodiment, for example, on the basis of the embodiment of fig. 7, the embodiment of the present disclosure describes in detail the identity determination module 703 of the first user. The identity module 703 of the first user is specifically configured to:
when the coincidence data is used as the matching degree of the target data on the published data dimension and is greater than a first preset threshold value, taking the identity of the user on other target application programs in each target application program corresponding to the coincidence data and the identity of the user on the reference application program corresponding to the coincidence data as a plurality of identities of the same user; when the coincidence rate is used as the matching degree of the target data on the published data dimension and is greater than a second preset threshold value, taking the identity of the user on other target application programs in each target application program corresponding to the coincidence rate and the identity of the user on the reference application program corresponding to the coincidence rate as a plurality of identities of the same user; when the coincidence data is smaller than or equal to a first preset threshold value and/or the coincidence rate is smaller than or equal to a second preset threshold value and the similarity of the feature values of the user avatar is larger than a third preset threshold value, taking the identity of the user on the other target application programs in each target application program corresponding to the similarity of the feature values of the user avatar and the identity of the user on the reference application program corresponding to the similarity of the feature values of the user avatar as a plurality of identities of the same user; and when the coincidence data is less than or equal to a first preset threshold value and/or the coincidence rate is less than or equal to a second preset threshold value, the similarity of the feature values of the user avatar is less than or equal to a third preset threshold value, and the similarity of the user nickname is greater than a fourth preset threshold value, taking the identity of the user on other target application programs corresponding to the similarity of the user nickname and the identity of the user on the reference application program corresponding to the similarity of the user nickname as a plurality of identities of the same user.
In an embodiment of the present disclosure, on the basis of the above-mentioned disclosed embodiment, for example, on the basis of the embodiment of fig. 7, the embodiment of the present disclosure describes in detail the identity determination module 703 of the first user. The identity module 703 of the first user is specifically configured to:
when the coincidence rate corresponding to the target data is taken as the matching degree of the target data on the published data dimension, carrying out weighted fusion on the coincidence rate, the similarity of the characteristic values of the user head portrait and the similarity of the user nickname to obtain the weighted matching degree; and taking the identity of the user on the other target application programs in each target application program corresponding to the weighted maximum matching degree and the identity of the user on the reference application program corresponding to the weighted maximum matching degree as a plurality of identities of the same user.
In an embodiment of the present disclosure, on the basis of the above-mentioned disclosed embodiment, for example, on the basis of the embodiment of fig. 7, the embodiment of the present disclosure describes the first user behavior data determining module 704 in detail. The first user behavior data determining module 704 is specifically configured to:
establishing and storing an association table according to the plurality of identity identifications, wherein the mapping relation among the plurality of identity identifications is stored in the association table; and determining user behavior data of the users corresponding to the plurality of identity identifications on each target application program according to the mapping relation.
Referring to Fig. 8, a schematic structural diagram of an electronic device 800 suitable for implementing an embodiment of the present disclosure is shown, where the electronic device 800 may be a terminal device or a server. The terminal device may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a Personal Digital Assistant (PDA), a tablet computer (PAD), a Portable Multimedia Player (PMP), or a vehicle-mounted terminal (e.g., a car navigation terminal), and a fixed terminal such as a digital TV or a desktop computer. The electronic device shown in Fig. 8 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in fig. 8, an electronic device 800 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 801 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage device 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the electronic apparatus 800 are also stored. The processing apparatus 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
Generally, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 807 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 808 including, for example, magnetic tape, hard disk, etc.; and a communication device 809. The communication means 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 illustrates an electronic device 800 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 809, or installed from the storage means 808, or installed from the ROM 802. The computer program, when executed by the processing apparatus 801, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the disclosed embodiments.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or any combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself; for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
The functions described above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a first aspect, an embodiment of the present disclosure provides a user behavior analysis method, including:
receiving target data in at least one target application program sent by each user terminal in a plurality of user terminals;
performing data processing on the target data in multiple dimensions to obtain the matching degree of the target data in each dimension;
determining a plurality of identity identifications belonging to the same user in each target application program according to the matching degree of the target data on each dimension;
and determining user behavior data of the user corresponding to the plurality of identity identifications on each target application program according to the plurality of identity identifications.
According to one or more embodiments of the present disclosure, the target data includes publication data, a user avatar, and a user nickname, and the plurality of dimensions includes a publication data dimension, an avatar dimension, and a nickname dimension.
According to one or more embodiments of the present disclosure, the performing data processing on the target data in multiple dimensions to obtain a matching degree of the target data in each dimension includes:
determining coincidence data and/or coincidence rate of the published data according to published data in the target data, and taking the coincidence data and/or coincidence rate as matching degree of the target data on the published data dimension;
obtaining, according to the user avatar in the target data, the similarity of the feature values of the user avatars between the users through locality-sensitive hash calculation, and taking the similarity of the feature values of the user avatar as the matching degree of the target data on the avatar dimension;
and according to the nickname in the target data, obtaining the similarity of the corresponding nickname between the users, and taking the similarity of the nickname as the matching degree of the target data on the nickname dimension.
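The avatar and nickname dimensions described above can be sketched as follows. This is only an illustration under stated assumptions: the disclosure names locality-sensitive hashing but does not fix a particular scheme, so the sketch swaps in random-hyperplane signatures over an avatar feature vector that is assumed to have been extracted already, and it uses Python's `difflib.SequenceMatcher` as one possible nickname similarity. The helper names `make_planes`, `lsh_signature`, `signature_similarity`, and `nickname_similarity` are hypothetical.

```python
import random
from difflib import SequenceMatcher


def make_planes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (hypothetical helper; seeded for repeatability)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(features, planes):
    """Return one bit per hyperplane: 1 if the feature vector is on its non-negative side."""
    return [
        1 if sum(f * p for f, p in zip(features, plane)) >= 0 else 0
        for plane in planes
    ]


def signature_similarity(sig_a, sig_b):
    """Fraction of signature bits on which two avatars agree (1.0 means identical bits)."""
    same = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return same / len(sig_a)


def nickname_similarity(name_a, name_b):
    """Character-level similarity in [0, 1]; case is ignored (an assumption)."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


if __name__ == "__main__":
    planes = make_planes(dim=3, n_bits=32)
    avatar_on_reference_app = lsh_signature([1.0, 2.0, 3.0], planes)
    avatar_on_other_app = lsh_signature([1.1, 1.9, 3.2], planes)
    print(signature_similarity(avatar_on_reference_app, avatar_on_other_app))
    print(nickname_similarity("dawei_chen", "DaweiChen"))
```

Feature vectors that point in nearly the same direction agree on most signature bits, so the bit-agreement fraction can serve as the avatar matching degree; any other locality-sensitive scheme could be substituted.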
According to one or more embodiments of the present disclosure, the publication data includes at least one of articles, pictures and videos, and the number of the publication data is at least one;
determining coincidence data and/or coincidence rate of the published data according to published data in the target data, including:
determining a reference application program from each target application program, wherein the reference application program is any one of the target application programs;
clustering at least one published data corresponding to each user on the reference application program with at least one published data corresponding to each user on other target application programs in each target application program respectively to obtain a clustering result of each published data corresponding to each user on the reference application program, wherein the clustering result of each published data comprises the identity of a plurality of target users on other target application programs in each target application program;
counting, according to each clustering result corresponding to each user on the reference application program, the number of times the identity identification of each target user appears among the identity identifications of the plurality of target users in those clustering results;
taking the number of times as the coincidence data of the at least one published data corresponding to each user on the reference application program;
and calculating the ratio of the coincidence data to the number of published data that the corresponding user published on the reference application program through the user terminal, and taking the ratio as the coincidence rate of the at least one published data corresponding to each user on the reference application program.
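The counting and ratio steps above can be sketched as below, assuming the clustering has already been done and yields, for each item published by one user on the reference application program, the set of target-user identity identifications it was clustered with. The function name `coincidence_stats` and the sample identifiers are hypothetical.

```python
from collections import Counter


def coincidence_stats(cluster_results):
    """Compute coincidence data and coincidence rate for one reference-app user.

    cluster_results holds one collection of target-user IDs per published item.
    """
    n_published = len(cluster_results)
    counts = Counter()
    for ids in cluster_results:
        counts.update(set(ids))  # count each target user at most once per item
    # Coincidence rate: occurrences divided by the number of published items.
    rates = {uid: count / n_published for uid, count in counts.items()}
    return counts, rates


if __name__ == "__main__":
    # Four items published on the reference app; IDs are from another app.
    results = [{"b1", "b2"}, {"b1"}, {"b1", "b3"}, set()]
    counts, rates = coincidence_stats(results)
    print(counts["b1"], rates["b1"])  # 3 0.75
```

Here the target user `b1` appears in three of the four clustering results, giving coincidence data of 3 and a coincidence rate of 0.75.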
According to one or more embodiments of the present disclosure, the determining, according to the matching degree of the target data in each dimension, a plurality of identifiers belonging to the same user in each of the target applications includes:
if the coincidence data is used as the matching degree of the target data on the published data dimension and the coincidence data is greater than a first preset threshold value, taking the identity identification of the user on the other target application programs corresponding to the coincidence data and the identity identification of the user on the reference application program corresponding to the coincidence data as a plurality of identity identifications of the same user;
if the coincidence rate is used as the matching degree of the target data on the published data dimension and the coincidence rate is greater than a second preset threshold value, taking the identity identification of the user on the other target application programs corresponding to the coincidence rate and the identity identification of the user on the reference application program corresponding to the coincidence rate as a plurality of identity identifications of the same user;
if the coincidence data is less than or equal to the first preset threshold value and/or the coincidence rate is less than or equal to the second preset threshold value, and the similarity of the feature values of the user avatar is greater than a third preset threshold value, taking the identity identification of the user on the other target application programs corresponding to the similarity of the feature values of the user avatar and the identity identification of the user on the reference application program corresponding to the similarity of the feature values of the user avatar as a plurality of identity identifications of the same user;
and if the coincidence data is less than or equal to the first preset threshold value and/or the coincidence rate is less than or equal to the second preset threshold value, the similarity of the feature values of the user avatar is less than or equal to the third preset threshold value, and the similarity of the user nickname is greater than a fourth preset threshold value, taking the identity identification of the user on the other target application programs corresponding to the similarity of the user nickname and the identity identification of the user on the reference application program corresponding to the similarity of the user nickname as a plurality of identity identifications of the same user.
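The threshold cascade above can be read as a fall-through check over the three dimensions. The following is a minimal sketch under assumptions: the threshold values are placeholders (the disclosure does not give any), the names `Thresholds` and `match_dimension` are hypothetical, and returning `None` when no dimension passes is an assumed behavior that the text does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    """The four preset thresholds; every value here is a placeholder."""
    coincidence: int = 3    # first preset threshold
    rate: float = 0.5       # second preset threshold
    avatar: float = 0.9     # third preset threshold
    nickname: float = 0.8   # fourth preset threshold


def match_dimension(coincidence, rate, avatar_sim, nickname_sim,
                    t: Thresholds = Thresholds(),
                    use_rate: bool = False) -> Optional[str]:
    """Return the first dimension that links the two identities, or None."""
    # The published data dimension uses either the count or the rate.
    if use_rate:
        published_ok = rate > t.rate
    else:
        published_ok = coincidence > t.coincidence
    if published_ok:
        return "published_data"
    if avatar_sim > t.avatar:       # fall back to the avatar dimension
        return "avatar"
    if nickname_sim > t.nickname:   # finally fall back to the nickname dimension
        return "nickname"
    return None                     # assumed: no link is recorded


if __name__ == "__main__":
    print(match_dimension(5, 0.1, 0.0, 0.0))    # published_data
    print(match_dimension(3, 0.9, 0.95, 0.0))   # avatar
```

All comparisons are strict, matching the "greater than" wording, so a value equal to its threshold falls through to the next dimension.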
According to one or more embodiments of the present disclosure, the determining, according to the matching degree of the target data in each dimension, a plurality of identifiers belonging to the same user in each of the target applications includes:
if the coincidence rate corresponding to the target data is used as the matching degree of the target data on the published data dimension, performing weighted fusion on the coincidence rate, the similarity of the feature values of the user avatar, and the similarity of the user nickname to obtain weighted matching degrees;
and taking the identity identification of the user on the other target application programs corresponding to the maximum weighted matching degree and the identity identification of the user on the reference application program corresponding to the maximum weighted matching degree as a plurality of identity identifications of the same user.
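The weighted-fusion variant can be sketched as a linear combination followed by selecting the candidate with the largest score. The linear form, the weights `(0.5, 0.3, 0.2)`, and the names `weighted_match` and `best_candidate` are assumptions made for illustration; the disclosure does not specify the fusion formula or its weights.

```python
def weighted_match(rate, avatar_sim, nickname_sim, weights=(0.5, 0.3, 0.2)):
    """Fuse the three per-dimension matching degrees into one score."""
    w_rate, w_avatar, w_nickname = weights
    return w_rate * rate + w_avatar * avatar_sim + w_nickname * nickname_sim


def best_candidate(candidates, weights=(0.5, 0.3, 0.2)):
    """Pick the target-user ID with the maximum weighted matching degree.

    candidates maps target_user_id -> (rate, avatar_sim, nickname_sim).
    """
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda uid: weighted_match(*candidates[uid], weights=weights),
    )


if __name__ == "__main__":
    candidates = {
        "b1": (0.75, 0.2, 0.1),  # strong overlap in published data only
        "b2": (0.25, 0.9, 0.9),  # similar avatar and nickname
    }
    print(best_candidate(candidates))                           # b2
    print(best_candidate(candidates, weights=(1.0, 0.0, 0.0)))  # b1
```

With the placeholder weights, `b2` scores 0.575 against 0.455 for `b1`; putting all the weight on the coincidence rate reverses the choice, which shows how strongly the result depends on the weights.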
According to one or more embodiments of the present disclosure, determining, according to the multiple identity identifications, user behavior data of the users corresponding to the multiple identity identifications on the respective target application programs includes:
establishing and storing an association table according to the plurality of identity identifications, wherein the mapping relation among the plurality of identity identifications is stored in the association table;
and determining user behavior data of the users corresponding to the plurality of identity identifications on each target application program according to the mapping relation.
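One way to picture the association table is a mapping from each (application program, identity identification) pair to a shared user key, which is then used to gather behavior records across application programs. This is a sketch under assumptions: the in-memory dictionary, the event tuple layout, and the names `build_association_table` and `behavior_by_user` are hypothetical, and a real system would presumably persist the table.

```python
from collections import defaultdict


def build_association_table(linked_groups):
    """Map every (app, identity) pair in a linked group to one shared user key."""
    table = {}
    for user_key, group in enumerate(linked_groups):
        for app_identity in group:
            table[app_identity] = user_key
    return table


def behavior_by_user(table, events):
    """Group (app, identity, action) events by the shared user key."""
    merged = defaultdict(list)
    for app, identity, action in events:
        user_key = table.get((app, identity))
        if user_key is not None:  # skip identities with no known mapping
            merged[user_key].append((app, action))
    return dict(merged)


if __name__ == "__main__":
    groups = [
        [("app_a", "a1"), ("app_b", "b7")],  # identities judged to be one user
        [("app_a", "a2")],
    ]
    events = [
        ("app_a", "a1", "like"),
        ("app_b", "b7", "share"),
        ("app_a", "a2", "comment"),
        ("app_b", "zz", "view"),
    ]
    table = build_association_table(groups)
    print(behavior_by_user(table, events))
```

The events of `a1` on `app_a` and `b7` on `app_b` end up under the same key, which is the cross-application view of one user's behavior that the mapping relation is meant to provide.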
In a second aspect, an embodiment of the present disclosure provides a user behavior analysis apparatus, including:
the target data receiving module is used for receiving target data in at least one target application program sent by each user terminal in the plurality of user terminals;
the first matching degree determining module is used for carrying out data processing on the target data on multiple dimensions to obtain the matching degree of the target data on each dimension;
the identity determination module of the first user is used for determining a plurality of identity identifications belonging to the same user in each target application program according to the matching degree of the target data on each dimension;
and the first user behavior data determining module is used for determining the user behavior data of the user corresponding to the plurality of identity identifications on each target application program according to the plurality of identity identifications.
According to one or more embodiments of the present disclosure, the target data includes publication data, a user avatar, and a user nickname, and the plurality of dimensions includes a publication data dimension, an avatar dimension, and a nickname dimension.
According to one or more embodiments of the present disclosure, the first matching degree determining module includes:
the first matching degree determining unit is used for determining coincidence data and/or coincidence rate of the published data according to published data in the target data, and taking the coincidence data and/or coincidence rate as the matching degree of the target data on the published data dimension;
a second matching degree determining unit, configured to obtain, according to a user avatar in the target data, a similarity of feature values of the user avatar between the users through locality sensitive hash calculation, and use the similarity of the feature values of the user avatar as a matching degree of the target data in the avatar dimension;
and the third matching degree determining unit is used for obtaining the similarity of the corresponding nicknames between the users according to the nicknames in the target data, and taking the similarity of the nicknames as the matching degree of the target data on the nickname dimension.
According to one or more embodiments of the present disclosure, the publication data includes at least one of articles, pictures and videos, and the number of the publication data is at least one;
the first matching degree determining unit is specifically configured to:
determining a reference application program from each target application program, wherein the reference application program is any one of the target application programs;
clustering at least one published data corresponding to each user on the reference application program with at least one published data corresponding to each user on other target application programs in each target application program respectively to obtain a clustering result of each published data corresponding to each user on the reference application program, wherein the clustering result of each published data comprises the identity of a plurality of target users on other target application programs in each target application program;
counting, according to each clustering result corresponding to each user on the reference application program, the number of times the identity identification of each target user appears among the identity identifications of the plurality of target users in those clustering results;
taking the number of times as the coincidence data of the at least one published data corresponding to each user on the reference application program;
and calculating the ratio of the coincidence data to the number of published data that the corresponding user published on the reference application program through the user terminal, and taking the ratio as the coincidence rate of the at least one published data corresponding to each user on the reference application program.
According to one or more embodiments of the present disclosure, the identity determination module of the first user is specifically configured to:
when the coincidence data is used as the matching degree of the target data on the published data dimension and is greater than a first preset threshold value, taking the identity identification of the user on the other target application programs corresponding to the coincidence data and the identity identification of the user on the reference application program corresponding to the coincidence data as a plurality of identity identifications of the same user;
when the coincidence rate is used as the matching degree of the target data on the published data dimension and is greater than a second preset threshold value, taking the identity identification of the user on the other target application programs corresponding to the coincidence rate and the identity identification of the user on the reference application program corresponding to the coincidence rate as a plurality of identity identifications of the same user;
when the coincidence data is less than or equal to the first preset threshold value and/or the coincidence rate is less than or equal to the second preset threshold value, and the similarity of the feature values of the user avatar is greater than a third preset threshold value, taking the identity identification of the user on the other target application programs corresponding to the similarity of the feature values of the user avatar and the identity identification of the user on the reference application program corresponding to the similarity of the feature values of the user avatar as a plurality of identity identifications of the same user;
and when the coincidence data is less than or equal to the first preset threshold value and/or the coincidence rate is less than or equal to the second preset threshold value, the similarity of the feature values of the user avatar is less than or equal to the third preset threshold value, and the similarity of the user nickname is greater than a fourth preset threshold value, taking the identity identification of the user on the other target application programs corresponding to the similarity of the user nickname and the identity identification of the user on the reference application program corresponding to the similarity of the user nickname as a plurality of identity identifications of the same user.
According to one or more embodiments of the present disclosure, the identity determination module of the first user is specifically configured to:
when the coincidence rate corresponding to the target data is used as the matching degree of the target data on the published data dimension, performing weighted fusion on the coincidence rate, the similarity of the feature values of the user avatar, and the similarity of the user nickname to obtain weighted matching degrees;
and taking the identity identification of the user on the other target application programs corresponding to the maximum weighted matching degree and the identity identification of the user on the reference application program corresponding to the maximum weighted matching degree as a plurality of identity identifications of the same user.
According to one or more embodiments of the present disclosure, the first user behavior data determining module is specifically configured to:
establishing and storing an association table according to the plurality of identity identifications, wherein the mapping relation among the plurality of identity identifications is stored in the association table;
and determining user behavior data of the users corresponding to the plurality of identity identifications on each target application program according to the mapping relation.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor, a memory, and a communication interface;
the communication interface is used for communicating with each user terminal;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the user behavior analysis method as described above in the first aspect and in various possible designs of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium, where computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the user behavior analysis method according to the first aspect and various possible designs of the first aspect is implemented.
The foregoing description is merely illustrative of the preferred embodiments of the disclosure and of the technical principles employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features having similar functions that are disclosed in (but not limited to) this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (10)

1. A user behavior analysis method, comprising:
receiving target data in at least one target application program sent by each user terminal in a plurality of user terminals;
performing data processing on the target data in multiple dimensions to obtain the matching degree of the target data in each dimension;
determining a plurality of identity identifications belonging to the same user in each target application program according to the matching degree of the target data on each dimension;
and determining user behavior data of the user corresponding to the plurality of identity identifications on each target application program according to the plurality of identity identifications.
2. The method of claim 1, wherein the target data comprises publication data, a user avatar, and a user nickname, and wherein the plurality of dimensions comprises a publication data dimension, an avatar dimension, and a nickname dimension.
3. The method of claim 2, wherein the performing data processing on the target data in multiple dimensions to obtain a matching degree of the target data in each dimension comprises:
determining coincidence data and/or coincidence rate of the published data according to published data in the target data, and taking the coincidence data and/or coincidence rate as matching degree of the target data on the published data dimension;
obtaining, according to the user avatar in the target data, the similarity of the feature values of the user avatars between the users through locality-sensitive hash calculation, and taking the similarity of the feature values of the user avatar as the matching degree of the target data on the avatar dimension;
and according to the nickname in the target data, obtaining the similarity of the corresponding nickname between the users, and taking the similarity of the nickname as the matching degree of the target data on the nickname dimension.
4. The method of claim 3, wherein the published data comprises at least one of articles, pictures, and videos, and the number of the published data is at least one;
determining coincidence data and/or coincidence rate of the published data according to published data in the target data, including:
determining a reference application program from each target application program, wherein the reference application program is any one of the target application programs;
clustering at least one published data corresponding to each user on the reference application program with at least one published data corresponding to each user on other target application programs in each target application program respectively to obtain a clustering result of each published data corresponding to each user on the reference application program, wherein the clustering result of each published data comprises the identity of a plurality of target users on other target application programs in each target application program;
counting, according to each clustering result corresponding to each user on the reference application program, the number of times the identity identification of each target user appears among the identity identifications of the plurality of target users in those clustering results;
taking the number of times as the coincidence data of the at least one published data corresponding to each user on the reference application program;
and calculating the ratio of the coincidence data to the number of published data that the corresponding user published on the reference application program through the user terminal, and taking the ratio as the coincidence rate of the at least one published data corresponding to each user on the reference application program.
5. The method according to claim 4, wherein the determining, according to the matching degree of the target data in each dimension, a plurality of identities belonging to the same user in each target application includes:
if the coincidence data is used as the matching degree of the target data on the published data dimension and the coincidence data is greater than a first preset threshold value, taking the identity identification of the user on the other target application programs corresponding to the coincidence data and the identity identification of the user on the reference application program corresponding to the coincidence data as a plurality of identity identifications of the same user;
if the coincidence rate is used as the matching degree of the target data on the published data dimension and the coincidence rate is greater than a second preset threshold value, taking the identity identification of the user on the other target application programs corresponding to the coincidence rate and the identity identification of the user on the reference application program corresponding to the coincidence rate as a plurality of identity identifications of the same user;
if the coincidence data is less than or equal to the first preset threshold value and/or the coincidence rate is less than or equal to the second preset threshold value, and the similarity of the feature values of the user avatar is greater than a third preset threshold value, taking the identity identification of the user on the other target application programs corresponding to the similarity of the feature values of the user avatar and the identity identification of the user on the reference application program corresponding to the similarity of the feature values of the user avatar as a plurality of identity identifications of the same user;
and if the coincidence data is less than or equal to the first preset threshold value and/or the coincidence rate is less than or equal to the second preset threshold value, the similarity of the feature values of the user avatar is less than or equal to the third preset threshold value, and the similarity of the user nickname is greater than a fourth preset threshold value, taking the identity identification of the user on the other target application programs corresponding to the similarity of the user nickname and the identity identification of the user on the reference application program corresponding to the similarity of the user nickname as a plurality of identity identifications of the same user.
6. The method according to claim 4, wherein the determining, according to the matching degree of the target data in each dimension, a plurality of identities belonging to the same user in each target application includes:
if the coincidence rate corresponding to the target data is used as the matching degree of the target data on the published data dimension, performing weighted fusion on the coincidence rate, the similarity of the feature values of the user avatar, and the similarity of the user nickname to obtain weighted matching degrees;
and taking the identity identification of the user on the other target application programs corresponding to the maximum weighted matching degree and the identity identification of the user on the reference application program corresponding to the maximum weighted matching degree as a plurality of identity identifications of the same user.
7. The method according to any one of claims 1-6, wherein the determining, according to the plurality of identities, user behavior data of the users corresponding to the plurality of identities on the respective target applications comprises:
establishing and storing an association table according to the plurality of identity identifications, wherein the mapping relation among the plurality of identity identifications is stored in the association table;
and determining user behavior data of the users corresponding to the plurality of identity identifications on each target application program according to the mapping relation.
8. A user behavior analysis apparatus, comprising:
the target data receiving module is used for receiving target data in at least one target application program sent by each user terminal in the plurality of user terminals;
the first matching degree determining module is used for carrying out data processing on the target data on multiple dimensions to obtain the matching degree of the target data on each dimension;
the identity determination module of the first user is used for determining a plurality of identity identifications belonging to the same user in each target application program according to the matching degree of the target data on each dimension;
and the first user behavior data determining module is used for determining the user behavior data of the user corresponding to the plurality of identity identifications on each target application program according to the plurality of identity identifications.
9. An electronic device, comprising: at least one processor, a memory, and a communication interface;
the communication interface is used for communicating with each user terminal;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the user behavior analysis method of any one of claims 1 to 7.
10. A computer-readable storage medium having computer-executable instructions stored therein, which, when executed by a processor, implement the user behavior analysis method of any one of claims 1 to 7.
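
For illustration only, the identity-linking and association-table steps recited in claims 6 and 7 could be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the dimension names, the weight values, the data layout, and every function name below are hypothetical, since the claims specify none of them.

```python
# Hypothetical per-dimension weights; the claims do not give dimension names or values.
WEIGHTS = {"device": 0.5, "network": 0.3, "content": 0.2}


def weighted_score(match_by_dim, weights=WEIGHTS):
    """Combine per-dimension matching degrees into a single weighted matching degree."""
    return sum(w * match_by_dim.get(dim, 0.0) for dim, w in weights.items())


def link_identities(reference_id, candidates, weights=WEIGHTS):
    """For each other target application, keep the identity with the highest weighted score.

    candidates maps application name -> {identity: {dimension: matching degree}},
    where each matching degree is measured against the reference application identity.
    """
    linked = {"reference_app": reference_id}
    for app, scored in candidates.items():
        linked[app] = max(scored, key=lambda ident: weighted_score(scored[ident], weights))
    return linked


def build_association_table(groups):
    """Association table: every (application, identity) pair maps to its whole identity group."""
    table = {}
    for group in groups:
        for app, ident in group.items():
            table[(app, ident)] = group
    return table


def behavior_for(table, app, ident, behavior_log):
    """Collect behavior records on every application for the user behind (app, ident)."""
    group = table[(app, ident)]
    return {a: behavior_log.get((a, i), []) for a, i in group.items()}


# Made-up example data: two candidate identities on one other application.
candidates = {
    "app_b": {
        "b_17": {"device": 1.0, "network": 0.5, "content": 0.0},
        "b_42": {"device": 0.0, "network": 1.0, "content": 1.0},
    },
}
group = link_identities("a_1", candidates)
table = build_association_table([group])
behavior_log = {
    ("reference_app", "a_1"): ["open", "search"],
    ("app_b", "b_17"): ["play"],
}
result = behavior_for(table, "app_b", "b_17", behavior_log)
print(result)
```

Keying the table by (application, identity) pairs lets a lookup start from any one of the linked identities, which is one plausible reading of the "mapping relation among the plurality of identity identifications"; other layouts, such as a single canonical user key, would serve equally well.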

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911048809.0A CN110752958A (en) 2019-10-29 2019-10-29 User behavior analysis method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110752958A true CN110752958A (en) 2020-02-04

Family

ID=69281402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911048809.0A Pending CN110752958A (en) 2019-10-29 2019-10-29 User behavior analysis method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110752958A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111866164A (en) * 2020-07-29 2020-10-30 钱秀英 Information acquisition system and method for data transmission among communication devices

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100306833A1 (en) * 2009-05-28 2010-12-02 International Business Machines Corporation Autonomous intelligent user identity manager with context recognition capabilities
CN107330091A (en) * 2017-07-04 2017-11-07 百度在线网络技术(北京)有限公司 Information processing method and device
CN108830052A (en) * 2018-05-25 2018-11-16 恒安嘉新(北京)科技股份公司 A kind of striding equipment Internet user's recognition methods based on AI
CN110046293A (en) * 2019-03-01 2019-07-23 清华大学 A kind of user identification relevancy method and device
CN110163611A (en) * 2019-03-18 2019-08-23 腾讯科技(深圳)有限公司 A kind of personal identification method, device and relevant device
CN110188276A (en) * 2019-05-31 2019-08-30 秒针信息技术有限公司 Data sending device, method, electronic equipment and computer readable storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111866164A (en) * 2020-07-29 2020-10-30 钱秀英 Information acquisition system and method for data transmission among communication devices
CN111866164B (en) * 2020-07-29 2021-05-07 广州伊智信息科技有限公司 Information acquisition system and method for data transmission among communication devices

Similar Documents

Publication Publication Date Title
CN110634047B (en) Method and device for recommending house resources, electronic equipment and storage medium
CN110390493B (en) Task management method and device, storage medium and electronic equipment
CN110781373B (en) List updating method and device, readable medium and electronic equipment
WO2023151589A1 (en) Video display method and apparatus, electronic device and storage medium
CN112311656A (en) Message aggregation and display method and device, electronic equipment and computer readable medium
CN111784712A (en) Image processing method, device, equipment and computer readable medium
CN111596991A (en) Interactive operation execution method and device and electronic equipment
CN110795554B (en) Target information analysis method, device, equipment and storage medium
CN110781066B (en) User behavior analysis method, device, equipment and storage medium
CN111209432A (en) Information acquisition method and device, electronic equipment and computer readable medium
CN111309496A (en) Method, system, device, equipment and storage medium for realizing delay task
CN110633383A (en) Method and device for identifying repeated house sources, electronic equipment and readable medium
CN111262744B (en) Multimedia information transmitting method, backup server and medium
CN113918659A (en) Data operation method and device, storage medium and electronic equipment
CN110752958A (en) User behavior analysis method, device, equipment and storage medium
CN111797353A (en) Information pushing method and device and electronic equipment
CN111552620A (en) Data acquisition method, device, terminal and storage medium
CN111832354A (en) Target object age identification method and device and electronic equipment
CN110941683B (en) Method, device, medium and electronic equipment for acquiring object attribute information in space
CN110780966B (en) Social interface processing method and device, electronic equipment and storage medium
CN110334763B (en) Model data file generation method, model data file generation device, model data file identification device, model data file generation apparatus, model data file identification apparatus, and model data file identification medium
CN113033680A (en) Video classification method and device, readable medium and electronic equipment
CN111343245A (en) Uploading line scheduling method and device, electronic equipment and readable storage medium
CN110634024A (en) User attribute marking method and device, electronic equipment and storage medium
CN113076195B (en) Object shunting method and device, readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200204