CN109088788B - Data processing method, device, equipment and computer readable storage medium - Google Patents

Info

Publication number
CN109088788B
Authority
CN
China
Prior art keywords
user
data
characteristic data
identity characteristic
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810752308.XA
Other languages
Chinese (zh)
Other versions
CN109088788A (en)
Inventor
袁晓静
翟京卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd
Priority to CN201810752308.XA
Publication of CN109088788A
Application granted
Publication of CN109088788B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/028 Capturing of monitoring data by filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2117 User registration

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a data processing method, a data processing device, data processing equipment and a computer readable storage medium. The method comprises the steps of respectively extracting first identity characteristic data of a first user and first identity characteristic data of a second user from first user data and second user data to be processed, wherein the first identity characteristic data comprises at least one type of identity information for uniquely identifying a user main body; determining whether the first user and the second user belong to the same user main body or not according to the first identity characteristic data of the first user and the second user; and if the first user and the second user belong to the same user main body, merging the first user data and the second user data, so that a plurality of user data of the same user main body are merged to form panoramic user characteristic data, and the data redundancy of the DPI system is reduced.

Description

Data processing method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of information data processing technologies, and in particular, to a data processing method, apparatus, device, and computer readable storage medium.
Background
Deep Packet Inspection (DPI) is a packet-based technology for inspecting and controlling application-layer traffic. It performs deep inspection and analysis of the information at different layers of a data packet to obtain application-layer information for the whole data stream or data packet, and then performs statistical analysis and control of traffic according to policies defined by the DPI system.
With the development of big data and Internet technology, a wide variety of applications have entered people's lives. Because applications impose no uniform requirements on user registration information, the same user may register different user identifiers with different applications, and different users may register the same user identifier with different applications. When an existing DPI system acquires user behavior data, it establishes separate user behavior data for each user of each application; as a result, a large amount of redundant data is stored and panoramic user characteristic data cannot be formed.
Disclosure of Invention
The invention provides a data processing method, a data processing device, data processing equipment and a computer readable storage medium, which are used for solving the problems that when the prior DPI system acquires behavior characteristic data of users, user behavior characteristic data corresponding to each user is established for each application, a large amount of redundant data is stored, and panoramic user characteristic data cannot be formed.
One aspect of the present invention provides a data processing method, including:
respectively extracting first identity characteristic data of a first user and first identity characteristic data of a second user from first user data and second user data to be processed, wherein the first identity characteristic data comprises at least one type of identity information used for uniquely identifying a user main body;
determining whether the first user and the second user belong to the same user subject according to the first identity characteristic data of the first user and the second user;
and if the first user and the second user belong to the same user main body, merging the first user data and the second user data.
Another aspect of the present invention provides a data processing apparatus comprising:
the data extraction module is used for respectively extracting first identity characteristic data of a first user and first identity characteristic data of a second user from first user data and second user data to be processed, wherein the first identity characteristic data comprises at least one type of identity information used for uniquely identifying a user main body;
the determining module is used for determining whether the first user and the second user belong to the same user main body according to the first identity characteristic data of the first user and the second user;
a processing module, configured to perform merging processing on the first user data and the second user data if it is determined that the first user and the second user belong to the same user subject.
Another aspect of the present invention provides a deep packet inspection device, including:
a memory, a processor, and a computer program stored on the memory and executable on the processor,
the processor, when running the computer program, implements the method described above.
Another aspect of the present invention provides a computer-readable storage medium storing a computer program,
which when executed by a processor implements the method described above.
According to the data processing method, the data processing device, the data processing equipment and the computer readable storage medium, first identity characteristic data of a first user and first identity characteristic data of a second user are respectively extracted from first user data and second user data to be processed, wherein the first identity characteristic data comprise at least one type of identity information used for uniquely identifying a user main body; determining whether the first user and the second user belong to the same user main body or not according to the first identity characteristic data of the first user and the second user; and if the first user and the second user belong to the same user main body, merging the first user data and the second user data, so that a plurality of user data of the same user main body are merged to form panoramic user characteristic data, and the data redundancy of the DPI system is reduced.
Drawings
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a data processing method according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a data processing apparatus according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a deep packet inspection device according to a fifth embodiment of the present invention.
The above drawings show specific embodiments of the invention, which are described in more detail below. The drawings and the description are not intended to limit the scope of the inventive concept in any way, but rather to illustrate the concept to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
In embodiments of the present invention, the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. In the description of the following examples, "plurality" means two or more unless specifically limited otherwise.
The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
Example one
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention. This embodiment provides a data processing method that addresses the following problem: when an existing DPI system acquires users' behavior characteristic data, it establishes separate user behavior data for each user of each application, so a large amount of redundant data is stored and panoramic user characteristic data cannot be formed. The method in this embodiment is applied to a deep packet inspection device, which may be the computer device on which a DPI system runs. In other embodiments of the present invention, the method may also be applied to other computer devices; this embodiment takes a deep packet inspection device as an example. As shown in Fig. 1, the method comprises the following specific steps:
step S101, first identity characteristic data of a first user and first identity characteristic data of a second user are respectively extracted from first user data and second user data to be processed, wherein the first identity characteristic data comprise at least one type of identity information used for uniquely identifying a user main body.
In this embodiment, the first user data and the second user data are user behavior data acquired by the DPI system for two different user accounts of one application, or user behavior data acquired for two different user accounts of two different applications.
In practical application, the first user data and the second user data to be processed may be designated by a technician, who specifies an application identifier and a user registration account, or may be the user data corresponding to any two user registration accounts among the user data acquired by the DPI system; this embodiment does not specifically limit the choice.
The first identity characteristic data comprises at least one type of identity information for uniquely identifying a user subject. Such identity information may include at least: an identity card number, a mobile phone number, an email address, and the like.
And step S102, determining whether the first user and the second user belong to the same user main body according to the first identity characteristic data of the first user and the second user.
The first identity characteristic data of each user comprises at least one type of identity information for uniquely identifying a user subject. If the first identity characteristic data of the first user and that of the second user have at least one type of such identity information in common, and any commonly included type is consistent between the two users, it can be determined that the first user and the second user belong to the same user subject.
If the first identity characteristic data of the first user and that of the second user have at least one type of such identity information in common, and any commonly included type is inconsistent between the two users, it can be determined that the first user and the second user do not belong to the same user subject.
If the first identity characteristic data of the first user and that of the second user have no type of such identity information in common, the first identity characteristic data alone can establish neither that the first user and the second user belong to the same user subject nor that they do not.
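The three outcomes of step S102 can be sketched as a small comparison function. This is a minimal illustration, not the patented implementation: the names `MatchResult` and `compare_first_identity`, the dictionary layout, and the identifier type keys are all hypothetical. The text does not say which rule wins when one shared identifier type matches and another does not; the sketch assumes a match is decisive.

```python
from enum import Enum
from typing import Mapping


class MatchResult(Enum):
    SAME = "same"            # a shared identifier type has equal values
    DIFFERENT = "different"  # shared identifier types exist, none are equal
    UNKNOWN = "unknown"      # no identifier type is present for both users


def compare_first_identity(a: Mapping[str, str], b: Mapping[str, str]) -> MatchResult:
    """Compare two users' first identity characteristic data.

    Keys are identifier types (hypothetical names such as "id_card",
    "phone", "email"); values are the identifier strings for that user.
    """
    shared_types = a.keys() & b.keys()
    if not shared_types:
        return MatchResult.UNKNOWN
    # Assumption: one consistent identifier is enough to declare a match.
    if any(a[t] == b[t] for t in shared_types):
        return MatchResult.SAME
    return MatchResult.DIFFERENT
```

Only the `SAME` outcome leads to the merge in step S103; the `UNKNOWN` outcome is the case in which the first identity characteristic data cannot settle the question either way.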
And step S103, if the first user and the second user belong to the same user main body, merging the first user data and the second user data.
And after determining that the first user and the second user belong to the same user main body, merging the first user data and the second user data.
Specifically, the merging the first user data and the second user data includes:
and generating a uniform user data identifier corresponding to the first user data and the second user data, removing redundant information in the first user data and the second user data, and generating more comprehensive user data corresponding to the user data identifier.
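As a rough illustration of the merging step, the sketch below combines two records under one newly generated identifier and stores each value only once. The record layout (field name mapped to a list of observed values), the function name `merge_user_data`, and the use of a random UUID as the unified user data identifier are assumptions made for the example; the text does not prescribe any of them.

```python
import uuid
from typing import Any, Dict, List


def merge_user_data(first: Dict[str, List[Any]], second: Dict[str, List[Any]]) -> Dict[str, Any]:
    """Merge two user data records under one newly generated identifier.

    Each record maps a field name to a list of observed values.
    Values that appear in both records are stored once.
    """
    merged: Dict[str, List[Any]] = {}
    for record in (first, second):
        for field_name, values in record.items():
            bucket = merged.setdefault(field_name, [])
            for value in values:
                if value not in bucket:  # drop redundant (duplicate) values
                    bucket.append(value)
    # Hypothetical choice: a random UUID serves as the unified identifier.
    return {"user_data_id": uuid.uuid4().hex, "fields": merged}
```

For example, merging `{"phone": ["138"], "apps": ["A"]}` with `{"phone": ["138"], "apps": ["B"]}` keeps a single phone entry and both application entries.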
The embodiment of the invention respectively extracts first identity characteristic data of a first user and second user from first user data and second user data to be processed, wherein the first identity characteristic data comprises at least one type of identity information for uniquely identifying a user main body; determining whether the first user and the second user belong to the same user main body or not according to the first identity characteristic data of the first user and the second user; and if the first user and the second user belong to the same user main body, merging the first user data and the second user data, so that a plurality of user data of the same user main body are merged to form panoramic user characteristic data, and the data redundancy of the DPI system is reduced.
Example two
Fig. 2 is a flowchart of a data processing method according to a second embodiment of the present invention. On the basis of the first embodiment, in this embodiment, if it is not determined that the first user and the second user belong to the same user subject, second identity characteristic data of the first user and the second user are respectively extracted from the first user data and the second user data to be processed, where the second identity characteristic data at least includes: home address, friend information, association relationship, and behavior characteristic data. The similarity between the second identity characteristic data of the first user and the second user is calculated and compared with a first preset threshold. If the similarity is greater than the first preset threshold, it is determined that the first user and the second user belong to the same user subject, and the first user data and the second user data are merged. If the similarity is less than or equal to the first preset threshold, it is compared with a second preset threshold, where the second preset threshold is smaller than the first preset threshold; if the similarity is greater than the second preset threshold, an association relationship is established between the first user data and the second user data.
As shown in fig. 2, the method comprises the following specific steps:
step S201, extracting first identity feature data of the first user and the second user from the first user data and the second user data to be processed, respectively, where the first identity feature data includes at least one kind of identity information for uniquely identifying a user principal.
In this embodiment, the first user data and the second user data are user behavior data acquired by the DPI system for two different user accounts of one application, or user behavior data acquired for two different user accounts of two different applications.
In practical application, the first user data and the second user data to be processed may be designated by a technician, who specifies an application identifier and a user registration account, or may be the user data corresponding to any two user registration accounts among the user data acquired by the DPI system; this embodiment does not specifically limit the choice.
The first identity characteristic data comprises at least one type of identity information for uniquely identifying a user subject. Such identity information may include at least: an identity card number, a mobile phone number, an email address, and the like.
Optionally, the first identity feature data of the first user and the first identity feature data of the second user may be extracted from the first user data and the second user data to be processed, respectively, and recorded in the data list.
Step S202, determining whether the first user and the second user belong to the same user subject according to the first identity characteristic data of the first user and the second user.
In this embodiment, whether the first user and the second user belong to the same user subject is determined according to the first identity feature data of the first user and the second user, which may be specifically implemented in the following manner:
judging whether any type of identity information is consistent between the first identity characteristic data of the first user and that of the second user; and if any type of identity information is consistent, determining that the first user and the second user belong to the same user subject.
The first identity characteristic data of each user comprises at least one type of identity information for uniquely identifying a user subject. If the first identity characteristic data of the first user and that of the second user have at least one type of such identity information in common, and any commonly included type is consistent between the two users, it can be determined that the first user and the second user belong to the same user subject.
If the first identity characteristic data of the first user and that of the second user have at least one type of such identity information in common, and any commonly included type is inconsistent between the two users, it can be determined that the first user and the second user do not belong to the same user subject.
If the first identity characteristic data of the first user and that of the second user have no type of such identity information in common, the first identity characteristic data alone can establish neither that the first user and the second user belong to the same user subject nor that they do not.
Step S203, if it is determined that the first user and the second user belong to the same user subject, merging the first user data and the second user data.
And after determining that the first user and the second user belong to the same user main body, merging the first user data and the second user data.
Specifically, the merging the first user data and the second user data includes:
and generating a uniform user data identifier corresponding to the first user data and the second user data, removing redundant information in the first user data and the second user data, and generating more comprehensive user data corresponding to the user data identifier.
Step S204, if the first user and the second user are not determined to belong to the same user subject, second identity characteristic data of the first user and second user are respectively extracted from the first user data and second user data to be processed.
The second identity characteristic data comprises at least: home address, friend information, association relationship, and behavior characteristic data. The association relationship may be mobile phone contact information. Optionally, the second identity characteristic data may further include an instant messaging account and the like.
Optionally, if it is not determined that the first user and the second user belong to the same user subject, before extracting second identity feature data of the first user and the second user from the first user data and the second user data to be processed, the method further includes:
respectively extracting registered accounts of a first user and a second user from first user data and second user data to be processed; judging whether the registered accounts of the first user and the second user are consistent; and if the registered accounts of the first user and the second user are consistent, then executing the subsequent step of respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
If the registered accounts of the first user and the second user are inconsistent, calculating the similarity of the registered accounts of the first user and the second user; judging whether the similarity of the registered accounts of the first user and the second user is greater than a third preset threshold value or not; and if the similarity of the registered accounts of the first user and the second user is greater than a third preset threshold, then executing a subsequent step of respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
The registered accounts of the first user and the second user are two character strings, so calculating the similarity between the registered accounts amounts to calculating the similarity between two character strings. For example, the two strings may be matched against each other, their longest matching substring determined, and the proportion that this longest matching substring occupies calculated and used as the similarity.
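The longest-matching-substring approach can be sketched with Python's standard `difflib` module. The text does not say what the proportion is measured against; dividing by the length of the longer account is an assumption of this example, as is the function name `account_similarity`.

```python
from difflib import SequenceMatcher


def account_similarity(a: str, b: str) -> float:
    """Longest common substring length divided by the longer account's length.

    The denominator is an assumption; the source only mentions
    "the proportion of the longest sub-string".
    """
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / max(len(a), len(b))
```

With the hypothetical accounts `zhang_san88` and `zhang_san`, the longest matching substring is `zhang_san` (9 characters) and the longer account has 11 characters, giving a similarity of 9/11. That value would then be compared with the third preset threshold.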
In addition, the third preset threshold may be set by a technician according to actual needs, and this embodiment is not specifically limited herein.
Step S205, calculating the similarity between the second identity characteristic data of the first user and the second user; and comparing the similarity between the second identity characteristic data of the first user and the second user with the first preset threshold value.
In this embodiment, the similarity between the second identity characteristic data of the first user and the second user may be calculated by any prior-art method for calculating the similarity between two users from their behavior data and attribute information; this embodiment does not specifically limit the method.
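Since the method is left open, the following is only one possible choice, shown for illustration: an average of per-field Jaccard similarities over the fields that both users have. The field names, the set-of-strings representation, and the functions `jaccard` and `second_identity_similarity` are hypothetical and are not taken from the patent.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def second_identity_similarity(a: Dict[str, Set[str]], b: Dict[str, Set[str]]) -> float:
    """Average per-field Jaccard similarity over fields present for both users.

    Each argument maps a field name (e.g. "friends", "home_address")
    to the set of values observed for that user.
    """
    shared_fields = [name for name in a if name in b]
    if not shared_fields:
        return 0.0
    return sum(jaccard(a[name], b[name]) for name in shared_fields) / len(shared_fields)
```

A production system would likely weight fields differently or use a learned model; an unweighted average is simply the shortest thing that yields a score between 0 and 1 for comparison with the preset thresholds.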
The first preset threshold may be set by a technician according to actual needs, and this embodiment is not specifically limited herein.
Step S206, if the similarity between the second identity characteristic data of the first user and the second user is larger than a first preset threshold value, determining that the first user and the second user belong to the same user subject, and merging the first user data and the second user data.
In this embodiment, a similarity between the second identity characteristic data of the first user and the second user that is greater than the first preset threshold indicates that the second identity characteristic data of the two users are highly similar. The first user and the second user may therefore be considered to belong to the same user subject, and the first user data and the second user data are merged.
In addition, the process of merging the first user data and the second user data is the same as step S203, and details are not repeated here in this embodiment.
Step S207, if the similarity between the second identity characteristic data of the first user and the second user is less than or equal to the first preset threshold, comparing the similarity between the second identity characteristic data of the first user and the second user with a second preset threshold, where the second preset threshold is less than the first preset threshold.
The second preset threshold may be set by a technician according to actual needs, and this embodiment is not specifically limited here.
If the similarity between the second identity characteristic data of the first user and the second user is less than or equal to the second preset threshold, the degree of association between the first user data and the second user data is low; the first user data and the second user data are not merged, and no association relationship between them needs to be established.
Step S208, if the similarity between the second identity characteristic data of the first user and the second user is greater than a second preset threshold, establishing an association relationship between the first user data and the second user data.
If the similarity between the second identity characteristic data of the first user and the second user is greater than the second preset threshold but not greater than the first preset threshold, the existing user data of the two users are insufficient to determine that they belong to the same user subject, yet the two users are strongly associated. An association relationship between the first user data and the second user data is therefore established, so that once more significant identity data of the first user and the second user are acquired later, whether they belong to the same user subject can be determined more accurately, improving the accuracy of user data merging.
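The two-threshold decision of steps S206 to S208 can be summarized in a few lines. The names `Action` and `decide` are hypothetical, and any concrete threshold values are illustrative only, since the text leaves both thresholds to the technician.

```python
from enum import Enum


class Action(Enum):
    MERGE = "merge"          # similarity > first threshold (step S206)
    ASSOCIATE = "associate"  # second threshold < similarity <= first threshold (step S208)
    NONE = "none"            # similarity <= second threshold


def decide(similarity: float, first_threshold: float, second_threshold: float) -> Action:
    """Map a similarity score to merge, associate, or no action."""
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be smaller than first threshold")
    if similarity > first_threshold:
        return Action.MERGE
    if similarity > second_threshold:
        return Action.ASSOCIATE
    return Action.NONE
```

Note that both comparisons are strict: a similarity exactly equal to the first threshold falls into the association band, and one exactly equal to the second threshold results in no action.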
In this embodiment of the invention, when it is not determined that the first user and the second user belong to the same user subject, second identity characteristic data of the first user and the second user are respectively extracted from the first user data and the second user data to be processed, and the similarity between them is evaluated. If the similarity is greater than the first preset threshold, it is determined that the first user and the second user belong to the same user subject, and the first user data and the second user data are merged. If the similarity is less than or equal to the first preset threshold and greater than the second preset threshold, an association relationship is established between the first user data and the second user data. In this way, the multiple user data belonging to the same user subject can be accurately identified and merged to form panoramic user characteristic data, reducing the overall data redundancy of the DPI system.
Example three
Fig. 3 is a schematic structural diagram of a data processing apparatus according to a third embodiment of the present invention. The data processing device provided by the embodiment of the invention can execute the processing flow provided by the embodiment of the data processing method. As shown in fig. 3, the apparatus 30 includes: a data extraction module 301, a determination module 302 and a processing module 303.
Specifically, the data extraction module 301 is configured to extract first identity feature data of the first user and first identity feature data of the second user from the first user data and the second user data to be processed, where the first identity feature data includes at least one type of identity information for uniquely identifying a user principal.
The determining module 302 is configured to determine whether the first user and the second user belong to the same user subject according to the first identity feature data of the first user and the second user.
The processing module 303 is configured to merge the first user data and the second user data if it is determined that the first user and the second user belong to the same user subject.
The apparatus provided in the embodiment of the present invention may be specifically configured to execute the method embodiment provided in the first embodiment, and specific functions are not described herein again.
In this embodiment of the invention, first identity characteristic data of the first user and of the second user are respectively extracted from the first user data and the second user data to be processed, where the first identity characteristic data includes at least one type of identity information for uniquely identifying a user subject. Whether the first user and the second user belong to the same user subject is then determined from the first identity characteristic data of the two users, and if they do, the first user data and the second user data are merged. Multiple user data of the same user subject are thereby merged to form panoramic user characteristic data, reducing the data redundancy of the DPI system.
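The three-module structure of apparatus 30 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Record` type, the class name, the `id_number` field and the merge policy (fields of the first record win on conflict) are all assumptions, since the text does not specify concrete data structures.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]  # one user's data as a flat field -> value mapping (assumed shape)


class DataProcessingApparatus:
    """Mirrors apparatus 30: extraction (301), determination (302), processing (303)."""

    def __init__(self,
                 extract: Callable[[Record], Record],
                 determine: Callable[[Record, Record], bool]) -> None:
        self.extract = extract      # plays the role of data extraction module 301
        self.determine = determine  # plays the role of determination module 302

    def process(self, first: Record, second: Record) -> List[Record]:
        """Role of processing module 303: merge the records if they share a user subject."""
        same = self.determine(self.extract(first), self.extract(second))
        if same:
            # Assumed merge policy: take the union of fields, `first` wins on conflict.
            return [{**second, **first}]
        return [first, second]


# Hypothetical identity field "id_number"; the source does not name concrete fields here.
apparatus = DataProcessingApparatus(
    extract=lambda r: {k: v for k, v in r.items() if k == "id_number"},
    determine=lambda a, b: bool(a) and a == b,
)
merged = apparatus.process(
    {"id_number": "X1", "phone": "111"},
    {"id_number": "X1", "email": "u@example.com"},
)
```

Passing the extraction and determination steps in as callables keeps the sketch neutral about how identity data is actually parsed from DPI records.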
Example four
On the basis of the third embodiment, in this embodiment, the processing module is further configured to:
if it is not determined that the first user and the second user belong to the same user subject, respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed, wherein the second identity characteristic data at least comprises: home address, friend information, association relationship and behavior characteristic data; calculating the similarity between the second identity characteristic data of the first user and the second user; comparing the similarity with a first preset threshold; and if the similarity is greater than the first preset threshold, determining that the first user and the second user belong to the same user subject, and merging the first user data and the second user data.
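The text does not say how the similarity between two sets of second identity characteristic data is computed. One possible sketch, assuming each of the four kinds of data is represented as a set of strings and using the unweighted mean of per-field Jaccard scores as the (assumed) similarity measure:

```python
from typing import Dict, Set

Features = Dict[str, Set[str]]

# Hypothetical keys for the four kinds of second identity characteristic data.
FIELDS = ("home_address", "friends", "associations", "behaviors")


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def feature_similarity(first: Features, second: Features) -> float:
    """Unweighted mean of the per-field Jaccard scores (an assumed measure)."""
    scores = [jaccard(first.get(f, set()), second.get(f, set())) for f in FIELDS]
    return sum(scores) / len(scores)


user_a = {
    "home_address": {"12 elm st"},
    "friends": {"a", "b", "c"},
    "associations": {"family:x"},
    "behaviors": {"news", "video"},
}
user_b = {
    "home_address": {"12 elm st"},
    "friends": {"a", "b"},
    "associations": {"family:x"},
    "behaviors": {"news", "games"},
}
score = feature_similarity(user_a, user_b)  # per-field scores: 1, 2/3, 1, 1/3
```

A real system would likely weight the fields differently and normalize addresses before comparison; both choices are outside what the source describes.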
Optionally, the processing module is further configured to:
if the similarity between the second identity characteristic data of the first user and the second user is smaller than or equal to a first preset threshold, comparing the similarity between the second identity characteristic data of the first user and the second user with a second preset threshold, wherein the second preset threshold is smaller than the first preset threshold; and if the similarity between the second identity characteristic data of the first user and the second user is greater than a second preset threshold value, establishing an association relationship between the first user data and the second user data.
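The two-threshold rule described above can be written as a small decision function. This is a sketch: the `Decision` names and the threshold values used in the example calls are illustrative assumptions, and only the comparison logic comes from the text.

```python
from enum import Enum


class Decision(Enum):
    MERGE = "merge"          # same user subject: merge the two user data records
    ASSOCIATE = "associate"  # not merged, but an association relationship is recorded
    NONE = "none"            # leave both records as they are


def decide(similarity: float, first_threshold: float, second_threshold: float) -> Decision:
    """Apply the first and second preset thresholds to a similarity score."""
    if second_threshold >= first_threshold:
        raise ValueError("the second preset threshold must be smaller than the first")
    if similarity > first_threshold:
        return Decision.MERGE
    if similarity > second_threshold:
        # Reached only when similarity <= first_threshold.
        return Decision.ASSOCIATE
    return Decision.NONE


# Hypothetical threshold values 0.8 and 0.5.
high = decide(0.9, 0.8, 0.5)
boundary = decide(0.8, 0.8, 0.5)
low = decide(0.5, 0.8, 0.5)
```

Note that a similarity exactly equal to the first threshold falls into the association branch, and one exactly equal to the second threshold leads to no action, matching the strict "greater than" comparisons in the text.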
Optionally, the processing module is further configured to:
respectively extracting registered accounts of a first user and a second user from first user data and second user data to be processed; judging whether the registered accounts of the first user and the second user are consistent; and if the registered accounts of the first user and the second user are consistent, then executing the subsequent step of respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
Optionally, the processing module is further configured to:
if the registered accounts of the first user and the second user are inconsistent, calculating the similarity of the registered accounts of the first user and the second user; judging whether the similarity of the registered accounts of the first user and the second user is greater than a third preset threshold value or not; and if the similarity of the registered accounts of the first user and the second user is greater than a third preset threshold, then executing a subsequent step of respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
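The registered-account check acts as a gate in front of the second-feature comparison. A minimal sketch, assuming accounts are plain strings and using `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure (the source does not name one); the default third threshold of 0.8 is likewise an assumption:

```python
from difflib import SequenceMatcher


def should_compare_second_features(account_a: str, account_b: str,
                                   third_threshold: float = 0.8) -> bool:
    """Return True if the second identity characteristic data should be extracted."""
    if account_a == account_b:
        # Consistent registered accounts: proceed directly.
        return True
    # Inconsistent accounts: proceed only if they are similar enough.
    ratio = SequenceMatcher(None, account_a, account_b).ratio()
    return ratio > third_threshold


same_account = should_compare_second_features("zhang_wei88", "zhang_wei88")
near_account = should_compare_second_features("zhang_wei88", "zhang_wei_88")
far_account = should_compare_second_features("alice", "bob_1990")
```

The gate avoids computing the more expensive second-feature similarity for pairs of users whose registered accounts have nothing in common.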
Optionally, the processing module is further configured to:
judging whether any item of identity information in the first identity characteristic data of the first user and the second user is consistent; and if any item of identity information in the first identity characteristic data of the first user and the second user is consistent, determining that the first user and the second user belong to the same user subject.
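Following the rule stated in claim 2, two users share a user subject as soon as any one item of identity information in their first identity characteristic data is consistent. A minimal sketch, assuming the first identity characteristic data is a mapping from identity type to value; the type names in the example (`id_card`, `phone`, `email`) are hypothetical:

```python
from typing import Dict

IdentityInfo = Dict[str, str]  # identity type -> value (assumed representation)


def belong_to_same_subject(first: IdentityInfo, second: IdentityInfo) -> bool:
    """True if at least one identity type present in both records has equal values."""
    shared_types = first.keys() & second.keys()
    return any(first[t] == second[t] for t in shared_types)


match = belong_to_same_subject(
    {"id_card": "A1", "phone": "123"},
    {"phone": "123", "email": "x@example.com"},
)
mismatch = belong_to_same_subject({"phone": "123"}, {"phone": "456"})
no_overlap = belong_to_same_subject({"id_card": "A1"}, {"email": "x@example.com"})
```

When the two records have no identity type in common, the function returns False, which corresponds to the "not determined" case that triggers the second-feature comparison.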
The apparatus provided in the embodiment of the present invention may be specifically configured to execute the method embodiment provided in the second embodiment, and specific functions are not described herein again.
In this embodiment of the invention, when it cannot be determined that the first user and the second user belong to the same user subject, second identity characteristic data of the first user and of the second user are respectively extracted from the first user data and the second user data to be processed, and the similarity between the two is calculated. If the similarity is greater than a first preset threshold, the first user and the second user are determined to belong to the same user subject, and the first user data and the second user data are merged. If the similarity is less than or equal to the first preset threshold and greater than a second preset threshold, an association relationship is established between the first user data and the second user data. In this way, the multiple user numbers belonging to the same user subject can be determined accurately, and the multiple user data of that subject are merged to form panoramic user characteristic data, reducing the overall data redundancy of the DPI system.
Example five
Fig. 4 is a schematic structural diagram of a deep packet inspection device according to a fifth embodiment of the present invention. As shown in fig. 4, the device 40 includes: a processor 401, a memory 402, and a computer program stored on the memory 402 and executable by the processor 401.
The processor 401, when executing the computer program stored on the memory 402, implements the data processing method provided by any of the method embodiments described above.
In this embodiment of the invention, first identity characteristic data of the first user and of the second user are respectively extracted from the first user data and the second user data to be processed, where the first identity characteristic data includes at least one type of identity information for uniquely identifying a user subject. Whether the first user and the second user belong to the same user subject is then determined from the first identity characteristic data of the two users, and if they do, the first user data and the second user data are merged. Multiple user data of the same user subject are thereby merged to form panoramic user characteristic data, reducing the data redundancy of the DPI system.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the data processing method provided in any of the above method embodiments.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (7)

1. A data processing method, comprising:
respectively extracting first identity characteristic data of a first user and first identity characteristic data of a second user from first user data and second user data to be processed, wherein the first identity characteristic data comprises at least one type of identity information used for uniquely identifying a user subject;
determining whether the first user and the second user belong to the same user subject according to the first identity characteristic data of the first user and the second user;
if the first user and the second user belong to the same user subject, merging the first user data and the second user data;
after determining whether the first user and the second user belong to the same user subject according to the first identity characteristic data of the first user and the second user, the method further includes:
if it is not determined that the first user and the second user belong to the same user subject, respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed, wherein the second identity characteristic data at least comprises: home address, friend information, association relationship and behavior characteristic data;
calculating the similarity between the second identity characteristic data of the first user and the second user;
comparing the similarity between the second identity characteristic data of the first user and the second user with a first preset threshold value;
if the similarity between the second identity characteristic data of the first user and the second user is greater than a first preset threshold value, determining that the first user and the second user belong to the same user subject, and merging the first user data and the second user data;
after comparing the similarity between the second identity characteristic data of the first user and the second user with the first preset threshold, the method further includes:
if the similarity between the second identity characteristic data of the first user and the second user is smaller than or equal to the first preset threshold, comparing the similarity between the second identity characteristic data of the first user and the second user with a second preset threshold, wherein the second preset threshold is smaller than the first preset threshold;
if the similarity between the second identity characteristic data of the first user and the second user is greater than the second preset threshold, establishing an association relationship between the first user data and the second user data;
if it is not determined that the first user and the second user belong to the same user subject, before extracting second identity feature data of the first user and the second user from the first user data and the second user data to be processed, respectively, the method further includes:
respectively extracting registered accounts of a first user and a second user from first user data and second user data to be processed;
judging whether the registered accounts of the first user and the second user are consistent;
and if the registered accounts of the first user and the second user are consistent, then executing the subsequent step of respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
2. The method of claim 1, wherein determining whether the first user and the second user belong to the same user subject based on the first identity characteristic data of the first user and the second user comprises:
judging whether any identity information in the first identity characteristic data of the first user and the second user is consistent;
and if any one item of identity information in the first identity characteristic data of the first user and the second user is consistent, determining that the first user and the second user belong to the same user subject.
3. The method of claim 1, wherein after determining whether the registered accounts of the first user and the second user are consistent, the method further comprises:
if the registered accounts of the first user and the second user are inconsistent, calculating the similarity of the registered accounts of the first user and the second user;
judging whether the similarity of the registered accounts of the first user and the second user is greater than a third preset threshold value or not;
and if the similarity of the registered accounts of the first user and the second user is greater than a third preset threshold, then executing a subsequent step of respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
4. A data processing apparatus, comprising:
the data extraction module is used for respectively extracting first identity characteristic data of a first user and first identity characteristic data of a second user from first user data and second user data to be processed, wherein the first identity characteristic data comprises at least one type of identity information used for uniquely identifying a user subject;
the determining module is used for determining whether the first user and the second user belong to the same user subject according to the first identity characteristic data of the first user and the second user;
the processing module is used for merging the first user data and the second user data if the first user and the second user belong to the same user subject;
the processing module is further configured to:
if it is not determined that the first user and the second user belong to the same user subject, respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed, wherein the second identity characteristic data at least comprises: home address, friend information, association relationship and behavior characteristic data;
calculating the similarity between the second identity characteristic data of the first user and the second user;
comparing the similarity between the second identity characteristic data of the first user and the second user with a first preset threshold value;
if the similarity between the second identity characteristic data of the first user and the second user is greater than a first preset threshold value, determining that the first user and the second user belong to the same user subject, and merging the first user data and the second user data;
the processing module is further configured to:
if the similarity between the second identity characteristic data of the first user and the second user is smaller than or equal to the first preset threshold, comparing the similarity between the second identity characteristic data of the first user and the second user with a second preset threshold, wherein the second preset threshold is smaller than the first preset threshold;
if the similarity between the second identity characteristic data of the first user and the second user is greater than the second preset threshold, establishing an association relationship between the first user data and the second user data;
the processing module is further configured to: respectively extracting registered accounts of a first user and a second user from first user data and second user data to be processed;
judging whether the registered accounts of the first user and the second user are consistent;
and if the registered accounts of the first user and the second user are consistent, then executing the subsequent step of respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
5. The apparatus of claim 4, wherein the processing module is further configured to:
if the registered accounts of the first user and the second user are inconsistent, calculating the similarity of the registered accounts of the first user and the second user;
judging whether the similarity of the registered accounts of the first user and the second user is greater than a third preset threshold value or not;
and if the similarity of the registered accounts of the first user and the second user is greater than a third preset threshold, then executing a subsequent step of respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
6. A deep packet inspection device, comprising:
a memory, a processor, and a computer program stored on the memory and executable on the processor,
the processor, when executing the computer program, implements the method of any of claims 1-3.
7. A computer-readable storage medium, in which a computer program is stored,
the computer program, when executed by a processor, implementing the method of any one of claims 1-3.
Priority Applications (1)

Application Number: CN201810752308.XA
Publication Number: CN109088788B (en)
Priority Date: 2018-07-10
Filing Date: 2018-07-10
Title: Data processing method, device, equipment and computer readable storage medium
Status: Active

Publications (2)

Publication Number: CN109088788A (en), Publication Date: 2018-12-25
Publication Number: CN109088788B (en), Publication Date: 2021-02-02

Family ID: 64837484

Country Status (1)

Country: CN, Publication: CN109088788B (en)


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant