CN109088788B - Data processing method, device, equipment and computer readable storage medium - Google Patents
Data processing method, device, equipment and computer readable storage medium
- Publication number
- CN109088788B (CN201810752308.XA)
- Authority
- CN
- China
- Prior art keywords
- user
- data
- characteristic data
- identity characteristic
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/02—Capturing of monitoring data
- H04L43/028—Capturing of monitoring data by filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/31—User authentication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2117—User registration
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a data processing method, a data processing device, data processing equipment and a computer readable storage medium. The method comprises the steps of respectively extracting first identity characteristic data of a first user and first identity characteristic data of a second user from first user data and second user data to be processed, wherein the first identity characteristic data comprises at least one type of identity information for uniquely identifying a user main body; determining whether the first user and the second user belong to the same user main body or not according to the first identity characteristic data of the first user and the second user; and if the first user and the second user belong to the same user main body, merging the first user data and the second user data, so that a plurality of user data of the same user main body are merged to form panoramic user characteristic data, and the data redundancy of the DPI system is reduced.
Description
Technical Field
The present invention relates to the field of information data processing technologies, and in particular, to a data processing method, apparatus, device, and computer readable storage medium.
Background
Deep Packet Inspection (DPI) is an application layer traffic Inspection and control technology based on data packets, and performs Deep Inspection and analysis on different layers of information of the data packets to obtain application layer information of the whole data stream or data Packet, and then performs statistical analysis and control on traffic according to a policy defined by a DPI system.
With the development of big data and internet technology, various applications are entering people's lives. Because different applications do not have uniform requirements for the registration information of the user, the user identifications used by different applications registered by the same user may be different, and the same user identification may be used by different applications registered by different users. When the prior DPI system acquires the behavior data of a user, the user behavior data corresponding to each user is established for each application, a large amount of redundant data is stored, and panoramic user characteristic data cannot be formed.
Disclosure of Invention
The invention provides a data processing method, a data processing device, data processing equipment and a computer readable storage medium, which are used for solving the problems that when the prior DPI system acquires behavior characteristic data of users, user behavior characteristic data corresponding to each user is established for each application, a large amount of redundant data is stored, and panoramic user characteristic data cannot be formed.
One aspect of the present invention provides a data processing method, including:
respectively extracting first identity characteristic data of a first user and first identity characteristic data of a second user from first user data and second user data to be processed, wherein the first identity characteristic data comprises at least one type of identity information used for uniquely identifying a user main body;
determining whether the first user and the second user belong to the same user subject according to the first identity characteristic data of the first user and the second user;
and if the first user and the second user belong to the same user main body, merging the first user data and the second user data.
Another aspect of the present invention provides a data processing apparatus comprising:
the data extraction module is used for respectively extracting first identity characteristic data of a first user and first identity characteristic data of a second user from first user data and second user data to be processed, wherein the first identity characteristic data comprises at least one type of identity information used for uniquely identifying a user main body;
the determining module is used for determining whether the first user and the second user belong to the same user main body according to the first identity characteristic data of the first user and the second user;
a processing module, configured to perform merging processing on the first user data and the second user data if it is determined that the first user and the second user belong to the same user subject.
Another aspect of the present invention provides a deep packet inspection device, including:
a memory, a processor, and a computer program stored on the memory and executable on the processor,
the processor, when running the computer program, implements the method described above.
Another aspect of the present invention provides a computer-readable storage medium storing a computer program,
which when executed by a processor implements the method described above.
According to the data processing method, the data processing device, the data processing equipment and the computer readable storage medium, first identity characteristic data of a first user and first identity characteristic data of a second user are respectively extracted from first user data and second user data to be processed, wherein the first identity characteristic data comprise at least one type of identity information used for uniquely identifying a user main body; determining whether the first user and the second user belong to the same user main body or not according to the first identity characteristic data of the first user and the second user; and if the first user and the second user belong to the same user main body, merging the first user data and the second user data, so that a plurality of user data of the same user main body are merged to form panoramic user characteristic data, and the data redundancy of the DPI system is reduced.
Drawings
Fig. 1 is a flowchart of a data processing method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a data processing method according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a data processing apparatus according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a deep packet inspection device according to a fifth embodiment of the present invention.
The above figures illustrate certain embodiments of the invention, which are described in more detail below. The drawings and the description are not intended to limit the scope of the inventive concept in any way, but rather to illustrate the inventive concept to those skilled in the art with reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
In embodiments of the present invention, the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. In the description of the following examples, "plurality" means two or more unless specifically limited otherwise.
The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
Example one
Fig. 1 is a flowchart of a data processing method according to a first embodiment of the present invention. This embodiment provides a data processing method that addresses the following problem: when an existing DPI system acquires behavior characteristic data of users, it establishes separate user behavior data for each user of each application, so that a large amount of redundant data is stored and panoramic user characteristic data cannot be formed. The method in this embodiment is applied to a deep packet inspection device, which may be the computer device on which a DPI system is located. In other embodiments of the present invention, the method may also be applied to other computer devices; this embodiment takes a deep packet inspection device as an example for illustration. As shown in Fig. 1, the method comprises the following specific steps:
Step S101, first identity characteristic data of a first user and first identity characteristic data of a second user are respectively extracted from first user data and second user data to be processed, wherein the first identity characteristic data comprise at least one type of identity information used for uniquely identifying a user subject.
In this embodiment, the first user data and the second user data are user behavior data acquired by the DPI system for two different user accounts of one application, or user behavior data acquired for two different user accounts of two different applications.
In practical application, the first user data and the second user data to be processed may be specified by a technician by specifying an application identifier and a user registration account, or may be user data corresponding to any two user registration accounts in the obtained user data by the DPI system, which is not specifically limited in this embodiment.
The first identity characteristic data comprises at least one type of identity information for uniquely identifying a user subject. The identity information for uniquely identifying a user subject may at least include: identity card number, mobile phone number, email address, etc.
Step S102, determining whether the first user and the second user belong to the same user subject according to the first identity characteristic data of the first user and the second user.
The first identity characteristic data of each user comprises at least one type of identity information for uniquely identifying a user subject. If the first identity characteristic data of the first user and of the second user both contain at least one common type of such identity information, and any type that both contain is consistent, it can be determined that the first user and the second user belong to the same user subject.
If the first identity characteristic data of the first user and of the second user both contain at least one common type of such identity information, and any type that both contain is inconsistent, it can be determined that the first user and the second user do not belong to the same user subject.
If the first identity characteristic data of the first user and of the second user do not contain any common type of identity information for uniquely identifying a user subject, it cannot be determined from the first identity characteristic data whether or not the first user and the second user belong to the same user subject.
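The three outcomes of step S102 can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary layout, the field names and the `MatchResult` enum are hypothetical. The text does not say what happens when one shared field matches and another differs; the sketch assumes that any match takes priority.

```python
from enum import Enum
from typing import Dict


class MatchResult(Enum):
    SAME = "same"                  # belong to the same user subject
    DIFFERENT = "different"        # do not belong to the same user subject
    UNDETERMINED = "undetermined"  # no common type of identity information


def compare_first_identity(a: Dict[str, str], b: Dict[str, str]) -> MatchResult:
    """Compare first identity characteristic data of two users.

    Keys are identity-information types (hypothetical names such as
    "id_card_number", "mobile_phone_number", "email").
    """
    # Types of identity information present (non-empty) for both users.
    shared = [k for k in a if k in b and a[k] and b[k]]
    if not shared:
        return MatchResult.UNDETERMINED
    # Assumption: a single consistent field is enough to decide SAME.
    if any(a[k] == b[k] for k in shared):
        return MatchResult.SAME
    return MatchResult.DIFFERENT
```

For example, two records that share only a mobile phone number with equal values yield `MatchResult.SAME`, while records with no field type in common yield `MatchResult.UNDETERMINED`.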
Step S103, if the first user and the second user belong to the same user subject, merging the first user data and the second user data.
After it is determined that the first user and the second user belong to the same user subject, the first user data and the second user data are merged.
Specifically, the merging the first user data and the second user data includes:
and generating a uniform user data identifier corresponding to the first user data and the second user data, removing redundant information in the first user data and the second user data, and generating more comprehensive user data corresponding to the user data identifier.
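A minimal sketch of this merging step is shown below. It assumes each user data record is a dictionary mapping a field name to a list of values. The record layout, the `user_data_id` key and the use of a random UUID as the unified identifier are illustrative assumptions and are not specified in the source.

```python
import uuid
from typing import Any, Dict, List


def merge_user_data(first: Dict[str, List[Any]],
                    second: Dict[str, List[Any]]) -> Dict[str, Any]:
    """Merge two user data records under one unified user data identifier."""
    # Assumption: a random UUID serves as the unified user data identifier.
    merged: Dict[str, Any] = {"user_data_id": uuid.uuid4().hex}
    # Visit every field of both records, keeping first-seen field order.
    for key in list(first) + [k for k in second if k not in first]:
        values: List[Any] = []
        for item in first.get(key, []) + second.get(key, []):
            if item not in values:  # drop redundant (duplicate) entries
                values.append(item)
        merged[key] = values
    return merged
```

Merging `{"phone": ["138"], "apps": ["a"]}` with `{"phone": ["138"], "apps": ["b"]}` keeps a single phone entry and both application entries.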
The embodiment of the invention respectively extracts first identity characteristic data of a first user and second user from first user data and second user data to be processed, wherein the first identity characteristic data comprises at least one type of identity information for uniquely identifying a user main body; determining whether the first user and the second user belong to the same user main body or not according to the first identity characteristic data of the first user and the second user; and if the first user and the second user belong to the same user main body, merging the first user data and the second user data, so that a plurality of user data of the same user main body are merged to form panoramic user characteristic data, and the data redundancy of the DPI system is reduced.
Example two
Fig. 2 is a flowchart of a data processing method according to a second embodiment of the present invention. On the basis of the first embodiment, in this embodiment, if it is not determined that the first user and the second user belong to the same user subject, second identity characteristic data of the first user and the second user are respectively extracted from the first user data and the second user data to be processed, where the second identity characteristic data at least includes: home address, friend information, association relationship and behavior characteristic data. The similarity between the second identity characteristic data of the first user and the second user is calculated and compared with a first preset threshold. If the similarity is greater than the first preset threshold, it is determined that the first user and the second user belong to the same user subject, and the first user data and the second user data are merged. If the similarity is less than or equal to the first preset threshold, the similarity is compared with a second preset threshold, where the second preset threshold is less than the first preset threshold; if the similarity is greater than the second preset threshold, an association relationship is established between the first user data and the second user data.
As shown in fig. 2, the method comprises the following specific steps:
Step S201, extracting first identity characteristic data of the first user and the second user from the first user data and the second user data to be processed, respectively, where the first identity characteristic data includes at least one type of identity information for uniquely identifying a user subject.
In this embodiment, the first user data and the second user data are user behavior data acquired by the DPI system for two different user accounts of one application, or user behavior data acquired for two different user accounts of two different applications.
In practical application, the first user data and the second user data to be processed may be specified by a technician by specifying an application identifier and a user registration account, or may be user data corresponding to any two user registration accounts in the obtained user data by the DPI system, which is not specifically limited in this embodiment.
The first identity characteristic data comprises at least one type of identity information for uniquely identifying a user subject. The identity information for uniquely identifying a user subject may at least include: identity card number, mobile phone number, email address, etc.
Optionally, the first identity feature data of the first user and the first identity feature data of the second user may be extracted from the first user data and the second user data to be processed, respectively, and recorded in the data list.
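The optional extraction into a data list might look like the following sketch. The field names in `FIRST_IDENTITY_FIELDS` and the flat dictionary layout of the user data are assumptions made for illustration; the source names only the kinds of identity information, not how they are stored.

```python
from typing import Any, Dict, List

# Hypothetical field names for the identity information named in the text.
FIRST_IDENTITY_FIELDS = ("id_card_number", "mobile_phone_number", "email")


def extract_first_identity(user_data: Dict[str, Any]) -> Dict[str, str]:
    """Keep only the non-empty first identity fields of one user record."""
    return {f: str(user_data[f]) for f in FIRST_IDENTITY_FIELDS if user_data.get(f)}


def build_data_list(*records: Dict[str, Any]) -> List[Dict[str, str]]:
    """Record the extracted first identity data of each user in a list."""
    return [extract_first_identity(r) for r in records]
```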
Step S202, determining whether the first user and the second user belong to the same user subject according to the first identity characteristic data of the first user and the second user.
In this embodiment, whether the first user and the second user belong to the same user subject is determined according to the first identity feature data of the first user and the second user, which may be specifically implemented in the following manner:
judging whether any identical identity information exists in the first identity characteristic data of the first user and of the second user; and if any identical identity information exists in the first identity characteristic data of the first user and of the second user, determining that the first user and the second user belong to the same user subject.
The first identity characteristic data of each user comprises at least one type of identity information for uniquely identifying a user subject. If the first identity characteristic data of the first user and of the second user both contain at least one common type of such identity information, and any type that both contain is consistent, it can be determined that the first user and the second user belong to the same user subject.
If the first identity characteristic data of the first user and of the second user both contain at least one common type of such identity information, and any type that both contain is inconsistent, it can be determined that the first user and the second user do not belong to the same user subject.
If the first identity characteristic data of the first user and of the second user do not contain any common type of identity information for uniquely identifying a user subject, it cannot be determined from the first identity characteristic data whether or not the first user and the second user belong to the same user subject.
Step S203, if it is determined that the first user and the second user belong to the same user subject, merging the first user data and the second user data.
And after determining that the first user and the second user belong to the same user main body, merging the first user data and the second user data.
Specifically, the merging the first user data and the second user data includes:
and generating a uniform user data identifier corresponding to the first user data and the second user data, removing redundant information in the first user data and the second user data, and generating more comprehensive user data corresponding to the user data identifier.
Step S204, if the first user and the second user are not determined to belong to the same user subject, second identity characteristic data of the first user and second user are respectively extracted from the first user data and second user data to be processed.
The second identity characteristic data comprises at least: home address, friend information, association relationship and behavior characteristic data. The association relationship may be mobile phone contact information. Optionally, the second identity characteristic data may further include an account number of an instant messaging tool, and the like.
Optionally, if it is not determined that the first user and the second user belong to the same user subject, before extracting second identity feature data of the first user and the second user from the first user data and the second user data to be processed, the method further includes:
respectively extracting registered accounts of a first user and a second user from first user data and second user data to be processed; judging whether the registered accounts of the first user and the second user are consistent; and if the registered accounts of the first user and the second user are consistent, then executing the subsequent step of respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
If the registered accounts of the first user and the second user are inconsistent, calculating the similarity of the registered accounts of the first user and the second user; judging whether the similarity of the registered accounts of the first user and the second user is greater than a third preset threshold value or not; and if the similarity of the registered accounts of the first user and the second user is greater than a third preset threshold, then executing a subsequent step of respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
The registered accounts of the first user and the second user are two character strings, so the similarity between the registered accounts is calculated as the similarity between two character strings. For example, the two strings may be matched, the longest matching substring of the two strings may be determined, and the proportion of that longest substring may be calculated.
In addition, the third preset threshold may be set by a technician according to actual needs, and this embodiment is not specifically limited herein.
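The registered-account check described above can be sketched with Python's standard `difflib` module. The source does not state what the proportion of the longest matching substring is measured against; this sketch assumes the length of the longer account string. The function names and the threshold value in the usage note are hypothetical.

```python
from difflib import SequenceMatcher


def account_similarity(a: str, b: str) -> float:
    """Proportion of the longest matching substring of two account strings.

    Assumption: the proportion is taken relative to the longer string.
    """
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / max(len(a), len(b))


def should_extract_second_identity(a: str, b: str, third_threshold: float) -> bool:
    """Gate for continuing to the second identity characteristic data."""
    # Consistent accounts pass directly; otherwise the similarity must
    # exceed the third preset threshold.
    return a == b or account_similarity(a, b) > third_threshold
```

With the accounts `"alice2018"` and `"alice_2018"`, the longest matching substring is `"alice"` (5 characters) and the longer account has 10 characters, so the similarity is 0.5; a third preset threshold of 0.6 would stop the comparison at this point.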
Step S205, calculating the similarity between the second identity characteristic data of the first user and the second user; and comparing the similarity between the second identity characteristic data of the first user and the second user with the first preset threshold value.
In this embodiment, calculating the similarity between the second identity characteristic data of the first user and the second user may be implemented by any prior-art method for calculating the similarity between two users according to their behavior data and attribute information, which is not specifically limited in this embodiment.
The first preset threshold may be set by a technician according to actual needs, and this embodiment is not specifically limited herein.
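The source leaves the similarity measure open. As one possible stand-in, and not the method of the patent, the sketch below averages the Jaccard similarity of each feature that both users have. The feature names and the representation of each feature as a set of strings are assumptions.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def second_identity_similarity(a: Dict[str, Set[str]],
                               b: Dict[str, Set[str]]) -> float:
    """Mean Jaccard similarity over the features present for both users."""
    shared = [k for k in a if k in b]
    if not shared:
        return 0.0
    return sum(jaccard(a[k], b[k]) for k in shared) / len(shared)
```

A weighted average, or a measure tailored to behavior sequences, could replace the plain mean without changing the threshold comparisons that follow.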
Step S206, if the similarity between the second identity characteristic data of the first user and the second user is larger than a first preset threshold value, determining that the first user and the second user belong to the same user subject, and merging the first user data and the second user data.
In this embodiment, if the similarity between the second identity characteristic data of the first user and the second user is greater than the first preset threshold, it is indicated that the similarity between the second identity characteristic data of the first user and the second user is very high, and the first user and the second user may be considered to belong to the same user subject, and the first user data and the second user data are merged.
In addition, the process of merging the first user data and the second user data is the same as step S203, and details are not repeated here in this embodiment.
Step S207, if the similarity between the second identity characteristic data of the first user and the second user is less than or equal to the first preset threshold, comparing the similarity between the second identity characteristic data of the first user and the second user with a second preset threshold, where the second preset threshold is less than the first preset threshold.
The second preset threshold may be set by a technician according to actual needs, and this embodiment is not specifically limited here.
If the similarity between the second identity characteristic data of the first user and the second user is smaller than or equal to a second preset threshold, it is indicated that the association degree between the first user data and the second user data is small, the first user data and the second user data are not combined, and the association relationship between the first user data and the second user data is not required to be established.
Step S208, if the similarity between the second identity characteristic data of the first user and the second user is greater than a second preset threshold, establishing an association relationship between the first user data and the second user data.
If the similarity between the second identity characteristic data of the first user and the second user is greater than the second preset threshold, it is indicated that the first user and the second user cannot be determined to belong to the same user subject according to the existing user data of the first user and the second user, but the association between the first user and the second user is large, so that the association relationship between the first user data and the second user data is established, so that after more important identity data of the first user and the second user is subsequently acquired, whether the first user and the second user belong to the same user subject can be further determined more accurately, and the accuracy of merging the user data is improved.
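The two-threshold decision of steps S206 to S208 can be condensed into a short sketch. The function name and the string labels for the three outcomes are hypothetical; the comparison operators follow the text (strictly greater than a threshold to pass it).

```python
def decide_action(similarity: float,
                  first_threshold: float,
                  second_threshold: float) -> str:
    """Map a second-identity similarity to 'merge', 'associate' or 'none'."""
    # The text requires the second threshold to be below the first.
    if second_threshold >= first_threshold:
        raise ValueError("second_threshold must be less than first_threshold")
    if similarity > first_threshold:
        return "merge"      # step S206: same user subject, merge the data
    if similarity > second_threshold:
        return "associate"  # step S208: establish an association relationship
    return "none"           # neither merged nor associated
```

With thresholds 0.8 and 0.5, a similarity of exactly 0.8 falls into the "associate" branch, because merging requires the similarity to be strictly greater than the first preset threshold.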
According to the embodiment of the invention, when it is not determined that the first user and the second user belong to the same user subject, the second identity characteristic data of the first user and the second user are respectively extracted from the first user data and the second user data to be processed, and the similarity between them is evaluated. If the similarity is greater than a first preset threshold, it is determined that the first user and the second user belong to the same user subject, and the first user data and the second user data are merged. If the similarity is less than or equal to the first preset threshold and greater than a second preset threshold, an association relationship is established between the first user data and the second user data. In this way, the multiple user accounts of the same user subject are determined accurately, the multiple user data of the same user subject are merged to form panoramic user characteristic data, and the overall data redundancy of the DPI system is reduced.
Example three
Fig. 3 is a schematic structural diagram of a data processing apparatus according to a third embodiment of the present invention. The data processing apparatus provided by this embodiment can execute the processing flow provided by the embodiments of the data processing method. As shown in Fig. 3, the apparatus 30 includes: a data extraction module 301, a determination module 302, and a processing module 303.
Specifically, the data extraction module 301 is configured to extract first identity characteristic data of the first user and first identity characteristic data of the second user from the first user data and the second user data to be processed, respectively, where the first identity characteristic data includes at least one type of identity information that uniquely identifies a user subject.
The determination module 302 is configured to determine, according to the first identity characteristic data of the first user and the second user, whether the first user and the second user belong to the same user subject.
The processing module 303 is configured to merge the first user data and the second user data if it is determined that the first user and the second user belong to the same user subject.
The apparatus provided in the embodiment of the present invention may be specifically configured to execute the method embodiment provided in the first embodiment, and specific functions are not described herein again.
In this embodiment of the invention, first identity characteristic data of the first user and of the second user are extracted from the first user data and the second user data to be processed, respectively, where the first identity characteristic data includes at least one type of identity information that uniquely identifies a user subject. Whether the first user and the second user belong to the same user subject is then determined according to their first identity characteristic data. If they belong to the same user subject, the first user data and the second user data are merged, so that the multiple pieces of user data of the same user subject are combined into panoramic user characteristic data and the data redundancy of the DPI system is reduced.
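The merging step can be illustrated with a minimal Python sketch. The patent does not specify a record layout or a merge rule, so the dictionary-of-lists layout, the field names, and the function name `merge_user_data` below are all assumptions made for illustration only.

```python
from typing import Dict, List

# Hypothetical record layout (not from the patent): field name -> observed values.
UserData = Dict[str, List[str]]


def merge_user_data(first: UserData, second: UserData) -> UserData:
    """Merge two records of the same user subject, keeping each value once."""
    merged: UserData = {}
    for record in (first, second):
        for field, values in record.items():
            bucket = merged.setdefault(field, [])
            for value in values:
                if value not in bucket:  # drop duplicates to reduce redundancy
                    bucket.append(value)
    return merged


# Two records already determined to belong to the same user subject.
first = {"phone": ["13800000000"], "apps": ["mail", "video"]}
second = {"phone": ["13800000000"], "email": ["a@example.com"], "apps": ["video", "maps"]}
panoramic = merge_user_data(first, second)
```

In this sketch the shared phone number and the repeated `video` entry are stored only once in `panoramic`, which is one simple way to read "reducing data redundancy"; a real DPI system would likely also need conflict handling for fields that disagree.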
Example four
On the basis of the third embodiment, in this embodiment, the processing module is further configured to:
if it is not determined that the first user and the second user belong to the same user subject, respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed, wherein the second identity characteristic data at least comprises: home address, friend information, association relationships, and behavior characteristic data; calculating the similarity between the second identity characteristic data of the first user and the second user; comparing the similarity between the second identity characteristic data of the first user and the second user with a first preset threshold; and if the similarity between the second identity characteristic data of the first user and the second user is greater than the first preset threshold, determining that the first user and the second user belong to the same user subject, and merging the first user data and the second user data.
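The text does not say how the similarity between the second identity characteristic data is computed. As a hedged illustration only, the sketch below assumes each of the four kinds of second identity characteristic data is represented as a set of strings and averages a per-field Jaccard similarity; the field names, the set representation, and the choice of Jaccard are all assumptions, not the patented method.

```python
from typing import Dict, Set

# The four fields mirror the kinds of second identity characteristic data named
# in the text; the key names and the set-of-strings layout are hypothetical.
SECOND_FEATURE_FIELDS = ("home_address", "friends", "associations", "behaviors")


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def second_feature_similarity(first: Dict[str, Set[str]],
                              second: Dict[str, Set[str]]) -> float:
    """Average the per-field Jaccard similarity over the four feature fields."""
    scores = [jaccard(first.get(f, set()), second.get(f, set()))
              for f in SECOND_FEATURE_FIELDS]
    return sum(scores) / len(scores)


user_a = {
    "home_address": {"12 Elm Rd"},
    "friends": {"a", "b", "c"},
    "associations": {"x"},
    "behaviors": {"night_video", "news"},
}
user_b = {
    "home_address": {"12 Elm Rd"},
    "friends": {"a", "b", "d"},
    "associations": {"x"},
    "behaviors": {"night_video"},
}
similarity = second_feature_similarity(user_a, user_b)  # (1.0 + 0.5 + 1.0 + 0.5) / 4 = 0.75
```

An equal-weight average is the simplest choice; a weighted sum that favors more discriminating fields would fit the same interface.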
Optionally, the processing module is further configured to:
if the similarity between the second identity characteristic data of the first user and the second user is smaller than or equal to a first preset threshold, comparing the similarity between the second identity characteristic data of the first user and the second user with a second preset threshold, wherein the second preset threshold is smaller than the first preset threshold; and if the similarity between the second identity characteristic data of the first user and the second user is greater than a second preset threshold value, establishing an association relationship between the first user data and the second user data.
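The two-threshold logic described here can be sketched as a small decision function. The comparison order follows the text (strictly greater than the first threshold merges; otherwise strictly greater than the second threshold associates). The threshold values 0.8 and 0.5 and the names `Decision` and `decide` are hypothetical, since the patent leaves the presets unspecified.

```python
from enum import Enum


class Decision(Enum):
    MERGE = "merge"          # same user subject: merge the two records
    ASSOCIATE = "associate"  # likely related: link the records, do not merge
    NONE = "none"            # neither threshold exceeded: take no action


def decide(similarity: float,
           first_threshold: float = 0.8,    # hypothetical preset value
           second_threshold: float = 0.5    # hypothetical preset value
           ) -> Decision:
    """Map a similarity score to an action using the two preset thresholds."""
    if not second_threshold < first_threshold:
        raise ValueError("second threshold must be smaller than the first")
    if similarity > first_threshold:
        return Decision.MERGE
    if similarity > second_threshold:
        return Decision.ASSOCIATE
    return Decision.NONE
```

Note that a similarity exactly equal to the first threshold falls through to the second comparison, so `decide(0.8)` with these example presets yields `Decision.ASSOCIATE` rather than `Decision.MERGE`.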
Optionally, the processing module is further configured to:
respectively extracting registered accounts of a first user and a second user from first user data and second user data to be processed; judging whether the registered accounts of the first user and the second user are consistent; and if the registered accounts of the first user and the second user are consistent, then executing the subsequent step of respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
Optionally, the processing module is further configured to:
if the registered accounts of the first user and the second user are inconsistent, calculating the similarity of the registered accounts of the first user and the second user; judging whether the similarity of the registered accounts of the first user and the second user is greater than a third preset threshold value or not; and if the similarity of the registered accounts of the first user and the second user is greater than a third preset threshold, then executing a subsequent step of respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
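The registered-account check acts as a gate before the second identity characteristic data is compared: identical accounts pass, and differing accounts pass only if their similarity exceeds the third preset threshold. The sketch below is one possible reading. The patent does not name a string-similarity measure, so the use of `difflib.SequenceMatcher` from the Python standard library, the threshold value 0.7, and the function name are assumptions.

```python
from difflib import SequenceMatcher


def accounts_allow_comparison(account_a: str, account_b: str,
                              third_threshold: float = 0.7  # hypothetical preset
                              ) -> bool:
    """Return True if the second-feature comparison step should be executed."""
    if account_a == account_b:
        # Consistent registered accounts: proceed directly.
        return True
    # Inconsistent accounts: proceed only if they are similar enough.
    similarity = SequenceMatcher(None, account_a, account_b).ratio()
    return similarity > third_threshold


# Accounts differing only in the last character are treated as close enough here,
# while accounts with no characters in common are not.
close = accounts_allow_comparison("zhang_wei88", "zhang_wei89")
unrelated = accounts_allow_comparison("alice", "bob42")
```

Gating on the account first keeps the more expensive comparison of addresses, friends, associations, and behavior data to pairs of records that already look related.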
Optionally, the processing module is further configured to:
judging whether any type of identity information in the first identity characteristic data of the first user is consistent with the corresponding identity information of the second user; and if any type of identity information in the first identity characteristic data of the first user and the second user is consistent, determining that the first user and the second user belong to the same user subject.
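Read together with claim 2, this check treats the two users as the same user subject as soon as any one type of uniquely identifying information matches. A minimal sketch follows, assuming each user's first identity characteristic data is a mapping from an identity type to its value; the key names `id_card`, `phone`, and `email` are hypothetical examples, not types named in this section.

```python
from typing import Dict, Optional

# Hypothetical layout: identity type -> value, or None when not observed.
IdentityData = Dict[str, Optional[str]]


def same_user_subject(first_ids: IdentityData, second_ids: IdentityData) -> bool:
    """True if any identity type present in both records has an identical value."""
    for id_type, value in first_ids.items():
        if value is not None and second_ids.get(id_type) == value:
            return True
    return False


matched = same_user_subject(
    {"id_card": "X1", "phone": "13800000000"},
    {"phone": "13800000000", "email": "a@example.com"},
)
```

Missing values are skipped deliberately, so two records that both lack a phone number are not counted as matching on it.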
The apparatus provided in the embodiment of the present invention may be specifically configured to execute the method embodiment provided in the second embodiment, and specific functions are not described herein again.
According to this embodiment of the invention, when the first user and the second user are not determined to belong to the same user subject, second identity characteristic data of the first user and of the second user are extracted from the first user data and the second user data to be processed, respectively, and the similarity between the two sets of second identity characteristic data is calculated. If the similarity is greater than a first preset threshold, the first user and the second user are determined to belong to the same user subject, and the first user data and the second user data are merged. If the similarity is less than or equal to the first preset threshold and greater than a second preset threshold, an association relationship is established between the first user data and the second user data. In this way, the multiple user numbers belonging to the same user subject can be determined accurately, the multiple pieces of user data of that user subject are merged to form panoramic user characteristic data, and the overall data redundancy of the DPI system is reduced.
Example five
Fig. 4 is a schematic structural diagram of a deep packet inspection device according to a fifth embodiment of the present invention. As shown in Fig. 4, the device 40 includes: a processor 401, a memory 402, and a computer program stored on the memory 402 and executable by the processor 401.
The processor 401, when executing the computer program stored on the memory 402, implements the data processing method provided by any of the method embodiments described above.
In this embodiment of the invention, first identity characteristic data of the first user and of the second user are extracted from the first user data and the second user data to be processed, respectively, where the first identity characteristic data includes at least one type of identity information that uniquely identifies a user subject. Whether the first user and the second user belong to the same user subject is then determined according to their first identity characteristic data. If they belong to the same user subject, the first user data and the second user data are merged, so that the multiple pieces of user data of the same user subject are combined into panoramic user characteristic data and the data redundancy of the DPI system is reduced.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the data processing method provided in any of the above method embodiments.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a logical division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
Claims (7)
1. A data processing method, comprising:
respectively extracting first identity characteristic data of a first user and first identity characteristic data of a second user from first user data and second user data to be processed, wherein the first identity characteristic data comprises at least one type of identity information used for uniquely identifying a user subject;
determining whether the first user and the second user belong to the same user subject according to the first identity characteristic data of the first user and the second user;
if the first user and the second user belong to the same user subject, merging the first user data and the second user data;
after determining whether the first user and the second user belong to the same user subject according to the first identity characteristic data of the first user and the second user, the method further includes:
if it is not determined that the first user and the second user belong to the same user subject, respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed, wherein the second identity characteristic data at least comprises: home address, friend information, association relationships, and behavior characteristic data;
calculating the similarity between the second identity characteristic data of the first user and the second user;
comparing the similarity between the second identity characteristic data of the first user and the second user with a first preset threshold value;
if the similarity between the second identity characteristic data of the first user and the second user is greater than a first preset threshold value, determining that the first user and the second user belong to the same user subject, and merging the first user data and the second user data;
after comparing the similarity between the second identity characteristic data of the first user and the second user with the first preset threshold, the method further includes:
if the similarity between the second identity characteristic data of the first user and the second user is smaller than or equal to the first preset threshold, comparing the similarity between the second identity characteristic data of the first user and the second user with a second preset threshold, wherein the second preset threshold is smaller than the first preset threshold;
if the similarity between the second identity characteristic data of the first user and the second user is greater than the second preset threshold, establishing an association relationship between the first user data and the second user data;
if it is not determined that the first user and the second user belong to the same user subject, before extracting second identity feature data of the first user and the second user from the first user data and the second user data to be processed, respectively, the method further includes:
respectively extracting registered accounts of a first user and a second user from first user data and second user data to be processed;
judging whether the registered accounts of the first user and the second user are consistent;
and if the registered accounts of the first user and the second user are consistent, then executing the subsequent step of respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
2. The method of claim 1, wherein determining whether the first user and the second user belong to the same user subject based on the first identity characteristic data of the first user and the second user comprises:
judging whether any identity information in the first identity characteristic data of the first user and the second user is consistent;
and if any one type of identity information in the first identity characteristic data of the first user and the second user is consistent, determining that the first user and the second user belong to the same user subject.
3. The method of claim 1, wherein after determining whether the registered accounts of the first user and the second user are consistent, the method further comprises:
if the registered accounts of the first user and the second user are inconsistent, calculating the similarity of the registered accounts of the first user and the second user;
judging whether the similarity of the registered accounts of the first user and the second user is greater than a third preset threshold value or not;
and if the similarity of the registered accounts of the first user and the second user is greater than a third preset threshold, then executing a subsequent step of respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
4. A data processing apparatus, comprising:
a data extraction module, configured to respectively extract first identity characteristic data of a first user and first identity characteristic data of a second user from first user data and second user data to be processed, wherein the first identity characteristic data comprises at least one type of identity information used for uniquely identifying a user subject;
a determining module, configured to determine whether the first user and the second user belong to the same user subject according to the first identity characteristic data of the first user and the second user;
a processing module, configured to merge the first user data and the second user data if the first user and the second user belong to the same user subject;
the processing module is further configured to:
if it is not determined that the first user and the second user belong to the same user subject, respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed, wherein the second identity characteristic data at least comprises: home address, friend information, association relationships, and behavior characteristic data;
calculating the similarity between the second identity characteristic data of the first user and the second user;
comparing the similarity between the second identity characteristic data of the first user and the second user with a first preset threshold value;
if the similarity between the second identity characteristic data of the first user and the second user is greater than a first preset threshold value, determining that the first user and the second user belong to the same user subject, and merging the first user data and the second user data;
the processing module is further configured to:
if the similarity between the second identity characteristic data of the first user and the second user is smaller than or equal to the first preset threshold, comparing the similarity between the second identity characteristic data of the first user and the second user with a second preset threshold, wherein the second preset threshold is smaller than the first preset threshold;
if the similarity between the second identity characteristic data of the first user and the second user is greater than the second preset threshold, establishing an association relationship between the first user data and the second user data;
the processing module is further configured to: respectively extracting registered accounts of a first user and a second user from first user data and second user data to be processed;
judging whether the registered accounts of the first user and the second user are consistent;
and if the registered accounts of the first user and the second user are consistent, then executing the subsequent step of respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
5. The apparatus of claim 4, wherein the processing module is further configured to:
if the registered accounts of the first user and the second user are inconsistent, calculating the similarity of the registered accounts of the first user and the second user;
judging whether the similarity of the registered accounts of the first user and the second user is greater than a third preset threshold value or not;
and if the similarity of the registered accounts of the first user and the second user is greater than a third preset threshold, then executing a subsequent step of respectively extracting second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
6. A deep packet inspection device, comprising:
a memory, a processor, and a computer program stored on the memory and executable on the processor,
the processor, when executing the computer program, implements the method of any of claims 1-3.
7. A computer-readable storage medium, in which a computer program is stored,
the computer program, when executed by a processor, implementing the method of any one of claims 1-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810752308.XA CN109088788B (en) | 2018-07-10 | 2018-07-10 | Data processing method, device, equipment and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810752308.XA CN109088788B (en) | 2018-07-10 | 2018-07-10 | Data processing method, device, equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109088788A (en) | 2018-12-25 |
CN109088788B (en) | 2021-02-02 |
Family
ID=64837484
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810752308.XA Active CN109088788B (en) | 2018-07-10 | 2018-07-10 | Data processing method, device, equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109088788B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111767348A (en) * | 2019-04-02 | 2020-10-13 | 上海晶赞融宣科技有限公司 | Data fusion method and device, storage medium and server |
CN110245146B (en) * | 2019-05-20 | 2022-11-25 | 中国平安人寿保险股份有限公司 | User identification method and related device |
CN110557363A (en) * | 2019-06-03 | 2019-12-10 | 北京城市网邻信息技术有限公司 | identity verification method, device and storage medium |
CN112395320B (en) * | 2020-11-26 | 2023-03-07 | 深圳市房多多网络科技有限公司 | Building information merging method, device, equipment and computer readable storage medium |
CN113641657A (en) * | 2021-08-23 | 2021-11-12 | 苏州良医汇网络科技有限公司 | Method, device and equipment for merging user accounts |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101729682A (en) * | 2009-11-11 | 2010-06-09 | 南京联创科技集团股份有限公司 | Method for automatically tracing communication network users |
US9774670B2 (en) * | 2010-08-22 | 2017-09-26 | Qwilt, Inc. | Methods for detection of content servers and caching popular content therein |
CN103905379A (en) * | 2012-12-25 | 2014-07-02 | 腾讯科技(深圳)有限公司 | Method for identifying internet users and device thereof |
CN106572048A (en) * | 2015-10-09 | 2017-04-19 | 腾讯科技(深圳)有限公司 | Identification method and system of user information in social network |
CN105844489A (en) * | 2016-03-21 | 2016-08-10 | 联想(北京)有限公司 | Information processing method and electronic device |
CN106570719A (en) * | 2016-08-24 | 2017-04-19 | 阿里巴巴集团控股有限公司 | Data processing method and apparatus |
CN108235368A (en) * | 2016-12-15 | 2018-06-29 | 中国电信股份有限公司 | For determining the method and device of the radio resource of business occupancy |
CN108259314A (en) * | 2016-12-29 | 2018-07-06 | 乐视汽车(北京)有限公司 | Information-pushing method and device |
Also Published As
Publication number | Publication date |
---|---|
CN109088788A (en) | 2018-12-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||