CN109088788A - Data processing method, device, equipment and computer readable storage medium - Google Patents


Publication number
CN109088788A
Authority
CN
China
Prior art keywords
user
data
identity
similarity
agent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810752308.XA
Other languages
Chinese (zh)
Other versions
CN109088788B (en
Inventor
袁晓静
翟京卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd
Priority to CN201810752308.XA
Publication of CN109088788A
Application granted
Publication of CN109088788B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/028 Capturing of monitoring data by filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2117 User registration

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a data processing method, device, equipment and computer-readable storage medium. In the method, first identity characteristic data of a first user and of a second user are extracted from first user data and second user data to be processed, respectively, where the first identity characteristic data include at least one item of identity information that uniquely identifies a user agent. Based on the first identity characteristic data of the first user and the second user, it is determined whether the first user and the second user belong to the same user agent. If they do, the first user data and the second user data are merged. Multiple sets of user data belonging to the same user agent are thereby merged into panoramic user characteristic data, which reduces the overall data redundancy of the DPI system.

Description

Data processing method, device, equipment and computer readable storage medium
Technical field
The present invention relates to the technical field of information data processing, and in particular to a data processing method, device, equipment and computer-readable storage medium.
Background
Deep packet inspection (DPI) is an application-layer traffic detection and control technology based on data packets. It inspects and analyzes the information in the different layers of a data packet in depth to obtain the application-layer information of an entire data flow or data packet, and then performs statistical analysis and control of the traffic according to the policies defined by the DPI system.
With the development of big data and Internet technology, a wide variety of applications have entered people's lives. Because different applications impose no uniform requirements on user registration information, the same user may register with different applications under different user identifiers, and different users may register with different applications under the same user identifier. When a current DPI system obtains user behavior data, it creates separate user behavior data for each user of each application. It therefore stores a large amount of redundant data and cannot form panoramic user characteristic data.
Summary of the invention
The present invention provides a data processing method, device, equipment and computer-readable storage medium, to solve the problem that a current DPI system, when obtaining users' behavioral characteristic data, creates separate user behavioral characteristic data for each user of each application, stores a large amount of redundant data, and cannot form panoramic user characteristic data.
One aspect of the present invention provides a data processing method, comprising:
extracting first identity characteristic data of a first user and of a second user from first user data and second user data to be processed, respectively, the first identity characteristic data including at least one item of identity information that uniquely identifies a user agent;
determining, according to the first identity characteristic data of the first user and the second user, whether the first user and the second user belong to the same user agent; and
if it is determined that the first user and the second user belong to the same user agent, merging the first user data and the second user data.
Another aspect of the present invention provides a data processing device, comprising:
a data extraction module, configured to extract first identity characteristic data of a first user and of a second user from first user data and second user data to be processed, respectively, the first identity characteristic data including at least one item of identity information that uniquely identifies a user agent;
a determining module, configured to determine, according to the first identity characteristic data of the first user and the second user, whether the first user and the second user belong to the same user agent; and
a processing module, configured to merge the first user data and the second user data if it is determined that the first user and the second user belong to the same user agent.
Another aspect of the present invention provides a deep packet inspection device, comprising:
a memory, a processor, and a computer program stored in the memory and executable on the processor,
wherein the processor implements the method described above when running the computer program.
Another aspect of the present invention provides a computer-readable storage medium storing a computer program,
wherein the computer program implements the method described above when executed by a processor.
In the data processing method, device, equipment and computer-readable storage medium provided by the present invention, first identity characteristic data of a first user and of a second user are extracted from first user data and second user data to be processed, respectively, the first identity characteristic data including at least one item of identity information that uniquely identifies a user agent. Whether the first user and the second user belong to the same user agent is determined according to their first identity characteristic data. If they do, the first user data and the second user data are merged. Multiple sets of user data belonging to the same user agent are thereby merged into panoramic user characteristic data, which reduces the overall data redundancy of the DPI system.
Brief description of the drawings
Fig. 1 is a flowchart of the data processing method provided by Embodiment One of the present invention;
Fig. 2 is a flowchart of the data processing method provided by Embodiment Two of the present invention;
Fig. 3 is a schematic structural diagram of the data processing device provided by Embodiment Three of the present invention;
Fig. 4 is a schematic structural diagram of the deep packet inspection device provided by Embodiment Five of the present invention.
The above drawings show specific embodiments of the present invention, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concept in any way, but to explain the concept of the invention to those skilled in the art by reference to specific embodiments.
Detailed description of the embodiments
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of devices and methods that are consistent with some aspects of the invention as detailed in the appended claims.
In the embodiments of the present invention, the terms "first", "second" and so on are used for descriptive purposes only and should not be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. In the description of the following embodiments, "multiple" means two or more, unless otherwise specifically defined.
The specific embodiments below may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present invention are described below with reference to the accompanying drawings.
Embodiment one
Fig. 1 is a flowchart of the data processing method provided by Embodiment One of the present invention. This embodiment addresses the problem that a current DPI system, when obtaining users' behavioral characteristic data, creates separate user behavior data for each user of each application, stores a large amount of redundant data, and cannot form panoramic user characteristic data, and provides a data processing method for it. The method in this embodiment is applied to a deep packet inspection device, which may be the computer device on which the DPI system runs. In other embodiments of the present invention, the method may also be applied to other computer devices; this embodiment uses a deep packet inspection device as an example. As shown in Fig. 1, the method comprises the following steps:
Step S101: first identity characteristic data of a first user and of a second user are extracted from first user data and second user data to be processed, respectively. The first identity characteristic data include at least one item of identity information that uniquely identifies a user agent.
In this embodiment, the first user data and the second user data are user behavior data obtained by the DPI system for two different user accounts of one application, or user behavior data obtained for two user accounts of two different applications.
In practice, the first user data and second user data to be processed may be designated by a technician by specifying an application identifier and a user registration account, or may be the user data corresponding to any two user registration accounts in the user data obtained by the DPI system. This embodiment imposes no specific limitation here.
The first identity characteristic data include at least one item of identity information that uniquely identifies a user agent, that is, the real-world user behind one or more accounts. Such identity information may include at least an ID card number, a mobile phone number, an e-mail address and the like.
Step S102: according to the first identity characteristic data of the first user and the second user, it is determined whether the first user and the second user belong to the same user agent.
The first identity characteristic data of a user include at least one item of identity information that uniquely identifies a user agent. If the first identity characteristic data of the first user and of the second user both contain at least one kind of such identity information, then the first user and the second user can be determined to belong to the same user agent when any one kind of identity information contained in both is consistent.
If the first identity characteristic data of the first user and of the second user both contain at least one kind of identity information that uniquely identifies a user agent, then the first user and the second user can be determined not to belong to the same user agent when any one kind of identity information contained in both is inconsistent.
If no kind of identity information that uniquely identifies a user agent is contained in the first identity characteristic data of both the first user and the second user, then the first identity characteristic data can be used neither to determine that the first user and the second user belong to the same user agent nor to determine that they do not.
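The three outcomes of step S102 can be illustrated with a minimal Python sketch. The field names (`phone`, `email`), the `Verdict` enum and the function name are hypothetical and do not come from the patent. The text does not say what happens when one shared kind of identity information matches while another differs; the sketch assumes that a match takes precedence.

```python
from enum import Enum
from typing import Mapping


class Verdict(Enum):
    SAME = "same"                  # belong to the same user agent
    DIFFERENT = "different"        # do not belong to the same user agent
    UNDETERMINED = "undetermined"  # first identity data cannot decide


def compare_first_identity(a: Mapping[str, str], b: Mapping[str, str]) -> Verdict:
    """Compare two users' first identity characteristic data.

    Each mapping goes from a kind of identity information (hypothetical keys
    such as "id_card", "phone", "email") to its value.
    """
    # Kinds of identity information present, and non-empty, for both users.
    shared = [kind for kind in a.keys() & b.keys() if a[kind] and b[kind]]
    if not shared:
        return Verdict.UNDETERMINED
    # Assumption: one matching kind is enough, even if another kind differs.
    if any(a[kind] == b[kind] for kind in shared):
        return Verdict.SAME
    return Verdict.DIFFERENT
```

For example, two accounts that both carry the same phone number yield `Verdict.SAME`, while an account with only a phone number and another with only an e-mail address yield `Verdict.UNDETERMINED`.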
Step S103: if it is determined that the first user and the second user belong to the same user agent, the first user data and the second user data are merged.
After it is determined that the first user and the second user belong to the same user agent, the first user data and the second user data are merged.
Specifically, merging the first user data and the second user data includes:
generating a unified user data identifier corresponding to the first user data and the second user data, removing the redundant information in the first user data and the second user data, and generating more comprehensive user data corresponding to that user data identifier.
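The merge described above might look like the following Python sketch. The record layout (an `account` string plus a list of flat `records` dictionaries), the use of a random UUID as the unified user data identifier, and the treatment of exact duplicate records as the "redundant information" are all assumptions made for illustration; the patent does not specify any of them.

```python
import uuid


def merge_user_data(first: dict, second: dict) -> dict:
    """Merge two users' data under one unified user data identifier.

    Assumed layout: {"account": str, "records": [ {str: hashable}, ... ]}.
    """
    seen = set()
    records = []
    for record in first["records"] + second["records"]:
        # A record is treated as redundant if an identical one was already kept.
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            records.append(record)
    return {
        "user_data_id": uuid.uuid4().hex,  # hypothetical unified identifier
        "accounts": sorted({first["account"], second["account"]}),
        "records": records,
    }
```

Keeping the original account names alongside the new identifier is a design choice of the sketch, so that the merged data can still be traced back to the accounts it came from.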
In this embodiment of the present invention, first identity characteristic data of a first user and of a second user are extracted from first user data and second user data to be processed, respectively, the first identity characteristic data including at least one item of identity information that uniquely identifies a user agent. Whether the first user and the second user belong to the same user agent is determined according to their first identity characteristic data. If they do, the first user data and the second user data are merged. Multiple sets of user data belonging to the same user agent are thereby merged into panoramic user characteristic data, which reduces the overall data redundancy of the DPI system.
Embodiment two
Fig. 2 is a flowchart of the data processing method provided by Embodiment Two of the present invention. On the basis of Embodiment One above, in this embodiment, if it is not determined that the first user and the second user belong to the same user agent, second identity characteristic data of the first user and of the second user are extracted from the first user data and second user data to be processed, respectively. The second identity characteristic data include at least: home address, friend information, association relationships and behavioral characteristic data. The similarity between the second identity characteristic data of the first user and the second user is calculated and compared with a first preset threshold. If the similarity is greater than the first preset threshold, it is determined that the first user and the second user belong to the same user agent, and the first user data and the second user data are merged. If the similarity is less than or equal to the first preset threshold, the similarity is compared with a second preset threshold, which is smaller than the first preset threshold. If the similarity is greater than the second preset threshold, an association relationship is established between the first user data and the second user data.
As shown in Fig. 2, the method comprises the following steps:
Step S201: first identity characteristic data of a first user and of a second user are extracted from first user data and second user data to be processed, respectively. The first identity characteristic data include at least one item of identity information that uniquely identifies a user agent.
In this embodiment, the first user data and the second user data are user behavior data obtained by the DPI system for two different user accounts of one application, or user behavior data obtained for two user accounts of two different applications.
In practice, the first user data and second user data to be processed may be designated by a technician by specifying an application identifier and a user registration account, or may be the user data corresponding to any two user registration accounts in the user data obtained by the DPI system. This embodiment imposes no specific limitation here.
The first identity characteristic data include at least one item of identity information that uniquely identifies a user agent. Such identity information may include at least an ID card number, a mobile phone number, an e-mail address and the like.
Optionally, the first identity characteristic data of the first user and the second user may be extracted from the first user data and second user data to be processed, respectively, and recorded in a data list.
Step S202: according to the first identity characteristic data of the first user and the second user, it is determined whether the first user and the second user belong to the same user agent.
In this embodiment, determining whether the first user and the second user belong to the same user agent according to their first identity characteristic data may specifically be implemented as follows:
It is judged whether any item of identity information is consistent between the first identity characteristic data of the first user and those of the second user. If any item of identity information is consistent, it is determined that the first user and the second user belong to the same user agent.
The first identity characteristic data of a user include at least one item of identity information that uniquely identifies a user agent. If the first identity characteristic data of the first user and of the second user both contain at least one kind of such identity information, then the first user and the second user can be determined to belong to the same user agent when any one kind of identity information contained in both is consistent.
If the first identity characteristic data of the first user and of the second user both contain at least one kind of identity information that uniquely identifies a user agent, then the first user and the second user can be determined not to belong to the same user agent when any one kind of identity information contained in both is inconsistent.
If no kind of identity information that uniquely identifies a user agent is contained in the first identity characteristic data of both the first user and the second user, then the first identity characteristic data can be used neither to determine that the first user and the second user belong to the same user agent nor to determine that they do not.
Step S203: if it is determined that the first user and the second user belong to the same user agent, the first user data and the second user data are merged.
After it is determined that the first user and the second user belong to the same user agent, the first user data and the second user data are merged.
Specifically, merging the first user data and the second user data includes:
generating a unified user data identifier corresponding to the first user data and the second user data, removing the redundant information in the first user data and the second user data, and generating more comprehensive user data corresponding to that user data identifier.
Step S204: if it is not determined that the first user and the second user belong to the same user agent, second identity characteristic data of the first user and of the second user are extracted from the first user data and second user data to be processed, respectively.
The second identity characteristic data include at least: home address, friend information, association relationships and behavioral characteristic data. The association relationships may be mobile phone contact information. Optionally, the second identity characteristic data may also include instant messaging accounts and the like.
Optionally, if it is not determined that the first user and the second user belong to the same user agent, then before the second identity characteristic data of the first user and the second user are extracted from the first user data and second user data to be processed, the method further includes:
extracting the registration accounts of the first user and the second user from the first user data and second user data to be processed, respectively; judging whether the registration accounts of the first user and the second user are consistent; and, if they are consistent, performing the subsequent step of extracting the second identity characteristic data of the first user and the second user from the first user data and second user data to be processed.
If the registration accounts of the first user and the second user are inconsistent, the similarity of the two registration accounts is calculated, and it is judged whether that similarity is greater than a third preset threshold. If the similarity of the registration accounts is greater than the third preset threshold, the subsequent step of extracting the second identity characteristic data of the first user and the second user from the first user data and second user data to be processed is performed.
The registration accounts of the first user and the second user are two character strings. Their similarity may be calculated using any existing method for measuring the similarity of two character strings; this embodiment imposes no specific limitation here. For example, the two strings may be matched against each other to find their longest matching substring, and the proportion that this longest substring accounts for may then be calculated.
In addition, the third preset threshold may be set by a technician according to actual needs; this embodiment imposes no specific limitation here.
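As a rough illustration of the optional account check, the sketch below scores two registration accounts by their longest common substring, using `difflib.SequenceMatcher.find_longest_match` from the Python standard library. The text does not say what the longest substring's proportion is measured against; dividing by the length of the longer account is an assumption, as are the function names and the example threshold of 0.6. The text also does not state what happens when the similarity does not exceed the third preset threshold, so the sketch only reports whether the stated condition for proceeding is met.

```python
from difflib import SequenceMatcher


def account_similarity(a: str, b: str) -> float:
    """Length of the longest common substring over the longer account's length."""
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / max(len(a), len(b))


def should_compare_second_identity(a: str, b: str, third_threshold: float = 0.6) -> bool:
    """True if the accounts are identical or similar enough to proceed."""
    return a == b or account_similarity(a, b) > third_threshold
```

With these assumptions, `zhang_san88` and `zhang_san` share the nine-character substring `zhang_san`, giving a similarity of 9/11, which exceeds the example threshold.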
Step S205: the similarity between the second identity characteristic data of the first user and the second user is calculated, and this similarity is compared with the first preset threshold.
In this embodiment, the similarity between the second identity characteristic data of the first user and the second user may be calculated using any existing method for computing the similarity of two users from their behavior data and attribute information; this embodiment imposes no specific limitation here.
The first preset threshold may be set by a technician according to actual needs; this embodiment imposes no specific limitation here.
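Because the patent leaves the similarity measure open, the following Python sketch swaps in one common choice purely for illustration: the Jaccard index of each field, averaged over the fields. The field names and the representation of every field as a collection of tokens (for example address tokens, friend identifiers, contact numbers, behavior labels) are assumptions, not part of the patent.

```python
from typing import Iterable, Mapping, Sequence


def jaccard(x: set, y: set) -> float:
    """Jaccard index |x & y| / |x | y|; defined as 0.0 when both sets are empty."""
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)


def second_identity_similarity(
    a: Mapping[str, Iterable[str]],
    b: Mapping[str, Iterable[str]],
    fields: Sequence[str] = ("home_address", "friends", "contacts", "behaviors"),
) -> float:
    """Unweighted mean of the per-field Jaccard indices (hypothetical field names)."""
    scores = [jaccard(set(a.get(f, ())), set(b.get(f, ()))) for f in fields]
    return sum(scores) / len(scores)
```

An unweighted mean treats all fields as equally informative; a real system would probably weight them, which the text neither requires nor rules out.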
Step S206: if the similarity between the second identity characteristic data of the first user and the second user is greater than the first preset threshold, it is determined that the first user and the second user belong to the same user agent, and the first user data and the second user data are merged.
In this embodiment, if the similarity between the second identity characteristic data of the first user and the second user is greater than the first preset threshold, the second identity characteristic data of the two users are very similar. The first user and the second user can therefore be considered to belong to the same user agent, and the first user data and the second user data are merged.
The merging of the first user data and the second user data is the same as in step S203 and is not described again here.
Step S207: if the similarity between the second identity characteristic data of the first user and the second user is less than or equal to the first preset threshold, the similarity is compared with a second preset threshold, which is smaller than the first preset threshold.
The second preset threshold may be set by a technician according to actual needs; this embodiment imposes no specific limitation here.
If the similarity between the second identity characteristic data of the first user and the second user is less than or equal to the second preset threshold, the degree of association between the first user data and the second user data is small. The first user data and the second user data are not merged, and no association relationship is established between them.
Step S208: if the similarity between the second identity characteristic data of the first user and the second user is greater than the second preset threshold, an association relationship is established between the first user data and the second user data.
If the similarity between the second identity characteristic data of the first user and the second user is greater than the second preset threshold, the currently available user data of the two users are not sufficient to determine that they belong to the same user agent, but the two users are strongly associated. An association relationship is therefore established between the first user data and the second user data, so that once more significant identity data of the first user and the second user are obtained later, it can be determined more accurately whether they belong to the same user agent, which improves the precision of user data merging.
In this embodiment of the present invention, when it is not determined that the first user and the second user belong to the same user agent, second identity characteristic data of the first user and of the second user are extracted from the first user data and second user data to be processed, respectively, and the similarity between them is evaluated. If the similarity between the second identity characteristic data of the first user and the second user is greater than the first preset threshold, it is determined that the first user and the second user belong to the same user agent, and the first user data and the second user data are merged. If the similarity is less than or equal to the first preset threshold but greater than the second preset threshold, an association relationship is established between the first user data and the second user data. The multiple sets of user data belonging to the same user agent are thereby determined accurately and merged into panoramic user characteristic data, which reduces the overall data redundancy of the DPI system.
Embodiment three
Fig. 3 is a schematic structural diagram of the data processing device provided by Embodiment Three of the present invention. The data processing device provided by this embodiment can execute the processing flow provided by the data processing method embodiments. As shown in Fig. 3, the device 30 includes a data extraction module 301, a determining module 302 and a processing module 303.
Specifically, the data extraction module 301 is configured to extract first identity characteristic data of a first user and of a second user from first user data and second user data to be processed, respectively. The first identity characteristic data include at least one item of identity information that uniquely identifies a user agent.
The determining module 302 is configured to determine, according to the first identity characteristic data of the first user and the second user, whether the first user and the second user belong to the same user agent.
The processing module 303 is configured to merge the first user data and the second user data if it is determined that the first user and the second user belong to the same user agent.
The device provided by this embodiment may be specifically used to execute the method embodiment provided by Embodiment One above; its specific functions are not described again here.
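One way to picture how modules 301 to 303 fit together is the Python sketch below, which wires three interchangeable callables into a single object. The class name, the callable signatures and the choice to return `None` when no merge happens are assumptions of the example; the patent describes the modules only functionally.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DataProcessingDevice:
    """Illustrative stand-in for device 30; each field plays the role of one module."""
    extract: Callable[[dict], dict]          # data extraction module 301
    determine: Callable[[dict, dict], bool]  # determining module 302
    process: Callable[[dict, dict], dict]    # processing module 303

    def run(self, first_user_data: dict, second_user_data: dict) -> Optional[dict]:
        # 301: extract first identity characteristic data from each user's data.
        first_identity = self.extract(first_user_data)
        second_identity = self.extract(second_user_data)
        # 302: decide whether both users belong to the same user agent.
        if self.determine(first_identity, second_identity):
            # 303: merge the two users' data.
            return self.process(first_user_data, second_user_data)
        return None
```

Passing the modules in as callables keeps the sketch independent of any particular extraction, comparison or merge logic.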
In this embodiment of the present invention, first identity characteristic data of the first user and of the second user are extracted from the first user data and the second user data to be processed, respectively, where the first identity characteristic data include at least one piece of identity information that uniquely identifies a user agent. Whether the first user and the second user belong to the same user agent is determined according to their first identity characteristic data. If they are determined to belong to the same user agent, the first user data and the second user data are merged. In this way, the multiple user data of the same user agent are merged to form panoramic user characteristic data, which reduces the overall data redundancy of the DPI system.
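The merging step can be illustrated with a minimal Python sketch. The text does not specify how two user data records are combined, so everything here is an assumption made for illustration: the records are plain dictionaries, the function name `merge_user_data` and the field names are hypothetical, list-valued fields are unioned, and the first record wins when two non-empty scalar values conflict.

```python
def merge_user_data(first, second):
    """Merge two user data records believed to belong to the same user agent.

    Assumed policy (not taken from the patent text):
    - fields missing or empty in the first record are filled from the second;
    - list-valued fields are unioned, preserving order and dropping repeats;
    - on conflicting scalar values the first record's value is kept.
    """
    merged = dict(first)  # shallow copy so the input record is not modified
    for key, value in second.items():
        if key not in merged or merged[key] in (None, "", []):
            merged[key] = value
        elif isinstance(merged[key], list) and isinstance(value, list):
            # Build a new list instead of extending in place.
            merged[key] = merged[key] + [v for v in value if v not in merged[key]]
    return merged


# Hypothetical records for the same subscriber seen under two user identifiers.
first_record = {"phone": "13800000000", "apps": ["mail", "video"], "city": ""}
second_record = {
    "phone": "13800000000",
    "apps": ["video", "maps"],
    "city": "Beijing",
    "email": "a@example.com",
}
panoramic = merge_user_data(first_record, second_record)
```

Storing one merged record in place of two overlapping ones is what removes the duplicated fields, which is the redundancy reduction the embodiment refers to.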
Embodiment four
On the basis of Embodiment three above, in this embodiment the processing module is further configured to:
If it cannot be determined that the first user and the second user belong to the same user agent, extract second identity characteristic data of the first user and of the second user from the first user data and the second user data to be processed, respectively, where the second identity characteristic data include at least: home address, friend information, association relationships and behavioral characteristic data; calculate the similarity between the second identity characteristic data of the first user and the second user; compare that similarity with a first preset threshold; and, if the similarity is greater than the first preset threshold, determine that the first user and the second user belong to the same user agent and merge the first user data and the second user data.
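The text does not say how the similarity between the second identity characteristic data is calculated. As one possible sketch, the example below scores each of the four listed feature groups separately (exact match for the home address, Jaccard overlap for the set-like groups) and averages the four scores with equal weight. The similarity measure, the equal weighting, the field names and the function names are all assumptions, not the patented calculation.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; defined as 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def second_identity_similarity(first, second):
    """Average of four per-feature scores, each in [0, 1] (assumed scheme)."""
    address_a = first.get("home_address")
    address_b = second.get("home_address")
    scores = [
        # Home address: exact match only; a missing address scores 0.
        1.0 if address_a and address_a == address_b else 0.0,
        jaccard(first.get("friends", []), second.get("friends", [])),
        jaccard(first.get("associations", []), second.get("associations", [])),
        jaccard(first.get("behaviors", []), second.get("behaviors", [])),
    ]
    return sum(scores) / len(scores)


# Hypothetical second identity characteristic data for two users.
user_a = {
    "home_address": "Road 1, Building 2",
    "friends": ["u7", "u9"],
    "associations": ["family:u3"],
    "behaviors": ["night_video", "morning_news"],
}
user_b = {
    "home_address": "Road 1, Building 2",
    "friends": ["u7", "u9"],
    "associations": ["family:u3"],
    "behaviors": ["night_video"],
}
similarity = second_identity_similarity(user_a, user_b)
```

The resulting value is the quantity that would then be compared with the first preset threshold.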
Optionally, the processing module is further configured to:
If the similarity between the second identity characteristic data of the first user and the second user is less than or equal to the first preset threshold, compare the similarity with a second preset threshold, where the second preset threshold is less than the first preset threshold; and, if the similarity is greater than the second preset threshold, establish an association relationship between the first user data and the second user data.
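The two-threshold comparison amounts to a three-way decision, sketched below in Python. The threshold values 0.8 and 0.5, the `Action` enum and the function name `decide` are illustrative assumptions; the text only requires that the second preset threshold be less than the first. What happens when the similarity does not exceed the second threshold is not stated in this passage, so the sketch simply leaves the two records separate in that case.

```python
from enum import Enum


class Action(Enum):
    MERGE = "merge"                  # same user agent: merge the two records
    ASSOCIATE = "associate"          # likely related: link the two records
    KEEP_SEPARATE = "keep_separate"  # assumed fallback: leave records as they are


def decide(similarity, first_threshold=0.8, second_threshold=0.5):
    """Map a similarity score to an action using two preset thresholds."""
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be less than first threshold")
    if similarity > first_threshold:
        return Action.MERGE
    # Here similarity <= first_threshold, as in the text.
    if similarity > second_threshold:
        return Action.ASSOCIATE
    return Action.KEEP_SEPARATE
```

Note that a similarity exactly equal to the first threshold falls into the association branch, because the merge condition is a strict "greater than".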
Optionally, the processing module is further configured to:
Extract the registered accounts of the first user and of the second user from the first user data and the second user data to be processed, respectively; judge whether the registered accounts of the first user and the second user are consistent; and, if they are consistent, perform the subsequent step of extracting the second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
Optionally, the processing module is further configured to:
If the registered accounts of the first user and the second user are inconsistent, calculate the similarity between the two registered accounts; judge whether that similarity is greater than a third preset threshold; and, if it is greater than the third preset threshold, perform the subsequent step of extracting the second identity characteristic data of the first user and the second user from the first user data and the second user data to be processed.
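The registered-account check acts as a gate in front of the second identity comparison. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as the account similarity measure; that choice, the threshold value 0.8 and the function names are assumptions, since the text does not name a string similarity method. Treating an empty account as "not comparable" is also an added assumption.

```python
from difflib import SequenceMatcher


def account_similarity(account_a, account_b):
    """String similarity in [0, 1]; SequenceMatcher is an assumed choice."""
    return SequenceMatcher(None, account_a, account_b).ratio()


def should_compare_second_identity(account_a, account_b, third_threshold=0.8):
    """Return True if the second identity comparison step should be performed."""
    if not account_a or not account_b:
        return False  # assumption: a missing account cannot pass the gate
    if account_a == account_b:
        return True   # consistent accounts: proceed directly
    # Inconsistent accounts: proceed only if they are similar enough.
    return account_similarity(account_a, account_b) > third_threshold
```

For example, `should_compare_second_identity("zhang_san88", "zhang_san89")` passes the gate under these assumptions, because the two strings differ in a single character, while two accounts with no characters in common do not.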
Optionally, the processing module is further configured to:
Judge whether any piece of identity information in the first identity characteristic data of the first user is consistent with the corresponding identity information of the second user; and, if any piece of identity information is consistent, determine that the first user and the second user belong to the same user agent.
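The "any one piece of identity information is consistent" rule can be sketched as a simple comparison over shared keys. The dictionary layout, the example keys (`phone`, `id_card`, `email`) and the function name are hypothetical; the text only says that each piece of identity information uniquely identifies a user agent. Skipping empty values is an added assumption so that two blank fields are not treated as a match.

```python
def share_identity_info(first_identity, second_identity):
    """Return True if at least one non-empty identity field has the same value."""
    return any(
        value not in (None, "") and second_identity.get(key) == value
        for key, value in first_identity.items()
    )


# Hypothetical first identity characteristic data for two user identifiers.
identity_a = {"phone": "13800000000", "email": "a@example.com"}
identity_b = {"phone": "13800000000", "email": "b@example.com"}
identity_c = {"id_card": "X123", "email": "c@example.com"}

same_ab = share_identity_info(identity_a, identity_b)  # phone numbers match
same_ac = share_identity_info(identity_a, identity_c)  # nothing matches
```

A single matching field is sufficient here precisely because each field is assumed to be uniquely identifying; weaker attributes are handled by the similarity-based comparison instead.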
The apparatus provided by this embodiment can be specifically used to execute the method embodiment provided by Embodiment two above; its specific functions are not described again here.
In this embodiment of the present invention, when it cannot be determined that the first user and the second user belong to the same user agent, second identity characteristic data of the first user and of the second user are extracted from the first user data and the second user data to be processed, and the similarity between the second identity characteristic data of the two users is evaluated. If the similarity is greater than the first preset threshold, the first user and the second user are determined to belong to the same user agent, and the first user data and the second user data are merged. If the similarity is less than or equal to the first preset threshold but greater than the second preset threshold, an association relationship is established between the first user data and the second user data. In this way, the multiple user data of the same user agent are accurately identified and merged to form panoramic user characteristic data, which reduces the overall data redundancy of the DPI system.
Embodiment five
Fig. 4 is a schematic structural diagram of the deep packet inspection device provided by Embodiment five of the present invention. As shown in Fig. 4, the device 40 includes a processor 401, a memory 402, and a computer program that is stored on the memory 402 and executable by the processor 401.
When executing the computer program stored on the memory 402, the processor 401 implements the data processing method provided by any of the above method embodiments.
In this embodiment of the present invention, first identity characteristic data of the first user and of the second user are extracted from the first user data and the second user data to be processed, respectively, where the first identity characteristic data include at least one piece of identity information that uniquely identifies a user agent. Whether the first user and the second user belong to the same user agent is determined according to their first identity characteristic data. If they are determined to belong to the same user agent, the first user data and the second user data are merged. In this way, the multiple user data of the same user agent are merged to form panoramic user characteristic data, which reduces the overall data redundancy of the DPI system.
In addition, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data processing method provided by any of the above method embodiments.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) or a processor to execute some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disc.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional modules above is used only as an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. For the specific working process of the apparatus described above, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
After considering the specification and practicing the invention disclosed here, those skilled in the art will readily conceive of other embodiments of the present invention. The present invention is intended to cover any variations, uses or adaptations of the invention that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed by the present invention. The specification and examples are to be regarded as illustrative only; the true scope and spirit of the invention are indicated by the following claims.
It should be understood that the present invention is not limited to the precise structure described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

Claims (13)

1. A data processing method, characterized by comprising:
extracting first identity characteristic data of a first user and of a second user from first user data and second user data to be processed, respectively, the first identity characteristic data including at least one piece of identity information for uniquely identifying a user agent;
determining, according to the first identity characteristic data of the first user and the second user, whether the first user and the second user belong to the same user agent;
if it is determined that the first user and the second user belong to the same user agent, merging the first user data and the second user data.
2. The method according to claim 1, characterized in that determining, according to the first identity characteristic data of the first user and the second user, whether the first user and the second user belong to the same user agent comprises:
judging whether any piece of the identity information in the first identity characteristic data of the first user is consistent with that of the second user;
if any piece of the identity information in the first identity characteristic data of the first user is consistent with that of the second user, determining that the first user and the second user belong to the same user agent.
3. The method according to claim 1 or 2, characterized in that, after determining, according to the first identity characteristic data of the first user and the second user, whether the first user and the second user belong to the same user agent, the method further comprises:
if it cannot be determined that the first user and the second user belong to the same user agent, extracting second identity characteristic data of the first user and of the second user from the first user data and the second user data to be processed, respectively, the second identity characteristic data including at least: home address, friend information, association relationships and behavioral characteristic data;
calculating the similarity between the second identity characteristic data of the first user and the second user;
comparing the similarity between the second identity characteristic data of the first user and the second user with a first preset threshold;
if the similarity between the second identity characteristic data of the first user and the second user is greater than the first preset threshold, determining that the first user and the second user belong to the same user agent, and merging the first user data and the second user data.
4. The method according to claim 3, characterized in that, after comparing the similarity between the second identity characteristic data of the first user and the second user with the first preset threshold, the method further comprises:
if the similarity between the second identity characteristic data of the first user and the second user is less than or equal to the first preset threshold, comparing the similarity with a second preset threshold, the second preset threshold being less than the first preset threshold;
if the similarity between the second identity characteristic data of the first user and the second user is greater than the second preset threshold, establishing an association relationship between the first user data and the second user data.
5. The method according to claim 3, characterized in that, if it cannot be determined that the first user and the second user belong to the same user agent, before extracting the second identity characteristic data of the first user and of the second user from the first user data and the second user data to be processed, respectively, the method further comprises:
extracting the registered accounts of the first user and of the second user from the first user data and the second user data to be processed, respectively;
judging whether the registered accounts of the first user and the second user are consistent;
if the registered accounts of the first user and the second user are consistent, performing the subsequent step of extracting the second identity characteristic data of the first user and of the second user from the first user data and the second user data to be processed, respectively.
6. The method according to claim 5, characterized in that, after judging whether the registered accounts of the first user and the second user are consistent, the method further comprises:
if the registered accounts of the first user and the second user are inconsistent, calculating the similarity between the registered accounts of the first user and the second user;
judging whether the similarity between the registered accounts of the first user and the second user is greater than a third preset threshold;
if the similarity between the registered accounts of the first user and the second user is greater than the third preset threshold, performing the subsequent step of extracting the second identity characteristic data of the first user and of the second user from the first user data and the second user data to be processed, respectively.
7. A data processing apparatus, characterized by comprising:
a data extraction module, configured to extract first identity characteristic data of a first user and of a second user from first user data and second user data to be processed, respectively, the first identity characteristic data including at least one piece of identity information for uniquely identifying a user agent;
a determining module, configured to determine, according to the first identity characteristic data of the first user and the second user, whether the first user and the second user belong to the same user agent;
a processing module, configured to merge the first user data and the second user data if it is determined that the first user and the second user belong to the same user agent.
8. The apparatus according to claim 7, characterized in that the processing module is further configured to:
if it cannot be determined that the first user and the second user belong to the same user agent, extract second identity characteristic data of the first user and of the second user from the first user data and the second user data to be processed, respectively, the second identity characteristic data including at least: home address, friend information, association relationships and behavioral characteristic data;
calculate the similarity between the second identity characteristic data of the first user and the second user;
compare the similarity between the second identity characteristic data of the first user and the second user with a first preset threshold;
if the similarity between the second identity characteristic data of the first user and the second user is greater than the first preset threshold, determine that the first user and the second user belong to the same user agent, and merge the first user data and the second user data.
9. The apparatus according to claim 8, characterized in that the processing module is further configured to:
if the similarity between the second identity characteristic data of the first user and the second user is less than or equal to the first preset threshold, compare the similarity with a second preset threshold, the second preset threshold being less than the first preset threshold;
if the similarity between the second identity characteristic data of the first user and the second user is greater than the second preset threshold, establish an association relationship between the first user data and the second user data.
10. The apparatus according to claim 8, characterized in that the processing module is further configured to:
extract the registered accounts of the first user and of the second user from the first user data and the second user data to be processed, respectively;
judge whether the registered accounts of the first user and the second user are consistent;
if the registered accounts of the first user and the second user are consistent, perform the subsequent step of extracting the second identity characteristic data of the first user and of the second user from the first user data and the second user data to be processed, respectively.
11. The apparatus according to claim 10, characterized in that the processing module is further configured to:
if the registered accounts of the first user and the second user are inconsistent, calculate the similarity between the registered accounts of the first user and the second user;
judge whether the similarity between the registered accounts of the first user and the second user is greater than a third preset threshold;
if the similarity between the registered accounts of the first user and the second user is greater than the third preset threshold, perform the subsequent step of extracting the second identity characteristic data of the first user and of the second user from the first user data and the second user data to be processed, respectively.
12. A deep packet inspection device, characterized by comprising:
a memory, a processor, and a computer program stored on the memory and executable on the processor,
wherein the processor implements the method according to any one of claims 1-6 when running the computer program.
13. A computer-readable storage medium, characterized in that it stores a computer program,
wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-6.
CN201810752308.XA 2018-07-10 2018-07-10 Data processing method, device, equipment and computer readable storage medium Active CN109088788B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810752308.XA CN109088788B (en) 2018-07-10 2018-07-10 Data processing method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN109088788A (en) 2018-12-25
CN109088788B (en) 2021-02-02

Family

ID=64837484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810752308.XA Active CN109088788B (en) 2018-07-10 2018-07-10 Data processing method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109088788B (en)


Publication number Publication date
CN109088788B (en) 2021-02-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant