CN109598525A - Data processing method and device - Google Patents

Info

Publication number
CN109598525A
Authority
CN
China
Prior art keywords
data
abnormal
application platform
application
side attribute
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710916816.2A
Other languages
Chinese (zh)
Other versions
CN109598525B (en)
Inventor
郭琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201710916816.2A
Publication of CN109598525A
Application granted
Publication of CN109598525B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0255 Targeted advertisements based on user history

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing method and device. ID data recorded by multiple application platforms are obtained, and the abnormal ID data among them are screened out. The number of abnormal ID data recorded by each application platform is then taken into account: all ID data recorded by an application platform whose records are judged unreliable are rejected, together with the abnormal ID data screened out from the other application platforms. The remaining normal ID data are used as the ID data to be processed, and the association relationships among them are used to determine, accurately and quickly, at least one target object to which these ID data map, so that advertisements can be delivered accurately to different objects.

Description

Data processing method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a data processing method and device.
Background art
Digital marketing is the practice of promoting products and services through digital communication channels, so as to communicate with consumers in a timely, relevant, customized, and cost-effective manner. Analyzing users' Internet behavior data in order to target advertisements is a common technique in digital advertising; it can meet the needs of different users and greatly improves user experience and product sales.
In practice, the same user may leave data in different scenarios on different platforms, for example user ID data such as browser cookies, mobile device IDs, website accounts, and mobile phone numbers. To attribute these various ID data to the same user, the association relationships among the monitored ID data are usually used to link the IDs, generating one globally unique virtual ID for the user.
However, ID linking is easily affected by abnormal ID data, which can invalidate the association result. For example, when users fill in data carelessly, a large number of different users may end up with the same mobile phone number. Under existing data processing schemes, that phone number causes different users to be identified as the same user, so the whole ID association relationship becomes unreliable, the accuracy of user behavior analysis drops, and the reliability and accuracy of advertisement delivery suffer.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a data processing method and device that overcome the above problems or at least partially solve them.
An embodiment of the invention provides a data processing method, the method comprising:
obtaining ID data recorded by multiple application platforms;
screening the abnormal ID data in the ID data;
counting the number of abnormal ID data contained in the ID data recorded by each application platform;
when the counted number is determined to meet a first preset standard, rejecting the ID data recorded by the corresponding application platform and the abnormal ID data recorded by the other application platforms, and using the other ID data obtained as the ID data to be processed;
determining, using the association relationships among the ID data to be processed, at least one target object to which the ID data to be processed map.
Preferably, screening the abnormal ID data in the ID data comprises:
grouping the obtained ID data according to the type of the ID data;
computing, for each ID data in any one group, its association relationships with the ID data of the other groups;
determining that ID data whose association relationships meet a second preset standard are abnormal ID data.
Preferably, screening the abnormal ID data in the ID data comprises:
constructing an undirected graph with the obtained ID data as vertices and the application platform to which the ID data belong as the edges between the vertices;
extracting ID association subgraphs according to the edge attributes of the undirected graph;
obtaining the ID data quality feature corresponding to each edge attribute in each ID association subgraph;
determining the abnormal ID data corresponding to each edge attribute using the judgment criterion corresponding to each type of ID data quality feature.
Preferably, screening the abnormal ID data in the ID data comprises:
screening the abnormal ID data in the ID data according to a preset blacklist filtering rule;
or
screening out, as abnormal ID data, the ID data that do not conform to a preset whitelist filtering rule.
Preferably, obtaining the ID data quality feature corresponding to each edge attribute in each ID association subgraph comprises:
for each edge attribute of each ID association subgraph, computing the quantity-ratio distribution among the types of ID data under that edge attribute;
and determining the abnormal ID data corresponding to each edge attribute using the judgment criterion corresponding to each type of ID data quality feature comprises:
obtaining the preset quantile corresponding to each edge attribute;
judging whether the quantity ratio of the second ID data associated with a first ID data under each edge attribute exceeds the corresponding preset quantile of the distribution, the first ID data and the second ID data being ID data of different types under the same edge attribute;
if so, determining that the first ID data is abnormal ID data;
if not, selecting new first ID data and second ID data and returning to the judging step, until the quantity-ratio distributions of the different types of ID data have been judged for all edge attributes.
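The quantile test above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes the quality feature for one edge attribute is simply the number of second-type IDs linked to each first-type ID, and it uses a nearest-rank quantile. The function names, the sample counts, and the 0.75 quantile are all hypothetical.

```python
import math

def nearest_rank_quantile(values, q):
    """Nearest-rank quantile of a non-empty list, for 0 < q <= 1."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def abnormal_first_ids(counts, q):
    """counts maps each first-type ID to the number of second-type IDs
    linked to it under one edge attribute (assumed quality feature).
    IDs whose count exceeds the preset quantile are flagged."""
    cutoff = nearest_rank_quantile(list(counts.values()), q)
    return {first_id for first_id, n in counts.items() if n > cutoff}

# Hypothetical data: third-party cookies seen per first-party cookie.
counts = {"c1": 1, "c2": 2, "c3": 1, "c4": 2, "c5": 40}
flagged = abnormal_first_ids(counts, q=0.75)
```

Because the quantile is chosen per edge attribute, a platform where high counts are normal can be given a higher quantile than one where they are rare.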
An embodiment of the invention also provides a data processing device, the device comprising:
an obtaining module, for obtaining ID data recorded by multiple application platforms;
a screening module, for screening the abnormal ID data in the ID data;
a statistics module, for counting the number of abnormal ID data contained in the ID data recorded by each application platform;
a data processing module, for rejecting, when the counted number is determined to meet the first preset standard, the ID data recorded by the corresponding application platform and the abnormal ID data recorded by the other application platforms, and using the other ID data obtained as the ID data to be processed;
a target object determining module, for determining, using the association relationships among the ID data to be processed, at least one target object to which the ID data to be processed map.
Preferably, the screening module includes:
a grouping unit, for grouping the obtained ID data according to the type of the ID data;
a first statistics unit, for computing, for each ID data in any one group, its association relationships with the ID data of the other groups;
a first determining unit, for determining that ID data whose association relationships meet the second preset standard are abnormal ID data.
Preferably, the screening module includes:
a construction unit, for constructing an undirected graph with the obtained ID data as vertices and the application platform to which the ID data belong as the edges between the vertices;
an extraction unit, for extracting ID association subgraphs according to the edge attributes of the undirected graph;
a feature obtaining unit, for obtaining the ID data quality feature corresponding to each edge attribute in each ID association subgraph;
a second determining unit, for determining the abnormal ID data corresponding to each edge attribute using the judgment criterion corresponding to each type of ID data quality feature.
An embodiment of the invention also provides a storage medium that includes a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the data processing method described above.
An embodiment of the invention also provides a processor for running a program, wherein the program, when run, executes the data processing method described above.
With the above technical solution, the data processing method provided by the invention obtains the ID data recorded by multiple application platforms and, after screening out the abnormal ID data, further considers the number of abnormal ID data recorded by each application platform. When that number is determined to meet the first preset standard, the ID data recorded by the corresponding application platform are considered unreliable; all ID data recorded by that platform are rejected, together with the abnormal ID data screened out from the other application platforms. The other ID data obtained are used as the ID data to be processed, and the association relationships among them are used to determine at least one target object to which they map. The embodiment of the present application thus brings both the ID data themselves and their sources into the judgment and identification of abnormality, achieving abnormality identification adapted to the ID data recorded by different application platforms. This greatly improves the accuracy of user behavior analysis, improves the efficiency and accuracy of target object identification, and facilitates accurate advertisement delivery.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the contents of the specification, and so that the above and other objects, features, and advantages of the invention are more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the invention. Throughout the drawings, the same reference numbers denote the same parts. In the drawings:
Fig. 1 shows a flowchart of a data processing method provided by an embodiment of the present application;
Fig. 2 shows a schematic diagram of ID linking provided by an embodiment of the present application;
Fig. 3 shows a flowchart of another data processing method provided by an embodiment of the present application;
Fig. 4 shows a structural block diagram of a data processing device provided by an embodiment of the present application;
Fig. 5 shows a structural block diagram of another data processing device provided by an embodiment of the present application;
Fig. 6 shows a structural block diagram of yet another data processing device provided by an embodiment of the present application;
Fig. 7 shows a hardware structure diagram of an electronic device provided by an embodiment of the present application;
Fig. 8 shows a structural block diagram of a data processing system provided by an embodiment of the present application.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
Referring to Fig. 1, a flowchart of a data processing method provided by an embodiment of the present application, the method may include the following steps:
Step S101: obtain the ID data recorded by multiple application platforms;
In this application, the obtained ID data characterize the identity of an Internet object and may include various user ID data such as browser cookies, mobile device IDs, website accounts, and mobile phone numbers. The application places no limit on the content or quantity of the ID data collected by each platform.
A cookie is data that certain websites store on the user's local terminal in order to distinguish user identity and perform session tracking; it may be called a browser cache. Under the HTTP protocol, it is a way for a server or script to maintain information on the client workstation. A cookie is a small text file that a web server saves in the user's browser (the client), and it may contain information about the user as well as about the corresponding browser or electronic device.
Note that if multiple browsers are installed on one electronic device, each browser stores its cookies in its own independent space, so the same user logging in through different browsers or different electronic devices yields different cookie information. The application can therefore combine browser cookies with other ID data to identify a user.
In practice, one real object may hold multiple ID data on different application platforms, such as various browser cookies, various social platform accounts, multiple mobile phone IMEI (International Mobile Equipment Identity) numbers, and various financial accounts. After a user logs in to an application platform, the platform usually collects and records each of the user's ID data.
Step S102: screen the abnormal ID data in the ID data;
To identify target objects reliably and accurately, the embodiment of the present application can obtain the ID data recorded by multiple application platforms and use the association relationships among these ID data to link user IDs, generating one globally unique virtual ID for each user. In other words, at least one target object corresponding to these ID data is identified, so that a user portrait of the target object can be generated from the ID data.
Referring to the ID linking schematic shown in Fig. 2, abnormal ID data easily invalidate the association result during ID linking. For example, if users fill in mobile phone information such as the IMEI number carelessly, so that many different users enter the same mobile phone number, that number causes different users to be identified as the same user. Such abnormal associations tend to spread virally, leaving the resulting ID association graph unusable.
Therefore, to guarantee the quality of user ID linking, after obtaining the ID data sent by multiple application platforms the application can reliably identify and reject the abnormal ID data among them. The present embodiment can identify and reject abnormal ID data by means of a blacklist, a whitelist, statistical rules, or an undirected graph, but is not limited to these implementations.
Optionally, the embodiment of the present application can screen the abnormal ID data in the obtained ID data according to a preset blacklist filtering rule. The blacklist filtering rule can be determined from experience or historical records. For example, ID data that history has shown to be abnormal, such as the mobile phone numbers 13800000000 and 13612345678 or the email address 123@xxx.xx, are added to a blacklist library, which then serves as the blacklist filtering rule for filtering the obtained ID data directly: each obtained ID data is compared with the data stored in the blacklist library. If they match, the ID data being compared is abnormal; otherwise it can be considered normal.
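The blacklist comparison can be sketched as a set lookup. This is only an illustration: the blacklist entries repeat the examples in the text, and the function name and sample input are hypothetical.

```python
# Hypothetical blacklist library; the entries mirror the examples in the
# text and stand in for values learned from historical records.
BLACKLIST = {"13800000000", "13612345678", "123@xxx.xx"}

def split_by_blacklist(id_values, blacklist=BLACKLIST):
    """Return (normal, abnormal): an ID is abnormal if it is blacklisted."""
    normal, abnormal = [], []
    for value in id_values:
        (abnormal if value in blacklist else normal).append(value)
    return normal, abnormal

normal, abnormal = split_by_blacklist(
    ["13800000000", "13911112222", "123@xxx.xx"]
)
```

A set keeps each comparison at constant time on average, which matters when the obtained ID data are numerous.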
Optionally, the embodiment of the present application can also directly screen out, as abnormal, the ID data that do not conform to a preset whitelist filtering rule. The whitelist filtering rule can be determined from the generation rules of the ID data, such as the cookie generation rule and the rule for valid mobile phone numbers (which may be determined by the different carriers; the application places no limit on this). Each obtained ID data that is verified to conform to the whitelist filtering rule is considered normal and can be used for subsequent ID linking. A mobile phone number such as 123456789 clearly does not conform to the phone number generation rule and can be screened out as abnormal ID data.
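A whitelist rule of this kind can be sketched with a regular expression. The pattern below is an assumption made for illustration (an 11-digit number starting with 1 whose second digit is 3 to 9); the text leaves the actual generation rule to the carriers, and the function name is hypothetical.

```python
import re

# Assumed generation rule for a mobile phone number: 11 digits, first
# digit 1, second digit 3-9. Real carrier rules are more specific.
MOBILE_RULE = re.compile(r"1[3-9]\d{9}")

def violates_whitelist(phone, rule=MOBILE_RULE):
    """True if the value does not conform to the whitelist rule."""
    return rule.fullmatch(phone) is None

short_number_is_abnormal = violates_whitelist("123456789")
valid_shape_is_abnormal = violates_whitelist("13911112222")
```

A format check cannot catch a well-formed placeholder such as 13800000000, so a whitelist rule of this kind complements the blacklist rather than replacing it.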
As another embodiment of the application, statistical rules can be preset to screen abnormal ID data. The process may include: grouping the obtained ID data recorded by multiple application platforms according to the type of the ID data; then computing, for each ID data in any one group, its association relationships with the ID data of the other groups; and determining that ID data whose association relationships meet the second preset standard are abnormal ID data.
For example, count the number of different email addresses associated with the same mobile phone number; if that number exceeds a preset amount, the associations between the phone number and those email addresses can be rejected directly. If one phone number corresponds to 30 different email addresses, say, such a situation is improbable and the phone number can be considered abnormal. Note that the content of the second preset standard may differ for different ID data and is not limited to the count-exceeds-a-preset-amount criterion described here; the application does not enumerate the alternatives.
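The phone-to-email example can be sketched as a fan-out count. This is a minimal sketch assuming the second preset standard is a simple upper limit on distinct associated email addresses; the function name, the limit of 10, and the sample pairs are hypothetical.

```python
from collections import defaultdict

def abnormal_by_fanout(pairs, limit):
    """pairs: iterable of (phone, email) associations.
    Returns the phones linked to more than `limit` distinct emails."""
    linked = defaultdict(set)
    for phone, email in pairs:
        linked[phone].add(email)
    return {phone for phone, emails in linked.items() if len(emails) > limit}

# Hypothetical data: p1 is tied to 30 emails, p2 to only 2.
pairs = [("p1", f"u{i}@example.com") for i in range(30)]
pairs += [("p2", "a@example.com"), ("p2", "b@example.com")]
suspicious = abnormal_by_fanout(pairs, limit=10)
```

Collecting the emails in a set means repeated observations of the same pair do not inflate the count.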
In addition, as another embodiment, the application can construct an undirected graph from the obtained ID data and the application platforms to which they belong, extract multiple ID association subgraphs according to the edge attributes of the undirected graph (such as the type of application platform), and then use the ID data quality feature corresponding to each edge attribute in each ID association subgraph to screen out the abnormal ID data in that subgraph. The specific procedure is described in the corresponding embodiment below and is not detailed here.
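One way to picture the graph construction is an adjacency map per edge attribute. The sketch below assumes each record says that a platform observed two IDs together, and it treats the platform name as the edge attribute; the record layout, the function name, and the `type:value` ID labels are hypothetical.

```python
from collections import defaultdict

def build_subgraphs(records):
    """records: iterable of (id_a, id_b, platform), meaning the platform
    observed both IDs together. Returns {platform: {vertex: neighbours}},
    i.e. one undirected subgraph per edge attribute."""
    subgraphs = defaultdict(lambda: defaultdict(set))
    for id_a, id_b, platform in records:
        subgraphs[platform][id_a].add(id_b)
        subgraphs[platform][id_b].add(id_a)
    return subgraphs

records = [
    ("cookie:c1", "phone:p1", "siteA"),
    ("cookie:c2", "phone:p1", "siteA"),
    ("cookie:c1", "email:e1", "siteB"),
]
subgraphs = build_subgraphs(records)
```

Per-platform quality features, such as how many cookies each phone number touches on `siteA`, can then be read directly from the neighbour sets.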
It can be seen that the application can screen abnormal ID data out of the obtained ID data by, but not limited to, the modes listed above, thereby improving the reliability and accuracy of target object identification.
Step S103: count the number of abnormal ID data contained in the ID data recorded by each application platform;
The applicant has found that the data recorded by different application platforms may differ greatly in quality characteristics. For example, the user mobile phone numbers of website A are relatively accurate and the user email addresses of website B are relatively accurate; browser C disables third-party cookies by default, so one first-party cookie easily corresponds to many third-party cookies, whereas this rarely occurs in browser D.
The source of the ID data, that is, the application platform, therefore has a significant effect on identifying abnormal ID data accurately and reliably. For example, if a mobile phone number is judged abnormal because of the ID data of website B, the ID associations monitored for that phone number under website A would also be removed, reducing the reliability and accuracy of abnormal ID identification.
The application can therefore set a corresponding data processing standard, that is, an abnormal ID identification standard, for the ID data from each application platform. Taking browsers C and D again as an example, the abnormality threshold for the ratio of first-party to third-party cookie counts is set higher for browser C than for browser D. Note that the way of identifying abnormal ID data may differ for other kinds of application platform and can be determined from the type of application platform and the content of the ID data; the application does not enumerate the alternatives.
On this basis, after screening the abnormal ID data out of the obtained ID data, the embodiment of the present application can further check the number of abnormal ID data from each application platform. That is, on top of verifying the ID data association relationships, the embodiment adds verification of the source information of the ID data.
Optionally, after screening out all abnormal ID data, the present embodiment can classify them by source to obtain the number of abnormal ID data from each application platform. Alternatively, the application can screen by source from the start: after screening all ID data recorded by one application platform, it screens the ID data recorded by the next, directly obtaining the abnormal ID data recorded by each application platform. The application places no limit on how the abnormal ID data of each application platform are counted.
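Classifying the screened abnormal ID data by source amounts to a grouped count. A minimal sketch, assuming each abnormal record carries its platform; the function name and sample records are hypothetical.

```python
from collections import Counter

def count_abnormal_by_platform(abnormal_records):
    """abnormal_records: iterable of (platform, id_value) pairs that were
    screened out as abnormal. Returns the abnormal count per platform."""
    return Counter(platform for platform, _ in abnormal_records)

abnormal_records = [
    ("siteA", "13800000000"),
    ("siteB", "123456789"),
    ("siteB", "123@xxx.xx"),
]
abnormal_counts = count_abnormal_by_platform(abnormal_records)
```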
Step S104: judge whether there is an application platform whose counted number meets the first preset standard; if so, go to step S105; if not, go to step S106;
From the above analysis, the reliability of the ID data monitored by different application platforms differs, and so do the corresponding standards for screening abnormal ID data. However, whichever application platform the ID data come from, if that platform has a very large number of abnormal ID data, the ID data it records are usually considered untrustworthy. To further improve the reliability of target object identification, the ID data recorded by such an application platform can be rejected.
On this basis, after the abnormal ID data of each application platform have been screened and counted in their respective ways, the embodiment can further calculate the ratio or percentage of each platform's abnormal ID data to the total number of ID data it records, and check whether that ratio or percentage reaches the abnormality threshold corresponding to the platform. If it does, the ID data recorded by the platform contain a large amount of abnormal ID data and will no longer be used for subsequent processing. If it does not, the platform's ID data contain relatively few abnormal ID data, and its normal ID data can still be used for subsequent processing.
It can be seen that the first preset standard in step S104 may be that the ratio or percentage of the counted abnormal ID data to the total number of ID data recorded by the application platform reaches the preset abnormality threshold of that platform. The preset abnormality thresholds of different application platforms may be the same or different and can be determined according to actual needs; the embodiment places no limit on their size.
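The ratio check in step S104 can be sketched as below. The per-platform thresholds, the default of 0.5, and all names are assumptions made for illustration; the text only requires that the abnormal share reach the platform's preset threshold.

```python
def unreliable_platforms(total_counts, abnormal_counts, thresholds,
                         default_threshold=0.5):
    """Return the platforms whose abnormal share reaches their threshold.
    total_counts / abnormal_counts: {platform: number of ID records}.
    thresholds: {platform: ratio in [0, 1]}; missing platforms fall back
    to default_threshold (an assumed value)."""
    unreliable = set()
    for platform, total in total_counts.items():
        if total == 0:
            continue  # nothing recorded, so no ratio to evaluate
        ratio = abnormal_counts.get(platform, 0) / total
        if ratio >= thresholds.get(platform, default_threshold):
            unreliable.add(platform)
    return unreliable

rejected = unreliable_platforms(
    total_counts={"siteA": 100, "siteB": 100},
    abnormal_counts={"siteA": 5, "siteB": 60},
    thresholds={"siteA": 0.3, "siteB": 0.5},
)
```

If the result is empty, the flow continues with step S106; otherwise it continues with step S105.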
Step S105: reject the ID data recorded by the corresponding application platform and the abnormal ID data recorded by the other application platforms, and use the other ID data obtained as the ID data to be processed;
Step S106: reject the abnormal ID data screened out, and use the other ID data obtained as the ID data to be processed;
When rejecting abnormal ID data, the embodiment takes the source of the ID data into account and verifies the reliability of each application platform that records the different types of ID data. The ID data recorded by application platforms considered unreliable are rejected, and the ID data considered reliable are used as the ID data to be processed, from which the target objects are identified.
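Steps S105 and S106 can be sketched as one filter. The sketch assumes abnormal ID data are tracked per platform, as (platform, id) pairs, so that an ID judged abnormal on one platform is not removed from another; the names and sample records are hypothetical.

```python
def select_pending_ids(records, abnormal, unreliable):
    """records: list of (platform, id_value) pairs as obtained.
    abnormal: set of (platform, id_value) pairs screened out as abnormal.
    unreliable: set of platforms whose records are rejected wholesale.
    Returns the ID data to be processed, in their original order."""
    return [
        (platform, value)
        for platform, value in records
        if platform not in unreliable and (platform, value) not in abnormal
    ]

records = [("siteA", "p1"), ("siteA", "p2"), ("siteB", "p1"), ("siteC", "p3")]
abnormal = {("siteB", "p1")}

pending_s105 = select_pending_ids(records, abnormal, unreliable={"siteC"})
pending_s106 = select_pending_ids(records, abnormal, unreliable=set())
```

Passing an empty `unreliable` set gives the step S106 behaviour, where only the screened abnormal ID data are removed.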
Step S107: determine, using the association relationships among the ID data to be processed, at least one target object to which the ID data to be processed map.
Referring to the association relationships among ID data shown in Fig. 2, user ID linking is performed on the ID data to be processed recorded by each application platform, associating the ID data to be processed that map to the same object. In Fig. 2, a thick line segment between two ID data indicates an association between them, and the multiple ID data that map to the same object are linked together.
The association relationships among the ID data to be processed can be determined from the data content. As shown in Fig. 2, when a user operates on an application platform, a behavior event is generated; the platform usually records the behavior event together with related information, such as the account used to log in to the platform and the client through which the login was made. The application refers to the data recorded by an application platform as ID data.
So if the same user logs in to different application platforms, the ID data recorded by those platforms may contain identical ID data content, and the present embodiment can establish associations between the ID data of different application platforms based on that identical content. The same approach applies to the different ID data recorded by a single application platform, so that associations are determined between identical ID data.
Optionally, the embodiment can also use multiple ID data pre-recorded for the same user to determine the association relationships among the ID data to be processed from multiple application platforms, that is, determine associations for the ID data to be processed that correspond to or are identical with that user's ID data. The application places no limit on how the association relationships among multiple ID data to be processed are determined, or on how user ID linking is performed on them; neither is limited to the implementations described above.
According to the above method, the ID data to be processed that are associated with one another can be treated as one ID data group. The ID data in the group can have the association relationships shown in Fig. 2: every ID data to be processed is associated with at least one other, and most are associated with at least two others. In this application, one ID data group usually corresponds to one target object; that is, the ID data to be processed in one group are usually the data monitored when one target object logs in to different application platforms.
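Grouping associated ID data amounts to finding connected components. The sketch below uses a union-find structure, which is a technique chosen here for illustration and not one named in the text; the function name and the `type:value` ID labels are hypothetical.

```python
def group_ids(edges):
    """edges: iterable of (id_a, id_b) associations between ID data.
    Returns a list of sets, one per connected ID data group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())

edges = [
    ("cookie:c1", "phone:p1"),
    ("phone:p1", "email:e1"),
    ("cookie:c9", "account:a9"),
]
id_groups = group_ids(edges)
```

Each returned set plays the role of one ID data group, which the text maps to one target object. A single abnormal ID left in the input would merge unrelated groups, which is why screening comes first.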
As another embodiment of the application, the multiple ID data to be processed that map to the same target object can be used to construct a user portrait of that target object, and unique virtual ID data corresponding to the target object can also be generated. When information such as advertisements needs to be pushed to a user, querying the user's virtual ID data then identifies the user and the user portrait accurately, so that suitable advertisements and other information can be pushed.
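The text does not say how the virtual ID is generated. One possible approach, sketched here purely as an assumption, derives a deterministic name-based UUID from the sorted members of an ID data group; the namespace string and the function name are hypothetical.

```python
import uuid

# Hypothetical namespace for this sketch; any fixed UUID would serve.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "id-linking.example")

def virtual_id(id_group):
    """Derive a stable virtual ID from the members of one ID data group.
    Sorting makes the result independent of member order."""
    canonical = "|".join(sorted(id_group))
    return str(uuid.uuid5(NAMESPACE, canonical))

vid_1 = virtual_id({"cookie:c1", "phone:p1", "email:e1"})
vid_2 = virtual_id(["email:e1", "cookie:c1", "phone:p1"])
```

A derived ID of this kind changes whenever group membership changes, so a production system would more likely store a mapping from each member ID to a persistent virtual ID.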
In conclusion the embodiment of the present application will obtain the ID data of multiple application platforms records, sieved from these ID data After selecting abnormal ID data, the quantity of the abnormal ID data of each application platform record will be further considered, and determining the number Amount meets the first preset standard, it is believed that the ID data of respective application platform record are unreliable, will reject application platform note The abnormal ID data of all ID data of record and the other application platform filtered out record, thus other ID data that will acquire As ID data to be processed, using the incidence relation between ID data to be processed, these ID data mappings to be processed are determined extremely A few target object.It can be seen that ID data itself and its source are added to abnormal judgement and identification by the embodiment of the present application In, it realizes the anomalous identification for adapting to the ID data of different application platforms record, substantially increases the accuracy of user behavior analysis, And then the recognition efficiency and accuracy of target object are improved, it is easy to implement the accurate dispensing of advertisement.
Referring to Fig. 3, which is a flowchart of another data processing method provided by an embodiment of the present application, the method may include the following steps:
Step S301: obtain the ID data recorded by multiple application platforms;
The ID data may include data characterizing the identity of a user who logs in to the corresponding application platform, such as browser cookies, mobile device IDs, website accounts, mobile phone numbers, and email addresses. The present application does not limit the content of the different types of ID data recorded by each application platform. The types of ID data recorded by the application platforms may be the same or different, and can be determined by factors such as the type of the application platform and the user's operations on it.
Step S302: construct an undirected graph with the obtained ID data as vertices and the application platforms to which the ID data belong as the edges of those vertices;
In practical applications, an undirected graph is a graph whose edges have no direction. In this embodiment, the obtained ID data can be taken as the vertex set and the corresponding application platforms as the edge set, that is, each ID data item is extracted as a vertex of the undirected graph and the application platform to which it belongs as an edge of that vertex, thereby generating the undirected graph of the ID data.
In conjunction with the description of the ID data in the above embodiment, the vertex attributes of the resulting undirected graph may include, but are not limited to, ID types extracted from the ID data, such as IMEI number, enterprise application cookie, mobile phone number, and email address, and ID values such as 138xxxxxxxx and 123@xxx.xx. The edge attribute of the undirected graph may be the attribute information of the application platform to which the corresponding vertex belongs, so the edge attributes may include, but are not limited to, various media names, various browser types, and so on.
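The graph construction of step S302 can be illustrated with a minimal sketch. The record format, the names `Vertex`, `Edge`, and `build_graph`, and the choice to join every pair of IDs that one platform recorded together are all assumptions made for illustration; the application itself does not fix a data structure.

```python
from collections import namedtuple

# Vertex attributes: ID type and ID value (hypothetical field names).
Vertex = namedtuple("Vertex", ["id_type", "id_value"])
# An undirected edge between two IDs, labelled with the recording platform.
Edge = namedtuple("Edge", ["u", "v", "attr_type", "attr_value"])

def build_graph(records):
    """records: iterable of (attr_type, attr_value, [(id_type, id_value), ...]).

    Assumption: IDs that appear in the same platform record are linked.
    """
    vertices, edges = set(), []
    for attr_type, attr_value, ids in records:
        vs = [Vertex(t, v) for t, v in ids]
        vertices.update(vs)
        # Connect every pair of IDs seen in the same record.
        for i in range(len(vs)):
            for j in range(i + 1, len(vs)):
                edges.append(Edge(vs[i], vs[j], attr_type, attr_value))
    return vertices, edges

records = [
    ("media", "media_A", [("cookie", "c1"), ("phone", "138xxxxxxxx")]),
    ("browser", "browser_X", [("cookie", "c1"), ("email", "123@xxx.xx")]),
]
vertices, edges = build_graph(records)
```

Here the cookie `c1` becomes a single vertex shared by both edges, which is what later allows IDs from different platforms to be associated.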
Step S303: extract ID association subgraphs according to the edge attributes of the undirected graph;
This embodiment can group the edges of the undirected graph and their vertices by the type of edge attribute, with the edges of the same type of edge attribute and their vertices forming one ID association subgraph. For example, the edges whose edge attribute is a browser type, together with their vertices, form one ID association subgraph, and the edges whose edge attribute is a media name, together with their vertices, form another ID association subgraph. This embodiment does not describe the cases one by one.
It can be seen that one ID association subgraph usually includes multiple edge attributes of the same type, such as the multiple browsers, multiple media names, or multiple mobile phone numbers mentioned above.
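The grouping in step S303 can be sketched as follows. The edge tuple layout and the function name `extract_subgraphs` are hypothetical; the sketch only shows one way to split a graph by the type of edge attribute.

```python
from collections import defaultdict

def extract_subgraphs(edges):
    """edges: iterable of (u, v, attr_type, attr_value) tuples.

    Returns one subgraph (vertex set plus edge list) per edge attribute type.
    """
    subgraphs = defaultdict(lambda: {"vertices": set(), "edges": []})
    for u, v, attr_type, attr_value in edges:
        sg = subgraphs[attr_type]
        sg["edges"].append((u, v, attr_value))
        sg["vertices"].update((u, v))
    return dict(subgraphs)

edges = [
    ("cookie:c1", "phone:p1", "browser", "browser_X"),
    ("cookie:c2", "phone:p1", "browser", "browser_Y"),
    ("cookie:c1", "email:e1", "media", "media_A"),
]
subgraphs = extract_subgraphs(edges)
```

The "browser" subgraph in this example holds two edges with different attribute values (`browser_X`, `browser_Y`), matching the observation that a subgraph usually contains several edge attributes of the same type.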
Step S304: for each edge attribute of each ID association subgraph, count the quantity-ratio distribution between the various types of ID data under that attribute;
In practical application of this embodiment, if one first ID data item corresponds to a very large number of second ID data, the first ID data item may be abnormal, where the first ID data and the second ID data are different types of ID data under the same edge attribute. The present application can therefore count, for each edge attribute, the quantity-ratio distribution between the various types of ID data, for example the distribution of the number of B-application cookies corresponding to one A-application cookie, the distribution of the number of A-application cookies corresponding to one B-application cookie, the distribution of the number of B media corresponding to one A medium, the distribution of the number of A media corresponding to one B medium, and so on.
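As a rough illustration of the counting in step S304, the sketch below tallies, for each first ID, how many distinct second IDs it is linked to under one edge attribute. The pair-list input and the name `count_linked` are assumptions made for this example.

```python
from collections import defaultdict

def count_linked(pairs):
    """pairs: iterable of (first_id, second_id) links under one edge attribute.

    Returns {first_id: number of distinct second_ids linked to it}.
    """
    linked = defaultdict(set)
    for first_id, second_id in pairs:
        linked[first_id].add(second_id)  # a set ignores repeated links
    return {k: len(v) for k, v in linked.items()}

# e.g. A-application cookies (a*) linked to B-application cookies (b*)
pairs = [("a1", "b1"), ("a1", "b2"), ("a1", "b2"), ("a2", "b1")]
counts = count_linked(pairs)
```

The values of `counts` form the distribution that is later compared against the preset quantile; swapping the two elements of each pair gives the distribution in the opposite direction.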
Step S305: obtain the preset quantile corresponding to each edge attribute;
In conjunction with the above description, the preset quantile can serve as the criterion for judging whether the ID data of the corresponding edge attribute are abnormal. The preset quantiles of different edge attributes may be different or the same and can be determined according to actual needs. This embodiment does not limit the value of the preset quantile corresponding to each edge attribute. The preset quantile may be a percentage, but is not limited to this.
Step S306: judge whether the quantity-ratio distribution of the second ID data corresponding to one first ID data item under each edge attribute exceeds the corresponding preset quantile; if so, proceed to step S307; if not, execute step S308;
The first ID data and the second ID data are different types of ID data corresponding to the same edge attribute, and each may be any ID data item in the sets of different types of ID data. The embodiment of the present application can determine the first ID data and the second ID data in a certain order and judge them in turn, and the same judgment method can be used for the quantity-ratio distribution between any two types of ID data.
For example, assume that the preset quantile of an edge attribute such as browser is 95%, and that 95% of A-application cookies each correspond to no more than 20 B-application cookies. An A-application cookie within this range is considered normal, whereas an A-application cookie that corresponds to more than 20 B-application cookies is considered abnormal. The abnormality judgment for other edge attributes, such as media name, is similar and is not detailed here.
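The 95% example above can be reproduced with a small sketch. It assumes a nearest-rank definition of the quantile and treats any first ID whose count exceeds the quantile value as abnormal; the application does not specify how the quantile is computed, so both choices, and the names `quantile_cutoff` and `flag_abnormal`, are illustrative only.

```python
import math

def quantile_cutoff(values, q):
    """Nearest-rank quantile: smallest value with at least a share q at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def flag_abnormal(counts, q=0.95):
    """counts: {first_id: number of linked second IDs}. Returns (abnormal ids, cutoff)."""
    cutoff = quantile_cutoff(counts.values(), q)
    return {k for k, n in counts.items() if n > cutoff}, cutoff

# 98 cookies linked to between 1 and 20 counterparts, plus two extreme ones.
counts = {f"a{i}": (i % 20) + 1 for i in range(98)}
counts["x"] = 40
counts["y"] = 500
abnormal, cutoff = flag_abnormal(counts)
```

With this made-up data the cutoff comes out at 20, so only the two cookies linked to 40 and 500 counterparts are flagged.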
Step S307: determine that the first ID data are abnormal ID data;
Step S308: detect whether the quantity-ratio distribution judgment has been completed for the different types of ID data of all edge attributes; if not, proceed to step S309; if so, execute step S310;
Optionally, the embodiment of the present application can judge the quantity-ratio distribution between the different types of ID data of each edge attribute in a certain order or according to a certain rule. After each judgment, it can detect whether there are ID data for which the judgment has not yet been made; if so, judgment continues in the manner described above until the quantity-ratio distribution between all types of ID data of all edge attributes has been judged, so that all abnormal ID data in the obtained ID data are screened out.
Step S309: select new first ID data and second ID data, and return to step S306;
Step S310: count the proportion of abnormal ID data under each edge attribute;
In this embodiment, after the abnormal ID data corresponding to each edge attribute, that is, the abnormal ID data from the different application platforms, have been screened out by the above method, the quantity of abnormal ID data corresponding to each edge attribute can be counted in order to further verify whether each application platform is reliable, and the ratio of that quantity to the total quantity of ID data of the edge attribute is calculated as the proportion of abnormal ID data under that edge attribute.
Step S311: verify whether there is currently an edge attribute whose proportion of abnormal ID data is greater than the corresponding abnormality threshold; if so, proceed to step S312; if not, execute step S313;
In the present application, abnormal ID data may appear among the ID data corresponding to any edge attribute. If there are too many abnormal ID data, the application platform corresponding to that edge attribute is usually considered unreliable, and the ID data monitored and recorded by that application platform are no longer used for user behavior analysis. The present application can therefore set, for each edge attribute, a critical value for judging whether the corresponding application platform is reliable, that is, a preset level of abnormal ID data in the ID data recorded by the application platform at which the platform is considered unreliable. This critical value can be taken as the abnormality threshold, and this embodiment does not limit its size.
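Steps S310 and S311 amount to a per-attribute ratio check, sketched below. The dictionaries, the threshold values, and the name `unreliable_attributes` are hypothetical; the application leaves the threshold size open.

```python
def unreliable_attributes(total_by_attr, abnormal_by_attr, thresholds):
    """Return the edge attributes whose abnormal share exceeds their threshold."""
    flagged = set()
    for attr, total in total_by_attr.items():
        if total == 0:
            continue  # no ID data recorded under this attribute
        share = abnormal_by_attr.get(attr, 0) / total
        if share > thresholds[attr]:
            flagged.add(attr)
    return flagged

total_by_attr = {"media_A": 100, "media_B": 200}
abnormal_by_attr = {"media_A": 40, "media_B": 10}
thresholds = {"media_A": 0.3, "media_B": 0.3}  # assumed values
flagged = unreliable_attributes(total_by_attr, abnormal_by_attr, thresholds)
```

In this example `media_A` has 40% abnormal ID data against an assumed 30% threshold and is flagged as unreliable, while `media_B` at 5% is not.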
Step S312: reject the ID data under the edge attributes whose proportion of abnormal ID data is greater than the corresponding abnormality threshold, together with the screened-out abnormal ID data under the other edge attributes;
In conjunction with the corresponding part of the above embodiment, when the proportion of abnormal ID data in the ID data recorded by an application platform is greater than the abnormality threshold, that application platform is generally considered unreliable. To avoid its adverse effect on the result of user behavior analysis, the embodiment of the present application can reject the ID data recorded by such application platforms so that they are not used in the subsequent ID get-through processing.
Moreover, to improve the accuracy of identifying target objects, in addition to rejecting the ID data recorded by the application platforms considered unreliable, the screened-out abnormal ID data recorded by the other application platforms, that is, the screened-out abnormal ID data under the other edge attributes, can also be rejected.
Step S313: reject the screened-out abnormal ID data under each edge attribute;
When it is determined that the abnormal ID data in the ID data recorded by each application platform are not numerous, that is, each application platform has been verified as reliable, only the abnormal ID data recorded by each application platform need be deleted, and the normal ID data obtained are retained for ID get-through, which improves the accuracy of target object identification.
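The two rejection branches (steps S312 and S313) can be expressed as one filter, as in the sketch below. The `(platform, id_value)` record layout and the name `select_pending` are assumptions made for illustration.

```python
def select_pending(records, abnormal_ids, unreliable_platforms):
    """records: iterable of (platform, id_value).

    Drops every record from an unreliable platform and every abnormal ID;
    what remains is the ID data to be processed.
    """
    return [
        (platform, id_value)
        for platform, id_value in records
        if platform not in unreliable_platforms and id_value not in abnormal_ids
    ]

records = [("media_A", "c1"), ("media_A", "c2"), ("media_B", "c3"), ("media_B", "c4")]
# Step S312: media_A is unreliable, so all of its records are dropped.
pending = select_pending(records, abnormal_ids={"c2", "c4"}, unreliable_platforms={"media_A"})
# Step S313: no unreliable platform, so only the abnormal IDs are dropped.
pending_all_reliable = select_pending(records, {"c2", "c4"}, set())
```

Passing an empty set of unreliable platforms reduces the filter to step S313, so a single function covers both branches of step S311.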
Step S314: take the remaining ID data as the ID data to be processed, and use the association relationships between the ID data to be processed to determine at least one target object to which the ID data to be processed map.
In this embodiment, for the ID data to be processed, an ID get-through algorithm can be used to determine the association relationships between them, so as to obtain the target object corresponding to each group of associated ID data to be processed. This achieves reliable identification of the target objects described by the ID data to be processed. The ID data to be processed that map to each target object can then be analyzed to obtain an accurate and reliable analysis of user behavior, so that advertisements can be accurately delivered to the user based on the analysis results.
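The application does not name a particular ID get-through algorithm. One common way to turn pairwise association relationships into groups is a union-find (disjoint-set) pass over the links, sketched below as an assumed stand-in; the name `group_ids` and the string-encoded IDs are hypothetical.

```python
def group_ids(links):
    """links: iterable of (id_a, id_b) association pairs.

    Returns a list of sets, one per group of transitively associated IDs.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two groups

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())

links = [("cookie:c1", "phone:p1"), ("phone:p1", "email:e1"), ("cookie:c9", "email:e9")]
groups = group_ids(links)
```

Each resulting set plays the role of one ID data group, which under the assumptions of this sketch would correspond to one target object.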
In determining the ID data to be processed, the embodiment of the present application considers not only the monitoring of the association relationships between ID data but also the differences between the sources of different ID data, and eliminates the ID data recorded by application platforms with poor data quality, that is, all ID data recorded by application platforms whose recorded ID data are unreliable or inaccurate. This prevents the source of the ID data from causing abnormal ID data to be misidentified, which would affect the accuracy and reliability of target object identification.
Referring to Fig. 4, which is a structural block diagram of a data processing apparatus provided by an embodiment of the present application, the apparatus may include:
An obtaining module 41, configured to obtain the ID data recorded by multiple application platforms;
A screening module 42, configured to screen the abnormal ID data in the ID data;
Optionally, the present application can screen abnormal ID data in different ways. Thus, referring to Fig. 5, the screening module 42 may include:
A grouping unit 4211, configured to group the obtained ID data according to the type of the ID data;
A first statistical unit 4212, configured to count the association relationships of each ID data item in any one group relative to the ID data in the other groups;
A first determination unit 4213, configured to determine that the ID data whose association relationships meet the second preset standard are abnormal ID data.
As another embodiment of the present application, referring to the structural block diagram shown in Fig. 6, the screening module 42 may also include:
A construction unit 4221, configured to construct an undirected graph with the obtained ID data as vertices and the application platforms to which the ID data belong as the edges of those vertices;
An extraction unit 4222, configured to extract ID association subgraphs according to the edge attributes of the undirected graph;
A feature obtaining unit 4223, configured to obtain the ID data quality feature corresponding to each edge attribute in each ID association subgraph;
In the present application, the feature obtaining unit 4223 may specifically be configured to count, for each edge attribute of each ID association subgraph, the quantity-ratio distribution between the various types of ID data under that attribute. It can be seen that the ID data quality feature obtained in this embodiment may be the quantity-ratio distribution between the ID data, but is not limited to this.
A second determination unit 4224, configured to determine the abnormal ID data corresponding to each edge attribute using the judgment criteria corresponding to the various ID data quality features.
Based on the specific implementation of the feature obtaining unit 4223 described above, the second determination unit 4224 may include:
An obtaining subunit, configured to obtain the preset quantile corresponding to each edge attribute;
A judgment subunit, configured to judge whether the quantity-ratio distribution of the second ID data corresponding to one first ID data item under each edge attribute exceeds the corresponding preset quantile, where the first ID data and the second ID data are different types of ID data corresponding to the same edge attribute;
A first determination subunit, configured to determine that the first ID data are abnormal ID data when the judgment result of the judgment subunit is yes;
A second determination subunit, configured to, when the judgment result of the judgment subunit is no, select new first ID data and second ID data and trigger the judgment subunit to continue judging until the quantity-ratio distribution judgment has been completed for the different types of ID data of all edge attributes.
Optionally, the screening module 42 may also include:
A first screening unit, configured to screen the abnormal ID data in the ID data according to a preset blacklist filtering rule;
or
A second screening unit, configured to screen the abnormal ID data in the ID data that do not conform to a preset whitelist filtering rule.
A statistical module 43, configured to count the quantity of abnormal ID data contained in the ID data recorded by each application platform;
A data processing module 44, configured to, when the counted quantity is determined to meet the first preset standard, reject the ID data recorded by the corresponding application platform and the abnormal ID data recorded by the other application platforms, and take the remaining obtained ID data as the ID data to be processed;
A target object determination module 45, configured to use the association relationships between the ID data to be processed to determine at least one target object to which the ID data to be processed map.
In summary, this embodiment obtains the ID data recorded by multiple application platforms and, after screening out the abnormal ID data from these ID data, further considers the quantity of abnormal ID data recorded by each application platform. When that quantity is determined to meet the first preset standard, the ID data recorded by the corresponding application platform are considered unreliable, and all ID data recorded by that application platform, together with the screened-out abnormal ID data recorded by the other application platforms, are rejected. The remaining obtained ID data are taken as the ID data to be processed, and the association relationships between them are used to determine at least one target object to which they map.
It can be seen that the embodiment of the present application brings both the ID data themselves and their sources into abnormality judgment and identification, achieves abnormality identification adapted to the ID data recorded by different application platforms, greatly improves the accuracy of user behavior analysis, and in turn improves the efficiency and accuracy of target object identification, which facilitates accurate advertisement delivery.
The apparatus structure has been described above mainly from the perspective of the functional modules that implement the data processing method. Described below from the perspective of hardware structure, the apparatus may include a processor and a memory. The obtaining module, screening module, statistical module, data processing module, target object determination module, and so on are stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels may be provided. By adjusting kernel parameters, the ID data themselves and their sources are brought into abnormality judgment and identification, achieving abnormality identification adapted to the ID data recorded by different application platforms, greatly improving the accuracy of user behavior analysis, and in turn improving the efficiency and accuracy of target object identification, which facilitates accurate advertisement delivery.
The memory may take forms such as volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one storage chip.
An embodiment of the present invention provides a storage medium on which a program is stored. When executed by a processor, the program implements the above data processing method. For the implementation process of the method, refer to the description of the corresponding part of the above method embodiment, which is not repeated here.
An embodiment of the present invention provides a processor configured to run a program, where the program, when run, executes the above data processing method. For the implementation process of the method, refer to the description of the corresponding part of the above method embodiment, which is not repeated here.
Referring to Fig. 7, which is a hardware structure diagram of an electronic device provided by an embodiment of the present application, the electronic device may include, but is not limited to, the following hardware components. The electronic device provided by the present application may be a product such as a server, PC, iPad, or mobile phone, and the present application does not limit its product type. The electronic device may include a communication port 71, a memory 72, a processor 73, and a program stored in the memory 72 and runnable on the processor 73.
The communication port 71 is configured to establish communication connections with multiple application platforms;
In the embodiment of the present invention, the communication port 71 may be the port of a wireless communication module such as a WIFI module, GSM module, or GPRS module, or the port of a wired communication module, such as a USB port. The present application does not limit the type or structure of the communication port.
The memory 72 is configured to store multiple instructions for implementing the data processing method described in the above method embodiment.
In practical applications, the memory may take forms such as volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one storage chip.
The processor 73 is configured to load and execute the program stored in the memory, including:
Obtain the ID data recorded by multiple application platforms;
Screen the abnormal ID data in the ID data;
Count the quantity of abnormal ID data contained in the ID data recorded by each application platform;
When the counted quantity is determined to meet the first preset standard, reject the ID data recorded by the corresponding application platform and the abnormal ID data recorded by the other application platforms, and take the remaining obtained ID data as the ID data to be processed;
Use the association relationships between the ID data to be processed to determine at least one target object to which the ID data to be processed map.
Optionally, when the processor 73 executes the program stored in the memory 72, the process of screening abnormal ID data may also include:
Group the obtained ID data according to the type of the ID data;
Count the association relationships of each ID data item in any one group relative to the ID data in the other groups;
Determine that the ID data whose association relationships meet the second preset standard are abnormal ID data.
Alternatively, it may also include:
Construct an undirected graph with the obtained ID data as vertices and the application platforms to which the ID data belong as the edges of those vertices;
Extract ID association subgraphs according to the edge attributes of the undirected graph;
Obtain the ID data quality feature corresponding to each edge attribute in each ID association subgraph;
Determine the abnormal ID data corresponding to each edge attribute using the judgment criteria corresponding to the various ID data quality features.
Optionally, the processor 73 may also execute a program that implements the following steps:
For each edge attribute of each ID association subgraph, count the quantity-ratio distribution between the various types of ID data under that attribute;
Obtain the preset quantile corresponding to each edge attribute;
Judge whether the quantity-ratio distribution of the second ID data corresponding to one first ID data item under each edge attribute exceeds the corresponding preset quantile, where the first ID data and the second ID data are different types of ID data corresponding to the same edge attribute;
If so, determine that the first ID data are abnormal ID data;
If not, select new first ID data and second ID data and return to the step of judging whether the quantity-ratio distribution of the second ID data corresponding to one first ID data item under each edge attribute exceeds the corresponding preset quantile, until the quantity-ratio distribution judgment has been completed for the different types of ID data of all edge attributes.
Alternatively, the processor 73 may also execute a program that implements the following steps:
Screen the abnormal ID data in the ID data according to a preset blacklist filtering rule;
or
Screen the abnormal ID data in the ID data that do not conform to a preset whitelist filtering rule.
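The blacklist and whitelist screening options can be illustrated with a minimal sketch. The application does not define the filtering rules, so the blacklist entries and the 11-digit phone pattern below are purely hypothetical placeholders, as are the function names.

```python
import re

# Hypothetical blacklist of placeholder values that carry no identity information.
BLACKLIST = {"00000000", "unknown", "null"}
# Hypothetical whitelist rule: an 11-digit mobile number starting with 1.
PHONE_WHITELIST = re.compile(r"1\d{10}")

def abnormal_by_blacklist(ids):
    """Return the IDs that match the blacklist (case-insensitive)."""
    return [i for i in ids if i.lower() in BLACKLIST]

def abnormal_by_whitelist(ids, pattern=PHONE_WHITELIST):
    """Return the IDs that do not fully match the whitelist pattern."""
    return [i for i in ids if not pattern.fullmatch(i)]

by_black = abnormal_by_blacklist(["13800000000", "NULL"])
by_white = abnormal_by_whitelist(["13800000000", "12345"])
```

A blacklist rule flags what it matches, whereas a whitelist rule flags what it fails to match, which is why the two functions test opposite conditions.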
It can be seen that the embodiment of the present application brings both the ID data themselves and their sources into abnormality judgment and identification, achieves abnormality identification adapted to the ID data recorded by different application platforms, greatly improves the accuracy of user behavior analysis, and in turn improves the efficiency and accuracy of target object identification, which facilitates accurate advertisement delivery.
Referring to Fig. 8, which is a structural block diagram of a data processing system provided by an embodiment of the present application, the system may include multiple application devices 81 corresponding to different application platforms, and an electronic device 82 having the hardware structure of the electronic device described above.
The application device 81 may be a terminal, a server, a database, or the like, and the present application does not limit its product type. In the embodiment of the present application, each application device can be used to monitor and record the ID data generated when an object logs in to the corresponding application platform. For the specific implementation process, refer to the description of the corresponding part of the above method embodiment, which is not repeated here.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:
Obtain the ID data recorded by multiple application platforms;
Screen the abnormal ID data in the ID data;
Count the quantity of abnormal ID data contained in the ID data recorded by each application platform;
When the counted quantity is determined to meet the first preset standard, reject the ID data recorded by the corresponding application platform and the abnormal ID data recorded by the other application platforms, and take the remaining obtained ID data as the ID data to be processed;
Use the association relationships between the ID data to be processed to determine at least one target object to which the ID data to be processed map.
Optionally, the program that executes the step of screening the abnormal ID data in the ID data may include:
Group the obtained ID data according to the type of the ID data;
Count the association relationships of each ID data item in any one group relative to the ID data in the other groups;
Determine that the ID data whose association relationships meet the second preset standard are abnormal ID data.
As another embodiment of the present application, the program that executes the step of screening the abnormal ID data in the ID data may include:
Construct an undirected graph with the obtained ID data as vertices and the application platforms to which the ID data belong as the edges of those vertices;
Extract ID association subgraphs according to the edge attributes of the undirected graph;
Obtain the ID data quality feature corresponding to each edge attribute in each ID association subgraph;
Determine the abnormal ID data corresponding to each edge attribute using the judgment criteria corresponding to the various ID data quality features.
The program that executes the step of obtaining the ID data quality feature corresponding to each edge attribute in each ID association subgraph may specifically include:
For each edge attribute of each ID association subgraph, count the quantity-ratio distribution between the various types of ID data under that attribute;
Correspondingly, the program that executes the step of determining the abnormal ID data corresponding to each edge attribute using the judgment criteria corresponding to the various ID data quality features may specifically include:
Obtain the preset quantile corresponding to each edge attribute;
Judge whether the quantity-ratio distribution of the second ID data corresponding to one first ID data item under each edge attribute exceeds the corresponding preset quantile, where the first ID data and the second ID data are different types of ID data corresponding to the same edge attribute;
If so, determine that the first ID data are abnormal ID data;
If not, select new first ID data and second ID data and return to the step of judging whether the quantity-ratio distribution of the second ID data corresponding to one first ID data item under each edge attribute exceeds the corresponding preset quantile, until the quantity-ratio distribution judgment has been completed for the different types of ID data of all edge attributes.
Optionally, the program that executes the step of screening the abnormal ID data in the ID data may include:
Screen the abnormal ID data in the ID data according to a preset blacklist filtering rule;
or
Screen the abnormal ID data in the ID data that do not conform to a preset whitelist filtering rule.
In summary, the computer program product provided by the embodiment of the present application brings both the ID data themselves and their sources into abnormality judgment and identification, achieves abnormality identification adapted to the ID data recorded by different application platforms, greatly improves the accuracy of user behavior analysis, and in turn improves the efficiency and accuracy of target object identification, which facilitates accurate advertisement delivery.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, an apparatus, an electronic device, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, apparatus, electronic device, system, and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
Memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, commodity, or device that includes the element.
The above are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (10)

1. A data processing method, characterized in that the method comprises:
obtaining ID data recorded by a plurality of application platforms;
screening out abnormal ID data from the ID data;
counting, for each application platform, the number of abnormal ID data items contained in the ID data recorded by that application platform;
when the counted number meets a first preset standard, rejecting the ID data recorded by the corresponding application platform and the abnormal ID data recorded by the other application platforms, and taking the remaining obtained ID data as ID data to be processed;
determining, by using the association relationships among the ID data to be processed, at least one target object to which the ID data to be processed is mapped.
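As an illustration only, the steps of claim 1 might be sketched in Python as follows. The record layout `(platform, id_a, id_b)`, the function names, and the reading of the "first preset standard" as "more than `max_abnormal` distinct abnormal IDs" are all assumptions made for this sketch and are not taken from the application; grouping associated IDs into connected components is likewise just one plausible way to map ID data to target objects.

```python
from collections import defaultdict


def select_ids_to_process(records, abnormal_ids, max_abnormal):
    """Drop untrustworthy platforms and abnormal IDs.

    records: list of (platform, id_a, id_b) tuples, each one association
    between two IDs recorded by a platform (hypothetical layout).
    """
    # Count the distinct abnormal IDs recorded by each platform.
    abnormal_seen = defaultdict(set)
    for platform, id_a, id_b in records:
        for value in (id_a, id_b):
            if value in abnormal_ids:
                abnormal_seen[platform].add(value)
    # Assumed "first preset standard": more than max_abnormal abnormal IDs.
    rejected = {p for p, seen in abnormal_seen.items() if len(seen) > max_abnormal}
    # Keep records from trusted platforms that contain no abnormal ID.
    return [
        (p, a, b)
        for p, a, b in records
        if p not in rejected and a not in abnormal_ids and b not in abnormal_ids
    ]


def map_to_target_objects(records):
    """Group associated IDs into connected components with union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _, id_a, id_b in records:
        root_a, root_b = find(id_a), find(id_b)
        if root_a != root_b:
            parent[root_a] = root_b
    groups = defaultdict(set)
    for value in list(parent):
        groups[find(value)].add(value)
    return list(groups.values())
```

Each returned set would then stand for one candidate target object, for example one user seen under several IDs.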
2. The method according to claim 1, characterized in that the screening out abnormal ID data from the ID data comprises:
grouping the obtained ID data according to the type of the ID data;
counting, for each ID data item in any one group, its association relationships with the ID data of the other groups;
determining that the ID data whose association relationships meet a second preset standard are abnormal ID data.
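A minimal sketch of the screening in claim 2, under stated assumptions: each ID is represented as a hypothetical `(type, value)` pair, so the type acts as the group, and the "second preset standard" is read here as "associated with more than `max_links` distinct IDs of other types". Neither the representation nor the threshold rule comes from the application.

```python
from collections import defaultdict


def screen_by_association(links, max_links):
    """Flag IDs associated with too many IDs of other types.

    links: iterable of ((type_a, value_a), (type_b, value_b)) pairs.
    """
    neighbours = defaultdict(set)
    for u, v in links:
        if u[0] != v[0]:  # only count associations across groups (types)
            neighbours[u].add(v)
            neighbours[v].add(u)
    # Assumed "second preset standard": more than max_links cross-type links.
    return {node for node, linked in neighbours.items() if len(linked) > max_links}
```

For instance, a cookie ID associated with hundreds of device IDs would be flagged under this rule, while a cookie tied to one or two devices would not.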
3. The method according to claim 1, characterized in that the screening out abnormal ID data from the ID data comprises:
constructing an undirected graph with the obtained ID data as vertices and the application platforms to which the ID data belong as the edges between the vertices;
extracting ID association subgraphs according to the edge attributes of the undirected graph;
obtaining the ID data quality feature corresponding to each edge attribute in each ID association subgraph;
determining the abnormal ID data corresponding to each edge attribute by using the judgment standard corresponding to each type of ID data quality feature.
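The graph construction and subgraph extraction of claim 3 could be sketched as below. This is an assumption-laden illustration: the recording platform is used as the edge attribute, each subgraph is stored as a plain adjacency dictionary, and the function name and record layout are hypothetical.

```python
from collections import defaultdict


def extract_subgraphs(records):
    """Build one undirected adjacency map per edge attribute.

    records: iterable of (platform, id_a, id_b); the platform that recorded
    the association is used as the edge attribute (an assumption).
    """
    subgraphs = defaultdict(lambda: defaultdict(set))
    for platform, id_a, id_b in records:
        if id_a == id_b:
            continue  # ignore self-loops
        # Undirected edge: store it in both directions.
        subgraphs[platform][id_a].add(id_b)
        subgraphs[platform][id_b].add(id_a)
    # Convert to plain dicts so later lookups cannot create empty entries.
    return {p: dict(adj) for p, adj in subgraphs.items()}
```

A per-attribute quality feature, such as the number of neighbours of each vertex, can then be read directly from each adjacency map.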
4. The method according to claim 1, characterized in that the screening out abnormal ID data from the ID data comprises:
screening out the abnormal ID data from the ID data according to a preset blacklist filtering rule;
or
screening out, as abnormal ID data, the ID data that do not conform to a preset whitelist filtering rule.
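The two alternatives in claim 4 might look like the following sketch. The blacklist entries and the whitelist pattern (a UUID-like format) are purely hypothetical examples chosen for illustration; the application does not specify the content of either rule.

```python
import re

# Hypothetical blacklist: placeholder values that carry no identity.
BLACKLIST = {"", "null", "unknown", "00000000-0000-0000-0000-000000000000"}

# Hypothetical whitelist rule: the ID must look like a lowercase-hex UUID.
WHITELIST_PATTERN = re.compile(r"[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def abnormal_by_blacklist(ids):
    """Return the IDs that match a blacklisted value."""
    return [i for i in ids if i.strip().lower() in BLACKLIST]


def abnormal_by_whitelist(ids):
    """Return the IDs that do not conform to the whitelist format."""
    return [i for i in ids if not WHITELIST_PATTERN.fullmatch(i.lower())]
```

Note that the two rules are not equivalent: an all-zero UUID passes this format-based whitelist but is caught by the blacklist, which is one reason a deployment might choose one rule or the other depending on the ID type.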
5. The method according to claim 3, characterized in that the obtaining the ID data quality feature corresponding to each edge attribute in each ID association subgraph comprises:
for each edge attribute of each ID association subgraph, computing the quantity-ratio distribution between the types of ID data under that edge attribute;
the determining the abnormal ID data corresponding to each edge attribute by using the judgment standard corresponding to each type of ID data quality feature comprises:
obtaining the preset quantile corresponding to each edge attribute;
judging whether the quantity-ratio distribution of the second ID data corresponding to one first ID data item of each edge attribute exceeds the corresponding preset quantile, the first ID data and the second ID data being ID data of different types corresponding to the same edge attribute;
if so, determining that the first ID data is abnormal ID data;
if not, selecting new first ID data and second ID data, and returning to the step of judging whether the quantity-ratio distribution of the second ID data corresponding to one first ID data item of each edge attribute exceeds the corresponding preset quantile, until the quantity-ratio distributions of the different types of ID data of all edge attributes have been judged.
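One possible reading of the quantile test in claim 5 is sketched below. It assumes that, for a single edge attribute, the quantity ratio of a first ID is simply the number of distinct second-type IDs associated with it, and that the preset quantile is evaluated with a nearest-rank rule over those counts. Both choices, and all names, are assumptions of this sketch rather than details given in the application.

```python
import math


def nearest_rank_quantile(values, q):
    """Smallest observed value with at least a fraction q of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def abnormal_by_quantile(counts, q):
    """Flag first IDs whose count of associated second IDs exceeds the quantile.

    counts: {first_id: number of distinct second-type IDs linked to it}
    for one edge attribute (hypothetical input layout).
    """
    threshold = nearest_rank_quantile(counts.values(), q)
    return {first_id for first_id, n in counts.items() if n > threshold}
```

In practice the function would be called once per edge attribute, each with its own preset quantile, which matches the loop over edge attributes described in the claim.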
6. A data processing apparatus, characterized in that the apparatus comprises:
an obtaining module, configured to obtain ID data recorded by a plurality of application platforms;
a screening module, configured to screen out abnormal ID data from the ID data;
a statistics module, configured to count, for each application platform, the number of abnormal ID data items contained in the ID data recorded by that application platform;
a data processing module, configured to, when the counted number meets a first preset standard, reject the ID data recorded by the corresponding application platform and the abnormal ID data recorded by the other application platforms, and take the remaining obtained ID data as ID data to be processed;
a target object determining module, configured to determine, by using the association relationships among the ID data to be processed, at least one target object to which the ID data to be processed is mapped.
7. The apparatus according to claim 6, characterized in that the screening module comprises:
a grouping unit, configured to group the obtained ID data according to the type of the ID data;
a first statistics unit, configured to count, for each ID data item in any one group, its association relationships with the ID data of the other groups;
a first determining unit, configured to determine that the ID data whose association relationships meet a second preset standard are abnormal ID data.
8. The apparatus according to claim 6, characterized in that the screening module comprises:
a construction unit, configured to construct an undirected graph with the obtained ID data as vertices and the application platforms to which the ID data belong as the edges between the vertices;
an extraction unit, configured to extract ID association subgraphs according to the edge attributes of the undirected graph;
a feature obtaining unit, configured to obtain the ID data quality feature corresponding to each edge attribute in each ID association subgraph;
a second determining unit, configured to determine the abnormal ID data corresponding to each edge attribute by using the judgment standard corresponding to each type of ID data quality feature.
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein, when the program runs, a device on which the storage medium is located is controlled to execute the data processing method according to any one of claims 1 to 5.
10. A processor, characterized in that the processor is configured to run a program, wherein the program, when run, executes the data processing method according to any one of claims 1 to 5.
CN201710916816.2A 2017-09-30 2017-09-30 Data processing method and device Active CN109598525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710916816.2A CN109598525B (en) 2017-09-30 2017-09-30 Data processing method and device

Publications (2)

Publication Number Publication Date
CN109598525A true CN109598525A (en) 2019-04-09
CN109598525B CN109598525B (en) 2023-01-17

Family

ID=65955783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710916816.2A Active CN109598525B (en) 2017-09-30 2017-09-30 Data processing method and device

Country Status (1)

Country Link
CN (1) CN109598525B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523034A (en) * 2020-04-24 2020-08-11 腾讯科技(深圳)有限公司 Application processing method, device, equipment and medium
CN113396433A (en) * 2019-06-11 2021-09-14 深圳市欢太科技有限公司 User portrait construction method and related product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120114040A (en) * 2011-04-06 2012-10-16 주식회사 바닐라하우스텐 Abusing observing method and abusing observing system about advertisement type of cost per click
US20140137226A1 (en) * 2011-07-20 2014-05-15 Tencent Technology (Shenzhen) Company Ltd. Method and System for Processing Identity Information
CN103886068A (en) * 2014-03-20 2014-06-25 北京国双科技有限公司 Data processing method and device for Internet user behavior analysis
CN104216985A (en) * 2014-09-04 2014-12-17 深圳供电局有限公司 Method and system for discriminating abnormal data
CN106656929A (en) * 2015-10-30 2017-05-10 北京国双科技有限公司 Information processing method and apparatus

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113396433A (en) * 2019-06-11 2021-09-14 深圳市欢太科技有限公司 User portrait construction method and related product
CN113396433B (en) * 2019-06-11 2023-12-26 深圳市欢太科技有限公司 User portrait construction method and related products
CN111523034A (en) * 2020-04-24 2020-08-11 腾讯科技(深圳)有限公司 Application processing method, device, equipment and medium
CN111523034B (en) * 2020-04-24 2023-08-18 腾讯科技(深圳)有限公司 Application processing method, device, equipment and medium

Also Published As

Publication number Publication date
CN109598525B (en) 2023-01-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100080 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

Address before: 100086 Beijing city Haidian District Shuangyushu Area No. 76 Zhichun Road cuigongfandian 8 layer A

Applicant before: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

GR01 Patent grant