WO2021034320A1 - Large scale data analysis optimization - Google Patents

Info

Publication number
WO2021034320A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
register
determining
registers
hashed parameter
Prior art date
Application number
PCT/US2019/047393
Other languages
English (en)
Inventor
Evgeny SKVORTSOV
Jeffrey Wilhelm
Yip Man TSANG
William George Kahn BRADBURY
Andreas Ulbrich
Zhaosheng BAO
Stuart Kendrick HARRELL
Original Assignee
Google Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google Llc
Priority to KR1020207022602A priority Critical patent/KR20210023795A/ko
Priority to CN201980011648.XA priority patent/CN112771512A/zh
Priority to JP2020542129A priority patent/JP7098735B2/ja
Priority to PCT/US2019/047393 priority patent/WO2021034320A1/fr
Priority to US16/960,817 priority patent/US11768752B2/en
Priority to EP19765368.6A priority patent/EP3799638A1/fr
Publication of WO2021034320A1 publication Critical patent/WO2021034320A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452Performance evaluation by statistical analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3404Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for parallel or distributed programming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3442Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for planning or managing the needed capacity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462Approximate or statistical queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/448Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4488Object-oriented
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/465Distributed object oriented systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0251Targeted advertisements
    • G06Q30/0269Targeted advertisements based on user profile or attribute

Definitions

  • This specification generally relates to computing processes for resource and space efficient storage and analysis of large scale datasets.
  • Collecting and analyzing data about different objects in a digital environment can be beneficial to providers of content, products, and/or services.
  • providers can aggregate data for numerous (e.g., millions or billions) objects to, for example, improve the provider’s services and/or improve a user’s online experience.
  • providers may aggregate the data for components or resources of a server farm to determine how frequently components of the server farm are failing (or operating in a certain manner).
  • providers may aggregate the data about several devices interacting with certain content to determine how frequently these devices interact with the content.
  • one innovative aspect of the subject matter described in this specification can be embodied in methods that include the operations of obtaining activity data for a plurality of objects in a dataset, wherein each object in the dataset performs activities in a digital environment and the activity data represents the activities; for each data item in the dataset: generating, using an identifier for an object specified in the data item, a hashed parameter for the object, wherein the hashed parameter has a binary representation; identifying a register from among a set of registers based on the binary representation of the hashed parameter, wherein each register in the set of registers is used to store data about objects in the dataset; determining, based on the binary representation of the hashed parameter, that the hashed parameter for the object contributes to an aggregation amount that specifies a number of occurrences of the object in the dataset; and in response to determining that the hashed parameter for the object contributes to the aggregation amount, updating the aggregation amount stored in the register; and
  • inventions of this aspect include corresponding systems, devices, apparatus, and computer programs configured to perform the actions of the methods.
  • the computer programs e.g., instructions
  • each object represents a user; and an aggregation amount represents a frequency value.
  • identifying a register from among a set of registers based on the binary representation of the hashed parameter comprises: identifying a first portion of the binary representation of the hashed parameter; and identifying the register using the first portion of the binary representation of the hashed parameter.
  • each register in the set of registers comprises a data structure that stores data about a received hashed parameter, wherein the data structure includes: a first field for storing data specifying a number of leading zeroes in a second portion of the received hashed parameter, a second field for storing data specifying trailing bits in a second portion of the received hashed parameter; and a third field for storing data specifying an aggregation amount that indicates a number of occurrences when (i) an existing data value in the first field matches the number of leading zeroes and (ii) an existing data value in the second field matches the trailing bits
  • determining, based on the binary representation of the hashed parameter, that the hashed parameter for the object contributes to an aggregation amount comprises: determining a number of leading zeros from the second portion of the binary representation of the hashed parameter; determining trailing bits from the second portion of the binary representation of the hashed parameter; and determining, based on the number of leading zeros and the trailing bits, that the hashed parameter impacts an existing data value stored in the third field of the data structure of the register.
  • determining, based on the number of leading zeros and the maximum number of trailing bits, that the hashed parameter impacts an existing data value stored in the third field of the data structure of the register comprises: determining that the existing data value stored in the first field of the data structure of the register is the same as the number of leading zeros, and determining that the existing data value stored in the second field of the data structure of the register is the same as the maximum number of trailing bits.
  • updating the aggregation amount stored in the register comprises incrementing the existing data value stored in the third field of the data structure of the register by one.
  • generating, based on aggregate amounts stored in the set of registers, a reporting output that indicates a set of data items, wherein each data item identifies an estimated number of objects in the dataset that performed activities in the digital environment at a particular aggregation amount comprises: identifying a set of unique aggregate amounts based on aggregation amounts stored in the set of registers; for each particular aggregation amount in the set of aggregation amounts, determining an estimated number of objects of the dataset that performed activities at the particular aggregation amount, wherein the determining includes: determining a number of registers storing an aggregation amount that matches the particular aggregation amount; adjusting the number of registers storing the aggregation amount that matches the particular aggregation amount based on a hash collision correction factor; determining an average number of objects stored in each register of the set of registers; and scaling the adjusted number of registers by the average number of objects.
  • HLL: HyperLogLog
  • activity data as further described below
  • Conventional methods require substantially more computing and storage resources than those required by techniques and/or systems described in this specification, which is especially the case when performing these operations on large datasets.
  • the techniques and/or systems in this specification require substantially less storage and can perform more time and resource efficient processing of large datasets to determine a frequency distribution of the objects in the dataset based on the objects’ activity data.
  • Figure 1 is a block diagram of an example computing system for computing information for a dataset.
  • Figure 2 is a flowchart of an example process for computing aggregate distributions based on activity data for objects in a dataset.
  • Figure 3 is a block diagram of a computing system that can be used in connection with methods described in this specification.
  • This specification describes techniques for using a probabilistic cardinality estimator, such as a HyperLogLog data structure, for providing a distribution of objects in a dataset across different aggregate values (e.g., frequencies) based on the activity data for the objects.
  • the techniques described in this specification enhance conventional HyperLogLog (HLL) data structures in a manner that enables computing such aggregate (e.g., frequency) distributions, which is not possible using the conventional HLL data structures.
  • An object can be an entity, resource, or component, such as users, spam events, system components, digital assets, etc.
  • Each object in the dataset is associated with or performs certain activities in a digital environment and the activity data in the dataset represents the activities of the objects.
  • This can include, for example, data describing device interactions with certain digital assets (e.g., portions of content), such as which users clicked on, viewed, or otherwise interacted with a content for a particular digital campaign.
  • the activity data can include log data about hardware/component events (e.g., failures, resets, outages, network calls, memory access, or other events) in a network environment.
  • the conventional HLL data structure can be used to measure or estimate the number of unique objects in a large dataset (i.e., the cardinality of the dataset).
  • the conventional HLL data structures cannot determine an aggregate distribution of the objects based on the activity data of the objects.
  • this data structure cannot be used to determine a distribution of the number of users who have viewed the content at particular frequencies (e.g., one time, two times, three times, etc.).
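  • By way of a non-limiting illustration, the conventional HLL cardinality estimate referred to above can be sketched as follows. The sketch uses the standard raw HyperLogLog estimator without small-range or large-range corrections; the function name `hll_estimate`, the choice of SHA-256, the 64-bit hash width, and the `index_bits` parameter are assumptions made for the example and are not taken from this specification.

```python
import hashlib


def hll_estimate(identifiers, index_bits=10):
    """Raw HyperLogLog cardinality estimate (no range corrections)."""
    m = 1 << index_bits                  # number of registers
    value_bits = 64 - index_bits         # bits left after the register index
    registers = [0] * m
    for identifier in identifiers:
        digest = hashlib.sha256(str(identifier).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash value
        index = h >> value_bits                        # leading bits pick the register
        rest = h & ((1 << value_bits) - 1)             # remaining bits
        rank = value_bits - rest.bit_length() + 1      # leading zeros + 1
        registers[index] = max(registers[index], rank)
    alpha = 0.7213 / (1 + 1.079 / m)     # bias constant, valid for m >= 128
    return alpha * m * m / sum(2.0 ** -r for r in registers)


# 30,000 view events generated by 10,000 distinct (hypothetical) users.
views = [f"user-{i % 10000}" for i in range(30000)]
estimate = hll_estimate(views)
```

  • Because each register keeps only a maximum, repeated occurrences of the same identifier leave the registers unchanged, so the estimate reflects distinct objects only and carries no information about how often each object occurred.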
  • Examples of such aggregation counters can include, among others, (1) a frequency counter that counts the number of occurrences of the object in the dataset, (2) a counter that records the most recent timestamp at which a particular event was recorded, and (3) a counter that counts the number of times an error code was observed at each error logging level.
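  • By way of a non-limiting illustration, two of these counters can be written as reduction functions whose result does not depend on the order in which data items arrive. The function names and the `timestamp` field are hypothetical and chosen only for the example.

```python
from functools import reduce


def count_occurrence(current, _event):
    """Frequency counter: each matching data item adds one."""
    return current + 1


def latest_timestamp(current, event):
    """Keep the most recent timestamp seen so far."""
    return max(current, event["timestamp"])


# Hypothetical events for a single object.
events = [{"timestamp": 17}, {"timestamp": 42}, {"timestamp": 23}]

forward = reduce(latest_timestamp, events, 0)
backward = reduce(latest_timestamp, reversed(events), 0)
frequency = reduce(count_occurrence, events, 0)
```

  • Here `forward` and `backward` agree, which is the property that lets a register be updated one data item at a time regardless of arrival order.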
  • An HLL data engine assigns objects in the dataset to a set of M registers.
  • the object’s unique identifier (as further described below) is hashed using a hash function to generate a hashed parameter (as further described below) that has a binary representation.
  • the HLL data engine uses a certain number of bits (e.g., the first four bits) of the hashed parameter to assign the object to one of the M registers.
  • the HLL data engine determines an aggregate number of times that the object has been associated with or performed a certain activity. As described below and in greater detail throughout this specification, the HLL data engine accomplishes this by evaluating whether the remaining bits of the hashed parameter (i.e., the bits other than those that were used to identify the register) contribute to an aggregation amount, e.g., that specifies a number of occurrences of the object in the dataset.
  • the HLL data engine determines the number of leading zeros (which also represents the bit position of the most significant non-zero bit) for the remaining bits of the hashed parameter. If the number of leading zeros is the same as the value stored in the first field of the register, the HLL data engine determines a set of trailing bits for the previously determined most significant bit (or another appropriate stable identifier, as described above). If the determined trailing bits are the same as the value stored in the trailing p bits field of the register, the HLL data engine determines that the current object is the same as the object for which data is already stored in the register. As a result, the HLL data engine updates the aggregation counter field of the register, e.g., by incrementing the value stored in that field by one or by performing another appropriate commutative reduction operation.
  • the HLL data engine can determine the number of objects in the dataset that occurred at and/or above a certain aggregate value (e.g., frequency). The HLL data engine computes this value by scaling the number of registers (e.g., adjusted to account for any hash collisions) for which the aggregation counter was set to a certain aggregate value by the average number of objects per register.
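  • By way of a non-limiting illustration, the scaling step can be sketched as follows. The sketch assumes that every register is populated, that a distinct-object estimate is supplied by the caller, and that the collision adjustment is a single multiplicative factor; the names `frequency_distribution`, `objects_at_or_above`, `estimated_cardinality`, and `collision_correction` are hypothetical, and this specification does not give a formula for the correction.

```python
from collections import Counter


def frequency_distribution(register_counts, estimated_cardinality,
                           collision_correction=1.0):
    """Estimate how many objects occurred at each aggregation amount.

    register_counts: aggregation counter value of each of the M registers.
    estimated_cardinality: estimated number of distinct objects.
    collision_correction: assumed multiplicative adjustment (1.0 = none).
    """
    m = len(register_counts)
    objects_per_register = estimated_cardinality / m   # average objects per register
    histogram = Counter(register_counts)               # registers per aggregation amount
    return {
        amount: n_registers * collision_correction * objects_per_register
        for amount, n_registers in sorted(histogram.items())
    }


def objects_at_or_above(distribution, threshold):
    """Sum the estimates for every aggregation amount >= threshold."""
    return sum(n for amount, n in distribution.items() if amount >= threshold)


# Eight registers and an estimated 800 distinct objects (made-up numbers).
distribution = frequency_distribution([1, 1, 1, 2, 2, 3, 1, 2], 800)
```

  • With these made-up inputs each register stands for 100 objects on average, so the four registers holding a count of 1 are reported as roughly 400 objects seen once.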
  • a user may be provided with controls allowing the user to make an election as to both if and when systems, programs, or features described herein may enable collection of information (e.g., information about a user’s social network, social actions, or activities, profession, a user's preferences, or a user’s current location), and if the user is sent content or communications from a server.
  • certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information is removed.
  • a user’s identity may be treated so that no personally identifiable information can be determined for the user, or a user's geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of a user cannot be determined.
  • the user may have control over what information is collected about the user, how that information is used, and what information is provided to the user.
  • FIG. 1 is a block diagram of an example computing system 100 for computing information for a dataset.
  • System 100 generally includes a computing server 102, a HLL data engine 104, a data storage device 130, and a data ingest engine 140.
  • the system 100 includes special-purpose hardware circuitry configured to execute specific computational rules that measure or estimate the aggregate distribution of the objects in a dataset based on the activity data for the objects. These techniques can be applied to various applications.
  • the techniques described in this specification can be used in digital campaign reach assessment, which includes generating data describing a distribution of users that have interacted with a particular campaign at different frequencies, e.g., how many unique users interacted (e.g., viewed, clicked on, etc.) with a digital content once, twice, thrice, etc.
  • the techniques described in this specification can be used to analyze hardware/component failures in a large scale network environment, which includes generating statistics about how frequently certain components or computing devices fail in the network environment. It will be understood that the techniques described in this specification may be used in other applications as well.
  • the system 100 includes a computing server 102, which is configured to use a HyperLogLog (HLL) data engine 104 to determine an aggregate distribution of objects in a dataset based on their activity levels.
  • the term engine refers to a data processing apparatus that performs a set of tasks.
  • the HLL data engine 104 is included within computing server 102 as a sub-system of hardware circuits (e.g., special-purpose circuitry) that includes one or more processor microchips.
  • computing server 102 can include processors (e.g., central or graphics processing units), memory, and data storage devices 106 that collectively form computer systems of computing server 102. Processors of these computer systems process instructions for execution by server 102, including instructions stored in the memory or on the dataset storage device 106 to display graphical information for output at an example display monitor of system 100.
  • execution of the stored instructions causes one or more of the actions described in this specification to be performed by the computing server 102 or the HLL data engine 104.
  • multiple processors may be used, as appropriate, along with multiple memories and types of memory.
  • computing server 102 may be connected with multiple other computing devices, with each device (e.g., a server bank, groups of servers, modules, or a multi-processor system) performing portions of the actions, operations, or logical flows described in this specification.
  • System 100 can receive, via data ingest engine 140, a dataset including activity data for a plurality of objects in a digital environment.
  • the received dataset is provided to the HLL data engine 104 of the computing server 102.
  • the HLL data engine 104 uses the logic engine 116, including the hashing logic 108, the leading zero logic 110, and the register ID logic 112, to store the data of the dataset in a set of M registers 125 in memory 106.
  • Data ingest engine 140 also receives queries, which request data about the number of objects in the dataset that are associated with or otherwise performed activities in the digital environment at particular frequencies. For example, a query 150 can request data about the number of unique users in the dataset that viewed, accessed, or otherwise interacted with content a certain number of times (e.g., one time, two times, three times, etc.). The data ingest engine 140 sends the query 150 to the computing server 102, which in turn uses the HLL data engine 104 (and in particular, the reporting logic 114) to determine the number of distinct users in a dataset and their distribution across different frequencies based on their activity data.
  • the HLL data engine 104 then, alone or in combination with a front end engine of the computing server 102, provides the determined distribution data as reporting output 180.
  • the reporting output 180 can be statistics in the form of text or a visual representation (e.g., a histogram, a pie chart, etc.) showing the number of users who are associated with or otherwise performed certain activities at different frequencies, e.g., one time, two times, etc.
  • the reporting output 180 may be in the form of a data structure that can be processed by computing server 102 or by another computing device.
  • FIG. 2 is a flowchart of an example process 200 for computing aggregate distributions based on activity data for objects in a dataset.
  • Process 200 can be implemented or executed using computing resources of system 100, and in particular the HLL data engine 104, described above. Operations of the process 200 are described below for illustration purposes only. Operations of the process 200 can be performed by any appropriate device or system, e.g., any appropriate data processing apparatus. Operations of the process 200 can also be implemented as programmed instructions stored on a non-transitory computer readable medium (such as the memory and/or data storage device 130, described with reference to Figure 1) and executed by at least one processor of the computing server 102.
  • the data ingest engine 140 obtains activity data for a plurality of objects in a dataset (at 202).
  • the data ingest engine 140 receives data logs specifying the activity data in a digital environment for objects in a dataset (wherein each object can occur one or more times in the dataset).
  • the data logs can include separate fields (or delimiters that can be used to delineate different data items) corresponding to an object identifier for the object and the corresponding activity data for the object.
  • the object identifier is a value (e.g., a number, alphanumeric string, data structure) that uniquely identifies a particular object in the dataset.
  • the object identifier is a byte (e.g., eight bits), while in other implementations the object identifier is a data word formed by, e.g., 12 bits, 16 bits, 32 bits, or 64 bits. In some cases, a variable number of bits can be used to form the object identifier, such as more than 64 bits or fewer than 64 bits.
  • the data ingest engine 140 sends the received dataset to the HLL data engine 104 of the computing server 102. For each data item in the dataset, the process 200 then performs the operations 204, 206, 208, and 210, which are further described below. As a result of performing these operations, the process 200 accumulates an aggregate distribution of objects in the data set based on the activity data associated with or performed by these objects.
  • the HLL data engine 104 generates a hashed parameter 128 for the object using the hashing logic 108 (at 204).
  • the hashing logic 108 applies one or more hash functions (which may include any conventional hash function/s) to the object identifier for the object to generate the hashed parameter (which may also be referred to as a hash, hash code, or hash value).
  • the hashed parameter has a binary representation whose length is dependent upon the hash function itself or the parameters of the hash function.
  • the hash of object identifier for the object is indicated as the hashed parameter 128, as shown in Figure 1.
  • the HLL data engine 104 identifies a register from among a set of registers that can be used to store data about the object (at 206).
  • data for a dataset can be stored in a set of M registers 125.
  • the register ID logic 112 identifies one of the M registers that can be used to store data about the object. For example, for the hashed parameter 128 (0001 0101 0100), the register ID logic 112 can use the first four bits (0001) to identify one of the M registers. It will be appreciated that the number of registers 125 is less than the number of data items in the dataset.
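  • By way of a non-limiting illustration, register selection for the 12-bit example value can be sketched as follows. The use of SHA-256, the constants `HASH_BITS` and `INDEX_BITS`, and the function names are assumptions made for the example; this specification permits any conventional hash function.

```python
import hashlib

HASH_BITS = 12    # width of the example hashed parameter
INDEX_BITS = 4    # leading bits used to select a register, so M = 16


def hashed_parameter(object_id):
    """Hash an object identifier down to HASH_BITS bits (SHA-256 is assumed)."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big") >> (16 - HASH_BITS)


def register_index(hashed, hash_bits=HASH_BITS, index_bits=INDEX_BITS):
    """Return the value of the first index_bits bits of the hashed parameter."""
    return hashed >> (hash_bits - index_bits)


example = 0b0001_0101_0100          # the hashed parameter 128 from the text
selected = register_index(example)  # first four bits are 0001
```

  • For the example value, the first four bits 0001 select the register with index 1 out of 16.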
  • the HLL data engine 104 determines whether the hashed parameter contributes to a frequency amount (at 208).
  • the hashing logic 108 identifies a second portion of the hashed parameter 128, which includes the bits of the hashed parameter without the first set of bits that are used to identify the appropriate register (as described above at operation 206).
  • the bits (0101 0100) do not include the first four bits that are used by the register ID logic 112 to identify the appropriate register (as described in the preceding paragraph).
  • the leading zero logic 110 determines the number of leading zeros (which also represents the bit position of the most significant non-zero bit) in the second portion or set of bits. In some implementations, the leading zero logic 110 determines the number of leading zeros by counting the number of zeros, from left to right, in the second set of bits until the bit position of the first “1” in the second set of bits is identified. For example, the number of leading zeros for the second set of bits (0101 0100) of the hashed parameter 128 is one because, when counting from left to right, one zero is identified before the first “1” is encountered.
  • the HLL data engine 104 determines the number of trailing bits for the most significant bit in the second set of bits, as identified in the previous paragraph.
  • the HLL data engine 104 determines the trailing bits by identifying all the bits in the second set of bits after the most significant bit, which is the location where the first “1” is identified when counting from left to right (as described in the preceding paragraph). For example, the trailing bits in the second set of bits (0101 0100) are “010100” because these are the bits that follow the first “1” that was identified when counting the leading zeros for the second set of bits.
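  • By way of a non-limiting illustration, the leading-zero count and the trailing bits for the example second set of bits can be derived as follows. The function name `split_second_portion` and the optional truncation parameter `p` are hypothetical.

```python
def split_second_portion(bits, p=None):
    """Return (number of leading zeros, trailing bits) for a bit string.

    bits: the second portion of the hashed parameter, e.g. "01010100".
    p: if given, keep only the first p trailing bits.
    """
    first_one = bits.find("1")          # position of the most significant 1
    if first_one == -1:                 # all zeros: there is no 1 bit
        return len(bits), ""
    trailing = bits[first_one + 1:]     # everything after that 1
    return first_one, (trailing if p is None else trailing[:p])


leading_zeros, trailing_bits = split_second_portion("01010100")
```

  • For "01010100" this yields one leading zero and the trailing bits "010100", matching the example in the text.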
  • each register in the set of M registers 125 includes a data structure 120 that has three fields: a field for the most significant bit 122, a field for the trailing p bits 124, and a field for the aggregation counter 126.
  • field 124 instead of storing the trailing p bits, stores any number of trailing bits for the most significant bit in the second set of bits or alternatively, a stable identifier for the object, such as a separate hash value made up of p bits.
  • the total amount of information stored in each register may only be two bytes (or 16 bits).
  • the standard HLL algorithm, which stores only the number of leading zeros in each register, generally requires six bits of data.
  • the HLL registers described in this specification can store additional data about objects in the dataset with only a marginal increase in storage requirement per register (as compared with storing the entirety of the activity data for objects in the dataset, which would require much more than two bytes of storage space).
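  • By way of a non-limiting illustration, the three fields can be packed into a two-byte register as sketched below. The split of the 16 bits into six, five, and five bits, the saturating counter, and the function names are assumptions made for the example; this specification does not fix the field widths.

```python
LZ_BITS, TRAIL_BITS, COUNT_BITS = 6, 5, 5   # assumed split of the 16 bits


def pack_register(leading_zeros, trailing, count):
    """Pack the three fields into one 16-bit integer."""
    if not 0 <= leading_zeros < (1 << LZ_BITS):
        raise ValueError("leading_zeros does not fit in its field")
    if not 0 <= trailing < (1 << TRAIL_BITS):
        raise ValueError("trailing does not fit in its field")
    count = min(count, (1 << COUNT_BITS) - 1)   # saturate instead of overflowing
    return (leading_zeros << (TRAIL_BITS + COUNT_BITS)) | (trailing << COUNT_BITS) | count


def unpack_register(word):
    """Recover (leading_zeros, trailing, count) from a packed register."""
    count = word & ((1 << COUNT_BITS) - 1)
    trailing = (word >> COUNT_BITS) & ((1 << TRAIL_BITS) - 1)
    leading_zeros = word >> (TRAIL_BITS + COUNT_BITS)
    return leading_zeros, trailing, count


word = pack_register(1, 0b01010, 3)
```

  • A wider counter field allows higher frequencies to be distinguished at the cost of fewer trailing bits, which in turn makes it more likely that two different objects are mistaken for one another.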
  • the aggregation counter field 126 stores the frequency amount, which specifies a number of occurrences of the object in the dataset.
  • the object’s hashed parameter contributes to the aggregation amount based on a comparison of the number of leading zeros and the trailing bits of the hashed parameter (as determined by the HLL data engine 104) with the values stored in the most significant bit field 122 and the trailing p bits field 124 of the data structure 120 in the register (identified in operation 206), respectively.
  • the aggregation counter field 126 can aggregate information about objects with the same key (e.g., recording the most recent timestamp at which a particular event was recorded, counting the number of times an error code was observed at each error logging level, etc.).
  • When the number of leading zeros determined by the leading zero logic 110 is less than the value stored in field 122, the leading zero logic 110 does not update the data structure 120. In other words, the existing values in fields 122, 124, and 126 are retained. Because this operation does not result in updating the aggregation counter field 126, the object’s hashed parameter does not contribute to the aggregation (e.g., frequency) amount.
  • When the number of leading zeros is greater than the value stored in field 122, the leading zero logic 110 updates field 122 with the value of the most significant bit determined by the leading zero logic 110. In such instances, the HLL data engine 104 also (1) updates the value stored in field 124 with the trailing bits value calculated by the HLL data engine 104 and (2) resets the value stored in field 126 to zero.
  • When the number of leading zeros is the same as the value stored in field 122, the leading zero logic 110 does not update the value stored in the field 122.
  • the HLL data engine 104 also determines whether to update the values stored in the fields 124 and 126. As further described below, it does so by comparing the trailing bits determined by the HLL data engine 104 with the value stored in the trailing p bits field 124 of the data structure 120.
  • the HLL data engine 104 (1) updates the field 124 with the value of the trailing bits determined by the HLL data engine 104 and (2) resets the value of the aggregation counter field 126 to zero.
  • the HLL data engine 104 retains (i.e., does not update) the values stored in fields 122, 124, and 126.
  • the HLL data engine 104 determines that the current object is the same as the object for which data is already stored in the data structure 120.
  • the HLL data engine 104 (1) does not update the value already stored in the trailing p bits field 124 and (2) updates the value stored in the aggregation counter field 126 based on the commutative reduction function involving the current value of the field and the object (at 210)
  • the aggregation counter field 126 is a frequency counter
  • the HLL data engine 104 updates the value in this field by incrementing the value stored in this field 126 by one (e.g., if the value stored in the aggregation counter field 126 is 2, the HLL data engine 104 increments that value by one, which results in a value of 3).
  • the HLL data engine 104 uses the commutative reduction function to appropriately scale (e.g., multiplying, dividing, incrementing by more than one, etc.) the value in the field 126.
  • the HLL data engine 104 performs operations 206, 208, and 210 based on the single hash representation generated for the object at operation 204.
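The register update described in operations 206 to 210 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the hash function (SHA-256 truncated to 64 bits), the bit widths, the `Register` class, and the direction of the trailing-bits tie-break are all hypothetical choices. The source states only that the counter is reset when a new object takes over a register; this sketch additionally applies the reduction to the current observation so that a first sighting counts once.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass

HASH_BITS = 64      # assumed width of the hash representation
INDEX_BITS = 4      # assumed: M = 2**INDEX_BITS registers
TRAILING_BITS = 8   # assumed value of p for the trailing p bits field


@dataclass
class Register:
    leading_zeros: int = -1  # stands in for field 122; -1 marks an empty register
    trailing: int = 0        # stands in for field 124
    counter: int = 0         # stands in for field 126


def hash_object(object_id: str) -> int:
    """Return a 64-bit hash of the identifier (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def split_hash(h: int) -> tuple[int, int, int]:
    """Split one hash into (register index, leading-zero count, trailing p bits)."""
    rest_bits = HASH_BITS - INDEX_BITS
    index = h >> rest_bits                      # top bits select the register
    rest = h & ((1 << rest_bits) - 1)           # remaining bits after the index
    leading_zeros = rest_bits - rest.bit_length()
    trailing = h & ((1 << TRAILING_BITS) - 1)   # lowest p bits
    return index, leading_zeros, trailing


def update(registers: list[Register], object_id: str, reduce=lambda c: c + 1) -> None:
    """Apply one observation of object_id to its register."""
    index, lz, trailing = split_hash(hash_object(object_id))
    reg = registers[index]
    if lz < reg.leading_zeros:
        return                                  # all three fields are retained
    if lz > reg.leading_zeros:
        reg.leading_zeros, reg.trailing, reg.counter = lz, trailing, 0
    elif trailing > reg.trailing:               # assumed tie-break direction
        reg.trailing, reg.counter = trailing, 0
    elif trailing < reg.trailing:
        return                                  # all three fields are retained
    # Same object (or a newly installed one): apply the commutative reduction.
    reg.counter = reduce(reg.counter)


registers = [Register() for _ in range(2 ** INDEX_BITS)]
for _ in range(3):
    update(registers, "user-1")
```

Passing a different `reduce` callable (for example, one that adds more than one) corresponds to the non-frequency aggregations mentioned above.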
  • the HLL data engine 104 can perform operations 206, 208, and 210 using separate hash representations.
  • the hashing logic 108 can use the object identifier to generate separate hash representations: one hash representation can be used to identify the appropriate register in the set of M registers 125, a second hash representation from which the number of leading zeros are determined, and a third hash representation from which the trailing bits are determined. The above described operations 206 to 210 can then be performed using these separate hash representations.
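One way to obtain the separate hash representations mentioned above is to salt the object identifier differently for each purpose. The sketch below is an assumption-laden illustration: the salt strings, the use of SHA-256, the 64-bit truncation, and the helper names `salted_hash` and `separate_representations` are all hypothetical and do not come from the source.

```python
import hashlib


def salted_hash(object_id: str, salt: str) -> int:
    """Return a 64-bit hash of the identifier combined with a purpose-specific salt."""
    digest = hashlib.sha256(f"{salt}:{object_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def separate_representations(object_id: str, num_registers: int = 16, p: int = 8):
    """Derive the register index, leading-zero count, and trailing p bits
    from three independent hash representations of the same identifier."""
    index = salted_hash(object_id, "register") % num_registers
    lz_hash = salted_hash(object_id, "leading-zeros")
    leading_zeros = 64 - lz_hash.bit_length()
    trailing = salted_hash(object_id, "trailing") & ((1 << p) - 1)
    return index, leading_zeros, trailing
```

The three values can then feed the same comparison and update steps as in the single-hash case.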
  • the data ingest engine 140 receives a query 150 requesting an aggregation distribution of the number of objects in the dataset that performed activities in the digital environment at different frequencies (at 212).
  • the query 150 can request a frequency distribution of the number of users in a dataset that interacted with certain digital content at different frequencies (one time, two times, three times, etc.).
  • the data ingest engine 140 sends the query 150 to the computing server 102, which in turn routes the query 150 to the reporting logic 114 of the logic engine 116.
  • In response to the query 150, the reporting logic 114 generates a reporting output that represents an aggregate distribution of the objects in the dataset based on the associated activities or activities performed by these objects in the digital environment (at 212). The reporting logic 114 estimates the aggregate distribution based on the aggregate values stored in the registers 125. The reporting logic 114 generates this reporting output by performing the following operations. In some implementations, the reporting logic 114 determines the different possible aggregate values by identifying a set of values including the unique aggregate values stored in aggregation counter field 126 in the set of registers 125. In some implementations, the query 150 may identify the aggregate values, in which case, the reporting logic 114 can skip the operation of identifying the different possible aggregate values stored in field 126 of the registers. In some implementations, the reporting logic 114 may access a set of aggregate values specified by an administrator of the system (and stored in the data storage device 130), in which case, the reporting logic 114 can skip the operation of identifying the different possible aggregate values stored in the registers.
  • the reporting logic 114 determines a number of registers that have the same value stored in the aggregation counter field 126 as the identified aggregate value. In such implementations, the reporting logic 114 counts all registers for which the value in the aggregation counter field 126 is the same as the identified aggregate value. In other implementations, the reporting logic 114 counts all registers for which the value in the aggregation counter field 126 is the same as or greater than the identified aggregate value.
  • hash collisions may arise when storing and updating values in the data structure 120 of the registers 125.
  • two object identifiers for two different objects in the dataset when hashed by the hashing logic 108, may update the same register and may have the same number of leading zeros and the same trailing bits.
  • the value of this field should only be incremented by one in this scenario; however, because of the hash collision, the value of this field 126 is instead incorrectly incremented by two.
  • the aggregation counter field 126 may incorrectly reflect that a single object interacted with the same content twice.
  • the reporting logic 114 counts all registers that satisfy some criteria, which can be specified in the query (e.g., having more errors at one reporting level than another, or having a value between two bounds), that provides a function to map the value in field 126 to a boolean (e.g., include in the count or not).
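Counting registers by a query-supplied criterion amounts to applying a boolean function to each stored counter value. The following sketch illustrates the idea; the function name `count_matching` and the sample counter values are made up for illustration.

```python
from typing import Callable, Iterable


def count_matching(counters: Iterable[int], predicate: Callable[[int], bool]) -> int:
    """Count the registers whose aggregation counter satisfies the predicate."""
    return sum(1 for value in counters if predicate(value))


# Hypothetical counter values read from the aggregation counter field of 8 registers.
counters = [0, 1, 1, 2, 3, 5, 2, 1]

exactly_two = count_matching(counters, lambda v: v == 2)          # 2
at_least_two = count_matching(counters, lambda v: v >= 2)         # 4
between_bounds = count_matching(counters, lambda v: 2 <= v <= 4)  # 3
```

The first two predicates correspond to the "same as" and "same as or greater than" counting variants described earlier.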
  • the reporting logic 114 obtains the count of registers for which the value in the aggregation counter field 126 is the same as or greater than the identified aggregate value and then adjusts (e.g., reduces) this count by a correction factor.
  • the correction factor (also referred to as a hash collision correction factor), F, can be represented by F(C, M, n), and estimates the number of hash collisions expected at the identified aggregate value (n) for a number of distinct objects (C) in the dataset that have performed or are associated with certain activity, which are stored in the set of M registers 125.
  • the number of distinct elements that have performed or are associated with certain activity is determined using the standard HLL algorithm.
  • the correction factor is based on an empirically determined lookup table of reduction values indexed by C, M, and f.
  • the reporting logic 114 scales (e.g., multiplies) the adjusted number of registers (as determined in the previous paragraph) at the particular aggregate value by the average number of objects per register.
  • the average number of objects per register is determined by dividing the cardinality of the dataset C (as determined using the standard HLL algorithm) by M, which is the number of registers 125.
  • the reporting logic 114 repeats the above operations for each identified frequency.
  • the total number of objects at a particular aggregate value can be represented using the following equation: Rn = (Bn − F(C, M, n)) × (C / M), where (1) Rn is the number of objects at a particular aggregate value n, (2) Bn is the number of buckets with the aggregation counter field set to n, (3) C is the cardinality of the dataset, (4) M is the number of registers 125, and (5) F(C, M, n) is the hash collision correction factor.
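The per-value estimate described above (the register count at an aggregate value, reduced by the collision correction and scaled by the average number of objects per register, C / M) can be sketched as follows. The cardinality is taken as an input, assumed to have been computed separately with the standard HLL algorithm, and the `correction` callable is a hypothetical stand-in for the empirically determined lookup table; it defaults to no correction.

```python
from collections import Counter
from typing import Callable, Dict, Sequence


def estimate_distribution(
    counters: Sequence[int],
    cardinality: float,
    correction: Callable[[float, int, int], float] = lambda c, m, n: 0.0,
) -> Dict[int, float]:
    """Estimate how many objects sit at each aggregate value n."""
    m = len(counters)                    # M: number of registers
    per_register = cardinality / m       # average number of objects per register
    buckets = Counter(v for v in counters if v > 0)  # B_n for each observed n
    result = {}
    for n, b_n in sorted(buckets.items()):
        adjusted = max(b_n - correction(cardinality, m, n), 0.0)
        result[n] = adjusted * per_register
    return result


# Hypothetical counter values from 8 registers and an assumed cardinality of 16.
distribution = estimate_distribution([1, 1, 2, 0, 3, 1, 2, 1], cardinality=16)
# distribution == {1: 8.0, 2: 4.0, 3: 2.0}
```

This sketch counts registers whose counter equals n exactly; the "same as or greater than" variant would replace the `Counter` with a cumulative count.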
  • reporting logic 114 sends the identified frequencies and the corresponding number of determined objects to a front end engine of the computing server 102, which uses these values to generate a report, e.g., reporting output 180, that is provided to the entity from which the query 150 was received.
  • the front end engine can use the values provided by the reporting logic 114 to generate statistics that include a set of data items, in which each data item identifies an estimated number of objects in the dataset that is associated with or performed activities in the digital environment at a particular frequency.
  • These statistics can be in the form of text and/or visuals (e.g., a histogram, a pie chart, etc.) on the reporting output 180, and show the distribution of the number of objects at different frequencies based on the activity data of the objects
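As a rough illustration of a textual form of such statistics, the sketch below renders a frequency distribution as a plain-text histogram. The function name `render_histogram`, the bar character, and the layout are arbitrary choices made for this example.

```python
from typing import Dict


def render_histogram(distribution: Dict[int, float], width: int = 40) -> str:
    """Render {frequency: estimated object count} as one text bar per frequency."""
    if not distribution:
        return ""
    peak = max(distribution.values())
    lines = []
    for frequency in sorted(distribution):
        estimate = distribution[frequency]
        # Scale each bar relative to the largest estimate.
        bar_length = round(width * estimate / peak) if peak > 0 else 0
        lines.append(f"{frequency:>3} | {'#' * bar_length} {estimate:.0f}")
    return "\n".join(lines)


print(render_histogram({1: 8.0, 2: 4.0}, width=8))
```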
  • Fig. 3 is a block diagram of computing devices 300, 350 that may be used to implement the systems and methods described in this document, either as a client or as a server or plurality of servers.
  • Computing device 300 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers.
  • Computing device 350 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, smartwatches, head-worn devices, and other similar computing devices.
  • the components shown here, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations described and/or claimed in this document.
  • Computing device 300 includes a processor 302, memory 304, a storage device 306, a high-speed interface 308 connecting to memory 304 and high-speed expansion ports 310, and a low speed interface 312 connecting to low speed bus 314 and storage device 306.
  • Each of the components 302, 304, 306, 308, 310, and 312 is interconnected using various busses, and may be mounted on a common motherboard or in other manners as appropriate.
  • the processor 302 can process instructions for execution within the computing device 300, including instructions stored in the memory 304 or on the storage device 306 to display graphical information for a GUI on an external input/output device, such as display 316 coupled to high speed interface 308
  • multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory.
  • multiple computing devices 300 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).
  • the memory 304 stores information within the computing device 300.
  • the memory 304 is a computer-readable medium.
  • the memory 304 is a volatile memory unit or units.
  • the memory 304 is a non-volatile memory unit or units.
  • the storage device 306 is capable of providing mass storage for the computing device 300
  • the storage device 306 is a computer-readable medium.
  • the storage device 306 may be a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations.
  • a computer program product is tangibly embodied in an information carrier.
  • the computer program product contains instructions that, when executed, perform one or more methods, such as those described above.
  • the information carrier is a computer- or machine-readable medium, such as the memory 304, the storage device 306, or memory on processor 302.
  • the high-speed controller 308 manages bandwidth-intensive operations for the computing device 300, while the low speed controller 312 manages lower bandwidth-intensive operations. Such allocation of duties is exemplary only.
  • the high-speed controller 308 is coupled to memory 304, display 316 (e.g., through a graphics processor or accelerator), and to high-speed expansion ports 310, which may accept various expansion cards (not shown).
  • low-speed controller 312 is coupled to storage device 306 and low-speed expansion port 314.
  • the low-speed expansion port which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet) may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.
  • the computing device 300 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 320, or multiple times in a group of such servers. It may also be implemented as part of a rack server system 324. In addition, it may be implemented in a personal computer such as a laptop computer 322.
  • components from computing device 300 may be combined with other components in a mobile device (not shown), such as device 350.
  • Each of such devices may contain one or more of computing device 300, 350, and an entire system may be made up of multiple computing devices 300, 350 communicating with each other.
  • Computing device 350 includes a processor 352, memory 364, an input/output device such as a display 354, a communication interface 366, and a transceiver 368, among other components.
  • the device 350 may also be provided with a storage device, such as a microdrive or other device, to provide additional storage.
  • Each of the components 350, 352, 364, 354, 366, and 368 is interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.
  • the processor 352 can process instructions for execution within the computing device 350, including instructions stored in the memory 364.
  • the processor may also include separate analog and digital processors.
  • the processor may provide, for example, for coordination of the other components of the device 350, such as control of user interfaces, applications run by device 350, and wireless communication by device 350.
  • Processor 352 may communicate with a user through control interface 358 and display interface 356 coupled to a display 354.
  • the display 354 may be, for example, a TFT LCD display or an OLED display, or other appropriate display technology.
  • the display interface 356 may comprise appropriate circuitry for driving the display 354 to present graphical and other information to a user.
  • the control interface 358 may receive commands from a user and convert them for submission to the processor 352.
  • an external interface 362 may be provided in communication with processor 352, so as to enable near area communication of device 350 with other devices.
  • External interface 362 may provide, for example, for wired communication (e.g., via a docking procedure) or for wireless communication (e.g., via Bluetooth or other such technologies).
  • the memory 364 stores information within the computing device 350.
  • the memory 364 is a computer-readable medium.
  • the memory 364 is a volatile memory unit or units.
  • the memory 364 is a non-volatile memory unit or units.
  • Expansion memory 374 may also be provided and connected to device 350 through expansion interface 372, which may include, for example, a SIMM card interface. Such expansion memory 374 may provide extra storage space for device 350, or may also store applications or other information for device 350.
  • expansion memory 374 may include instructions to carry out or supplement the processes described above, and may include secure information also.
  • expansion memory 374 may be provided as a security module for device 350, and may be programmed with instructions that permit secure use of device 350.
  • secure applications may be provided via the SIMM cards, along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner.
  • the memory may include for example, flash memory and/or MRAM memory, as discussed below.
  • a computer program product is tangibly embodied in an information carrier.
  • the computer program product contains instructions that, when executed, perform one or more methods, such as those described above.
  • the information carrier is a computer- or machine-readable medium, such as the memory 364, expansion memory 374, or memory on processor 352.
  • Device 350 may communicate wirelessly through communication interface 366, which may include digital signal processing circuitry where necessary. Communication interface 366 may provide for communications under various modes or protocols, such as GSM voice calls, SMS, EMS, or MMS messaging, CDMA, TDMA, PDC, WCDMA, CDMA2000, or GPRS, among others. Such communication may occur, for example, through radio-frequency transceiver 368. In addition, short-range communication may occur, such as using a Bluetooth, WiFi, or other such transceiver (not shown). In addition, GPS receiver module 370 may provide additional wireless data to device 350, which may be used as appropriate by applications running on device 350.
  • Device 350 may also communicate audibly using audio codec 360, which may receive spoken information from a user and convert it to usable digital information. Audio codec 360 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of device 350. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.) and may also include sound generated by applications operating on device 350.
  • the computing device 350 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 380. It may also be implemented as part of a smartphone 382, personal digital assistant, or other similar mobile device.
  • Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs, computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
  • the systems and techniques described here can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • the systems and techniques described here can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component such as an application server, or that includes a front end component such as a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here, or any combination of such back end, middleware, or front end components.
  • the components of the system can be interconnected by any form or medium of digital data communication, such as a communication network. Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet.
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • As used in this specification, the terms “module,” “engine,” and “component” are intended to include, but are not limited to, one or more computers configured to execute one or more software programs that include program code that causes a processing unit(s)/device(s) of the computer to execute one or more functions.
  • The term “computer” is intended to include any data processing or computing devices/systems, such as a desktop computer, a laptop computer, a mainframe computer, a personal digital assistant, a server, a handheld device, a smartphone, a tablet computer, an electronic reader, or any other electronic device able to process data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to methods, systems, and apparatus, including computer programs encoded on a computer storage medium, that facilitate resource- and space-efficient analysis of large scale datasets. The methods include obtaining activity data for objects in a dataset. For each data item in the dataset, a hashed parameter having a binary representation is generated using an identifier for the object. A register is identified from among a set of registers based on the hashed parameter. A determination is made that the hashed parameter for the object contributes to an aggregation amount that specifies a number of occurrences of the object in the dataset. Based on this determination, an aggregation amount stored in the register is updated. Based on the aggregation amounts stored in the set of registers, a reporting output is generated that provides an aggregate distribution of the objects in the dataset based on the activity data for the objects.
PCT/US2019/047393 2019-08-21 2019-08-21 Optimisation d'analyse de données à grande échelle WO2021034320A1 (fr)

Priority Applications (6)

Application Number Priority Date Filing Date Title
KR1020207022602A KR20210023795A (ko) 2019-08-21 2019-08-21 대규모 데이터 분석 최적화
CN201980011648.XA CN112771512A (zh) 2019-08-21 2019-08-21 优化大规模数据分析
JP2020542129A JP7098735B2 (ja) 2019-08-21 2019-08-21 大規模データ分析の最適化
PCT/US2019/047393 WO2021034320A1 (fr) 2019-08-21 2019-08-21 Optimisation d'analyse de données à grande échelle
US16/960,817 US11768752B2 (en) 2019-08-21 2019-08-21 Optimizing large scale data analysis
EP19765368.6A EP3799638A1 (fr) 2019-08-21 2019-08-21 Optimisation d'analyse de données à grande échelle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2019/047393 WO2021034320A1 (fr) 2019-08-21 2019-08-21 Optimisation d'analyse de données à grande échelle

Publications (1)

Publication Number Publication Date
WO2021034320A1 true WO2021034320A1 (fr) 2021-02-25

Family

ID=67874522

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/047393 WO2021034320A1 (fr) 2019-08-21 2019-08-21 Optimisation d'analyse de données à grande échelle

Country Status (6)

Country Link
US (1) US11768752B2 (fr)
EP (1) EP3799638A1 (fr)
JP (1) JP7098735B2 (fr)
KR (1) KR20210023795A (fr)
CN (1) CN112771512A (fr)
WO (1) WO2021034320A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023034218A1 (fr) * 2021-08-30 2023-03-09 The Nielsen Company (Us), Llc Procédé et système d'estimation de cardinalité d'informations

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220398220A1 (en) * 2021-06-14 2022-12-15 EMC IP Holding Company LLC Systems and methods for physical capacity estimation of logical space units

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070106666A1 (en) * 2005-11-10 2007-05-10 Beckerle Michael J Computing frequency distribution for many fields in one pass in parallel
US20180268167A1 (en) * 2017-03-17 2018-09-20 Mediasift Limited Event processing system

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20030040263A (ko) 2003-04-07 2003-05-22 주식회사 드림나우 인터넷 홈페이지 이용자의 행태 정보 획득 방법 및 그 장치
KR100798008B1 (ko) 2005-09-07 2008-01-24 노키아 코포레이션 빈도수 카운팅을 위한 방법 및 장치
US8316064B2 (en) * 2008-08-25 2012-11-20 Emc Corporation Method and apparatus for managing data objects of a data storage system
US10198363B2 (en) * 2015-10-23 2019-02-05 Oracle International Corporation Reducing data I/O using in-memory data structures
US10055506B2 (en) * 2014-03-18 2018-08-21 Excalibur Ip, Llc System and method for enhanced accuracy cardinality estimation
US9886301B2 (en) * 2015-05-04 2018-02-06 Strato Scale Ltd. Probabilistic deduplication-aware workload migration
US10983976B2 (en) * 2016-04-18 2021-04-20 Verizon Media Inc. Optimized full-spectrum cardinality estimation based on unified counting and ordering estimation techniques
US10009239B2 (en) * 2016-08-09 2018-06-26 Airmagnet, Inc. Method and apparatus of estimating conversation in a distributed netflow environment
US11074237B2 (en) 2017-04-14 2021-07-27 Dynatrace Llc Method and system to estimate the cardinality of sets and set operation results from single and multiple HyperLogLog sketches
US10579827B2 (en) 2017-07-24 2020-03-03 Meltwater News International Holdings Gmbh Event processing system to estimate unique user count

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FISCHER LORENZ ET AL: "Timely Semantics: A Study of a Stream-Based Ranking System for Entity Relationships", 24 October 2015, INTERNATIONAL CONFERENCE ON FINANCIAL CRYPTOGRAPHY AND DATA SECURITY; [LECTURE NOTES IN COMPUTER SCIENCE; LECT.NOTES COMPUTER], SPRINGER, BERLIN, HEIDELBERG, PAGE(S) 429 - 445, ISBN: 978-3-642-17318-9, XP047414449 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023034218A1 (fr) * 2021-08-30 2023-03-09 The Nielsen Company (Us), Llc Procédé et système d'estimation de cardinalité d'informations
US11847119B2 (en) 2021-08-30 2023-12-19 The Nielsen Company (Us), Llc Method and system for estimating the cardinality of information

Also Published As

Publication number Publication date
JP7098735B2 (ja) 2022-07-11
EP3799638A1 (fr) 2021-04-07
KR20210023795A (ko) 2021-03-04
US11768752B2 (en) 2023-09-26
CN112771512A (zh) 2021-05-07
JP2022500714A (ja) 2022-01-04
US20220171693A1 (en) 2022-06-02

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2019765368

Country of ref document: EP

Effective date: 20200723

ENP Entry into the national phase

Ref document number: 2020542129

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE