GB2513720A - Computer-implemented systems and methods for comparing and associating objects

Info

Publication number
GB2513720A
Authority
GB
United Kingdom
Prior art keywords
objects
properties
slug
bloom filter
multimap
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1404486.1A
Other versions
GB201404486D0 (en
Inventor
Mark Elliot
Allen Chang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Palantir Technologies Inc
Original Assignee
Palantir Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US14/099,661 external-priority patent/US8924388B2/en
Application filed by Palantir Technologies Inc filed Critical Palantir Technologies Inc
Publication of GB201404486D0 publication Critical patent/GB201404486D0/en
Publication of GB2513720A publication Critical patent/GB2513720A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Abstract

A method for associating a first object with one or more objects within a plurality of objects, each object comprising a first and second plurality of properties, each property comprising data reflecting a characteristic of an entity represented by the object, the associated objects comprising matching data in corresponding properties for the second plurality of properties. The method comprises executing, for each object within the plurality of objects and for the first object, the following: creating a slug for the object, the slug comprising the second plurality of properties from the object; and inputting the slug for the object into a Bloom filter, preferably a counting Bloom filter. Further, the method may include creating for a bin within the Bloom filter corresponding to the slug for the first object, an association between objects whose slugs correspond to the bin if the slugs for those objects match. Identifying a match is preferably done by using a multimap. This allows for greater computational throughput and acceptable memory consumption without a reduction in comparison accuracy and for dataset sizes that were previously impractical or impossible at acceptable levels of computational throughput.

Description

Computer-Implemented Systems And Methods For Comparing And Associating Objects
Field
This relates to comparing and associating objects and in particular although not exclusively to object comparators that compare objects in a way that allows for greater computational throughput and acceptable memory consumption without a reduction in comparison accuracy and for dataset sizes that were previously impractical or impossible at acceptable levels of computational throughput.
Background
Numerous organizations, including industry and government entities, recognize that important conclusions can be drawn if massive data sets can be analyzed to identify patterns of behaviour that suggest dangers to public safety or evidence illegality. These analyses often involve matching data associated with a person or thing of interest with other data associated with the same person or thing to determine that the same person or thing has been involved in multiple acts that raise safety or criminal concerns.
Yet, the quality of the analytical result arising from use of sophisticated analytical tools can be limited by the quality of data the tool utilizes. For certain types of analyses, an acceptable error rate must be literally or nearly zero for an analytical conclusion drawn from the data to be sound. Achieving this zero or near-zero error rate for datasets comprising tens or hundreds of millions of records can be problematic. Present data comparison tools are not well suited to solve these issues.
The issues discussed above are particularly acute for analyses involving data related to identifying persons or things for inquiries relating to public safety. For example, analytical tools for identifying potential safety threats generally do not have an acceptable error rate greater than zero because the cost of mistakenly identifying the presence of a safety threat (i.e., a "false positive") or allowing a safety threat to go undetected (i.e., a "false negative") is unacceptably high. Therefore, tools supporting public safety must correctly relate data associated with persons or things of interest with other data related to the same person or thing.
Some tools exist for accurately comparing data, but they are computationally impractical to use with datasets containing millions of records. For example, one solution to determining whether two particular objects are associated with the same person or thing of interest is to compare each element of one object to a corresponding element in the second object. For example, for objects containing M elements, a first element in the first object may be compared to a corresponding first element in the second object, and corresponding comparisons may be made for each of the remaining M-1 elements common to the first and second objects. If the elements within each object are collectively adequate to uniquely identify the represented person or thing with certainty, and corresponding elements within the first and second objects match, a conclusion may reasonably be drawn that the objects reflect the same person or thing.
As an alternative, each object could be converted (serialized) into a single string reflecting the contents of each element to be compared. Thereafter, a string generated from one object could be compared to a string generated from another object as a form of object comparison.
For certain datasets, the above approaches may consume little memory or system resources, because the objects or their serialized strings can be stored on disk rather than in main memory. However, the above approaches may quickly become impractical with large or non-trivial datasets. As the number of objects to compare increases, the number of comparisons, and thus the processing time of the comparisons, increases quadratically; i.e., proportional to n^2/2, where n represents the number of objects to be compared. Thus, a comparison of 500 objects using a serialized approach, whose processing time may be approximated as the time to perform 125,000 string comparisons, may be computationally tractable. However, a comparison of 100 million (100M) records using that approach, whose processing time may be approximated as the time to perform 5 quadrillion (5e15) string comparisons, may be computationally intractable. Additionally, reading strings from disk rather than reading them from memory may add additional processing time.
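The comparison counts quoted above can be checked with a few lines of Python. This is only an illustrative sketch; the function names are hypothetical and do not come from the disclosure. It computes both the n^2/2 approximation used in the text and the exact number of unordered pairs, n(n-1)/2.

```python
def exact_pair_count(n: int) -> int:
    """Exact number of distinct unordered pairs among n objects: n(n-1)/2."""
    return n * (n - 1) // 2


def approx_pair_count(n: int) -> float:
    """The n^2/2 approximation used in the text."""
    return n * n / 2


# Corpus sizes taken from the text: 500 objects and 100 million objects.
for n in (500, 100_000_000):
    print(f"n={n}: exact={exact_pair_count(n)}, approx={approx_pair_count(n):.3g}")
```

For 500 objects the approximation gives 125,000 comparisons (124,750 exactly); for 100 million objects it gives 5e15.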
Another solution for identifying matching objects within a corpus of objects is to store each object in a multimap. This multimap is an associative array that stores multiple values for each key. Importing the objects into the multimap leads to objects with the same element data being stored in a single entry of the multimap. Thus, use of a multimap associates identical objects.
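A multimap of this kind can be sketched in Python with a dictionary of lists. The record keys and identifiers below are made up for illustration, and the use of `collections.defaultdict` is an assumption about one convenient representation, not something the disclosure prescribes.

```python
from collections import defaultdict

# Hypothetical records: (serialized element data, object identifier).
records = [
    ("VIN0001|Honda|Accord|2003", "dmv-17"),
    ("VIN0001|Honda|Accord|2003", "auction-4"),
    ("VIN0002|Ford|Focus|2009", "dmv-52"),
]

# The multimap: each key maps to every value that was stored under it.
multimap = defaultdict(list)
for key, object_id in records:
    multimap[key].append(object_id)

# Objects sharing a key are associated with one another.
matches = {key: ids for key, ids in multimap.items() if len(ids) > 1}
print(matches)
```

The whole structure lives in main memory in this sketch, which is the limitation the following paragraph describes.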
One drawback to using a multimap for object comparisons is that the multimap is typically stored in main memory, due to algorithmic considerations related to key organization within the multimap, so an object comparator must have sufficient main memory to hold a multimap comprising the entire corpus in memory. Therefore, a multimap solution can be impractical for datasets at or above 100M objects. Similar drawbacks exist to each approach as applied to other object comparison problems, such as efficiently identifying unique objects within a corpus of objects and efficiently comparing a single object to all objects within a corpus of objects.
Neither solution is viable for datasets approaching or exceeding 100M objects. Yet, object datasets comprising 100M or more objects are not uncommon today. Therefore, the problems described above are quite real and a need exists for improved object comparators.
Brief Description Of The Drawings
Reference will now be made to the accompanying drawings showing example embodiments of the present application, and in which:
FIG. 1 illustrates a flowchart of an exemplary process for comparing a target object to at least some objects in a corpus, consistent with some embodiments of the present disclosure.
FIG. 2 illustrates a flowchart of an exemplary process for comparing all objects in a corpus to all other objects in the corpus, to determine matches within the corpus, consistent with some embodiments of the present disclosure.
FIG. 3 illustrates a flowchart of an exemplary process for comparing all objects in a corpus to all other objects in the corpus, to determine unique objects within the corpus, consistent with some embodiments of the present disclosure.
FIG. 4 illustrates an exemplary computing environment within which embodiments of the present disclosure can be implemented.
Detailed Description Of Exemplary Embodiments
Reference will now be made in detail to the embodiments, examples of which are illustrated in the accompanying drawings. Whenever possible, consistent reference numbers will be used throughout the drawings to refer to the same or like parts.
Embodiments of the present disclosure can avoid the shortcomings of traditional object comparators by providing computer-implemented systems and methods for comparing objects in a way that allows for greater computational throughput and acceptable memory consumption without a reduction in comparison accuracy and for dataset sizes that were previously impractical or impossible at acceptable levels of computational throughput.
Embodiments of the present disclosure address a class of computational problems related to object comparison. One member of this class involves efficient object comparison of a particular object to a corpus of objects. Another member of this class involves efficient comparison of each object in a corpus to all other objects in the corpus. An additional member of this class involves efficient identification of unique objects within a corpus of objects.
The following detailed description begins with a general overview of object comparison.
Some examples of objects to be compared or analyzed are provided. The description then explains an exemplary embodiment that addresses the first class of problem discussed above (i.e., efficiently comparing one object to all objects in a corpus). The description then expands the solution to the first class of problem to address the second class of problem discussed above (i.e., efficient comparison of each object in a corpus to all other objects in the corpus). The detailed description then discloses a solution to the third class of problem (i.e., efficient identification of unique objects within a corpus of objects). An introduction to objects and an overview of object comparison follows.
Several types of objects exist within the field of computer science. One type of object that is well known within the field of computer science is an object in the object-oriented sense. Wikipedia describes an object of this type as a set of elements (i.e., data structures) and methods, which are similar to functions. Without necessarily endorsing that rather simplistic description, embodiments implementing the object comparison solutions discussed herein are compatible with comparing objects of this type.
Another type of object within the field of computer science is a data structure that reflects the properties of a person or thing relevant to a particular task or data processing environment. In some embodiments, these properties are reflected by strings. In other embodiments, properties may be reflected by strings, integers, real numbers, times or dates, binary values, structures in the C programming sense, enumerated variables, and/or other forms of data. In some embodiments, properties within either type of object may be converted to strings prior to comparison. In other embodiments, some properties may be strings or may be converted to strings while other properties may not be strings and may not be converted to strings. The embodiments of the present disclosure may operate on string or non-string properties.
Moreover, the notion of a "data structure" is very flexible in this context. The term "data structure" can reflect any type of structured data, from information stored in a database (with table columns reflecting elements within an object or data structure and table rows reflecting instances of the object or data structure) to formatted text in a text file (such as data within an XML structure) to data stored within an executing computer program. Accordingly, because a data structure broadly encompasses the types of structured data described above, objects also broadly encompass these types of structured data. Moreover, the object comparison solutions discussed herein are also compatible with comparing objects of these types.
In some embodiments, effective object comparison involves considering which properties of the objects to be compared are relevant to performing the comparison because the entities (e.g., persons or things) reflected by those objects may have different relevant properties in different environments. For example, an object can store properties of an automobile that may be relevant to a state's motor vehicle department by storing the following information: vehicle identification number (VIN), year of manufacture, make, model, expiration date of the vehicle's registration, and a direct or indirect indication of the person that owns the vehicle.
For automobiles being sold on an auction website such as eBay, however, the relevant properties of an automobile may differ from those relevant to the state's motor vehicle department. For example, a data structure for storing properties of an automobile listed for sale on eBay may include: VIN, year, make, model, odometer reading, condition of the automobile, minimum auction bid, and a direct or indirect indication of the person listing the vehicle for sale. Thus, properties of an entity (e.g., a person or thing) relevant to one environment may differ from properties of the entity relevant to another environment. Accordingly, an object's properties considered during object comparison in one environment may differ from those considered during object comparison in a second environment.
In some embodiments, effective data comparison may also involve considering which properties tend to distinguish an entity (e.g., a person or thing) from other instances of the entity. For example, a VIN for an automobile should by design be unique to that automobile. However, occasional situations may arise where a VIN is not unique to a particular automobile. Such situations may arise from intentional errors or accidental errors. An example of an intentional error is attempting fraudulent registration of a stolen vehicle under an assumed VIN. An example of an accidental error occurs when a smog check worker incorrectly enters a VIN into a computer at a smog check station, which leads to a smog check record with an incorrect VIN subsequently being communicated to a state database. Data errors exist in real world data processing environments, so some embodiments of the present disclosure minimize or eliminate errors by identifying objects through a combination of several object properties rather than identifying objects through use of a single object property.
In some embodiments, one or more identifying properties of an object are extracted from the object and stored in a data structure. This data structure is referred to as a "slug"; it contains information that may be sufficient to uniquely identify an entity (e.g., a person or thing) with some degree of information redundancy to allow for detecting errors in the properties within the slug. In some embodiments, the slug comprises a concatenation of strings separated by a delimiter character. In some embodiments, the delimiter character is a NULL character while in other embodiments the delimiter character may be a character not otherwise present in the concatenated string. In some embodiments, the concatenated strings may be delimited by a delimiter string (e.g., "-- ") rather than a delimiter character. In embodiments employing a delimiter string, the delimiter string may be any string that is not otherwise present in the strings that were concatenated. In other embodiments, the slug comprises a data structure such as an object, array, structure, or associative array.
For example, in one embodiment, a slug for an automobile may contain properties reflecting a VIN, make, model, and year for the automobile. Inclusion of make, model, and year properties for the automobile within the slug provides a capability for detecting errors in the VIN property because the VIN property is not the only object property being compared. For slugs associated with two automobiles to match in the presence of an error in the VIN property of one automobile object, an automobile object with the same VIN property as the erroneous VIN must also have the same make, model, and year properties.
The odds of this coincidental match of multiple properties between two or more objects may be vanishingly low. Therefore, inclusion of some degree of information redundancy should avoid or at least substantially reduce erroneous object comparison matches relative to object comparisons only comparing a single property between objects notwithstanding that the single property was intended to uniquely identify its corresponding entity (e.g., person or thing).
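The automobile slug described above might be built as follows. This is a minimal sketch assuming that objects are Python dictionaries and that the slug is a NUL-delimited concatenation of the selected properties; the helper name `make_slug`, the field names, and the sample values are all hypothetical.

```python
def make_slug(obj: dict, fields: tuple, delimiter: str = "\x00") -> str:
    """Concatenate the selected properties, separated by a delimiter.

    The delimiter must not occur inside any property value, otherwise
    two different property lists could produce the same slug.
    """
    parts = [str(obj[name]) for name in fields]
    for part in parts:
        if delimiter in part:
            raise ValueError("delimiter occurs inside a property value")
    return delimiter.join(parts)


SLUG_FIELDS = ("vin", "make", "model", "year")

# Made-up records: the same car seen in two environments, plus a record
# whose VIN was mistyped so that it collides with the first car's VIN.
dmv_record = {"vin": "VIN0001", "make": "Honda", "model": "Accord",
              "year": 2003, "registration_expires": "2015-06-30"}
auction_listing = {"vin": "VIN0001", "make": "Honda", "model": "Accord",
                   "year": 2003, "odometer": 81000}
mistyped_record = {"vin": "VIN0001", "make": "Ford", "model": "Focus",
                   "year": 2009}

same_car = make_slug(dmv_record, SLUG_FIELDS) == make_slug(auction_listing, SLUG_FIELDS)
vin_error_caught = make_slug(dmv_record, SLUG_FIELDS) != make_slug(mistyped_record, SLUG_FIELDS)
print(same_car, vin_error_caught)
```

Properties outside the slug (registration date, odometer reading) do not affect the comparison, while the redundant make, model, and year keep the mistyped VIN from producing a match.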
Exemplary embodiments will now be described that solve the first problem discussed above, i.e., efficiently comparing a particular object (hereinafter a "target object") to all objects in a corpus. The disclosed embodiments utilize a Bloom filter to identify slugs associated with objects in the corpus that do not match the slug for the target object.
This quick recognition is performed by discarding slugs that are associated with a different bin in the Bloom filter than the bin associated with the slug for the target object.
Bloom filters have the property that two slugs falling into different bins within the Bloom filter are certain to have different properties and thus reflect different objects.
Therefore, if the slug for the target object does not fall into the same bin as the slug for a particular object in the corpus, the target object does not match the particular object in the corpus, and the particular object may thus be removed from future consideration in such embodiments.
FIG. 1 illustrates a flowchart of an exemplary process 100 for comparing a target object to at least some objects in a corpus, consistent with some embodiments of the present disclosure. In some embodiments, the target object to be compared to at least some objects in the corpus is a member of the corpus. In these embodiments, a comparison between the target object and all other objects in the corpus is performed. In other embodiments, the object to be compared to at least some objects in the corpus is not a member of the corpus. In these other embodiments, a comparison between the target object and all objects in the corpus is performed.
As illustrated, in step 102, a Bloom filter is sized and created with consideration for the error rate that will result for the corpus size that is being processed. For example, increasing the number of bins in a Bloom filter may tend to decrease the error rate for a specific corpus size while reducing the number of bins in a Bloom filter may tend to increase the error rate for a specific corpus size. Techniques for sizing a Bloom filter to achieve a target error rate for a specific corpus size are well known in the art, so these techniques are not discussed herein.
In step 104, a slug for the target object (i.e., the object against which all objects in the corpus will be compared) is generated. Considerations for selecting which properties of an object to include in a slug were discussed above. In step 106, a Bloom filter bin corresponding to the slug for the target object is determined. In some embodiments, a Bloom filter bin for a slug may be determined by inputting the slug to a Bloom filter and directing the Bloom filter to disclose the bin into which the slug was added.
In other embodiments, a Bloom filter bin for a slug may be determined by presenting the slug as an input to a software function associated with the Bloom filter without storing the slug in the Bloom filter. In additional embodiments, a bin for a slug may be determined by inputting the slug into a software function reflecting a bin selection algorithm for a Bloom filter in the absence of using an actual and/or complete Bloom filter and receiving the Bloom filter bin as an output of that software function. In other embodiments, other approaches to yielding a Bloom filter bin from a slug may be utilized. These approaches for identifying a Bloom filter bin for a slug, consistent with the embodiments discussed above, are collectively referred to in steps 106 and 110. The determined Bloom filter bin will be utilized to identify slug comparison matches, some of which may be "false positives", using the Bloom filter as discussed below.
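A stand-alone bin selection function of the kind described above, one that maps a slug to a bin without storing anything, could look like the following. The disclosure does not name a hash function; the use of SHA-256 reduced modulo the number of bins, and the name `bloom_bin`, are assumptions made purely for illustration.

```python
import hashlib


def bloom_bin(slug: str, num_bins: int) -> int:
    """Map a slug to a bin index in [0, num_bins) without storing the slug.

    Assumption: a single hash (SHA-256, truncated to 64 bits) selects
    exactly one bin per slug.
    """
    digest = hashlib.sha256(slug.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_bins


slug = "VIN0001\x00Honda\x00Accord\x002003"  # made-up slug
print(bloom_bin(slug, 1024))
```

Because the function is deterministic, identical slugs always land in the same bin, so two slugs that land in different bins cannot be identical.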
In step 108, a slug for each object in the corpus is generated. In step 110, a Bloom filter bin for each object in the corpus is determined. In some embodiments, a Bloom filter bin for an object may be determined by inputting the object's slug into the Bloom filter and directing the Bloom filter to disclose the bin into which the slug was added.
After completion of step 110, slugs corresponding to the bin identified in step 106 reflect matches with the slug for the target object. Some of these matches, however, may be false positive matches rather than true matches. Therefore, steps 112 and 114 filter out the false positive matches through use of a multimap.
In step 112, for each slug corresponding to an object in the corpus whose bin in the Bloom filter is the same bin as the slug for the target object, that slug and its corresponding object in the corpus are added to a multimap. When adding the slug and its corresponding object to the multimap, the slug represents the key to the multimap and the object in the corpus represents the value to the multimap. This multimap will be utilized to remove false positives from processing. In step 114, the process concludes by selecting the true positive matches identified in the multimap. These non-false positive matches can be retrieved from the multimap by reading data from the multimap with the slug for the target object as a key.
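Steps 102 through 114 can be sketched end to end as follows. This is one possible reading of process 100 under several assumptions that are not in the disclosure: objects are dictionaries, slugs are NUL-delimited strings, a single SHA-256-based function picks the bin, and the helper names (`make_slug`, `bloom_bin`, `find_matches`) are hypothetical.

```python
import hashlib
from collections import defaultdict


def make_slug(obj, fields, delimiter="\x00"):
    """Join the selected properties into one string (hypothetical helper)."""
    return delimiter.join(str(obj[name]) for name in fields)


def bloom_bin(slug, num_bins):
    """Assumed bin selection: SHA-256 reduced modulo the bin count."""
    digest = hashlib.sha256(slug.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_bins


def find_matches(target, corpus, fields, num_bins=1024):
    # Step 102 (sizing) is represented only by the num_bins parameter.
    target_slug = make_slug(target, fields)            # step 104
    target_bin = bloom_bin(target_slug, num_bins)      # step 106
    multimap = defaultdict(list)
    for obj in corpus:
        slug = make_slug(obj, fields)                  # step 108
        if bloom_bin(slug, num_bins) != target_bin:    # step 110
            continue  # different bin: certain non-match, discard
        multimap[slug].append(obj)                     # step 112
    # Step 114: reading with the target's slug as key drops false positives.
    return multimap.get(target_slug, [])


FIELDS = ("vin", "make", "model", "year")
target = {"vin": "VIN0001", "make": "Honda", "model": "Accord", "year": 2003}
corpus = [
    {"id": 1, "vin": "VIN0001", "make": "Honda", "model": "Accord", "year": 2003},
    {"id": 2, "vin": "VIN0001", "make": "Ford", "model": "Focus", "year": 2009},
    {"id": 3, "vin": "VIN0002", "make": "Honda", "model": "Accord", "year": 2003},
]
print([obj["id"] for obj in find_matches(target, corpus, FIELDS)])
```

Only candidates that share the target's bin ever enter the multimap, so its memory footprint depends on the bin's population rather than on the size of the corpus.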
In some embodiments, process 100 may be distributed across multiple processors. For example, a Bloom filter may exist on each of several processors and steps 102 through 114 can be executed on each of the several processors. The corpus of objects may be distributed among the various processors so that each object is processed by one processor, and no object is processed by more than one processor. In such embodiments, each of the multiple processors outputs a portion of the objects in the corpus that match the target object.
Exemplary embodiments will now be described that solve the second problem discussed above, i.e., efficiently comparing all objects to all objects in a corpus. These embodiments utilize a counting Bloom filter to quickly identify slugs associated with objects in the corpus that do not match the slug for any other object in the corpus. Counting Bloom filters are well known in the art, so their structure and construction are not discussed herein.
In particular, if a bin in the counting Bloom filter has a value of zero or one after slugs for all of the objects in the corpus have been input to the Bloom filter, no object whose slug is associated with that bin could match another slug, so these slugs are removed from further consideration. These slugs can be removed because those skilled in the art will recognize that Bloom filters can have false positives but they cannot have false negatives. Therefore, a counting Bloom filter bin whose count is less than two reflects an accurate determination that no match exists between slugs associated with that bin because any match would create a count of at least two. However, false positives may exist among objects whose slugs are associated with the same Bloom filter bin, so false positives may be removed through additional processing, as discussed below.
FIG. 2 illustrates a flowchart of an exemplary process 200 for comparing all objects in a corpus to all other objects in the corpus, to determine matches within the corpus, consistent with some embodiments of the present disclosure. As illustrated, in step 202, a counting Bloom filter is sized and created with consideration for the error rate that will result for the corpus size that is being processed. For example, increasing the number of bins in a counting Bloom filter may tend to decrease the error rate for a specific corpus size while reducing the number of bins in a counting Bloom filter may tend to increase the error rate for a specific corpus size. Techniques for sizing a counting Bloom filter to achieve a target error rate for a specific corpus size are well known in the art, so these techniques are not discussed herein.
In some embodiments, the counting Bloom filter may comprise an N-bit counter and these counters may be implemented as two-bit counters (i.e., N=2). In other embodiments, these counters may be one-bit counters or counters of more than two bits. In additional embodiments, these counters are saturation counters; i.e., these counters will count up to a maximum value and then not exceed that value.
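A saturation counter of this kind can be sketched in a few lines. The function name and the integer representation are illustrative assumptions; a real filter would typically pack the N-bit counters into a compact array.

```python
def saturating_increment(count: int, bits: int = 2) -> int:
    """Increment an N-bit counter, holding at its maximum value (2**bits - 1)."""
    max_value = (1 << bits) - 1
    return min(count + 1, max_value)


# Five increments of a two-bit counter: it rises to 3 and stays there.
count = 0
history = []
for _ in range(5):
    count = saturating_increment(count, bits=2)
    history.append(count)
print(history)  # [1, 2, 3, 3, 3]
```

A two-bit counter is enough to distinguish the three cases the processes rely on: an empty bin (0), a bin holding exactly one slug (1), and a bin holding two or more slugs (2 or 3).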
In step 204, a slug for each object in the corpus is generated. In step 206, each slug is input to the counting Bloom filter, which causes a counter in a bin corresponding to a slug to be incremented. After completion of step 206, bins whose counters have a value greater than one reflect one or more matching slugs. Some of these matches, however, may be false positive matches rather than true matches. Therefore, steps 208 and 210 filter out the false positive matches through use of a multimap.
In step 208, for slugs associated with bins in the counting Bloom filter whose counters have a value greater than 1, the slug and its associated object are added to a multimap.
When adding the slug and its corresponding object to the multimap, the slug represents the key to the multimap and the object in the corpus represents the value to the multimap. This multimap will be utilized to remove false positives from processing. In step 210, the process 200 concludes by outputting a value for any key in the multimap that has two or more values. The outputted values reflect objects whose slugs matched slugs of at least one other object in the corpus. Thus, the objects outputted identify objects whose selected properties, as reflected in an object's slug, unambiguously match at least one other object in the corpus.
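Steps 202 through 210 can be sketched as two passes over the corpus. As before, this is only an illustration under stated assumptions: dictionary objects, NUL-delimited slugs, a single SHA-256-based bin function, two-bit saturating counters held in a Python list, and hypothetical helper names.

```python
import hashlib
from collections import defaultdict


def make_slug(obj, fields, delimiter="\x00"):
    """Join the selected properties into one string (hypothetical helper)."""
    return delimiter.join(str(obj[name]) for name in fields)


def bloom_bin(slug, num_bins):
    """Assumed bin selection: SHA-256 reduced modulo the bin count."""
    digest = hashlib.sha256(slug.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_bins


def find_duplicates(corpus, fields, num_bins=1024, max_count=3):
    counters = [0] * num_bins                          # step 202
    for obj in corpus:
        slug = make_slug(obj, fields)                  # step 204
        b = bloom_bin(slug, num_bins)
        counters[b] = min(counters[b] + 1, max_count)  # step 206 (saturating)
    multimap = defaultdict(list)
    for obj in corpus:
        slug = make_slug(obj, fields)
        if counters[bloom_bin(slug, num_bins)] > 1:    # step 208
            multimap[slug].append(obj)
    # Step 210: keys with two or more values are true matches.
    return {slug: objs for slug, objs in multimap.items() if len(objs) >= 2}


FIELDS = ("vin", "make", "model", "year")
corpus = [
    {"id": 1, "vin": "VIN0001", "make": "Honda", "model": "Accord", "year": 2003},
    {"id": 2, "vin": "VIN0001", "make": "Honda", "model": "Accord", "year": 2003},
    {"id": 3, "vin": "VIN0002", "make": "Ford", "model": "Focus", "year": 2009},
]
duplicates = find_duplicates(corpus, FIELDS)
print([[obj["id"] for obj in objs] for objs in duplicates.values()])
```

The second pass recomputes each slug rather than keeping all slugs in memory, which keeps the memory cost close to the counter array plus the multimap of surviving candidates.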
In some embodiments, process 200 may be distributed across multiple processors. For example, a counting Bloom filter may exist on each of several processors and steps 202, 204, and 206 can be executed on each of the several processors. The corpus of objects may be distributed among the various processors so that each object is processed by one processor, and no object is processed by more than one processor. In such embodiments, prior to executing step 208, counters for each bin in the counting Bloom filter are summed together with counters for the same bin in counting Bloom filters on other processors. Thereafter, process 200 continues by executing steps 208 and 210 on a single processor.
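The summation of per-processor counters might be sketched as below. It assumes that every processor uses the same number of bins and the same bin selection function, that each processor's counters are available as a plain list, and that the merged counters saturate at the same maximum as the originals; the function name `merge_counters` is hypothetical.

```python
def merge_counters(per_processor, max_count=3):
    """Sum the counters for each bin across processors, with saturation."""
    lengths = {len(counters) for counters in per_processor}
    if len(lengths) != 1:
        raise ValueError("all processors must use the same number of bins")
    # zip(*...) groups the counters that belong to the same bin.
    return [min(sum(bin_counts), max_count) for bin_counts in zip(*per_processor)]


# Made-up counter arrays from two processors, four bins each.
processor_a = [0, 1, 2, 3]
processor_b = [1, 1, 0, 3]
print(merge_counters([processor_a, processor_b]))  # [1, 2, 2, 3]
```

A bin that holds one slug on each of two processors sums to two, so a match that spans processors is still detected after merging.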
Exemplary embodiments will now be described that solve the third problem discussed above, i.e., efficiently identifying unique objects in a corpus. These embodiments utilize a counting Bloom filter and a multimap to quickly identify unique objects. Upon inputting slugs for all objects in the corpus into the counting Bloom filter, any bin with a count value of one reflects a unique object because Bloom filters do not generate false negatives. Additionally, to the extent that bins have count values of two or more, those count values could reflect false positives. Therefore, a multimap allows a determination of whether the matches reflected in the count values were false or true positives.
FIG. 3 illustrates a flowchart of an exemplary process 300 for comparing all objects in a corpus to all other objects in the corpus, to determine unique objects within the corpus, consistent with some embodiments of the present disclosure. As illustrated, in step 302, a counting Bloom filter is sized and created with consideration for the error rate that will result for the corpus size that is being processed. For example, increasing the number of bins in a counting Bloom filter may tend to decrease the error rate for a specific corpus size while reducing the number of bins in a counting Bloom filter may tend to increase the error rate for a specific corpus size. Techniques for sizing a counting Bloom filter to achieve a target error rate for a specific corpus size are well known, so these techniques are not discussed herein.
In some embodiments, the counting Bloom filter may comprise an N-bit counter and these counters may be implemented as two-bit counters (i.e., N=2). In other embodiments, these counters may be one-bit counters or counters of more than two bits. In additional embodiments, these counters are saturation counters; i.e., these counters will count up to a maximum value and then not exceed that value.
In step 304, a slug for each object in the corpus is generated. In step 306, each slug is input to the counting Bloom filter, which causes a counter in a bin corresponding to the slug to be incremented. As previously discussed, after slugs for all objects in the corpus have been input to the counting Bloom filter, any bin with a count value of one reflects a unique object within the corpus because the counting Bloom filter does not generate false negatives. Therefore, in step 308, for each slug whose counter in the counting Bloom filter is one, the slug's corresponding object is output as a unique object within the corpus.
After completion of step 308, bins whose counters have a value greater than one reflect one or more matching slugs; i.e., slugs that are not unique. Some of these matches, however, may be false positive matches rather than true matches due to the nature of Bloom filters, as discussed above. Therefore, steps 310 and 312 filter out the false positive matches through use of a multimap.
Steps 310 and 312 determine whether the counting Bloom filter is masking the existence of other unique objects because the Bloom filter allows for false positives. In step 310, for each slug whose associated bin has a counter value greater than one, the slug is input as a key to a multimap and the object corresponding to the slug is input as a value for that key. In step 312, the process terminates after outputting each value in the multimap for keys that have only one value. Unique objects within the corpus are reflected by the collection of objects output from step 308 and the collection of objects output by step 312 because the former reflects objects whose slugs were the only slug in a counting Bloom filter's bin and were therefore unique among slugs associated with objects in the corpus while the latter reflects slugs that were false positives within the counting Bloom filter but were disambiguated by the multimap.
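Steps 302 to 312 of process 300 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the disclosure: objects are modeled as dictionaries, each slug maps to a single bin through a SHA-256 hash, counters saturate at three (a two-bit counter), and the names `make_slug`, `bin_index`, and `find_unique_objects` are hypothetical.

```python
import hashlib
from collections import defaultdict

MAX_COUNT = 3  # saturation value of a two-bit counter (assumed)


def make_slug(obj: dict, properties: list[str], delimiter: str = "|") -> str:
    """Concatenate the selected property values into a slug (hypothetical helper)."""
    return delimiter.join(str(obj[name]) for name in properties)


def bin_index(slug: str, num_bins: int) -> int:
    """Map a slug to one bin using a stable hash (SHA-256 is an assumption)."""
    digest = hashlib.sha256(slug.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_bins


def find_unique_objects(corpus: list[dict], properties: list[str], num_bins: int) -> list[dict]:
    # Step 302: create the counting Bloom filter (one saturating counter per bin).
    counters = [0] * num_bins

    # Steps 304 and 306: generate a slug per object and increment its bin.
    slugs = [make_slug(obj, properties) for obj in corpus]
    for slug in slugs:
        i = bin_index(slug, num_bins)
        counters[i] = min(counters[i] + 1, MAX_COUNT)

    unique = []
    multimap = defaultdict(list)
    for obj, slug in zip(corpus, slugs):
        if counters[bin_index(slug, num_bins)] == 1:
            # Step 308: the slug is alone in its bin, so the object is unique.
            unique.append(obj)
        else:
            # Step 310: the bin received several slugs; record slug -> object.
            multimap[slug].append(obj)

    # Step 312: keys with exactly one value were false positives in the filter.
    for objects in multimap.values():
        if len(objects) == 1:
            unique.append(objects[0])
    return unique
```

In this sketch the set of objects returned does not depend on the number of bins; fewer bins only shift more of the work from step 308 to the multimap of steps 310 and 312.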
In some embodiments, process 300 may be distributed across multiple processors. For example, a counting Bloom filter may exist on each of several processors and steps 302, 304, and 306 can be executed on each of several processors. The corpus of objects may be distributed among the various processors so that all objects are processed by one processor, but no object is processed by more than one processor. In such embodiments, prior to executing step 308, counters for each bin in the counting Bloom filter are summed together with counters for the same bin in counting Bloom filters on other processors. Thereafter, process 300 continues by executing steps 308, 310, and 312 on a single processor.
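The summing of counters across processors can be sketched as follows. The sketch assumes that every processor uses the same number of bins and the same hash function, so that a given bin index means the same thing everywhere, and that counters saturate at three; the function names are hypothetical and the per-processor work is simulated by ordinary function calls.

```python
import hashlib

MAX_COUNT = 3  # saturation value of a two-bit counter (assumed)


def bin_index(slug: str, num_bins: int) -> int:
    """Map a slug to one bin using a stable hash shared by all processors."""
    digest = hashlib.sha256(slug.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_bins


def build_counters(slugs: list[str], num_bins: int) -> list[int]:
    """Steps 302 to 306 on one processor: count the slugs of one partition."""
    counters = [0] * num_bins
    for slug in slugs:
        i = bin_index(slug, num_bins)
        counters[i] = min(counters[i] + 1, MAX_COUNT)
    return counters


def merge_counters(per_processor: list[list[int]]) -> list[int]:
    """Sum counters bin by bin across processors, keeping the saturation limit."""
    return [min(sum(column), MAX_COUNT) for column in zip(*per_processor)]
```

Because saturation only caps each count, merging the per-partition counters in this way gives the same result as feeding the whole corpus into a single filter.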
FIG. 4 illustrates an exemplary computing environment within which the embodiments of the present disclosure can be implemented.
Computer system 400 includes a bus 402 or other communication mechanism for communicating information, and a hardware processor 404 coupled with bus 402 for processing information. In some embodiments, hardware processor 404 can be, for example, a general-purpose microprocessor or it can be a reduced instruction set microprocessor.
Computer system 400 also includes a main memory 406, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 402 for storing information and instructions to be executed by processor 404. Main memory 406 also can be used for storing temporary variables or other intermediate information during execution of instructions by processor 404. Such instructions, when stored in non-transitory storage media accessible to processor 404, render computer system 400 into a special-purpose machine that is customized to perform the operations specified in the instructions.
In some embodiments, computer system 400 further includes a read only memory (ROM) 408 or other static storage device coupled to bus 402 for storing static information and instructions for processor 404. A storage device 410, such as a magnetic disk or optical disk, is provided and coupled to bus 402 for storing information and instructions.
Computer system 400 can be coupled via bus 402 to a display 412, such as a cathode ray tube (CRT) or LCD panel, for displaying information to a computer user. An input device 414, including alphanumeric and other keys, is coupled to bus 402 for communicating information and command selections to processor 404. Another type of user input device is cursor control 416, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 404 and for controlling cursor movement on display 412. The input device typically has degrees of freedom in two axes, a first axis (for example, x) and a second axis (for example, y), that allows the device to specify positions in a plane.
Computer system 400 can implement the processes and techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 400 to be a special-purpose machine. In some embodiments, the processes and techniques herein are performed by computer system 400 in response to processor 404 executing one or more sequences of one or more instructions contained in main memory 406. Such instructions can be read into main memory 406 from another storage medium, such as storage device 410. Execution of the sequences of instructions contained in main memory 406 causes processor 404 to perform the process steps described herein. In other embodiments, hard-wired circuitry can be used in place of or in combination with software instructions.
The term "storage media" as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operate in a specific manner. Such storage media can comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 410.
Volatile media includes dynamic memory, such as main memory 406. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, and EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge.
Storage media is distinct from but can be used in conjunction with transmission media.
Transmission media participates in transferring information between storage media.
For example, transmission media includes coaxial cables, copper wire and fibre optics, including the wires that comprise bus 402. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
Various forms of media can be involved in carrying one or more sequences of one or more instructions to processor 404 for execution. For example, the instructions can initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 400 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 402.
Bus 402 carries the data to main memory 406, from which processor 404 retrieves and executes the instructions. The instructions received by main memory 406 can optionally be stored on storage device 410 either before or after execution by processor 404.
Computer system 400 also includes a communication interface 418 coupled to bus 402.
Communication interface 418 provides a two-way data communication coupling to a network link 420 that is connected to a local network 422. For example, communication interface 418 can be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 418 can be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links can also be implemented. In any such implementation, communication interface 418 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Network link 420 typically provides data communication through one or more networks to other data devices. For example, network link 420 can provide a connection through local network 422 to a host computer 424 or to data equipment operated by an Internet Service Provider (ISP) 426. ISP 426 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 428. Local network 422 and Internet 428 both use electrical, electromagnetic or optical signals that carry digital data streams.
The signals through the various networks and the signals on network link 420 and through communication interface 418, which carry the digital data to and from computer system 400, are example forms of transmission media.
Computer system 400 can send messages and receive data, including program code, through the network(s), network link 420 and communication interface 418. In the Internet example, a server 430 might transmit a requested code for an application program through Internet 428, ISP 426, local network 422 and communication interface 418. The received code can be executed by processor 404 as it is received, and/or stored in storage device 410, or other non-volatile storage for later execution.

Claims (21)

  1. A method for associating a first object with one or more objects within a plurality of objects, each object comprising a first plurality of properties, each property comprising data reflecting a characteristic of an entity represented by the object, the associated objects comprising matching data in corresponding properties for a second plurality of properties, the method comprising the following operations performed by one or more processors: executing, for each object within the plurality of objects and for the first object, the following: creating a slug for the object, the slug comprising the second plurality of properties from the object; and inputting the slug for the object into a Bloom filter; and creating for a bin within the Bloom filter corresponding to the slug for the first object, an association between objects whose slugs correspond to the bin if the slugs for those objects match.
  2. The method of claim 1, further comprising: sizing the Bloom filter for a predetermined error rate and number of objects within the plurality of objects.
  3. The method of claim 1 or claim 2, further comprising: reading the plurality of objects from at least one database.
  4. The method of claim 3, wherein the step of determining if the slugs within the bin corresponding to the slug for the first object match the slug for the first object comprises the following operations performed by one or more processors: inputting for each slug in the bin corresponding to the slug for the first object, the each slug and its corresponding object into a multimap, wherein the slug for the each object is a key to the multimap and its corresponding object is a value to the multimap; and associating objects in the multimap whose key matches the slug for the first object.
  5. The method of claim 4, wherein the slug comprises a concatenation of two or more strings separated by a delimiter between each concatenated string.
  6. The method of claim 5, wherein the delimiter comprises a character that is not otherwise present in the strings that were concatenated.
  7. The method of claim 5 or claim 6, wherein the delimiter comprises a sequence of two or more characters and the sequence of two or more characters is not present in any of the two or more strings that were concatenated.
  8. The method of any of claims 5 to 7, wherein the number of properties in the first plurality of properties equals the number of properties in the second plurality of properties.
  9. The method of any of claims 5 to 8, wherein the number of properties in the first plurality of properties is greater than the number of properties in the second plurality of properties.
  10. A method for associating objects within one or more groups of objects within a plurality of objects, each object comprising a first plurality of properties, each property comprising data reflecting a characteristic of an entity represented by the object, the associated objects within a group of objects comprising matching data in corresponding properties for a second plurality of properties, the method comprising the following operations performed by one or more processors: executing, for each object within the plurality of objects, the following: creating a slug for the object, the slug comprising the second plurality of properties from the object; and inputting the slug for the object into a counting Bloom filter; inputting for each created slug, the slug and its corresponding object into a multimap, if a bin within the counting Bloom filter corresponding to the slug has a count value greater than 1, wherein the slug is a key to the multimap and the object is a value to the multimap; and associating the objects stored as values for each multimap key with two or more corresponding values.
  11. The method of claim 10, further comprising: sizing the counting Bloom filter for a predetermined error rate and number of objects within the plurality of objects.
  12. The method of claim 11, further comprising: reading the plurality of objects from at least one database.
  13. The method of claim 12, wherein each entry in the counting Bloom filter comprises a 2-bit counter.
  14. The method of claim 13, wherein each 2-bit counter is a saturation counter.
  15. The method of any of claims 10 to 14, wherein the slug comprises a concatenation of two or more strings separated by a delimiter between each concatenated string.
  16. The method of claim 15, wherein the delimiter comprises a character that is not otherwise present in the strings that were concatenated.
  17. The method of claim 15 or claim 16, wherein the delimiter comprises a sequence of two or more characters and the sequence of two or more characters is not present in any of the two or more strings that were concatenated.
  18. The method of any of claims 15 to 17, wherein the number of properties in the first plurality of properties equals the number of properties in the second plurality of properties.
  19. The method of any of claims 15 to 18, wherein the number of properties in the first plurality of properties is greater than the number of properties in the second plurality of properties.
  20. A computer program comprising machine readable instructions that when executed by computing apparatus cause it to perform the method of any preceding claim.
  21. A system for associating a first object with one or more objects within a plurality of objects, the system being configured to perform the method of any of claims 1 to 20.
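Purely as an illustration of the grouping recited in claim 10 and the delimited slugs recited in claims 15 to 17, the following sketch joins property strings with a delimiter that is checked to be absent from every string, counts slugs per bin, and groups objects through a multimap. It is not the claimed implementation: the function names, the use of dictionaries as objects, the SHA-256 hash, and the plain integer counts are all assumptions. The delimiter matters because, without one, the string pairs ("ab", "c") and ("a", "bc") would both concatenate to "abc".

```python
import hashlib
from collections import defaultdict


def make_slug(values: list[str], delimiter: str = "|") -> str:
    """Join property strings with a delimiter absent from every string."""
    for value in values:
        if delimiter in value:
            raise ValueError(f"delimiter {delimiter!r} occurs in {value!r}")
    return delimiter.join(values)


def bin_index(slug: str, num_bins: int) -> int:
    """Map a slug to one bin using a stable hash (SHA-256 is an assumption)."""
    digest = hashlib.sha256(slug.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_bins


def associate_objects(corpus: list[dict], properties: list[str], num_bins: int) -> list[list[dict]]:
    # Create a slug per object and count slugs per bin.
    slugs = [make_slug([str(obj[p]) for p in properties]) for obj in corpus]
    counters = [0] * num_bins
    for slug in slugs:
        counters[bin_index(slug, num_bins)] += 1

    # Only slugs whose bin count exceeds one enter the multimap (slug -> objects).
    multimap = defaultdict(list)
    for obj, slug in zip(corpus, slugs):
        if counters[bin_index(slug, num_bins)] > 1:
            multimap[slug].append(obj)

    # Keys with two or more values identify groups of associated objects.
    return [group for group in multimap.values() if len(group) >= 2]
```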
GB1404486.1A 2013-03-15 2014-03-13 Computer-implemented systems and methods for comparing and associating objects Withdrawn GB2513720A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361801297P 2013-03-15 2013-03-15
US14/099,661 US8924388B2 (en) 2013-03-15 2013-12-06 Computer-implemented systems and methods for comparing and associating objects

Publications (2)

Publication Number Publication Date
GB201404486D0 GB201404486D0 (en) 2014-04-30
GB2513720A GB2513720A (en) 2014-11-05

Family

ID=50634706

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1404486.1A Withdrawn GB2513720A (en) 2013-03-15 2014-03-13 Computer-implemented systems and methods for comparing and associating objects

Country Status (2)

Country Link
DE (1) DE102014204830A1 (en)
GB (1) GB2513720A (en)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None *

Also Published As

Publication number Publication date
GB201404486D0 (en) 2014-04-30
DE102014204830A1 (en) 2014-09-18

Similar Documents

Publication Title
US10152531B2 (en) Computer-implemented systems and methods for comparing and associating objects
GB2513720A (en) Computer-implemented systems and methods for comparing and associating objects
GB2513721A (en) Computer-implemented systems and methods for comparing and associating objects
US9569528B2 (en) Detection of confidential information
CA2792070C (en) System and method for matching of database records based on similarities to search queries
US11201850B2 (en) Domain name processing systems and methods
US20140122294A1 (en) Determining a characteristic group
US8352460B2 (en) Multiple candidate selection in an entity resolution system
CN108009435B (en) Data desensitization method, device and storage medium
US11609897B2 (en) Methods and systems for improved search for data loss prevention
US11182375B2 (en) Metadata validation tool
US11308130B1 (en) Constructing ground truth when classifying data
US10187495B2 (en) Identifying problematic messages
US11449499B1 (en) System and method for retrieving data
CN111177362A (en) Information processing method, device, server and medium
US11971891B1 (en) Accessing siloed data across disparate locations via a unified metadata graph systems and methods
EP3786825B1 (en) Natural language processing systems and methods for automatic reduction of false positives in domain discovery
WO2023001708A1 (en) A method and a system for checking ownership and integrity of an ai model using distributed ledger technology (dlt)

Legal Events

Code Title
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)