US20190243840A1 - Identification and compiling of information relating to an entity - Google Patents

Identification and compiling of information relating to an entity

Info

Publication number
US20190243840A1
Authority
US
United States
Prior art keywords
records
entity
record
search
interface
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US16/389,300
Other versions
US11475031B2 (en)
Inventor
Mitchell Beard
Allen Chang
Chris Hammett
Jeremy LIU
Matthew Elkherj
Ryan Beiermeister
Ryan Smith
Tatyana Gordeeva
William Ayers
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Palantir Technologies Inc
Original Assignee
Palantir Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Palantir Technologies Inc
Priority to US16/389,300 (patent US11475031B2)
Publication of US20190243840A1
Assigned to Palantir Technologies Inc.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: Gordeeva, Tatyana, BEIERMEISTER, RYAN, SMITH, RYAN, AYERS, WILLIAM, CHANG, ALLEN, LIU, JEREMY, ELKHERJ, MATTHEW, BEARD, MITCHELL, Hammett, Chris
Assigned to ROYAL BANK OF CANADA, AS ADMINISTRATIVE AGENT: SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: Palantir Technologies Inc.
Assigned to MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT: SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: Palantir Technologies Inc.
Assigned to Palantir Technologies Inc.: RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: ROYAL BANK OF CANADA
Assigned to MORGAN STANLEY SENIOR FUNDING, INC.: SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: Palantir Technologies Inc.
Assigned to Palantir Technologies Inc.: CORRECTIVE ASSIGNMENT TO CORRECT THE ERRONEOUSLY LISTED PATENT BY REMOVING APPLICATION NO. 16/832267 FROM THE RELEASE OF SECURITY INTEREST PREVIOUSLY RECORDED ON REEL 052856 FRAME 0382. ASSIGNOR(S) HEREBY CONFIRMS THE RELEASE OF SECURITY INTEREST. Assignors: ROYAL BANK OF CANADA
Assigned to WELLS FARGO BANK, N.A.: SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: Palantir Technologies Inc.
Assigned to WELLS FARGO BANK, N.A.: ASSIGNMENT OF INTELLECTUAL PROPERTY SECURITY AGREEMENTS. Assignors: MORGAN STANLEY SENIOR FUNDING, INC.
Publication of US11475031B2
Application granted
Legal status: Active
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24573 Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling

Definitions

  • This disclosure relates to approaches for identifying and compiling information relating to an entity for investigative analysis.
  • Collection of all available digital records of an entity is useful for investigation, such as by the police department or a potential employer as part of a background check.
  • Each record may not be associated with complete or correct identifying information for the entity, and different databases may have entirely different structures or ontologies, making the collection of such records challenging.
  • Various embodiments of the present disclosure can include systems, methods, and non-transitory computer readable media configured to identify and compile information relating to an entity for investigative analysis.
  • the systems, methods, and non-transitory computer readable media are configured to implement a method that entails searching, in one or more data sources, with a plurality of known characteristics of an entity to obtain a first plurality of records, identifying, from the first plurality of records, a subset of records that match the known characteristics with a substantial confidence, compiling the subset of records to form a unified record representing the entity, and conducting a second search with information from the unified record to obtain a second plurality of search results.
  • the method further comprises presenting, on an interface, at least part of the records from the first plurality and the second plurality, wherein the interface is configured to allow a user to annotate the records.
  • the annotation comprises confirmation that a record is associated with the entity.
  • the method further comprises storing the annotation in a library in a non-transitory medium.
  • the method further comprises ranking the records before presenting the records on the interface.
  • the records of the subset are those that have a perfect match to the known characteristics.
  • the method further comprises generating variations of the known characteristics as additional queries for the search.
  • FIG. 1 illustrates a procedure for obtaining, compiling and presenting information relating to an entity for user analysis.
  • FIG. 2 illustrates a flowchart of an example method for obtaining, compiling and presenting information relating to an entity for user analysis.
  • FIG. 3 is a block diagram that illustrates a computer system upon which any of the embodiments described herein may be implemented.
  • Information relating to entities is scattered in different databases. Different records of an entity, such as financial transactions, are often stored individually rather than collectively, which makes the retrieval, visualization and analysis difficult for end users. Moreover, the entities in each record may be identified with different identifications or characteristics of the entity. This further presents a challenge for identifying all relevant information for effective investigation of the entity. Also, redundant or duplicative information about the entity can present challenges for data management or even skew the analysis. A solution is needed for identifying and compiling all available information relating to the entity and enabling an investigator to conduct investigation with the information.
  • a method entails collecting basic information (e.g., name, phone number, date of birth, social security number, email address and address) of an entity and generating one or more search queries.
  • the search queries are used to search against a plurality of data sources for exact as well as approximate matches.
  • the different data sources may be de-centralized, or federated where no master data management systems or defined standards are employed to manage the data sources. All of the matches are collected, and those that most likely relate to the entity (e.g., having perfect matches on name and social security number) are combined into a single record. Optionally, identical records can be merged to remove redundancy.
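  • As a minimal sketch of the optional merging step, assuming each record is a flat dictionary of hashable attribute values (the function name `merge_identical` and the record layout are hypothetical and not taken from the disclosure), identical records could be collapsed like this:

```python
def merge_identical(records):
    """Drop records whose attribute/value pairs duplicate an earlier record."""
    seen = set()
    unique = []
    for record in records:
        # A frozenset of items is a hashable fingerprint that ignores key order.
        key = frozenset(record.items())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

  • The first occurrence of each record is kept, so the order in which matches were collected is preserved.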
  • additional searches can be formulated based on the initial search results.
  • the search query can include the basic information of the entity as used in the previous step and can also include relevant information returned from the last search. All of the search results can be combined and presented, on a user interface, to an investigator. The search results can be ranked based on significance or relevance, facilitating analysis by the investigator. The interface can also enable the investigator to annotate the search results, and delete certain results as needed. Any annotation or change that the investigator makes can be optionally saved to a library, which can be shared with other investigators or archived for future use.
  • all or part of the search results can be added back to a data repository serving to enrich the knowledge of the entity.
  • the addition and accumulation of such added information can improve further searches of the entity.
  • an alert can be set up by a user such that a search can be carried out on a predetermined schedule and the search results can be presented to the user.
  • only new results are returned to the user.
  • the searches are automatically updated to incorporate additional information relating to the entity after such information becomes available from the searches.
  • the information identified and compiled as described represents a comprehensive collection of information relating to the entity and all the records of the final results represent potential connections between the entity and activities which may be worth further investigation.
  • the present technology therefore, provides a fast, automated, convenient, and comprehensive method to compile information from different data sources relating to an entity, and to present to an investigator potential connections between data records for investigation.
  • As used herein, "entity" refers to any real-world object that has attributes useful for identifying the object.
  • An entity can be a person or an organization, and can also be an account, a place, or an event. Attributes for the entity include, for example, names, identification number, characteristics and address, without limitation.
  • A "database" may refer to any data structure for storing and/or organizing data, including, but not limited to, relational databases (Oracle database, MySQL database, Cassandra database, etc.), spreadsheets, XML files, and text files, among others.
  • a database schema of a database system is its structure described in a formal language supported by the database management system.
  • FIG. 1 illustrates a process for identifying and compiling information relating to an entity that is implemented by a computing system.
  • the system receives one or more search queries relating to an entity 101 . If the entity is a person, the search queries include certain basic attributes of the person, such as name, social security number, date of birth, email address, address, or passport number, or their combinations.
  • the searches can be carried out on one or multiple platforms or multiple databases 110 .
  • Each database may have different schema, structure, or content of information. Nevertheless, each entry in a database that can be retrieved as relating to the entity can be commonly referred to as a “record”.
  • a record for the entity may be a historic record of an action, such as a financial transaction, associated with the entity, or simply some basic information about the entity, e.g., being listed as a registered voter.
  • the searches can be carried out asynchronously or synchronously, and in any manner suitable for the queries and the databases. In some embodiments, the search queries can be broadened to maximize the chance of returning potentially relevant records, for example by using variations of the attributes or wildcards.
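  • One way such broadened queries might be generated is sketched below for a personal name. The function `name_variations`, the particular variants chosen, and the `*` wildcard syntax are all assumptions for illustration; the disclosure does not specify them.

```python
def name_variations(full_name):
    """Return alternative spellings of a personal name to use as extra queries."""
    parts = full_name.split()
    if not parts:
        return []
    first, last = parts[0], parts[-1]
    variants = {
        full_name,               # the name as given
        f"{last}, {first}",      # "Last, First" ordering
        f"{first[0]}. {last}",   # first initial only
        f"{first[0]}* {last}",   # wildcard on the first name (assumed '*' syntax)
    }
    return sorted(variants)
```

  • Each returned string would be issued as an additional query alongside the original attribute value.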
  • the multiple databases are parts of a de-centralized database system where a systemically designed database is partitioned into multiple portions each of which can be hosted at a different location.
  • at least some of the databases are autonomous and thereby constitute part of a federated database system.
  • a federated database system maps multiple autonomous and disparate databases into a single federated database.
  • the disparate databases can be interconnected via a computer network and may be geographically decentralized. In some embodiments, there is no data integration between the disparate databases.
  • the multiple databases are independently hosted and managed, and may have different access control.
  • the multiple databases may be databases owned or managed by different banks, companies, or government agencies.
  • the present technology, in some embodiments, is configured to interface with each of the disparate or independent databases to identify information that may be related to the entity.
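  • A minimal sketch of interfacing with sources that use different schemas follows. It assumes each source is described by a hypothetical tuple of a name, a list of in-memory rows standing in for a real database, and a field map from common attribute names to that source's column names; none of these names come from the disclosure.

```python
def search_sources(sources, query):
    """Search every source and return hits normalized to common attribute names.

    sources: list of (name, rows, field_map) tuples, where field_map maps a
             common attribute name (e.g. "ssn") to the source's own column name.
    query:   dict of common attribute name -> value to look for.
    """
    hits = []
    for name, rows, field_map in sources:
        for row in rows:
            # Translate the source-specific row into the common vocabulary.
            normalized = {attr: row.get(col) for attr, col in field_map.items()}
            # Keep the row if any queried attribute matches exactly.
            if any(normalized.get(attr) == value for attr, value in query.items()):
                hits.append({"source": name, **normalized})
    return hits
```

  • Keeping the field map outside the search logic means a new source can be added without changing how queries are expressed.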
  • the searches will produce a number of records potentially relating to the entity, e.g., records r1 through r7, as shown in FIG. 1.
  • Upon retrieval of these records, which can be optionally saved in a computer medium or temporarily kept in the memory 120, the system can conduct certain basic and automated analysis of the records.
  • records r1, r2 and r3 have close-to-perfect matches to the attributes of the entity, e.g., with a total match of name, date of birth and social security number. Such records are considered to match the attributes of the entity used in the search query with a substantial confidence.
  • “Matching with a substantial confidence” as used herein means that the similarity between one or more attributes (e.g., name, address, social security number) of a record in a search result and one or more attributes used in the search query is statistically significant.
  • matching with substantial confidence requires a perfect match of at least one attribute.
  • matching with substantial confidence requires a perfect match of at least two attributes.
  • matching with substantial confidence requires a perfect match of at least one or two attributes and a partial match of another attribute with a mismatch of no more than one character (e.g., letter or digit), or no more than two characters.
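  • The matching rules above can be sketched as a single configurable check. This is an illustration under stated assumptions: the names `char_mismatches` and `matches_with_confidence` are hypothetical, and a character mismatch is counted position by position on equal-length strings, which is only one possible reading of "mismatch of no more than one character".

```python
def char_mismatches(a, b):
    """Count differing positions; return None when the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(1 for x, y in zip(a, b) if x != y)


def matches_with_confidence(record, query, min_exact=2, min_near=1, max_mismatch=1):
    """True if enough attributes match perfectly and enough match partially."""
    exact = 0
    near = 0
    for attr, wanted in query.items():
        found = record.get(attr)
        if found is None:
            continue
        if found == wanted:
            exact += 1
        else:
            diff = char_mismatches(str(found), str(wanted))
            if diff is not None and diff <= max_mismatch:
                near += 1
    return exact >= min_exact and near >= min_near
```

  • Setting `min_exact=1, min_near=0` or `min_exact=2, min_near=0` corresponds to the embodiments that require only one or two perfect matches.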
  • a “unified record” as used herein refers to a record generated by the system by compiling information from two or more records in the search results.
  • the compilation can collapse attributes that are identical in all of the two or more records. For instance, if every record has the same social security number, then only one social security number needs to be saved in the unified record. On the other hand, for attributes that have variations, (e.g., different addresses or different spelling of first name), the variants are all saved in the unified record.
  • the system conducts automatic compilation for records r1-r3.
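  • The compilation described above, where identical attribute values collapse to one entry and variants are all retained, can be sketched as follows. The function name `unify` and the choice of a dictionary of value lists as the unified record are assumptions made for illustration.

```python
def unify(records):
    """Compile records into one unified record.

    Each attribute maps to the list of distinct values seen for it, so an
    attribute that is identical in every record ends up with a single value.
    """
    unified = {}
    for record in records:
        for attr, value in record.items():
            values = unified.setdefault(attr, [])
            if value not in values:
                values.append(value)
    return unified
```

  • For two records that share a social security number but list different addresses, the unified record holds one social security number and both addresses.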
  • the compiled record likely includes additional information about the entity that was not apparent before the search. For instance, the search by social security number may return aliases or secondary addresses of the entity. The search by name may return fraudulent passport numbers used by the entity. Such additional information (see underlined words in 122) can then be used for a second round of searches. It is likely that the first round of search will return a large number of records, some important ones of which may be presented late in the list or buried in the list. The second round of search can likely bring such records back to a user's attention. Without limitation, a third, fourth or even more rounds of searches can be carried out to further enrich or refine the information relating to the entity.
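  • A sketch of selecting the additional information for a second round of searches is given below. It assumes, hypothetically, that the compiled record is a dictionary mapping each attribute to a list of values and that the original query maps each attribute to a single value; `second_round_terms` is an illustrative name.

```python
def second_round_terms(unified, original_query):
    """Return attribute values in the compiled record that were not queried before."""
    new_terms = {}
    for attr, values in unified.items():
        known = original_query.get(attr)
        # Anything other than the value already searched for is new information.
        extra = [v for v in values if v != known]
        if extra:
            new_terms[attr] = extra
    return new_terms
```

  • The same function could be applied again after each round to drive a third or fourth round of searches.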
  • the system now has collected information relating to the entity, with each record providing potentially relevant connection to activities of the entity, such as suspicious financial activities.
  • the entity is now represented by all the information compiled from the search results relating to the entity.
  • Each record represents a “potential connection” between the entity and the activity.
  • the system can present the records on a user interface (e.g., 131 ) to a user.
  • the system can rank the records before presenting them on the user interface. The ranking method may be dependent upon the type of the entity. For instance, for a system that is set up to detect suspicious activities, a record that includes an activity will be ranked higher than a record that only includes basic information about the entity.
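  • For the suspicious-activity example, a ranking along these lines is one possibility. The `activity` field and the function name `rank_records` are assumptions; the disclosure leaves the ranking method open.

```python
def rank_records(records):
    """Place records that describe an activity ahead of purely descriptive ones.

    Python's sorted() is stable, so records in the same group keep their
    original relative order.
    """
    return sorted(records, key=lambda record: 0 if record.get("activity") else 1)
```

  • A different entity type would simply supply a different sort key.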
  • the interface can optionally further enable the user to mark or annotate the records (as illustrated in FIG. 1 ).
  • the user can mark a record, say r6, as not relevant to the entity by checking the content of the record, thereby allowing the record to be deleted from the system.
  • the user may also mark a record, say r11, as highly relevant to the entity and as including important information for further investigation. Such marking also confirms the record as a potential connection to the entity.
  • this technology provides an efficient approach to build a comprehensive repository of information relating to an entity of interest, and establish potential connections between the entity and activities or transactions of value for further investigation.
  • the system can optionally record the annotation in a library for future use or to be shared with other users.
  • the annotations can also serve as feedback for the search and be used to improve the search algorithm.
  • the annotation can further trigger another round of search with information identified by the user as highly important or relevant.
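  • A minimal sketch of an annotation library follows, assuming records carry a hypothetical `id` field and the library is a plain dictionary that is serialized to JSON for storage or sharing. The function names are illustrative only.

```python
import json


def annotate(library, record_id, relevant, note=""):
    """Store a user's judgment about one record in the annotation library."""
    library[record_id] = {"relevant": relevant, "note": note}


def apply_annotations(records, library):
    """Drop records marked not relevant; flag those confirmed as connections."""
    kept = []
    for record in records:
        mark = library.get(record["id"])
        if mark is not None and not mark["relevant"]:
            continue  # the user marked this record as not relevant
        confirmed = mark is not None and mark["relevant"]
        kept.append({**record, "confirmed": confirmed})
    return kept


def serialize_library(library):
    """Return the library as a JSON string suitable for saving or sharing."""
    return json.dumps(library, sort_keys=True)
```

  • Records the user has not annotated are kept but left unconfirmed, so they remain available for later review.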
  • Search alerts can be generated automatically or upon user request, in some embodiments.
  • upon completion of a search for an entity, the user can request to save the search as an alert.
  • the search will be automated by the system at a default schedule (e.g., daily or weekly) or at a schedule set by the user. If the scheduled search returns information that has a timestamp newer than the previous search time, then an alert is sent (e.g., by email) to the user with the new information. Alternatively, in another example, the new search result is compared to the previous one and any new information is included in the alert.
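  • The two ways of deciding what goes into an alert can be sketched as follows, assuming each result carries hypothetical `id` and `timestamp` fields. Scheduling and email delivery are outside the scope of this sketch.

```python
from datetime import datetime


def new_by_timestamp(results, last_search_time):
    """Results whose timestamp is newer than the previous search time."""
    return [r for r in results if r["timestamp"] > last_search_time]


def new_by_comparison(results, previous_results):
    """Results that did not appear in the previous search result."""
    previous_ids = {r["id"] for r in previous_results}
    return [r for r in results if r["id"] not in previous_ids]


# Example: only the record dated after the last run would be alerted on.
last_run = datetime(2019, 4, 1)
results = [
    {"id": "r1", "timestamp": datetime(2019, 3, 15)},
    {"id": "r8", "timestamp": datetime(2019, 4, 10)},
]
alert_items = new_by_timestamp(results, last_run)
```

  • The comparison variant also catches records that are new to the result set but carry an old timestamp, which the timestamp variant would miss.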
  • the search can be automatically updated, after each search, to include newly discovered information relating to the entity, such as information with a high confidence and/or relevance level.
  • the update requires confirmation or optimization by the user.
  • a search can be requested based on a complex subject.
  • A "complex subject" refers to a collection of different types of entities, such as a case report, a transaction record, or a security log.
  • the transaction record may include identifying information of multiple persons (e.g., name and SSN), multiple accounts (e.g., account type and number), and locations of transactions (e.g., address, zip code, and branch name). Each of these entities can be subject to a search.
  • when a user enters such a complex subject for the search, the system is configured to identify and extract some or all of the entities included in the complex subject and conduct a search for each of the entities. Upon completion of all the searches, the system can compile the search results and present them to the user, optionally in a single feed. In some embodiments, the system can use information from the complex subject and/or the search results to understand the relationship between the entities and thereby compile and/or present the search results taking advantage of the knowledge of such relationship.
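  • A sketch of searching a complex subject is shown below. It assumes the entities have already been extracted into a hypothetical dictionary keyed by entity type and that `search` is any callable returning records for one entity; how extraction itself is done is not covered here.

```python
def search_complex_subject(subject, search):
    """Search each entity in a complex subject and merge results into one feed.

    subject: dict of entity type -> list of entities, e.g.
             {"person": [...], "account": [...], "location": [...]}
    search:  callable taking one entity and returning a list of records.
    """
    feed = []
    for entity_type, entities in subject.items():
        for entity in entities:
            for record in search(entity):
                # Tag each record with the entity that produced it so the
                # relationship to the complex subject is not lost in the feed.
                feed.append({"entity_type": entity_type, "entity": entity, "record": record})
    return feed
```

  • Tagging each feed item with its originating entity is one simple way to retain the relationship information mentioned above.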
  • FIG. 2 illustrates a flowchart of an example method 200 for identifying and compiling information relating to an entity for investigative analysis, according to various embodiments of the present disclosure.
  • the method 200 may be implemented in various environments including, for example, the system of FIG. 3 .
  • the operations of method 200 presented below are intended to be illustrative. Depending on the implementation, the example method 200 may include additional, fewer, or alternative steps performed in various orders or in parallel.
  • the example method 200 may be implemented in various computing systems or devices including one or more processors.
  • a computer system receives a name, an identification, or another basic characteristic of an entity as keywords for a search for information relating to the entity.
  • the system generates one or more search queries, optionally with variations of the keywords, and then at block 205, the system conducts searches in one or more data sources with the search queries.
  • some of the search results would have a near-perfect match to the basic information of the entity. Such matches are identified and compiled to form a compiled record representing the entity (block 207).
  • additional information from the search results is selected to be used for a second round of searches, followed by the actual searches (block 211 ). With the second round of searches, all the search results can be presented to a user for further investigation and analysis.
  • the search results are ranked to facilitate the user analysis (block 213 ).
  • the system can update the search results with respect to potential connection to the entity (block 215 ).
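  • The overall flow of the example method can be sketched end to end as follows. This is an illustration under stated assumptions, not the claimed implementation: `search` and `is_confident` are caller-supplied callables, one query is issued per keyword attribute, and all names are hypothetical.

```python
def run_two_round_search(keywords, search, is_confident):
    """Search, compile confident matches, search again, and combine the results."""
    # First round: one query per known characteristic.
    queries = [{attr: value} for attr, value in keywords.items()]
    first = [record for query in queries for record in search(query)]

    # Compile records that match the known characteristics with confidence.
    compiled = {}
    for record in first:
        if is_confident(record, keywords):
            for attr, value in record.items():
                compiled.setdefault(attr, set()).add(value)

    # Second round: query with values that were not part of the keywords.
    follow_up = [
        {attr: value}
        for attr, values in compiled.items()
        for value in sorted(values)
        if keywords.get(attr) != value
    ]
    second = [record for query in follow_up for record in search(query)]

    # Combine both rounds, dropping duplicates while keeping order.
    combined = []
    for record in first + second:
        if record not in combined:
            combined.append(record)
    return compiled, combined
```

  • Ranking and annotation would then operate on the combined list before it is presented to the user.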
  • the techniques described herein are implemented by one or more special-purpose computing devices.
  • the special-purpose computing devices may be hard-wired to perform the techniques, or may include circuitry or digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination.
  • Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques.
  • the special-purpose computing devices may be desktop computer systems, server computer systems, portable computer systems, handheld devices, networking devices or any other device or combination of devices that incorporate hard-wired and/or program logic to implement the techniques.
  • Computing device(s) are generally controlled and coordinated by operating system software, such as iOS, Android, Chrome OS, Windows XP, Windows Vista, Windows 7, Windows 8, Windows Server, Windows CE, Unix, Linux, SunOS, Solaris, iOS, Blackberry OS, VxWorks, or other compatible operating systems.
  • the computing device may be controlled by a proprietary operating system.
  • Conventional operating systems control and schedule computer processes for execution, perform memory management, provide file system, networking, I/O services, and provide a user interface functionality, such as a graphical user interface (“GUI”), among other things.
  • FIG. 3 is a block diagram that illustrates a computer system 300 upon which any of the embodiments described herein may be implemented.
  • the computer system 300 includes a bus 302 or other communication mechanism for communicating information, and one or more hardware processors 304 coupled with bus 302 for processing information.
  • Hardware processor(s) 304 may be, for example, one or more general purpose microprocessors.
  • the computer system 300 also includes a main memory 306 , such as a random access memory (RAM), cache and/or other dynamic storage devices, coupled to bus 302 for storing information and instructions to be executed by processor 304 .
  • Main memory 306 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 304 .
  • Such instructions when stored in storage media accessible to processor 304 , render computer system 300 into a special-purpose machine that is customized to perform the operations specified in the instructions.
  • the computer system 300 further includes a read only memory (ROM) 308 or other static storage device coupled to bus 302 for storing static information and instructions for processor 304 .
  • a storage device 310 such as a magnetic disk, optical disk, or USB thumb drive (Flash drive), etc., is provided and coupled to bus 302 for storing information and instructions.
  • the computer system 300 may be coupled via bus 302 to a display 312 , such as a cathode ray tube (CRT) or LCD display (or touch screen), for displaying information to a computer user.
  • An input device 314 is coupled to bus 302 for communicating information and command selections to processor 304 .
  • Another type of user input device is cursor control 316, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 304 and for controlling cursor movement on display 312.
  • This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
  • the same direction information and command selections as cursor control may be implemented via receiving touches on a touch screen without a cursor.
  • the computing system 300 may include a user interface module to implement a GUI that may be stored in a mass storage device as executable software codes that are executed by the computing device(s).
  • This and other modules may include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.
  • The term "module" refers to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, Java, C or C++.
  • a software module may be compiled and linked into an executable program, installed in a dynamic link library, or may be written in an interpreted programming language such as, for example, BASIC, Perl, or Python. It will be appreciated that software modules may be callable from other modules or from themselves, and/or may be invoked in response to detected events or interrupts.
  • Software modules configured for execution on computing devices may be provided on a computer readable medium, such as a compact disc, digital video disc, flash drive, magnetic disc, or any other tangible medium, or as a digital download (and may be originally stored in a compressed or installable format that requires installation, decompression or decryption prior to execution).
  • Such software code may be stored, partially or fully, on a memory device of the executing computing device, for execution by the computing device.
  • Software instructions may be embedded in firmware, such as an EPROM.
  • hardware modules may be comprised of connected logic units, such as gates and flip-flops, and/or may be comprised of programmable units, such as programmable gate arrays or processors.
  • the modules or computing device functionality described herein are preferably implemented as software modules, but may be represented in hardware or firmware. Generally, the modules described herein refer to logical modules that may be combined with other modules or divided into sub-modules despite their physical organization or storage.
  • the computer system 300 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 300 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 300 in response to processor(s) 304 executing one or more sequences of one or more instructions contained in main memory 306 . Such instructions may be read into main memory 306 from another storage medium, such as storage device 310 . Execution of the sequences of instructions contained in main memory 306 causes processor(s) 304 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
  • non-transitory media refers to any media that store data and/or instructions that cause a machine to operate in a specific fashion. Such non-transitory media may comprise non-volatile media and/or volatile media.
  • Non-volatile media includes, for example, optical or magnetic disks, such as storage device 310 .
  • Volatile media includes dynamic memory, such as main memory 306 .
  • non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.
  • Non-transitory media is distinct from but may be used in conjunction with transmission media.
  • Transmission media participates in transferring information between non-transitory media.
  • transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 302 .
  • transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
  • Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 304 for execution.
  • the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer.
  • the remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
  • a modem local to computer system 300 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal.
  • An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 302 .
  • Bus 302 carries the data to main memory 306 , from which processor 304 retrieves and executes the instructions.
  • the instructions received by main memory 306 may optionally be stored on storage device 310 either before or after execution by processor 304 .
  • the computer system 300 also includes a communication interface 318 coupled to bus 302 .
  • Communication interface 318 provides a two-way data communication coupling to one or more network links that are connected to one or more local networks.
  • communication interface 318 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line.
  • communication interface 318 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN (or WAN component to communicate with a WAN).
  • Wireless links may also be implemented.
  • communication interface 318 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • a network link typically provides data communication through one or more networks to other data devices.
  • a network link may provide a connection through local network to a host computer or to data equipment operated by an Internet Service Provider (ISP).
  • the ISP in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet”.
  • Local network and Internet both use electrical, electromagnetic or optical signals that carry digital data streams.
  • the signals through the various networks and the signals on network link and through communication interface 318 which carry the digital data to and from computer system 300 , are example forms of transmission media.
  • the computer system 300 can send messages and receive data, including program code, through the network(s), network link and communication interface 318 .
  • a server might transmit a requested code for an application program through the Internet, the ISP, the local network and the communication interface 318 .
  • the received code may be executed by processor 304 as it is received, and/or stored in storage device 310 , or other non-volatile storage for later execution.
  • Conditional language such as, among others, “can,” “could,” “might,” or “may,” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without user input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Stored Programmes (AREA)

Abstract

Systems and methods are provided for identifying and compiling information relating to an entity for investigative analysis. The system may comprise one or more processors and a memory storing instructions that, when executed by the one or more processors, cause the system to search, in one or more data sources, with a plurality of known characteristics of an entity to obtain a first plurality of records, identify from the first plurality of records a subset of records that match the known characteristics with a substantial confidence, compile the subset of records to form a unified record representing the entity and conduct a second search with information from the unified record to obtain a second plurality of search results.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of U.S. application Ser. No. 15/590,956, filed on May 9, 2017, which claims the benefit under 35 U.S.C. § 119(e) of the United States Provisional Application Ser. No. 62/434,936 filed Dec. 15, 2016, the content of which is hereby incorporated by reference in its entirety.
  • FIELD OF THE INVENTION
  • This disclosure relates to approaches for identifying and compiling information relating to an entity for investigative analysis.
  • BACKGROUND
  • Collection of all available digital records of an entity is useful for investigation, such as by the police department or a potential employer as part of a background check. There is no centralized database that includes all of the relevant records. Further, each record may not be associated with the complete or correct identifying information for the entity, and different databases may have entirely different structures or ontologies, making collection of such records challenging.
  • SUMMARY
  • Various embodiments of the present disclosure can include systems, methods, and non-transitory computer readable media configured to identify and compile information relating to an entity for investigative analysis. In some embodiments, the systems, methods, and non-transitory computer readable media are configured to implement a method that entails searching, in one or more data sources, with a plurality of known characteristics of an entity to obtain a first plurality of records, identifying, from the first plurality of records, a subset of records that match the known characteristics with a substantial confidence, compiling the subset of records to form a unified record representing the entity, and conducting a second search with information from the unified record to obtain a second plurality of search results.
  • In some embodiments, the method further comprises presenting, on an interface, at least part of the records from the first plurality and the second plurality, wherein the interface is configured to allow a user to annotate the records. In some embodiments, the annotation comprises confirmation that a record is associated with the entity. In some embodiments, the method further comprises storing the annotation in a library in a non-transitory medium. In some embodiments, the method further comprises ranking the records before presenting the records on the interface.
  • In some embodiments, the records of the subset are those that have perfect match to the known characteristics. In some embodiments, the method further comprises generating variations of the known characteristics as additional queries for the search.
  • These and other features of the systems, methods, and non-transitory computer readable media disclosed herein, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for purposes of illustration and description only and are not intended as a definition of the limits of the invention.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Certain features of various embodiments of the present technology are set forth with particularity in the appended claims. A better understanding of the features and advantages of the technology will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
  • FIG. 1 illustrates a procedure for obtaining, compiling and presenting information relating to an entity for user analysis.
  • FIG. 2 illustrates a flowchart of an example method for obtaining, compiling and presenting information relating to an entity for user analysis.
  • FIG. 3 is a block diagram that illustrates a computer system upon which any of the embodiments described herein may be implemented.
  • The figures depict various embodiments of the disclosed technology for purposes of illustration only, wherein the figures use like reference numerals to identify like elements. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated in the figures can be employed without departing from the principles of the disclosed technology described herein.
  • DETAILED DESCRIPTION
  • Information relating to entities, such as a subject of an investigation, is scattered in different databases. Different records of an entity, such as financial transactions, are often stored individually rather than collectively, which makes the retrieval, visualization and analysis difficult for end users. Moreover, the entities in each record may be identified with different identifications or characteristics of the entity. This further presents a challenge for identifying all relevant information for effective investigation of the entity. Also, redundant or duplicative information about the entity can present challenges for data management or even skew the analysis. A solution is needed for identifying and compiling all available information relating to the entity and enabling an investigator to conduct investigation with the information.
  • A claimed solution rooted in computer technology overcomes problems specifically arising in the realm of computer technology. In various implementations, a method entails collecting basic information (e.g., name, phone number, date of birth, social security number, email address and address) of an entity and generating one or more search queries. The search queries are used to search against a plurality of data sources for exact as well as approximate matches. The different data sources may be de-centralized, or federated where no master data management systems or defined standards are employed to manage the data sources. All of the matches are collected, and those that most likely relate to the entity (e.g., having perfect matches on name and social security number) are combined into a single record. Optionally, identical records can be merged to remove redundancy.
  • In some embodiments, additional searches can be formulated based on the initial search results. The search query can include the basic information of the entity as used in the previous step and can also include relevant information returned from the last search. All of the search results can be combined and presented, on a user interface, to an investigator. The search results can be ranked based on significance or relevance, facilitating analysis by the investigator. The interface can also enable the investigator to annotate the search results, and delete certain results as needed. Any annotation or change that the investigator makes can be optionally saved to a library, which can be shared with other investigators or archived for future use.
  • In some embodiments, all or part of the search results can be added back to a data repository serving to enrich the knowledge of the entity. The addition and accumulation of such added information can improve further searches of the entity. In some embodiments, an alert can be set up by a user such that a search can be carried out on a predetermined schedule and the search results can be presented to the user. In some embodiments, only new results are returned to the user. In some embodiments, the searches are automatically updated to incorporate additional information relating to the entity after such information becomes available from the searches.
  • The information identified and compiled as described represents a comprehensive collection of information relating to the entity and all the records of the final results represent potential connections between the entity and activities which may be worth further investigation. The present technology, therefore, provides a fast, automated, convenient, and comprehensive method to compile information from different data sources relating to an entity, and to present to an investigator potential connections between data records for investigation.
  • The term “entity” refers to any real world object that has attributes useful for identifying the object. An entity can be a person or an organization, and can also be an account, a place, or an event. Attributes for the entity include, for example, names, identification number, characteristics and address, without limitation.
  • The term “database” may refer to any data structure for storing and/or organizing data, including, but not limited to, relational databases (Oracle database, MySQL database, Cassandra database, etc.), spreadsheets, XML files, and text files, among others. In some embodiments, a database schema of a database system is its structure described in a formal language supported by the database management system.
  • FIG. 1 illustrates a process for identifying and compiling information relating to an entity that is implemented by a computing system. The system receives one or more search queries relating to an entity 101. If the entity is a person, the search queries include certain basic attributes of the person, such as name, social security number, date of birth, email address, address, or passport number, or their combinations.
  • The searches can be carried out on one or multiple platforms or multiple databases 110. Each database may have a different schema, structure, or content of information. Nevertheless, each entry in a database that can be retrieved as relating to the entity can be commonly referred to as a “record”. A record for the entity may be a historic record of an action, such as a financial transaction, associated with the entity, or simply some basic information about the entity, e.g., being listed as a registered voter. The searches can be carried out asynchronously or synchronously, and in any manner suitable for the queries and the databases. In some embodiments, the search queries can be broadened to maximize the chance of returning potentially relevant records, such as by using variations of the attributes or wildcards.
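  • As an illustration of broadening a query with attribute variations and wildcards, the following minimal Python sketch generates a few forms of a name and combines them with other attributes. The function names `name_variations` and `build_queries`, the specific variation rules, and the `*` wildcard syntax are assumptions made for this example and are not taken from the disclosure.

```python
from itertools import product


def name_variations(first: str, last: str) -> list[str]:
    """Return a few broadened forms of a person's name (hypothetical rules)."""
    first, last = first.strip(), last.strip()
    variants = [
        f"{first} {last}",      # exact form
        f"{last}, {first}",     # surname-first form
        f"{first[0]}. {last}",  # initial plus surname
        f"{first[0]}* {last}",  # wildcard on the first name (assumed syntax)
    ]
    # Preserve order while dropping duplicates.
    return list(dict.fromkeys(variants))


def build_queries(attributes: dict[str, list[str]]) -> list[dict[str, str]]:
    """Combine one value per attribute into individual search queries."""
    keys = list(attributes)
    value_lists = [attributes[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


queries = build_queries({
    "name": name_variations("Jane", "Doe"),
    "dob": ["1980-01-02"],
})
```

  • Each resulting query could then be submitted to a data source in whatever form that source accepts; how a given database interprets a wildcard is outside the scope of this sketch.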
  • The multiple databases, in some embodiments, are parts of a de-centralized database system where a systematically designed database is partitioned into multiple portions, each of which can be hosted at a different location. In some embodiments, at least some of the databases are autonomous and thereby constitute part of a federated database system. A federated database system maps multiple autonomous and disparate databases into a single federated database. The disparate databases can be interconnected via a computer network and may be geographically decentralized. In some embodiments, there is no data integration between the disparate databases.
  • In some embodiments, at least some of the multiple databases are independently hosted and managed, and may have different access control. For instance, the multiple databases may be databases owned or managed by different banks, companies, or government agencies. The present technology, in some embodiments, is configured to interface with each of the disparate or independent databases to identify information that may be related to the entity.
  • The searches will produce a number of records as potentially relating to the entity, e.g., records r1 through r7, as shown in FIG. 1. Upon retrieval of these records, which can be optionally saved in a computer medium or temporarily kept in the memory 120, the system can conduct certain basic and automated analysis of the records. In the example of FIG. 1, records r1, r2 and r3 have close-to-perfect matches to the attributes of the entity, e.g., with total match of name, date of birth and social security number. Such records are considered to match the attributes of the entity used in the search query with a substantial confidence. “Matching with a substantial confidence” as used herein means that the similarity between one or more attributes (e.g., name, address, social security number) of a record in a search result and one or more attributes used in the search query is statistically significant. In one embodiment, matching with substantial confidence requires a perfect match of at least one attribute. In another embodiment, matching with substantial confidence requires a perfect match of at least two attributes. In one embodiment, matching with substantial confidence requires a perfect match of at least one or two attributes and a partial match of another attribute with a mismatch of no more than one character (e.g., letter or digit), or no more than two characters.
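  • One of the matching embodiments above can be sketched in Python as follows. This is only one possible reading: it requires a minimum number of perfectly matching attributes and tolerates a limited character mismatch on any other attribute present in the record. The function names, the position-by-position mismatch count, and the decision to ignore attributes missing from the record are assumptions of this example.

```python
def char_mismatches(a: str, b: str) -> int:
    """Count differing positions; extra trailing characters count as mismatches."""
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return differing + abs(len(a) - len(b))


def matches_with_substantial_confidence(
    query: dict[str, str],
    record: dict[str, str],
    min_exact: int = 2,
    max_mismatch: int = 1,
) -> bool:
    """Require min_exact perfect matches; other present attributes may differ
    by at most max_mismatch characters (assumed interpretation)."""
    exact = 0
    for key, wanted in query.items():
        found = record.get(key)
        if found is None:
            continue  # attribute absent from the record: neither helps nor hurts
        if found == wanted:
            exact += 1
        elif char_mismatches(found, wanted) > max_mismatch:
            return False  # a present attribute disagrees too strongly
    return exact >= min_exact


query = {"name": "Jane Doe", "ssn": "123-45-6789", "dob": "1980-01-02"}
near_match = {"name": "Jane Dow", "ssn": "123-45-6789", "dob": "1980-01-02"}
poor_match = {"name": "Jane Doe", "ssn": "123-45-0000"}
```

  • Under these assumptions, `near_match` is accepted (two perfect matches and a one-character difference in the name), while `poor_match` is rejected because its social security number differs in four positions.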
  • The records that are matched to the entity with a substantial confidence can be considered as belonging to the entity and thus all the information from the records can be combined into a unified record. Optionally, during the compilation, redundant information or records can be merged to reduce redundancy. A “unified record” as used herein refers to a record generated by the system by compiling information from two or more records in the search results. The compilation can collapse attributes that are identical in all of the two or more records. For instance, if every record has the same social security number, then only one social security number needs to be saved in the unified record. On the other hand, for attributes that have variations, (e.g., different addresses or different spelling of first name), the variants are all saved in the unified record.
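  • The compilation described above, in which identical attribute values are collapsed and variants are all retained, can be sketched in Python as follows. The function name `compile_unified_record` and the representation of a record as a flat dictionary of strings are assumptions of this example.

```python
def compile_unified_record(records: list[dict[str, str]]) -> dict[str, list[str]]:
    """Merge records into one: each attribute keeps every distinct value once."""
    unified: dict[str, list[str]] = {}
    for record in records:
        for key, value in record.items():
            values = unified.setdefault(key, [])
            if value not in values:  # collapse identical values
                values.append(value)
    return unified


matched = [
    {"name": "Jane Doe", "ssn": "123-45-6789", "address": "1 Main St"},
    {"name": "Jane Doe", "ssn": "123-45-6789", "address": "9 Oak Ave"},
    {"name": "Jayne Doe", "ssn": "123-45-6789"},
]
unified = compile_unified_record(matched)
```

  • Here the shared social security number is stored once, while both addresses and both spellings of the first name survive as variants.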
  • As shown in FIG. 1, the system conducts automatic compilation for records r1-r3. The compiled record likely includes additional information about the entity that was not apparent before the search. For instance, the search by social security number may return aliases or secondary addresses of the entity. The search by name may return fraudulent passport numbers used by the entity. Such additional information (see underlined words in 122) can then be used for a second round of searches. It is likely that the first round of searches will return a large number of records, some of the important ones of which may be presented late in the list or buried in the list. The second round of searches can likely bring such records back to a user's attention. Without limitation, a third, fourth or even more rounds of searches can be carried out to further enrich or refine the information relating to the entity.
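  • A second round of searches driven by newly learned attribute values might look like the following Python sketch. The `search` callable stands in for whatever interface a data source exposes, and `toy_search`, `DATA`, and the record layout are hypothetical and exist only to make the example runnable.

```python
from typing import Callable

Record = dict[str, str]


def new_terms(unified: dict[str, list[str]], known: dict[str, str]) -> list[tuple[str, str]]:
    """List (attribute, value) pairs in the unified record that the first query lacked."""
    return [
        (key, value)
        for key, values in unified.items()
        for value in values
        if known.get(key) != value
    ]


def second_round(
    unified: dict[str, list[str]],
    known: dict[str, str],
    search: Callable[[str, str], list[Record]],
) -> list[Record]:
    """Search once per newly learned value and collect distinct records."""
    results: list[Record] = []
    for key, value in new_terms(unified, known):
        for record in search(key, value):
            if record not in results:
                results.append(record)
    return results


# Hypothetical in-memory data source used only for this example.
DATA = [
    {"id": "r8", "name": "J. Smith", "activity": "wire transfer"},
    {"id": "r9", "address": "9 Oak Ave", "activity": "account opened"},
]


def toy_search(key: str, value: str) -> list[Record]:
    return [record for record in DATA if record.get(key) == value]


unified = {"name": ["Jane Doe", "J. Smith"], "ssn": ["123-45-6789"], "address": ["9 Oak Ave"]}
known = {"name": "Jane Doe", "ssn": "123-45-6789"}
found = second_round(unified, known, toy_search)
```

  • The same function could be called again on an updated unified record to carry out a third or later round.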
  • With the two or more rounds of searches, the system now has collected information relating to the entity, with each record providing potentially relevant connection to activities of the entity, such as suspicious financial activities. In this context, the entity is now represented by all the information compiled from the search results relating to the entity. Each record represents a “potential connection” between the entity and the activity.
  • Building and confirming the potential connections can benefit from human input. To this end, the system can present the records on a user interface (e.g., 131) to a user. To further facilitate user analysis, the system can rank the records before presenting them on the user interface. The ranking method may be dependent upon the type of the entity. For instance, for a system that is set up to detect suspicious activities, a record that includes an activity will be ranked higher than a record that only includes basic information about the entity.
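  • The ranking example above, in which records that include an activity are placed ahead of records holding only basic information, can be sketched in Python as follows. The `activity` field name and the two-level priority are assumptions of this example; an actual ranking method may weigh many more signals.

```python
def rank_records(records: list[dict]) -> list[dict]:
    """Place records with an activity first; order within each group is kept."""
    def priority(record: dict) -> int:
        return 0 if record.get("activity") else 1

    # sorted() is stable, so records with equal priority keep their original order.
    return sorted(records, key=priority)


records = [
    {"id": "r4", "note": "registered voter"},
    {"id": "r5", "activity": "wire transfer"},
    {"id": "r7", "activity": "cash deposit"},
]
ranked = rank_records(records)
```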
  • When the records, preferably sorted, are presented on the user interface, the interface can optionally further enable the user to mark or annotate the records (as illustrated in FIG. 1). After checking the content of a record, say r6, the user can mark it as not relevant to the entity, thereby allowing the record to be deleted from the system. The user may also mark a record, say r11, as highly relevant to the entity and as including important information for further investigation. Such marking also confirms the record as a potential connection to the entity.
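  • The effect of such marking on the result set can be sketched in Python as follows. The `Annotation` class, its fields, and the `apply_annotations` function are hypothetical names introduced for this example; the disclosure does not prescribe a data model for annotations.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    record_id: str
    relevant: bool  # False marks the record for deletion
    note: str = ""


def apply_annotations(
    records: list[dict], annotations: list[Annotation]
) -> tuple[list[dict], list[str]]:
    """Drop records marked not relevant; list ids confirmed as connections."""
    verdicts = {annotation.record_id: annotation for annotation in annotations}
    kept: list[dict] = []
    confirmed: list[str] = []
    for record in records:
        verdict = verdicts.get(record["id"])
        if verdict is not None and not verdict.relevant:
            continue  # user marked the record as not relevant
        kept.append(record)
        if verdict is not None:
            confirmed.append(record["id"])  # user confirmed the connection
    return kept, confirmed


records = [{"id": "r5"}, {"id": "r6"}, {"id": "r11"}]
annotations = [
    Annotation("r6", relevant=False),
    Annotation("r11", relevant=True, note="follow up"),
]
kept, confirmed = apply_annotations(records, annotations)
```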
  • With the automated search process carried out by the system and the further input facilitated by the interface provided by the system, this technology provides an efficient approach to build a comprehensive repository of information relating to an entity of interest, and establish potential connections between the entity and activities or transactions of value for further investigation.
  • Further, upon receiving the annotations from the user, the system can optionally record the annotations in a library for future use or to be shared with other users. The annotations can also serve as feedback for the search and be used to improve the search algorithm. An annotation can also trigger another round of searches with information identified by the user as highly important or relevant.
  • Search alerts can be generated automatically or upon user request, in some embodiments. In one example, upon completion of a search for an entity, the user can request to save the search as an alert. Accordingly, in some embodiments, the search will be automated by the system at a default schedule (e.g., daily or weekly) or at a schedule set by the user. If the scheduled search returns information that has a timestamp newer than the previous search time, then an alert is sent (e.g., by email) to the user with the new information. Alternatively, in another example, the new search result is compared to the previous one and any new information is included in the alert.
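  • The two ways of deciding what goes into an alert, by timestamp or by comparison with the previous result, can be sketched in Python as follows. The field names `timestamp` and `id` and both function names are assumptions of this example; scheduling and email delivery are omitted.

```python
from datetime import datetime


def new_since(records: list[dict], last_run: datetime) -> list[dict]:
    """Keep records whose timestamp is newer than the previous search time."""
    return [record for record in records if record["timestamp"] > last_run]


def new_by_comparison(current: list[dict], previous: list[dict]) -> list[dict]:
    """Keep records from the current result that the previous result lacked."""
    seen = {record["id"] for record in previous}
    return [record for record in current if record["id"] not in seen]


previous = [{"id": "r1", "timestamp": datetime(2017, 4, 20)}]
current = previous + [{"id": "r12", "timestamp": datetime(2017, 5, 3)}]
last_run = datetime(2017, 5, 1)

by_time = new_since(current, last_run)
by_diff = new_by_comparison(current, previous)
```

  • In this example both approaches select only r12. They can differ in practice: a record added to a data source late but carrying an old timestamp would be caught only by the comparison approach.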
  • In some embodiments, the search can be automatically updated, after each search, to include newly discovered information relating to the entity, such as information with a high confidence and/or relevance level. In some embodiments, the update requires confirmation or optimization by the user.
  • In some embodiments, a search can be requested based on a complex subject. The term “complex subject” as used herein refers to a collection of different types of entities, such as a case report, a transaction record, or security log. Taking a transaction record as an example, the transaction record may include identifying information of multiple persons (e.g., name and SSN), multiple accounts (e.g., account type and number), and locations of transactions (e.g., address, zip code, and branch name). Each of these entities can be subject to a search.
  • In one embodiment, when a user enters such a complex subject for the search, the system is configured to identify and extract some or all of the entities included in the complex subject and conducts a search for each of the entities. Upon completion of all the searches, the system can compile the search results and present them to the user optionally in a single feed. In some embodiments, the system can use information from the complex subject and/or the search results to understand the relationship and thereby compile and/or present the search results taking advantage of the knowledge of such relationship.
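  • Handling of a complex subject can be sketched in Python as follows, assuming the subject has already been parsed into a dictionary with `persons`, `accounts`, and `locations` lists. That layout, the `kind` label, and the function names are assumptions of this example; extracting entities from free text such as a case report would need additional parsing not shown here.

```python
from typing import Callable

# Maps a (hypothetical) field of the complex subject to an entity kind.
KIND_BY_FIELD = {"persons": "person", "accounts": "account", "locations": "location"}


def extract_entities(subject: dict) -> list[dict]:
    """Flatten a complex subject into a list of individually searchable entities."""
    entities = []
    for field_name, kind in KIND_BY_FIELD.items():
        for attrs in subject.get(field_name, []):
            entities.append({"kind": kind, **attrs})
    return entities


def search_complex_subject(subject: dict, search: Callable[[dict], list]) -> list[dict]:
    """Search each extracted entity and merge the results into a single feed."""
    feed = []
    for entity in extract_entities(subject):
        for record in search(entity):
            feed.append({"entity": entity, "record": record})
    return feed


transaction = {
    "persons": [{"name": "Jane Doe", "ssn": "123-45-6789"}, {"name": "John Roe"}],
    "accounts": [{"account_type": "checking", "number": "0042"}],
    "locations": [{"zip_code": "94301", "branch_name": "Main"}],
}
entities = extract_entities(transaction)
```

  • Keeping the originating entity alongside each record in the feed is one simple way to retain the relationship information mentioned above when presenting the results.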
  • FIG. 2 illustrates a flowchart of an example method 200 for identifying and compiling information relating to an entity for investigative analysis, according to various embodiments of the present disclosure. The method 200 may be implemented in various environments including, for example, the system of FIG. 3. The operations of method 200 presented below are intended to be illustrative. Depending on the implementation, the example method 200 may include additional, fewer, or alternative steps performed in various orders or in parallel. The example method 200 may be implemented in various computing systems or devices including one or more processors.
  • At block 201, a computer system receives a name, identification or another basic characteristic of an entity as keywords for a search for information relating to the entity. At block 203, the system generates one or more search queries, optionally with variations of the keywords, and then at block 205, the system conducts searches in one or more data sources with the search queries.
  • Some of the search results would have near-perfect match to the basic information of the entity. Such matches are identified and compiled to form a compiled record representing the entity (block 207). At block 209, additional information from the search results is selected to be used for a second round of searches, followed by the actual searches (block 211). With the second round of searches, all the search results can be presented to a user for further investigation and analysis. Optionally, the search results are ranked to facilitate the user analysis (block 213). Upon receiving user input, the system can update the search results with respect to potential connection to the entity (block 215).
  • Hardware Implementation
  • The techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include circuitry or digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, server computer systems, portable computer systems, handheld devices, networking devices or any other device or combination of devices that incorporate hard-wired and/or program logic to implement the techniques.
  • Computing device(s) are generally controlled and coordinated by operating system software, such as iOS, Android, Chrome OS, Windows XP, Windows Vista, Windows 7, Windows 8, Windows Server, Windows CE, Unix, Linux, SunOS, Solaris, Blackberry OS, VxWorks, or other compatible operating systems. In other embodiments, the computing device may be controlled by a proprietary operating system. Conventional operating systems control and schedule computer processes for execution, perform memory management, provide file system, networking, and I/O services, and provide user interface functionality, such as a graphical user interface (“GUI”), among other things.
  • FIG. 3 is a block diagram that illustrates a computer system 300 upon which any of the embodiments described herein may be implemented. The computer system 300 includes a bus 302 or other communication mechanism for communicating information, one or more hardware processors 304 coupled with bus 302 for processing information. Hardware processor(s) 304 may be, for example, one or more general purpose microprocessors.
  • The computer system 300 also includes a main memory 306, such as a random access memory (RAM), cache and/or other dynamic storage devices, coupled to bus 302 for storing information and instructions to be executed by processor 304. Main memory 306 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 304. Such instructions, when stored in storage media accessible to processor 304, render computer system 300 into a special-purpose machine that is customized to perform the operations specified in the instructions.
  • The computer system 300 further includes a read only memory (ROM) 308 or other static storage device coupled to bus 302 for storing static information and instructions for processor 304. A storage device 310, such as a magnetic disk, optical disk, or USB thumb drive (Flash drive), etc., is provided and coupled to bus 302 for storing information and instructions.
  • The computer system 300 may be coupled via bus 302 to a display 312, such as a cathode ray tube (CRT) or LCD display (or touch screen), for displaying information to a computer user. An input device 314, including alphanumeric and other keys, is coupled to bus 302 for communicating information and command selections to processor 304. Another type of user input device is cursor control 316, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 304 and for controlling cursor movement on display 312. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane. In some embodiments, the same direction information and command selections as cursor control may be implemented via receiving touches on a touch screen without a cursor.
  • The computing system 300 may include a user interface module to implement a GUI that may be stored in a mass storage device as executable software codes that are executed by the computing device(s). This and other modules may include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.
  • In general, the word “module,” as used herein, refers to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, Java, C or C++. A software module may be compiled and linked into an executable program, installed in a dynamic link library, or may be written in an interpreted programming language such as, for example, BASIC, Perl, or Python. It will be appreciated that software modules may be callable from other modules or from themselves, and/or may be invoked in response to detected events or interrupts. Software modules configured for execution on computing devices may be provided on a computer readable medium, such as a compact disc, digital video disc, flash drive, magnetic disc, or any other tangible medium, or as a digital download (and may be originally stored in a compressed or installable format that requires installation, decompression or decryption prior to execution). Such software code may be stored, partially or fully, on a memory device of the executing computing device, for execution by the computing device. Software instructions may be embedded in firmware, such as an EPROM. It will be further appreciated that hardware modules may be comprised of connected logic units, such as gates and flip-flops, and/or may be comprised of programmable units, such as programmable gate arrays or processors. The modules or computing device functionality described herein are preferably implemented as software modules, but may be represented in hardware or firmware. Generally, the modules described herein refer to logical modules that may be combined with other modules or divided into sub-modules despite their physical organization or storage.
  • The computer system 300 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 300 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 300 in response to processor(s) 304 executing one or more sequences of one or more instructions contained in main memory 306. Such instructions may be read into main memory 306 from another storage medium, such as storage device 310. Execution of the sequences of instructions contained in main memory 306 causes processor(s) 304 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
  • The term “non-transitory media,” and similar terms, as used herein refers to any media that store data and/or instructions that cause a machine to operate in a specific fashion. Such non-transitory media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 310. Volatile media includes dynamic memory, such as main memory 306. Common forms of non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.
  • Non-transitory media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between non-transitory media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 302. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
  • Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 304 for execution. For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 300 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 302. Bus 302 carries the data to main memory 306, from which processor 304 retrieves and executes the instructions. The instructions received by main memory 306 may optionally be stored on storage device 310 either before or after execution by processor 304.
  • The computer system 300 also includes a communication interface 318 coupled to bus 302. Communication interface 318 provides a two-way data communication coupling to one or more network links that are connected to one or more local networks. For example, communication interface 318 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 318 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN (or WAN component to communicate with a WAN). Wireless links may also be implemented. In any such implementation, communication interface 318 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • A network link typically provides data communication through one or more networks to other data devices. For example, a network link may provide a connection through local network to a host computer or to data equipment operated by an Internet Service Provider (ISP). The ISP in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet”. Local network and Internet both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link and through communication interface 318, which carry the digital data to and from computer system 300, are example forms of transmission media.
  • The computer system 300 can send messages and receive data, including program code, through the network(s), network link and communication interface 318. In the Internet example, a server might transmit a requested code for an application program through the Internet, the ISP, the local network and the communication interface 318.
  • The received code may be executed by processor 304 as it is received, and/or stored in storage device 310, or other non-volatile storage for later execution.
  • Each of the processes, methods, and algorithms described in the preceding sections may be embodied in, and fully or partially automated by, code modules executed by one or more computer systems or computer processors comprising computer hardware. The processes and algorithms may be implemented partially or wholly in application-specific circuitry.
  • The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of this disclosure. In addition, certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed example embodiments. The example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed example embodiments.
  • Conditional language, such as, among others, “can,” “could,” “might,” or “may,” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without user input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment.
  • Any process descriptions, elements, or blocks in the flow diagrams described herein and/or depicted in the attached figures should be understood as potentially representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process. Alternate implementations are included within the scope of the embodiments described herein in which elements or functions may be deleted, executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art.
  • It should be emphasized that many variations and modifications may be made to the above-described embodiments, the elements of which are to be understood as being among other acceptable examples. All such modifications and variations are intended to be included herein within the scope of this disclosure. The foregoing description details certain embodiments of the invention. It will be appreciated, however, that no matter how detailed the foregoing appears in text, the invention can be practiced in many ways. As is also stated above, it should be noted that the use of particular terminology when describing certain features or aspects of the invention should not be taken to imply that the terminology is being re-defined herein to be restricted to including any specific characteristics of the features or aspects of the invention with which that terminology is associated. The scope of the invention should therefore be construed in accordance with the appended claims and any equivalents thereof.
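As an illustration of the statement above that example blocks may be performed in serial or in parallel, the following is a minimal Python sketch, not the disclosed implementation. The function names (`search_source`, `run_serial`, `run_parallel`) and the toy data sources are hypothetical and chosen only for this example. It runs the same independent block over several data sources both ways and yields the same results in the same order.

```python
from concurrent.futures import ThreadPoolExecutor


def search_source(source, term):
    """One hypothetical 'block': scan a single data source for a term."""
    return [row for row in source if term in row]


def run_serial(sources, term):
    """Perform the block once per source, one after another."""
    return [search_source(source, term) for source in sources]


def run_parallel(sources, term):
    """Perform the same block concurrently, one task per source."""
    with ThreadPoolExecutor() as pool:
        # Executor.map returns results in input order, so the output
        # lines up with run_serial even if tasks finish out of order.
        return list(pool.map(lambda source: search_source(source, term), sources))


# Toy data sources (assumed for illustration only).
sources = [["ann lee", "bob ray"], ["ann lee austin"], ["cy dow"]]
serial_results = run_serial(sources, "ann")
parallel_results = run_parallel(sources, "ann")
```

Because each block in this sketch reads only its own source and shares no state, the ordering of execution does not affect the outcome; blocks with dependencies between them would need to keep a fixed order.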

Claims (20)

1. A method being implemented by a computing system including one or more physical processors and storage media storing machine-readable instructions, the method comprising:
searching, in one or more data sources, with a plurality of known characteristics of an entity to obtain a first plurality of records;
identifying, from the first plurality of records, a subset of records that match the known characteristics with a substantial confidence;
compiling the subset of records to form a unified record representing the entity; and
presenting, on an interface, at least part of the first plurality of records, wherein the interface is configured to allow a user to annotate the at least part of the first plurality of records;
receiving one or more annotations made to the at least part of the first plurality of records, the one or more annotations including at least one marking of a record as any relevant or not relevant;
conducting a subsequent search to obtain a second plurality of search results, the subsequent search based on information from the unified record and the one or more annotations made to the at least part of the first plurality of records, each result of the subsequent search representing a potential connection between the entity and a corresponding activity of the entity.
2. The method of claim 1, wherein the receiving the one or more annotations triggers the conducting the subsequent search to obtain a second plurality of search results.
3. The method of claim 1, wherein marking a record as not relevant causes that record to be excluded from the second plurality of search results.
4. The method of claim 2, further comprising storing the annotation in a library in a non-transitory medium.
5. The method of claim 2, further comprising ranking the records before presenting the records on the interface.
6. The method of claim 1, wherein the records of the subset have perfect match to the known characteristics.
7. The method of claim 1, further comprising generating variations of the known characteristics as additional queries for the search.
8. A system comprising:
one or more processors; and
memory storing instructions that, when executed by the one or more processors, cause the system to perform:
searching, in one or more data sources, with a plurality of known characteristics of an entity to obtain a first plurality of records;
identifying, from the first plurality of records, a subset of records that match the known characteristics with a substantial confidence;
compiling the subset of records to form a unified record representing the entity; and
presenting, on an interface, at least part of the first plurality of records, wherein the interface is configured to allow a user to annotate the at least part of the first plurality of records;
receiving one or more annotations made to the at least part of the first plurality of records, the one or more annotations including at least one marking of a record as any relevant or not relevant;
conducting a subsequent search to obtain a second plurality of search results, the subsequent search based on information from the unified record and the one or more annotations made to the at least part of the first plurality of records, each result of the subsequent search representing a potential connection between the entity and a corresponding activity of the entity.
9. The system of claim 8, wherein the receiving the one or more annotations triggers the conducting the subsequent search to obtain a second plurality of search results.
10. The system of claim 8, wherein marking a record as not relevant causes that record to be excluded from the second plurality of search results.
11. The system of claim 9, wherein the instructions further cause the system to store the annotation in a library in a non-transitory medium.
12. The system of claim 9, wherein the instructions further cause the system to rank the records before presenting the records on the interface.
13. The system of claim 8, wherein the records of the subset have perfect match to the known characteristics.
14. The system of claim 8, wherein the instructions further cause the system to generate variations of the known characteristics as additional queries for the search.
15. A non-transitory computer readable medium comprising instructions that, when executed, cause one or more processors to perform:
searching, in one or more data sources, with a plurality of known characteristics of an entity to obtain a first plurality of records;
identifying, from the first plurality of records, a subset of records that match the known characteristics with a substantial confidence;
compiling the subset of records to form a unified record representing the entity; and
presenting, on an interface, at least part of the first plurality of records, wherein the interface is configured to allow a user to annotate the at least part of the first plurality of records;
receiving one or more annotations made to the at least part of the first plurality of records, the one or more annotations including at least one marking of a record as any relevant or not relevant;
conducting a subsequent search to obtain a second plurality of search results, the subsequent search based on information from the unified record and the one or more annotations made to the at least part of the first plurality of records, each result of the subsequent search representing a potential connection between the entity and a corresponding activity of the entity.
16. The non-transitory computer readable medium of claim 15, wherein the instructions further cause the processors to present, on an interface, at least part of the records from the first plurality and the second plurality, wherein the interface is configured to allow a user to annotate the records.
17. The non-transitory computer readable medium of claim 16, wherein the annotation comprises confirmation that a record is associated with the entity.
18. The non-transitory computer readable medium of claim 16, wherein the instructions further cause the processors to store the annotation in a library in a non-transitory medium.
19. The non-transitory computer readable medium of claim 16, wherein the instructions further cause the processors to rank the records before presenting the records on the interface.
20. The non-transitory computer readable medium of claim 15, wherein the records of the subset have perfect match to the known characteristics.
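
The steps recited in claims 1 and 3 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the claims do not define how confidence is computed, so the exact-match fraction in `match_confidence`, the `THRESHOLD` value, the dictionary record layout with an `id` key, and all function names are hypothetical. The sketch also does not model the "activity" that each subsequent search result is said to represent.

```python
THRESHOLD = 0.9  # assumed cut-off standing in for "substantial confidence"


def match_confidence(record, known):
    """Fraction of the known characteristics that the record matches exactly."""
    hits = sum(1 for key, value in known.items() if record.get(key) == value)
    return hits / len(known)


def search(sources, known):
    """Return every record sharing at least one known characteristic."""
    return [r for source in sources for r in source
            if match_confidence(r, known) > 0]


def compile_unified(records):
    """Merge records into one unified record; earlier records win on conflicts."""
    unified = {}
    for record in records:
        for key, value in record.items():
            if key != "id":
                unified.setdefault(key, value)
    return unified


def subsequent_search(sources, unified, annotations):
    """Search with the unified record, excluding records marked not relevant."""
    excluded = {rid for rid, label in annotations.items()
                if label == "not relevant"}
    return [r for r in search(sources, unified) if r["id"] not in excluded]


# Toy data (assumed for illustration only).
known = {"name": "Ann Lee", "city": "Austin"}
source_a = [
    {"id": "a1", "name": "Ann Lee", "city": "Austin", "phone": "555-0100"},
    {"id": "a2", "name": "Ann Lee", "city": "Boston"},
]
source_b = [
    {"id": "b1", "name": "Bob Ray", "city": "Austin", "phone": "555-0100"},
    {"id": "b2", "name": "Cy Dow", "city": "Denver"},
]
sources = [source_a, source_b]

first = search(sources, known)                      # first plurality of records
subset = [r for r in first if match_confidence(r, known) >= THRESHOLD]
unified = compile_unified(subset)                   # unified record for the entity
annotations = {"a1": "relevant", "a2": "not relevant"}  # user markings
second = subsequent_search(sources, unified, annotations)
```

In this toy run the unified record picks up a phone number from the high-confidence match, so the subsequent search scores record `b1` on two characteristics instead of one, while the record annotated as not relevant is excluded from the second result set.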
US16/389,300 2016-12-15 2019-04-19 Identification and compiling of information relating to an entity Active 2039-02-19 US11475031B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/389,300 US11475031B2 (en) 2016-12-15 2019-04-19 Identification and compiling of information relating to an entity

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201662434936P 2016-12-15 2016-12-15
US15/590,956 US10311074B1 (en) 2016-12-15 2017-05-09 Identification and compiling of information relating to an entity
US16/389,300 US11475031B2 (en) 2016-12-15 2019-04-19 Identification and compiling of information relating to an entity

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US15/590,956 Continuation US10311074B1 (en) 2016-12-15 2017-05-09 Identification and compiling of information relating to an entity

Publications (2)

Publication Number Publication Date
US20190243840A1 (en) 2019-08-08
US11475031B2 (en) 2022-10-18

Family

ID=66673571

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/590,956 Active US10311074B1 (en) 2016-12-15 2017-05-09 Identification and compiling of information relating to an entity
US16/389,300 Active 2039-02-19 US11475031B2 (en) 2016-12-15 2019-04-19 Identification and compiling of information relating to an entity

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US15/590,956 Active US10311074B1 (en) 2016-12-15 2017-05-09 Identification and compiling of information relating to an entity

Country Status (1)

Country Link
US (2) US10311074B1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10311074B1 (en) * 2016-12-15 2019-06-04 Palantir Technologies Inc. Identification and compiling of information relating to an entity
US10713329B2 (en) * 2018-10-30 2020-07-14 Longsand Limited Deriving links to online resources based on implicit references
US11657100B2 (en) * 2020-10-29 2023-05-23 Kyndryl, Inc. Cognitively rendered event timeline display

Family Cites Families (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5875446A (en) 1997-02-24 1999-02-23 International Business Machines Corporation System and method for hierarchically grouping and ranking a set of objects in a query context based on one or more relationships
US5995973A (en) 1997-08-29 1999-11-30 International Business Machines Corporation Storing relationship tables identifying object relationships
US7062483B2 (en) 2000-05-18 2006-06-13 Endeca Technologies, Inc. Hierarchical data-driven search and navigation system and method for information retrieval
US20100281364A1 (en) 2005-01-11 2010-11-04 David Sidman Apparatuses, Methods and Systems For Portable Universal Profile
US6980984B1 (en) 2001-05-16 2005-12-27 Kanisa, Inc. Content provider systems and methods using structured data
AU2003298616A1 (en) 2002-11-06 2004-06-03 International Business Machines Corporation Confidential data sharing and anonymous entity resolution
US7657540B1 (en) 2003-02-04 2010-02-02 Seisint, Inc. Method and system for linking and delinking data records
US20040243613A1 (en) 2003-05-30 2004-12-02 Mohammad Pourheidari System and method for creating a custom view from information in a managed data store
KR101312190B1 (en) * 2004-03-15 2013-09-27 야후! 인크. Search systems and methods with integration of user annotations
US7899796B1 (en) 2004-11-23 2011-03-01 Andrew Borthwick Batch automated blocking and record matching
US20070130206A1 (en) 2005-08-05 2007-06-07 Siemens Corporate Research Inc System and Method For Integrating Heterogeneous Biomedical Information
US20080215557A1 (en) 2005-11-05 2008-09-04 Jorey Ramer Methods and systems of mobile query classification
US7672833B2 (en) 2005-09-22 2010-03-02 Fair Isaac Corporation Method and apparatus for automatic entity disambiguation
CN101145152B (en) 2006-09-14 2010-08-11 国际商业机器公司 System and method for automatically refining reality in specific context
US20080162544A1 (en) 2006-12-27 2008-07-03 Salesforce.Com, Inc. Systems and methods for implementing many object to object relationships in a multi-tenant environment
US7917489B2 (en) 2007-03-14 2011-03-29 Yahoo! Inc. Implicit name searching
US7996210B2 (en) 2007-04-24 2011-08-09 The Research Foundation Of The State University Of New York Large-scale sentiment analysis
US8271477B2 (en) 2007-07-20 2012-09-18 Informatica Corporation Methods and systems for accessing data
US8117208B2 (en) 2007-09-21 2012-02-14 The Board Of Trustees Of The University Of Illinois System for entity search and a method for entity scoring in a linked document database
US8566327B2 (en) 2007-12-19 2013-10-22 Match.Com, L.L.C. Matching process system and method
WO2009089487A1 (en) * 2008-01-11 2009-07-16 Drubner Jeffrey M Method and system for uniquely identifying a person to the exclusion of all others
US8972463B2 (en) * 2008-07-25 2015-03-03 International Business Machines Corporation Method and apparatus for functional integration of metadata
US20100114887A1 (en) 2008-10-17 2010-05-06 Google Inc. Textual Disambiguation Using Social Connections
US9454606B2 (en) * 2009-09-11 2016-09-27 Lexisnexis Risk & Information Analytics Group Inc. Technique for providing supplemental internet search criteria
US8719267B2 (en) 2010-04-19 2014-05-06 Alcatel Lucent Spectral neighborhood blocking for entity resolution
US8417696B2 (en) 2010-06-10 2013-04-09 Microsoft Corporation Contact information merger and duplicate resolution
US9336184B2 (en) 2010-12-17 2016-05-10 Microsoft Technology Licensing, Llc Representation of an interactive document as a graph of entities
US9245049B2 (en) 2011-02-16 2016-01-26 Empire Technology Development Llc Performing queries using semantically restricted relations
US20120246154A1 (en) 2011-03-23 2012-09-27 International Business Machines Corporation Aggregating search results based on associating data instances with knowledge base entities
US9317584B2 (en) 2011-12-30 2016-04-19 Certona Corporation Keyword index pruning
US8972336B2 (en) 2012-05-03 2015-03-03 Salesforce.Com, Inc. System and method for mapping source columns to target columns
EP2662782A1 (en) 2012-05-10 2013-11-13 Siemens Aktiengesellschaft Method and system for storing data in a database
US9183310B2 (en) 2012-06-12 2015-11-10 Microsoft Technology Licensing, Llc Disambiguating intents within search engine result pages
US20140258014A1 (en) 2013-03-05 2014-09-11 Google Inc. Entity-based searching with content selection
US8818892B1 (en) * 2013-03-15 2014-08-26 Palantir Technologies, Inc. Prioritizing data clusters with customizable scoring strategies
IN2013CH01237A (en) 2013-03-21 2015-08-14 Infosys Ltd
US9767127B2 (en) 2013-05-02 2017-09-19 Outseeker Corp. Method for record linkage from multiple sources
US10019519B2 (en) * 2013-10-30 2018-07-10 Gordon E. Seay Methods and systems for utilizing global entities in software applications
US9245057B1 (en) 2014-10-09 2016-01-26 Splunk Inc. Presenting a graphical visualization along a time-based graph lane using key performance indicators derived from machine data
US20160171507A1 (en) 2014-12-11 2016-06-16 Connectivity, Inc. Systems and Methods for Identifying Customers of Businesses Through Gathered Named Entity Data
US9348920B1 (en) 2014-12-22 2016-05-24 Palantir Technologies Inc. Concept indexing among database of documents using machine learning techniques
US9537504B1 (en) 2015-09-25 2017-01-03 Intel Corporation Heterogeneous compression architecture for optimized compression ratio
US10311074B1 (en) * 2016-12-15 2019-06-04 Palantir Technologies Inc. Identification and compiling of information relating to an entity
US10216811B1 (en) 2017-01-05 2019-02-26 Palantir Technologies Inc. Collaborating using different object models
US10235461B2 (en) 2017-05-02 2019-03-19 Palantir Technologies Inc. Automated assistance for generating relevant and valuable search results for an entity of interest

Also Published As

Publication number Publication date
US10311074B1 (en) 2019-06-04
US11475031B2 (en) 2022-10-18

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: PALANTIR TECHNOLOGIES INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BEARD, MITCHELL;CHANG, ALLEN;HAMMETT, CHRIS;AND OTHERS;SIGNING DATES FROM 20170816 TO 20180720;REEL/FRAME:050764/0523

AS Assignment

Owner name: ROYAL BANK OF CANADA, AS ADMINISTRATIVE AGENT, CANADA

Free format text: SECURITY INTEREST;ASSIGNOR:PALANTIR TECHNOLOGIES INC.;REEL/FRAME:051709/0471

Effective date: 20200127

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT, NEW YORK

Free format text: SECURITY INTEREST;ASSIGNOR:PALANTIR TECHNOLOGIES INC.;REEL/FRAME:051713/0149

Effective date: 20200127

AS Assignment

Owner name: PALANTIR TECHNOLOGIES INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:ROYAL BANK OF CANADA;REEL/FRAME:052856/0382

Effective date: 20200604

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., NEW YORK

Free format text: SECURITY INTEREST;ASSIGNOR:PALANTIR TECHNOLOGIES INC.;REEL/FRAME:052856/0817

Effective date: 20200604

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

AS Assignment

Owner name: PALANTIR TECHNOLOGIES INC., CALIFORNIA

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE ERRONEOUSLY LISTED PATENT BY REMOVING APPLICATION NO. 16/832267 FROM THE RELEASE OF SECURITY INTEREST PREVIOUSLY RECORDED ON REEL 052856 FRAME 0382. ASSIGNOR(S) HEREBY CONFIRMS THE RELEASE OF SECURITY INTEREST;ASSIGNOR:ROYAL BANK OF CANADA;REEL/FRAME:057335/0753

Effective date: 20200604

STPP Information on status: patent application and granting procedure in general

Free format text: PRE-INTERVIEW COMMUNICATION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

AS Assignment

Owner name: WELLS FARGO BANK, N.A., NORTH CAROLINA

Free format text: ASSIGNMENT OF INTELLECTUAL PROPERTY SECURITY AGREEMENTS;ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC.;REEL/FRAME:060572/0640

Effective date: 20220701

Owner name: WELLS FARGO BANK, N.A., NORTH CAROLINA

Free format text: SECURITY INTEREST;ASSIGNOR:PALANTIR TECHNOLOGIES INC.;REEL/FRAME:060572/0506

Effective date: 20220701

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE