US20150095290A1 - Method and device for identifying an application type of unknown data - Google Patents

Info

Publication number
US20150095290A1
Authority
US
United States
Prior art keywords
data
column
field
application type
date
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/498,325
Inventor
Jad John Saliba
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Royal Bank of Canada
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US14/498,325
Publication of US20150095290A1
Assigned to MAGNET FORENSICS INC. reassignment MAGNET FORENSICS INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SALIBA, JAD JOHN
Assigned to MAGNET FORENSICS INVESTCO INC. reassignment MAGNET FORENSICS INVESTCO INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MAGNET FORENSICS INC.
Assigned to ROYAL BANK OF CANADA reassignment ROYAL BANK OF CANADA ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MAGNET FORENSICS INVESTCO, INC.
Assigned to ROYAL BANK OF CANADA reassignment ROYAL BANK OF CANADA CORRECTIVE ASSIGNMENT TO CORRECT THE NATURE OF CONVEYANCE PREVIOUSLY RECORDED AT REEL: 057797 FRAME: 0493. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT. Assignors: MAGNET FORENSICS INVESTCO, INC.
Priority to US17/554,581 (US11868212B2)
Assigned to MAGNET FORENSICS INC., MAGNET FORENSICS INVESTCO, INC. reassignment MAGNET FORENSICS INC. RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: ROYAL BANK OF CANADA

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying

Definitions

  • Embodiments herein relate to methods, systems and devices for identifying an application type of unknown data, and in particular to methods, systems and devices for identifying an application type of unknown data stored on a computer readable medium such as, for example, a storage device.
  • Computing devices typically contain one or more computer readable media (e.g., memory, a hard disk drive, or a solid state drive) on which applications can store data.
  • data recovery may be performed when there is a failure of the computer readable media that prevents normal access to the data.
  • Data recovery may also be performed in the context of forensics. For example, a user may attempt to hide, delete, or obfuscate data on a computer readable medium so that the data may not be readily accessible to someone else. This may be particularly the case where the user is undertaking some illicit or otherwise improper activity and does not want such activity to be discovered. In such scenarios, law enforcement authorities or others may be interested to learn about a user's activities on a particular computing device by recovering data on that computing device.
  • data recovery may be performed to allow a party to know how a computing device had been or is being used by a given individual or a group of individuals.
  • employers may be interested to learn how their computer resources are being used by their employees.
  • parents and/or spouses might be interested to know how members of their families are using a computing device.
  • a method of identifying an application type of unknown data comprising: determining that the unknown data corresponds to database information, the database information comprising at least one table with at least one column; for a column of a table in the database information, determining if a column identifier of the column comprises a keyword associated with a particular application type; and if the column identifier comprises the keyword, identifying data stored in the database as belonging to an application that is of the particular application type.
  • the keyword is associated with a data field that is commonly used by an application of the particular application type.
  • the method includes sampling a data record in the table; and determining that data for the column in the data record is consistent with data for the data field that would belong to an application of the particular application type
  • the method includes converting the data in the column in the data record to each of a plurality of date/time formats; comparing the converted data, in each respective date/time format, to each other to determine which converted data is closest to a reference date/time; and for the converted data that is closest to the reference date/time, identifying the date/time format of the converted data as the date/time format of the data in the column of the table.
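  • The date/time format detection described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the candidate formats (Unix seconds, Unix milliseconds, seconds since 2001-01-01, and microseconds since 1601-01-01), the function name `guess_datetime_format`, and the choice of reference date are all assumptions introduced for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical candidate interpretations of a raw numeric column value.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
EPOCH_2001 = datetime(2001, 1, 1, tzinfo=timezone.utc)
EPOCH_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)

CANDIDATE_FORMATS = {
    "unix_seconds": lambda v: UNIX_EPOCH + timedelta(seconds=v),
    "unix_milliseconds": lambda v: UNIX_EPOCH + timedelta(milliseconds=v),
    "seconds_since_2001": lambda v: EPOCH_2001 + timedelta(seconds=v),
    "microseconds_since_1601": lambda v: EPOCH_1601 + timedelta(microseconds=v),
}


def guess_datetime_format(raw_value, reference):
    """Convert raw_value under every candidate format and return the
    (format name, converted datetime) pair closest to the reference."""
    best_name, best_value, best_gap = None, None, None
    for name, convert in CANDIDATE_FORMATS.items():
        try:
            converted = convert(raw_value)
        except (OverflowError, ValueError):
            # The value cannot be represented under this format; skip it.
            continue
        gap = abs(converted - reference)
        if best_gap is None or gap < best_gap:
            best_name, best_value, best_gap = name, converted, gap
    return best_name, best_value


# Example: the reference might be the date on which the analysis is run.
reference = datetime(2014, 9, 26, tzinfo=timezone.utc)
name, value = guess_datetime_format(1411689600, reference)
```

  • Choosing the interpretation closest to a reference date relies on the assumption that recovered records were created reasonably near that date; implausible interpretations (decades away or out of range) are thereby discarded.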
  • the method includes storing a mapping between the data field and the column, the mapping being accessible during recovery of data in the database to indicate that data for the column in the table is associated with the data field.
  • the method includes displaying the mapping between the data field and the column in a user interface, wherein the user interface provides an option to select an alternative column of the table to be mapped to the data field; receiving input indicating that the data field is to be mapped to the alternative column; and storing an updated mapping for the data field, the updated mapping indicating that the data field is mapped to the alternative column.
  • the particular application type can include a messaging application
  • the data field that is commonly used comprises one of: a sender field, a recipient field, a message field, and a timestamp field.
  • the particular application type can include a web browser application
  • the data field that is commonly used comprises one of: an address field, a date field, a bookmark field, and a title field.
  • the particular application type can include a geographic location-enabled application
  • the data field that is commonly used comprises one of: a longitude field, a latitude field, a destination field, a direction field, and a route field.
  • the particular application type comprises a messaging application
  • the keyword comprises one of the following words: message, subject, text, msg, body, content, date, time, timestamp, from, sender, author, uid, member, to, receiver, conversation, recipient, partner, participant, and party.
  • the particular application type comprises a web browser application
  • the keyword comprises one of the following words: address, location, loc, URL, visited, date, bookmark, favorite and title.
  • the particular application type comprises a geographic location-enabled application
  • the keyword comprises one of the following words: coordinate, longitude, latitude, location, loc, home, destination, direction, and route.
  • a computing device comprising a processor and a memory storing instructions which, when executed by the processor, cause the processor to perform the methods described herein.
  • a computer readable medium comprising instructions which, when executed by a processor, cause the processor to perform the methods described herein.
  • the computer readable medium is non-transitory.
  • a device comprising at least one processor adapted to perform any one or more of the methods as described herein.
  • FIG. 1 is a schematic diagram illustrating a computing device for identifying an application type of unknown data stored in a storage device, in accordance with one example embodiment
  • FIG. 2 is a flowchart illustrating a method for identifying an application type of unknown data stored in a storage device, in accordance with one example embodiment
  • FIG. 3 is an exemplary database table containing data that may be stored in a storage device, in accordance with one example embodiment
  • FIG. 4 is a flowchart illustrating a method for updating the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment
  • FIG. 5 is a screenshot of an example user interface that allows updating of the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment
  • FIG. 6 is a screenshot of a data recovery user interface after the updating of the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment.
  • embodiments of the methods described herein may be implemented in hardware or software, or a combination of both.
  • embodiments may be implemented in one or more computer programs executing on one or more programmable computing devices comprising at least one processor (e.g., a microprocessor), a data storage device (including in some cases volatile and non-volatile memory and/or data storage elements), at least one input device, and at least one output device.
  • the programmable computing devices may be a personal computer, laptop, personal data assistant, cellular telephone, smartphone device, tablet computer, and/or wireless device.
  • Program code is applied to input data to perform the functions described herein and generate output information.
  • the output information is applied to one or more output devices.
  • each program may be implemented in a high level procedural or object oriented programming and/or scripting language.
  • the programs can be implemented in assembly or machine language, if desired.
  • the language may be a compiled or interpreted language.
  • the computing devices and methods as described herein may also be implemented as a transitory or non-transitory computer-readable storage medium configured with a computer program, wherein the storage medium so configured causes a computing device to operate in a specific and predefined manner to perform at least some of the functions as described herein.
  • the medium may be provided in various forms, including one or more diskettes, compact disks, tapes, chips, wireline transmissions, satellite transmissions, internet transmission or downloadings, magnetic and electronic storage media, digital and analog signals, and the like.
  • the computer useable instructions may also be in various forms, including compiled and non-compiled code.
  • the subject system may be implemented as one or more software components stored on one or more computer servers that are accessible via one or more client machines in a client-server architecture.
  • the system can be considered to be a hosted software offering or a software service in a software-as-a-service deployment.
  • the embodiments of the present disclosure relate generally to methods of identifying an application type of unknown data that may be encountered during a data recovery process.
  • In traditional data recovery processes, there is typically a catalog of application data that indicates the data format of data stored by a given application on a storage device.
  • this application data is referenced to determine if the unknown data matches the data formats that are indicative of a particular application. If so, the unknown data is processed according to the identified application.
  • Such traditional processes may be inefficient because they require analysis of the data structure stored by an application before data associated with the application can be recovered.
  • applications for electronic devices are being developed at an increasingly fast rate.
  • As mobile device applications have become more popular, the number and variety of applications available to users of mobile devices have expanded dramatically.
  • the ever-growing number of applications makes analyzing the data format of each application difficult. This results in data formats for many applications not being analyzed. If data stored by these applications are encountered during data recovery, such traditional data recovery processes may not be able to adequately recover the data.
  • At least some of the present embodiments may provide a computing device, system or method that allows unknown data (which does not already correspond to a data format of a known application) to still nevertheless be recovered.
  • various embodiments may recognize that even though the particular data format for an unanalyzed application may be unknown, the application type of the unanalyzed application may be identified based on certain characteristics of how unknown data is stored. In particular, some embodiments may recognize that certain keywords may be commonly used by applications of a particular application type as an identifier for a column of a table stored in a database.
  • some embodiments may recognize that the use of these keywords to identify a column may indicate that the data for the column corresponds to a common data field stored by applications of the application type.
  • the keywords “author” or “from” may be commonly used in chat or instant messaging (IM) type applications to identify a “sender” data field for chat messages stored in the application.
  • a computing device 102 may be coupled to a storage device 104 on which the unknown data is stored.
  • the computing device 102 may include a processor 110 , a display 112 , a storage device interface 114 , and a memory 116 .
  • Processor 110 may be configured to perform the steps of the methods described herein. To perform these steps, in various embodiments, the processor 110 may execute instructions stored on memory 116 . For example, the instructions may be stored in the form of an application-type identification module 120 .
  • the application-type identification module 120 may be configured to retrieve keywords from a keyword store 122 , with the keywords being used to analyze the column identifiers of an unknown database 132 stored on the storage device 104 to determine if the data stored therein corresponds to a particular application-type.
  • the keyword store 122 is shown as being also stored on memory 116 . However, it will be understood that the keyword store 122 may be stored separately from the memory 116 (e.g., on a hard disk (not shown) or some other local or remote storage).
  • a mapping may be stored between the column and a data field that is commonly used by applications of the application type. These mappings may be stored in the column/data field mapping store 140 . In some cases, the initial or a previous mapping determined by the presence of the keywords may be subsequently updated through a user interface 142 provided by the application-type identification module 120 .
  • the user interface 142 may be displayed on display 112 .
  • Display 112 may be a suitable display device (e.g. a monitor, screen or touchscreen) coupled to the processor 110 .
  • the user interface 142 may allow the processor 110 to solicit input from a user that may confirm or update the mapping of a column to a data field, as stored in the column/data field mapping store 140 of memory 116. Example screenshots that may be shown in the user interface 142 are illustrated in FIGS. 5 and 6, and will be discussed in greater detail below.
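  • The column/data field mapping store 140 and its update through user input might be modeled as in the sketch below. The class name `ColumnFieldMappingStore`, its methods, and the column name ‘body_html’ are hypothetical; only the idea of storing a mapping and replacing it with a user-selected alternative column comes from the description.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnFieldMappingStore:
    """Hypothetical in-memory stand-in for the mapping store 140.

    Keys are (table name, data field); values are column identifiers.
    """
    mappings: dict = field(default_factory=dict)

    def store(self, table, data_field, column):
        # Initial mapping, e.g. one determined from a keyword match.
        self.mappings[(table, data_field)] = column

    def column_for(self, table, data_field):
        # Looked up during recovery to find which column holds the field.
        return self.mappings.get((table, data_field))

    def update_from_user(self, table, data_field, alternative, available):
        # The user interface offers the table's columns as alternatives;
        # anything else is rejected rather than silently stored.
        if alternative not in available:
            raise ValueError(f"{alternative!r} is not a column of {table!r}")
        self.mappings[(table, data_field)] = alternative


store = ColumnFieldMappingStore()
store.store("Messages", "content", "text")
# 'body_html' is an invented alternative column used only for illustration.
store.update_from_user("Messages", "content", "body_html",
                       available=["Message_id", "text", "body_html"])
```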
  • the storage device 104 may be coupled to computing device 102 through storage device interface 114 .
  • the storage device 104 may have application data stored thereon associated with various known and unknown applications.
  • the storage device 104 may include a file system 130 that contains a number of different files.
  • one or more of the files 130 may correspond to unknown database information stored in a database 132 .
  • At least some of the present embodiments are directed to methods of determining if the data stored within an unknown database 132 includes data that is of a particular application type, e.g., by determining if column identifiers for a database table 134 of the unknown database 132 include certain keywords.
  • a data recovery process may attempt to analyze data that is intended to have been deleted from the storage device 104 (e.g., if data recovery is being performed in a forensics context by law enforcement officers). For example, a user may use the “delete” function of an operating system to delete a file, but such file may nevertheless still be recoverable despite having been “deleted”. This is because many operating systems and/or device driver software may not physically delete the data from the storage device 104 immediately when a command to delete such data is received. Instead, the addresses on the storage device 104 that stores such data may simply be marked as “unallocated” or “available”.
  • Such indications inform the operating system or other applications that these addresses are now available to store other data, so that the old data may subsequently be overwritten and thereby deleted when there are new data stored in such addresses. Since such data may not actually be overwritten, it is possible that the data flagged to be deleted may remain physically undeleted from the storage device 104 for an extended period of time even though it had been requested to be deleted by the application or the user (or both).
  • the computing device 102 may be provided in the form of personal computers, networked computers, portable computers, portable electronic devices, personal digital assistants, laptops, desktops, mobile phones, smart phones, tablets, and so on.
  • the processor 110 may be any type of processor, such as, for example, any type of general-purpose microprocessor or microcontroller, a digital signal processing (DSP) processor, an application-specific integrated circuit (ASIC), a programmable read-only memory (PROM), or any combination thereof.
  • the memory 116 may include any type of computer memory that is located either internally or externally to the computing device 102 such as, for example, random-access memory (RAM), read-only memory (ROM), compact disc read-only memory (CDROM), electro-optical memory, magneto-optical memory, erasable programmable read-only memory (EPROM), electrically-erasable programmable read-only memory (EEPROM), a hard disk drive, a solid-state drive or any other form of suitable computer readable medium that may be used in electronic devices.
  • computing device 102 may include one or more input devices (not shown), such as a keyboard, mouse, camera, touch screen and/or a microphone, and may also include one or more output devices such as a display screen 112 and/or a speaker.
  • Computing device 102 may have a network interface for connecting to a network (not shown) in order to communicate with other components.
  • Although each of data stores 122, 140 is illustrated separately in FIG. 1, they can be stored together as separate tables within the same or multiple databases, locally and/or remotely. Additionally, other persistent storage methods such as encrypted files may also be used to provide persistent storage.
  • the storage device interface 114 of the computing device 102 may be any type of hardware or software interface that allows the computing device to communicate with the storage device 104 .
  • the storage device interface 114 may be one or more of the following interfaces: Parallel AT Attachment (PATA), Serial AT Attachment (SATA), Integrated Drive Electronics (IDE), Enhanced Integrated Drive Electronics (EIDE), Small Computer System Interface (SCSI), Universal Serial Bus (any version), FireWire and/or Thunderbolt.
  • the storage device interface 114 may allow communication with a storage device 104 which is provided remotely (e.g., via Network-Attached Storage (NAS) and/or Storage Area Network (SAN) mechanisms) by acting as a client to a server that provides access to the storage device 104.
  • the storage device 104 on which the unknown data is stored may include any type of the computer readable media that is to be the subject of the data analysis methods described herein, including the types of memory that are listed above as being options for the memory 116 .
  • Referring to FIG. 2, shown there generally as 200 is a method for identifying an application type of unknown data stored in a storage device, in accordance with one example embodiment.
  • the components of the computing device 102 such as the processor 110 , may be configured to execute one or more steps of the method 200 to identify an application type of unknown data.
  • Reference is also made to FIG. 3, in which an example database table is shown generally as 134.
  • the storage device 104 may be formatted according to a known file system that the application type identification module 120 (as shown in FIG. 1 ) is configured to access and conduct a search on.
  • the storage device 104 may have been formatted for use with various operating systems such as Microsoft™ Windows™, Linux™, Apple OS X™, Apple iOS™ and/or Android™, and the file systems that can be processed by the application-type identification module 120 may include the following file systems: File Allocation Table 32 (FAT32), New Technology File System (NTFS), third extended file system (ext3), fourth extended file system (ext4), Hierarchical File System (HFS) and/or Hierarchical File System Plus (HFS+ or HFSX).
  • At step 210, it is determined whether a file has been found. If no file has been found (the ‘NO’ branch at step 210), then there may be no remaining files to be processed by method 200. If at least one of the located files was determined to contain database information and also resulted in at least one mapping between a column and a data field, method 200 may then proceed to display the mapping user interface at step 215.
  • computing device 102 may display the mapping user interface via display 112 (as shown in FIG. 1 ). As will be discussed below, the mapping user interface may allow a user using the computing device 102 to confirm or alter the mapping between a column and a data field determined by the method of FIG. 2 . The steps associated with providing such a user interface are shown in FIG. 4 .
  • If no mapping between a column and a data field was determined, step 215 may not be performed. The process may then end and a message may be displayed that indicates that the method of FIG. 2 was not able to identify the application type of the data stored on the storage device 104.
  • the computing device 102 may open the file for read access and the data in the file may be read to attempt to determine if the file includes database information (step 220 ).
  • the information indicating that a file is a database may not necessarily be within the header portion of a file and instead, could be located in other parts of the data associated with the file.
  • the file extension of a file may indicate that a file is of a particular database type (e.g., the file extensions “.mdb”, “.mda”, or “.accdb” may indicate that the given file is a Microsoft™ Access™ database).
  • the determination made at step 225 may identify database information that does not necessarily include the entire contents of a given database.
  • the database information may include any portion of the contents of a database (e.g., as may be the case if the remaining contents of the database have already been overwritten or are otherwise unavailable).
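  • A simple check for database information, combining a header signature with the file extension, might look like the sketch below. The SQLite signature is an added assumption (the description names only Microsoft Access extensions), and the function name `looks_like_database` is hypothetical. The sketch inspects only the leading bytes, whereas the description notes that identifying information may also appear elsewhere in the file.

```python
import os

# First 16 bytes of every SQLite 3 database file (assumed example format).
SQLITE_MAGIC = b"SQLite format 3\x00"

# Extensions named in the description as indicating a Microsoft Access database.
DATABASE_EXTENSIONS = {".mdb", ".mda", ".accdb"}


def looks_like_database(filename, leading_bytes):
    """Return True if the header bytes or the extension suggest a database."""
    if leading_bytes.startswith(SQLITE_MAGIC):
        return True
    extension = os.path.splitext(filename)[1].lower()
    return extension in DATABASE_EXTENSIONS


# A file with an uninformative name can still be recognized by its header.
header = SQLITE_MAGIC + b"\x00" * 84
is_database = looks_like_database("chat.bin", header)
```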
  • If the file is determined not to include database information (the ‘NO’ branch at step 225), then method 200 returns to step 205 where it may again begin the process of determining whether there are any files remaining to be processed.
  • the application-type identification module 120 may also store data to a report database (not shown) indicating that the previous file was processed and had been identified as not containing a database.
  • If the file is determined to include database information (the ‘YES’ branch at step 225), then the tables accessible within the database are processed and method 200 proceeds to step 230.
  • the method 200 determines if there are any tables that are still unprocessed in the database. If there are not (the ‘NO’ branch at step 230 ), then the method 200 returns to step 205 where it may again begin the process of determining whether there are any files remaining on the storage device 104 to be processed. If there are still tables to be processed (the ‘YES’ branch at step 230 ), then the next table in the database information is read at step 235 .
  • At step 240, it is determined if there are any columns of the table that are still unprocessed. If there are not (the ‘NO’ branch at step 240), then the table has been processed and method 200 returns to step 230 to determine whether the database information contains any further tables that are still unprocessed.
  • the column identifier of the next column is read at step 245 .
  • the column identifier may be the name of the column used in a database.
  • the column identifier information may be retrieved from the schema of the database.
  • the column identifier may be found in header data of the database table; however, the column identifier could also be located in other parts of the data associated with the column.
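  • Retrieving column identifiers from a database schema can be illustrated with SQLite, which is used here purely as an assumed example engine. The sketch reads table names from `sqlite_master` and column names via `PRAGMA table_info`; the function name `read_column_identifiers` is hypothetical.

```python
import sqlite3


def read_column_identifiers(connection):
    """Return {table name: [column names]} read from the SQLite schema."""
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    identifiers = {}
    for (table_name,) in tables:
        # Quote the identifier so unusual table names do not break the PRAGMA.
        quoted = '"' + table_name.replace('"', '""') + '"'
        rows = connection.execute(f"PRAGMA table_info({quoted})").fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        identifiers[table_name] = [row[1] for row in rows]
    return identifiers


# Example with an in-memory table loosely modeled on FIG. 3.
connection = sqlite3.connect(":memory:")
connection.execute('CREATE TABLE "Messages" (Message_id INTEGER, text TEXT)')
columns = read_column_identifiers(connection)
```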
  • At step 250, it is determined whether the column identifier read at step 245 includes a keyword associated with a particular application type. For example, in the case where the column identifier is a column name, this step may involve determining if the column names include keywords that are commonly used as column names by applications of the particular application type. The presence of such keywords in the column name may be taken as an indication that the database belongs to an application of that particular application type.
  • the present embodiments may be able to determine that unknown data is of an application type that includes, without limitation: a chat or instant messaging application type, a web browser application type, a navigation/geo-location application type, a file sharing application type, a social networking application type, a cloud application type, and an email application type.
  • keywords that may be used to identify data as belonging to an application of a chat or instant messaging application type may include the words: ‘message’, ‘subject’, ‘text’, ‘msg’, ‘body’, ‘content’, ‘date’, ‘time’, ‘timestamp’, ‘from’, ‘sender’, ‘author’, ‘uid’, ‘member’, ‘to’, ‘receiver’, ‘conversation’, ‘recipient’, ‘partner’, ‘participant’, and ‘party’.
  • keywords that may be used to identify data as belonging to an application of a web browser type may include the words: ‘address’, ‘location’, ‘loc’, ‘URL’, ‘visited’, ‘date’, ‘bookmark’, ‘favorite’ and ‘title’.
  • keywords that may be used to identify data as belonging to an application of a navigation/geographic-location application type include: ‘coordinate’, ‘longitude’, ‘latitude’, ‘location’, ‘loc’, ‘home’, ‘destination’, ‘direction’, and ‘route’.
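  • The keyword lists above can be expressed as a lookup that returns the application types suggested by a column identifier. In this sketch the identifier is split into tokens (on non-alphanumeric characters and camelCase boundaries) before matching; that tokenization is a design assumption made here to keep short keywords such as ‘to’ from matching inside unrelated words, and is not stated in the description. The names `KEYWORDS_BY_APPLICATION_TYPE` and `matching_application_types` are hypothetical.

```python
import re

# Keyword lists taken from the description, lowercased for comparison.
KEYWORDS_BY_APPLICATION_TYPE = {
    "messaging": {
        "message", "subject", "text", "msg", "body", "content", "date",
        "time", "timestamp", "from", "sender", "author", "uid", "member",
        "to", "receiver", "conversation", "recipient", "partner",
        "participant", "party",
    },
    "web_browser": {
        "address", "location", "loc", "url", "visited", "date",
        "bookmark", "favorite", "title",
    },
    "geo_location": {
        "coordinate", "longitude", "latitude", "location", "loc", "home",
        "destination", "direction", "route",
    },
}


def tokenize_identifier(column_identifier):
    """Split 'Message_id' or 'lastVisitedDate' into lowercase tokens."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", column_identifier)
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t}


def matching_application_types(column_identifier):
    """Return the set of application types whose keywords appear as tokens."""
    tokens = tokenize_identifier(column_identifier)
    return {
        app_type
        for app_type, keywords in KEYWORDS_BY_APPLICATION_TYPE.items()
        if tokens & keywords
    }


types_for_message_id = matching_application_types("Message_id")
```

  • Some keywords (for example ‘date’, ‘location’ and ‘loc’) appear in more than one list, so a single column identifier may suggest several application types; how such ties are resolved is left open here.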
  • Data table 134 includes a table identifier 302 (e.g., a table name such as ‘Messages’ or some other alphanumeric identifier), and a series of columns with respective column identifiers 312 , 314 , 316 , 318 , 320 .
  • a particular column with column identifier ‘Message_id’ 312 may be the column being processed.
  • If the column identifier does not include such a keyword (the ‘NO’ branch at step 250), control returns to step 240 where it is again determined whether there are any columns in the table that are still unprocessed.
  • If it is determined that a column identifier does include a keyword associated with an application type (the ‘YES’ branch at step 250), then the process has determined that the file includes database information from a particular application type.
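  • The nested file, table and column loops of method 200 can be summarized in a short sketch. The input structure (a dictionary of files, each either `None` for "not a database" or a dictionary of tables to column identifiers) and the names `run_method_200` and `keyword_lookup` are assumptions made for illustration; the step numbers in the comments refer to the steps named in the description.

```python
def run_method_200(files, keyword_lookup):
    """Walk files -> tables -> columns and collect keyword matches.

    files: {file name: None, or {table name: [column identifiers]}}
    keyword_lookup: callable returning an application type or None
    """
    findings = []
    for file_name, database in files.items():          # steps 205/210: next file
        if database is None:                           # step 225, 'NO' branch
            continue
        for table_name, columns in database.items():   # steps 230/235: next table
            for column in columns:                     # steps 240/245: next column
                app_type = keyword_lookup(column)      # step 250: keyword check
                if app_type is not None:               # step 250, 'YES' branch
                    findings.append((file_name, table_name, column, app_type))
    return findings


# Hypothetical inputs: one non-database file and one database with one table.
example_files = {
    "photo.jpg": None,
    "store.db": {"Messages": ["Message_id", "text"]},
}
lookup = lambda column: "messaging" if column == "text" else None
found = run_method_200(example_files, lookup)
```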
  • a given keyword used to identify data stored in a database as belonging to an application may also be associated with a data field that is commonly used by an application of the particular application type.
  • the keywords ‘message’, ‘subject’, ‘text’, ‘msg’, ‘body’ and ‘content’ may all be commonly used to identify a data field for the substance or “content” of the message in an application of a Chat/IM application type.
  • the ‘date’, ‘time’, and ‘timestamp’ keywords may be commonly used to identify a date/time field for the date and time of a message in an application of a Chat/IM application type.
  • the keywords ‘from’, ‘sender’, ‘author’, ‘uid’, and ‘member’ may be commonly used to identify a data field for the sender of a message in an application of a Chat/IM application type.
  • the keywords ‘to’, ‘receiver’, ‘conversation’, ‘recipient’, ‘partner’, ‘participant’, and ‘party’ may be commonly used to identify a data field for a recipient of a message in an application of a Chat/IM application type.
  • keywords may be associated with other data fields, depending on the nature of the application type that is being determined.
  • the keywords ‘address’, ‘location’, ‘URL’, or ‘visited’ may all be commonly used to identify a data field for the address field in a web browser application.
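The keyword-to-data-field association described above for the Chat/IM application type can be sketched as a lookup table. This is a minimal illustration only: the names `CHAT_IM_FIELD_KEYWORDS` and `match_data_field` are hypothetical, and matching on whole tokens of the column identifier (rather than on any substring) is an assumption made here so that a short keyword such as ‘to’ does not match inside an unrelated word.

```python
import re

# Keyword lists taken from the description for the Chat/IM application type.
CHAT_IM_FIELD_KEYWORDS = {
    "content": ["message", "subject", "text", "msg", "body", "content"],
    "date/time": ["date", "time", "timestamp"],
    "sender": ["from", "sender", "author", "uid", "member"],
    "recipient": ["to", "receiver", "conversation", "recipient",
                  "partner", "participant", "party"],
}


def match_data_field(column_identifier, field_keywords=CHAT_IM_FIELD_KEYWORDS):
    """Return the data field whose keyword appears in the identifier, or None."""
    # Assumption: the identifier is split on non-alphanumeric characters and
    # each token is compared with the keywords as a whole word.
    tokens = [t for t in re.split(r"[^a-z0-9]+", column_identifier.lower()) if t]
    for data_field, keywords in field_keywords.items():
        if any(token in keywords for token in tokens):
            return data_field
    return None
```

With this sketch, the column identifier ‘Message_id’ maps to the ‘content’ data field, which is exactly the kind of preliminary conclusion that the optional data-sampling steps are meant to double-check.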
  • a data record from the table may be sampled to determine if the data for that column in the data record is consistent with data that would belong to that data field (e.g., data generated by, or otherwise associated with, an application of the application type).
  • This may be performed in a number of ways. For example, in the case where the keyword corresponds to a data field that is supposed to contain the “content” of a message, the data for the data column in the data record may be compared to words in a dictionary that have been previously recognized as commonly being present in the content of a message.
  • these words may include ‘hello’, ‘hi’, ‘hey’, ‘bye’, ‘see’, ‘you’, ‘soon’ and/or ‘later’.
  • a regular expression can be created to recognize a string of text or numeric values as potentially being data of the given data field (e.g., a potential GPS coordinate).
  • heuristics may be developed based on historical experience of what data for a given data field contains, and these heuristics may be used to confirm that data for a given column appears as expected. It will be understood that various other ways of performing this step may be possible.
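The regular-expression check mentioned above for a potential GPS coordinate could look roughly like the following. The pattern, the decimal "latitude, longitude" layout, and the name `looks_like_coordinate` are assumptions made for illustration; the description does not specify a particular expression or coordinate notation.

```python
import re

# Assumed layout: two signed decimal numbers separated by a comma,
# e.g. "43.4643, -80.5204" (latitude first, then longitude).
COORDINATE_RE = re.compile(
    r"^\s*(-?\d{1,3}(?:\.\d+)?)\s*,\s*(-?\d{1,3}(?:\.\d+)?)\s*$"
)


def looks_like_coordinate(value):
    """Return True if the sampled value resembles a latitude/longitude pair."""
    match = COORDINATE_RE.match(str(value))
    if match is None:
        return False
    latitude = float(match.group(1))
    longitude = float(match.group(2))
    # Range check: a pattern match alone would accept impossible values.
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0
```

The range check after the pattern match is a simple example of the kind of heuristic that can be layered on top of a regular expression.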
  • Data record 360 stored in data table 134 may be sampled to determine whether the data in a particular column of data record 360 , such as data 368 for the column with column identifier ‘text’ 318 , is consistent with data that is expected for the data field that the column has mapped to (i.e., the data field for the ‘content’ of the message). As shown, such data 368 includes the text “hey hey hey”.
  • the data 368 is consistent with that which is expected for the data field for the “content” of the message (e.g., as that which would have been generated by an application belonging to the Chat/IM application type).
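The dictionary comparison for a ‘content’ data field can be sketched as follows, using the example words listed above. The function name `looks_like_message_content` and the rule that a single dictionary word is sufficient are assumptions; an actual embodiment may use a larger dictionary or a different threshold.

```python
import re

# Example dictionary words given in the description.
MESSAGE_WORDS = {"hello", "hi", "hey", "bye", "see", "you", "soon", "later"}


def looks_like_message_content(value):
    """Return True if the sampled value contains at least one dictionary word."""
    words = re.findall(r"[a-z']+", str(value).lower())
    # Assumption: one recognized word is enough to confirm the mapping.
    return any(word in MESSAGE_WORDS for word in words)
```

Applied to the sampled data 368 (“hey hey hey”), the check succeeds; applied to a purely numeric value such as an identifier, it does not.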
  • the date/time format for the data of the column may be determined. For example, this may be performed by converting the date/time value to various formats and performing boundary checks to identify the particular date/time format that the data in the column is most likely to be formatted in. Specifically, this may involve converting the data in the column in the sampled data record to each of a plurality of date/time formats; comparing the converted data, in each respective date/time format, to each other to determine which converted data is closest to a reference date/time; and for the converted data that is closest to the reference date/time, identifying the date/time format of the converted data as the date/time format of the data in the column of the table.
  • the reference date/time may be the present date/time (e.g., the date/time when the data recovery process is being performed). Additionally or alternatively, the reference date/time may be a predefined date/time of a particular event (e.g., if data recovery is being performed for forensics purposes, the date/time of a criminal activity such as a murder).
  • only the post-conversion date/time data that is within a specific date/time window may be used to compare with each other. This may reduce the number of comparisons that need to be performed if it is known that it is unlikely that the date/time data that is being sampled will be beyond the specific date/time window. For example, if data converted to a given date/time format results in data that is beyond the specific date/time window (e.g., earlier or later than the window), then it can be determined that such given date/time format is unlikely to be the correct date/time format that the data is actually formatted in.
  • a specific date/time window (e.g., +/−7 years of the reference date/time)
  • any date/time format may be supported.
  • some example date/time formats that may be supported include: Unix epoch time—seconds, Unix epoch time—milliseconds, PRTime, Mac Absolute Time, and/or Chrome/webkit time.
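The format-detection procedure described above (convert the raw value under each candidate format, discard results outside the window, and keep the result closest to the reference date/time) can be sketched as below. The epochs and units used for each format are the commonly documented ones (Unix time from 1970, PRTime in microseconds from 1970, Mac Absolute Time in seconds from 2001, Chrome/WebKit time in microseconds from 1601); the names `FORMATS` and `guess_datetime_format` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAC_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

# Each candidate format: (epoch, factor that converts the raw value to seconds).
FORMATS = {
    "Unix epoch time (seconds)": (UNIX_EPOCH, 1.0),
    "Unix epoch time (milliseconds)": (UNIX_EPOCH, 1e-3),
    "PRTime": (UNIX_EPOCH, 1e-6),
    "Mac Absolute Time": (MAC_EPOCH, 1.0),
    "Chrome/WebKit time": (WEBKIT_EPOCH, 1e-6),
}


def guess_datetime_format(raw_value, reference, window_years=7):
    """Return the name of the format whose conversion is closest to reference."""
    window = timedelta(days=365.25 * window_years)
    best_name, best_distance = None, None
    for name, (epoch, to_seconds) in FORMATS.items():
        try:
            candidate = epoch + timedelta(seconds=raw_value * to_seconds)
        except OverflowError:
            continue  # Not representable as a date, so this format is rejected.
        distance = abs(candidate - reference)
        if distance > window:
            continue  # Boundary check: outside the +/- window.
        if best_distance is None or distance < best_distance:
            best_name, best_distance = name, distance
    return best_name
```

For example, with a reference date/time of 1 January 2014, a raw value of 1380240000000000 is only plausible as PRTime, because every other interpretation lands centuries away from the reference or cannot be represented at all.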
  • method 200 returns to step 240 where it is again determined whether there are any columns remaining in the table that are still unprocessed.
  • steps 255 to 260 are optional in that they need not be performed. When they are performed, however, the acts may provide a confirmation that the column with the column identifier having a keyword does in fact correspond to the data field associated with the keyword. In this way, steps 255 and 260 may be considered a “sanity check” that verifies the conclusion arrived at in step 250 .
  • there are several columns illustrated for which the performance of steps 255 to 260 may result in the conclusion arrived at in step 250 not being confirmed.
  • the determination at step 250 may have been that because the column identifier contains the keyword ‘message’, that the column corresponds to the data field for the “content” of a message, as would be generated by an application belonging to the Chat/IM application type.
  • it may be determined that the data 362 for that column does not contain any of the words in the dictionary that have been previously recognized as indicating that the data constitutes the “content” of a message.
  • steps 255 to 260 may reduce the likelihood of erroneous mappings that are determined based on the results of step 250 alone.
  • at step 265 , the data stored in the database is identified as belonging to an application of the particular application type identified in step 250 .
  • Step 265 may involve the application-type identification module 120 (as shown in FIG. 1 ) storing information indicating that the unknown data encountered on the storage device 104 belongs to the particular application type for which the unknown data was analyzed (e.g., a Chat/IM application type).
  • application-type identification module 120 may attempt to identify an application identifier (e.g., the name of the application). For example, this may be performed by using operating system application manifests (e.g., as may be separately found on file system 130 of storage device 104 , apart from the database 132 ), or via the text found in file path location (e.g., such text may be found in the file path of where the database 132 is located on the file system 130 of storage device 104 ). If the application name is available, when performing step 265 in FIG. 2 , an association between the name of the application and the application type as identified by the method of FIG. 2 may be stored.
  • a mapping may be stored between the column being processed and the commonly used data field that the sampled data of the column was determined to be consistent with in step 260 .
  • the mapping may be stored in column/data field mapping store 140 .
  • the mappings may, for example, be subsequently referenced when recovering data from the unknown database 132 .
  • the mappings may also be subsequently used when recovering data from another storage device 104 containing unknown data, so that if similar database information is encountered, the mappings can be referenced to identify the type of data that is stored in the database.
  • after step 270 , method 200 returns to step 240 to determine whether there are any columns still unprocessed. If method 200 determines that there are no columns still unprocessed, and no tables still unprocessed at step 230 , it will return to step 205 where it will continue to search for files. If no files are found in step 210 , method 200 may proceed to step 215 and display a mapping user interface that may allow updating of the mapping between a column and a given data field.
  • the mapping user interface may display a list of the located database tables and mappings of columns of such tables to the commonly used data fields for a given application type, so as to allow user input for final verification or remapping if necessary. A method and user interface for performing such remapping or verification is discussed below with respect to FIGS. 4 and 5 .
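The loop over tables and columns in method 200 can be illustrated with a short enumeration routine. This sketch assumes that the database information is an SQLite database, which this passage does not state; the function names `iter_table_columns` and `demo` are hypothetical, and the example table simply mirrors the column identifiers of data table 134 .

```python
import sqlite3


def iter_table_columns(connection):
    """Yield (table identifier, column identifier) pairs for every table."""
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # Quote the identifier so unusual table names remain valid SQL.
        quoted = '"' + table_name.replace('"', '""') + '"'
        for row in connection.execute(f"PRAGMA table_info({quoted})"):
            yield table_name, row[1]  # row[1] holds the column name


def demo():
    """Build an in-memory table resembling data table 134 and list its columns."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute(
            "CREATE TABLE Messages (Message_id INTEGER, author_id INTEGER, "
            "conversation_id INTEGER, text TEXT, timestamp INTEGER)"
        )
        return list(iter_table_columns(connection))
    finally:
        connection.close()
```

Each pair produced by the generator corresponds to one pass through the column-processing steps, after which control returns to check for remaining columns and tables.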
  • step 265 (to identify data stored in the database as belonging to an application of the particular application type) may be performed immediately after it has been determined at step 250 that the column identifier includes a keyword associated with the application type, and before a data record is sampled at steps 255 to 260 .
  • some of the steps of method 200 may be executed in parallel.
  • Parallel execution of some steps may be desirable in systems that have more than one processor or a processor that has more than one processing core.
  • one or more cores may be focused on executing step 250 to identify whether a column identifier contains a keyword associated with an application type, and one or more other cores may be focused on sampling a data record from the table to determine whether the data is consistent with data that would be generated by an application of the application type.
  • Parallel execution may also allow the computing device 102 to process more than one table or more than one column simultaneously.
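Processing more than one column at a time can be sketched with a worker pool. This is only a structural illustration: the keyword subset, the function names, and the use of a thread pool are assumptions, and in CPython a thread pool shows the shape of the approach rather than guaranteeing that work is spread across processor cores.

```python
from concurrent.futures import ThreadPoolExecutor

# Small, illustrative subset of the Chat/IM keywords.
KEYWORDS = {"message", "text", "timestamp", "author", "conversation"}


def column_has_keyword(column_identifier):
    """Step-250-style check: does any token of the identifier match a keyword?"""
    tokens = column_identifier.lower().split("_")
    return any(token in KEYWORDS for token in tokens)


def check_columns_in_parallel(column_identifiers, max_workers=4):
    """Run the keyword check for several columns concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with identifiers.
        results = list(pool.map(column_has_keyword, column_identifiers))
    return dict(zip(column_identifiers, results))
```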
  • method 200 may also include a step of checking a reference database (e.g., the column/data field mapping store 140 shown in FIG. 1 ) containing mappings and database information from previous executions of the method of FIG. 2 .
  • the previously stored mapping may be applied and the data from the database being processed may be automatically recovered to be presented later.
  • the method may return directly to step 205 to search for any remaining files that need to be processed, and the mapping need not be presented to the user for verification in accordance with method 400 .
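Checking a reference database for a previously stored mapping can be sketched as a keyed lookup. The class `MappingStore`, the helper `schema_key`, and the choice of keying on the table identifier plus the set of column identifiers are all assumptions made for illustration; the description does not say how similar database information is recognized.

```python
def schema_key(table_identifier, column_identifiers):
    """Build a case-insensitive key from a table's name and its column names."""
    return (
        table_identifier.lower(),
        frozenset(column.lower() for column in column_identifiers),
    )


class MappingStore:
    """In-memory stand-in for the column/data field mapping store."""

    def __init__(self):
        self._mappings = {}

    def save(self, table_identifier, column_identifiers, mapping):
        key = schema_key(table_identifier, column_identifiers)
        self._mappings[key] = dict(mapping)

    def lookup(self, table_identifier, column_identifiers):
        """Return a copy of the stored mapping, or None if none exists."""
        found = self._mappings.get(schema_key(table_identifier, column_identifiers))
        return dict(found) if found is not None else None
```

When `lookup` returns a mapping, it can be applied directly and the keyword and sampling steps skipped for that table; when it returns None, the table is processed as usual.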
  • Referring to FIG. 4 , shown there generally as 400 is a method for updating a mapping between a data field and a column, in accordance with one example embodiment.
  • FIG. 5 shows, generally as 500 , a screenshot of an example user interface that allows updating of the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment.
  • the components of the computing device 102 such as the processor 110 and the display 112 , may be configured to execute one or more steps of the method 400 .
  • method 400 may be initiated at step 215 of the method 200 of FIG. 2 where a mapping user interface is displayed.
  • the method 400 starts at step 405 where a mapping between a data field and a column of a table is displayed in a user interface.
  • the mapping may be retrieved from the column/data field mapping store 140 shown in FIG. 1 .
  • the mapping may have been stored as a result of step 270 of method 200 in FIG. 2 .
  • the user interface 500 is for an execution of method 200 of FIG. 2 that attempts to determine if unknown data corresponds to a Chat/IM application type. Accordingly, the user interface provides a number of data fields that are commonly used by an application of a Chat/IM application type. Specifically, there is a ‘sender’ data field 514 (shown in FIG. 5 with the text “Identified Sender Column”), a ‘recipient’ data field 516 (shown in FIG. 5 with the text “Identified Recipient Column”), a ‘content’ data field 518 (shown in FIG. 5 with the text “Identified Message Column”), and a ‘date/time’ data field 520 (shown in FIG. 5 with the text “Identified Date Column”).
  • the user interface 500 of FIG. 5 shows, as rows, each of the tables 134 that have been processed in FIG. 2 .
  • the user interface shows the column identifier of a column in the table that has been determined to be mapped to a particular data field 514 , 516 , 518 , 520 for the application type.
  • the user interface 500 shows information for the database table 134 illustrated in FIG. 3 . Referring simultaneously to FIG. 3 , it can be seen that the mapping (e.g., as may have been stored at step 270 in FIG. 2 ) has determined that the column with column identifier ‘author_id’ 314 corresponds to the ‘sender’ data field 514 .
  • the column with column identifier ‘conversation_id’ 316 has been determined to correspond to the ‘recipient’ data field 516
  • the column with column identifier ‘timestamp’ 320 has been determined to correspond to the ‘date/time’ data field 520
  • the column with the column identifier ‘text’ 318 has been determined to correspond to the ‘content’ data field 518 .
  • the user interface 500 may also display the determined application identifier 530 for a given table 134 (e.g., “com.google.android.apps.plus”) and the table identifier 535 for a given table 134 (e.g., the table name “messages” 302 ) if such information is available and has been determined.
  • the determined date/time format of the data may also be shown in the user interface 500 . As illustrated, this is provided as an additional column 550 positioned beside the ‘date/time’ field 520 .
  • the user interface 500 shows the data for the column with the column identifier ‘timestamp’ 320 as being determined to be of a ‘PRTime’ date/time format 555 .
  • user interface 500 has a preview section 560 that displays how the data from a data record of the table would be presented, based on the mappings.
  • the data record 360 illustrated in FIG. 3 may be shown in the preview section 560 according to the mappings illustrated for the database table 134 .
  • the data 364 for the ‘Author_id’ column 314 is provided under ‘From’ in the preview section 560 because the ‘Author_id’ column 314 is mapped to the ‘sender’ data field 514 .
  • the data 366 for the ‘Conversation_id’ column 316 is provided under ‘To’ in the preview section 560 because the ‘Conversation_id’ column 316 is mapped to the ‘recipient’ data field 516 .
  • the data 368 for the ‘text’ column 318 is provided under ‘Message’ in the preview section 560 because the ‘text’ column 318 is mapped to the ‘content’ data field 518
  • the data 370 for the ‘timestamp’ column 320 is provided under ‘Date/Time’ in the preview section 560 because the ‘timestamp’ column 320 is mapped to the ‘date/time’ data field 520 .
  • the preview section 560 may be configured to display the data according to the date/time format that has been determined in the method of FIG. 2 above. For example, as illustrated, because the date/time format of the date/time data 370 has been determined to be ‘PRTime’ (e.g., as illustrated at 555 of user interface 500 ), the date/time shown would be the date/time data 370 after it has been converted to the ‘PRTime’ format. If the resultant post-conversion date/time data appears to be incorrect in the preview section 560 to a user, user input may be received via the user interface control 555 (e.g., the indicated combo box may be selected), and an alternative date/time format may be chosen. The preview section 560 may then be updated to display the date/time data formatted according to the alternative date/time format.
  • at step 410 , input from the user interface may be received indicating that a data field is to be mapped to an alternative column of the table.
  • the various mappings of the column identifiers 314 , 316 , 318 , 320 of data table 134 are provided within drop-down controls that may receive such input selecting an alternative column of the table to map to the particular data field 514 , 516 , 518 , 520 .
  • the list within the drop-down box may be populated with all the column identifiers of the other columns of the table, so that other column identifiers for the table may be selected to be mapped to a given data field 514 , 516 , 518 , 520 .
  • the drop-down boxes may be activated if a user determines that a given mapping is incorrect. For example, upon seeing that the ‘recipient’ data field 516 being mapped to the column with the ‘conversation_id’ column identifier results in the number ‘1’ being identified as a recipient of a message in a Chat/IM application (e.g., as is illustrated under ‘To’ in the preview section 560 ), the user may select an alternative column to be mapped to the ‘recipient’ data field 516 .
  • in response to receiving user input indicating an alternative mapping, the preview section 560 may be updated to correspond to the alternative mapping. In some instances the preview section 560 may be updated to allow the user to consider whether the selected mapping is appropriate, or whether the mapping should be changed again.
  • an updated mapping for the data field to the alternative column may be stored.
  • the processor 110 may update the mapping stored in column/data field mapping store 140 to correspond to the alternative column.
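The remapping performed by method 400 can be sketched as a small update followed by a regenerated preview. The function names `remap_field` and `preview` are hypothetical, and apart from the ‘text’ value “hey hey hey” and the ‘conversation_id’ value 1, the record values used in any example are placeholders rather than data from FIG. 3 .

```python
def remap_field(mapping, data_field, alternative_column, table_columns):
    """Return an updated mapping in which data_field points at another column."""
    if alternative_column not in table_columns:
        raise ValueError(f"unknown column: {alternative_column!r}")
    updated = dict(mapping)  # leave the caller's mapping untouched
    updated[data_field] = alternative_column
    return updated


def preview(record, mapping):
    """Show a data record as it would appear under the given mapping."""
    return {data_field: record.get(column) for data_field, column in mapping.items()}
```

A user who sees the value 1 under ‘To’ could, for instance, remap the ‘recipient’ data field to a different column and inspect the new preview before the updated mapping is stored.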
  • method 400 may be executed multiple times for an application before the user has determined the appropriate mapping for that application.
  • illustrated therein generally as 600 is a screenshot of a data recovery user interface after the updating of the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment.
  • the computing device 102 may attempt to recover the data from the respective unknown databases, using the stored mappings. For example, the computing device 102 may generate a report or case file that contains the results of the data extracted using the stored mappings.
  • the data recovery user interface 600 may be displayed after the mapping has been updated according to the method 400 of FIG. 4 . Additionally or alternatively, the data recovery interface 600 may also be presented to a user immediately after the initial mappings are stored upon the completion of the method 200 of FIG. 2 .
  • a number of different tables 134 that have been analyzed may be selected in the left-hand pane.
  • the records for that table can then be populated into the top-right pane (e.g., as illustrated, there are 43 data records within the table).
  • the data within each of the data records are then displayed in accordance with the mapping determined by method 200 of FIG. 2 and/or method 400 of FIG. 4 .
  • the data contained therein for each data record are shown under each respective data field 514 , 516 , 518 , 520 .
  • the user interface 600 also includes a preview section 660 which displays a preview of how the data of a selected data record 360 would appear based on the mappings.
  • the example data record 360 shown in FIG. 3 is again shown.
  • the data 364 for the ‘Author_id’ column 314 is shown as being associated with the ‘sender’ data field 514 .
  • the data 366 for the ‘Conversation_id’ column is shown as being associated with the ‘recipient’ data field 516 .
  • the data 368 for the ‘text’ column 318 is shown as being associated with the ‘content’ data field 518
  • the data 370 for the ‘timestamp’ column 320 is shown as being associated with the ‘date/time’ data field 520 .
  • the preview section 660 may also provide an indication of the table identifier 302 (e.g., the table name ‘messages’).
  • X and/or Y is intended to mean X or Y or both.
  • X, Y, and/or Z is intended to mean X or Y or Z or any combination thereof.

Abstract

The present embodiments relate generally to a computer device, system and method of identifying an application type of unknown data. The method may include: determining that the unknown data corresponds to database information, the database information comprising at least one table with at least one column; for a column of a table in the database information, determining if a column identifier of the column comprises a keyword associated with a particular application type; and if the column identifier comprises the keyword, identifying data stored in the database as belonging to an application that is of the particular application type.

Description

    RELATED APPLICATIONS
  • This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/883,279, filed Sep. 27, 2013, the entire contents of which are hereby incorporated by reference herein for all purposes.
    FIELD
  • Embodiments herein relate to methods, systems and devices for identifying an application type of unknown data, and in particular to methods, systems and devices for identifying an application type of unknown data stored on a computer readable medium such as, for example, a storage device.
    INTRODUCTION
  • Computing devices (e.g., desktop or laptop computers, and mobile devices such as smartphones or tablet computers) typically contain one or more computer readable media (e.g., memory, a hard disk drive, or a solid state drive) on which applications can store data.
  • In various situations, it may be desirable to recover data from the computer readable media. For example, data recovery may be performed when there is a failure of the computer readable media that prevents normal access to the data.
  • Data recovery may also be performed in the context of forensics. For example, a user may attempt to hide, delete, or obfuscate data on a computer readable medium so that the data may not be readily accessible to someone else. This may be particularly the case where the user is undertaking some illicit or otherwise improper activity and does not want such activity to be discovered. In such scenarios, law enforcement authorities or others may be interested to learn about a user's activities on a particular computing device by recovering data on that computing device.
  • In a further example, data recovery may be performed to allow a party to know how a computing device had been or is being used by a given individual or a group of individuals. For example, employers may be interested to learn how their computer resources are being used by their employees. Similarly, parents and/or spouses might be interested to know how members of their families are using a computing device.
    SUMMARY OF SOME EMBODIMENTS
  • According to some aspects, there is provided a method of identifying an application type of unknown data, the method comprising: determining that the unknown data corresponds to database information, the database information comprising at least one table with at least one column; for a column of a table in the database information, determining if a column identifier of the column comprises a keyword associated with a particular application type; and if the column identifier comprises the keyword, identifying data stored in the database as belonging to an application that is of the particular application type.
  • In various embodiments, the keyword is associated with a data field that is commonly used by an application of the particular application type.
  • In various embodiments, the method includes sampling a data record in the table; and determining that data for the column in the data record is consistent with data for the data field that would belong to an application of the particular application type.
  • In various embodiments, the method includes converting the data in the column in the data record to each of a plurality of date/time formats; comparing the converted data, in each respective date/time format, to each other to determine which converted data is closest to a reference date/time; and for the converted data that is closest to the reference date/time, identifying the date/time format of the converted data as the date/time format of the data in the column of the table.
  • In various embodiments, the method includes storing a mapping between the data field and the column, the mapping being accessible during recovery of data in the database to indicate that data for the column in the table is associated with the data field.
  • In various embodiments, the method includes displaying the mapping between the data field and the column in a user interface, wherein the user interface provides an option to select an alternative column of the table to be mapped to the data field; receiving input indicating that the data field is to be mapped to the alternative column; and storing an updated mapping for the data field, the updated mapping indicating that the data field is mapped to the alternative column.
  • In various embodiments, the particular application type can include a messaging application, and the data field that is commonly used comprises one of: a sender field, a recipient field, a message field, and a timestamp field.
  • In various embodiments, the particular application type can include a web browser application, and the data field that is commonly used comprises one of: an address field, a date field, a bookmark field, and a title field.
  • In various embodiments, the particular application type can include a geographic location-enabled application, and the data field that is commonly used comprises one of: a longitude field, a latitude field, a destination field, a direction field, and a route field.
  • In various embodiments, the particular application type comprises a messaging application, and the keyword comprises one of the following words: message, subject, text, msg, body, content, date, time, timestamp, from, sender, author, uid, member, to, receiver, conversation, recipient, partner, participant, and party.
  • In various embodiments, the particular application type comprises a web browser application, and the keyword comprises one of the following words: address, location, loc, URL, visited, date, bookmark, favorite and title.
  • In various embodiments, the particular application type comprises a geographic location-enabled application, and the keyword comprises one of the following words: coordinate, longitude, latitude, location, loc, home, destination, direction, and route.
  • According to some other aspects, there is provided a computing device comprising a processor and a memory storing instructions which, when executed by the processor, cause the processor to perform the methods described herein.
  • According to some other aspects, there is provided a computer readable medium comprising instructions which, when executed by a processor, cause the processor to perform the methods described herein. In various embodiments, the computer readable medium is non-transitory.
  • According to some other aspects, there is provided a system adapted to perform any one or more of the methods as described herein.
  • According to some other aspects, there is provided a device comprising at least one processor adapted to perform any one or more of the methods as described herein.
    DRAWINGS
  • Some embodiments will now be described, by way of example only, with reference to the following drawings, in which:
  • FIG. 1 is a schematic diagram illustrating a computing device for identifying an application type of unknown data stored in a storage device, in accordance with one example embodiment;
  • FIG. 2 is a flowchart illustrating a method for identifying an application type of unknown data stored in a storage device, in accordance with one example embodiment;
  • FIG. 3 is an exemplary database table containing data that may be stored in a storage device, in accordance with one example embodiment;
  • FIG. 4 is a flowchart illustrating a method for updating the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment;
  • FIG. 5 is a screenshot of an example user interface that allows updating of the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment; and
  • FIG. 6 is a screenshot of a data recovery user interface after the updating of the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment.
    DESCRIPTION OF VARIOUS EMBODIMENTS
  • For simplicity and clarity of illustration, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements or steps. In addition, numerous specific details are set forth in order to provide a thorough understanding of the exemplary embodiments described herein. However, it will be understood by those of ordinary skill in the art that the embodiments described herein may be practiced without these specific details. In other instances, well-known methods, procedures and components have not been described in detail so as not to obscure the embodiments generally described herein.
  • Furthermore, this description is not to be considered as limiting the scope of the embodiments described herein in any way, but rather as merely describing the implementation of various embodiments.
  • The embodiments of the methods described herein may be implemented in hardware or software, or a combination of both. In some cases, embodiments may be implemented in one or more computer programs executing on one or more programmable computing devices comprising at least one processor (e.g., a microprocessor), a data storage device (including in some cases volatile and non-volatile memory and/or data storage elements), at least one input device, and at least one output device. For example and without limitation, the programmable computing devices may be a personal computer, laptop, personal data assistant, cellular telephone, smartphone device, tablet computer, and/or wireless device. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices.
  • In some embodiments, each program may be implemented in a high level procedural or object oriented programming and/or scripting language. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language.
  • In some embodiments, the computing devices and methods as described herein may also be implemented as a transitory or non-transitory computer-readable storage medium configured with a computer program, wherein the storage medium so configured causes a computing device to operate in a specific and predefined manner to perform at least some of the functions as described herein. The medium may be provided in various forms, including one or more diskettes, compact disks, tapes, chips, wireline transmissions, satellite transmissions, internet transmission or downloadings, magnetic and electronic storage media, digital and analog signals, and the like. The computer useable instructions may also be in various forms, including compiled and non-compiled code.
  • Moreover, the subject system may be implemented as one or more software components stored on one or more computer servers that are accessible via one or more client machines in a client-server architecture. In such case, the system can be considered to be a hosted software offering or a software service in a software-as-a-service deployment.
  • The embodiments of the present disclosure relate generally to methods of identifying an application type of unknown data that may be encountered during a data recovery process. In traditional data recovery processes, there is typically a catalog of application data that indicates the data format of data stored by a given application on a storage device. When unknown data is encountered, this application data is referenced to determine if the unknown data matches the data formats that are indicative of a particular application. If so, the unknown data is processed according to the identified application.
  • Such traditional processes, however, may be inefficient because they require analysis of the data structure stored by an application before data associated with the application can be recovered. As computer technology advances and becomes more prevalent, applications for electronic devices are being developed at an increasingly fast rate. In particular, as mobile device applications have become more popular, the number and variety of applications available to users of mobile devices has expanded dramatically. The ever-growing number of applications makes analyzing the data format of each application difficult. This results in data formats for many applications not being analyzed. If data stored by these applications are encountered during data recovery, such traditional data recovery processes may not be able to adequately recover the data.
  • At least some of the present embodiments may provide a computing device, system or method that allows unknown data (which does not already correspond to a data format of a known application) to nevertheless be recovered.
  • Viewed at a high-level, according to the teachings herein, various embodiments may recognize that even though the particular data format for an unanalyzed application may be unknown, the application type of the unanalyzed application may be identified based on certain characteristics of how unknown data is stored. In particular, some embodiments may recognize that certain keywords may be commonly used by applications of a particular application type as an identifier for a column of a table stored in a database.
  • Additionally, some embodiments may recognize that the use of these keywords to identify a column may indicate that the data for the column corresponds to a common data field stored by applications of the application type. For example, the keywords “author” or “from” may be commonly used in chat or instant messaging (IM) type applications to identify a “sender” data field for chat messages stored in the application.
  • Referring now to FIG. 1, illustrated therein generally as 100 is a block diagram showing a computing device for identifying an application type of unknown data stored in a storage device in accordance with one example embodiment. As shown, a computing device 102 may be coupled to a storage device 104 on which the unknown data is stored. The computing device 102 may include a processor 110, a display 112, a storage device interface 114, and a memory 116.
  • Processor 110 may be configured to perform the steps of the methods described herein. To perform these steps, in various embodiments, the processor 110 may execute instructions stored on memory 116. For example, the instructions may be stored in the form of an application-type identification module 120.
  • During execution, the application-type identification module 120 may be configured to retrieve keywords from a keyword store 122, with the keywords being used to analyze the column identifiers of an unknown database 132 stored on the storage device 104 to determine if the data stored therein corresponds to a particular application-type. For ease of illustration, the keyword store 122 is shown as being also stored on memory 116. However, it will be understood that the keyword store 122 may be stored separately from the memory 116 (e.g., on a hard disk (not shown) or some other local or remote storage).
  • As is discussed in greater detail below, if it is determined that a column identifier includes a keyword associated with an application type, a mapping may be stored between the column and a data field that is commonly used by applications of the application type. These mappings may be stored in the column/data field mapping store 140. In some cases, the initial or a previous mapping determined by the presence of the keywords may be subsequently updated through a user interface 142 provided by the application-type identification module 120.
  • In various embodiments, the user interface 142 may be displayed on display 112. Display 112, for example, may be a suitable display device (e.g. a monitor, screen or touchscreen) coupled to the processor 110. The user interface 142 may allow the processor 110 to solicit input from a user that may confirm or update the mapping of a column to a data field, as stored in the column/data field mapping store 140 of memory 116. Example screenshots that may be shown in the user interface 142 are illustrated in FIGS. 5 and 6, and will be discussed in greater detail below.
  • The storage device 104 may be coupled to computing device 102 through storage device interface 114. The storage device 104 may have application data stored thereon associated with various known and unknown applications. In various embodiments, the storage device 104 may include a file system 130 that contains a number of different files. In some situations, one or more of the files of the file system 130 may correspond to unknown database information stored in a database 132. At least some of the present embodiments are directed to methods of determining if the data stored within an unknown database 132 includes data that is of a particular application type, e.g., by determining if column identifiers for a database table 134 of the unknown database 132 include certain keywords.
  • It will be understood that in certain situations, a data recovery process may attempt to analyze data that is intended to have been deleted from the storage device 104 (e.g., if data recovery is being performed in a forensics context by law enforcement officers). For example, a user may use the “delete” function of an operating system to delete a file, but such file may nevertheless still be recoverable despite having been “deleted”. This is because many operating systems and/or device driver software may not physically delete the data from the storage device 104 immediately when a command to delete such data is received. Instead, the addresses on the storage device 104 that store such data may simply be marked as “unallocated” or “available”. Such indications inform the operating system or other applications that these addresses are now available to store other data, so that the old data may subsequently be overwritten and thereby deleted when new data is stored at such addresses. Since such data may not actually be overwritten, it is possible that the data flagged to be deleted may remain physically undeleted from the storage device 104 for an extended period of time even though it had been requested to be deleted by the application or the user (or both).
  • Accordingly, when analyzing the data that is stored on a storage device 104 in the present embodiments, it is possible that the data being analyzed is intended to have been deleted, but nevertheless remains readable from the storage device 104.
  • It will be understood that the different components shown in FIG. 1 can be provided in a variety of ways. For example, the computing device 102 may be provided in the form of personal computers, networked computers, portable computers, portable electronic devices, personal digital assistants, laptops, desktops, mobile phones, smart phones, tablets, and so on.
  • The processor 110 may be any type of processor, such as, for example, any type of general-purpose microprocessor or microcontroller, a digital signal processing (DSP) processor, an application-specific integrated circuit (ASIC), a programmable read-only memory (PROM), or any combination thereof.
  • Similarly, the memory 116 may include any type of computer memory that is located either internally or externally to the computing device 102 such as, for example, random-access memory (RAM), read-only memory (ROM), compact disc read-only memory (CDROM), electro-optical memory, magneto-optical memory, erasable programmable read-only memory (EPROM), electrically-erasable programmable read-only memory (EEPROM), a hard disk drive, a solid-state drive or any other form of suitable computer readable medium that may be used in electronic devices.
  • Additionally, computing device 102 may include one or more input devices (not shown), such as a keyboard, mouse, camera, touch screen and/or a microphone, and may also include one or more output devices such as a display screen 112 and/or a speaker. Computing device 102 may have a network interface for connecting to a network (not shown) in order to communicate with other components.
  • It will be understood that although the data stores 122 and 140 are illustrated separately in FIG. 1, they can be stored together as separate tables within the same database or across multiple databases, locally and/or remotely. Additionally, other persistent storage methods such as encrypted files may also be used to provide persistent storage.
  • Further, the storage device interface 114 of the computing device 102 may be any type of hardware or software interface that allows the computing device to communicate with the storage device 104. For example, the storage device interface 114 may be one or more of the following interfaces: Parallel AT Attachment (PATA), Serial AT Attachment (SATA), Integrated Drive Electronics (IDE), Enhanced Integrated Drive Electronics (EIDE), Small Computer System Interface (SCSI), Universal Serial Bus (any version), FireWire and/or Thunderbolt. Additionally or alternatively, the storage device interface 114 may allow communication with a storage device 104 that is provided remotely (e.g., via Network-Attached Storage (NAS) and/or Storage Area Network (SAN) mechanisms) by acting as a client to a server that provides access to the storage device 104.
  • Moreover, the storage device 104 on which the unknown data is stored may include any type of the computer readable media that is to be the subject of the data analysis methods described herein, including the types of memory that are listed above as being options for the memory 116.
  • Referring now to FIG. 2, shown there generally as 200 is a method for identifying an application type of unknown data stored in a storage device, in accordance with one example embodiment. In some embodiments, the components of the computing device 102, such as the processor 110, may be configured to execute one or more steps of the method 200 to identify an application type of unknown data. For ease of explanation, when discussing various steps of the method of FIG. 2, reference will simultaneously be made to an example database table shown generally as 134 in FIG. 3.
  • At step 205, a search for files of the storage device 104 is initiated. The storage device 104 may be formatted according to a known file system that the application-type identification module 120 (as shown in FIG. 1) is configured to access and conduct a search on. For example, the storage device 104 may have been formatted for use with various operating systems such as Microsoft™ Windows™, Linux™, Apple OS X™, Apple iOS™ and/or Android™, and the file systems that can be processed by the application-type identification module 120 may include the following file systems: File Allocation Table 32 (FAT32), New Technology File System (NTFS), third extended file system (ext3), fourth extended file system (ext4), Hierarchical File System (HFS) and/or Hierarchical File System Plus (HFS+ or HFSX).
  • At step 210, it is determined whether a file has been found. If no file has been found (the ‘NO’ branch at step 210), then there may be no remaining files to be processed by method 200. If at least one of the located files was determined to contain database information and also resulted in at least one mapping between a column and a data field, method 200 may then proceed to display the mapping user interface at step 215. As noted, computing device 102 may display the mapping user interface via display 112 (as shown in FIG. 1). As will be discussed below, the mapping user interface may allow a user using the computing device 102 to confirm or alter the mapping between a column and a data field determined by the method of FIG. 2. The steps associated with providing such a user interface are shown in FIG. 4.
  • It may be the case that the file system 130 is not recognized by the application-type identification module 120, or that no database information has been located on the file system 130, or that no mappings between data fields have been generated. In such case, step 215 may not be performed. The process may then end and a message may be displayed that indicates that the method of FIG. 2 was not able to identify the application type of the data stored on the storage device 104.
  • If it is determined that a file has been found (the ‘YES’ branch at 210), the computing device 102 may open the file for read access and the data in the file may be read to attempt to determine if the file includes database information (step 220).
  • At step 225, a determination is made as to whether the file corresponds to database information. This may be performed, for example, by reading the header information for a file to determine if it contains information indicating that it is of a known database type. For example, this step may involve checking for the string “SQLite format 3” in the header information of the file, as SQLite database files typically include the text “SQLite format 3” in their headers. In some cases, the information indicating that a file is a database may not necessarily be within the header portion of a file and instead, could be located in other parts of the data associated with the file. For example, the file extension of a file may indicate that a file is of a particular database type (e.g., the file extensions “.mdb”, “.mda”, or “.accdb” may indicate that the given file is a Microsoft™ Access™ database). It will be appreciated that the determination made at step 225 may identify database information that does not necessarily include the entire contents of a given database. For example, the database information may include any portion of the contents of a database (e.g., as may be the case if the remaining contents of the database have already been overwritten or are otherwise unavailable).
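  • By way of illustration only, and without limitation, the check of step 225 might be sketched in Python as follows. The function name `looks_like_database` and the returned labels are hypothetical and are used here for explanation; the sketch assumes the 16-byte header string that begins a SQLite 3 file and the Microsoft Access file extensions mentioned above.

```python
from pathlib import Path

# A SQLite 3 database file begins with this 16-byte string (note the trailing NUL).
SQLITE_MAGIC = b"SQLite format 3\x00"

# File extensions that the description associates with Microsoft Access databases.
ACCESS_EXTENSIONS = {".mdb", ".mda", ".accdb"}


def looks_like_database(path):
    """Return a guessed database type ("sqlite" or "access"), or None.

    Hypothetical helper: the labels returned are illustrative only.
    """
    p = Path(path)
    try:
        with p.open("rb") as fh:
            header = fh.read(len(SQLITE_MAGIC))
    except OSError:
        return None  # unreadable file: treat as not containing database information
    if header == SQLITE_MAGIC:
        return "sqlite"
    if p.suffix.lower() in ACCESS_EXTENSIONS:
        return "access"
    return None
```

  • In such a sketch, the header check takes precedence over the file extension, since a file extension alone may be a weaker indication than the contents of the file.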
  • If the file is determined not to include database information (the ‘NO’ branch at step 225), then method 200 returns to step 205 where it may again begin the process of determining whether there are any files remaining to be processed. In various embodiments, the application-type identification module 120 may also store data to a report database (not shown) indicating that the previous file was processed and had been identified as not containing a database.
  • If the file is determined to include database information (the ‘YES’ branch at step 225), then the tables accessible within the database are processed and method 200 proceeds to step 230.
  • At step 230, the method 200 determines if there are any tables that are still unprocessed in the database. If there are not (the ‘NO’ branch at step 230), then the method 200 returns to step 205 where it may again begin the process of determining whether there are any files remaining on the storage device 104 to be processed. If there are still tables to be processed (the ‘YES’ branch at step 230), then the next table in the database information is read at step 235.
  • At step 240, it is determined if there are any columns of the table that are still unprocessed. If there are not (the ‘NO’ branch at step 240), then the table has been processed and method 200 returns to step 230 to determine whether the database information contains any further tables that are still unprocessed.
  • If it is determined that there are columns of the table still to be processed (the ‘YES’ branch at 240), then the column identifier of the next column is read at step 245. In some cases, the column identifier may be the name of the column used in a database. In various embodiments, the column identifier information may be retrieved from the schema of the database. In some cases, the column identifier may be found in header data of the database table, however the column identifier could also be located in other parts of the data associated with the column.
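  • As a non-limiting illustration of steps 235 to 245, where the database information is a SQLite database, the column identifiers may be read from the schema using the `sqlite_master` table and the `PRAGMA table_info` statement. The function name `list_columns` is hypothetical, and the sketch assumes that the database has already been opened with the Python `sqlite3` module.

```python
import sqlite3


def list_columns(conn):
    """Map each table name to its list of column names, as read from the schema."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    columns = {}
    for table in tables:
        # Quote the table name so that unusual identifiers are handled safely.
        quoted = '"' + table.replace('"', '""') + '"'
        info = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk); keep the name.
        columns[table] = [row[1] for row in info]
    return columns
```

  • Other database types may expose schema information differently, in which case a different retrieval mechanism would be used.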
  • Once the identifier of a column has been read, step 250 determines whether the column identifier read at step 245 includes a keyword associated with a particular application type. For example, in the case where the column identifier is a column name, this step may involve determining if the column names include keywords that are commonly used as column names by applications of the particular application type. The presence of such keywords in the column name may be taken as an indication that the database belongs to an application of that particular application type.
  • As examples, the present embodiments may be able to determine that unknown data is of an application type that includes, without limitation: a chat or instant messaging application type, a web browser application type, a navigation/geo-location application type, a file sharing application type, a social networking application type, a cloud application type, and an email application type. It will be understood that although specific example application types are mentioned and described herein for illustrative purposes, the present embodiments may be used to identify any application type generally.
  • Some examples of keywords that may be used to identify data as belonging to an application of a chat or instant messaging application type may include the words: ‘message’, ‘subject’, ‘text’, ‘msg’, ‘body’, ‘content’, ‘date’, ‘time’, ‘timestamp’, ‘from’, ‘sender’, ‘author’, ‘uid’, ‘member’, ‘to’, ‘receiver’, ‘conversation’, ‘recipient’, ‘partner’, ‘participant’, and ‘party’. Some examples of keywords that may be used to identify data as belonging to an application of a web browser application type may include the words: ‘address’, ‘location’, ‘loc’, ‘URL’, ‘visited’, ‘date’, ‘bookmark’, ‘favorite’ and ‘title’. Some examples of keywords that may be used to identify data as belonging to an application of a navigation/geographic-location application type include: ‘coordinate’, ‘longitude’, ‘latitude’, ‘location’, ‘loc’, ‘home’, ‘destination’, ‘direction’, and ‘route’.
  • To illustrate step 250, reference is simultaneously made to FIG. 3, which shows generally as 134 a schematic representation of a data table storing unknown data. Data table 134 includes a table identifier 302 (e.g., a table name such as ‘Messages’ or some other alphanumeric identifier), and a series of columns with respective column identifiers 312, 314, 316, 318, 320.
  • In the context of step 250 in FIG. 2, a particular column with column identifier ‘Message_id’ 312 may be the column being processed. In this context, it may be determined that the column identifier 312 contains the keyword ‘message’. Since the column identifier 312 contains a keyword associated with the chat/IM application type, it may be determined that the data in the database of data table 134 belongs to an application of the chat/IM application type.
  • Returning to FIG. 2, if it is determined that a column identifier does not include a keyword associated with an application type (the ‘NO’ branch at step 250), then control returns to step 240 where it is again determined whether there are any columns in the table that are still unprocessed.
  • If it is determined that a column identifier does include a keyword associated with an application type (the ‘YES’ branch at step 250), then the process has determined that the file includes database information from a particular application type. In a number of situations, a given keyword used to identify data stored in a database as belonging to an application may also be associated with a data field that is commonly used by an application of the particular application type. For example, the keywords ‘message’, ‘subject’, ‘text’, ‘msg’, ‘body’ and ‘content’ may all be commonly used to identify a data field for the substance or “content” of the message in an application of a Chat/IM application type. Similarly, the ‘date’, ‘time’, and ‘timestamp’ keywords may be commonly used to identify a date/time field for the date and time of a message in an application of a Chat/IM application type. Likewise, the keywords ‘from’, ‘sender’, ‘author’, ‘uid’, and ‘member’ may be commonly used to identify a data field for the sender of a message in an application of a Chat/IM application type. Further, the keywords ‘to’, ‘receiver’, ‘conversation’, ‘recipient’, ‘partner’, ‘participant’, and ‘party’ may be commonly used to identify a data field for a recipient of a message in an application of a Chat/IM application type.
  • As will be understood, there may be other keywords associated with other data fields depending on the nature of the application type that is attempting to be determined. For example, the keywords ‘address’, ‘location’, ‘URL’, or ‘visited’ may all be commonly used to identify a data field for the address field in a web browser application.
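  • For illustration only, the keyword test of step 250 and the association between keywords and data fields for the Chat/IM application type might be sketched as follows. The names `CHAT_KEYWORDS` and `match_data_field` are hypothetical. The sketch assumes a case-insensitive substring test and returns the first keyword found; other matching strategies are possible.

```python
# Keywords grouped by the Chat/IM data field they are commonly used to identify.
CHAT_KEYWORDS = {
    "content": ["message", "subject", "text", "msg", "body", "content"],
    "date/time": ["date", "time", "timestamp"],
    "sender": ["from", "sender", "author", "uid", "member"],
    "recipient": ["to", "receiver", "conversation", "recipient",
                  "partner", "participant", "party"],
}


def match_data_field(column_identifier, keywords=CHAT_KEYWORDS):
    """Return (data_field, keyword) for the first keyword found, else None."""
    name = column_identifier.lower()
    for data_field, words in keywords.items():
        for word in words:
            if word in name:  # substring test: 'Message_id' includes 'message'
                return data_field, word
    return None
```

  • A substring test of this kind can produce false matches, particularly for short keywords such as ‘to’. The optional sampling of a data record at steps 255 to 260 may reduce the effect of such matches.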
  • Optionally, to confirm that the determination made at step 250 is correct, at step 255, a data record from the table may be sampled to determine if the data for that column in the data record is consistent with data that would belong to that data field (e.g., data generated by, or otherwise associated with, an application of the application type). This may be performed in a number of ways. For example, in the case where the keyword corresponds to a data field that is supposed to contain the “content” of a message, the data for the data column in the data record may be compared to words in a dictionary that have been previously recognized as commonly being present in the content of a message. For example, these words may include ‘hello’, ‘hi’, ‘hey’, ‘bye’, ‘see’, ‘you’, ‘soon’ and/or ‘later’. In another example, if the keyword corresponds to a data field that is supposed to contain data of a particular form (e.g., a GPS coordinate), a regular expression can be created to recognize a string of text or numeric values as potentially being data of that form (e.g., a potential GPS coordinate). In yet another example, heuristics may be developed based on historical experience of what data for a given data field contains, and these heuristics may be used to confirm that data for a given column appears as expected. It will be understood that various other ways of performing this step may be possible.
  • To illustrate step 255, reference is simultaneously again made to FIG. 3. Data record 360 stored in data table 134 may be sampled to determine whether the data in a particular column of data record 360, such as data 368 for the column with column identifier ‘text’ 318, is consistent with data that is expected for the data field that the column has mapped to (i.e., the data field for the ‘content’ of the message). As shown, such data 368 includes the text “hey hey hey”. Using the dictionary lookup method described above, it may be determined that since the data 368 includes the word ‘hey’, the data 368 is consistent with that which is expected for the data field for the “content” of the message (e.g., as that which would have been generated by an application belonging to the Chat/IM application type).
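  • The dictionary lookup and regular expression approaches of step 255 might be sketched, for illustration only, as follows. The function names are hypothetical. The coordinate pattern assumes one possible textual form (a latitude and a longitude in decimal degrees separated by a comma); actual applications may store coordinates in other forms.

```python
import re

# Words previously recognized as commonly present in the content of a message.
MESSAGE_WORDS = {"hello", "hi", "hey", "bye", "see", "you", "soon", "later"}

# Assumed form: "<latitude>, <longitude>" in decimal degrees.
COORDINATE_RE = re.compile(
    r"^\s*(-?\d{1,3}(?:\.\d+)?)\s*,\s*(-?\d{1,3}(?:\.\d+)?)\s*$")


def looks_like_message_content(value, dictionary=MESSAGE_WORDS):
    """True if the sampled value is text containing at least one dictionary word."""
    if not isinstance(value, str):
        return False
    words = re.findall(r"[a-z']+", value.lower())
    return any(word in dictionary for word in words)


def looks_like_coordinate(value):
    """True if the sampled value matches the assumed latitude/longitude form."""
    match = COORDINATE_RE.match(value) if isinstance(value, str) else None
    if match is None:
        return False
    lat, lon = float(match.group(1)), float(match.group(2))
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
```

  • With such a sketch, the sampled text “hey hey hey” would be treated as consistent with the “content” data field, whereas a purely numeric value would not.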
  • At step 258, if the column maps to a date/time data field, the date/time format for the data of the column may be determined. For example, this may be performed by converting the date/time value to various formats and performing boundary checks to identify the particular date/time format that the data in the column is most likely to be formatted in. Specifically, this may involve converting the data in the column in the sampled data record to each of a plurality of date/time formats; comparing the converted data, in each respective date/time format, to each other to determine which converted data is closest to a reference date/time; and for the converted data that is closest to the reference date/time, identifying the date/time format of the converted data as the date/time format of the data in the column of the table. In various embodiments, the reference date/time may be the present date/time (e.g., the date/time when data recovery process is being performed). Additionally or alternatively, the reference date/time may be a predefined date/time of a particular event (e.g., if data recovery is being performed for forensics purposes, the date/time of a criminal activity such as a murder).
  • In a variant embodiment, only the post-conversion date/time data that is within a specific date/time window (e.g., +/−7 years of the reference date/time) may be used to compare with each other. This may reduce the amount of comparisons that need to be performed if it is known that it is unlikely that the date/time data that is being sampled will be beyond the specific date/time window. For example, if data converted to a given date/time format results in data that is beyond the specific date/time window (e.g., earlier or later than the window), then it can be determined that such given date/time format is unlikely to be the correct date/time format that the data is actually formatted in.
  • As will be understood, any date/time format may be supported. For example, some example date/time formats that may be supported include: Unix epoch time—seconds, Unix epoch time—milliseconds, PRTime, Mac Absolute Time, and/or Chrome/webkit time.
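  • As a non-limiting sketch of step 258 and the windowed variant described above, a raw numeric value may be converted under each candidate date/time format, and the format whose result is closest to the reference date/time may be selected. The names `FORMATS` and `guess_datetime_format` are hypothetical. The sketch assumes the commonly documented epochs and units for each format (PRTime as microseconds since 1970, Mac Absolute Time as seconds since 2001, and Chrome/WebKit time as microseconds since 1601, all in UTC) and approximates a year as 365 days.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAC_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

# Each candidate format converts a raw number into a UTC datetime.
FORMATS = {
    "unix_seconds": lambda v: UNIX_EPOCH + timedelta(seconds=v),
    "unix_milliseconds": lambda v: UNIX_EPOCH + timedelta(milliseconds=v),
    "prtime": lambda v: UNIX_EPOCH + timedelta(microseconds=v),
    "mac_absolute": lambda v: MAC_EPOCH + timedelta(seconds=v),
    "webkit": lambda v: WEBKIT_EPOCH + timedelta(microseconds=v),
}


def guess_datetime_format(raw, reference, window_years=7):
    """Return (format_name, converted_datetime) closest to reference, or None."""
    window = timedelta(days=365 * window_years)
    best = None
    for name, convert in FORMATS.items():
        try:
            candidate = convert(raw)
        except OverflowError:
            continue  # value is out of range for this format (boundary check)
        distance = abs(candidate - reference)
        if distance > window:
            continue  # outside the date/time window: unlikely to be this format
        if best is None or distance < best[2]:
            best = (name, candidate, distance)
    return None if best is None else best[:2]
```

  • For example, the reference date/time passed to such a function may be the present date/time, obtained as `datetime.now(timezone.utc)`, or a predefined date/time of a particular event.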
  • Referring still to FIG. 2, if it is determined that the data sampled for the column being analyzed is not consistent with data that would belong to an application of the application type (the ‘NO’ branch at step 260), then method 200 returns to step 240 where it is again determined whether there are any columns remaining in the table that are still unprocessed.
  • As indicated above, steps 255 to 260 are optional in that they need not be performed. When they are performed, however, the acts may provide a confirmation that the column with the column identifier having a keyword does in fact correspond to the data field associated with the keyword. In this way, steps 255 and 260 may be considered a “sanity check” that verifies the conclusion arrived at in step 250.
  • Referring again simultaneously to FIG. 3, illustrated there are several columns for which the performance of steps 255 to 260 may result in the conclusion arrived at in step 250 not being confirmed. For example, in the column with the column identifier ‘Message_id’ 312, the determination at step 250 may have been that, because the column identifier contains the keyword ‘message’, the column corresponds to the data field for the “content” of a message, as would be generated by an application belonging to the Chat/IM application type. However, upon performing steps 255 to 260, it may be determined that the data 362 for that column does not contain any of the words in the dictionary that have been previously recognized as indicating that the data constitutes the “content” of a message. As a result, it may be determined at 260 that the column with the ‘Message_id’ column identifier 312 does not correspond to the “content” data field of a Chat/IM application. In this way, the performance of steps 255 to 260 may reduce the likelihood of erroneous mappings that are determined based on the results of step 250 alone.
  • Referring back to FIG. 2, if it is determined that the data sampled for a column is consistent with data that would belong to an application of the application type (the ‘YES’ branch at step 260), the method 200 proceeds to step 265 where the data stored in the database is identified as belonging to an application of the particular application type identified in step 250. Step 265 may involve the application-type identification module 120 (as shown in FIG. 1) storing information indicating that the unknown data encountered on the storage device 104 belongs to the particular application type for which the unknown data was analyzed (e.g., a Chat/IM application type).
  • In some cases, application-type identification module 120 may attempt to identify an application identifier (e.g., the name of the application). For example, this may be performed by using operating system application manifests (e.g., as may be separately found on file system 130 of storage device 104, apart from the database 132), or via the text found in file path location (e.g., such text may be found in the file path of where the database 132 is located on the file system 130 of storage device 104). If the application name is available, when performing step 265 in FIG. 2, an association between the name of the application and the application type as identified by the method of FIG. 2 may be stored.
  • At step 270, a mapping may be stored between the column being processed and the commonly used data field that the sampled data of the column was determined to be consistent with in step 260. In some cases, the mapping may be stored in column/data field mapping store 140. The mappings may, for example, be subsequently referenced when recovering data from the unknown database 132. The mappings may also be subsequently used when recovering data from another storage device 104 containing unknown data, so that if similar database information is encountered, the mappings can be referenced to identify the type of data that is stored in the database.
  • After step 270, method 200 returns to step 240 to determine whether there are any columns still unprocessed. If method 200 determines that there are no columns still unprocessed, and no tables still unprocessed at step 230, it will return to step 205 where it will continue to search for files. If no files are found in step 210, method 200 may proceed to step 215 and display a mapping user interface that may allow updating of the mapping between a column and a given data field. The mapping user interface may display a list of the located database tables and mappings of columns of such tables to the commonly used data fields for a given application type, so as to allow user input for final verification or remapping if necessary. A method and user interface for performing such remapping or verification is discussed below with respect to FIGS. 4 and 5.
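  • For illustration only, the overall traversal of method 200 (files, then tables, then columns) might be sketched as follows for SQLite database information. The sketch omits the optional steps 255 to 260, uses an abbreviated keyword list, and opens each database read-only. The names `scan_for_mappings` and `match_field` are hypothetical, and the returned list merely stands in for the column/data field mapping store 140.

```python
import os
import sqlite3
from pathlib import Path

SQLITE_MAGIC = b"SQLite format 3\x00"

# Abbreviated, illustrative keyword list for two Chat/IM data fields.
KEYWORDS = {
    "content": ["message", "text", "body"],
    "sender": ["from", "sender", "author"],
}


def match_field(column_identifier):
    """Step 250: return the data field whose keyword the identifier includes."""
    name = column_identifier.lower()
    for data_field, words in KEYWORDS.items():
        if any(word in name for word in words):
            return data_field
    return None


def scan_for_mappings(root):
    """Return (file, table, column, data_field) tuples found under root."""
    mappings = []
    for dirpath, _dirnames, filenames in os.walk(root):        # steps 205/210
        for filename in filenames:
            path = Path(dirpath) / filename
            try:
                with path.open("rb") as fh:                    # step 220
                    if fh.read(len(SQLITE_MAGIC)) != SQLITE_MAGIC:
                        continue                               # step 225, 'NO'
                uri = path.resolve().as_uri() + "?mode=ro"     # read-only open
                conn = sqlite3.connect(uri, uri=True)
            except (OSError, sqlite3.Error):
                continue
            try:
                tables = [row[0] for row in conn.execute(      # steps 230/235
                    "SELECT name FROM sqlite_master WHERE type = 'table'")]
                for table in tables:
                    quoted = '"' + table.replace('"', '""') + '"'
                    info = conn.execute(f"PRAGMA table_info({quoted})")
                    for row in info.fetchall():                # steps 240/245
                        data_field = match_field(row[1])
                        if data_field is not None:             # step 270
                            mappings.append(
                                (str(path), table, row[1], data_field))
            except sqlite3.Error:
                pass  # partially overwritten or otherwise unreadable database
            finally:
                conn.close()
    return mappings
```

  • The mappings returned by such a sketch could then be presented in a mapping user interface for confirmation or remapping.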
  • It will be appreciated that various changes may be made to the method of FIG. 2.
  • For example, in a variant embodiment, step 265 (to identify data stored in the database as belonging to an application of the particular application type) may be performed immediately after it has been determined at step 250 that the column identifier includes a keyword associated with the application type, and before a data record is sampled at steps 255 to 260.
  • Additionally or alternatively, in some embodiments, some of the steps of method 200 may be executed in parallel. Parallel execution of some steps may be desirable in systems that have more than one processor or a processor that has more than one processing core. In such cases, for example, one or more cores may be focused on executing step 250 to identify whether a column identifier contains a keyword associated with an application type, and one or more other cores may be focused on sampling a data record from the table to determine whether the data is consistent with data that would be generated by an application of the application type. Parallel execution may also allow the computing device 102 to process more than one table or more than one column simultaneously.
  • Further, in some cases, method 200 may also include a step of checking a reference database (e.g., the column/data field mapping store 140 shown in FIG. 1) containing mappings and database information from previous executions of the method of FIG. 2. In such a case, if a match is found between the database being processed and a previously stored mapping, the previously stored mapping may be applied and the data from the database being processed may be automatically recovered to be presented later. In such a case, the method may return directly to step 205 to search for any remaining files that need to be processed, and the mapping need not be presented to the user for verification in accordance with method 400.
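  • One way such a reference lookup could work is sketched below. The description does not say how a match between databases is established, so the signature used here (table name plus sorted column identifiers) is an assumption, and `schema_signature` and `lookup_stored_mapping` are hypothetical names. The dictionary is only an in-memory stand-in for the column/data field mapping store 140.

```python
def schema_signature(table_name, column_identifiers):
    """Build a hashable signature from a table name and its column identifiers."""
    return (table_name, tuple(sorted(column_identifiers)))


def lookup_stored_mapping(mapping_store, table_name, column_identifiers):
    """Return a previously stored mapping for this table layout, or None."""
    return mapping_store.get(schema_signature(table_name, column_identifiers))


# Hypothetical store content, using the column identifiers of the FIG. 3 example.
mapping_store = {
    schema_signature("messages", ["author_id", "conversation_id", "text", "timestamp"]): {
        "sender": "author_id",
        "recipient": "conversation_id",
        "content": "text",
        "date/time": "timestamp",
    }
}

# Column order does not matter because the signature sorts the identifiers.
known = lookup_stored_mapping(
    mapping_store, "messages", ["timestamp", "text", "conversation_id", "author_id"]
)
# A table layout that was never seen before yields no mapping.
unknown = lookup_stored_mapping(mapping_store, "contacts", ["name", "phone"])
```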
  • Referring to FIG. 4, shown there generally as 400 is a method 400 for updating a mapping between a data field and a column, in accordance with one example embodiment. For ease of illustration, reference will also simultaneously be made to FIG. 5, which shows generally as 500, a screenshot of an example user interface that allows updating of the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment. In various embodiments, the components of the computing device 102, such as the processor 110 and the display 112, may be configured to execute one or more steps of the method 400. In various embodiments, method 400 may be initiated at step 215 of the method 200 of FIG. 2 where a mapping user interface is displayed.
  • The method 400 starts at step 405 where a mapping between a data field and a column of a table is displayed in a user interface. The mapping may be retrieved from the column/data field mapping store 140 shown in FIG. 1. In various embodiments, the mapping may have been stored as a result of step 270 of method 200 in FIG. 2.
  • Referring simultaneously to FIG. 5, shown there generally as 500 is an example user interface that may be displayed at step 405 of FIG. 4. As illustrated, the user interface 500 is for an execution of method 200 of FIG. 2 that attempts to determine if unknown data corresponds to a Chat/IM application type. Accordingly, the user interface provides a number of data fields that are commonly used by an application of a Chat/IM application type. Specifically, there is a ‘sender’ data field 514 (shown in FIG. 5 with the text “Identified Sender Column”), a ‘recipient’ data field 516 (shown in FIG. 5 with the text “Identified Recipient Column”), a ‘content’ data field 518 (shown in FIG. 5 with the text “Identified Message Column”), and a ‘date/time’ data field 520 (shown in FIG. 5 with the text “Identified Date Column”).
  • The user interface 500 of FIG. 5 shows, as rows, each of the tables 134 that have been processed in FIG. 2. For each table 134, the user interface shows the column identifier of a column in the table that has been determined to be mapped to a particular data field 514, 516, 518, 520 for the application type. For example, the user interface 500 shows information for the database table 134 illustrated in FIG. 3. Referring simultaneously to FIG. 3, it can be seen that the mapping (e.g., as may have been stored at step 270 in FIG. 2) has determined that the column with column identifier ‘author_id’ 314 corresponds to the ‘sender’ data field 514. Similarly, the column with column identifier ‘conversation_id’ 316 has been determined to correspond to the ‘recipient’ data field 516, the column with column identifier ‘timestamp’ 320 has been determined to correspond to the ‘date/time’ data field 520, and the column with the column identifier ‘text’ 318 has been determined to correspond to the ‘content’ data field 518.
  • As illustrated, the user interface 500 may also display the determined application identifier 530 for a given table 134 (e.g., “com.google.android.apps.plus”) and the table identifier 535 for a given table 134 (e.g., the table name “messages” 302) if such information is available and has been determined.
  • As discussed above, for a column that is identified as corresponding to the ‘date/time’ data field 520, it may be possible to determine the date/time format of the data stored for that column. Accordingly, in some embodiments, if a given column has been mapped to a ‘date/time’ data field, the determined date/time format of the data may also be shown in the user interface 500. As illustrated, this is provided as an additional column 550 positioned beside the ‘date/time’ field 520. For the database table 134 of FIG. 3, for example, the user interface 500 shows the data for the column with the column identifier ‘timestamp’ 320 as having been determined to be of a ‘PRTime’ date/time format 555.
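  • The date/time format determination could be sketched along the lines recited in claim 4: convert the raw value under each candidate format and keep the format whose result lies closest to a reference date/time. The sketch below is illustrative only. The candidate list, the choice of reference date, and the names `convert` and `detect_format` are assumptions; ‘PRTime’ is treated as a count of microseconds since the Unix epoch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Candidate formats expressed as raw units per second (illustrative list).
CANDIDATE_FORMATS = {
    "Unix seconds": 1,
    "Unix milliseconds": 1_000,
    "PRTime": 1_000_000,  # microseconds since the Unix epoch
}


def convert(raw_value, units_per_second):
    """Interpret raw_value as a count since the Unix epoch; None if out of range."""
    try:
        microseconds = raw_value * (1_000_000 // units_per_second)
        return EPOCH + timedelta(microseconds=microseconds)
    except OverflowError:
        return None


def detect_format(raw_value, reference):
    """Return the candidate format whose conversion is closest to the reference."""
    best_name, best_gap = None, None
    for name, units_per_second in CANDIDATE_FORMATS.items():
        converted = convert(raw_value, units_per_second)
        if converted is None:
            continue  # this interpretation does not yield a representable date
        gap = abs(converted - reference)
        if best_gap is None or gap < best_gap:
            best_name, best_gap = name, gap
    return best_name


# Hypothetical reference date/time, e.g. the date on which the data is examined.
reference = datetime(2014, 9, 26, tzinfo=timezone.utc)
detected = detect_format(1380240000000000, reference)
```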
  • Furthermore, user interface 500 has a preview section 560 that displays how the data from a data record of the table would be presented, based on the mappings. For example, the data record 360 illustrated in FIG. 3 may be shown in the preview section 560 according to the mappings illustrated for the database table 134. Referring again simultaneously to FIG. 3, it can be seen that the data 364 for the ‘Author_id’ column 314 is provided under ‘From’ in the preview section 560 because the ‘Author_id’ column 314 is mapped to the ‘sender’ data field 514. Similarly, the data 366 for the ‘Conversation_id’ column 316 is provided under ‘To’ in the preview section 560 because the ‘Conversation_id’ column 316 is mapped to the ‘recipient’ data field 516. Likewise, the data 368 for the ‘text’ column 318 is provided under ‘Message’ in the preview section 560 because the ‘text’ column 318 is mapped to the ‘content’ data field 518, and the data 370 for the ‘timestamp’ column 320 is provided under ‘Date/Time’ in the preview section 560 because the ‘timestamp’ column 320 is mapped to the ‘date/time’ data field 520.
  • For the date/time data 370, the preview section 560 may be configured to display the data according to the date/time format that has been determined in the method of FIG. 2 above. For example, as illustrated, because the date/time format of the date/time data 370 has been determined to be ‘PRTime’ (e.g., as illustrated at 555 of user interface 500), the date/time shown would be the date/time data 370 after it has been converted to the ‘PRTime’ format. If the resultant post-conversion date/time data appears to be incorrect in the preview section 560 to a user, user input may be received via the user interface control 555 (e.g., the indicated combo box may be selected), and an alternative date/time format may be chosen. The preview section 560 may then be updated to display the date/time data formatted according to the alternative date/time format.
  • Referring back to FIG. 4, at step 410, input from the user interface may be received indicating that a data field is to be mapped to an alternative column of the table. As illustrated in FIG. 5, the various mappings of the column identifiers 314, 316, 318, 320 of data table 134 are provided within drop-down controls that may receive such input selecting an alternative column of the table to map to the particular data field 514, 516, 518, 520. Specifically, the list within the drop-down box may be populated with all the column identifiers of the other columns of the table, so that other column identifiers for the table may be selected to be mapped to a given data field 514, 516, 518, 520.
  • The drop-down boxes may be activated if a user determines that a given mapping is incorrect. For example, upon seeing that mapping the ‘recipient’ data field 516 to the column with the ‘conversation_id’ column identifier results in the number ‘1’ being identified as a recipient of a message in a Chat/IM application (e.g., as is illustrated under ‘To’ in the preview section 560), the user may select an alternative column to be mapped to the ‘recipient’ data field 516.
  • In various embodiments, in response to receiving user input indicating an alternative mapping, the preview section 560 may be updated to correspond to the alternative mapping. In some instances the preview section 560 may be updated to allow the user to consider whether the selected mapping is appropriate, or whether the mapping should be changed again.
  • Referring again to FIG. 4, at step 415, an updated mapping for the data field to the alternative column may be stored. For example, the processor 110 may update the mapping stored in column/data field mapping store 140 to correspond to the alternative column. In some cases, method 400 may be executed multiple times for an application before the user has determined the appropriate mapping for that application.
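  • Steps 405 to 415 can be illustrated with a short sketch. It is not the described implementation: the function names `update_mapping` and `preview`, the extra ‘recipient_name’ column, and all record values are hypothetical and serve only to show a remapping followed by a refreshed preview.

```python
def update_mapping(mapping, table_columns, data_field, alternative_column):
    """Return a copy of the mapping with data_field mapped to alternative_column."""
    if data_field not in mapping:
        raise KeyError(f"unknown data field: {data_field}")
    if alternative_column not in table_columns:
        raise ValueError(f"no such column in table: {alternative_column}")
    updated = dict(mapping)  # leave the stored mapping untouched until saved
    updated[data_field] = alternative_column
    return updated


def preview(record, mapping):
    """Show a data record under data field names, as a preview section might."""
    return {field: record[column] for field, column in mapping.items()}


# 'recipient_name' is an assumed additional column; values are made up.
columns = ["author_id", "conversation_id", "recipient_name", "text", "timestamp"]
record = {
    "author_id": "alice",
    "conversation_id": 1,
    "recipient_name": "bob",
    "text": "hello",
    "timestamp": 1380240000000000,
}
mapping = {
    "sender": "author_id",
    "recipient": "conversation_id",
    "content": "text",
    "date/time": "timestamp",
}

before = preview(record, mapping)  # recipient appears as the number 1
updated = update_mapping(mapping, columns, "recipient", "recipient_name")
after = preview(record, updated)   # recipient now comes from the chosen column
```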
  • Referring now to FIG. 6, illustrated therein generally as 600 is a screenshot of a data recovery user interface after the updating of the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment.
  • Once a user has completed their review of the mappings for the identified applications, the computing device 102 may attempt to recover the data from the respective unknown databases, using the stored mappings. For example, the computing device 102 may generate a report or case file that contains the results of the data extracted using the stored mappings. In various embodiments, the data recovery user interface 600 may be displayed after the mapping has been updated according to the method 400 of FIG. 4. Additionally or alternatively, the data recovery interface 600 may also be presented to a user immediately after the initial mappings are stored upon the completion of the method 200 of FIG. 2.
  • As illustrated, a number of different tables 134 that have been analyzed may be selected in the left-hand pane. Upon selection of a given table (e.g., as illustrated, a table for the application with the “com.google.android.apps.plus” application identifier), the records for that table can then be populated into the top-right pane (e.g., as illustrated, there are 43 data records within the table). The data within each of the data records are then displayed in accordance with the mapping determined by method 200 of FIG. 2 and/or method 400 of FIG. 4. For example, for the column in each table that has been mapped to the ‘sender’ data field 514, the ‘recipient’ data field 516, the ‘content’ data field 518, and the ‘date/time’ data field 520 respectively, the data contained therein for each data record are shown under each respective data field 514, 516, 518, 520.
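  • Recovery of records through a stored mapping might look like the sketch below, which assumes the unknown database is an SQLite file readable with Python's standard `sqlite3` module. The helper `recover_records` and the sample row are hypothetical. The query simply selects each mapped column under the name of its data field.

```python
import sqlite3


def quote(identifier):
    """Quote an SQL identifier so unusual names such as 'date/time' are accepted."""
    return '"' + identifier.replace('"', '""') + '"'


def recover_records(connection, table_name, mapping):
    """Read every record of a table, keyed by data field rather than column name."""
    select_list = ", ".join(
        f"{quote(column)} AS {quote(field)}" for field, column in mapping.items()
    )
    cursor = connection.execute(f"SELECT {select_list} FROM {quote(table_name)}")
    fields = [description[0] for description in cursor.description]
    return [dict(zip(fields, row)) for row in cursor.fetchall()]


# In-memory stand-in for a located database; the row values are made up.
connection = sqlite3.connect(":memory:")
connection.execute(
    'CREATE TABLE "messages" ("author_id" TEXT, "conversation_id" INTEGER, '
    '"text" TEXT, "timestamp" INTEGER)'
)
connection.execute(
    'INSERT INTO "messages" VALUES (?, ?, ?, ?)',
    ("alice", 1, "hello", 1380240000000000),
)

mapping = {
    "sender": "author_id",
    "recipient": "conversation_id",
    "content": "text",
    "date/time": "timestamp",
}
records = recover_records(connection, "messages", mapping)
```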
  • The user interface 600 also includes a preview section 660 which displays a preview of how the data of a selected data record 360 would appear based on the mappings. As illustrated, the example data record 360 shown in FIG. 3 is again shown. For example, the data 364 for the ‘Author_id’ column 314 is shown as being associated with the ‘sender’ data field 514. Similarly, the data 366 for the ‘Conversation_id’ column 316 is shown as being associated with the ‘recipient’ data field 516. Likewise, the data 368 for the ‘text’ column 318 is shown as being associated with the ‘content’ data field 518, and the data 370 for the ‘timestamp’ column 320 is shown as being associated with the ‘date/time’ data field 520. Additionally, the preview section 660 may also provide an indication of the table identifier 302 (e.g., the table name ‘messages’).
  • While the above description provides examples of one or more devices, systems and methods, it will be appreciated that other devices, systems and methods may be within the scope of the present description as interpreted by one of skill in the art.
  • As noted, the systems and methods disclosed herein are presented only by way of example and are not meant to limit the scope of the subject matter described herein. Other variations of the systems and methods described above will be apparent to those in the art and as such are considered to be within the scope of the subject matter described herein. For example, it should be understood that acts and the order of the acts performed in the processing described herein may be altered, modified and/or augmented yet still achieve the desired outcome.
  • In particular, the steps of a method in accordance with any of the embodiments described herein may be performed in any order, whether or not such steps are described in the claims, figures or otherwise in any sequential numbered or lettered manner. Also, in the various user interfaces illustrated in the figures, it will be understood that the illustrated user interface text and controls are provided as examples only and are not meant to be limiting. Other suitable user interface elements may be possible.
  • As used herein, the wording “and/or” is intended to represent an inclusive-or. That is, “X and/or Y” is intended to mean X or Y or both. Moreover, “X, Y, and/or Z” is intended to mean X or Y or Z or any combination thereof.

Claims (17)

1. A method of identifying an application type of unknown data, the method comprising:
determining that the unknown data corresponds to database information, the database information comprising at least one table with at least one column;
for a column of a table in the database information,
determining if a column identifier of the column comprises a keyword associated with a particular application type; and
if the column identifier comprises the keyword, identifying data stored in the database as belonging to an application that is of the particular application type.
2. The method of claim 1, wherein the keyword is associated with a data field that is commonly used by an application of the particular application type.
3. The method of claim 2, wherein prior to identifying the data stored in the database as belonging to an application that is of the particular application type, the method further comprises:
sampling a data record in the table; and
determining that data for the column in the data record is consistent with data for the data field that would belong to an application of the particular application type.
4. The method of claim 3, wherein the data field comprises a date/time field, and the method further comprises:
converting the data in the column in the data record to each of a plurality of date/time formats;
comparing the converted data, in each respective date/time format, to each other to determine which converted data is closest to a reference date/time; and
for the converted data that is closest to the reference date/time, identifying the date/time format of the converted data as the date/time format of the data in the column of the table.
5. The method of claim 2, wherein the method further comprises:
storing a mapping between the data field and the column, the mapping being accessible during recovery of data in the database to indicate that data for the column in the table is associated with the data field.
6. The method of claim 5, further comprising:
displaying the mapping between the data field and the column in a user interface, wherein the user interface provides an option to select an alternative column of the table to be mapped to the data field;
receiving input indicating that the data field is to be mapped to the alternative column; and
storing an updated mapping for the data field, the updated mapping indicating that the data field is mapped to the alternative column.
7. The method of claim 2, wherein the particular application type comprises a messaging application, and the data field that is commonly used comprises one of: a sender field, a recipient field, a message field, and a timestamp field.
8. The method of claim 2, wherein the particular application type comprises a web browser application, and the data field that is commonly used comprises one of: an address field, a date field, a bookmark field, and a title field.
9. The method of claim 2, wherein the particular application type comprises a geographic location-enabled application, and the data field that is commonly used comprises one of: a longitude field, a latitude field, a destination field, a direction field, and a route field.
10. The method of claim 1, wherein the particular application type comprises a messaging application, and the keyword comprises one of the following words: message, subject, text, msg, body, content, date, time, timestamp, from, sender, author, uid, member, to, receiver, conversation, recipient, partner, participant, and party.
11. The method of claim 1, wherein the particular application type comprises a web browser application, and the keyword comprises one of the following words: address, location, loc, URL, visited, date, bookmark, favorite and title.
12. The method of claim 1, wherein the particular application type comprises a geographic location-enabled application, and the keyword comprises one of the following words: coordinate, longitude, latitude, location, loc, home, destination, direction, and route.
13. A computing device comprising a processor and a memory storing instructions which, when executed by the processor, cause the processor to perform the method as claimed in claim 1.
14. A computer readable medium comprising instructions which, when executed by a processor, cause the processor to perform the method as claimed in claim 1.
15. The computer readable medium of claim 14, wherein the computer readable medium is non-transitory.
16. A system adapted to perform any one or more of the methods as described in claim 1.
17. A device comprising at least one processor adapted to perform any one or more of the methods as described in claim 1.
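By way of illustration only, the keyword test of claim 1 can be sketched with the keyword lists of claims 10 to 12. The function name `matching_application_types`, the case-insensitive substring comparison, and the short type labels are assumptions; the claims do not prescribe how the comparison is made.

```python
# Keyword lists as recited in claims 10, 11 and 12 (lower-cased).
APPLICATION_KEYWORDS = {
    "messaging": [
        "message", "subject", "text", "msg", "body", "content", "date", "time",
        "timestamp", "from", "sender", "author", "uid", "member", "to",
        "receiver", "conversation", "recipient", "partner", "participant", "party",
    ],
    "web browser": [
        "address", "location", "loc", "url", "visited", "date", "bookmark",
        "favorite", "title",
    ],
    "geographic location": [
        "coordinate", "longitude", "latitude", "location", "loc", "home",
        "destination", "direction", "route",
    ],
}


def matching_application_types(column_identifier):
    """Return every application type with a keyword contained in the identifier."""
    name = column_identifier.lower()
    return [
        application_type
        for application_type, keywords in APPLICATION_KEYWORDS.items()
        if any(keyword in name for keyword in keywords)
    ]


author_types = matching_application_types("author_id")
longitude_types = matching_application_types("longitude")
url_types = matching_application_types("URL")
```

Short keywords such as "to" or "loc" match many unrelated identifiers under a plain substring test, and keywords such as "date" appear in more than one list, which is one reason a further check against a sampled data record (claim 3) may be useful.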
US14/498,325 2013-09-27 2014-09-26 Method and device for identifying an application type of unknown data Abandoned US20150095290A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/498,325 US20150095290A1 (en) 2013-09-27 2014-09-26 Method and device for identifying an application type of unknown data
US17/554,581 US11868212B2 (en) 2013-09-27 2021-12-17 Method and device for identifying an application type of unknown data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361883279P 2013-09-27 2013-09-27
US14/498,325 US20150095290A1 (en) 2013-09-27 2014-09-26 Method and device for identifying an application type of unknown data

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/554,581 Continuation US11868212B2 (en) 2013-09-27 2021-12-17 Method and device for identifying an application type of unknown data

Publications (1)

Publication Number Publication Date
US20150095290A1 true US20150095290A1 (en) 2015-04-02

Family

ID=52741140

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/498,325 Abandoned US20150095290A1 (en) 2013-09-27 2014-09-26 Method and device for identifying an application type of unknown data
US17/554,581 Active 2034-10-22 US11868212B2 (en) 2013-09-27 2021-12-17 Method and device for identifying an application type of unknown data

Country Status (5)

Country Link
US (2) US20150095290A1 (en)
EP (1) EP3049970A4 (en)
AU (1) AU2014328401B2 (en)
CA (1) CA2925426C (en)
WO (1) WO2015042719A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5956725A (en) * 1997-11-26 1999-09-21 International Business Machines Corporation Schema mapping to a legacy table with primary and foreign key support
US6070539A (en) * 1997-03-21 2000-06-06 Case Corporation Variable rate agricultural product application implement with multiple inputs and feedback
US20020013817A1 (en) * 2000-07-07 2002-01-31 Collins Thomas M. Method and apparatus for distributing of e-mail to multiple recipients
US20030050935A1 (en) * 2001-09-07 2003-03-13 Dominika Spetsmann System and method for searching an object catalog subject to a plurality of standards regimes
US20090043786A1 (en) * 2007-08-08 2009-02-12 Schmidt Brian K Network repository for metadata
US20100318489A1 (en) * 2009-06-11 2010-12-16 Microsoft Corporation Pii identification learning and inference algorithm
US20140258341A1 (en) * 2013-03-11 2014-09-11 Business Objects Software Ltd. Automatic file structure and field data type detection

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5761656A (en) 1995-06-26 1998-06-02 Netdynamics, Inc. Interaction between databases and graphical user interfaces
US6496831B1 (en) * 1999-03-25 2002-12-17 Lucent Technologies Inc. Real-time event processing system for telecommunications and other applications
US6687711B1 (en) * 2000-12-04 2004-02-03 Centor Software Corporation Keyword and methods for using a keyword
US7379934B1 (en) 2004-07-09 2008-05-27 Ernest Forman Data mapping
US8832048B2 (en) * 2005-12-29 2014-09-09 Nextlabs, Inc. Techniques and system to monitor and log access of information based on system and user context using policies
US7747563B2 (en) 2006-12-11 2010-06-29 Breakaway Technologies, Inc. System and method of data movement between a data source and a destination
US8010502B2 (en) * 2007-04-13 2011-08-30 Harris Corporation Methods and systems for data recovery

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150370917A1 (en) * 2013-02-07 2015-12-24 Hewlett-Packard Development Company, L.P. Formatting Semi-Structured Data in a Database
US11126656B2 (en) * 2013-02-07 2021-09-21 Micro Focus Llc Formatting semi-structured data in a database
US11704331B2 (en) 2016-06-30 2023-07-18 Amazon Technologies, Inc. Dynamic generation of data catalogs for accessing data
US20180150548A1 (en) * 2016-11-27 2018-05-31 Amazon Technologies, Inc. Recognizing unknown data objects
US10621210B2 (en) * 2016-11-27 2020-04-14 Amazon Technologies, Inc. Recognizing unknown data objects
US20200242135A1 (en) * 2016-11-27 2020-07-30 Amazon Technologies, Inc. Recognizing unknown data objects
US11893044B2 (en) * 2016-11-27 2024-02-06 Amazon Technologies, Inc. Recognizing unknown data objects
US11036560B1 (en) 2016-12-20 2021-06-15 Amazon Technologies, Inc. Determining isolation types for executing code portions
US20220058343A1 (en) * 2018-03-23 2022-02-24 Servicenow, Inc. Written-modality prosody subsystem in a natural language understanding (nlu) framework
US20210224085A1 (en) * 2018-11-07 2021-07-22 Citrix Systems, Inc. Preloading of Application on a User Device Based on Content Received by the User Device

Also Published As

Publication number Publication date
AU2014328401A1 (en) 2016-05-05
CA2925426A1 (en) 2015-04-02
EP3049970A4 (en) 2017-04-05
EP3049970A1 (en) 2016-08-03
WO2015042719A1 (en) 2015-04-02
US11868212B2 (en) 2024-01-09
CA2925426C (en) 2021-11-23
AU2014328401B2 (en) 2020-03-12
US20220107868A1 (en) 2022-04-07

Similar Documents

Publication Publication Date Title
US11868212B2 (en) Method and device for identifying an application type of unknown data
US11709823B2 (en) Real time visual validation of digital content using a distributed ledger
RU2598991C2 (en) Data recovery client for moveable client data
US20110083079A1 (en) Apparatus, system, and method for improved type-ahead functionality in a type-ahead field based on activity of a user within a user interface
US10437708B2 (en) System for refreshing and sanitizing testing data in a low-level environment
US20120197926A1 (en) Methods and systems for implementing email recipient templates
CN112954043B (en) Method for identifying user based on website access log and computer equipment
CN109074378B (en) Modular electronic data analysis computing system
CN110704476A (en) Data processing method, device, equipment and storage medium
CN111314063A (en) Big data information management method, system and device based on Internet of things
US11496446B1 (en) Protecting personally identifiable information submitted through a browser
US20190050405A1 (en) Systems and methods for constraint driven database searching
CN110019542B (en) Generation of enterprise relationship, generation of organization member database and identification of same name member
WO2019071907A1 (en) Method for identifying help information based on operation page, and application server
US20190108361A1 (en) Secure access to multi-tenant relational data
US20180183752A1 (en) Method and system for providing additional information relating to primary information
US20120089849A1 (en) Cookie management system and method
US9342530B2 (en) Method for skipping empty folders when navigating a file system
US11080238B2 (en) System and method for interactive visual representation of metadata within a networked heterogeneous workflow environment
US11366796B2 (en) Systems and methods for compressing keys in hierarchical data structures
US20180150752A1 (en) Identifying artificial intelligence content
US10878051B1 (en) Mapping device identifiers
US20160173346A1 (en) Method and device to update contacts
CN112866979A (en) User information association method, device, equipment and medium based on 5G service interface
US11397789B1 (en) Normalizing uniform resource locators

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STCV Information on status: appeal procedure

Free format text: NOTICE OF APPEAL FILED

STCV Information on status: appeal procedure

Free format text: APPEAL BRIEF (OR SUPPLEMENTAL BRIEF) ENTERED AND FORWARDED TO EXAMINER

STCV Information on status: appeal procedure

Free format text: EXAMINER'S ANSWER TO APPEAL BRIEF MAILED

AS Assignment

Owner name: MAGNET FORENSICS INC., CANADA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SALIBA, JAD JOHN;REEL/FRAME:052171/0230

Effective date: 20200218

STCV Information on status: appeal procedure

Free format text: ON APPEAL -- AWAITING DECISION BY THE BOARD OF APPEALS

AS Assignment

Owner name: MAGNET FORENSICS INVESTCO INC., CANADA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MAGNET FORENSICS INC.;REEL/FRAME:055019/0067

Effective date: 20200925

AS Assignment

Owner name: ROYAL BANK OF CANADA, CANADA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MAGNET FORENSICS INVESTCO, INC.;REEL/FRAME:057797/0493

Effective date: 20211001

STCV Information on status: appeal procedure

Free format text: BOARD OF APPEALS DECISION RENDERED

AS Assignment

Owner name: ROYAL BANK OF CANADA, CANADA

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE NATURE OF CONVEYANCE PREVIOUSLY RECORDED AT REEL: 057797 FRAME: 0493. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNOR:MAGNET FORENSICS INVESTCO, INC.;REEL/FRAME:058037/0964

Effective date: 20211001

STCB Information on status: application discontinuation

Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION

AS Assignment

Owner name: MAGNET FORENSICS INVESTCO, INC., CANADA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:ROYAL BANK OF CANADA;REEL/FRAME:063231/0372

Effective date: 20230404

Owner name: MAGNET FORENSICS INC., CANADA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:ROYAL BANK OF CANADA;REEL/FRAME:063231/0372

Effective date: 20230404