US20150095290A1 - Method and device for identifying an application type of unknown data - Google Patents
- Publication number
- US20150095290A1 (application number US14/498,325)
- Authority
- US
- United States
- Prior art keywords
- data
- column
- field
- application type
- date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/252—Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
Definitions
- Embodiments herein relate to methods, systems and devices for identifying an application type of unknown data, and in particular to methods, systems and devices for identifying an application type of unknown data stored on a computer readable medium such as, for example, a storage device.
- Computing devices typically contain one or more computer readable media (e.g., memory, a hard disk drive, or a solid state drive) on which applications can store data.
- data recovery may be performed when there is a failure of the computer readable media that prevents normal access to the data.
- Data recovery may also be performed in the context of forensics. For example, a user may attempt to hide, delete, or obfuscate data on a computer readable medium so that the data may not be readily accessible to someone else. This may be particularly the case where the user is undertaking some illicit or otherwise improper activity and does not want such activity to be discovered. In such scenarios, law enforcement authorities or others may be interested to learn about a user's activities on a particular computing device by recovering data on that computing device.
- data recovery may be performed to allow a party to know how a computing device had been or is being used by a given individual or a group of individuals.
- employers may be interested to learn how their computer resources are being used by their employees.
- parents and/or spouses might be interested to know how members of their families are using a computing device.
- a method of identifying an application type of unknown data comprising: determining that the unknown data corresponds to database information, the database information comprising at least one table with at least one column; for a column of a table in the database information, determining if a column identifier of the column comprises a keyword associated with a particular application type; and if the column identifier comprises the keyword, identifying data stored in the database as belonging to an application that is of the particular application type.
- the keyword is associated with a data field that is commonly used by an application of the particular application type.
- the method includes sampling a data record in the table and determining that data for the column in the data record is consistent with data for the data field that would belong to an application of the particular application type.
- the method includes converting the data in the column in the data record to each of a plurality of date/time formats; comparing the converted data, in each respective date/time format, to each other to determine which converted data is closest to a reference date/time; and for the converted data that is closest to the reference date/time, identifying the date/time format of the converted data as the date/time format of the data in the column of the table.
- the method includes storing a mapping between the data field and the column, the mapping being accessible during recovery of data in the database to indicate that data for the column in the table is associated with the data field.
- the method includes displaying the mapping between the data field and the column in a user interface, wherein the user interface provides an option to select an alternative column of the table to be mapped to the data field; receiving input indicating that the data field is to be mapped to the alternative column; and storing an updated mapping for the data field, the updated mapping indicating that the data field is mapped to the alternative column.
- the particular application type can include a messaging application.
- the data field that is commonly used comprises one of: a sender field, a recipient field, a message field, and a timestamp field.
- the particular application type can include a web browser application.
- the data field that is commonly used comprises one of: an address field, a date field, a bookmark field, and a title field.
- the particular application type can include a geographic location-enabled application.
- the data field that is commonly used comprises one of: a longitude field, a latitude field, a destination field, a direction field, and a route field.
- the particular application type comprises a messaging application.
- the keyword comprises one of the following words: message, subject, text, msg, body, content, date, time, timestamp, from, sender, author, uid, member, to, receiver, conversation, recipient, partner, participant, and party.
- the particular application type comprises a web browser application.
- the keyword comprises one of the following words: address, location, loc, URL, visited, date, bookmark, favorite and title.
- the particular application type comprises a geographic location-enabled application.
- the keyword comprises one of the following words: coordinate, longitude, latitude, location, loc, home, destination, direction, and route.
- a computing device comprising a processor and a memory storing instructions which, when executed by the processor, cause the processor to perform the methods described herein.
- a computer readable medium comprising instructions which, when executed by a processor, cause the processor to perform the methods described herein.
- the computer readable medium is non-transitory.
- a device comprising at least one processor adapted to perform any one or more of the methods as described herein.
- FIG. 1 is a schematic diagram illustrating a computing device for identifying an application type of unknown data stored in a storage device, in accordance with one example embodiment.
- FIG. 2 is a flowchart illustrating a method for identifying an application type of unknown data stored in a storage device, in accordance with one example embodiment.
- FIG. 3 is an exemplary database table containing data that may be stored in a storage device, in accordance with one example embodiment.
- FIG. 4 is a flowchart illustrating a method for updating the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment.
- FIG. 5 is a screenshot of an example user interface that allows updating of the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment.
- FIG. 6 is a screenshot of a data recovery user interface after the updating of the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment.
- embodiments of the methods described herein may be implemented in hardware or software, or a combination of both.
- embodiments may be implemented in one or more computer programs executing on one or more programmable computing devices comprising at least one processor (e.g., a microprocessor), a data storage device (including in some cases volatile and non-volatile memory and/or data storage elements), at least one input device, and at least one output device.
- the programmable computing devices may be a personal computer, laptop, personal data assistant, cellular telephone, smartphone device, tablet computer, and/or wireless device.
- Program code is applied to input data to perform the functions described herein and generate output information.
- the output information is applied to one or more output devices.
- each program may be implemented in a high level procedural or object oriented programming and/or scripting language.
- the programs can be implemented in assembly or machine language, if desired.
- the language may be a compiled or interpreted language.
- the computing devices and methods as described herein may also be implemented as a transitory or non-transitory computer-readable storage medium configured with a computer program, wherein the storage medium so configured causes a computing device to operate in a specific and predefined manner to perform at least some of the functions as described herein.
- the medium may be provided in various forms, including one or more diskettes, compact disks, tapes, chips, wireline transmissions, satellite transmissions, internet transmission or downloadings, magnetic and electronic storage media, digital and analog signals, and the like.
- the computer useable instructions may also be in various forms, including compiled and non-compiled code.
- the subject system may be implemented as one or more software components stored on one or more computer servers that are accessible via one or more client machines in a client-server architecture.
- the system can be considered to be a hosted software offering or a software service in a software-as-a-service deployment.
- the embodiments of the present disclosure relate generally to methods of identifying an application type of unknown data that may be encountered during a data recovery process.
- In traditional data recovery processes, there is typically a catalog of application data that indicates the data format of data stored by a given application on a storage device.
- this application data is referenced to determine if the unknown data matches the data formats that are indicative of a particular application. If so, the unknown data is processed according to the identified application.
- Such traditional processes may be inefficient because they require analysis of the data structure stored by an application before data associated with the application can be recovered.
- applications for electronic devices are being developed at an increasingly fast rate.
- As mobile device applications have become more popular, the number and variety of applications available to users of mobile devices has expanded dramatically.
- the ever-growing number of applications makes analyzing the data format of each application difficult. This results in data formats for many applications not being analyzed. If data stored by these applications are encountered during data recovery, such traditional data recovery processes may not be able to adequately recover the data.
- At least some of the present embodiments may provide a computing device, system or method that allows unknown data (which does not already correspond to a data format of a known application) to still nevertheless be recovered.
- various embodiments may recognize that even though the particular data format for an unanalyzed application may be unknown, the application type of the unanalyzed application may be identified based on certain characteristics of how unknown data is stored. In particular, some embodiments may recognize that certain keywords may be commonly used by applications of a particular application type as an identifier for a column of a table stored in a database.
- some embodiments may recognize that when one of these keywords is used to identify a column, this may indicate that the data for the column corresponds to a common data field stored by applications of that application type.
- the keywords “author” or “from” may be commonly used in chat or instant messaging (IM) type applications to identify a “sender” data field for chat messages stored in the application.
- a computing device 102 may be coupled to a storage device 104 on which the unknown data is stored.
- the computing device 102 may include a processor 110, a display 112, a storage device interface 114, and a memory 116.
- Processor 110 may be configured to perform the steps of the methods described herein. To perform these steps, in various embodiments, the processor 110 may execute instructions stored on memory 116. For example, the instructions may be stored in the form of an application-type identification module 120.
- the application-type identification module 120 may be configured to retrieve keywords from a keyword store 122 and use them to analyze the column identifiers of an unknown database 132 stored on the storage device 104, to determine whether the data stored therein corresponds to a particular application type.
- the keyword store 122 is shown as also being stored on memory 116. However, it will be understood that the keyword store 122 may be stored separately from the memory 116 (e.g., on a hard disk (not shown) or some other local or remote storage).
- when a column identifier is found to include a keyword, a mapping may be stored between the column and a data field that is commonly used by applications of the corresponding application type. These mappings may be stored in the column/data field mapping store 140. In some cases, the initial or a previous mapping determined by the presence of the keywords may subsequently be updated through a user interface 142 provided by the application-type identification module 120.
- the user interface 142 may be displayed on display 112.
- Display 112 may be a suitable display device (e.g. a monitor, screen or touchscreen) coupled to the processor 110 .
- the user interface 142 may allow the processor 110 to solicit input from a user that may confirm or update the mapping of a column to a data field, as stored in the column/data field mapping store 140 of memory 116. Example screenshots that may be shown in the user interface 142 are illustrated in FIGS. 5 and 6, and will be discussed in greater detail below.
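The mapping store and its update flow (FIG. 4) can be pictured as a small keyed table. The sketch below is a hypothetical in-memory stand-in for the column/data field mapping store 140; the class name `MappingStore`, its methods, and the (table, data field) keying are assumptions made for illustration, not a structure the text prescribes.

```python
from typing import Dict, Iterable, Optional, Tuple


class MappingStore:
    """Hypothetical in-memory stand-in for the column/data field mapping store 140.

    Each entry maps a (table, data field) pair to the column identifier that
    currently holds that data field.
    """

    def __init__(self) -> None:
        self.mappings: Dict[Tuple[str, str], str] = {}

    def store(self, table: str, data_field: str, column: str) -> None:
        # Initial mapping, e.g. one proposed from a keyword match.
        self.mappings[(table, data_field)] = column

    def column_for(self, table: str, data_field: str) -> Optional[str]:
        # None means no column has been mapped to this data field yet.
        return self.mappings.get((table, data_field))

    def remap(self, table: str, data_field: str, alternative_column: str,
              table_columns: Iterable[str]) -> None:
        # A user picked an alternative column in the UI; accept only real columns.
        if alternative_column not in set(table_columns):
            raise ValueError(f"{alternative_column!r} is not a column of {table!r}")
        self.mappings[(table, data_field)] = alternative_column
```

Rejecting names that are not columns of the same table mirrors a UI that only offers alternative columns of that table.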
- the storage device 104 may be coupled to computing device 102 through storage device interface 114.
- the storage device 104 may have application data stored thereon associated with various known and unknown applications.
- the storage device 104 may include a file system 130 that contains a number of different files.
- one or more of the files in the file system 130 may correspond to unknown database information stored in a database 132.
- At least some of the present embodiments are directed to methods of determining whether the data stored within an unknown database 132 includes data of a particular application type, e.g., by determining whether column identifiers for a database table 134 of the unknown database 132 include certain keywords.
- a data recovery process may attempt to analyze data that was intended to be deleted from the storage device 104 (e.g., if data recovery is being performed in a forensics context by law enforcement officers). For example, a user may use the “delete” function of an operating system to delete a file, but the file may nevertheless still be recoverable despite having been “deleted”. This is because many operating systems and/or device drivers may not physically erase the data from the storage device 104 immediately when a command to delete such data is received. Instead, the addresses on the storage device 104 that store such data may simply be marked as “unallocated” or “available”.
- Such indications inform the operating system or other applications that these addresses are now available to store other data, so that the old data may subsequently be overwritten, and thereby deleted, when new data is stored at those addresses. Until that happens, the data flagged for deletion may remain physically present on the storage device 104 for an extended period of time, even though the application or the user (or both) requested that it be deleted.
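The effect described above can be illustrated with a toy model. This is a hypothetical, greatly simplified sketch (the `ToyVolume` class and its fixed-size block list are assumptions); real file systems track allocation with structures such as bitmaps or allocation tables, but the principle shown is the same: deletion clears an allocation flag, and the bytes persist until they are overwritten.

```python
from typing import List, Tuple


class ToyVolume:
    """Deliberately simplified, hypothetical model of a storage device.

    `blocks` holds the raw bytes at each address; `allocated` records whether
    the file system currently considers that address in use.
    """

    def __init__(self, size: int = 8) -> None:
        self.blocks: List[bytes] = [b""] * size
        self.allocated: List[bool] = [False] * size

    def write(self, address: int, data: bytes) -> None:
        # Writing replaces whatever bytes were previously at this address.
        self.blocks[address] = data
        self.allocated[address] = True

    def delete(self, address: int) -> None:
        # A typical "delete" only flips the allocation flag; the bytes remain.
        self.allocated[address] = False

    def unallocated_with_data(self) -> List[Tuple[int, bytes]]:
        # What a recovery tool might scan: addresses marked free that still hold data.
        return [(addr, data)
                for addr, (data, used) in enumerate(zip(self.blocks, self.allocated))
                if not used and data]
```

After `write(3, b"hey hey hey")` followed by `delete(3)`, the bytes are still listed by `unallocated_with_data()`; only a later write to address 3 makes them unrecoverable in this model.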
- the computing device 102 may be provided in the form of personal computers, networked computers, portable computers, portable electronic devices, personal digital assistants, laptops, desktops, mobile phones, smart phones, tablets, and so on.
- the processor 110 may be any type of processor, such as, for example, any type of general-purpose microprocessor or microcontroller, a digital signal processing (DSP) processor, an application-specific integrated circuit (ASIC), a programmable read-only memory (PROM), or any combination thereof.
- the memory 116 may include any type of computer memory that is located either internally or externally to the computing device 102 such as, for example, random-access memory (RAM), read-only memory (ROM), compact disc read-only memory (CDROM), electro-optical memory, magneto-optical memory, erasable programmable read-only memory (EPROM), electrically-erasable programmable read-only memory (EEPROM), a hard disk drive, a solid-state drive or any other form of suitable computer readable medium that may be used in electronic devices.
- computing device 102 may include one or more input devices (not shown), such as a keyboard, mouse, camera, touch screen and/or a microphone, and may also include one or more output devices such as a display screen 112 and/or a speaker.
- Computing device 102 may have a network interface for connecting to a network (not shown) in order to communicate with other components.
- although each of data stores 122, 140 is illustrated separately in FIG. 1, they can be stored together as separate tables within the same database or across multiple databases, locally and/or remotely. Additionally, other persistent storage methods such as encrypted files may also be used to provide persistent storage.
- the storage device interface 114 of the computing device 102 may be any type of hardware or software interface that allows the computing device to communicate with the storage device 104.
- the storage device interface 114 may be one or more of the following interfaces: Parallel AT Attachment (PATA), Serial AT Attachment (SATA), Integrated Drive Electronics (IDE), Enhanced Integrated Drive Electronics (EIDE), Small Computer System Interface (SCSI), Universal Serial Bus (any version), FireWire and/or Thunderbolt.
- the storage device interface 114 may allow communication with a storage device 104 which is provided remotely (e.g., via Network-Attached Storage (NAS) and/or Storage Area Network (SAN) mechanisms) by acting as a client to a server that provides access to the storage device 104.
- the storage device 104 on which the unknown data is stored may include any type of computer readable media that is to be the subject of the data analysis methods described herein, including the types of memory listed above as options for the memory 116.
- Referring to FIG. 2, shown there generally as 200 is a method for identifying an application type of unknown data stored in a storage device, in accordance with one example embodiment.
- the components of the computing device 102, such as the processor 110, may be configured to execute one or more steps of the method 200 to identify an application type of unknown data.
- an example database table is shown generally as 134 in FIG. 3.
- the storage device 104 may be formatted according to a known file system that the application-type identification module 120 (as shown in FIG. 1) is configured to access and search.
- the storage device 104 may have been formatted for use with various operating systems such as Microsoft™ Windows™, Linux™, Apple OS X™, Apple iOS™ and/or Android™, and the file systems that can be processed by the application-type identification module 120 may include the following file systems: File Allocation Table 32 (FAT32), New Technology File System (NTFS), third extended file system (ext3), fourth extended file system (ext4), Hierarchical File System (HFS) and/or Hierarchical File System Plus (HFS+ or HFSX).
- at step 210, it is determined whether a file has been found. If no file has been found (the ‘NO’ branch at step 210), then there may be no remaining files to be processed by method 200. If at least one of the located files was determined to contain database information and also resulted in at least one mapping between a column and a data field, method 200 may then proceed to display the mapping user interface at step 215.
- computing device 102 may display the mapping user interface via display 112 (as shown in FIG. 1). As will be discussed below, the mapping user interface may allow a user of the computing device 102 to confirm or alter the mapping between a column and a data field determined by the method of FIG. 2. The steps associated with providing such a user interface are shown in FIG. 4.
- otherwise (e.g., if no mapping between a column and a data field was determined), step 215 may not be performed. The process may then end, and a message may be displayed indicating that the method of FIG. 2 was not able to identify the application type of the data stored on the storage device 104.
- if a file has been found (the ‘YES’ branch at step 210), the computing device 102 may open the file for read access, and the data in the file may be read to attempt to determine whether the file includes database information (step 220).
- the information indicating that a file is a database may not necessarily be within the header portion of a file and instead, could be located in other parts of the data associated with the file.
- the file extension of a file may indicate that a file is of a particular database type (e.g., the file extensions “.mdb”, “.mda”, or “.accdb” may indicate that the given file is a Microsoft™ Access™ database).
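Steps 220 and 225 could be approximated as follows. This sketch assumes SQLite as the database format to recognize by content (chosen for illustration; every SQLite 3 file begins with the 16-byte string `SQLite format 3\0`) and falls back on the Microsoft Access file extensions named above; the function name `looks_like_database` is hypothetical.

```python
import os
from typing import Optional

# Every SQLite 3 database file begins with these 16 bytes.
SQLITE_MAGIC = b"SQLite format 3\x00"
# Extensions the text gives as indicating a Microsoft Access database.
ACCESS_EXTENSIONS = {".mdb", ".mda", ".accdb"}


def looks_like_database(path: str) -> Optional[str]:
    """Return a guessed database type for the file at `path`, or None."""
    # Content check first: read just enough bytes to compare the header.
    with open(path, "rb") as f:
        header = f.read(len(SQLITE_MAGIC))
    if header == SQLITE_MAGIC:
        return "sqlite"
    # Fall back on the file extension.
    if os.path.splitext(path)[1].lower() in ACCESS_EXTENSIONS:
        return "access"
    return None
```

A header check like this recognizes only files whose first bytes survive intact; partial database information whose header has been overwritten would likely need a different detection approach.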
- the determination made at step 225 may identify database information that does not necessarily include the entire contents of a given database.
- the database information may include any portion of the contents of a database (e.g., as may be the case if the remaining contents of the database have already been overwritten or are otherwise unavailable).
- at step 225, if the file is determined not to include database information (the ‘NO’ branch at step 225), then method 200 returns to step 205, where it may again begin the process of determining whether there are any files remaining to be processed.
- the application-type identification module 120 may also store data to a report database (not shown) indicating that the previous file was processed and had been identified as not containing a database.
- at step 225, if the file is determined to include database information (the ‘YES’ branch at step 225), then the tables accessible within the database are processed and method 200 proceeds to step 230.
- at step 230, the method 200 determines whether there are any tables that are still unprocessed in the database. If there are not (the ‘NO’ branch at step 230), then the method 200 returns to step 205, where it may again begin the process of determining whether there are any files remaining on the storage device 104 to be processed. If there are still tables to be processed (the ‘YES’ branch at step 230), then the next table in the database information is read at step 235.
- at step 240, it is determined whether there are any columns of the table that are still unprocessed. If there are not (the ‘NO’ branch at step 240), then the table has been processed and method 200 returns to step 230 to determine whether the database information contains any further tables that are still unprocessed.
- if there are (the ‘YES’ branch at step 240), the column identifier of the next column is read at step 245.
- the column identifier may be the name of the column used in a database.
- the column identifier information may be retrieved from the schema of the database.
- the column identifier may be found in header data of the database table; however, the column identifier could also be located in other parts of the data associated with the column.
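Assuming the database information is a readable SQLite file (the text does not name a database engine), the table and column loop of steps 230 to 245 might be sketched with Python's standard `sqlite3` module, reading table names from `sqlite_master` and column identifiers from `PRAGMA table_info`. The generator name `iter_column_identifiers` is hypothetical.

```python
import sqlite3
from typing import Iterator, Tuple


def iter_column_identifiers(db_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (table name, column identifier) pairs from a SQLite schema."""
    con = sqlite3.connect(db_path)
    try:
        # Steps 230/235: enumerate the tables listed in the schema.
        tables = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        for table in tables:
            # Quote the table name as an SQL identifier (double any embedded quotes).
            quoted = '"' + table.replace('"', '""') + '"'
            # Steps 240/245: PRAGMA table_info returns one row per column;
            # index 1 of each row is the column name.
            for row in con.execute(f"PRAGMA table_info({quoted})"):
                yield table, row[1]
    finally:
        con.close()
```

Note that `sqlite3.connect` silently creates a new, empty database if the path does not exist, and it may write to the file; in a forensic setting one would typically work on a copy of the evidence rather than the original.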
- step 250 determines whether the column identifier read at step 245 includes a keyword associated with a particular application type. For example, in the case where the column identifier is a column name, this step may involve determining if the column names include keywords that are commonly used as column names by applications of the particular application type. The presence of such keywords in the column name may be taken as an indication that the database belongs to an application of that particular application type.
- the present embodiments may be able to determine that unknown data is of an application type that includes, without limitation: a chat or instant messaging application type, a web browser application type, a navigation/geo-location application type, a file sharing application type, a social networking application type, a cloud application type, and an email application type.
- keywords that may be used to identify data as belonging to an application of a chat or instant messaging application type may include the words: ‘message’, ‘subject’, ‘text’, ‘msg’, ‘body’, ‘content’, ‘date’, ‘time’, ‘timestamp’, ‘from’, ‘sender’, ‘author’, ‘uid’, ‘member’, ‘to’, ‘receiver’, ‘conversation’, ‘recipient’, ‘partner’, ‘participant’, and ‘party’.
- keywords that may be used to identify data as belonging to an application of a web browser application type may include the words: ‘address’, ‘location’, ‘loc’, ‘URL’, ‘visited’, ‘date’, ‘bookmark’, ‘favorite’ and ‘title’.
- keywords that may be used to identify data as belonging to an application of a navigation/geographic-location application type include: ‘coordinate’, ‘longitude’, ‘latitude’, ‘location’, ‘loc’, ‘home’, ‘destination’, ‘direction’, and ‘route’.
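Step 250 could then be sketched as a lookup of each column identifier against the keyword lists above. The dictionary layout, the application-type labels, and the tokenizing rule are assumptions for illustration; ‘URL’ is lowercased so that matching is case-insensitive.

```python
import re
from typing import Dict, List, Set

# Keyword lists taken from the text; 'URL' is lowercased for case-insensitive matching.
KEYWORDS_BY_APP_TYPE: Dict[str, Set[str]] = {
    "chat/IM": {"message", "subject", "text", "msg", "body", "content", "date",
                "time", "timestamp", "from", "sender", "author", "uid", "member",
                "to", "receiver", "conversation", "recipient", "partner",
                "participant", "party"},
    "web browser": {"address", "location", "loc", "url", "visited", "date",
                    "bookmark", "favorite", "title"},
    "navigation/geo-location": {"coordinate", "longitude", "latitude", "location",
                                "loc", "home", "destination", "direction", "route"},
}


def tokenize(column_identifier: str) -> List[str]:
    """Split e.g. 'Message_id' or 'senderName' into lowercase word tokens."""
    # Insert a space at camelCase boundaries, then split on non-alphanumerics.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", column_identifier)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def matching_app_types(column_identifier: str) -> Set[str]:
    """Return the application types whose keywords appear as whole tokens."""
    tokens = set(tokenize(column_identifier))
    return {app for app, words in KEYWORDS_BY_APP_TYPE.items() if tokens & words}
```

Matching whole tokens rather than substrings is a deliberate choice in this sketch: short keywords such as ‘to’ or ‘loc’ would otherwise match unrelated names like `photo_total` or `block`. A name such as `location` can still match more than one application type, which is one reason the record-sampling step described later in the text is useful.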
- Data table 134 includes a table identifier 302 (e.g., a table name such as ‘Messages’ or some other alphanumeric identifier), and a series of columns with respective column identifiers 312, 314, 316, 318, 320.
- for example, the column with column identifier ‘Message_id’ 312 may be the column currently being processed.
- if it is determined that the column identifier does not include a keyword associated with an application type (the ‘NO’ branch at step 250), control returns to step 240, where it is again determined whether there are any columns in the table that are still unprocessed.
- if it is determined that a column identifier does include a keyword associated with an application type (the ‘YES’ branch at step 250), then the process has determined that the file includes database information from a particular application type.
- a given keyword used to identify data stored in a database as belonging to an application may also be associated with a data field that is commonly used by an application of the particular application type.
- the keywords ‘message’, ‘subject’, ‘text’, ‘msg’, ‘body’ and ‘content’ may all be commonly used to identify a data field for the substance or “content” of the message in an application of a Chat/IM application type.
- the ‘date’, ‘time’, and ‘timestamp’ keywords may be commonly used to identify a date/time field for the date and time of a message in an application of a Chat/IM application type.
- the keywords ‘from’, ‘sender’, ‘author’, ‘uid’, and ‘member’ may be commonly used to identify a data field for the sender of a message in an application of a Chat/IM application type.
- the keywords ‘to’, ‘receiver’, ‘conversation’, ‘recipient’, ‘partner’, ‘participant’, and ‘party’ may be commonly used to identify a data field for a recipient of a message in an application of a Chat/IM application type.
- there may also be keywords associated with other data fields, depending on the nature of the application type being determined.
- the keywords ‘address’, ‘location’, ‘URL’, or ‘visited’ may all be commonly used to identify a data field for the address field in a web browser application.
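The keyword-to-data-field groupings for the Chat/IM type can likewise be expressed as a table that proposes candidate columns for each data field. In this sketch the column names are hypothetical, and keeping every candidate (rather than choosing one) is an assumption; it shows why a name like `Message_id` could be proposed as message content until a data record is sampled.

```python
import re
from typing import Dict, Iterable, List, Set

# Chat/IM data fields and the keywords the text associates with each.
CHAT_FIELD_KEYWORDS: Dict[str, Set[str]] = {
    "content": {"message", "subject", "text", "msg", "body", "content"},
    "timestamp": {"date", "time", "timestamp"},
    "sender": {"from", "sender", "author", "uid", "member"},
    "recipient": {"to", "receiver", "conversation", "recipient", "partner",
                  "participant", "party"},
}


def tokens(column_identifier: str) -> Set[str]:
    # Lowercase word tokens, splitting on underscores, spaces and punctuation.
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", column_identifier) if t}


def propose_candidates(column_identifiers: Iterable[str]) -> Dict[str, List[str]]:
    """For each data field, list the columns whose identifiers contain one of its keywords."""
    candidates: Dict[str, List[str]] = {}
    for column in column_identifiers:
        col_tokens = tokens(column)
        for data_field, words in CHAT_FIELD_KEYWORDS.items():
            if col_tokens & words:
                candidates.setdefault(data_field, []).append(column)
    return candidates
```

For hypothetical columns `Message_id`, `author`, `partner_id`, `text` and `date`, both `Message_id` and `text` become candidates for the content field; only inspecting the stored values can tell them apart.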
- a data record from the table may be sampled to determine whether the data for that column in the data record is consistent with data that an application of the application type would store in that data field (e.g., data generated by, or otherwise associated with, such an application).
- This may be performed in a number of ways. For example, in the case where the keyword corresponds to a data field that is supposed to contain the “content” of a message, the data for the data column in the data record may be compared to words in a dictionary that have been previously recognized as commonly being present in the content of a message.
- these words may include ‘hello’, ‘hi’, ‘hey’, ‘bye’, ‘see’, ‘you’, ‘soon’ and/or ‘later’.
- a regular expression can be created to recognize a string of text or numeric values as potentially being data of the given data field (e.g., a potential GPS coordinate).
- heuristics may be developed based on historical experience of what data for a given data field contains, and these heuristics may be used to confirm that data for a given column appears as expected. It will be understood that various other ways of performing this step may be possible.
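Two of the checks mentioned above, a dictionary lookup for message content and a regular expression for a GPS coordinate, might look like this. The coordinate pattern (a decimal "latitude, longitude" pair), the range check, and the function names are assumptions for illustration; the word list is the one given above.

```python
import re

# Words the text gives as commonly present in the content of a message.
MESSAGE_WORDS = {"hello", "hi", "hey", "bye", "see", "you", "soon", "later"}

# Hypothetical pattern for a decimal "latitude, longitude" pair.
COORDINATE_RE = re.compile(
    r"^\s*(-?\d{1,2}(?:\.\d+)?)\s*,\s*(-?\d{1,3}(?:\.\d+)?)\s*$")


def looks_like_message_content(value) -> bool:
    """Dictionary check: does any word in the value appear in MESSAGE_WORDS?"""
    if not isinstance(value, str):
        return False
    words = re.findall(r"[a-z']+", value.lower())
    return any(w in MESSAGE_WORDS for w in words)


def looks_like_coordinate(value) -> bool:
    """Regex check followed by a latitude/longitude range check."""
    m = COORDINATE_RE.match(str(value))
    if not m:
        return False
    lat, lon = float(m.group(1)), float(m.group(2))
    return -90 <= lat <= 90 and -180 <= lon <= 180
```

Applied to the sampled value “hey hey hey”, the dictionary check succeeds, while a value such as `12345` fails it. The range check after the regex rejects strings that are shaped like coordinates but cannot be valid.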
- Data record 360 stored in data table 134 may be sampled to determine whether the data in a particular column of data record 360, such as data 368 for the column with column identifier ‘text’ 318, is consistent with data that is expected for the data field to which the column has been mapped (i.e., the data field for the ‘content’ of the message). As shown, data 368 includes the text “hey hey hey”.
- the data 368 is consistent with that which is expected for the data field for the “content” of the message (e.g., as that which would have been generated by an application belonging to the Chat/IM application type).
- the date/time format for the data of the column may be determined. For example, this may be performed by converting the date/time value to various formats and performing boundary checks to identify the particular date/time format that the data in the column is most likely to be formatted in. Specifically, this may involve converting the data in the column in the sampled data record to each of a plurality of date/time formats; comparing the converted data, in each respective date/time format, to each other to determine which converted data is closest to a reference date/time; and for the converted data that is closest to the reference date/time, identifying the date/time format of the converted data as the date/time format of the data in the column of the table.
- the reference date/time may be the present date/time (e.g., the date/time when data recovery process is being performed). Additionally or alternatively, the reference date/time may be a predefined date/time of a particular event (e.g., if data recovery is being performed for forensics purposes, the date/time of a criminal activity such as a murder).
- only the post-conversion date/time data that is within a specific date/time window may be used to compare with each other. This may reduce the number of comparisons that need to be performed if it is known that it is unlikely that the date/time data that is being sampled will be beyond the specific date/time window. For example, if data converted to a given date/time format results in data that is beyond the specific date/time window (e.g., earlier or later than the window), then it can be determined that such given date/time format is unlikely to be the correct date/time format that the data is actually formatted in.
- a specific date/time window (e.g., ±7 years of the reference date/time)
- any date/time format may be supported.
- some example date/time formats that may be supported include: Unix epoch time—seconds, Unix epoch time—milliseconds, PRTime, Mac Absolute Time, and/or Chrome/webkit time.
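The conversion-and-comparison procedure can be sketched as follows, assuming these conventional epochs and units:

- Unix epoch time: seconds or milliseconds since 1970-01-01 UTC.
- PRTime: microseconds since 1970-01-01 UTC.
- Mac Absolute Time: seconds since 2001-01-01 UTC.
- Chrome/WebKit time: microseconds since 1601-01-01 UTC.

The function name and the ±7 year default are illustrative. Conversions that fall outside the representable range, or outside the window, are discarded before the closest candidate is chosen.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

UTC = timezone.utc
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=UTC)

# Each candidate format as (epoch, ticks per second).
DATE_TIME_FORMATS = {
    "Unix epoch time (seconds)": (UNIX_EPOCH, 1),
    "Unix epoch time (milliseconds)": (UNIX_EPOCH, 1_000),
    "PRTime": (UNIX_EPOCH, 1_000_000),
    "Mac Absolute Time": (datetime(2001, 1, 1, tzinfo=UTC), 1),
    "Chrome/WebKit time": (datetime(1601, 1, 1, tzinfo=UTC), 1_000_000),
}


def convert(value: int, fmt: str) -> Optional[datetime]:
    """Interpret a raw value in the given format; None if out of range."""
    epoch, ticks = DATE_TIME_FORMATS[fmt]
    try:
        return epoch + timedelta(seconds=value / ticks)
    except OverflowError:
        return None


def identify_date_time_format(value: int, reference: datetime,
                              window_years: float = 7) -> Optional[str]:
    """Pick the format whose conversion is closest to the reference date/time."""
    window = timedelta(days=365.25 * window_years)
    best_fmt, best_delta = None, None
    for fmt in DATE_TIME_FORMATS:
        converted = convert(value, fmt)
        if converted is None:
            continue
        delta = abs(converted - reference)
        if delta > window:  # outside the window: unlikely to be the right format
            continue
        if best_delta is None or delta < best_delta:
            best_fmt, best_delta = fmt, delta
    return best_fmt
```

For example, with a reference of 1 November 2013, the raw value 1383264000000000 is only plausible as PRTime. Every other interpretation either overflows or lands decades away.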
- method 200 returns to step 240 where it is again determined whether there are any columns remaining in the table that are still unprocessed.
- steps 255 to 260 are optional in that they need not be performed. When they are performed, however, the acts may provide a confirmation that the column with the column identifier having a keyword does in fact correspond to the data field associated with the keyword. In this way, steps 255 and 260 may be considered a “sanity check” that verifies the conclusion arrived at in step 250.
- in the illustrated example, there are several columns for which performing steps 255 to 260 may result in the conclusion arrived at in step 250 not being confirmed.
- the determination at step 250 may have been that, because the column identifier contains the keyword ‘message’, the column corresponds to the data field for the “content” of a message, as would be generated by an application belonging to the Chat/IM application type.
- it may be determined that the data 362 for that column does not contain any of the words in the dictionary that have been previously recognized as indicating that the data constitutes the “content” of a message.
- steps 255 to 260 may reduce the likelihood of erroneous mappings that are determined based on the results of step 250 alone.
- At step 265, the data stored in the database is identified as belonging to an application of the particular application type identified in step 250.
- Step 265 may involve the application-type identification module 120 (as shown in FIG. 1 ) storing information indicating that the unknown data encountered on the storage device 104 belongs to the particular application type for which the unknown data was analyzed (e.g., a Chat/IM application type).
- application-type identification module 120 may attempt to identify an application identifier (e.g., the name of the application). For example, this may be performed by using operating system application manifests (e.g., as may be separately found on file system 130 of storage device 104 , apart from the database 132 ), or via the text found in file path location (e.g., such text may be found in the file path of where the database 132 is located on the file system 130 of storage device 104 ). If the application name is available, when performing step 265 in FIG. 2 , an association between the name of the application and the application type as identified by the method of FIG. 2 may be stored.
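One possible way to take an application identifier from the file path is sketched below. It assumes an Android-style layout in which a database sits under a directory named for the application package, such as `/data/data/com.google.android.apps.plus/databases/`. The regular expression and the function name are hypothetical heuristics. They do not read operating system application manifests.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical heuristic: a directory name with three or more dot-separated
# identifiers (like an Android package name) is taken as the application ID.
PACKAGE_SEGMENT = re.compile(r"^[A-Za-z][A-Za-z0-9_]*(?:\.[A-Za-z][A-Za-z0-9_]*){2,}$")


def application_id_from_path(db_path: str) -> Optional[str]:
    """Return the first package-like directory name in the database's path."""
    # Only directory segments are checked, so a file name such as
    # 'history.backup.db' is never mistaken for an application identifier.
    for segment in PurePosixPath(db_path).parent.parts:
        if PACKAGE_SEGMENT.match(segment):
            return segment
    return None
```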
- a mapping may be stored between the column being processed and the commonly used data field that the sampled data of the column was determined to be consistent with in step 260 .
- the mapping may be stored in column/data field mapping store 140 .
- the mappings may, for example, be subsequently referenced when recovering data from the unknown database 132 .
- the mappings may also be subsequently used when recovering data from another storage device 104 containing unknown data, so that if similar database information is encountered, the mappings can be referenced to identify the type of data that is stored in the database.
- After step 270, method 200 returns to step 240 to determine whether there are any columns still unprocessed. If method 200 determines that there are no columns still unprocessed, and no tables still unprocessed at step 230, it will return to step 205 where it will continue to search for files. If no files are found in step 210, method 200 may proceed to step 215 and display a mapping user interface that may allow updating of the mapping between a column and a given data field.
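The outer loop of method 200 can be sketched for the common case where the database information is an SQLite file. The sketch makes three assumptions:

- A file is treated as database information if it begins with the 16-byte SQLite 3 header.
- Tables are listed from `sqlite_master`.
- Columns are read with `PRAGMA table_info`.

`scan` and `match_column` are hypothetical names, and `match_column` stands in for the keyword check of step 250. Each database is opened read-only so the recovery process does not modify it.

```python
import sqlite3
from pathlib import Path
from typing import Callable, Dict, Iterator, Optional, Tuple

# The first 16 bytes of every SQLite 3 database file.
SQLITE_HEADER = b"SQLite format 3\x00"


def is_sqlite_file(path: Path) -> bool:
    """Treat a file as database information if it starts with the SQLite header."""
    try:
        with open(path, "rb") as f:
            return f.read(16) == SQLITE_HEADER
    except OSError:
        return False


def iter_columns(db_path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (table name, column identifier) for each user table."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"  # read-only open
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
        for table in tables:
            quoted = '"' + table.replace('"', '""') + '"'
            # Rows are (cid, name, type, notnull, dflt_value, pk).
            for row in conn.execute(f"PRAGMA table_info({quoted})"):
                yield table, row[1]
    finally:
        conn.close()


def scan(root: str, match_column: Callable[[str], Optional[str]]
         ) -> Dict[Tuple[str, str, str], str]:
    """Search files, keep SQLite databases, and record matching columns."""
    mappings = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and is_sqlite_file(path):
            for table, column in iter_columns(path):
                data_field = match_column(column)
                if data_field is not None:
                    mappings[(str(path), table, column)] = data_field
    return mappings
```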
- the mapping user interface may display a list of the located database tables and mappings of columns of such tables to the commonly used data fields for a given application type, so as to allow user input for final verification or remapping if necessary. A method and user interface for performing such remapping or verification is discussed below with respect to FIGS. 4 and 5.
- step 265 (to identify data stored in the database as belonging to an application of the particular application type) may be performed immediately after it has been determined at step 250 that the column identifier includes a keyword associated with the application type, and before a data record is sampled at steps 255 to 260.
- some of the steps of method 200 may be executed in parallel.
- Parallel execution of some steps may be desirable in systems that have more than one processor or a processor that has more than one processing core.
- one or more cores may be focused on executing step 250 to identify whether a column identifier contains a keyword associated with an application type, and one or more other cores may be focused on sampling a data record from the table to determine whether the data is consistent with data that would be generated by an application of the application type.
- Parallel execution may also allow the computing device 102 to process more than one table or more than one column simultaneously.
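Processing several columns or tables at once could be sketched with Python's `concurrent.futures`. `process_in_parallel` and the `check` callable are hypothetical. The callable stands in for the keyword check and the data-record sampling applied to one (table, column) pair. Under CPython's global interpreter lock, threads help mainly with I/O-bound work such as reading from the storage device. For CPU-bound checks across multiple cores, `ProcessPoolExecutor` offers the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, Iterable, TypeVar

K = TypeVar("K", bound=Hashable)
R = TypeVar("R")


def process_in_parallel(items: Iterable[K], check: Callable[[K], R],
                        max_workers: int = 4) -> Dict[K, R]:
    """Run check() on each item using a pool of workers; keep input order."""
    items = list(items)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        results = list(pool.map(check, items))
    return dict(zip(items, results))
```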
- method 200 may also include a step of checking a reference database (e.g., the column/data field mapping store 140 shown in FIG. 1 ) containing mappings and database information from previous executions of the method of FIG. 2 .
- if matching database information is found in the reference database, the previously stored mapping may be applied and the data from the database being processed may be automatically recovered to be presented later.
- the method may return directly to step 205 to search for any remaining files that need to be processed, and the mapping need not be presented to the user for verification in accordance with method 400 .
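A reference store of previous mappings could be sketched as a small JSON-backed class. This is illustrative only. The sketch assumes a table counts as "previously seen" when its name and set of column identifiers match a stored entry. The class name, file layout and signature format are hypothetical and are not the column/data field mapping store 140 itself.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Optional


def table_signature(table_name: str, column_ids: Iterable[str]) -> str:
    """Assumed identity of a table: its name plus its sorted column identifiers."""
    return f"{table_name}({','.join(sorted(column_ids))})"


class MappingStore:
    """Hypothetical on-disk store of data field -> column mappings."""

    def __init__(self, path: str):
        self.path = Path(path)
        self._data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def lookup(self, table_name: str, column_ids: Iterable[str]) -> Optional[Dict[str, str]]:
        """Return a previously stored mapping for this table, if any."""
        return self._data.get(table_signature(table_name, column_ids))

    def save(self, table_name: str, column_ids: Iterable[str], mapping: Dict[str, str]) -> None:
        """Store (or overwrite) the mapping and persist the whole store."""
        self._data[table_signature(table_name, column_ids)] = dict(mapping)
        self.path.write_text(json.dumps(self._data, indent=2, sort_keys=True))
```

When `lookup` returns a mapping, the stored mapping could be applied directly without being presented for verification again.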
- Referring to FIG. 4, shown there generally as 400 is a method for updating a mapping between a data field and a column, in accordance with one example embodiment.
- FIG. 5 shows generally as 500 , a screenshot of an example user interface that allows updating of the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment.
- the components of the computing device 102, such as the processor 110 and the display 112, may be configured to execute one or more steps of the method 400.
- method 400 may be initiated at step 215 of the method 200 of FIG. 2 where a mapping user interface is displayed.
- the method 400 starts at step 405 where a mapping between a data field and a column of a table is displayed in a user interface.
- the mapping may be retrieved from the column/data field mapping store 140 shown in FIG. 1 .
- the mapping may have been stored as a result of step 270 of method 200 in FIG. 2 .
- the user interface 500 is for an execution of method 200 of FIG. 2 that attempts to determine if unknown data corresponds to a Chat/IM application type. Accordingly, the user interface provides a number of data fields that are commonly used by an application of a Chat/IM application type. Specifically, there is a ‘sender’ data field 514 (shown in FIG. 5 with the text “Identified Sender Column”), a ‘recipient’ data field 516 (shown in FIG. 5 with the text “Identified Recipient Column”), a ‘content’ data field 518 (shown in FIG. 5 with the text “Identified Message Column”), and a ‘date/time’ data field 520 (shown in FIG. 5 with the text “Identified Date Column”).
- the user interface 500 of FIG. 5 shows, as rows, each of the tables 134 that have been processed in FIG. 2 .
- the user interface shows the column identifier of a column in the table that has been determined to be mapped to a particular data field 514 , 516 , 518 , 520 for the application type.
- the user interface 500 shows information for the database table 134 illustrated in FIG. 3 . Referring simultaneously to FIG. 3 , it can be seen that the mapping (e.g., as may have been stored at step 270 in FIG. 2 ) has determined that the column with column identifier ‘author_id’ 314 corresponds to the ‘sender’ data field 514 .
- the column with column identifier ‘conversation_id’ 316 has been determined to correspond to the ‘recipient’ data field 516
- the column with column identifier ‘timestamp’ 320 has been determined to correspond to the ‘date/time’ data field 520
- the column with the column identifier ‘text’ 318 has been determined to correspond to the ‘content’ data field 518 .
- the user interface 500 may also display the determined application identifier 530 for a given table 134 (e.g., “com.google.android.apps.plus”) and the table identifier 535 for a given table 134 (e.g., the table name “messages” 302 ) if such information is available and has been determined.
- the determined date/time format of the data may also be shown in the user interface 500 . As illustrated, this is provided as an additional column 550 positioned beside the ‘date/time’ field 520 .
- the user interface 500 shows the data for the column with the column identifier ‘timestamp’ 320 as being determined to be of a ‘PRTime’ date/time format 555.
- user interface 500 has a preview section 560 that displays how the data from a data record of the table would be presented, based on the mappings.
- the data record 360 illustrated in FIG. 3 may be shown in the preview section 560 according to the mappings illustrated for the database table 134 .
- the data 364 for the ‘Author_id’ column 314 is provided under ‘From’ in the preview section 560 because the ‘Author_id’ column 314 is mapped to the ‘sender’ data field 514 .
- the data 366 for the ‘Conversation_id’ column 316 is provided under ‘To’ in the preview section 560 because the ‘Conversation_id’ column 316 is mapped to the ‘recipient’ data field 516 .
- the data 368 for the ‘text’ column 318 is provided under ‘Message’ in the preview section 560 because the ‘text’ column 318 is mapped to the ‘content’ data field 518
- the data 370 for the ‘timestamp’ column 320 is provided under ‘Date/Time’ in the preview section 560 because the ‘timestamp’ column 320 is mapped to the ‘date/time’ data field 520.
- the preview section 560 may be configured to display the data according to the date/time format that has been determined in the method of FIG. 2 above. For example, as illustrated, because the date/time format of the date/time data 370 has been determined to be ‘PRTime’ (e.g., as illustrated at 555 of user interface 500), the date/time shown would be the date/time data 370 as interpreted according to the ‘PRTime’ format. If the resulting date/time appears incorrect to a user in the preview section 560, user input may be received via the user interface control 555 (e.g., the indicated combo box may be selected), and an alternative date/time format may be chosen. The preview section 560 may then be updated to display the date/time data formatted according to the alternative date/time format.
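A preview that follows the mappings and the selected date/time format might be sketched as below. Some details are illustrative assumptions:

- The field labels and the two-format subset are chosen for the sketch.
- PRTime is taken to be microseconds since the Unix epoch.
- The record values in the usage example are hypothetical, apart from "hey hey hey" and the recipient ‘1’ mentioned above.

Changing `date_format` simulates choosing an alternative format in the combo box.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

UTC = timezone.utc
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=UTC)

# Subset of formats as (epoch, ticks per second).
FORMATS = {
    "Unix epoch time (seconds)": (UNIX_EPOCH, 1),
    "PRTime": (UNIX_EPOCH, 1_000_000),
}
# Preview label shown for each mapped data field.
LABELS = {"sender": "From", "recipient": "To", "content": "Message", "date_time": "Date/Time"}


def render_preview(record: Dict[str, object], mapping: Dict[str, str],
                   date_format: str) -> Dict[str, object]:
    """Show each mapped column's value under its data field's label."""
    preview = {}
    for data_field, column in mapping.items():
        value = record[column]
        if data_field == "date_time":
            epoch, ticks = FORMATS[date_format]
            # Integer microseconds avoid floating-point rounding.
            value = (epoch + timedelta(microseconds=value * 1_000_000 // ticks)).isoformat()
        preview[LABELS[data_field]] = value
    return preview
```

For example, a hypothetical record whose ‘timestamp’ holds 1383264000000000 would preview as `2013-11-01T00:00:00+00:00` when PRTime is selected.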
- At step 410, input from the user interface may be received indicating that a data field is to be mapped to an alternative column of the table.
- the various mappings of the column identifiers 314 , 316 , 318 , 320 of data table 134 are provided within drop-down controls that may receive such input selecting an alternative column of the table to map to the particular data field 514 , 516 , 518 , 520 .
- the list within the drop down box may be populated with all the column identifiers of the other columns of the table, so that other column identifiers for the table may be selected to be mapped to a given data field 514 , 516 , 518 , 520 .
- the drop-down boxes may be activated if a user determines that a given mapping is incorrect. For example, upon seeing that mapping the ‘recipient’ data field 516 to the column with the ‘conversation_id’ column identifier results in the number ‘1’ being identified as the recipient of a message in a Chat/IM application (e.g., as is illustrated under ‘To’ in the preview section 560), the user may select an alternative column to be mapped to the ‘recipient’ data field 516.
- in response to receiving user input indicating an alternative mapping, the preview section 560 may be updated to correspond to the alternative mapping. In some instances, the preview section 560 may be updated to allow the user to consider whether the selected mapping is appropriate, or whether the mapping should be changed again.
- an updated mapping for the data field to the alternative column may be stored.
- the processor 110 may update the mapping stored in column/data field mapping store 140 to correspond to the alternative column.
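The remapping of steps 410 to 415 might be sketched as a small pure function. `update_mapping` is a hypothetical name. The sketch assumes, as a design choice, that only columns that actually exist in the table are accepted, mirroring a drop-down populated with the table's column identifiers. It returns a new mapping rather than changing the old one, so the earlier mapping is still available if the user changes the selection again.

```python
from typing import Dict, Sequence


def update_mapping(mapping: Dict[str, str], table_columns: Sequence[str],
                   data_field: str, alternative_column: str) -> Dict[str, str]:
    """Return a copy of mapping with data_field remapped to alternative_column."""
    if data_field not in mapping:
        raise KeyError(f"unknown data field: {data_field!r}")
    if alternative_column not in table_columns:
        # Only column identifiers of this table can be selected.
        raise ValueError(f"column {alternative_column!r} is not in the table")
    updated = dict(mapping)
    updated[data_field] = alternative_column
    return updated
```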
- method 400 may be executed multiple times for an application before the user has determined the appropriate mapping for that application.
- Referring to FIG. 6, illustrated therein generally as 600 is a screenshot of a data recovery user interface after the updating of the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment.
- the computing device 102 may attempt to recover the data from the respective unknown databases, using the stored mappings. For example, the computing device 102 may generate a report or case file that contains the results of the data extracted using the stored mappings.
- the data recovery user interface 600 may be displayed after the mapping has been updated according to the method 400 of FIG. 4 . Additionally or alternatively, the data recovery interface 600 may also be presented to a user immediately after the initial mappings are stored upon the completion of the method 200 of FIG. 2 .
- a number of different tables 134 that have been analyzed may be selected in the left-hand pane.
- the records for that table can then be populated into the top-right pane (e.g., as illustrated, there are 43 data records within the table).
- the data within each of the data records are then displayed in accordance with the mapping determined by method 200 of FIG. 2 and/or method 400 of FIG. 4 .
- the data contained therein for each data record are shown under each respective data field 514 , 516 , 518 , 520 .
- the user interface 600 also includes a preview section 660 which displays a preview of how the data of a selected data record 360 would appear based on the mappings.
- the example data record 360 shown in FIG. 3 is again shown.
- the data 364 for the ‘Author_id’ column 314 is shown as being associated with the ‘sender’ data field 514 .
- the data 366 for the ‘Conversation_id’ column is shown as being associated with the ‘recipient’ data field 516 .
- the data 368 for the ‘text’ column 318 is shown as being associated with the ‘content’ data field 518
- the data 370 for the ‘timestamp’ column 320 is shown as being associated with the ‘date/time’ data field 520 .
- the preview section 660 may also provide an indication of the table identifier 302 (e.g., the table name ‘messages’).
- X and/or Y is intended to mean X or Y or both.
- X, Y, and/or Z is intended to mean X or Y or Z or any combination thereof.
Abstract
Description
- This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/883,279, filed Sep. 27, 2013, the entire contents of which are hereby incorporated by reference herein for all purposes.
- Embodiments herein relate to methods, systems and devices for identifying an application type of unknown data, and in particular to methods, systems and devices for identifying an application type of unknown data stored on a computer readable medium such as, for example, a storage device.
- Computing devices (e.g., desktop or laptop computers, and mobile devices such as smartphones or tablet computers) typically contain one or more computer readable media (e.g., memory, a hard disk drive, or a solid state drive) on which applications can store data.
- In various situations, it may be desirable to recover data from the computer readable media. For example, data recovery may be performed when there is a failure of the computer readable media that prevents normal access to the data.
- Data recovery may also be performed in the context of forensics. For example, a user may attempt to hide, delete, or obfuscate data on a computer readable medium so that the data may not be readily accessible to someone else. This may be particularly the case where the user is undertaking some illicit or otherwise improper activity and does not want such activity to be discovered. In such scenarios, law enforcement authorities or others may be interested to learn about a user's activities on a particular computing device by recovering data on that computing device.
- In a further example, data recovery may be performed to allow a party to know how a computing device had been or is being used by a given individual or a group of individuals. For example, employers may be interested to learn how their computer resources are being used by their employees. Similarly, parents and/or spouses might be interested to know how members of their families are using a computing device.
- According to some aspects, there is provided a method of identifying an application type of unknown data, the method comprising: determining that the unknown data corresponds to database information, the database information comprising at least one table with at least one column; for a column of a table in the database information, determining if a column identifier of the column comprises a keyword associated with a particular application type; and if the column identifier comprises the keyword, identifying data stored in the database as belonging to an application that is of the particular application type.
- In various embodiments, the keyword is associated with a data field that is commonly used by an application of the particular application type.
- In various embodiments, the method includes sampling a data record in the table; and determining that data for the column in the data record is consistent with data for the data field that would belong to an application of the particular application type.
- In various embodiments, the method includes converting the data in the column in the data record to each of a plurality of date/time formats; comparing the converted data, in each respective date/time format, to each other to determine which converted data is closest to a reference date/time; and for the converted data that is closest to the reference date/time, identifying the date/time format of the converted data as the date/time format of the data in the column of the table.
- In various embodiments, the method includes storing a mapping between the data field and the column, the mapping being accessible during recovery of data in the database to indicate that data for the column in the table is associated with the data field.
- In various embodiments, the method includes displaying the mapping between the data field and the column in a user interface, wherein the user interface provides an option to select an alternative column of the table to be mapped to the data field; receiving input indicating that the data field is to be mapped to the alternative column; and storing an updated mapping for the data field, the updated mapping indicating that the data field is mapped to the alternative column.
- In various embodiments, the particular application type can include a messaging application, and the data field that is commonly used comprises one of: a sender field, a recipient field, a message field, and a timestamp field.
- In various embodiments, the particular application type can include a web browser application, and the data field that is commonly used comprises one of: an address field, a date field, a bookmark field, and a title field.
- In various embodiments, the particular application type can include a geographic location-enabled application, and the data field that is commonly used comprises one of: a longitude field, a latitude field, a destination field, a direction field, and a route field.
- In various embodiments, the particular application type comprises a messaging application, and the keyword comprises one of the following words: message, subject, text, msg, body, content, date, time, timestamp, from, sender, author, uid, member, to, receiver, conversation, recipient, partner, participant, and party.
- In various embodiments, the particular application type comprises a web browser application, and the keyword comprises one of the following words: address, location, loc, URL, visited, date, bookmark, favorite and title.
- In various embodiments, the particular application type comprises a geographic location-enabled application, and the keyword comprises one of the following words: coordinate, longitude, latitude, location, loc, home, destination, direction, and route.
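The keyword lists above could also be used to rank candidate application types for a whole table. In this sketch, each application type is scored by how many of the table's column identifiers contain one of its keywords as a whole token. The scoring rule and all names are hypothetical. The keyword sets are the lists given above, lowercased.

```python
import re
from typing import Dict, Iterable, Optional

# Keyword lists from the embodiments above, lowercased.
KEYWORDS = {
    "messaging": {"message", "subject", "text", "msg", "body", "content", "date",
                  "time", "timestamp", "from", "sender", "author", "uid", "member",
                  "to", "receiver", "conversation", "recipient", "partner",
                  "participant", "party"},
    "web_browser": {"address", "location", "loc", "url", "visited", "date",
                    "bookmark", "favorite", "title"},
    "geo_location": {"coordinate", "longitude", "latitude", "location", "loc",
                     "home", "destination", "direction", "route"},
}


def tokens(identifier: str) -> set:
    """Lowercase tokens of a column identifier, split on non-alphanumerics."""
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", identifier) if t}


def score_application_types(column_ids: Iterable[str]) -> Dict[str, int]:
    """Count, per application type, the columns containing one of its keywords."""
    column_ids = list(column_ids)
    return {app: sum(1 for c in column_ids if tokens(c) & kws)
            for app, kws in KEYWORDS.items()}


def best_application_type(column_ids: Iterable[str]) -> Optional[str]:
    """Return the highest-scoring application type, or None if nothing matches."""
    scores = score_application_types(column_ids)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For the table of FIG. 3, with columns such as ‘author_id’, ‘conversation_id’, ‘text’ and ‘timestamp’, this sketch would favour the messaging type.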
- According to some other aspects, there is provided a computing device comprising a processor and a memory storing instructions which, when executed by the processor, cause the processor to perform the methods described herein.
- According to some other aspects, there is provided a computer readable medium comprising instructions which, when executed by a processor, cause the processor to perform the methods described herein. In various embodiments, the computer readable medium is non-transitory.
- According to some other aspects, there is provided a system adapted to perform any one or more of the methods as described herein.
- According to some other aspects, there is provided a device comprising at least one processor adapted to perform any one or more of the methods as described herein.
- Some embodiments will now be described, by way of example only, with reference to the following drawings, in which:
- FIG. 1 is a schematic diagram illustrating a computing device for identifying an application type of unknown data stored in a storage device, in accordance with one example embodiment;
- FIG. 2 is a flowchart illustrating a method for identifying an application type of unknown data stored in a storage device, in accordance with one example embodiment;
- FIG. 3 is an exemplary database table containing data that may be stored in a storage device, in accordance with one example embodiment;
- FIG. 4 is a flowchart illustrating a method for updating the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment;
- FIG. 5 is a screenshot of an example user interface that allows updating of the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment; and
- FIG. 6 is a screenshot of a data recovery user interface after the updating of the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment.
- For simplicity and clarity of illustration, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements or steps. In addition, numerous specific details are set forth in order to provide a thorough understanding of the exemplary embodiments described herein. However, it will be understood by those of ordinary skill in the art that the embodiments described herein may be practiced without these specific details. In other instances, well-known methods, procedures and components have not been described in detail so as not to obscure the embodiments generally described herein.
- Furthermore, this description is not to be considered as limiting the scope of the embodiments described herein in any way, but rather as merely describing the implementation of various embodiments.
- The embodiments of the methods described herein may be implemented in hardware or software, or a combination of both. In some cases, embodiments may be implemented in one or more computer programs executing on one or more programmable computing devices comprising at least one processor (e.g., a microprocessor), a data storage device (including in some cases volatile and non-volatile memory and/or data storage elements), at least one input device, and at least one output device. For example and without limitation, the programmable computing devices may be a personal computer, laptop, personal data assistant, cellular telephone, smartphone device, tablet computer, and/or wireless device. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices.
- In some embodiments, each program may be implemented in a high level procedural or object oriented programming and/or scripting language. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language.
- In some embodiments, the computing devices and methods as described herein may also be implemented as a transitory or non-transitory computer-readable storage medium configured with a computer program, wherein the storage medium so configured causes a computing device to operate in a specific and predefined manner to perform at least some of the functions as described herein. The medium may be provided in various forms, including one or more diskettes, compact disks, tapes, chips, wireline transmissions, satellite transmissions, internet transmission or downloadings, magnetic and electronic storage media, digital and analog signals, and the like. The computer useable instructions may also be in various forms, including compiled and non-compiled code.
- Moreover, the subject system may be implemented as one or more software components stored on one or more computer servers that are accessible via one or more client machines in a client-server architecture. In such case, the system can be considered to be a hosted software offering or a software service in a software-as-a-service deployment.
- The embodiments of the present disclosure relate generally to methods of identifying an application type of unknown data that may be encountered during a data recovery process. In traditional data recovery processes, there is typically a catalog of application data that indicates the data format of data stored by a given application on a storage device. When unknown data is encountered, this application data is referenced to determine if the unknown data matches the data formats that are indicative of a particular application. If so, the unknown data is processed according to the identified application.
- Such traditional processes, however, may be inefficient because they require analysis of the data structure stored by an application before data associated with the application can be recovered. As computer technology advances and becomes more prevalent, applications for electronic devices are being developed at an increasingly fast rate. In particular, as mobile device applications have become more popular, the number and variety of applications available to users of mobile devices has expanded dramatically. The ever-growing number of applications makes analyzing the data format of each application difficult. This results in data formats for many applications not being analyzed. If data stored by these applications are encountered during data recovery, such traditional data recovery processes may not be able to adequately recover the data.
- At least some of the present embodiments may provide a computing device, system or method that allows unknown data (which does not already correspond to a data format of a known application) to still nevertheless be recovered.
- Viewed at a high-level, according to the teachings herein, various embodiments may recognize that even though the particular data format for an unanalyzed application may be unknown, the application type of the unanalyzed application may be identified based on certain characteristics of how unknown data is stored. In particular, some embodiments may recognize that certain keywords may be commonly used by applications of a particular application type as an identifier for a column of a table stored in a database.
- Additionally, some embodiments may recognize that these keywords are being used to identify a column that may indicate that the data for the column corresponds to a common data field stored by applications of the application type. For example, the keywords “author” or “from” may be commonly used in chat or instant messaging (IM) type applications to identify a “sender” data field for chat messages stored in the application.
- Referring now to
FIG. 1, illustrated therein generally as 100 is a block diagram showing a computing device for identifying an application type of unknown data stored in a storage device, in accordance with one example embodiment. As shown, a computing device 102 may be coupled to a storage device 104 on which the unknown data is stored. The computing device 102 may include a processor 110, a display 112, a storage device interface 114, and a memory 116.
- Processor 110 may be configured to perform the steps of the methods described herein. To perform these steps, in various embodiments, the processor 110 may execute instructions stored on memory 116. For example, the instructions may be stored in the form of an application-type identification module 120.
- During execution, the application-type identification module 120 may be configured to retrieve keywords from a keyword store 122 and use them to analyze the column identifiers of an unknown database 132 stored on the storage device 104, in order to determine whether the data stored therein corresponds to a particular application type. For ease of illustration, the keyword store 122 is shown as also being stored on memory 116. However, it will be understood that the keyword store 122 may be stored separately from the memory 116 (e.g., on a hard disk (not shown) or some other local or remote storage).
- As is discussed in greater detail below, if it is determined that a column identifier includes a keyword associated with an application type, a mapping may be stored between the column and a data field that is commonly used by applications of that application type. These mappings may be stored in the column/data field mapping store 140. In some cases, the initial or a previous mapping determined from the presence of the keywords may subsequently be updated through a user interface 142 provided by the application-type identification module 120.
- In various embodiments, the
user interface 142 may be displayed on display 112. Display 112, for example, may be a suitable display device (e.g., a monitor, screen or touchscreen) coupled to the processor 110. The user interface 142 may allow the processor 110 to solicit input from a user to confirm or update the mapping of a column to a data field, as stored in the column/data field mapping store 140 of memory 116. Example screenshots that may be shown in the user interface 142 are illustrated in FIGS. 5 and 6, and will be discussed in greater detail below.
- The storage device 104 may be coupled to computing device 102 through storage device interface 114. The storage device 104 may have stored thereon application data associated with various known and unknown applications. In various embodiments, the storage device 104 may include a file system 130 that contains a number of different files. In some situations, one or more of the files in the file system 130 may correspond to unknown database information stored in a database 132. At least some of the present embodiments are directed to methods of determining whether the data stored within an unknown database 132 includes data of a particular application type, e.g., by determining whether the column identifiers of a database table 134 of the unknown database 132 include certain keywords.
- It will be understood that in certain situations, a data recovery process may attempt to analyze data that was intended to be deleted from the storage device 104 (e.g., if data recovery is being performed in a forensics context by law enforcement officers). For example, a user may use the “delete” function of an operating system to delete a file, but the file may nevertheless remain recoverable despite having been “deleted”. This is because many operating systems and/or device drivers may not physically erase the data from the storage device 104 immediately when a command to delete it is received. Instead, the addresses on the storage device 104 that store the data may simply be marked as “unallocated” or “available”. Such indications inform the operating system or other applications that these addresses are now available to store other data, so that the old data is only overwritten, and thereby actually deleted, when new data is stored at those addresses. Until that happens, data flagged for deletion may remain physically present on the storage device 104 for an extended period of time, even though the application, the user, or both requested its deletion.
- Accordingly, when analyzing the data stored on a storage device 104 in the present embodiments, it is possible that the data being analyzed was intended to be deleted but nevertheless remains readable from the storage device 104.
- It will be understood that the different components shown in
FIG. 1 can be provided in a variety of ways. For example, the computing device 102 may be provided in the form of a personal computer, networked computer, portable computer, portable electronic device, personal digital assistant, laptop, desktop, mobile phone, smart phone, tablet, and so on.
- The processor 110 may be any type of processor, such as, for example, any type of general-purpose microprocessor or microcontroller, a digital signal processing (DSP) processor, an application-specific integrated circuit (ASIC), a programmable read-only memory (PROM), or any combination thereof.
- Similarly, the memory 116 may include any type of computer memory located either internally or externally to the computing device 102, such as, for example, random-access memory (RAM), read-only memory (ROM), compact disc read-only memory (CD-ROM), electro-optical memory, magneto-optical memory, erasable programmable read-only memory (EPROM), electrically-erasable programmable read-only memory (EEPROM), a hard disk drive, a solid-state drive, or any other form of suitable computer readable medium that may be used in electronic devices.
- Additionally, the computing device 102 may include one or more input devices (not shown), such as a keyboard, mouse, camera, touch screen and/or microphone, and may also include one or more output devices, such as a display screen 112 and/or a speaker. Computing device 102 may have a network interface for connecting to a network (not shown) in order to communicate with other components.
- It will be understood that although each of the data stores 122 and 140 is shown in FIG. 1 separately, they can be stored together as separate tables within the same database or multiple databases, locally and/or remotely. Additionally, other persistent storage methods, such as encrypted files, may also be used to provide persistent storage.
- Further, the storage device interface 114 of the computing device 102 may be any type of hardware or software interface that allows the computing device 102 to communicate with the storage device 104. For example, the storage device interface 114 may be one or more of the following interfaces: Parallel AT Attachment (PATA), Serial AT Attachment (SATA), Integrated Drive Electronics (IDE), Enhanced Integrated Drive Electronics (EIDE), Small Computer System Interface (SCSI), Universal Serial Bus (any version), FireWire and/or Thunderbolt. Additionally or alternatively, the storage device interface 114 may allow communication with a storage device 104 that is provided remotely (e.g., via Network-Attached Storage (NAS) and/or Storage Area Network (SAN) mechanisms) by acting as a client to a server that provides access to the storage device 104.
- Moreover, the storage device 104 on which the unknown data is stored may include any type of computer readable medium that is to be the subject of the data analysis methods described herein, including the types of memory listed above as options for the memory 116.
- Referring now to
FIG. 2, shown there generally as 200 is a method for identifying an application type of unknown data stored in a storage device, in accordance with one example embodiment. In some embodiments, the components of the computing device 102, such as the processor 110, may be configured to execute one or more steps of the method 200 to identify an application type of unknown data. For ease of explanation, when discussing various steps of the method of FIG. 2, reference will simultaneously be made to an example database table shown generally as 134 in FIG. 3.
- At step 205, a search for files on the storage device 104 is initiated. The storage device 104 may be formatted according to a known file system that the application-type identification module 120 (shown in FIG. 1) is configured to access and search. For example, the storage device 104 may have been formatted for use with operating systems such as Microsoft™ Windows™, Linux™, Apple OS X™, Apple iOS™ and/or Android™, and the file systems that can be processed by the application-type identification module 120 may include: File Allocation Table 32 (FAT32), New Technology File System (NTFS), third extended file system (ext3), fourth extended file system (ext4), Hierarchical File System (HFS) and/or Hierarchical File System Plus (HFS+ or HFSX).
- At step 210, it is determined whether a file has been found. If no file has been found (the ‘NO’ branch at step 210), then there may be no remaining files to be processed by method 200. If at least one of the located files was determined to contain database information and also resulted in at least one mapping between a column and a data field, method 200 may then proceed to display the mapping user interface at step 215. As noted, computing device 102 may display the mapping user interface via display 112 (shown in FIG. 1). As will be discussed below, the mapping user interface may allow a user of the computing device 102 to confirm or alter the mapping between a column and a data field determined by the method of FIG. 2. The steps associated with providing such a user interface are shown in FIG. 4.
- It may be the case that the file system 130 is not recognized by the application-type identification module 120, that no database information has been located on the file system 130, or that no mappings between columns and data fields have been generated. In such a case, step 215 may not be performed. The process may then end, and a message may be displayed indicating that the method of FIG. 2 was not able to identify the application type of the data stored on the storage device 104.
- If it is determined that a file has been found (the ‘YES’ branch at 210), the
computing device 102 may open the file for read access, and the data in the file may be read in an attempt to determine whether the file includes database information (step 220).
- At step 225, a determination is made as to whether the file corresponds to database information. This may be performed, for example, by reading the header information of the file to determine whether it indicates that the file is of a known database type. For example, this step may involve checking for the string “SQLite format 3” in the header of the file, as SQLite database files typically include the text “SQLite format 3” in their header. In some cases, the information indicating that a file is a database may not be within the header portion of the file and could instead be located in other parts of the data associated with the file. For example, the file extension may indicate that a file is of a particular database type (e.g., the file extensions “.mdb”, “.mda”, or “.accdb” may indicate that the file is a Microsoft™ Access™ database). It will be appreciated that the determination made at step 225 may identify database information that does not necessarily include the entire contents of a given database. For example, the database information may include only a portion of the contents of a database (e.g., as may be the case if the remaining contents of the database have already been overwritten or are otherwise unavailable).
- If the file is determined not to include database information (the ‘NO’ branch at step 225), then method 200 returns to step 205, where it may again begin the process of determining whether there are any files remaining to be processed. In various embodiments, the application-type identification module 120 may also store data in a report database (not shown) indicating that the previous file was processed and identified as not containing a database.
- If the file is determined to include database information (the ‘YES’ branch at step 225), then the tables accessible within the database are processed and method 200 proceeds to step 230.
- At
step 230, the method 200 determines whether any tables in the database are still unprocessed. If there are none (the ‘NO’ branch at step 230), the method 200 returns to step 205, where it may again begin the process of determining whether there are any files remaining on the storage device 104 to be processed. If there are still tables to be processed (the ‘YES’ branch at step 230), the next table in the database information is read at step 235.
- At step 240, it is determined whether any columns of the table are still unprocessed. If there are none (the ‘NO’ branch at step 240), the table has been processed and method 200 returns to step 230 to determine whether the database information contains any further unprocessed tables.
- If it is determined that columns of the table remain to be processed (the ‘YES’ branch at 240), the column identifier of the next column is read at step 245. In some cases, the column identifier may be the name of the column used in the database. In various embodiments, the column identifier may be retrieved from the schema of the database. In some cases, the column identifier may be found in header data of the database table; however, it could also be located in other parts of the data associated with the column.
- Once the identifier of a column has been read, step 250 determines whether the column identifier read at step 245 includes a keyword associated with a particular application type. For example, where the column identifier is a column name, this step may involve determining whether the column name includes a keyword that is commonly used in column names by applications of the particular application type. The presence of such a keyword in the column name may be taken as an indication that the database belongs to an application of that application type.
- As examples, the present embodiments may be able to determine that unknown data is of an application type that includes, without limitation: a chat or instant messaging application type, a web browser application type, a navigation/geo-location application type, a file sharing application type, a social networking application type, a cloud application type, and an email application type. It will be understood that although specific example application types are mentioned and described herein for illustrative purposes, the present embodiments may be used to identify any application type generally.
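As a rough illustration of steps 205 to 245, the sketch below walks a directory tree, treats a file as database information when its first 16 bytes are the SQLite 3 header string, and reads the column identifiers from the database schema. It assumes that the databases are SQLite and that the storage device 104 is already mounted as an ordinary directory; recovering deleted or partially overwritten database information, or handling other database types such as Access, would need considerably more work. The function names are hypothetical.

```python
import sqlite3
from pathlib import Path

# A SQLite 3 database file begins with this 16-byte string (including the NUL).
SQLITE_MAGIC = b"SQLite format 3\x00"


def looks_like_sqlite(path):
    """Step 225 (sketch): does the file start with the SQLite 3 header string?"""
    try:
        with open(path, "rb") as fh:
            return fh.read(16) == SQLITE_MAGIC
    except OSError:
        return False


def iter_column_identifiers(path):
    """Steps 230-245 (sketch): yield (table name, column name) for every column."""
    # Open read-only via a URI so the sketch does not intentionally write to the file.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        tables = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        for table in tables:
            quoted = '"' + table.replace('"', '""') + '"'
            # PRAGMA table_info returns one row per column; index 1 is the name.
            for row in con.execute(f"PRAGMA table_info({quoted})"):
                yield table, row[1]
    finally:
        con.close()


def scan(root):
    """Steps 205-245 (sketch): yield (file, table, column) for SQLite files under root."""
    for path in Path(root).rglob("*"):
        if path.is_file() and looks_like_sqlite(path):
            for table, column in iter_column_identifiers(path):
                yield path, table, column
```

Opening each database read-only is a deliberate choice for a forensic setting, where the files being examined should not be modified; the column names yielded by `scan` are what the keyword test of step 250 would then inspect.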
- Some examples of keywords that may be used to identify data as belonging to an application of a chat or instant messaging application type include the words: ‘message’, ‘subject’, ‘text’, ‘msg’, ‘body’, ‘content’, ‘date’, ‘time’, ‘timestamp’, ‘from’, ‘sender’, ‘author’, ‘uid’, ‘member’, ‘to’, ‘receiver’, ‘conversation’, ‘recipient’, ‘partner’, ‘participant’, and ‘party’. Some examples of keywords that may be used to identify data as belonging to an application of a web browser application type include the words: ‘address’, ‘location’, ‘loc’, ‘URL’, ‘visited’, ‘date’, ‘bookmark’, ‘favorite’ and ‘title’. Some examples of keywords that may be used to identify data as belonging to an application of a navigation/geo-location application type include: ‘coordinate’, ‘longitude’, ‘latitude’, ‘location’, ‘loc’, ‘home’, ‘destination’, ‘direction’, and ‘route’.
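The keyword test of step 250 might be sketched as follows, using the example keyword lists above as a stand-in for the keyword store 122. Splitting identifiers into word tokens (on underscores, punctuation and camelCase boundaries) is an assumption of this sketch rather than something the text specifies; a plain substring test would also match short keywords such as ‘to’ inside unrelated names like ‘photo’. Counting matching columns per application type is likewise a hypothetical way of combining per-column results into a per-table decision.

```python
import re

# Example keyword lists from the text, standing in for the keyword store 122.
APP_TYPE_KEYWORDS = {
    "chat/im": {"message", "subject", "text", "msg", "body", "content", "date",
                "time", "timestamp", "from", "sender", "author", "uid", "member",
                "to", "receiver", "conversation", "recipient", "partner",
                "participant", "party"},
    "web browser": {"address", "location", "loc", "url", "visited", "date",
                    "bookmark", "favorite", "title"},
    "navigation": {"coordinate", "longitude", "latitude", "location", "loc",
                   "home", "destination", "direction", "route"},
}


def tokens(identifier):
    """Split e.g. 'Message_id' or 'authorId' into lowercase word tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", identifier)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def matching_app_types(column_identifier):
    """Step 250 (sketch): application types with a keyword in the identifier."""
    toks = set(tokens(column_identifier))
    return sorted(app for app, kws in APP_TYPE_KEYWORDS.items() if toks & kws)


def score_table(column_identifiers):
    """Count, per application type, how many columns contain one of its keywords."""
    scores = {app: 0 for app in APP_TYPE_KEYWORDS}
    for col in column_identifiers:
        for app in matching_app_types(col):
            scores[app] += 1
    return scores
```

For the columns of data table 134 (‘Message_id’, ‘Author_id’, ‘Conversation_id’, ‘text’, ‘timestamp’), every column matches a Chat/IM keyword and none matches the other two types. Keywords shared between types, such as ‘date’ or ‘location’, count toward each of them.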
- To illustrate
step 250, reference is simultaneously made to FIG. 3, which shows generally as 134 a schematic representation of a data table storing unknown data. Data table 134 includes a table identifier 302 (e.g., a table name such as ‘Messages’ or some other alphanumeric identifier) and a series of columns with respective column identifiers (e.g., ‘Message_id’ 312, ‘Author_id’ 314, ‘Conversation_id’ 316, ‘text’ 318 and ‘timestamp’ 320).
- In the context of step 250 in FIG. 2, suppose that the column with column identifier ‘Message_id’ 312 is being processed. In this case, it may be determined that the column identifier 312 contains the keyword ‘message’. Since the column identifier 312 contains a keyword associated with the Chat/IM application type, it may be determined that the data in the database containing data table 134 belongs to an application of the Chat/IM application type.
- Returning to FIG. 2, if it is determined that a column identifier does not include a keyword associated with an application type (the ‘NO’ branch at step 250), control returns to step 240, where it is again determined whether any columns in the table are still unprocessed.
- If it is determined that a column identifier does include a keyword associated with an application type (the ‘YES’ branch at step 250), the process has determined that the file includes database information from an application of a particular application type. In many situations, a keyword used to identify data stored in a database as belonging to an application may also be associated with a data field that is commonly used by applications of that application type. For example, the keywords ‘message’, ‘subject’, ‘text’, ‘msg’, ‘body’ and ‘content’ may all be commonly used to identify a data field for the substance or “content” of a message in an application of a Chat/IM application type. Similarly, the keywords ‘date’, ‘time’ and ‘timestamp’ may be commonly used to identify a date/time field for the date and time of a message in an application of a Chat/IM application type. Likewise, the keywords ‘from’, ‘sender’, ‘author’, ‘uid’ and ‘member’ may be commonly used to identify a data field for the sender of a message in an application of a Chat/IM application type. Further, the keywords ‘to’, ‘receiver’, ‘conversation’, ‘recipient’, ‘partner’, ‘participant’ and ‘party’ may be commonly used to identify a data field for the recipient of a message in an application of a Chat/IM application type.
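One hypothetical way to turn these keyword groups into candidate column/data field mappings, of the kind later stored at step 270, is shown below. It matches whole word tokens and takes, for each data field, the first column that suggests it; both choices are assumptions made for illustration, not the method prescribed by the text.

```python
import re

# Chat/IM keyword groups from the text, keyed by the data field they suggest.
CHAT_FIELD_KEYWORDS = {
    "content": ("message", "subject", "text", "msg", "body", "content"),
    "date/time": ("date", "time", "timestamp"),
    "sender": ("from", "sender", "author", "uid", "member"),
    "recipient": ("to", "receiver", "conversation", "recipient", "partner",
                  "participant", "party"),
}


def candidate_field(column_identifier):
    """Return the Chat/IM data field suggested by the identifier, or None."""
    toks = {t.lower() for t in re.split(r"[^A-Za-z0-9]+", column_identifier) if t}
    for field, keywords in CHAT_FIELD_KEYWORDS.items():
        if toks.intersection(keywords):
            return field
    return None


def propose_mapping(column_identifiers):
    """Map each data field to the first column whose identifier suggests it."""
    mapping = {}
    for col in column_identifiers:
        field = candidate_field(col)
        if field is not None and field not in mapping:
            mapping[field] = col
    return mapping
```

Applied to the columns of data table 134, keyword matching alone maps ‘Message_id’ rather than ‘text’ to the “content” field. This is exactly the kind of erroneous mapping that the optional sampling check of steps 255 and 260 is intended to catch.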
- As will be understood, there may be other keywords associated with other data fields, depending on the application type being identified. For example, the keywords ‘address’, ‘location’, ‘URL’ or ‘visited’ may all be commonly used to identify the address data field in a web browser application.
- Optionally, to confirm that the determination made at step 250 is correct, at step 255 a data record from the table may be sampled to determine whether the data for that column in the data record is consistent with data that would belong to that data field in an application of the application type (e.g., data generated by such an application for that field, or otherwise associated with it). This may be performed in a number of ways. For example, where the keyword corresponds to a data field that is supposed to contain the “content” of a message, the data for the column in the data record may be compared to words in a dictionary that have previously been recognized as commonly present in the content of messages. These words may include, for example, ‘hello’, ‘hi’, ‘hey’, ‘bye’, ‘see’, ‘you’, ‘soon’ and/or ‘later’. In another example, if the keyword corresponds to a data field whose values have a recognizable structure (e.g., a GPS coordinate), a regular expression can be created to recognize a string of text or numeric values as potentially being data of that data field (e.g., a potential GPS coordinate). In yet another example, heuristics may be developed based on historical experience of what the data for a given data field contains, and these heuristics may be used to confirm that the data for a given column appears as expected. It will be understood that various other ways of performing this step may be possible.
- To illustrate step 255, reference is again simultaneously made to FIG. 3. Data record 360 stored in data table 134 may be sampled to determine whether the data in a particular column of data record 360, such as the data 368 for the column with column identifier ‘text’ 318, is consistent with the data expected for the data field to which the column has been mapped (i.e., the data field for the “content” of the message). As shown, the data 368 includes the text “hey hey hey”. Using the dictionary lookup method described above, it may be determined that, since the data 368 includes the word ‘hey’, the data 368 is consistent with what is expected for the “content” data field (e.g., with what would have been generated by an application of the Chat/IM application type).
- At step 258, if the column maps to a date/time data field, the date/time format of the data in the column may be determined. For example, this may be performed by converting the date/time value to various formats and performing boundary checks to identify the date/time format in which the data in the column is most likely formatted. Specifically, this may involve: converting the data in the column of the sampled data record to each of a plurality of date/time formats; comparing the converted values, one per date/time format, to determine which is closest to a reference date/time; and identifying the date/time format of the closest converted value as the date/time format of the data in the column of the table. In various embodiments, the reference date/time may be the present date/time (e.g., the date/time at which the data recovery process is being performed). Additionally or alternatively, the reference date/time may be a predefined date/time of a particular event (e.g., if data recovery is being performed for forensics purposes, the date/time of a criminal activity such as a murder).
- In a variant embodiment, only post-conversion date/time values that fall within a specific date/time window (e.g., +/−7 years of the reference date/time) may be compared with each other. This may reduce the number of comparisons that need to be performed when the sampled date/time data is known to be unlikely to fall outside that window. For example, if converting the data to a given date/time format yields a value outside the window (i.e., earlier or later than the window), it can be determined that the given format is unlikely to be the format in which the data is actually stored.
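The optional confirmation of steps 255 and 260 could be sketched as a set of per-field checks applied to a sampled value: a dictionary lookup for the “content” field, using the example words given above, and a regular expression for a decimal-degree “latitude, longitude” pair standing in for the GPS coordinate example. The field names, the sufficiency of the word list and the coordinate pattern are all assumptions; real checks would need to handle other languages, encodings and coordinate notations.

```python
import re

# Example words from the text that commonly appear in message content.
MESSAGE_WORDS = {"hello", "hi", "hey", "bye", "see", "you", "soon", "later"}

# Hypothetical pattern: "lat, lon" in decimal degrees, |lat| <= 90, |lon| <= 180.
GPS_PATTERN = re.compile(
    r"^\s*[-+]?(?:90(?:\.0+)?|[0-8]?\d(?:\.\d+)?)\s*,\s*"
    r"[-+]?(?:180(?:\.0+)?|(?:1[0-7]\d|\d{1,2})(?:\.\d+)?)\s*$"
)


def looks_like_message(value):
    """Dictionary lookup: does the sampled value contain a typical message word?"""
    words = re.findall(r"[a-z']+", str(value).lower())
    return any(w in MESSAGE_WORDS for w in words)


def looks_like_gps(value):
    """Regular-expression check for a decimal-degree coordinate pair."""
    return bool(GPS_PATTERN.match(str(value)))


# Hypothetical data field names mapped to their checks.
CHECKS = {"content": looks_like_message, "location": looks_like_gps}


def confirm_mapping(field, sampled_value):
    """Step 260 (sketch): keep the mapping only if the sample passes the field's check.

    Fields without a registered check are accepted unchanged.
    """
    check = CHECKS.get(field)
    return True if check is None else check(sampled_value)
```

With these checks, the sampled value “hey hey hey” from data 368 confirms the ‘text’ column as message content, whereas a purely numeric value, such as a message ID might hold, would be rejected, as in the ‘Message_id’ case.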
- As will be understood, any date/time format may be supported. For example, supported date/time formats may include: Unix epoch time (seconds), Unix epoch time (milliseconds), PRTime, Mac Absolute Time and/or Chrome/WebKit time.
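Step 258 and its windowed variant can be combined into the following sketch, which converts a raw integer timestamp under each of the formats listed above and keeps the candidate closest to the reference date/time. The epochs and units used are the commonly documented ones: Unix time counts from 1970-01-01 UTC, PRTime counts microseconds from the same epoch, Mac Absolute Time counts seconds from 2001-01-01 UTC, and Chrome/WebKit time counts microseconds from 1601-01-01 UTC. The function names, the treatment of a year as 365.25 days and the default 7-year window are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAC_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

# Candidate formats: name -> (epoch, raw units per second).
FORMATS = {
    "Unix epoch time (seconds)": (UNIX_EPOCH, 1),
    "Unix epoch time (milliseconds)": (UNIX_EPOCH, 1_000),
    "PRTime": (UNIX_EPOCH, 1_000_000),                # microseconds since 1970
    "Mac Absolute Time": (MAC_EPOCH, 1),              # seconds since 2001
    "Chrome/WebKit time": (WEBKIT_EPOCH, 1_000_000),  # microseconds since 1601
}


def convert(raw, epoch, units_per_second):
    """Interpret raw as units since epoch; None if outside datetime's range."""
    try:
        return epoch + timedelta(seconds=raw / units_per_second)
    except OverflowError:
        return None


def guess_format(raw, reference, window_years=7):
    """Return (format name, converted datetime) closest to reference, or None.

    window_years=None compares every successful conversion; otherwise values
    further than the window from the reference are discarded first.
    """
    window = None if window_years is None else timedelta(days=365.25 * window_years)
    best = None
    for name, (epoch, units) in FORMATS.items():
        when = convert(raw, epoch, units)
        if when is None:
            continue
        distance = abs(when - reference)
        if window is not None and distance > window:
            continue  # implausible for this reference date/time
        if best is None or distance < best[1]:
            best = (name, distance, when)
    return None if best is None else (best[0], best[2])
```

A raw value whose conversion overflows Python's `datetime` range is simply treated as implausible for that format. A result of `None` means that no format produced a date/time inside the window, in which case the column could be left for the user to resolve through the mapping user interface.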
- Referring still to
FIG. 2, if it is determined that the data sampled for the column being analyzed is not consistent with data that would belong to an application of the application type (the ‘NO’ branch at step 260), method 200 returns to step 240, where it is again determined whether any columns remaining in the table are still unprocessed.
- As indicated above, steps 255 to 260 are optional and need not be performed. When they are performed, however, they may provide confirmation that the column whose column identifier contains a keyword does in fact correspond to the data field associated with that keyword. In this way, steps 255 and 260 may be considered a “sanity check” that verifies the conclusion arrived at in step 250.
- Referring again simultaneously to FIG. 3, illustrated there are several columns for which performing steps 255 to 260 may result in the conclusion arrived at in step 250 not being confirmed. For example, for the column with the column identifier ‘Message_id’ 312, the determination at step 250 may have been that, because the column identifier contains the keyword ‘message’, the column corresponds to the data field for the “content” of a message, as would be generated by an application of the Chat/IM application type. However, upon performing steps 255 to 260, it may be determined that the data 362 for that column does not contain any of the dictionary words previously recognized as indicating that data constitutes the “content” of a message. As a result, it may be determined at step 260 that the column with the ‘Message_id’ column identifier 312 does not correspond to the “content” data field of a Chat/IM application. In this way, performing steps 255 to 260 may reduce the likelihood of erroneous mappings that would result from relying on step 250 alone.
- Referring back to
FIG. 2, if it is determined that the data sampled for a column is consistent with data that would belong to an application of the application type (the ‘YES’ branch at step 260), the method 200 proceeds to step 265, where the data stored in the database is identified as belonging to an application of the particular application type identified in step 250. Step 265 may involve the application-type identification module 120 (shown in FIG. 1) storing information indicating that the unknown data encountered on the storage device 104 belongs to the particular application type for which it was analyzed (e.g., a Chat/IM application type).
- In some cases, the application-type identification module 120 may attempt to identify an application identifier (e.g., the name of the application). For example, this may be performed using operating system application manifests (e.g., as may be separately found on the file system 130 of the storage device 104, apart from the database 132), or using text found in the file path (e.g., text in the path at which the database 132 is located on the file system 130 of the storage device 104). If the application name is available, an association between the name of the application and the application type identified by the method of FIG. 2 may be stored when performing step 265.
- At step 270, a mapping may be stored between the column being processed and the commonly used data field with which the sampled data of the column was determined, in step 260, to be consistent. In some cases, the mapping may be stored in the column/data field mapping store 140. The mappings may, for example, subsequently be referenced when recovering data from the unknown database 132. The mappings may also be used later when recovering data from another storage device 104 containing unknown data, so that if similar database information is encountered, the mappings can be referenced to identify the type of data stored in the database.
- After step 270, method 200 returns to step 240 to determine whether any columns are still unprocessed. If method 200 determines that there are no unprocessed columns, and no unprocessed tables at step 230, it returns to step 205, where it continues to search for files. If no files are found at step 210, method 200 may proceed to step 215 and display a mapping user interface that may allow the mapping between a column and a given data field to be updated. The mapping user interface may display a list of the located database tables and the mappings of columns of such tables to the commonly used data fields for a given application type, so as to allow user input for final verification or remapping if necessary. A method and user interface for performing such remapping or verification are discussed below with respect to FIGS. 4 and 5.
- It will be appreciated that various changes may be made to the method of
FIG. 2.
- For example, in a variant embodiment, step 265 (identifying the data stored in the database as belonging to an application of the particular application type) may be performed immediately after it has been determined at step 250 that the column identifier includes a keyword associated with the application type, and before a data record is sampled at steps 255 to 260.
- Additionally or alternatively, in some embodiments, some of the steps of method 200 may be executed in parallel. Parallel execution of some steps may be desirable in systems that have more than one processor, or a processor with more than one processing core. In such cases, for example, one or more cores may be dedicated to executing step 250 to identify whether a column identifier contains a keyword associated with an application type, while one or more other cores are dedicated to sampling data records from the table to determine whether the data is consistent with data that would be generated by an application of the application type. Parallel execution may also allow the computing device 102 to process more than one table or more than one column simultaneously.
- Further, in some cases, method 200 may also include a step of checking a reference database (e.g., the column/data field mapping store 140 shown in FIG. 1) containing mappings and database information from previous executions of the method of FIG. 2. In such a case, if a match is found between the database being processed and a previously stored mapping, the previously stored mapping may be applied and the data from the database being processed may be automatically recovered for later presentation. The method may then return directly to step 205 to search for any remaining files to be processed, and the mapping need not be presented to the user for verification in accordance with method 400.
- Referring to
FIG. 4, shown there generally as 400 is a method for updating a mapping between a data field and a column, in accordance with one example embodiment. For ease of illustration, reference will also simultaneously be made to FIG. 5, which shows generally as 500 a screenshot of an example user interface that allows the mapping of a column of a table to a data field commonly used by an application of a particular application type to be updated, in accordance with one example embodiment. In various embodiments, the components of the computing device 102, such as the processor 110 and the display 112, may be configured to execute one or more steps of the method 400. In various embodiments, method 400 may be initiated at step 215 of the method 200 of FIG. 2, where a mapping user interface is displayed.
- The method 400 starts at step 405, where a mapping between a data field and a column of a table is displayed in a user interface. The mapping may be retrieved from the column/data field mapping store 140 shown in FIG. 1. In various embodiments, the mapping may have been stored as a result of step 270 of method 200 in FIG. 2.
- Referring simultaneously to FIG. 5, shown there generally as 500 is an example user interface that may be displayed at step 405 of FIG. 4. As illustrated, the user interface 500 is for an execution of method 200 of FIG. 2 that attempts to determine whether unknown data corresponds to a Chat/IM application type. Accordingly, the user interface provides a number of data fields that are commonly used by applications of a Chat/IM application type. Specifically, there is a ‘sender’ data field 514 (shown in FIG. 5 with the text “Identified Sender Column”), a ‘recipient’ data field 516 (shown with the text “Identified Recipient Column”), a ‘content’ data field 518 (shown with the text “Identified Message Column”), and a ‘date/time’ data field 520 (shown with the text “Identified Date Column”).
- The user interface 500 of FIG. 5 shows, as rows, each of the tables 134 that have been processed in FIG. 2. For each table 134, the user interface shows the column identifier of the column in the table that has been determined to map to a particular data field 514, 516, 518 or 520. For example, the user interface 500 shows information for the database table 134 illustrated in FIG. 3. Referring simultaneously to FIG. 3, it can be seen that the mapping (e.g., as may have been stored at step 270 in FIG. 2) has determined that the column with column identifier ‘author_id’ 314 corresponds to the ‘sender’ data field 514. Similarly, the column with column identifier ‘conversation_id’ 316 has been determined to correspond to the ‘recipient’ data field 516, the column with column identifier ‘timestamp’ 320 to the ‘date/time’ data field 520, and the column with column identifier ‘text’ 318 to the ‘content’ data field 518.
- As illustrated, the
user interface 500 may also display thedetermined application identifier 530 for a given table 134 (e.g., “com.google.android.apps.plus”) and thetable identifier 535 for a given table 134 (e.g., the table name “messages” 302) if such information is available and has been determined. - As discussed above, for a column that is identified as corresponding to the ‘date/time’
data field 520, it may be possible to determine the date/time format of the data stored for that column. Accordingly, in some embodiments, if a given column has been mapped to a ‘date/time’ data field, the determined date/time format of the data may also be shown in theuser interface 500. As illustrated, this is provided as anadditional column 550 positioned beside the ‘date/time’field 520. For the database table 134 ofFIG. 3 , for example, theuser interface 500 shows that the data for the column with the column identifier ‘timestamp’ 320 as being determined to be of a ‘PRTime’ date/time format 555. - Furthermore,
user interface 500 has apreview section 560 that displays how the data from a data record of the table would be presented, based on the mappings. For example, thedata record 360 illustrated inFIG. 3 may be shown in thepreview section 560 according to the mappings illustrated for the database table 134. Referring again simultaneously toFIG. 3 , it can be seen that thedata 364 for the ‘Author_id’column 314 is provided under ‘From’ in thepreview section 560 because the ‘Author_id’column 314 is mapped to the ‘sender’data field 514. Similarly, thedata 366 for the ‘Conversation_id’column 316 is provided under ‘To’ in thepreview section 560 because the ‘Conversation_id’column 316 is mapped to the ‘recipient’data field 516. Likewise, thedata 368 for the ‘text’column 318 is provided under ‘Message’ in thepreview section 560 because the ‘text’column 318 is mapped to the ‘content’data field 518, and thedata 370 for the ‘timestamp’column 320 is provided under Date/Time′ in thepreview section 560 because the ‘timestamp’column 320 is mapped to the ‘date/time’data field 520. - For the date/
time data 370, thepreview section 560 may be configured to display the data according to the date/time format that has been determined in the method ofFIG. 2 above. For example, as illustrated, because the date/time format of the date/time data 370 has been determined to be ‘PRTime’ (e.g., as illustrated at 555 of user interface 500), the date/time shown would be the date/time data 370 after it has been converted to the ‘PRTime’ format. If the resultant post-conversion date/time data appears to be incorrect in thepreview section 560 to a user, user input may be received via the user interface control 555 (e.g., the indicated combo box may be selected), and an alternative date/time format may be chosen. Thepreview section 360 may then be updated to display the date/time data formatted according to the alternative date/time format. - Referring back to
FIG. 4 , atstep 410, input from the user interface may be received indicating that a data field is to be mapped to an alternative column of the table. As illustrated inFIG. 5 , the various mappings of thecolumn identifiers particular data field data field - The drop-down boxes may be activated if a user determines that a given mapping is incorrect. For example, upon seeing that the ‘recipient’
data field 516 being mapped to the column with the ‘conversation_id’ column identifier results in the number ‘1’ being identified as a recipient of message in a Chat/IM application (e.g., as is illustrated in under ‘To’ in the preview section 560), the user may select an alternative column to be mapped to the ‘recipient’data field 516. - In various embodiments, in response to receiving user input indicating an alternative mapping, the
preview section 560 may be updated to correspond to the alternative mapping. In some instances thepreview section 560 may be updated to allow the user to consider whether the selected mapping is appropriate, or whether the mapping should be changed again. - Referring again to
FIG. 4 , atstep 415, an updated mapping for the data field to the alternative column may be stored. For example, theprocessor 110 may update the mapping stored in column/datafield mapping store 140 to correspond to the alternative column. In some cases,method 400 may be executed multiple times for an application before the user has determined the appropriate mapping for that application. - Referring now to
FIG. 6 , illustrated therein generally as 600 is a screenshot of a data recovery user interface after the updating of the mapping of a column of a table to a data field commonly used by an application of a particular application type, in accordance with one example embodiment. - Once a user has completed their review of the mappings for the identified applications, the
computing device 102 may attempt to recover the data from the respective unknown databases, using the stored mappings. For example, thecomputing device 102 may generate a report or case file that contains the results of the data extracted using the stored mappings. In various embodiments, the datarecovery user interface 600 may be displayed after the mapping has been updated according to themethod 400 ofFIG. 4 . Additionally or alternatively, thedata recovery interface 600 may also be presented to a user immediately after the initial mappings are stored upon the completion of themethod 200 ofFIG. 2 . - As illustrated, a number of different tables 134 that have been analyzed may be selected in the left-hand pane. Upon selection of a given table (e.g., as illustrated, a table for the application with the “com.google.android.apps.plus” application identifier), the records for that table can then be populated into the top-right pane (e.g., as illustrated, there are 43 data records within the table). The data within each of the data records are then displayed in accordance with the mapping determined by
method 200 ofFIG. 2 and/ormethod 400 ofFIG. 4 . For example, for the column in each table that has been mapped to the ‘sender’data field 514, the ‘recipient’data field 516, the ‘content’data field 518, and the ‘date/time’data field 520 respectively, the data contained therein for each data record are shown under eachrespective data field - The
user interface 600 also includes apreview section 660 which displays a preview of how the data of a selecteddata record 360 would appear based on the mappings. As illustrated, theexample data record 360 shown inFIG. 3 is again shown. For example, thedata 364 for the ‘Author_id’column 314 is shown as being associated with the ‘sender’data field 514. Similarly, thedata 366 for the ‘Conversation_id’ column is shown as being associated with the ‘recipient’data field 516. Likewise, thedata 368 for the ‘text’column 318 is shown as being associated with the ‘content’data field 518, and thedata 370 for the ‘timestamp’column 320 is shown as being associated with the ‘date/time’data field 520. Additionally, thepreview section 660 may also provide an indication of the table identifier 302 (e.g., the table name ‘messages’). - While the above description provides examples of one or more devices, systems and methods, it will be appreciated that other devices, systems and methods may be within the scope of the present description interpreted by one of skill in the art.
- As noted, the systems and methods disclosed herein are presented only by way of example and are not meant to limit the scope of the subject matter described herein. Other variations of the systems and methods described above will be apparent to those in the art and as such are considered to be within the scope of the subject matter described herein. For example, it should be understood that acts and the order of the acts performed in the processing described herein may be altered, modified and/or augmented yet still achieve the desired outcome.
- In particular, the steps of a method in accordance with any of the embodiments described herein may be performed in any order, whether or not such steps are described in the claims, figures or otherwise in any sequential numbered or lettered manner. Also, in the various user interfaces illustrated in the figures, it will be understood that the illustrated user interface text and controls are provided as examples only and are not meant to be limiting. Other suitable user interface elements may be possible.
- As used herein, the wording “and/or” is intended to represent an inclusive-or. That is, “X and/or Y” is intended to mean X or Y or both. Moreover, “X, Y, and/or Z” is intended to mean X or Y or Z or any combination thereof.
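The review loop of method 400 (display a mapping at step 405, accept an alternative column at step 410, store the update at step 415) and the preview of FIG. 5 might be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the implementation described above: the `TableMapping` class, the `DATE_FORMATS` and `PREVIEW_LABELS` tables, the ‘Unix seconds’ and ‘Unix milliseconds’ alternatives, the `participant_id` column and all record values are hypothetical. The data fields, the column identifiers of FIG. 3 and the ‘PRTime’ format name come from the description; decoding PRTime as microseconds since 1970-01-01 UTC follows Mozilla's NSPR definition of that type.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List

# Hypothetical sketch: class, table and column names here are illustrative.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Candidate decoders for integer timestamps. PRTime (Mozilla NSPR) counts
# microseconds since the Unix epoch; the other two are assumed alternatives
# a user might choose from the date/time format combo box.
DATE_FORMATS: Dict[str, Callable[[int], datetime]] = {
    "PRTime": lambda v: EPOCH + timedelta(microseconds=v),
    "Unix seconds": lambda v: EPOCH + timedelta(seconds=v),
    "Unix milliseconds": lambda v: EPOCH + timedelta(milliseconds=v),
}

# Preview labels for the Chat/IM data fields, as in the preview section.
PREVIEW_LABELS = {"sender": "From", "recipient": "To",
                  "content": "Message", "date/time": "Date/Time"}


@dataclass
class TableMapping:
    """Hypothetical in-memory entry of the column/data field mapping store."""
    table: str
    columns: List[str]       # column identifiers present in the table
    fields: Dict[str, str]   # data field -> mapped column identifier
    date_format: str = "PRTime"

    def remap(self, data_field: str, column: str) -> None:
        """Steps 410/415: map a data field to an alternative column."""
        if data_field not in PREVIEW_LABELS:
            raise KeyError(f"unknown data field {data_field!r}")
        if column not in self.columns:
            raise ValueError(f"{column!r} is not a column of {self.table!r}")
        self.fields[data_field] = column

    def preview(self, record: Dict[str, object]) -> Dict[str, str]:
        """Render one data record the way a preview section might."""
        shown: Dict[str, str] = {}
        for data_field, column in self.fields.items():
            value = record[column]
            if data_field == "date/time":
                try:
                    value = DATE_FORMATS[self.date_format](int(value)).isoformat()
                except OverflowError:
                    # A wrong format often decodes to an impossible date,
                    # which is the cue to pick another format.
                    value = "<out of range>"
            shown[PREVIEW_LABELS[data_field]] = str(value)
        return shown

    def report(self, records: List[Dict[str, object]]) -> List[Dict[str, str]]:
        """Apply the stored mapping to every record of the table."""
        return [self.preview(r) for r in records]


# Made-up record for a 'messages' table; 'participant_id' is hypothetical.
record = {"author_id": "u42", "conversation_id": 1, "text": "hello",
          "timestamp": 1380240000000000, "participant_id": "u7"}
mapping = TableMapping(
    table="messages",
    columns=list(record),
    fields={"sender": "author_id", "recipient": "conversation_id",
            "content": "text", "date/time": "timestamp"},
)
print(mapping.preview(record))
# {'From': 'u42', 'To': '1', 'Message': 'hello', 'Date/Time': '2013-09-27T00:00:00+00:00'}
mapping.remap("recipient", "participant_id")  # user selects another column
print(mapping.preview(record)["To"])          # u7
```

Keeping the decoders in a lookup table reduces choosing an alternative date/time format to changing `date_format`; an implausible result, such as a date outside the representable range, is the kind of cue that would prompt a user to select a different format. The `report` method simply applies the stored mapping to every record, loosely mirroring how the data recovery user interface 600 populates its record list.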
Claims (17)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/498,325 US20150095290A1 (en) | 2013-09-27 | 2014-09-26 | Method and device for identifying an application type of unknown data |
US17/554,581 US11868212B2 (en) | 2013-09-27 | 2021-12-17 | Method and device for identifying an application type of unknown data |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361883279P | 2013-09-27 | 2013-09-27 | |
US14/498,325 US20150095290A1 (en) | 2013-09-27 | 2014-09-26 | Method and device for identifying an application type of unknown data |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/554,581 Continuation US11868212B2 (en) | 2013-09-27 | 2021-12-17 | Method and device for identifying an application type of unknown data |
Publications (1)
Publication Number | Publication Date |
---|---|
US20150095290A1 true US20150095290A1 (en) | 2015-04-02 |
Family ID: 52741140
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/498,325 Abandoned US20150095290A1 (en) | 2013-09-27 | 2014-09-26 | Method and device for identifying an application type of unknown data |
US17/554,581 Active 2034-10-22 US11868212B2 (en) | 2013-09-27 | 2021-12-17 | Method and device for identifying an application type of unknown data |
Country Status (5)
Country | Link |
---|---|
US (2) | US20150095290A1 (en) |
EP (1) | EP3049970A4 (en) |
AU (1) | AU2014328401B2 (en) |
CA (1) | CA2925426C (en) |
WO (1) | WO2015042719A1 (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5956725A (en) * | 1997-11-26 | 1999-09-21 | International Business Machines Corporation | Schema mapping to a legacy table with primary and foreign key support |
US6070539A (en) * | 1997-03-21 | 2000-06-06 | Case Corporation | Variable rate agricultural product application implement with multiple inputs and feedback |
US20020013817A1 (en) * | 2000-07-07 | 2002-01-31 | Collins Thomas M. | Method and apparatus for distributing of e-mail to multiple recipients |
US20030050935A1 (en) * | 2001-09-07 | 2003-03-13 | Dominika Spetsmann | System and method for searching an object catalog subject to a plurality of standards regimes |
US20090043786A1 (en) * | 2007-08-08 | 2009-02-12 | Schmidt Brian K | Network repository for metadata |
US20100318489A1 (en) * | 2009-06-11 | 2010-12-16 | Microsoft Corporation | Pii identification learning and inference algorithm |
US20140258341A1 (en) * | 2013-03-11 | 2014-09-11 | Business Objects Software Ltd. | Automatic file structure and field data type detection |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5761656A (en) | 1995-06-26 | 1998-06-02 | Netdynamics, Inc. | Interaction between databases and graphical user interfaces |
US6496831B1 (en) * | 1999-03-25 | 2002-12-17 | Lucent Technologies Inc. | Real-time event processing system for telecommunications and other applications |
US6687711B1 (en) * | 2000-12-04 | 2004-02-03 | Centor Software Corporation | Keyword and methods for using a keyword |
US7379934B1 (en) | 2004-07-09 | 2008-05-27 | Ernest Forman | Data mapping |
US8832048B2 (en) * | 2005-12-29 | 2014-09-09 | Nextlabs, Inc. | Techniques and system to monitor and log access of information based on system and user context using policies |
US7747563B2 (en) | 2006-12-11 | 2010-06-29 | Breakaway Technologies, Inc. | System and method of data movement between a data source and a destination |
US8010502B2 (en) * | 2007-04-13 | 2011-08-30 | Harris Corporation | Methods and systems for data recovery |

2014
- 2014-09-26 EP EP14848057.7A patent/EP3049970A4/en not_active Ceased
- 2014-09-26 AU AU2014328401A patent/AU2014328401B2/en active Active
- 2014-09-26 WO PCT/CA2014/050929 patent/WO2015042719A1/en active Application Filing
- 2014-09-26 US US14/498,325 patent/US20150095290A1/en not_active Abandoned
- 2014-09-26 CA CA2925426A patent/CA2925426C/en active Active

2021
- 2021-12-17 US US17/554,581 patent/US11868212B2/en active Active
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150370917A1 (en) * | 2013-02-07 | 2015-12-24 | Hewlett-Packard Development Company, L.P. | Formatting Semi-Structured Data in a Database |
US11126656B2 (en) * | 2013-02-07 | 2021-09-21 | Micro Focus Llc | Formatting semi-structured data in a database |
US11704331B2 (en) | 2016-06-30 | 2023-07-18 | Amazon Technologies, Inc. | Dynamic generation of data catalogs for accessing data |
US20180150548A1 (en) * | 2016-11-27 | 2018-05-31 | Amazon Technologies, Inc. | Recognizing unknown data objects |
US10621210B2 (en) * | 2016-11-27 | 2020-04-14 | Amazon Technologies, Inc. | Recognizing unknown data objects |
US20200242135A1 (en) * | 2016-11-27 | 2020-07-30 | Amazon Technologies, Inc. | Recognizing unknown data objects |
US11893044B2 (en) * | 2016-11-27 | 2024-02-06 | Amazon Technologies, Inc. | Recognizing unknown data objects |
US11036560B1 (en) | 2016-12-20 | 2021-06-15 | Amazon Technologies, Inc. | Determining isolation types for executing code portions |
US20220058343A1 (en) * | 2018-03-23 | 2022-02-24 | Servicenow, Inc. | Written-modality prosody subsystem in a natural language understanding (nlu) framework |
US20210224085A1 (en) * | 2018-11-07 | 2021-07-22 | Citrix Systems, Inc. | Preloading of Application on a User Device Based on Content Received by the User Device |
Also Published As
Publication number | Publication date |
---|---|
AU2014328401A1 (en) | 2016-05-05 |
CA2925426A1 (en) | 2015-04-02 |
EP3049970A4 (en) | 2017-04-05 |
EP3049970A1 (en) | 2016-08-03 |
WO2015042719A1 (en) | 2015-04-02 |
US11868212B2 (en) | 2024-01-09 |
CA2925426C (en) | 2021-11-23 |
AU2014328401B2 (en) | 2020-03-12 |
US20220107868A1 (en) | 2022-04-07 |
Legal Events
Code | Title | Description |
---|---|---|
STPP | Information on status: patent application and granting procedure in general | FINAL REJECTION MAILED |
STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED |
STCV | Information on status: appeal procedure | NOTICE OF APPEAL FILED |
STCV | Information on status: appeal procedure | APPEAL BRIEF (OR SUPPLEMENTAL BRIEF) ENTERED AND FORWARDED TO EXAMINER |
STCV | Information on status: appeal procedure | EXAMINER'S ANSWER TO APPEAL BRIEF MAILED |
AS | Assignment | Owner name: MAGNET FORENSICS INC., CANADA. ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: SALIBA, JAD JOHN; REEL/FRAME: 052171/0230; Effective date: 20200218 |
STCV | Information on status: appeal procedure | ON APPEAL -- AWAITING DECISION BY THE BOARD OF APPEALS |
AS | Assignment | Owner name: MAGNET FORENSICS INVESTCO INC., CANADA. ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: MAGNET FORENSICS INC.; REEL/FRAME: 055019/0067; Effective date: 20200925 |
AS | Assignment | Owner name: ROYAL BANK OF CANADA, CANADA. ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: MAGNET FORENSICS INVESTCO, INC.; REEL/FRAME: 057797/0493; Effective date: 20211001 |
STCV | Information on status: appeal procedure | BOARD OF APPEALS DECISION RENDERED |
AS | Assignment | Owner name: ROYAL BANK OF CANADA, CANADA. CORRECTIVE ASSIGNMENT TO CORRECT THE NATURE OF CONVEYANCE PREVIOUSLY RECORDED AT REEL: 057797 FRAME: 0493. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT; ASSIGNOR: MAGNET FORENSICS INVESTCO, INC.; REEL/FRAME: 058037/0964; Effective date: 20211001 |
STCB | Information on status: application discontinuation | ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION |
AS | Assignment | Owner names: MAGNET FORENSICS INVESTCO, INC., CANADA and MAGNET FORENSICS INC., CANADA. RELEASE BY SECURED PARTY; ASSIGNOR: ROYAL BANK OF CANADA; REEL/FRAME: 063231/0372; Effective date: 20230404 |