US20090049067A1 - System and Method of Self-Learning Conceptual Mapping to Organize and Interpret Data - Google Patents


Info

Publication number
US20090049067A1
Authority
US
United States
Prior art keywords
self-organizing map
vectors
data
dialectic
Prior art date
Legal status
Abandoned
Application number
US12/258,959
Inventor
Jonathan Murray
Current Assignee
KinetX Inc
Original Assignee
KinetX Inc
Priority date
Filing date
Publication date
Application filed by KinetX Inc filed Critical KinetX Inc
Priority to US12/258,959
Publication of US20090049067A1
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Definitions

  • the present invention relates in general to data organization and learning systems and, more particularly, to a system and method of using self-learning conceptual maps to organize and interpret data.
  • the system processes large amounts of information using self-learning algorithms and creates an easily accessible interpretation of core concepts for the benefit of the user.
  • data can be found in all forms, sizes, and contexts.
  • data can be found in news media, Internet, databases, data warehouses, published reports, scientific journals, industry publications, government statistics, court papers, recorded phone conversations, and the like.
  • the common approach is to search known data sources and then manually scan the available facts and figures for any useful information.
  • Some data may be stored in a structured format, e.g., in data warehouses or relational databases. Structured data is typically pre-sorted and organized into a useful information format which is relatively easy to search and digest. In fact, assuming the potential questions are known, the data may be properly organized into customized data marts that the user can readily access to retrieve the needed information with minimal effort.
  • Unstructured data may be found in newspaper articles, scientific journals, the Internet, emails, letters, and countless other sources from which it is relatively difficult to organize, search, and retrieve useful information.
  • the unstructured data is typically just words in a document that have little meaning beyond their immediate context to those in possession of the document. It is most difficult to assess or learn anything from unstructured data, particularly when questions from unrelated areas are posed or when the right questions are not even known.
  • the unstructured data may be just as important as the structured type, sometimes even more so, but its elusiveness often leaves a significant gap in the thoroughness of any search and analysis.
  • the process of searching for relevant and useful information and getting meaningful results is important in many different contexts and applications.
  • the user may be interested in marketing information, medical research, environment problem solving, business analysis, criminal investigation, or anti-terrorist work, just to name a few.
  • the user creates a list of key words or topics and uses a search engine to electronically interrogate available data sources, e.g., the Internet or various public and private databases.
  • the user will get one or more hits from the search and must then manually review and analyze each reference of interest.
  • the process takes considerable time and effort and, with present research tools, will often overlook key elements of relevant data.
  • the present invention is a computer implemented method of researching textual data sources comprising converting textual data into first numeric representations, forming a first self-organizing map using the first numeric representations, wherein the first numeric representations of the textual data are organized by similarities, forming a second self-organizing map from second numeric representations generated from the organization of the first self-organizing map, wherein the second numeric representations are organized into clusters of similarities on the second self-organizing map, and forming dialectic arguments from the second self-organizing map to interpret the textual data.
  • the present invention is a method of interpreting textual data comprising converting the textual data into first numeric representations, forming a first self-organizing map using the first numeric representations, forming a second self-organizing map from second numeric representations generated from the first self-organizing map, wherein the second numeric representations are organized into clusters on the second self-organizing map, and forming dialectic arguments from the second self-organizing map to interpret the textual data.
  • the present invention is a computer program product usable with a programmable computer processor having a computer readable program code embodied therein, comprising computer readable program code which converts the textual data into first numeric representations, computer readable program code which forms a first self-organizing map using the first numeric representations, computer readable program code which forms a second self-organizing map from second numeric representations generated from the first self-organizing map, wherein the second numeric representations are organized into clusters on the second self-organizing map, and computer readable program code which forms dialectic arguments from the second self-organizing map to interpret the textual data.
  • the present invention is a computer system for interpreting textual data comprising means for converting the textual data into first numeric representations, means for forming a first self-organizing map using the first numeric representations, means for forming a second self-organizing map from second numeric representations generated from the first self-organizing map, wherein the second numeric representations are organized into clusters on the second self-organizing map, and means for forming dialectic arguments from the second self-organizing map to interpret the textual data.
  • FIG. 1 is a simplified block diagram of self-learning conceptual mapping and data research tool
  • FIG. 2 illustrates a flow of understanding from raw data to information to knowledge
  • FIG. 3 illustrates a general computer system for executing the research tool
  • FIG. 4 illustrates a computer communication network
  • FIG. 5 is a block diagram of self-learning conceptual mapping and interpretation tool
  • FIG. 6 illustrates further detail of the semantic map
  • FIG. 7 illustrates further detail of the conceptual map
  • FIG. 8 illustrates further detail of the dialectic argument structure
  • FIG. 9 illustrates the process of researching and interpreting textual data.
  • a knowledge-based research tool is presented that is capable of digesting massive amounts of data and creating a visual representation or conceptual map of the information, organized in related topics.
  • the technology utilized in the research tool is coined knowledgePOOL and knowledgeSEED (kPS).
  • the novel business case of the kPS technology is founded upon the fact that kPS is embodied as an electronic computer processing tool and takes the place of many human analysts and others involved in searching for and analyzing relevant data, given a specific area or topic of interest.
  • the kPS tool operates continuously on-line and is significantly faster and more capable than even a large team of people, in terms of accessing large amounts of information, reading and digesting the information, inferring concepts contained in the information digested, and searching for designated concepts.
  • the kPS tool typically runs under the guidance of one or more subject matter experts, who focus the invention's actions with respect to data sources, concept guidance, and other tunable parameters.
  • kPS can be termed a tunable concept inference engine that learns as it goes and that can be guided by human analysts.
  • the kPS tool is more than a key-word search engine or concept-driven search engine; the tool can infer concepts from what it reads.
  • one aspect of kPS is that it is tunable in many different ways, e.g. by concept, data source, language, dialect, and previous results. Language is not innately an obstacle, once some customization takes place to facilitate reading and comprehension.
  • the kPS tool is a self-teaching technology, but in addition to that, intermediate extracted concepts can be used by the controlling human analyst to further tune its activities, which allows for refinement of searches, making them more efficient and productive.
  • the self-learning technology is the root of the novel business case. kPS not only saves labor, but facilitates otherwise impractical analysis efforts.
  • the application areas are numerous and exceptionally rich. Broad applicability exists for criminal investigation, terrorism, homeland security, reconnaissance, national defense, marketing, opinion, scientific, engineering, medical research, environment problem solving, economic studies, and business analysis, just to name a few.
  • kPS can run at a high level over many sources searching for a broadly defined range of topics.
  • the sources can include, for example, federal criminal databases, reconnaissance data from agents in the field, and online newspaper accounts.
  • kPS can read not just intercepted communications in other languages, but different dialects of other languages.
  • kPS focus can be trained upon one specific terrorist organization, or the focus can be broadened to include suspicious chatter from many sources and languages.
  • an analyst or group of analysts can maximize the performance of kPS for a broad range of purposes, such as homeland security.
  • kPS can also be applied to military online communications, e.g., over the Internet.
  • An intercepted communication from a hostile source can be analyzed to detect possible intelligence, hostile troop movements, or sabotage.
  • surveys can be analyzed to determine subtleties of preference across certain products or classes of products. Focus can be narrowed to a single product, or it can be widened to discern broad trends in taste and preference for a target market. Previous results that are now intermediate with respect to future searches can be used to refine a kPS search, thus making it much more efficient in its operation, which can make kPS more useable to businesses that are concerned with timeliness of results.
  • Data sources 12 - 16 can be structured or unstructured in context and format and represent many different sources of information.
  • Data source 12 can be a database containing structured data;
  • data source 14 can be a newspaper article containing unstructured data.
  • Other potential sources of data include news media, Internet, published reports, scientific journals, industry publications, government statistics, court papers, recorded phone conversations, etc.
  • the raw data from data sources 12 - 16 is processed through data scrub or conversion block 18 , which strips unnecessary data and converts the data into a numeric format compatible with self-organizing maps 20 .
  • the converted data is trained into self-organizing maps 20 and organized into one or more clusters, as described hereinafter.
  • the self-organizing maps 20 must first learn from the data sources.
  • an actionable intelligence block 22 interprets self-organizing maps 20 to make assessments of the data from data sources 12 - 16 for the benefit of the user.
  • Structured data comes in many different formats, sizes, and contexts. Structured data is typically pre-sorted and organized into a useful information format which is relatively easy to search and digest. Structured data is stored in specific locations of databases and data warehouses. Unstructured data is the words and sentences found in everyday settings, e.g., newspaper articles, scientific journals, the Internet, emails, letters, financial records, special government licenses and permits, and countless other sources of routinely kept records. The unstructured data is typically just words in a document that have little meaning beyond their immediate context to those in possession of the document. In general, it is difficult to organize and search unstructured data or to retrieve useful information from it.
  • FIG. 2 illustrates the flow or hierarchy of information from both structured and unstructured data sources.
  • Raw data 24 is just words and groups of words that have little meaning beyond their immediate context.
  • Information 26 comes from raw data that is organized and formatted to convey a higher meaning.
  • Information 26 may be a document containing raw data 24 that is put together in a manner which presents ideas to the reader. While information 26 may exist, it may not be understood or fully appreciated by the reader.
  • Knowledge 28 is achieved when the information is understood and appreciated for the purpose which it was presented, as well as other purposes which can be attained from the information.
  • the research tool 10 addresses the need to gain knowledge from information, even when the information is vast, unstructured, fuzzy, and derived from many uncorrelated data sources.
  • the above system and process can be implemented as one or more software applications or computer programs residing and operating on a computer system.
  • the computer system may be a stand-alone unit or part of a distributed computer network.
  • the computer is typically electronically interconnected with other computers using communication links such as Ethernet, radio frequency (RF), satellite, telephone lines, optical, digital subscriber line, cable connection, wireless, and other recognized communication standards.
  • the electronic connection link between computers can be made through an open architecture system such as the World Wide Web, commonly known as the Internet.
  • the Internet offers a significant capability to share information, data, and software.
  • FIG. 3 illustrates a simplified computer system 30 for executing the software program used in executing the research tool.
  • Computer system 30 is a general purpose computer including a central processing unit or microprocessor 32 , mass storage device or hard disk 34 , electronic memory 36 , and communication port 38 .
  • Communication port 38 represents a modem, high-speed Ethernet link, or other electronic connection to transmit and receive input/output (I/O) data with respect to other computer systems.
  • computer 30 is shown connected to server 40 by way of communication port 38 , which in turn is connected to communication network 42 .
  • Server 40 operates as a system controller and includes mass storage devices, operating system, and communication links for interfacing with communication network 42 .
  • Communication network 42 can be a local and secure communication network such as an Ethernet network, global secure network, or open architecture such as the Internet.
  • Computer systems 44 and 46 can be configured as shown for computer 30 or dedicated and secure data terminals. Computers 44 and 46 are also connected to communication network 42 . Computers 30 , 44 , and 46 transmit and receive information and data over communication network 42 .
  • Computers 30 , 44 , and 46 can be physically located in any location with access to a modem or communication link to network 42 .
  • computer 30 can be located in the host service provider's main office.
  • Computer 44 can be located in a first user's office;
  • computer 46 can be located in a second user's office.
  • the computers can be mobile and follow the users to any convenient location with electronic access to communication network 42 .
  • Each of the computers runs application software and computer programs, which can be used to execute the functionality, and provide the research features as described hereinafter.
  • the software is originally provided on computer readable media, such as compact disks (CDs), magnetic tape, or other mass storage medium. Alternatively, the software is downloaded from electronic links such as the host or vendor website.
  • the software is installed onto the computer system hard drive 34 and/or electronic memory 36 , and is accessed and controlled by the computer's operating system. Software updates are also electronically available on mass storage medium or downloadable from the host or vendor website.
  • the software as provided on the computer readable media or downloaded from electronic links, represents a computer program product usable with a programmable computer processor having a computer readable program code embodied therein.
  • the software contains one or more programming modules, subroutines, computer links, and a compilation of executable codes which perform the functionality of the research tool. The user interacts with the software via keyboard, mouse, voice recognition, and other user interface devices to the computer system.
  • Data sources 12 - 16 function as described in FIG. 1 .
  • data scrub or conversion block 18 removes unnecessary data and converts and filters the data into a numeric format for training the self-organizing maps, i.e. semantic map 60 and concept map 70 .
  • the process of converting the data into a numeric format compatible with self-organizing maps can take many forms.
  • the words in the data items are evaluated to identify those words that are distinctive, i.e., words having a high information content. The distinctive words are kept; other words are discarded.
  • the selection of distinctive words is in part dependent on the relevant domain, i.e., application context.
  • One domain may relate to marketing applications; another domain tracks national defense applications; another domain involves criminal investigations; another domain relates to medical research; and so on.
  • the learning process is tuned to the specific domain of interest, which will impact the selection of distinctive words for training.
  • data items 1 - 9 are filtered to strip off articles and other superfluous or dead words, i.e., words that convey little or no meaning or information in the overall context.
  • dead words are “the”, “and”, “what”, “to”, “be”, “of”, “in”, “his”, “with”, “is”, “on”, “a”, “for”, “that”, etc.
  • the words are also reduced to their root form by stemming, e.g., “called” is changed to “call”, “planning” is changed to “plan”, and “accordingly” is changed to “accord”.
  • the stemming of words to their root form will also depend on the domain of interest.
  • the words of the data items are filtered for frequency of use. Each word is counted for its frequency of use in the data items 1 - 9 . Words that are used infrequently are discarded because they are generally not important to the central idea of the passage. Note that the synonym conversion to change similar meaning words to their common form as discussed above will make some infrequently used words into more frequently used words. Words that are used too frequently are discarded because they lose their distinctiveness by redundant usage in the passage.
  • the self-organizing maps (SOM) discussed below have difficulty in learning infrequently used words or non-distinctive terms. Words having a mid-range frequency of use are kept.
  • the word filter further considers the type of words. Nouns or active verbs generally have more information content and are kept.
  • the words can be compared against a database of high information content words.
  • the distinctive words having high information content e.g., “police”, “behavior”, “gun”, “phone”, “attack”, “shot”, “north”, “explode”, “penetrate”, and “damage”, are kept, again in view of the domain of interest.
  • the data items 1 - 9 from Table 1 are reduced to the distinctive words for each data item as provided in Table 2.
  • the words of Table 2 make up a list or dictionary of distinctive words to be trained into the self-organizing maps.
  • Data item 1 pastor, call, JTTF, report, consider, suspicious, behavior, give, recent, events, experience, psychological, trouble, men, believe, plan, new, attack, accord, spend, unusual, amount, time, hostel, telephone, talk, people, Far East
  • Data item 2 call, report, concern, employee, try, buy, sniper, rifle, supervisor, seven, month, gave, lift, home, hostel, ask, stop, gun, shop, overheard, part, conversation, gun, shop, owner, ask, telescopic, sight, folding, gun, stock
  • Data item 3 call, receive, resident, report, hear, sound, rifle, discharge, call, thought, rifle, M16, during, call, dispatch, hear, discharge, phone, patrol, car, dispatch, locate, shot, fire
  • Data item 5 apprehend, during, attempt, rob, jail, investigate, inform, investigate, officer, show, plan, . . .
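  • For illustration only, the scrub-and-filter step described above might be sketched as follows. The stop-word list, the crude suffix-stripping stemmer, and the frequency thresholds are assumptions made for the sketch; the patent does not prescribe a particular implementation.

```python
import re
from collections import Counter

# Illustrative stop-word list (the "dead words" above); a real system would
# use a larger, domain-tuned list.
STOP_WORDS = {"the", "and", "what", "to", "be", "of", "in", "his",
              "with", "is", "on", "a", "for", "that"}

def stem(word):
    # Crude suffix stripping for illustration only; a production system would
    # use a proper stemmer tuned to the domain of interest.
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def scrub(data_items, min_count=2, max_fraction=0.2):
    """Reduce each data item (a text string) to its distinctive words."""
    tokenized = []
    for text in data_items:
        words = [stem(w) for w in re.findall(r"[a-z]+", text.lower())
                 if w not in STOP_WORDS]
        tokenized.append(words)
    # Keep words in the mid-range of frequency: drop the very rare and the
    # overly common, since both carry little distinguishing information.
    counts = Counter(w for words in tokenized for w in words)
    total = sum(counts.values())
    keep = {w for w, c in counts.items()
            if c >= min_count and c / total <= max_fraction}
    return [[w for w in words if w in keep] for words in tokenized]
```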
  • a random number generator generates double-precision random numbers. For each word, a series of random numbers is assigned to a vector; the vector of random numbers is the numeric representation of the word. For the word "pastor", the random number generator creates N random numbers, giving the vector V1: (A1, A2, . . . AN), where Ai is a random number and N is an integer. For the word "call", the random number generator creates another N random numbers, giving the vector V2: (B1, B2, . . . BN), where Bi is a random number.
  • for the word "JTTF", the random number generator creates another N random numbers; the vector for "JTTF" is V3: (C1, C2, . . . CN), where Ci is a random number.
  • Each distinctive word from Table 2 now has a unique vector of random numbers. If there are M distinctive words in the dictionary, then there will be M vectors, each vector containing N random numbers. If the same word has multiple occurrences in the data items 1 - 9 , it is given the same vector. Thus, the distinctive word “call” has the same vector for all its occurrences in data items 1 - 9 .
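  • As a minimal sketch (not part of the patent text), the word-to-vector assignment could look like the following; the vector length N=32 and the fixed seed are arbitrary choices for the example.

```python
import numpy as np

def build_word_vectors(distinctive_words, n=32, seed=0):
    """Assign each distinctive word one vector of N double-precision random
    numbers; repeated occurrences of the same word share the same vector."""
    rng = np.random.default_rng(seed)
    return {word: rng.random(n) for word in sorted(set(distinctive_words))}

# Example: "call" receives a single vector reused for all of its occurrences.
word_vectors = build_word_vectors(["pastor", "call", "jttf", "report", "call"])
```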
  • the distinctive words in Table 2 are maintained in the same sequence from the data item as read from the data source. From data item 1 , “pastor” is followed by “call” which is followed by “JTTF”, and so on. For each distinctive word in the dictionary, an associating vector is generated indicating its relationship to neighboring distinctive words. The word “call” has neighboring distinctive words “pastor” and “JTTF”.
  • the associating vector AV is a concatenation of the vectors of the distinctive word and its neighboring distinctive words.
  • the associating vector for "call" in data item 1 is AV1: (V1, V2, V3).
  • the vector AV1 has 3N random numbers, i.e., AV1: (A1, A2, . . . AN, B1, B2, . . . BN, C1, C2, . . . CN).
  • the word “call” appears in data item 3 with neighboring words “discharge” and “thought”.
  • the vector for "discharge" is V4: (D1, D2, . . . DN), and the vector for "thought" is V5: (E1, E2, . . . EN).
  • the associating vector for “call” in data item 1 is different than the associating vector for “call” in data item 3 .
  • each distinctive word may use additional neighboring distinctive words in forming the associating vector.
  • the associating vector may use the two closest distinctive words or the three closest distinctive words in forming the associating vector.
  • the learning process performs a statistical combination of the W associating vectors into one composite associating vector.
  • the statistical combination is an average or mean of the W associating vectors.
  • the average may be weighted by scaling a portion of each associating vector.
  • the center portion of the associating vector from the distinctive word may be multiplied by a constant less than 1.0 to de-emphasize its contribution to the overall average of the composite associating vector.
  • the output of data scrub and conversion block 18 is the collection of composite associating vectors for each distinctive word in the dictionary.
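  • A hedged sketch of the associating-vector and composite-vector steps follows. The handling of the first and last word of a sequence (which have only one neighbor) and the center-portion scale factor of 0.5 are assumptions made for the example; the patent only states that the center portion may be multiplied by a constant less than 1.0.

```python
import numpy as np

def associating_vectors(word_sequence, word_vectors):
    """Build one associating vector per word occurrence by concatenating the
    vectors of the preceding word, the word itself, and the following word."""
    avs = {}
    for i, word in enumerate(word_sequence):
        # Edge handling (reusing the word itself when a neighbor is missing)
        # is an assumption for this sketch.
        prev_w = word_sequence[i - 1] if i > 0 else word
        next_w = word_sequence[i + 1] if i + 1 < len(word_sequence) else word
        av = np.concatenate([word_vectors[prev_w],
                             word_vectors[word],
                             word_vectors[next_w]])
        avs.setdefault(word, []).append(av)
    return avs

def composite_vectors(avs, center_weight=0.5):
    """Average the W associating vectors of each distinctive word into one
    composite vector, de-emphasizing the center (own-word) portion."""
    composites = {}
    for word, vec_list in avs.items():
        mean = np.mean(vec_list, axis=0)
        n = mean.size // 3
        mean[n:2 * n] *= center_weight   # scale the center portion by < 1.0
        composites[word] = mean
    return composites
```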
  • the composite associating vectors are transferred onto the first self-organizing map embodied as semantic map 60 .
  • Semantic map 60 contains a plurality of cells or zones, and xy coordinates defining the map, see FIG. 6 .
  • the composite associating vectors are arranged on the semantic map so that like vectors are grouped together. A distribution of the associating vectors from the dictionary of distinctive words is thus generated.
  • the associating vectors are each assigned Cartesian coordinates on semantic map 60 so that like vectors are grouped together and dislike vectors are spaced apart.
  • the starting assignment of the associating vectors to specific xy coordinates can be arbitrary, but subsequent assignments must be relative to prior assignments to keep similar vectors nearby and dissimilar vectors apart.
  • the Cartesian coordinates will position each associating vector AV in one of the plurality of cells.
  • FIG. 6 shows further detail of a simplified view of semantic map 60 .
  • Semantic map 60 organizes the words grammatically and semantically into zones or cells 64 - 68 used to encode the information from the data items 1 - 9 .
  • Semantic map 60 can be viewed as a thesaurus of the dictionary of distinctive words to show how these words are used in relative context within the data items 1 - 9 .
  • the associating vectors for distinctive words “army” and “soldier” are placed in cell 64 ; the associating vectors for distinctive words “pastor” and “call” are placed in cell 65 ; the associating vectors for distinctive words “sniper” and “marksman” are placed in cell 66 ; the associating vectors for distinctive words “JTTF” and “police” are placed in cell 67 ; the associating vectors for distinctive words “arrest” and “warrant” are placed in cell 68 .
  • the remaining distinctive words are distributed across the semantic map 60 in xy coordinates according to their respective associating vectors, which places each distinctive word into one of the cells as shown.
  • Semantic map 60 is thus a visual representation of the proximity of closely related distinctive words and the separation of dissimilar distinctive words.
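  • A minimal self-organizing-map training loop in the spirit of the description is sketched below; the grid size, Gaussian neighborhood, and decay schedules are standard SOM choices assumed for the example and are not specified by the patent.

```python
import numpy as np

def train_som(vectors, grid=(10, 10), epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small self-organizing map: each grid cell holds a weight vector,
    and training pulls the best-matching cell (and its neighbors) toward each
    input so that similar input vectors end up in nearby cells."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows, cols, vectors.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1)
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)
        sigma = sigma0 * (1.0 - epoch / epochs) + 0.5
        for v in vectors[rng.permutation(len(vectors))]:
            dist = np.linalg.norm(weights - v, axis=2)
            bmu = np.unravel_index(np.argmin(dist), dist.shape)   # best-matching cell
            grid_dist = np.linalg.norm(coords - np.array(bmu), axis=2)
            h = np.exp(-(grid_dist ** 2) / (2.0 * sigma ** 2))    # neighborhood
            weights += lr * h[..., None] * (v - weights)
    return weights

def map_words(composites, weights):
    """Place each distinctive word in the cell whose weight vector is closest
    to the word's composite associating vector."""
    placement = {}
    for word, v in composites.items():
        dist = np.linalg.norm(weights - v, axis=2)
        placement[word] = np.unravel_index(np.argmin(dist), dist.shape)
    return placement

# Usage sketch: sem_weights = train_som(np.array(list(composites.values())))
#               placement = map_words(composites, sem_weights)
```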
  • the second self-organizing map, embodied as concept map 70 from FIG. 5, is trained or generated from semantic map 60.
  • the distinctive words are taken in sequence from data items 1 - 9 from data sources 12 - 16 .
  • from data item 1 , the sequence of distinctive words is "pastor", "call", "JTTF", "report", "consider", "suspicious", "behavior", "give", "recent", and so on.
  • from data item 2 , the sequence of distinctive words is "call", "report", "concern", "employee", "try", "buy", "sniper", "rifle", "supervisor", "seven", and so on.
  • Each data item has its given sequence of distinctive words.
  • Each sequence of distinctive words from data items 1 - 9 is evaluated to find the matches or hits on semantic map 60 .
  • the length of the sequences is selected to be long enough to get sufficient hits to form a meaningful association between distinctive words, but not so long as to make the distinctive word association blurry or lose resolution.
  • a given sequence may be 10-20 or more distinctive words in length.
  • the distinctive words from any data item may be evaluated together or broken into two or more sequences.
  • the hits of distinctive words on semantic map 60 are used to form vector representations of each sequence.
  • the first sequence is the first group of fourteen distinctive words from data item 1 , i.e., “pastor”, “call”, “JTTF”, . . . “men”.
  • a semantic vector is then formed for the first sequence. Assume there are 100 cells in semantic map 60 . Each semantic vector has 100 elements, one for each cell. If any cell C from the semantic map has a distinctive word from the sequence, i.e. a hit, then a value is entered for that element in the vector corresponding to the closeness of the placement of the word to the center of the cell. If cell C has no words from the sequence, then a value of zero is entered for that element in the vector.
  • the first cell of the semantic map has no words from the first sequence
  • the first element of the semantic vector is zero.
  • the second cell contains a distinctive word from the first sequence
  • a value greater than zero and less than or equal to one is entered.
  • the non-zero value is representative of the strength of association of the distinctive word with respect to other distinctive words assigned to the same cell.
  • a value of one corresponds to the center of the cell, i.e., high strength of association.
  • a value approaching zero corresponds to the perimeter of the cell, i.e., low strength of association.
  • the word “pastor” is given a value of say 0.25 from its relative position to the center of cell 65 .
  • the word “call” is given a value of say 0.78 from its relative position to the center of cell 65 .
  • the word “JTTF” is given a value of say 0.86 from its relative position to the center of cell 67 .
  • the hits on the semantic map from the first sequence may form the semantic vector SV 1 : (0, 0.25, 0, 0.78, 0, 0, 0, 0, 0.86, 0, 0, 0, . . . ).
  • the second sequence is the second group of fourteen distinctive words from data item 1 .
  • the hits on semantic map 60 from the second sequence may form a second semantic vector SV 2 : (0, 0.34, 0, 0, 0.56, 0.92, 0, 0, 0, 0.80, 0, 0.61, . . . ).
  • the third sequence is the first group of fifteen distinctive words from data item 2 .
  • the hits on semantic map 60 from the third sequence form a third semantic vector SV 3 .
  • the fourth sequence is the second group of sixteen distinctive words from data item 2 .
  • the hits on semantic map 60 from the fourth sequence form a fourth semantic vector SV 4 .
  • a plurality of semantic vectors SV 1-T are formed from each defined sequence of distinctive words from data items 1 - 9 , where T is an integer of the number of defined sequences.
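  • An illustrative encoding of a word sequence into a semantic vector is sketched below; it reuses the placement and weights from the earlier SOM sketch, and the mapping from distance-to-cell-center onto a strength in (0, 1] is an assumption (the patent gives only example values such as 0.25 and 0.78).

```python
import numpy as np

def semantic_vectors(sequences, composites, placement, weights):
    """Encode each sequence of distinctive words as a vector with one element
    per semantic-map cell: zero where no word from the sequence hits the cell,
    otherwise a strength in (0, 1] that grows as the word lies closer to the
    cell center (here, closer to the cell's weight vector)."""
    rows, cols, _ = weights.shape
    svs = []
    for seq in sequences:
        sv = np.zeros(rows * cols)
        for word in seq:
            if word not in placement:
                continue
            r, c = placement[word]
            d = np.linalg.norm(composites[word] - weights[r, c])
            strength = 1.0 / (1.0 + d)   # 1.0 at the center, smaller farther out
            sv[r * cols + c] = max(sv[r * cols + c], strength)
        svs.append(sv)
    return np.array(svs)
```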
  • the semantic vectors SV 1-T are used to train concept map 70 .
  • the semantic vectors SV 1-T are then transferred onto concept map 70 .
  • the semantic vectors are arranged on the concept map 70 so that like vectors are grouped together.
  • the semantic vectors SV 1-T are each assigned Cartesian coordinates on concept map 70 so that like vectors are grouped together into a cluster.
  • a first cluster 72 contains like semantic vectors; a second cluster 74 contains like semantic vectors; a third cluster 76 contains like semantic vectors.
  • the semantic vectors in cluster 72 are dissimilar to the semantic vectors in cluster 74 and cluster 76 ; the semantic vectors in cluster 74 are dissimilar to the semantic vectors in cluster 72 and cluster 76 ; the semantic vectors in cluster 76 are dissimilar to the semantic vectors in cluster 72 and cluster 74 .
  • cluster 72 is made of semantic vectors like 71 and 73 derived from sequences of distinctive words associated with “suspicious behavior”.
  • Cluster 74 is made of semantic vectors like 75 and 77 derived from sequences of distinctive words associated with “acquiring weapon”.
  • Cluster 76 is made of semantic vectors like 79 and 81 derived from sequences of distinctive words associated with “practicing with weapon”.
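  • Continuing the sketch, the concept map can be trained by reusing the same train_som routine on the semantic vectors SV 1-T ; grouping the sequences by their best-matching cell is one simple (assumed) way to read off clusters such as 72 , 74 , and 76 .

```python
import numpy as np
from collections import defaultdict

def concept_map(semantic_vecs, grid=(8, 8)):
    """Train the second self-organizing map on the semantic vectors and group
    the sequences by their best-matching cell. Each group's indices link back
    to the sequences (and hence the data items) that formed it."""
    weights = train_som(semantic_vecs, grid=grid)    # from the earlier sketch
    clusters = defaultdict(list)
    for idx, sv in enumerate(semantic_vecs):
        dist = np.linalg.norm(weights - sv, axis=2)
        bmu = np.unravel_index(np.argmin(dist), dist.shape)
        clusters[bmu].append(idx)
    return weights, clusters
```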
  • the semantic vectors link back to the data items used to generate the sequence of distinctive words.
  • the research tool 10 can display the fundamental context of the text fragment associated with the sequence of distinctive words used to form that cluster.
  • a plurality of concept maps like 70 are formed from many different data items and many different data sources.
  • the plurality of concept maps are used together to gain a larger picture of the knowledge contained within the data items from many data sources.
  • the concept maps may have enhanced graphics such as colors, patterns, shapes, and forms to aid in the visual representations.
  • concept maps 70 can be read by analysts having subject matter expertise in the domain of interest to visually search for patterns of recognition and knowledge within the maps.
  • the analyst can point and click on various clusters and features in the concept maps and see the underlying basis for the formation of the clusters.
  • the analyst learns to read the concept maps by recognizing the patterns of knowledge within the clusters.
  • the analyst can look at the clusters and understand what information from the data items 1 - 9 each cluster refers to.
  • the dialectic argument structure 80 is a series of individual dialectic arguments that together form hypothesis 110 as discussed below.
  • the analyst may see that cluster 72 on the concept map 70 associates text fragments related to “suspicious behavior”.
  • Cluster 74 on concept map 70 associates text fragments related to “acquiring weapon”.
  • Cluster 76 on concept map 70 associates text fragments related to “practicing with weapon”.
  • the elements of the cluster have links back to the original data items 1 - 9 from data sources 12 - 16 .
  • the analyst can determine what text fragments for "suspicious behavior" can be attributed to S 1 .
  • the analyst can determine what text fragments for "acquiring weapon" can be attributed to S 1 , and what text fragments for "practicing with weapon" can be attributed to S 1 . Accordingly, a supporting argument can be made that S 1 's behavior is suspicious, S 1 is acquiring a weapon, S 1 is practicing with a weapon, and S 1 is a troubled person.
  • the distance from the semantic vector (and accordingly the associated sequence of distinctive words and text fragments) to the center of each cluster can be calculated as a plausibility score, or degree of uncertainty or fuzziness of the text fragment.
  • the plausibility score of the text fragments used to form semantic vector 71 for S 1 's suspicious behavior may be 0.51; the plausibility score of the text fragments used to form semantic vector 75 for S 1 acquiring a weapon may be 0.35; the plausibility score of the text fragments used to form semantic vector 79 for S 1 practicing with weapon may be 0.43; and the plausibility of the text fragments for S 1 being a troubled person may be 0.76.
  • the plausibility score is a function of the distance from the center of the cluster to the semantic vector associated with the text fragment. The greater the distance from the center, the lower the value; the smaller the distance from the center, the greater the value.
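  • A hedged sketch of such a plausibility score is shown below; the specific monotone-decreasing mapping (1 / (1 + distance)) is chosen only for illustration, since the patent does not define the exact function.

```python
import numpy as np

def plausibility(semantic_vec, cluster_members):
    """Plausibility (fuzziness) of a text fragment: a monotone-decreasing
    function of the distance between its semantic vector and the center of
    the cluster it belongs to."""
    center = np.mean(cluster_members, axis=0)
    d = np.linalg.norm(semantic_vec - center)
    return float(1.0 / (1.0 + d))

# Usage sketch: plausibility(svs[i], svs[clusters[bmu]]) for a member i of the
# cluster whose best-matching cell is bmu (see the concept-map sketch above).
```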
  • the text fragment may also contradict the premise. For example, there may be no support for S 1 having any direct terrorist link, e.g., the text fragment used to form a semantic vector may indicate that S 1 has no passport, which is atypical of most terrorists. The average of the semantic vectors, both supporting and non-supporting, are used to form the dialectic argument.
  • the analyst would be aware of common threads and indicia that may lead to the premise of the dialectic argument.
  • An analysis of off-shore terrorist attacks on US interests has shown there is a general pattern of development.
  • First, the would-be terrorist has experienced social trauma that predisposes him to violent or suspicious behavior and a desire for retribution.
  • the relative weight of each text fragment is a function of its plausibility score.
  • the plausibility scores can be viewed as the fuzziness of the text fragment, i.e., the strength or degree of certainty of the statement in supporting or rebutting the claim for the dialectic argument. Even though some text fragments may be farther from the center of the cluster, the semantic vector associated with the distant text fragment will be given its respective plausibility score or fuzziness factor, which will be taken into account in the premise of the claim.
  • the analyst can form a second dialectic argument that S 1 is planning an attack.
  • the supporting semantic vectors may be that S 1 is a terrorist, S 1 has a plan, and S 1 has broken the law.
  • a rebuttal text fragment may be that S 1 has passed lie detector tests.
  • the supporting and rebutting text fragments are derived from the clusters of the concept map as read by the analyst.
  • Each supporting semantic vector will have a plausibility score, which in combination define the plausibility of the claim associated with the second dialectic argument.
  • the warrant relied upon by the analyst may be that an analysis of successful attacks on federal buildings has shown that considerable effort is expended into planning.
  • the terrorist leaves an event trail that gives away his or her intentions.
  • the events range from informants giving information to police departments, minor traffic infractions, to suspicious activities reported by the public.
  • the final phase of planning can be identified when there is a surge in communication between the terrorist and his off-shore support network.
  • the plausibility scores for the supporting and rebutting text fragments are combined into the strength of the dialectic argument that S 1 is planning an attack.
  • the analyst can form a third dialectic argument that S 1 is a serial killer.
  • the supporting semantic vectors may be that S 1 has a motive, S 1 destroyed someone, and S 1 has broken the law.
  • a rebuttal text fragment may be that S 1 could not be placed at the scene of the crime.
  • the supporting and rebutting text fragments are derived from the clusters of the concept map as read by the analyst.
  • Each supporting semantic vector will have a plausibility score, which in combination define the plausibility of the claim associated with the third dialectic argument.
  • serial killers have a distinct modus operandi (MO) and signature that align similar events and provide key concepts for finding possible motives.
  • MO modus operandi
  • the plausibility scores for the supporting and rebutting text fragments are combined into the strength of the dialectic argument that S 1 is a serial killer.
  • dialectic argument 82 supports dialectic argument 84 which in turn builds to dialectic argument 86 .
  • the information is discovered by dialectic argument 82 that suggest S 1 might be a terrorist.
  • the first dialectic argument 82 causes a second dialectic argument 84 to look for planning information that validates S 1 is a terrorist, i.e., that S 1 is planning an attack.
  • the third dialectic argument 96 finds the second dialectic argument and uses it as a motive-surrogate due to similarities between the crime MO and the terrorist plan.
  • dialectic arguments 92 and 94 together support dialectic argument 96 .
  • the combination of dialectic arguments are used to form a hypothesis 110 .
  • with hypothesis 110 , the analyst can make specific and educated conclusions about S 1 , i.e., that the authorities should detain and interrogate S 1 .
  • the fragmented and diverse data items 1 - 9 have been compiled and analyzed in a manner not before known to yield a desirable and useful result, a thorough investigation of S 1 toward resolution or prevention of the crimes.
  • in step 120 , textual data is converted into first numeric representations.
  • the textual data is first reduced to a plurality of distinctive words.
  • the plurality of distinctive words are selected based on frequency of usage within the textual data.
  • in step 122 , a first self-organizing map is formed using the first numeric representations.
  • the first numeric representations of the textual data are organized by similarities.
  • the first numeric representations include a plurality of vectors of random numbers. The vectors are trained onto the first self-organizing map.
  • in step 124 , a second self-organizing map is formed from second numeric representations generated from the organization of the first self-organizing map.
  • the second numeric representations are organized into clusters of similarities on the second self-organizing map.
  • a plurality of vectors from the first self-organizing map are used to train the second self-organizing map.
  • dialectic arguments are formed from the second self-organizing map to interpret the textual data.
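  • Putting the earlier sketches together, the steps of FIG. 9 could be wired up roughly as follows. This is an illustrative reconstruction only: it treats each data item as a single sequence rather than the 10-20 word sequences described above, and it assumes the helper functions from the previous sketches are in scope.

```python
import numpy as np

def research(data_items):
    """End-to-end sketch: raw text -> distinctive words -> word vectors ->
    composite associating vectors -> semantic map -> semantic vectors ->
    concept map clusters (the raw material for dialectic arguments)."""
    # Convert textual data into first numeric representations.
    tokenized = scrub(data_items)
    word_vectors = build_word_vectors([w for seq in tokenized for w in seq])
    avs = {}
    for seq in tokenized:
        for word, vec_list in associating_vectors(seq, word_vectors).items():
            avs.setdefault(word, []).extend(vec_list)
    composites = composite_vectors(avs)
    # Form the first self-organizing map (semantic map).
    sem_weights = train_som(np.array(list(composites.values())))
    placement = map_words(composites, sem_weights)
    # Form the second self-organizing map (concept map) and its clusters.
    svs = semantic_vectors(tokenized, composites, placement, sem_weights)
    concept_weights, clusters = concept_map(svs)
    # The clusters, with their links back to the data items, feed the
    # dialectic arguments that interpret the textual data.
    return clusters
```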
  • the concept map is developed from a set of training documents provided by a subject matter expert (SME).
  • SME subject matter expert
  • the concept map is focused on specific concepts for which the SME then provides explanations.
  • the SME's knowledge is captured without any a priori structuring of the information, such as the taxonomies that are popular for organizing unstructured information.
  • Organizing information using concept maps enables the knowledge of SMEs to be remembered and shared. It also provides a basis for organizing all new information of the same type that is developed after the concept map is built. By reusing the process used to first organize the training documents, any new information that belongs to the same domain of knowledge can be mapped into the concept maps, thereby extending the scope of information and knowledge found in that concept map. Over time the scope of the concept map grows and has to be regenerated, using the previous concept map as a starting point for training. In this manner, the concept map tracks the development of new knowledge and may spawn new concept maps to form a tree of concept maps to capture all the knowledge.
  • the dialectic argument is used to capture a SME's belief as to how bits of information support, or rebut, an idea.
  • the SME does this by selecting clusters from a variety of concept maps, where each cluster's conceptual idea provides one of the support or rebuttal ideas central to the dialectic argument.
  • the function of the dialectic argument is then to monitor those specific clusters to find relevant pieces of information that instantiate the dialectic argument.
  • the purpose of the dialectic argument is to provide the SME with a means to join the dots based upon the SME's idea as to how the dots might be joined.
  • the concept map clusters are used to group information that is conceptually relevant for the dialectic argument. All the dialectic argument has to do is to select pieces from relevant clusters that are linked by one or more common entities, for example, find information from the required concept map clusters that talk about the same person, place, or thing.
  • Once the SME has developed a dialectic argument, that dialectic argument will spawn one or more agents to find relevant information, with each agent homing in on different opportunities. For example, one agent could use a dialectic argument connecting information about water supply to home in on London and New York. Another agent may use the same dialectic argument to home in on water supply information about Berthoud, but fail to ever converge. Convergence is achieved when the plausibility of the dialectic argument reaches a satisfactory threshold. At that point all successful instantiations of the dialectic argument are listed for the SME to review. Each instantiation is a hypothesis that the SME must evaluate for credibility and, if credible, for further analysis. Such instantiations of the dialectic argument provide the analyst with disparate pieces of information that would not otherwise have been connected, other than by serendipity. It is this serendipity that aids the analyst in thinking outside the box by providing original connections.
  • the dialectic argument homes in on information that both supports and rebuts the dialectic argument's claim.
  • the SME must identify both types of information much like debaters argue for and against a claim.
  • a single dialectic argument might be considered a template for a mini-debate, where realizations of concepts are drawn from the concept map in place of the debater's memory.
  • the dialectic argument functions like a template in that support or rebuttal information is not selected based upon a key word, but is selected because it fits a key concept.
  • the fit can be fuzzy, meaning it does not have to be an exact fit. Fuzziness allows the dialectic argument to look across a broad expanse of information to join dots that might otherwise be missed. But to ensure the dialectic argument instance does not simply collect nonsense, each selected piece of support and rebuttal information must address a common entity such as a person, place, or thing. In this manner, the fuzziness is productive even though non-specific, meaning that fuzziness is not predefined through rules, as is often the case in fuzzy queries or fuzzy searches.
  • fuzziness is measured by assessing how well each piece of selected information fits the conceptual cluster from which it is drawn. The measure is achieved by measuring how close the selected piece of information is to the center of the cluster. With all such measurements made, the fuzziness of all the information that goes into a particular dialectic argument instantiation is rolled up into a plausibility measure, e.g. by using a root-sum-square indexing scheme.
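  • A small sketch of such a roll-up appears below; the normalization by the square root of the number of scores (so the result stays in [0, 1]) is an assumption, since the text only names a root-sum-square indexing scheme. When a dialectic argument's claim is used inside another argument, its plausibility would simply be fed back in as one of the fuzziness scores.

```python
import math

def argument_plausibility(fuzziness_scores):
    """Roll the fuzziness scores of a dialectic argument's support and rebuttal
    information (each in [0, 1]) up into one plausibility value using a
    root-sum-square scheme, normalized to stay in [0, 1]."""
    if not fuzziness_scores:
        return 0.0
    rss = math.sqrt(sum(f * f for f in fuzziness_scores))
    return rss / math.sqrt(len(fuzziness_scores))
```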
  • Just as concept maps capture the knowledge of an SME for reuse, so do the dialectic arguments.
  • just as concept maps share knowledge, so do dialectic arguments when people reuse someone else's dialectic argument.
  • the claim of one dialectic argument can be used as one of the support, or rebuttal, arguments of another dialectic argument.
  • dialectic arguments can be chained to form more sophisticated hypotheses. Note that the plausibility of a dialectic argument becomes the fuzziness measure when used as a support or rebuttal in another dialectic argument.
  • the use of concept maps and dialectic arguments to form and instantiate hypotheses is central to the research tool, as it provides a unique and original method of developing new ideas based upon what is known. In this manner, the concept map and dialectic argument combination is thought to capture a reasoning process, thereby providing a powerful means to connect the dots that is novel and unique.
  • the integration of concept maps and dialectic arguments is what distinguishes the approach as knowledge management as opposed to data or information management.
  • the interpretation of the concept maps takes the form of dialectic arguments that search the maps to find information that supports and rebuts each argument's assertion. Assertions about suspect activities can thereby yield leads with measured plausibilities.
  • the process of finding and interpreting information found within the semantic map and measuring its plausibility is the dialectic search as described above.
  • the dialectic search avoids the problems associated with classical information extraction and analysis that require the development of countless rules. Instead, it reuses the SME's knowledge and experience directly, via a dialectic argument.
  • the dialectic argument is mechanized using Intelligent Software Agents (ISA) that augment the SME's reasoning ability. With the addition of genetic algorithms there is also the potential to adapt searches to track terrorists through their signature.
  • by instantiating an argument, the concept maps generate leads for the SME to follow.
  • the arguments can also be linked to form a lattice of arguments, elaborating on the lead to generate a more complete description of the situation.
  • the plausibility is computed using the fuzziness of each piece of an argument's support and rebuttal information, which is quantified using the proximity of the information to the semantic search center and the maps' fuzziness functions. Based on this fuzziness, plausibility measurements and confidence levels can be computed.
  • the dialectic argument structure does not depend on deductive or inductive logic, though these may be included as part of the warrant. Instead, the dialectic argument structure depends on non-analytic inferences to find new leads that fit the dialectic argument's warrant.
  • the dialectic argument structure is dialectic because its reasoning is based upon what is plausible; the dialectic argument structure is an hypothesis fabricated from bits of information. The hypothesis is developed into a story that explains the information. The claim is then used to invoke one or more new dialectic argument structures that perform their searches. The developing lattice forms a story that renders the intelligence lead plausible and enables the plausibility to be measured.
  • the aggregate plausibility is computed using the fuzziness of the support and rebuttal information.
  • a dialectic argument structure lattice is formed that relates information with its computed plausibility.
  • the computation uses joint information fuzziness to generate a robust measure of plausibility, a process that is not available using Bayesian methods.
  • the dialectic search requires development, meaning it must be seeded with knowledge from the SME. Once seeded, it has the potential of evolving the warrant to present new types of possible leads. Because the source information is encoded as a vector in the concept map, the source can be guarded but still used within the SOM. This is important where the source is compartmentalized information that can only be read by certain SMEs. If necessary, key information can be purged from the source before encoding without losing essential semantic information required to encode the concept map's semantic vector.
  • the guarded source information is used to support the dialectic search. Once the search has been completed and verified using the computed plausibility, the SME validates the lead's support and rebuttal information by referring back through the SOM's link to read the source information. If the source is guarded, the lead would be passed over to the SME from within that compartment.
  • the ISA can be used to implement the dialectic argument structure.
  • the agency consists of three different agents: the coordinator, the dialectic argument structure, and the search. These agents work together, each having its own learning objectives.
  • the coordinator is taught to watch the concept map, responding to new hits that conform to patterns of known interest.
  • the coordinator selects one or more candidate dialectic argument structure agents, and then spawns search agents to find information relevant to each dialectic argument structure.
  • the coordinator learns which hit patterns are most likely to yield a promising lead, adapting to any changes in the concept map structure and sharing what it learns with other active coordinators.
  • the search agent takes the dialectic argument structure prototype search vectors and, through the SOM, finds information that is relevant and related.
  • the search agent learns to adapt to different and changing source formats and would include parsing procedures required to extract detailed information.
  • the final agent learns fuzzy patterns to evaluate information found by the search agent. Any information that does not quite fit is directed to a sandbox where peer agents can exercise a more aggressive routine to search for alternative hypotheses.
  • the principal activities addressed by the use of agents are to learn to adapt to changes in the surrounding environment, capture the knowledge of the SME for reuse, share information and learning between agent peers, hypothesize with on-the-job-training from the SME, and remember so as to avoid old mistakes and false leads.

Abstract

In a computer implemented method of researching textual data sources, textual data is reduced to a plurality of distinctive words based on frequency of usage within the textual data. The distinctive words are converted into first numeric representations of vectors containing random numbers. A first self-organizing map is formed from the first numeric representations and organized by similarities between the vectors. A second self-organizing map is formed from second numeric representations generated from the organization of the first self-organizing map. The second numeric representations are vectors derived from the first self-organizing map. The vectors are used to train the second self-organizing map. The vectors derived from the first self-organizing map are organized into clusters of similarities between the vectors on the second self-organizing map. Dialectic arguments are formed from the second self-organizing map to interpret the textual data.

Description

    CLAIM TO DOMESTIC PRIORITY
  • The present application is a continuation of U.S. application Ser. No. 11/127,657, filed May 10, 2005, which claims the benefit of priority to provisional application Ser. No. 60/569,978, filed May 10, 2004.
  • FIELD OF THE INVENTION
  • The present invention relates in general to data organization and learning systems and, more particularly, to a system and method of using self-learning conceptual maps to organize and interpret data. The system processes large amounts of information using self-learning algorithms and creates an easily accessible interpretation of core concepts for the benefit of the user.
  • BACKGROUND OF THE INVENTION
  • In our information-based society, there are many sources of data and information. In general, data can be found in all forms, sizes, and contexts. For example, data can be found in news media, Internet, databases, data warehouses, published reports, scientific journals, industry publications, government statistics, court papers, recorded phone conversations, and the like. When the need arises to research a topic or find a solution to a problem, the common approach is to search known data sources and then manually scan the available facts and figures for any useful information.
  • Some data may be stored in a structured format, e.g., in data warehouses or relational databases. Structured data is typically pre-sorted and organized into a useful information format which is relatively easy to search and digest. In fact, assuming the potential questions are known, the data may be properly organized into customized data marts that the user can readily access to retrieve the needed information with minimal effort.
  • There also exist vast amounts of unstructured data that are not so easy to access. Unstructured data may be found in newspaper articles, scientific journals, the Internet, emails, letters, and countless other sources from which it is relatively difficult to organize, search, and retrieve useful information. The unstructured data is typically just words in a document that have little meaning beyond their immediate context to those in possession of the document. It is most difficult to assess or learn anything from unstructured data, particularly when questions from unrelated areas are posed or when the right questions are not even known. The unstructured data may be just as important as the structured type, sometimes even more so, but its elusiveness often leaves a significant gap in the thoroughness of any search and analysis.
  • The process of searching for relevant and useful information and getting meaningful results is important in many different contexts and applications. The user may be interested in marketing information, medical research, environment problem solving, business analysis, criminal investigation, or anti-terrorist work, just to name a few. In a typical approach, the user creates a list of key words or topics and uses a search engine to electronically interrogate available data sources, e.g., the Internet or various public and private databases. The user will get one or more hits from the search and must then manually review and analyze each reference of interest. The process takes considerable time and effort and, with present research tools, will often overlook key elements of relevant data.
  • Consider the example of a search of potential terrorist threats and targets. Authorities have access to vast amounts of structured information in government databases to use as intelligence gathering tools in the war on terror. The numerous government computer systems are generally not linked together. Data from one agency is not necessarily available to another agency. Moreover, the unstructured data which exists in other places is hard to access and even harder to interpret. There is no central repository of all information.
  • Some key piece of intelligence may exist which, if known to the proper authorities, could avert an attack. The data may come from a newspaper article, email, recorded phone call, or police report. Such information is usually in some innocent or hard-to-find place. Recall that much of the data related to the 9/11 attack on the World Trade Center was known; it was just not recognized as being relevant or significant. Taken in hindsight, the fact that suspicious individuals were taking limited flying lessons, i.e., learning how to fly but not land, was extremely important. Yet, the right people did not understand, the dots were not connected, no one correlated the fragments of data. The situational dynamics of pre-9/11 remained disjointed and fuzzy.
  • The authorities responsible for homeland security have learned much about intelligence and routinely conduct intelligence sweeps. Still, it is highly likely that both structured and unstructured data exist today that, if known and understood, would be most helpful in preventing future incidents. But mere access to the data is not enough. Even if the data is known, it may not be appreciated for its relevance or significance. The data is often fuzzy, vague, ambiguous, or may have special context. Again, the connections between all the dots are still not being made. There is a real need for tools to aid in the analysis and interpretation of data that might otherwise be passed over.
  • The use of computer-based search engines is well-known. More advanced data searching and analysis techniques, such as data mining and various taxonomies (hierarchies of information), exist but do not fully address unstructured data or data interpretation needs. Much of the useful data presently out there remains very difficult to access and understand. People looking for information in virtually any area face this common problem. Using present search and analysis techniques, it is impractical to track all data from all sources. The individual slices of data are but pieces in an intelligence gathering jigsaw puzzle that requires better tools to understand. Missing intelligence leads to missed opportunities and poor decisions.
  • A need exists to organize all types of data to assist in searching data sources and interpreting the retrieved information, particularly from unstructured data sources.
  • SUMMARY OF THE INVENTION
  • In one embodiment, the present invention is a computer implemented method of researching textual data sources comprising converting textual data into first numeric representations, forming a first self-organizing map using the first numeric representations, wherein the first numeric representations of the textual data are organized by similarities, forming a second self-organizing map from second numeric representations generated from the organization of the first self-organizing map, wherein the second numeric representations are organized into clusters of similarities on the second self-organizing map, and forming dialectic arguments from the second self-organizing map to interpret the textual data.
  • In another embodiment, the present invention is a method of interpreting textual data comprising converting the textual data into first numeric representations, forming a first self-organizing map using the first numeric representations, forming a second self-organizing map from second numeric representations generated from the first self-organizing map, wherein the second numeric representations are organized into clusters on the second self-organizing map, and forming dialectic arguments from the second self-organizing map to interpret the textual data.
  • In another embodiment, the present invention is a computer program product usable with a programmable computer processor having a computer readable program code embodied therein, comprising computer readable program code which converts the textual data into first numeric representations, computer readable program code which forms a first self-organizing map using the first numeric representations, computer readable program code which forms a second self-organizing map from second numeric representations generated from the first self-organizing map, wherein the second numeric representations are organized into clusters on the second self-organizing map, and computer readable program code which forms dialectic arguments from the second self-organizing map to interpret the textual data.
  • In another embodiment, the present invention is a computer system for interpreting textual data comprising means for converting the textual data into first numeric representations, means for forming a first self-organizing map using the first numeric representations, means for forming a second self-organizing map from second numeric representations generated from the first self-organizing map, wherein the second numeric representations are organized into clusters on the second self-organizing map, and means for forming dialectic arguments from the second self-organizing map to interpret the textual data.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a simplified block diagram of a self-learning conceptual mapping and data research tool;
  • FIG. 2 illustrates a flow of understanding from raw data to information to knowledge;
  • FIG. 3 illustrates a general computer system for executing the research tool;
  • FIG. 4 illustrates a computer communication network;
  • FIG. 5 is a block diagram of the self-learning conceptual mapping and interpretation tool;
  • FIG. 6 illustrates further detail of the semantic map;
  • FIG. 7 illustrates further detail of the conceptual map;
  • FIG. 8 illustrates further detail of the dialectic argument structure; and
  • FIG. 9 illustrates the process of researching and interpreting textual data.
  • DETAILED DESCRIPTION OF THE DRAWINGS
  • The present invention is described in one or more embodiments in the following description with reference to the Figures, in which like numerals represent the same or similar elements. While the invention is described in terms of the best mode for achieving the invention's objectives, it will be appreciated by those skilled in the art that it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims and their equivalents as supported by the following disclosure and drawings.
  • A knowledge-based research tool is presented that is capable of digesting massive amounts of data and creating a visual representation or conceptual map of the information, organized in related topics. The technology utilized in the research tool, coined as knowledgePOOL and knowledgeSEED (kPS), has numerous business applications in both the government and private sectors. The novel business case of the kPS technology is founded upon the fact that kPS is embodied as an electronic computer processing tool and takes the place of many human analysts and others involved in searching for and analyzing relevant data, given a specific area or topic of interest. The kPS tool is continuously on-line and is significantly faster and more capable than even a large team of people, in terms of accessing large amounts of information, reading and digesting the information, inferring concepts contained in the information digested, and searching for designated concepts. It would take a massive logistical effort to achieve the same results that the kPS technology can achieve, a task that is impractical for human teams in most cases. The ability of the software to synthesize the results of its information intake is ultimately beyond even large teams of people from a practical standpoint. Moreover, the cost alone of mounting a human team to achieve similar results would prohibit many such efforts, even if they are considered. The kPS research tool is faster than human effort, saves labor resources, and facilitates otherwise impractical analysis efforts.
  • The kPS tool typically runs under the guidance of one or more subject matter experts, who focus the invention's actions with respect to data sources, concept guidance, and other tunable parameters. kPS can be termed a tunable concept inference engine that learns as it goes and that can be guided by human analysts. The kPS tool is more than a key-word search engine or concept-driven search engine; the tool can infer concepts from what it reads. Thus, one aspect of kPS is that it is tunable in many different ways, e.g. by concept, data source, language, dialect, and previous results. Language is not innately an obstacle, once some customization takes place to facilitate reading and comprehension.
  • The kPS tool is a self-teaching technology, but in addition to that, intermediate extracted concepts can be used by the controlling human analyst to further tune its activities, which allows for refinement of searches, making them more efficient and productive. The self-learning technology is the root of the novel business case. kPS not only saves labor, but facilitates otherwise impractical analysis efforts. The application areas are numerous and exceptionally rich. Broad applicability exists for criminal investigation, terrorism, homeland security, reconnaissance, national defense, marketing, opinion, scientific, engineering, medical research, environment problem solving, economic studies, and business analysis, just to name a few.
  • In fighting crime and terrorism, the focus can be varied for different purposes. kPS can run at a high level over many sources searching for a broadly defined range of topics. The sources can include, for example, federal criminal databases, reconnaissance data from agents in the field, and online newspaper accounts. In a terrorism application, kPS can read not just intercepted communications in other languages, but different dialects of other languages. kPS focus can be trained upon one specific terrorist organization, or the focus can be broadened to include suspicious chatter from many sources and languages. By using a variable focus with respect to language, dialect, data source, concept, analytical target group, and other tunable parameters, an analyst or group of analysts can maximize the performance of kPS for a broad range of purposes, such as homeland security.
  • In a national defense application, military online communications, e.g., Internet, can be monitored and analyzed to detect possible equipment or supply inadequacies, or morale problems. An intercepted communication from a hostile source can be analyzed to detect possible intelligence, hostile troop movements, or sabotage.
  • In a marketing application, surveys can be analyzed to determine subtleties of preference across certain products or classes of products. Focus can be narrowed to a single product, or it can be widened to discern broad trends in taste and preference for a target market. Previous results that are now intermediate with respect to future searches can be used to refine a kPS search, thus making it much more efficient in its operation, which can make kPS more useable to businesses that are concerned with timeliness of results.
  • In an opinion environment, public polls can be analyzed for potential application in formulating the content of a political party's platform. Both the Democratic and Republican parties in the United States could find this quite useful, as well as the myriad political parties in countries across the globe.
  • In a science application, information gathered from papers is organized by concept and new hypotheses developed by connecting concepts in new ways. Presently, researchers must spend hours reading and recalling before new hypotheses spring to mind. Given the conceptual organization of knowledgePOOL and the ability of knowledgeSEED to connect information in new and interesting ways, the cognitive work of the researcher is accelerated.
  • The embodiments described above are representative of the areas of application that the kPS research tool encompasses. Many more such embodiments exist, thus exhibiting an expansive range of possible business applications for the invention.
  • In its simplified architecture, as shown in FIG. 1, the research tool 10 accesses vast amounts of data from data sources 12, 14, and 16. Data sources 12-16 can be structured or unstructured in context and format and represent many different sources of information. Data source 12 can be a database containing structured data; data source 14 can be a newspaper article containing unstructured data. Other potential sources of data include news media, Internet, published reports, scientific journals, industry publications, government statistics, court papers, recorded phone conversations, etc. The raw data from data sources 12-16 is processed through data scrub or conversion block 18, which strips unnecessary data and converts the data into a numeric format compatible with self-organizing maps 20. The converted data is trained into self-organizing maps 20 and organized into one or more clusters, as described hereinafter. The self-organizing maps 20 must first learn from the data sources. Once trained, an actionable intelligence block 22 interprets self-organizing maps 20 to make assessments of the data from data sources 12-16 for the benefit of the user.
  • Data comes in many different formats, sizes, and contexts. Structured data is typically pre-sorted and organized into a useful information format which is relatively easy to search and digest. Structured data is stored in specific locations of the database and data warehouses. Unstructured data is the words and sentences found in everyday settings, e.g., newspaper articles, scientific journals, Internet, emails, letters, financial records, special government licenses and permits, and countless other sources that routinely keep records. The unstructured data is typically just words in a document that have little meaning beyond their immediate context to those in possession of the document. In general, it is difficult to organize, search, and retrieve any useful information from unstructured data.
  • FIG. 2 illustrates the flow or hierarchy of information from both structured and unstructured data sources. Raw data 24 is just words and groups of words that have little meaning beyond their immediate context. Information 26 comes from raw data that is organized and formatted to convey a higher meaning. Information 26 may be a document containing raw data 24 that is put together in a manner which presents ideas to the reader. While information 26 may exist, it may not be understood or fully appreciated by the reader. Knowledge 28 is achieved when the information is understood and appreciated for the purpose for which it was presented, as well as other purposes which can be attained from the information. The research tool 10 addresses the need to gain knowledge from information, even when the information is vast, unstructured, fuzzy, and derived from many uncorrelated data sources.
  • In one embodiment, the above system and process can be implemented as one or more software applications or computer programs residing and operating on a computer system. The computer system may be a stand-alone unit or part of a distributed computer network. The computer is typically electronically interconnected with other computers using communication links such as Ethernet, radio frequency (RF), satellite, telephone lines, optical, digital subscriber line, cable connection, wireless, and other recognized communication standards. The electronic connection link between computers can be made through an open architecture system such as the World Wide Web, commonly known as the Internet. The Internet offers a significant capability to share information, data, and software.
  • FIG. 3 illustrates a simplified computer system 30 for executing the software program used in executing the research tool. Computer system 30 is a general purpose computer including a central processing unit or microprocessor 32, mass storage device or hard disk 34, electronic memory 36, and communication port 38. Communication port 38 represents a modem, high-speed Ethernet link, or other electronic connection to transmit and receive input/output (I/O) data with respect to other computer systems.
  • In FIG. 4, computer 30 is shown connected to server 40 by way of communication port 38, which in turn is connected to communication network 42. Server 40 operates as a system controller and includes mass storage devices, operating system, and communication links for interfacing with communication network 42. Communication network 42 can be a local and secure communication network such as an Ethernet network, global secure network, or open architecture such as the Internet. Computer systems 44 and 46 can be configured as shown for computer 30 or dedicated and secure data terminals. Computers 44 and 46 are also connected to communication network 42. Computers 30, 44, and 46 transmit and receive information and data over communication network 42.
  • Computers 30, 44, and 46 can be physically located in any location with access to a modem or communication link to network 42. For example, computer 30 can be located in the host service provider's main office. Computer 44 can be located in a first user's office; computer 46 can be located in a second user's office. Alternatively, the computers can be mobile and follow the users to any convenient location with electronic access to communication network 42.
  • Each of the computers runs application software and computer programs, which can be used to execute the functionality, and provide the research features as described hereinafter. The software is originally provided on computer readable media, such as compact disks (CDs), magnetic tape, or other mass storage medium. Alternatively, the software is downloaded from electronic links such as the host or vendor website. The software is installed onto the computer system hard drive 34 and/or electronic memory 36, and is accessed and controlled by the computer's operating system. Software updates are also electronically available on mass storage medium or downloadable from the host or vendor website. The software, as provided on the computer readable media or downloaded from electronic links, represents a computer program product usable with a programmable computer processor having a computer readable program code embodied therein. The software contains one or more programming modules, subroutines, computer links, and a compilation of executable codes which perform the functionality of the research tool. The user interacts with the software via keyboard, mouse, voice recognition, and other user interface devices to the computer system.
  • In the present discussion, an example will be given wherein raw data is self-learned by research tool 10 to create a conceptual map. The concept map will be analyzed to gain knowledge from the raw data, which otherwise would not be understood or appreciated. Consider the example of a criminal investigation, wherein one or more individuals (sniper S1) are terrorizing or preparing to terrorize a large city. There are usually many facts surrounding individual S1, even before he or she begins the criminal activity. The facts may be reported in many different venues and sources. S1 may be in the country illegally, may have a police report for other activity, may have scheduled court appearances, or may simply have come to the attention of someone who made a written record or report of the contact. S1 may be undergoing sniper or paramilitary training, have applied for special permits or licenses, or purchased suspicious materials. Often times, many facts and circumstances are known to certain people and resources before the terrorist acts occur.
  • For the present example, assume the following table of unstructured data items is read from one or more of the data sources 12-16.
  • TABLE 1
    Data items 1-9 from data sources 12-16.
    Data item 1   August 2001: Pastor F L called the Seattle JTTF and reported what he considered to be suspicious behavior of S1. Given the recent events in DC and his experience with psychologically troubled men, he believes S1 is planning a new attack. According to F L, S1 has been spending an unusual amount of time on the hostel's telephone talking to people in the Far East.
    Data item 2   October 2001: J Z called the Seattle police department to report a concern she has that one of her employees is trying to buy a sniper rifle. J Z has been S1's supervisor for seven months. Last month, J Z gave S1 a lift home to his hostel and was asked by S1 to stop by a gun shop. She overheard parts of the conversation between S1 and the gun shop owner where S1 was asking about a telescopic sight and folding gun stock.
    Data item 3   November 2001: Calls were received by the Lemmington police department from residents of Lee Street who reported hearing what sounded like a rifle being discharged. One caller said he thought the rifle was an M16, and during this call the dispatcher could hear the discharge over the phone. A patrol car was dispatched but could not locate where the shots were being fired.
    Data item 4   May 1999: Residents in Lee Street called the Lemmington PD to report a domestic disturbance. A patrol car was dispatched and brief investigations made, during which S1 was taken into custody before being freed on bail.
    Data item 5   July 2001: P K was apprehended by Lemmington PD during an attempt to rob the Lemmington pawn shop and jailed overnight. During investigation, P K informed the investigating officer that S1 had shown him his plan to explode fuel tankers unloading at gas stations by firing at them with a modified M16. The police passed this information on to the Seattle JTTF.
    Data item 6   January 2002: FBI agents attached to the Seattle JTTF interviewed P K at Lemmington PD concerning his report that S1 was planning to blow up gas tankers. The plan was considered to be impractical, as an M16 bullet fired more than 200 feet from a tanker is unlikely to be able to penetrate the steel shell and cause any damage.
    Data item 7   March 2002: S1 failed to appear before the Lemmington County Court for a preliminary hearing into a speeding offense. An arrest warrant has been issued, but S1 no longer lives at the address used to obtain the driving license.
    Data item 8   May 2002: D P was shot and killed while filling his car at a gas station in north VA. Witnesses reported hearing what sounded like a rifle. The bullet that killed D P was found to be a .223 caliber and of the kind fired by an M16 rifle.
    Data item 9   June 2002: J S was shot and wounded at a gas station in north VA. She recalls getting out of her car and hearing a crack before falling to the ground. Local police believe a marksman may be involved and have linked the shooting to an earlier incident where D P was shot and killed.
  • Turning to FIG. 5, further detail of the kPS research tool is shown. Data sources 12-16 function as described in FIG. 1. Again, data scrub or conversion block 18 removes unnecessary data and converts and filters the data into a numeric format for training the self-organizing maps, i.e. semantic map 60 and concept map 70. The process of converting the data into a numeric format compatible with self-organizing maps can take many forms. In general, the words in the data items are evaluated to identify those words that are distinctive, i.e., words having a high information content. The distinctive words are kept; other words are discarded.
  • The selection of distinctive words is in part dependent on the relevant domain, i.e., application context. One domain may relate to marketing applications; another domain tracks national defense applications; another domain involves criminal investigations; another domain relates to medical research; and so on. The learning process is tuned to the specific domain of interest, which will impact the selection of distinctive words for training.
  • In one embodiment, data items 1-9 are filtered to strip off articles and other superfluous or dead words, i.e., words that convey little or no meaning or information in the overall context. Examples of dead words are “the”, “and”, “what”, “to”, “be”, “of”, “in”, “his”, “with”, “is”, “on”, “a”, “for”, “that”, etc.
  • Next, synonyms and words with similar meaning are converted to their common form, e.g., “Federal Bureau of Investigation” and “the Bureau” are changed to “FBI”, “United States” and “America” are changed to “US”, “aircraft” is changed to “airplane”, and “pistol” and “side arm” are changed to “hand gun”. Each domain of interest will have a conversion schedule or thesaurus for assigning synonyms to a common form of the words.
  • The words are also reduced to their root form by stemming, e.g., “called” is changed to “call”, “planning” is changed to “plan”, and “accordingly” is changed to “accord”. The stemming of words to their root form will also depend on the domain of interest.
  • Next, the words of the data items are filtered for frequency of use. Each word is counted for its frequency of use in the data items 1-9. Words that are used infrequently are discarded because they are generally not important to the central idea of the passage. Note that the synonym conversion to change similar meaning words to their common form as discussed above will make some infrequently used words into more frequently used words. Words that are used too frequently are discarded because they lose their distinctiveness by redundant usage in the passage. The self-organizing maps (SOM) discussed below have difficulty in learning infrequently used words or non-distinctive terms. Words having a mid-range frequency of use are kept. The word filter further considers the type of words. Nouns or active verbs generally have more information content and are kept. Finally, the words can be compared against a database of high information content words. The distinctive words having high information content, e.g., “police”, “behavior”, “gun”, “phone”, “attack”, “shot”, “north”, “explode”, “penetrate”, and “damage”, are kept, again in view of the domain of interest.
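  • As an illustration only, the scrub-and-filter step can be sketched in a few lines of Python. The stop-word list, synonym table, crude suffix-stripping stemmer, and frequency thresholds below are assumptions made for this sketch and not part of the disclosure; a working system would tune each of them to the domain of interest.

    import re
    from collections import Counter

    DEAD_WORDS = {"the", "and", "what", "to", "be", "of", "in", "his",
                  "with", "is", "on", "a", "for", "that"}
    SYNONYMS = {"federal bureau of investigation": "fbi", "the bureau": "fbi",
                "united states": "us", "america": "us", "aircraft": "airplane"}
    SUFFIXES = ("ing", "ed", "ly", "s")      # crude stand-in for a real stemmer

    def stem(word):
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    def scrub(text, low=1, high=40):
        text = text.lower()
        for phrase, canon in SYNONYMS.items():   # map synonyms to a common form
            text = text.replace(phrase, canon)
        words = [stem(w) for w in re.findall(r"[a-z0-9.]+", text)
                 if w not in DEAD_WORDS]
        counts = Counter(words)                  # keep mid-frequency words only
        return [w for w in words if low <= counts[w] <= high]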
  • The data items 1-9 from Table 1 are reduced to the distinctive words for each data item as provided in Table 2. The words of Table 2 make up a list or dictionary of distinctive words to be trained into the self-organizing maps.
  • TABLE 2
    Distinctive words for data items 1-9.
    Data item 1   pastor, call, JTTF, report, consider, suspicious, behavior, give, recent, events, experience, psychological, trouble, men, believe, plan, new, attack, accord, spend, unusual, amount, time, hostel, telephone, talk, people, Far East
    Data item 2   call, report, concern, employee, try, buy, sniper, rifle, supervisor, seven, month, gave, lift, home, hostel, ask, stop, gun, shop, overheard, part, conversation, gun, shop, owner, ask, telescopic, sight, folding, gun, stock
    Data item 3   call, receive, resident, report, hear, sound, rifle, discharge, call, thought, rifle, M16, during, call, dispatch, hear, discharge, phone, patrol, car, dispatch, locate, shot, fire
    Data item 4   resident, call, report, domestic, disturbance, patrol, car, dispatch, investigation, made, take, custody, free, bail
    Data item 5   apprehend, during, attempt, rob, jail, investigate, inform, investigate, officer, show, plan, explode, fuel, tank, gas, fire, M16, information, JTTF
    Data item 6   JTTF, interview, concern, report, plan, blow-up, gas, tank, plan, consider, M16, bullet, fire, tank, penetrate, steel, damage
    Data item 7   fail, appear, preliminary, hearing, speeding, offense, arrest, warrant, issue
    Data item 8   shot, kill, gas, station, north, VA, witness, report, hear, sound, rifle, bullet, kill, found, .223, caliber, fire, M16, rifle
    Data item 9   shot, wound, gas, station, north, VA, recall, car, hear, fall, ground, local, police, believe, marksman, involve, link, shoot, incident, kill
  • The dictionary of distinctive words is converted to numeric representations. Each word is given a unique number. There are many algorithms which can perform the numeric conversion. In one embodiment, a random number generator generates double precision random numbers. For each word, a series of random numbers is assigned to a vector. The vector of random numbers is the numeric representation of the word. For the word “pastor”, the random number generator creates N random numbers. The vector for “pastor” is V1: (A1, A2, . . . AN), where Ai is a random number and N is an integer. For the word “call”, the random number generator creates another N random numbers. The vector for “call” is V2: (B1, B2, . . . BN), where Bi is a random number. For the word “JTTF”, the random number generator creates another N random numbers. The vector for “JTTF” is V3: (C1, C2, . . . CN), where Ci is a random number. The number of random numbers in the vector Vi must be sufficiently large to ensure that the vector representation for each word is mathematically orthogonal and unique. For the present discussion, N=50. Other values of N can be used as well, dependent in part on the domain of interest.
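  • A minimal Python sketch of this conversion follows, using NumPy's double precision random numbers and N=50; the helper name word_vector and the caching scheme are illustrative assumptions. Because the vectors are cached, every occurrence of the same word receives the same vector.

    import numpy as np

    N = 50
    _rng = np.random.default_rng()
    _word_vectors = {}

    def word_vector(word):
        # The same word always maps to the same vector of N random numbers.
        if word not in _word_vectors:
            _word_vectors[word] = _rng.random(N)
        return _word_vectors[word]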
  • Each distinctive word from Table 2 now has a unique vector of random numbers. If there are M distinctive words in the dictionary, then there will be M vectors, each vector containing N random numbers. If the same word has multiple occurrences in the data items 1-9, it is given the same vector. Thus, the distinctive word “call” has the same vector for all its occurrences in data items 1-9.
  • The distinctive words in Table 2 are maintained in the same sequence from the data item as read from the data source. From data item 1, “pastor” is followed by “call” which is followed by “JTTF”, and so on. For each distinctive word in the dictionary, an associating vector is generated indicating its relationship to neighboring distinctive words. The word “call” has neighboring distinctive words “pastor” and “JTTF”. The associating vector AV is a concatenation of the vectors of the distinctive word and its neighboring distinctive words. The associating vector for “call” in data item 1 is AV1: (V1, V2, V3). Thus, in the present example, the vector AV1 has 3N random numbers, i.e., AV1: (A1, A2, . . . AN, B1, B2, . . . BN, C1, C2, . . . CN).
  • If “call” appears in another location of the data items with different neighboring words, then it will likely have a different associating vector. The word “call” appears in data item 3 with neighboring words “discharge” and “thought”. The vector for “discharge” is V4: (D1, D2, . . . DN), and the vector for “thought” is V5: (E1, E2, . . . EN). The associating vector for “call” in data item 3 also contains 3N random numbers from the concatenation of the distinctive word vector and its neighboring vectors, i.e., AV2: (V4, V2, V5)=(D1, D2, . . . DN, B1, B2, . . . BN, E1, E2, . . . EN). The associating vector for “call” in data item 1 is different than the associating vector for “call” in data item 3.
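  • The construction of associating vectors from one neighboring distinctive word on each side can be sketched as follows. Padding the two ends of a sequence with zero vectors is an assumption made for illustration, as is the function name.

    import numpy as np

    def associating_vectors(sequence, vectors, N=50):
        # sequence: distinctive words in document order;
        # vectors: dict mapping each word to its length-N random vector.
        pad = np.zeros(N)
        out = []
        for i, word in enumerate(sequence):
            left = vectors[sequence[i - 1]] if i > 0 else pad
            right = vectors[sequence[i + 1]] if i < len(sequence) - 1 else pad
            out.append((word, np.concatenate([left, vectors[word], right])))
        return out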
  • In another embodiment, each distinctive word may use additional neighboring distinctive words in forming the associating vector. For example, the associating vector may use the two closest distinctive words or the three closest distinctive words in forming the associating vector.
  • If there are W occurrences of a distinctive word in the data items being learned, then there will be W associating vectors for the distinctive word. The learning process performs a statistical combination of the W associating vectors into one composite associating vector. In one embodiment, the statistical combination is an average or mean of the W associating vectors.
  • The average may be weighted by scaling a portion of each associating vector. The center portion of the associating vector from the distinctive word may be multiplied by a constant less than 1.0 to de-emphasize its contribution to the overall average of the composite associating vector. By scaling the composite associating vector, the context of how the word is used in the passage with its neighboring distinctive words is emphasized.
  • The above process is repeated for each distinctive word in the dictionary. Thus, for each word in the dictionary, a composite associating vector is generated. The composite associating vector is then trained into the self-organizing maps. If there are X distinctive words in the dictionary, then there will be X composite associating vectors generated for training into the self-organizing maps.
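  • The composite associating vector computation can be sketched as below. The center_scale value used to de-emphasize the middle portion is an assumed example; the description only requires a constant less than 1.0.

    import numpy as np

    def composite_vectors(assoc_vectors, N=50, center_scale=0.5):
        # assoc_vectors: (word, 3N associating vector) pairs over all occurrences.
        # Returns one composite (mean) vector per distinctive word, with the
        # middle N elements scaled down to emphasize the neighboring context.
        weight = np.ones(3 * N)
        weight[N:2 * N] = center_scale
        grouped = {}
        for word, av in assoc_vectors:
            grouped.setdefault(word, []).append(av * weight)
        return {word: np.mean(avs, axis=0) for word, avs in grouped.items()}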
  • Returning to FIG. 5, the output of data scrub and conversion block 18 is the collection of composite associating vectors for each distinctive word in the dictionary. The composite associating vectors are transferred onto the first self-organizing map embodied as semantic map 60. Semantic map 60 contains a plurality of cells or zones, and xy coordinates defining the map, see FIG. 6. In general, the composite associating vectors are arranged on the semantic map so that like vectors are grouped together. A distribution of the associating vectors from the dictionary of distinctive words is thus generated. The associating vectors are each assigned Cartesian coordinates on semantic map 60 so that like vectors are grouped together and dislike vectors are spaced apart. The starting assignment of the associating vectors to specific xy coordinates can be arbitrary, but subsequent assignments must be relative to prior assignments to keep similar vectors nearby and dissimilar vectors apart. The Cartesian coordinates will position each associating vector AV in one of the plurality of cells.
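  • One way to realize such a self-organizing map is the classical Kohonen training loop sketched below; the 10x10 grid, learning rate, neighborhood radius, and decay schedules are illustrative assumptions. The place helper returns the xy cell coordinates where a trained vector lands on the map.

    import numpy as np

    def train_som(data, rows=10, cols=10, iters=2000, lr0=0.5, radius0=5.0):
        # data: array of shape (num_vectors, dim). Returns a codebook of shape
        # (rows, cols, dim); similar input vectors end up near each other.
        rng = np.random.default_rng(0)
        dim = data.shape[1]
        codebook = rng.random((rows, cols, dim))
        grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                    indexing="ij"), axis=-1)
        for t in range(iters):
            x = data[rng.integers(len(data))]
            dists = np.linalg.norm(codebook - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)  # winning cell
            lr = lr0 * (1 - t / iters)
            radius = radius0 * (1 - t / iters) + 1e-9
            # Gaussian neighborhood pulls cells near the winner toward the input.
            g = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=-1)
                       / (2 * radius ** 2))
            codebook += (lr * g)[..., None] * (x - codebook)
        return codebook

    def place(codebook, x):
        # xy cell coordinates of the best-matching unit for vector x.
        d = np.linalg.norm(codebook - x, axis=-1)
        return np.unravel_index(np.argmin(d), d.shape)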
  • FIG. 6 shows further detail of a simplified view of semantic map 60. Semantic map 60 organizes the words grammatically and semantically into zones or cells 64-68 used to encode the information from the data items 1-9. Semantic map 60 can be viewed as a thesaurus of the dictionary of distinctive words to show how these words are used in relative context within the data items 1-9.
  • In the present example, the associating vectors for distinctive words “army” and “soldier” are placed in cell 64; the associating vectors for distinctive words “pastor” and “call” are placed in cell 65; the associating vectors for distinctive words “sniper” and “marksman” are placed in cell 66; the associating vectors for distinctive words “JTTF” and “police” are placed in cell 67; the associating vectors for distinctive words “arrest” and “warrant” are placed in cell 68. The remaining distinctive words are distributed across the semantic map 60 in xy coordinates according to their respective associating vectors, which places each distinctive word into one of the cells as shown. Semantic map 60 is thus a visual representation of the proximity of closely related distinctive words and the separation of dissimilar distinctive words. Although semantic map 60 is shown in two-dimensional form, the same concept could be applied to Z-dimensional maps, where Z is any integer.
  • The second self-organizing map, embodied as concept maps 70 from FIG. 5, is trained or generated from semantic map 60. Returning to Table 2, the distinctive words are given in sequence from the data items 1-9 from the data sources 12-16. In data item 1, the sequence of distinctive words is “pastor”, “call”, “JTTF”, “report”, “consider”, “suspicious”, “behavior”, “give”, “recent”, and so on. In data item 2, the sequence of distinctive words is “call”, “report”, “concern”, “employee”, “try”, “buy”, “sniper”, “rifle”, “supervisor”, “seven”, and so on. Each data item has its given sequence of distinctive words.
  • Each sequence of distinctive words from the data items 1-9 is evaluated to find the matches or hits on semantic map 60. The length of the sequences is selected to be long enough to get sufficient hits to form a meaningful association between distinctive words, but not so long as to make the distinctive word association blurry or lose resolution. A given sequence may be 10-20 or more distinctive words in length. The distinctive words from any data item may be evaluated together or broken into two or more sequences. The hits of distinctive words on semantic map 60 are used to form vector representations of each sequence.
  • In the present example, the first sequence is the first group of fourteen distinctive words from data item 1, i.e., “pastor”, “call”, “JTTF”, . . . “men”. A semantic vector is then formed for the first sequence. Assume there are 100 cells in semantic map 60. Each semantic vector has 100 elements, one for each cell. If any cell C from the semantic map has a distinctive word from the sequence, i.e. a hit, then a value is entered for that element in the vector corresponding to the closeness of the placement of the word to the center of the cell. If cell C has no words from the sequence, then a value of zero is entered for that element in the vector.
  • To illustrate the formation of the semantic vectors, if the first cell of the semantic map has no words from the first sequence, then the first element of the semantic vector is zero. If the second cell contains a distinctive word from the first sequence, then a value greater than zero and less or equal to one is entered. The non-zero value is representative of the strength of association of the distinctive word with respect to other distinctive words assigned to the same cell. A value of one corresponds to the center of the cell, i.e., high strength of association. A value approaching zero corresponds to the perimeter of the cell, i.e., low strength of association. For example, the word “pastor” is given a value of say 0.25 from its relative position to the center of cell 65. The word “call” is given a value of say 0.78 from its relative position to the center of cell 65. The word “JTTF” is given a value of say 0.86 from its relative position to the center of cell 67. As an illustration, the hits on the semantic map from the first sequence may form the semantic vector SV1: (0, 0.25, 0, 0.78, 0, 0, 0, 0, 0.86, 0, 0, 0, . . . ).
  • The second sequence is the second group of fourteen distinctive words from data item 1. As an illustration, the hits on semantic map 60 from the second sequence may form a second semantic vector SV2: (0, 0.34, 0, 0, 0.56, 0.92, 0, 0, 0, 0.80, 0, 0.61, . . . ). The third sequence is the first group of fifteen distinctive words from data item 2. The hits on semantic map 60 from the third sequence form a third semantic vector SV3. The fourth sequence is the second group of sixteen distinctive words from data item 2. The hits on semantic map 60 from the fourth sequence form a fourth semantic vector SV4.
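  • Forming a semantic vector from the hits of one sequence on the semantic map can be sketched as follows; the inverse-distance strength function is an illustrative stand-in for the closeness-to-center measure described above.

    import numpy as np

    def semantic_vector(sequence, word_coords, cell_centers):
        # sequence: one run of distinctive words; word_coords: dict mapping a
        # word to its (x, y) position on the semantic map; cell_centers: array
        # of shape (num_cells, 2), e.g. 100 cells. Each element of the result is
        # 0 if no word hits that cell, otherwise a value near 1.0 for hits close
        # to the cell center.
        sv = np.zeros(len(cell_centers))
        for word in sequence:
            if word not in word_coords:
                continue
            pos = np.array(word_coords[word])
            dists = np.linalg.norm(cell_centers - pos, axis=1)
            cell = int(np.argmin(dists))          # the cell this word hits
            strength = 1.0 / (1.0 + dists[cell])  # 1.0 at the cell center
            sv[cell] = max(sv[cell], strength)
        return sv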
  • A plurality of semantic vectors SV1-T are formed from each defined sequence of distinctive words from data items 1-9, where T is an integer of the number of defined sequences. The semantic vectors SV1-T are used to train concept map 70. The semantic vectors SV1-T are then transferred onto concept map 70. In general, the semantic vectors are arranged on the concept map 70 so that like vectors are grouped together. The semantic vectors SV1-T are each assigned Cartesian coordinates on concept map 70 so that like vectors are grouped together into a cluster.
  • The placement of semantic vectors SV1-T will form a plurality of clusters on the concept map 70. Further detail of concept map 70 is shown in FIG. 7. A first cluster 72 contains like semantic vectors; a second cluster 74 contains like semantic vectors; a third cluster 76 contains like semantic vectors. The semantic vectors in cluster 72 are dissimilar to the semantic vectors in cluster 74 and cluster 76; the semantic vectors in cluster 74 are dissimilar to the semantic vectors in cluster 72 and cluster 76; the semantic vectors in cluster 76 are dissimilar to the semantic vectors in cluster 72 and cluster 74.
  • For example, cluster 72 is made of semantic vectors like 71 and 73 derived from sequences of distinctive words associated with “suspicious behavior”. Cluster 74 is made of semantic vectors like 75 and 77 derived from sequences of distinctive words associated with “acquiring weapon”. Cluster 76 is made of semantic vectors like 79 and 81 derived from sequences of distinctive words associated with “practicing with weapon”.
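  • The grouping of semantic vectors into clusters on the trained concept map can be sketched by assigning each semantic vector to the cell of its best-matching unit; the labeled-vector data layout is assumed for illustration.

    import numpy as np

    def cluster_semantic_vectors(concept_codebook, semantic_vectors):
        # concept_codebook: trained concept-map codebook of shape (rows, cols, dim);
        # semantic_vectors: list of (label, vector) pairs, e.g. ("item1-seq1", SV1).
        # Vectors that land in the same cell form a cluster, as in FIG. 7.
        clusters = {}
        for label, sv in semantic_vectors:
            d = np.linalg.norm(concept_codebook - sv, axis=-1)
            cell = np.unravel_index(np.argmin(d), d.shape)
            clusters.setdefault(cell, []).append(label)
        return clusters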
  • The semantic vectors link back to the data items used to generate the sequence of distinctive words. Thus, by selecting any semantic vector, the research tool 10 can display the fundamental context of the text fragment associated with the sequence of distinctive words used to form that cluster.
  • In actual practice, a plurality of concept maps like 70 are formed from many different data items and many different data sources. The plurality of concept maps are used together to gain a larger picture of the knowledge contained within the data items from many data sources. The concept maps may have enhanced graphics such as colors, patterns, shapes, and forms to aid in the visual representations.
  • Once trained, concept maps 70 can be read by analysts having subject matter expertise in the domain of interest to visually search for patterns of recognition and knowledge within the maps. The analyst can point and click on various clusters and features in the concept maps and see the underlying basis for the formation of the clusters. The analyst learns to read the concept maps by recognizing the patterns of knowledge within the clusters. The analyst can look at the clusters and understand what information from the data items 1-9 each cluster refers to.
  • From the concept maps 70, the analyst can form the dialectic argument structure 80 in FIG. 5. The dialectic argument structure 80 is a series of individual dialectic arguments that together form hypothesis 110 as discussed below. The analyst may see that cluster 72 on the concept map 70 associates text fragments related to “suspicious behavior”. Cluster 74 on concept map 70 associates text fragments related to “acquiring weapon”. Cluster 76 on concept map 70 associates text fragments related to “practicing with weapon”.
  • As noted above, the elements of the cluster have links back to the original data items 1-9 from data sources 12-16. Given the links to the data items 1-9, the analyst can determine what text fragments for “suspicious behavior” can be attributed to S1. Likewise, given links to the data items 1-9, the analyst can determine what text fragments for “acquiring weapon” can be attributed to S1, and what text fragments for “practicing with weapon” can be attributed to S1. Accordingly, a supporting argument can be made that S1's behavior is suspicious, S1 is acquiring a weapon, S1 is practicing with a weapon, and S1 is a troubled person.
  • The distance from the semantic vector (and accordingly the associated sequence of distinctive words and text fragments) to the center of each cluster can be calculated as a plausibility score, or degree of uncertainty or fuzziness of the text fragment. The plausibility score of the text fragments used to form semantic vector 71 for S1's suspicious behavior may be 0.51; the plausibility score of the text fragments used to form semantic vector 75 for S1 acquiring a weapon may be 0.35; the plausibility score of the text fragments used to form semantic vector 79 for S1 practicing with weapon may be 0.43; and the plausibility of the text fragments for S1 being a troubled person may be 0.76.
  • The plausibility score is a function of the distance from the center of the cluster to the semantic vector associated with the text fragment. The greater the distance from the center, the lower the value; the smaller the distance from the center, the higher the value. The text fragment may also contradict the premise. For example, there may be no support for S1 having any direct terrorist link, e.g., the text fragment used to form a semantic vector may indicate that S1 has no passport, which is atypical of most terrorists. The average of the semantic vectors, both supporting and non-supporting, is used to form the dialectic argument.
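  • A plausibility scoring of this kind can be sketched as follows. The exponential falloff with distance and the treatment of rebutting fragments as negative contributions are assumptions; the description only requires the score to decrease with distance from the cluster center and the average to include both supporting and non-supporting vectors.

    import numpy as np

    def plausibility(semantic_vec, cluster_center, scale=1.0):
        # 1.0 when the fragment's semantic vector sits at the cluster center,
        # falling toward 0 as the distance grows.
        d = np.linalg.norm(np.asarray(semantic_vec) - np.asarray(cluster_center))
        return float(np.exp(-d / scale))

    def argument_plausibility(supporting, rebutting):
        # Combine per-fragment scores into one value for the dialectic argument;
        # rebuttals are folded in as negative contributions (an assumption).
        scores = list(supporting) + [-s for s in rebutting]
        return float(np.mean(scores)) if scores else 0.0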
  • The analyst would be aware of common threads and indicia that may lead to the premise of the dialectic argument. An analysis of off-shore terrorist attacks on US interests has shown there is a general pattern of development. First, the would-be terrorist has experienced social trauma that predisposes him to violent or suspicious behavior and a desire for retribution. Second, there is a distinct acquisition and practice phase leading up to an attack. Third, there are links to known terrorists who provide encouragement and support. Given the above warrant, the analyst may form a first dialectic argument that S1 is a terrorist with a plausibility score of say 0.52.
  • It is important to note that most, if not all, information derived from concept maps 70, both supporting and rebutting text fragments, is used in compiling the plausibility score for the dialectic argument. The relative weight of each text fragment is a function of its plausibility score. The plausibility scores can be viewed as the fuzziness of the text fragment, i.e., the strength or degree of certainty of the statement in supporting or rebutting the claim for the dialectic argument. Even though some text fragments may be farther from the center of the cluster, the semantic vector associated with the distant text fragment will be given its respective plausibility score or fuzziness factor, which will be taken into account in the premise of the claim.
  • Using a similar process from concept maps 70, the analyst can form a second dialectic argument that S1 is planning an attack. The supporting semantic vectors may be that S1 is a terrorist, S1 has a plan, and S1 has broken the law. A rebuttal text fragment may be that S1 has passed lie detector tests. The supporting and rebutting text fragments are derived from the clusters of the concept map as read by the analyst. Each supporting semantic vector will have a plausibility score, which in combination define the plausibility of the claim associated with the second dialectic argument.
  • The warrant relied upon by the analyst may be that an analysis of successful attacks on federal buildings has shown that considerable effort is expended into planning. During the planning phase, the terrorist leaves an event trail that gives away his or her intentions. The events range from informants giving information to police departments, minor traffic infractions, to suspicious activities reported by the public. The final phase of planning can be identified when there is a surge in communication between the terrorist and his off-shore support network. The plausibility scores for the supporting and rebutting text fragments are combined into the strength of the dialectic argument that S1 is planning an attack.
  • Again, using concept maps 70, the analyst can form a third dialectic argument that S1 is a serial killer. The supporting semantic vectors may be that S1 has a motive, S1 murdered someone, and S1 has broken the law. A rebuttal text fragment may be that S1 could not be placed at the scene of the crime. The supporting and rebutting text fragments are derived from the clusters of the concept map as read by the analyst. Each supporting semantic vector will have a plausibility score, which in combination define the plausibility of the claim associated with the third dialectic argument.
  • The warrant relied upon by the analyst may be that serial killers have a distinct modus operandi (MO) and signature that align similar events and provide key concepts for finding possible motives. The plausibility scores for the supporting and rebutting text fragments are combined into the strength of the dialectic argument that S1 is a serial killer.
  • A representation of the dialectic arguments is shown in FIG. 8a. In one representation 80, based on the present example, dialectic argument 82 supports dialectic argument 84, which in turn builds to dialectic argument 86. The information discovered by dialectic argument 82 suggests that S1 might be a terrorist. The first dialectic argument 82 causes a second dialectic argument 84 to look for planning information that validates S1 is a terrorist, i.e., that S1 is planning an attack. Finally, the third dialectic argument 86 finds the second dialectic argument and uses it as a motive-surrogate due to similarities between the crime MO and the terrorist plan. In another representation 90 from FIG. 8b, dialectic arguments 92 and 94 together support dialectic argument 96.
  • The combination of dialectic arguments is used to form hypothesis 110. Through hypothesis 110, the analyst can make specific and educated conclusions about S1, i.e., that the authorities should detain and interrogate S1. In the above process, the fragmented and diverse data items 1-9 have been compiled and analyzed in a manner not before known to yield a desirable and useful result, a thorough investigation of S1 toward resolution or prevention of the crimes.
  • The process of researching and interpreting textual data is shown in FIG. 9. In step 120, textual data is converted into first numeric representations. The textual data is first reduced to a plurality of distinctive words. The plurality of distinctive words are selected based on frequency of usage within the textual data. In step 122, a first self-organizing map is formed using the first numeric representations. The first numeric representations of the textual data are organized by similarities. The first numeric representations include a plurality of vectors of random numbers. The vectors are trained onto the first self-organizing map. In step 124, a second self-organizing map is formed from second numeric representations generated from the organization of the first self-organizing map. The second numeric representations are organized into clusters of similarities on the second self-organizing map. A plurality of vectors from the first self-organizing map are used to train the second self-organizing map. In step 126, dialectic arguments are formed from the second self-organizing map to interpret the textual data.
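  • As an illustration of how steps 120-126 fit together, the following sketch wires up the helper functions sketched earlier in this description; it is not a complete implementation, and the function names, grid dimensions, and sequence handling are assumptions carried over from those earlier sketches.

    import numpy as np

    def research(data_items, N=50):
        # Step 120: reduce text to distinctive words and numeric representations.
        sequences = [scrub(item) for item in data_items]
        vectors = {w: word_vector(w) for seq in sequences for w in seq}
        assoc = [p for seq in sequences for p in associating_vectors(seq, vectors, N)]
        composites = composite_vectors(assoc, N)

        # Step 122: first self-organizing map (semantic map) from the word vectors.
        semantic_som = train_som(np.stack(list(composites.values())))
        word_coords = {w: place(semantic_som, v) for w, v in composites.items()}

        # Step 124: second self-organizing map (concept map) from semantic vectors.
        cell_centers = np.array([(r, c) for r in range(10) for c in range(10)])
        svs = [("item%d" % i, semantic_vector(seq, word_coords, cell_centers))
               for i, seq in enumerate(sequences, start=1)]
        concept_som = train_som(np.stack([sv for _, sv in svs]))
        clusters = cluster_semantic_vectors(concept_som, svs)

        # Step 126: dialectic arguments are then formed over these clusters by an
        # analyst or agent (see the sketches that follow).
        return clusters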
  • In general, the concept map is developed from a set of training documents provided by a subject matter expert (SME). By tuning the distinctive word selection process, the concept map is focused on specific concepts for which the SME then provides explanations. In this manner, the SME's knowledge is captured without any a priori structuring of the information, such as the taxonomies that are popular for organizing unstructured information.
  • Organizing information using concept maps enables the knowledge of SMEs to be remembered and shared. It also provides a basis for organizing all new information of the same type that is developed after the concept map is built. By reusing the process used to first organize the training documents, any new information that belongs to the same domain of knowledge can be mapped into the concept maps, thereby extending the scope of information and knowledge found in that concept map. Over time the scope of the concept map grows and the map has to be regenerated, using the previous concept map as a starting point for training. In this manner, the concept map tracks the development of new knowledge and may spawn new concept maps to form a tree of concept maps to capture all the knowledge.
  • As users surf the concept maps, they will use the dialectic arguments to find plausible new links between specific pieces of information mapped into the concept maps. The dialectic argument is used to capture a SME's belief as to how bits of information support, or rebut, an idea. The SME does this by selecting clusters from a variety of concept maps, where each cluster's conceptual idea provides one of the support or rebuttal ideas central to the dialectic argument. The function of the dialectic argument is then to monitor those specific clusters to find relevant pieces of information that instantiate the dialectic argument.
  • The purpose of the dialectic argument is to provide the SME with a means to join the dots based upon the SME's idea as to how the dots might be joined. The concept map clusters are used to group information that is conceptually relevant for the dialectic argument. All the dialectic argument has to do is to select pieces from relevant clusters that are linked by one or more common entities, for example, find information from the required concept map clusters that talk about the same person, place, or thing.
  • Once the SME has developed a dialectic argument, that dialectic argument will spawn one or more agents to find relevant information, with each agent homing in on different opportunities. For example, one agent could use a dialectic argument connecting information about water supply to home in on London and New York. Another agent may use the same dialectic argument to home in on water supply information about Berthoud, but fail to ever converge. Convergence is achieved when the plausibility of the dialectic argument reaches a satisfactory threshold. At that point all successful instantiations of the dialectic argument are listed for the SME to review. Each instantiation is a hypothesis that the SME must evaluate for credibility and, if credible, for further analysis. Such instantiations of the dialectic argument provide the analyst with disparate pieces of information that would not otherwise have been connected, other than by serendipity. It is this serendipity that aids the analyst in thinking outside the box by providing original connections.
  • To help the SME assess credibility, the dialectic argument homes in on information that both supports and rebuts the dialectic argument's claim. When designing a dialectic argument, the SME must identify both types of information, much like debaters argue for and against a claim. In fact, a single dialectic argument might be considered a template for a mini-debate, where realizations of concepts are drawn from the concept map in place of the debater's memory.
  • The dialectic argument functions like a template in that support or rebuttal information is not selected based upon a key word, but is selected because it fits a key concept. The fit can be fuzzy, meaning it does not have to be an exact fit. Fuzziness allows the dialectic argument to look across a broad expanse of information to join dots that might otherwise be missed. But to ensure the dialectic argument instance does not simply collect nonsense, each selected piece of support and rebuttal information must address a common entity such as a person, place, or thing. In this manner, the fuzziness is productive even though non-specific, meaning that fuzziness is not predefined through rules, as is often the case in fuzzy queries or fuzzy searches.
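  • The template behavior can be sketched as follows, assuming each cluster is represented as a list of (entity, fragment, fuzziness) tuples drawn from the concept maps; the data layout and function name are illustrative only.

    def instantiate_argument(support_clusters, rebuttal_clusters):
        # Keep an instantiation only when one common entity (person, place, or
        # thing) appears in every required support cluster.
        candidates = None
        for cluster in support_clusters:
            entities = {entity for entity, _, _ in cluster}
            candidates = entities if candidates is None else candidates & entities
        instances = []
        for entity in candidates or set():
            support = [(frag, fuzz) for cluster in support_clusters
                       for ent, frag, fuzz in cluster if ent == entity]
            rebuttal = [(frag, fuzz) for cluster in rebuttal_clusters
                        for ent, frag, fuzz in cluster if ent == entity]
            instances.append({"entity": entity, "support": support,
                              "rebuttal": rebuttal})
        return instances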
  • To assess the plausibility of the instantiated dialectic argument, fuzziness is measured by assessing how well each piece of selected information fits the conceptual cluster from which it is drawn. The measure is achieved by measuring how close the selected piece of information is to the center of the cluster. With all such measurements made, the fuzziness of all the information that goes into a particular dialectic argument instantiation is rolled up into a plausibility measure, e.g. by using a root-sum-square indexing scheme.
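  • The root-sum-square roll-up can be sketched as below; normalizing by the number of fragments so the result stays between 0 and 1 is an added assumption beyond the indexing scheme named above.

    import math

    def rolled_up_plausibility(fuzziness_values):
        # Roll the per-fragment fuzziness measures of one dialectic-argument
        # instantiation into a single plausibility measure.
        if not fuzziness_values:
            return 0.0
        rss = math.sqrt(sum(z * z for z in fuzziness_values))
        return rss / math.sqrt(len(fuzziness_values))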
  • Just as concept maps capture the knowledge of a SME for reuse, so do the dialectic arguments. Someone can develop a dialectic argument that looks for information within concept map clusters developed entirely by other experts. And just as concept maps share knowledge, so do dialectic arguments as people reuse someone else's dialectic argument. Furthermore, the claim of one dialectic argument can be used as one of the support, or rebuttal, arguments of another dialectic argument. In this manner dialectic arguments can be chained to form more sophisticated hypotheses. Note that the plausibility of a dialectic argument becomes the fuzziness measure when used as a support or rebuttal in another dialectic argument.
  • The integration of concept maps and dialectic arguments to form and instantiate hypotheses is central to the research tool, as it provides an original method of developing new ideas based upon what is known. In this manner, the concept map and dialectic argument combination is thought to capture a reasoning process, thereby providing a powerful and novel means to connect the dots. The integration of concept maps and dialectic arguments is what distinguishes the approach as knowledge management as opposed to data or information management.
  • The interpretation of the concept maps takes the form of dialectic arguments that search the maps to find information that supports and rebuts each argument's assertion. Assertions about suspect activities can yield leads with measured plausibilities. The process of finding and interpreting information within the semantic map and measuring its plausibility is the dialectic search described above. Together, the concept map, dialectic search, and hybrid computing architecture provide new and significant capability for processing information. The dialectic search avoids the problems associated with classical information extraction and analysis, which require the development of countless rules. Instead, it reuses the SME's knowledge and experience directly, via a dialectic argument. The dialectic argument is mechanized using Intelligent Software Agents (ISA) that augment the SME's reasoning ability. With the addition of genetic algorithms there is also the potential to adapt searches to track terrorists through their signatures.
  • By instantiating an argument, the concept maps generate leads for the SME to follow. The arguments can also be linked to form a lattice of arguments, elaborating on the lead to generate a more complete description of the situation. The plausibility is computed using the fuzziness of each piece of an argument's support and rebuttal information, which is quantified using the proximity of the information to the semantic search center and the maps' fuzziness functions. Based on this fuzziness, plausibility measurements and confidence levels can be computed.
  • The dialectic argument structure does not depend on deductive or inductive logic, though these may be included as part of the warrant. Instead, the dialectic argument structure depends on non-analytic inferences to find new leads that fit the dialectic argument's warrant. The dialectic argument structure is dialectic because its reasoning is based upon what is plausible; the dialectic argument structure is a hypothesis fabricated from bits of information. The hypothesis is developed into a story that explains the information. The claim is then used to invoke one or more new dialectic argument structures that perform their own searches. The developing lattice forms a story that renders the intelligence lead plausible and enables the plausibility to be measured.
  • As the lattice develops, the aggregate plausibility is computed using the fuzziness of the support and rebuttal information. Eventually, a dialectic argument structure lattice is formed that relates information with its computed plausibility. The computation uses joint information fuzziness to generate a robust measure of plausibility, a process that is not available using Bayesian methods.
  • The dialectic search requires development, meaning it must be seeded with knowledge from the SME. Once seeded, it has the potential of evolving the warrant to present new types of possible leads. Because the source information is encoded as a vector in the concept map, the source can be guarded but still used within the SOM. This is important where the source is compartmentalized information that can only be read by certain SMEs. If necessary, key information can be purged from the source before encoding without losing essential semantic information required to encode the concept map's semantic vector.
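A small sketch of how guarded source material might be handled is given below; the encode_vector function and the purge patterns are illustrative assumptions, since the specification only requires that key information can be purged before encoding and that the SOM keeps a link back to the source.

```python
import re

def encode_guarded_source(source_text, source_id, encode_vector,
                          purge_patterns=(r"\b[A-Z]{2}\d{6}\b",)):
    """Purge guarded identifiers before encoding, then store only the semantic
    vector and an opaque link back to the compartmented source document."""
    sanitized = source_text
    for pattern in purge_patterns:                  # illustrative redaction rules
        sanitized = re.sub(pattern, "[REDACTED]", sanitized)
    return {
        "vector": encode_vector(sanitized),         # places the item in the SOM
        "source_link": source_id,                   # resolved only inside the compartment
    }
```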
  • The guarded source information is used to support the dialectic search. Once the search has been completed and verified using the computed plausibility, the SME validates the lead's support and rebuttal information by referring back through the SOM's link to read the source information. If the source is guarded, the lead would be passed to an SME within that compartment.
  • The ISA can be used to implement the dialectic argument structure. The agency consists of three different agents, the coordinator, the dialectic argument structure, and the search, which work together, each having its own learning objectives. The coordinator is taught to watch the concept map, responding to new hits that conform to patterns of known interest. When an interesting hit occurs, the coordinator selects one or more candidate dialectic argument structure agents, and then spawns search agents to find information relevant to each dialectic argument structure. As time proceeds, the coordinator learns which hit patterns are most likely to yield a promising lead, adapting to any changes in the concept map structure and sharing what it learns with other active coordinators.
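The three-agent agency might be organized as in the sketch below; the class and method names are illustrative assumptions, not part of the specification.

```python
class Coordinator:
    """Watches the concept map for hit patterns of known interest; on a hit it
    selects candidate dialectic argument structures (DAS) and spawns search agents."""
    def __init__(self, das_library, spawn_search_agent):
        self.das_library = das_library
        self.spawn_search_agent = spawn_search_agent

    def on_hit(self, hit):
        for das in self.das_library.matching(hit):
            self.spawn_search_agent(das, hit)

class SearchAgent:
    """Takes the DAS prototype search vectors and, through the SOM, finds
    relevant and related information."""
    def run(self, das, som):
        return [som.nearest(vector) for vector in das.prototype_vectors]

class DASAgent:
    """Evaluates the information found by the search agent against fuzzy patterns;
    near-misses are routed to a sandbox for alternative hypotheses."""
    def evaluate(self, das, found_pieces, sandbox):
        kept = []
        for piece in found_pieces:
            (kept if das.fits(piece) else sandbox).append(piece)
        return kept
```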
  • The search agent takes the dialectic argument structure prototype search vectors and, through the SOM, finds information that is relevant and related. The search agent learns to adapt to different and changing source formats and would include parsing procedures required to extract detailed information.
  • The final agent, the dialectic argument structure, learns fuzzy patterns to evaluate information found by the search agent. Any information that does not quite fit is directed to a sandbox where peer agents can exercise a more aggressive routine to search for alternative hypotheses.
  • The principal activities addressed by the use of agents are to learn to adapt to changes in the surrounding environment, capture the knowledge of the SME for reuse, share information and learning between agent peers, hypothesize with on-the-job-training from the SME, and remember so as to avoid old mistakes and false leads.
  • While one or more embodiments of the present invention have been illustrated in detail, the skilled artisan will appreciate that modifications and adaptations to those embodiments may be made without departing from the scope of the present invention as set forth in the following claims.

Claims (27)

1. A computer implemented method of researching textual data sources, comprising:
converting textual data into first numeric representations;
forming a first self-organizing map using the first numeric representations, wherein the first numeric representations of the textual data are organized by similarities;
forming a second self-organizing map from second numeric representations generated from the organization of the first self-organizing map, wherein the second numeric representations are organized into clusters of similarities on the second self-organizing map; and
forming dialectic arguments from the second self-organizing map to interpret the textual data.
2. The computer implemented method of claim 1, wherein the textual data is reduced to a plurality of distinctive words.
3. The computer implemented method of claim 2, wherein the plurality of distinctive words are selected based on frequency of usage within the textual data.
4. The computer implemented method of claim 1, wherein the first numeric representations include a plurality of vectors.
5. The computer implemented method of claim 4, wherein the plurality of vectors include random numbers.
6. The computer implemented method of claim 4, wherein the plurality of vectors are trained onto the first self-organizing map.
7. The computer implemented method of claim 1, wherein a plurality of vectors are formed from the first self-organizing map.
8. The computer implemented method of claim 7, wherein the plurality of vectors from the first self-organizing map are used to train the second self-organizing map.
9. The computer implemented method of claim 8, wherein the plurality of vectors from the first self-organizing map are formed into the clusters on the second self-organizing map.
10. A method of interpreting textual data, comprising:
converting the textual data into first numeric representations;
forming a first self-organizing map using the first numeric representations;
forming a second self-organizing map from second numeric representations generated from the first self-organizing map, wherein the second numeric representations are organized into clusters on the second self-organizing map; and
forming dialectic arguments from the second self-organizing map to interpret the textual data.
11. The method of claim 10, wherein the textual data is reduced to a plurality of distinctive words.
12. The method of claim 11, wherein the plurality of distinctive words are selected based on frequency of usage within the textual data.
13. The method of claim 10, wherein the first numeric representations include a plurality of vectors.
14. The method of claim 13, wherein the plurality of vectors include random numbers.
15. The method of claim 13, wherein the plurality of vectors are trained onto the first self-organizing map.
16. The method of claim 10, wherein a plurality of vectors are formed from the first self-organizing map.
17. The method of claim 16, wherein the plurality of vectors from the first self-organizing map are used to train the second self-organizing map.
18. The method of claim 16, wherein the plurality of vectors from the first self-organizing map are formed into the clusters on the second self-organizing map.
19. A computer program product usable with a programmable computer processor having a computer readable program code embodied therein, comprising:
computer readable program code which converts the textual data into first numeric representations;
computer readable program code which forms a first self-organizing map using the first numeric representations;
computer readable program code which forms a second self-organizing map from second numeric representations generated from the first self-organizing map, wherein the second numeric representations are organized into clusters on the second self-organizing map; and
computer readable program code which forms dialectic arguments from the second self-organizing map to interpret the textual data.
20. The computer program product of claim 19, wherein the textual data is reduced to a plurality of distinctive words.
21. The computer program product of claim 20, wherein the plurality of distinctive words are selected based on frequency of usage within the textual data.
22. The computer program product of claim 19, wherein the first numeric representations include a plurality of vectors.
23. The computer program product of claim 22, wherein the plurality of vectors are trained onto the first self-organizing map.
24. The computer program product of claim 19, wherein a plurality of vectors are formed from the first self-organizing map.
25. The computer program product of claim 24, wherein the plurality of vectors from the first self-organizing map are used to train the second self-organizing map.
26. The computer program product of claim 24, wherein the plurality of vectors from the first self-organizing map are formed into the clusters on the second self-organizing map.
27. A computer system for interpreting textual data, comprising:
means for converting the textual data into first numeric representations;
means for forming a first self-organizing map using the first numeric representations;
means for forming a second self-organizing map from second numeric representations generated from the first self-organizing map, wherein the second numeric representations are organized into clusters on the second self-organizing map; and
means for forming dialectic arguments from the second self-organizing map to interpret the textual data.
US12/258,959 2004-05-10 2008-10-27 System and Method of Self-Learning Conceptual Mapping to Organize and Interpret Data Abandoned US20090049067A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/258,959 US20090049067A1 (en) 2004-05-10 2008-10-27 System and Method of Self-Learning Conceptual Mapping to Organize and Interpret Data

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US56997804P 2004-05-10 2004-05-10
US11/127,657 US7447665B2 (en) 2004-05-10 2005-05-10 System and method of self-learning conceptual mapping to organize and interpret data
US12/258,959 US20090049067A1 (en) 2004-05-10 2008-10-27 System and Method of Self-Learning Conceptual Mapping to Organize and Interpret Data

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/127,657 Continuation US7447665B2 (en) 2004-05-10 2005-05-10 System and method of self-learning conceptual mapping to organize and interpret data

Publications (1)

Publication Number Publication Date
US20090049067A1 true US20090049067A1 (en) 2009-02-19

Family

ID=35240509

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/127,657 Active 2026-08-29 US7447665B2 (en) 2004-05-10 2005-05-10 System and method of self-learning conceptual mapping to organize and interpret data
US12/258,959 Abandoned US20090049067A1 (en) 2004-05-10 2008-10-27 System and Method of Self-Learning Conceptual Mapping to Organize and Interpret Data

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US11/127,657 Active 2026-08-29 US7447665B2 (en) 2004-05-10 2005-05-10 System and method of self-learning conceptual mapping to organize and interpret data

Country Status (1)

Country Link
US (2) US7447665B2 (en)

Cited By (180)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120011124A1 (en) * 2010-07-07 2012-01-12 Apple Inc. Unsupervised document clustering using latent semantic density analysis
US8892446B2 (en) 2010-01-18 2014-11-18 Apple Inc. Service orchestration for intelligent automated assistant
WO2016018258A1 (en) * 2014-07-29 2016-02-04 Hewlett-Packard Development Company, L.P. Similarity in a structured dataset
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US9300784B2 (en) 2013-06-13 2016-03-29 Apple Inc. System and method for emergency calls initiated by voice command
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US9535906B2 (en) 2008-07-31 2017-01-03 Apple Inc. Mobile device having human language translation capability with positional feedback
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
US9606986B2 (en) 2014-09-29 2017-03-28 Apple Inc. Integrated word N-gram and class M-gram language models
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US9626955B2 (en) 2008-04-05 2017-04-18 Apple Inc. Intelligent text-to-speech conversion
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US9633660B2 (en) 2010-02-25 2017-04-25 Apple Inc. User profiling for voice input processing
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US9697822B1 (en) 2013-03-15 2017-07-04 Apple Inc. System and method for updating an adaptive speech recognition model
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9922642B2 (en) 2013-03-15 2018-03-20 Apple Inc. Training an at least partial voice command system
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9953088B2 (en) 2012-05-14 2018-04-24 Apple Inc. Crowd sourcing information to fulfill user requests
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US20180366013A1 (en) * 2014-08-28 2018-12-20 Ideaphora India Private Limited System and method for providing an interactive visual learning environment for creation, presentation, sharing, organizing and analysis of knowledge on subject matter
US20180373791A1 (en) * 2017-06-22 2018-12-27 Cerego, Llc. System and method for automatically generating concepts related to a target concept
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10199051B2 (en) 2013-02-07 2019-02-05 Apple Inc. Voice trigger for a digital assistant
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
US10311144B2 (en) 2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US10332518B2 (en) 2017-05-09 2019-06-25 Apple Inc. User interface for correcting recognition errors
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US10395654B2 (en) 2017-05-11 2019-08-27 Apple Inc. Text normalization based on a data-driven learning network
US10394851B2 (en) 2014-08-07 2019-08-27 Cortical.Io Ag Methods and systems for mapping data items to sparse distributed representations
US10403278B2 (en) 2017-05-16 2019-09-03 Apple Inc. Methods and systems for phonetic matching in digital assistant services
US10403283B1 (en) 2018-06-01 2019-09-03 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US10417266B2 (en) 2017-05-09 2019-09-17 Apple Inc. Context-aware ranking of intelligent response suggestions
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US10445429B2 (en) 2017-09-21 2019-10-15 Apple Inc. Natural language understanding using vocabularies with compressed serialized tries
US10474753B2 (en) 2016-09-07 2019-11-12 Apple Inc. Language identification using recurrent neural networks
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10496705B1 (en) 2018-06-03 2019-12-03 Apple Inc. Accelerated task performance
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US10568032B2 (en) 2007-04-03 2020-02-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US10572221B2 (en) 2016-10-20 2020-02-25 Cortical.Io Ag Methods and systems for identifying a level of similarity between a plurality of data representations
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US10592604B2 (en) 2018-03-12 2020-03-17 Apple Inc. Inverse text normalization for automatic speech recognition
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10636424B2 (en) 2017-11-30 2020-04-28 Apple Inc. Multi-turn canned dialog
US10643611B2 (en) 2008-10-02 2020-05-05 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US10657328B2 (en) 2017-06-02 2020-05-19 Apple Inc. Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US10684703B2 (en) 2018-06-01 2020-06-16 Apple Inc. Attention aware virtual assistant dismissal
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
US10733375B2 (en) 2018-01-31 2020-08-04 Apple Inc. Knowledge-based framework for improving natural language understanding
US10733982B2 (en) 2018-01-08 2020-08-04 Apple Inc. Multi-directional dialog
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10748546B2 (en) 2017-05-16 2020-08-18 Apple Inc. Digital assistant services based on device capabilities
US10755051B2 (en) 2017-09-29 2020-08-25 Apple Inc. Rule-based natural language processing
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10791216B2 (en) 2013-08-06 2020-09-29 Apple Inc. Auto-activating smart responses based on activities from remote devices
US10789959B2 (en) 2018-03-02 2020-09-29 Apple Inc. Training speaker recognition models for digital assistants
US10789945B2 (en) 2017-05-12 2020-09-29 Apple Inc. Low-latency intelligent automated assistant
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US10839159B2 (en) 2018-09-28 2020-11-17 Apple Inc. Named entity normalization in a spoken dialog system
US10885089B2 (en) 2015-08-21 2021-01-05 Cortical.Io Ag Methods and systems for identifying a level of similarity between a filtering criterion and a data item within a set of streamed documents
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
US10909331B2 (en) 2018-03-30 2021-02-02 Apple Inc. Implicit identification of translation payload with neural machine translation
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US10984780B2 (en) 2018-05-21 2021-04-20 Apple Inc. Global semantic word embeddings using bi-directional recurrent neural networks
US11010561B2 (en) 2018-09-27 2021-05-18 Apple Inc. Sentiment prediction from textual data
US11010127B2 (en) 2015-06-29 2021-05-18 Apple Inc. Virtual assistant for media playback
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US11023513B2 (en) 2007-12-20 2021-06-01 Apple Inc. Method and apparatus for searching using an active ontology
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11170166B2 (en) 2018-09-28 2021-11-09 Apple Inc. Neural typographical error modeling via generative adversarial networks
US11204787B2 (en) 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
US11217251B2 (en) 2019-05-06 2022-01-04 Apple Inc. Spoken notifications
US11227589B2 (en) 2016-06-06 2022-01-18 Apple Inc. Intelligent list reading
US11231904B2 (en) 2015-03-06 2022-01-25 Apple Inc. Reducing response latency of intelligent automated assistants
US11237797B2 (en) 2019-05-31 2022-02-01 Apple Inc. User activity shortcut suggestions
US11269678B2 (en) 2012-05-15 2022-03-08 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US11281993B2 (en) 2016-12-05 2022-03-22 Apple Inc. Model and ensemble compression for metric learning
US11289073B2 (en) 2019-05-31 2022-03-29 Apple Inc. Device text to speech
US11301477B2 (en) 2017-05-12 2022-04-12 Apple Inc. Feedback analysis of a digital assistant
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11314370B2 (en) 2013-12-06 2022-04-26 Apple Inc. Method for extracting salient dialog usage from live data
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
US11360641B2 (en) 2019-06-01 2022-06-14 Apple Inc. Increasing the relevance of new available information
US11386266B2 (en) 2018-06-01 2022-07-12 Apple Inc. Text correction
US11423908B2 (en) 2019-05-06 2022-08-23 Apple Inc. Interpreting spoken requests
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
US11468282B2 (en) 2015-05-15 2022-10-11 Apple Inc. Virtual assistant in a communication session
US11475898B2 (en) 2018-10-26 2022-10-18 Apple Inc. Low-latency multi-speaker speech recognition
US11475884B2 (en) 2019-05-06 2022-10-18 Apple Inc. Reducing digital assistant latency when a language is incorrectly determined
US11488406B2 (en) 2019-09-25 2022-11-01 Apple Inc. Text detection using global geometry estimators
US11495218B2 (en) 2018-06-01 2022-11-08 Apple Inc. Virtual assistant operation in multi-device environments
US11496600B2 (en) 2019-05-31 2022-11-08 Apple Inc. Remote execution of machine-learned models
US11551567B2 (en) * 2014-08-28 2023-01-10 Ideaphora India Private Limited System and method for providing an interactive visual learning environment for creation, presentation, sharing, organizing and analysis of knowledge on subject matter
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US11638059B2 (en) 2019-01-04 2023-04-25 Apple Inc. Content playback on multiple devices
US11734332B2 (en) 2020-11-19 2023-08-22 Cortical.Io Ag Methods and systems for reuse of data item fingerprints in generation of semantic maps
US11928604B2 (en) 2019-04-09 2024-03-12 Apple Inc. Method and apparatus for building an intelligent automated assistant

Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6424969B1 (en) * 1999-07-20 2002-07-23 Inmentia, Inc. System and method for organizing data
US6944619B2 (en) * 2001-04-12 2005-09-13 Primentia, Inc. System and method for organizing data
KR100500329B1 (en) * 2001-10-18 2005-07-11 주식회사 핸디소프트 System and Method for Workflow Mining
US20040158561A1 (en) * 2003-02-04 2004-08-12 Gruenwald Bjorn J. System and method for translating languages using an intermediate content space
US20060036451A1 (en) 2004-08-10 2006-02-16 Lundberg Steven W Patent mapping
AU2006239734B2 (en) * 2005-04-27 2011-09-15 The University Of Queensland Automatic concept clustering
US20090327259A1 (en) * 2005-04-27 2009-12-31 The University Of Queensland Automatic concept clustering
WO2006128183A2 (en) 2005-05-27 2006-11-30 Schwegman, Lundberg, Woessner & Kluth, P.A. Method and apparatus for cross-referencing important ip relationships
US8161025B2 (en) * 2005-07-27 2012-04-17 Schwegman, Lundberg & Woessner, P.A. Patent mapping
US20150347600A1 (en) * 2005-10-21 2015-12-03 Joseph Akwo Tabe Broadband Centralized Transportation Communication Vehicle For Extracting Transportation Topics of Information and Monitoring Terrorist Data
US20100299364A1 (en) * 2006-10-20 2010-11-25 Peter Jeremy Baldwin Web application for debate maps
US7937349B2 (en) * 2006-11-09 2011-05-03 Pucher Max J Method for training a system to specifically react on a specific input
CN101226523B (en) * 2007-01-17 2012-09-05 国际商业机器公司 Method and system for analyzing data general condition
US7987484B2 (en) 2007-06-24 2011-07-26 Microsoft Corporation Managing media content with a self-organizing map
US7945525B2 (en) * 2007-11-09 2011-05-17 International Business Machines Corporation Methods for obtaining improved text similarity measures which replace similar characters with a string pattern representation by using a semantic data tree
US20100131513A1 (en) 2008-10-23 2010-05-27 Lundberg Steven W Patent mapping
US8458105B2 (en) 2009-02-12 2013-06-04 Decisive Analytics Corporation Method and apparatus for analyzing and interrelating data
US9183288B2 (en) * 2010-01-27 2015-11-10 Kinetx, Inc. System and method of structuring data for search using latent semantic analysis techniques
US8676565B2 (en) 2010-03-26 2014-03-18 Virtuoz Sa Semantic clustering and conversational agents
US9378202B2 (en) 2010-03-26 2016-06-28 Virtuoz Sa Semantic clustering
US8694304B2 (en) 2010-03-26 2014-04-08 Virtuoz Sa Semantic clustering and user interfaces
US9524291B2 (en) 2010-10-06 2016-12-20 Virtuoz Sa Visual display of semantic information
US9904726B2 (en) 2011-05-04 2018-02-27 Black Hills IP Holdings, LLC. Apparatus and method for automated and assisted patent claim mapping and expense planning
US20130085946A1 (en) 2011-10-03 2013-04-04 Steven W. Lundberg Systems, methods and user interfaces in a patent management system
US8972385B2 (en) 2011-10-03 2015-03-03 Black Hills Ip Holdings, Llc System and method for tracking patent ownership change
JP6019968B2 (en) * 2012-09-10 2016-11-02 株式会社リコー Report creation system, report creation apparatus and program
US9569467B1 (en) * 2012-12-05 2017-02-14 Level 2 News Innovation LLC Intelligent news management platform and social network
JP6020161B2 (en) * 2012-12-28 2016-11-02 富士通株式会社 Graph creation program, information processing apparatus, and graph creation method
US10699351B1 (en) * 2016-05-18 2020-06-30 Securus Technologies, Inc. Proactive investigation systems and methods for controlled-environment facilities
US11449744B2 (en) 2016-06-23 2022-09-20 Microsoft Technology Licensing, Llc End-to-end memory networks for contextual language understanding
US10366163B2 (en) * 2016-09-07 2019-07-30 Microsoft Technology Licensing, Llc Knowledge-guided structural attention processing
US10380162B2 (en) 2016-10-26 2019-08-13 Accenture Global Solutions Limited Item to vector based categorization
US11216900B1 (en) 2019-07-23 2022-01-04 Securus Technologies, Llc Investigation systems and methods employing positioning information from non-resident devices used for communications with controlled-environment facility residents

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3728707A (en) * 1971-05-10 1973-04-17 S Herrnreiter Automatic alarm setting system

Patent Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6081750A (en) * 1991-12-23 2000-06-27 Hoffberg; Steven Mark Ergonomic man-machine interface incorporating adaptive pattern recognition based control system
US6115480A (en) * 1995-03-31 2000-09-05 Canon Kabushiki Kaisha Method and apparatus for processing visual information
US6278799B1 (en) * 1997-03-10 2001-08-21 Efrem H. Hoffman Hierarchical data matrix pattern recognition system
US6728707B1 (en) * 2000-08-11 2004-04-27 Attensity Corporation Relational text index creation and searching
US6732098B1 (en) * 2000-08-11 2004-05-04 Attensity Corporation Relational text index creation and searching
US6732097B1 (en) * 2000-08-11 2004-05-04 Attensity Corporation Relational text index creation and searching
US6738765B1 (en) * 2000-08-11 2004-05-18 Attensity Corporation Relational text index creation and searching
US6741988B1 (en) * 2000-08-11 2004-05-25 Attensity Corporation Relational text index creation and searching
US20040167887A1 (en) * 2002-12-06 2004-08-26 Attensity Corporation Integration of structured data with relational facts from free text for data mining
US20040167884A1 (en) * 2002-12-06 2004-08-26 Attensity Corporation Methods and products for producing role related information from free text sources
US20040167886A1 (en) * 2002-12-06 2004-08-26 Attensity Corporation Production of role related information from free text sources utilizing thematic caseframes
US20040167870A1 (en) * 2002-12-06 2004-08-26 Attensity Corporation Systems and methods for providing a mixed data integration service
US20040167883A1 (en) * 2002-12-06 2004-08-26 Attensity Corporation Methods and systems for providing a service for producing structured data elements from free text sources
US20040167885A1 (en) * 2002-12-06 2004-08-26 Attensity Corporation Data products of processes of extracting role related information from free text sources
US20040167907A1 (en) * 2002-12-06 2004-08-26 Attensity Corporation Visualization of integrated structured data and extracted relational facts from free text
US20040167908A1 (en) * 2002-12-06 2004-08-26 Attensity Corporation Integration of structured data with free text for data mining
US20040167910A1 (en) * 2002-12-06 2004-08-26 Attensity Corporation Integrated data products of processes of integrating mixed format data
US20040167909A1 (en) * 2002-12-06 2004-08-26 Attensity Corporation Methods and products for integrating mixed format data
US20040167911A1 (en) * 2002-12-06 2004-08-26 Attensity Corporation Methods and products for integrating mixed format data including the extraction of relational facts from free text
US20040215634A1 (en) * 2002-12-06 2004-10-28 Attensity Corporation Methods and products for merging codes and notes into an integrated relational database

Cited By (263)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US8930191B2 (en) 2006-09-08 2015-01-06 Apple Inc. Paraphrasing of user requests and results by automated digital assistant
US8942986B2 (en) 2006-09-08 2015-01-27 Apple Inc. Determining user intent based on ontologies of domains
US9117447B2 (en) 2006-09-08 2015-08-25 Apple Inc. Using event alert text as input to an automated assistant
US10568032B2 (en) 2007-04-03 2020-02-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US11023513B2 (en) 2007-12-20 2021-06-01 Apple Inc. Method and apparatus for searching using an active ontology
US10381016B2 (en) 2008-01-03 2019-08-13 Apple Inc. Methods and apparatus for altering audio output signals
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US9865248B2 (en) 2008-04-05 2018-01-09 Apple Inc. Intelligent text-to-speech conversion
US9626955B2 (en) 2008-04-05 2017-04-18 Apple Inc. Intelligent text-to-speech conversion
US9535906B2 (en) 2008-07-31 2017-01-03 Apple Inc. Mobile device having human language translation capability with positional feedback
US10108612B2 (en) 2008-07-31 2018-10-23 Apple Inc. Mobile device having human language translation capability with positional feedback
US10643611B2 (en) 2008-10-02 2020-05-05 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11348582B2 (en) 2008-10-02 2022-05-31 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
US10475446B2 (en) 2009-06-05 2019-11-12 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10795541B2 (en) 2009-06-05 2020-10-06 Apple Inc. Intelligent organization of tasks items
US11080012B2 (en) 2009-06-05 2021-08-03 Apple Inc. Interface for a virtual digital assistant
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US8903716B2 (en) 2010-01-18 2014-12-02 Apple Inc. Personalized vocabulary for digital assistant
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US8892446B2 (en) 2010-01-18 2014-11-18 Apple Inc. Service orchestration for intelligent automated assistant
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US9548050B2 (en) 2010-01-18 2017-01-17 Apple Inc. Intelligent automated assistant
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10741185B2 (en) 2010-01-18 2020-08-11 Apple Inc. Intelligent automated assistant
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10706841B2 (en) 2010-01-18 2020-07-07 Apple Inc. Task flow identification based on user intent
US10692504B2 (en) 2010-02-25 2020-06-23 Apple Inc. User profiling for voice input processing
US10049675B2 (en) 2010-02-25 2018-08-14 Apple Inc. User profiling for voice input processing
US9633660B2 (en) 2010-02-25 2017-04-25 Apple Inc. User profiling for voice input processing
US20120011124A1 (en) * 2010-07-07 2012-01-12 Apple Inc. Unsupervised document clustering using latent semantic density analysis
US8713021B2 (en) * 2010-07-07 2014-04-29 Apple Inc. Unsupervised document clustering using latent semantic density analysis
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US10102359B2 (en) 2011-03-21 2018-10-16 Apple Inc. Device access using voice authentication
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US10417405B2 (en) 2011-03-21 2019-09-17 Apple Inc. Device access using voice authentication
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US11350253B2 (en) 2011-06-03 2022-05-31 Apple Inc. Active transport based notifications
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US11069336B2 (en) 2012-03-02 2021-07-20 Apple Inc. Systems and methods for name pronunciation
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9953088B2 (en) 2012-05-14 2018-04-24 Apple Inc. Crowd sourcing information to fulfill user requests
US11269678B2 (en) 2012-05-15 2022-03-08 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
US10714117B2 (en) 2013-02-07 2020-07-14 Apple Inc. Voice trigger for a digital assistant
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
US10199051B2 (en) 2013-02-07 2019-02-05 Apple Inc. Voice trigger for a digital assistant
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US9697822B1 (en) 2013-03-15 2017-07-04 Apple Inc. System and method for updating an adaptive speech recognition model
US9922642B2 (en) 2013-03-15 2018-03-20 Apple Inc. Training an at least partial voice command system
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9966060B2 (en) 2013-06-07 2018-05-08 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US10657961B2 (en) 2013-06-08 2020-05-19 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US11048473B2 (en) 2013-06-09 2021-06-29 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10769385B2 (en) 2013-06-09 2020-09-08 Apple Inc. System and method for inferring user intent from speech inputs
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US9300784B2 (en) 2013-06-13 2016-03-29 Apple Inc. System and method for emergency calls initiated by voice command
US10791216B2 (en) 2013-08-06 2020-09-29 Apple Inc. Auto-activating smart responses based on activities from remote devices
US11314370B2 (en) 2013-12-06 2022-04-26 Apple Inc. Method for extracting salient dialog usage from live data
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US10699717B2 (en) 2014-05-30 2020-06-30 Apple Inc. Intelligent assistant for home automation
US10169329B2 (en) 2014-05-30 2019-01-01 Apple Inc. Exemplar-based natural language processing
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US10417344B2 (en) 2014-05-30 2019-09-17 Apple Inc. Exemplar-based natural language processing
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US10878809B2 (en) 2014-05-30 2020-12-29 Apple Inc. Multi-command single utterance input method
US10714095B2 (en) 2014-05-30 2020-07-14 Apple Inc. Intelligent assistant for home automation
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US10083690B2 (en) 2014-05-30 2018-09-25 Apple Inc. Better resolution when referencing to concepts
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US10657966B2 (en) 2014-05-30 2020-05-19 Apple Inc. Better resolution when referencing to concepts
US10497365B2 (en) 2014-05-30 2019-12-03 Apple Inc. Multi-command single utterance input method
US10904611B2 (en) 2014-06-30 2021-01-26 Apple Inc. Intelligent automated assistant for TV user interactions
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US9668024B2 (en) 2014-06-30 2017-05-30 Apple Inc. Intelligent automated assistant for TV user interactions
WO2016018258A1 (en) * 2014-07-29 2016-02-04 Hewlett-Packard Development Company, L.P. Similarity in a structured dataset
US10394851B2 (en) 2014-08-07 2019-08-27 Cortical.Io Ag Methods and systems for mapping data items to sparse distributed representations
US20180366013A1 (en) * 2014-08-28 2018-12-20 Ideaphora India Private Limited System and method for providing an interactive visual learning environment for creation, presentation, sharing, organizing and analysis of knowledge on subject matter
US11551567B2 (en) * 2014-08-28 2023-01-10 Ideaphora India Private Limited System and method for providing an interactive visual learning environment for creation, presentation, sharing, organizing and analysis of knowledge on subject matter
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10431204B2 (en) 2014-09-11 2019-10-01 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US9606986B2 (en) 2014-09-29 2017-03-28 Apple Inc. Integrated word N-gram and class M-gram language models
US10390213B2 (en) 2014-09-30 2019-08-20 Apple Inc. Social reminders
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US10438595B2 (en) 2014-09-30 2019-10-08 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9986419B2 (en) 2014-09-30 2018-05-29 Apple Inc. Social reminders
US10453443B2 (en) 2014-09-30 2019-10-22 Apple Inc. Providing an indication of the suitability of speech recognition
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US11556230B2 (en) 2014-12-02 2023-01-17 Apple Inc. Data detection
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US11231904B2 (en) 2015-03-06 2022-01-25 Apple Inc. Reducing response latency of intelligent automated assistants
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US10930282B2 (en) 2015-03-08 2021-02-23 Apple Inc. Competing devices responding to voice triggers
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US10311871B2 (en) 2015-03-08 2019-06-04 Apple Inc. Competing devices responding to voice triggers
US10529332B2 (en) 2015-03-08 2020-01-07 Apple Inc. Virtual assistant activation
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US11468282B2 (en) 2015-05-15 2022-10-11 Apple Inc. Virtual assistant in a communication session
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US11127397B2 (en) 2015-05-27 2021-09-21 Apple Inc. Device voice control
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10681212B2 (en) 2015-06-05 2020-06-09 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US11010127B2 (en) 2015-06-29 2021-05-18 Apple Inc. Virtual assistant for media playback
US10885089B2 (en) 2015-08-21 2021-01-05 Cortical.Io Ag Methods and systems for identifying a level of similarity between a filtering criterion and a data item within a set of streamed documents
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US10354652B2 (en) 2015-12-02 2019-07-16 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10942703B2 (en) 2015-12-23 2021-03-09 Apple Inc. Proactive assistance based on dialog communication between devices
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US11227589B2 (en) 2016-06-06 2022-01-18 Apple Inc. Intelligent list reading
US10049663B2 (en) 2016-06-08 2018-08-14 Apple Inc. Intelligent automated assistant for media exploration
US11069347B2 (en) 2016-06-08 2021-07-20 Apple Inc. Intelligent automated assistant for media exploration
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10942702B2 (en) 2016-06-11 2021-03-09 Apple Inc. Intelligent device arbitration and control
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US10580409B2 (en) 2016-06-11 2020-03-03 Apple Inc. Application integration with a digital assistant
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US10474753B2 (en) 2016-09-07 2019-11-12 Apple Inc. Language identification using recurrent neural networks
US10553215B2 (en) 2016-09-23 2020-02-04 Apple Inc. Intelligent automated assistant
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10572221B2 (en) 2016-10-20 2020-02-25 Cortical.Io Ag Methods and systems for identifying a level of similarity between a plurality of data representations
US11714602B2 (en) 2016-10-20 2023-08-01 Cortical.Io Ag Methods and systems for identifying a level of similarity between a plurality of data representations
US11216248B2 (en) 2016-10-20 2022-01-04 Cortical.Io Ag Methods and systems for identifying a level of similarity between a plurality of data representations
US11281993B2 (en) 2016-12-05 2022-03-22 Apple Inc. Model and ensemble compression for metric learning
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US11656884B2 (en) 2017-01-09 2023-05-23 Apple Inc. Application integration with a digital assistant
US11204787B2 (en) 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
US10417266B2 (en) 2017-05-09 2019-09-17 Apple Inc. Context-aware ranking of intelligent response suggestions
US10332518B2 (en) 2017-05-09 2019-06-25 Apple Inc. User interface for correcting recognition errors
US10741181B2 (en) 2017-05-09 2020-08-11 Apple Inc. User interface for correcting recognition errors
US10395654B2 (en) 2017-05-11 2019-08-27 Apple Inc. Text normalization based on a data-driven learning network
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
US10847142B2 (en) 2017-05-11 2020-11-24 Apple Inc. Maintaining privacy of personal information
US10789945B2 (en) 2017-05-12 2020-09-29 Apple Inc. Low-latency intelligent automated assistant
US11301477B2 (en) 2017-05-12 2022-04-12 Apple Inc. Feedback analysis of a digital assistant
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
US10403278B2 (en) 2017-05-16 2019-09-03 Apple Inc. Methods and systems for phonetic matching in digital assistant services
US11217255B2 (en) 2017-05-16 2022-01-04 Apple Inc. Far-field extension for digital assistant services
US10909171B2 (en) 2017-05-16 2021-02-02 Apple Inc. Intelligent automated assistant for media exploration
US10748546B2 (en) 2017-05-16 2020-08-18 Apple Inc. Digital assistant services based on device capabilities
US10311144B2 (en) 2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation
US10657328B2 (en) 2017-06-02 2020-05-19 Apple Inc. Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling
US11086920B2 (en) * 2017-06-22 2021-08-10 Cerego, LLC System and method for automatically generating concepts related to a target concept
US11238085B2 (en) * 2017-06-22 2022-02-01 Cerego Japan Kabushiki Kaisha System and method for automatically generating concepts related to a target concept
US11347784B1 (en) * 2017-06-22 2022-05-31 Cerego Japan Kabushiki Kaisha System and method for automatically generating concepts related to a target concept
US20220245184A1 (en) * 2017-06-22 2022-08-04 Cerego Japan Kabushiki Kaisha System and method for automatically generating concepts related to a target concept
US20180373791A1 (en) * 2017-06-22 2018-12-27 Cerego, LLC System and method for automatically generating concepts related to a target concept
US11487804B2 (en) * 2017-06-22 2022-11-01 Cerego Japan Kabushiki Kaisha System and method for automatically generating concepts related to a target concept
US10445429B2 (en) 2017-09-21 2019-10-15 Apple Inc. Natural language understanding using vocabularies with compressed serialized tries
US10755051B2 (en) 2017-09-29 2020-08-25 Apple Inc. Rule-based natural language processing
US10636424B2 (en) 2017-11-30 2020-04-28 Apple Inc. Multi-turn canned dialog
US10733982B2 (en) 2018-01-08 2020-08-04 Apple Inc. Multi-directional dialog
US10733375B2 (en) 2018-01-31 2020-08-04 Apple Inc. Knowledge-based framework for improving natural language understanding
US10789959B2 (en) 2018-03-02 2020-09-29 Apple Inc. Training speaker recognition models for digital assistants
US10592604B2 (en) 2018-03-12 2020-03-17 Apple Inc. Inverse text normalization for automatic speech recognition
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US10909331B2 (en) 2018-03-30 2021-02-02 Apple Inc. Implicit identification of translation payload with neural machine translation
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US10984780B2 (en) 2018-05-21 2021-04-20 Apple Inc. Global semantic word embeddings using bi-directional recurrent neural networks
US11386266B2 (en) 2018-06-01 2022-07-12 Apple Inc. Text correction
US10720160B2 (en) 2018-06-01 2020-07-21 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
US10403283B1 (en) 2018-06-01 2019-09-03 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US10684703B2 (en) 2018-06-01 2020-06-16 Apple Inc. Attention aware virtual assistant dismissal
US11495218B2 (en) 2018-06-01 2022-11-08 Apple Inc. Virtual assistant operation in multi-device environments
US10984798B2 (en) 2018-06-01 2021-04-20 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US11009970B2 (en) 2018-06-01 2021-05-18 Apple Inc. Attention aware virtual assistant dismissal
US10944859B2 (en) 2018-06-03 2021-03-09 Apple Inc. Accelerated task performance
US10504518B1 (en) 2018-06-03 2019-12-10 Apple Inc. Accelerated task performance
US10496705B1 (en) 2018-06-03 2019-12-03 Apple Inc. Accelerated task performance
US11010561B2 (en) 2018-09-27 2021-05-18 Apple Inc. Sentiment prediction from textual data
US10839159B2 (en) 2018-09-28 2020-11-17 Apple Inc. Named entity normalization in a spoken dialog system
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
US11170166B2 (en) 2018-09-28 2021-11-09 Apple Inc. Neural typographical error modeling via generative adversarial networks
US11475898B2 (en) 2018-10-26 2022-10-18 Apple Inc. Low-latency multi-speaker speech recognition
US11638059B2 (en) 2019-01-04 2023-04-25 Apple Inc. Content playback on multiple devices
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
US11928604B2 (en) 2019-04-09 2024-03-12 Apple Inc. Method and apparatus for building an intelligent automated assistant
US11475884B2 (en) 2019-05-06 2022-10-18 Apple Inc. Reducing digital assistant latency when a language is incorrectly determined
US11423908B2 (en) 2019-05-06 2022-08-23 Apple Inc. Interpreting spoken requests
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11217251B2 (en) 2019-05-06 2022-01-04 Apple Inc. Spoken notifications
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
US11289073B2 (en) 2019-05-31 2022-03-29 Apple Inc. Device text to speech
US11496600B2 (en) 2019-05-31 2022-11-08 Apple Inc. Remote execution of machine-learned models
US11237797B2 (en) 2019-05-31 2022-02-01 Apple Inc. User activity shortcut suggestions
US11360739B2 (en) 2019-05-31 2022-06-14 Apple Inc. User activity shortcut suggestions
US11360641B2 (en) 2019-06-01 2022-06-14 Apple Inc. Increasing the relevance of new available information
US11488406B2 (en) 2019-09-25 2022-11-01 Apple Inc. Text detection using global geometry estimators
US11734332B2 (en) 2020-11-19 2023-08-22 Cortical.Io Ag Methods and systems for reuse of data item fingerprints in generation of semantic maps

Also Published As

Publication number Publication date
US20050251383A1 (en) 2005-11-10
US7447665B2 (en) 2008-11-04

Similar Documents

Publication Publication Date Title
US7447665B2 (en) System and method of self-learning conceptual mapping to organize and interpret data
Clark Intelligence analysis: a target-centric approach
Lim Big data and strategic intelligence
Marcellino et al. Monitoring social media
Lande et al. OSINT as a part of cyber defense system
Lu et al. Managing uncertainty in crisis
Kaplan OR forum—intelligence operations research: the 2010 Philip McCord Morse Lecture
Mannes et al. A computationally-enabled analysis of Lashkar-e-Taiba attacks in Jammu and Kashmir
Bossé et al. Exploitation of a priori knowledge for information fusion
Singh et al. Modeling threats
Prunckun Methods of inquiry for intelligence analysis
Hung et al. INSiGHT: A system for detecting radicalization trajectories in large heterogeneous graphs
Kris The NSA’s new sigint annex
US20120036098A1 (en) Analyzing activities of a hostile force
Zhao et al. Study on military equipment knowledge construction based on knowledge graph
Nieves et al. Finding patterns of terrorist groups in Iraq: A knowledge discovery analysis
Mohanty A computational approach to identify covertness and collusion in social networks
Eilstrup-Sangiovanni Competition and Strategic Differentiation among Transnational Advocacy Groups
Badalič The metadata-driven killing apparatus: big data analytics, the target selection process, and the threat to international humanitarian law
Rubel et al. Democratic Obligations and Technological Threats to Legitimacy: PredPol, Cambridge Analytica, and Internet Research Agency
McQuade World Histories of Big Data Policing: The Imperial Epistemology of the Police-Wars of US Hegemony
Hamdan Framing Islamophobia and Civil Liberties: American Political Discourse Post 9/11
Memon et al. Investigative data mining toolkit: a software prototype for visualizing, analyzing and destabilizing terrorist networks
Rogova et al. Multi-Agent System for Threat Assessment and Action Selection under Uncertainty and Ambiguity
Gberinyer et al. An Evaluation of Relevance of Criminal Intelligence Management and Implications for Security and Public Safety in Benue State, Nigeria

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION