US20230316097A1 - Collaborative sensemaking system and method - Google Patents
- Publication number
- US20230316097A1 (application US 18/206,160)
- Authority
- US
- United States
- Prior art keywords
- entities
- sensemaking
- docset
- collaborative
- users
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/169—Annotation, e.g. comment data or footnotes
Definitions
- This invention relates to the field of data analysis and sensemaking. More specifically, the invention comprises a system and method for allowing a wide variety of smart machines and virtually any size group of humans to interact collaboratively in seeking to discern hidden meaning and make sense of information spread across virtually any size collection and variety of sources and file types.
- Big Data is generally defined as high volume, high velocity, high variety data, often from many disparate sources, usually made up of largely unstructured and semi-structured data, and characterized by different levels of veracity or trustworthiness.
- Smart machines (e.g., platforms that provide various forms of text analytics, visual analytics, image pattern matching, and machine learning capabilities) suffer from a deficiency described below. This deficiency is expected to persist for several years and is generally seen as a major bottleneck to maximizing return on investment (ROI) for machine analytics.
- ROI return on investment
- Smart machines lack an integrated social component for obtaining the benefits of sharing diverse perspectives (i.e., the wisdom of the crowd) on the accumulating store of largely unstructured data distributed across the wider organizational ecosystem.
- Machines perform many other types of analytical tasks orders of magnitude faster and more accurately than humans. Examples include finding people, places, and things in Big Data and finding connections between the identified people, places, and things. For example, when analysts want to uncover the range of operating terrorist networks within an area, they feed data on events involving improvised explosive devices (IEDs) into a machine-based analysis system. The machine combines geospatial data on where the IEDs exploded with the serial numbers, fingerprints, and DNA evidence found on recovered shards of IEDs to establish a range of operation for a terrorist cell.
- IEDs improvised explosive devices
- An ideal solution to this problem would shift the focus of smart machines from delivering insights to serving as a differently-abled teammate.
- the machine as teammate collaborates with analysts and decision makers to support human sensemaking and improve the performance of the team in accomplishing mission critical goals.
- an ideal solution would allow many humans to simultaneously work on the problem in order to bring diverse perspectives and the wisdom of the crowd into play.
- the present invention provides such a solution by making high speed, high velocity, high variety machine-scale analytics compatible with deliberative, collaborative, human-scale sensemaking practices.
- the present invention comprises a system and method allowing various smart machines and virtually any number of people to work together in a collaborative fashion in the analysis of virtually any size corpus and variety of file types (i.e., Big Data and smaller data sets too).
- The invention, the HyLighter Collaborative Sensemaking System (CSS), is the first-in-class instantiation of a Big Data social sensemaking platform. Sensemaking involves the continued revision and expansion of a developing “story” so that it becomes more comprehensible and more defensible with respect to contrary arguments.
- HyLighter CSS includes the HyLighter sensemaking system as the front-end component for a variety of machines that are integrated to support multimedia, multilingual, machine analytics and collaborative sensemaking.
- the system and its various configurations are constructed from six main building blocks or capabilities. These include the following:
- HyLighter CSS is a flexible system that can function using various combinations of building blocks or components for a wide variety of use cases and scenarios. These include using HyLighter CSS as a standalone application or in combination with other components (e.g., a text analytics platform) that add additional capabilities to the system. Different contexts have different requirements (e.g., work with very large or relatively small collections, text only or multimedia, one language or multiple languages, single individuals to groups of any size). When combined with a text analytics platform, the machine participates in the collaborative sensemaking process as a “differently-abled” teammate.
- the system represents machine intelligence combined with the thinking of any size group across any size collection of any type of file (text, images, audio, video) as a layer of color-coded highlighting and related meta-content.
- This layer of social annotation includes machine- and human-generated comments/discussion threads, tags, and links between related fragments within and between documents.
- the system has provisions for displaying the disaggregated hylights in sortable, filterable, searchable, customizable displays (e.g., reports, mashups, node graphs) with links back to the exact locations of highlighted fragments in high-fidelity versions of the native files.
- Analysts review the machine hylights, add additional meta-content across the collection, view the accumulating social annotation layer in reports, mashups, and various visualizations, feed meta-content from the annotation layer back to the machine to improve machine performance, and publish reports that include links back to hylights that provide evidence for conclusions expressed in the publication.
- a large manufacturer wants to run competitive business intelligence using open source content and subscription sites from around the world.
- the user specifies topics of interest (e.g., what is going on in the target market in India, China, and other countries), and the machine automatically pulls in documents and web-based material on these topics.
- the machine runs analytics including entity extraction and resolution on the data at Big Data scale.
- HyLighter converts the machine analytics into social annotation, displaying the machine intelligence as highlighted entities and associated comments and tags across the collection. Entities appear highlighted in gray with related meta-content displayed in the margin.
- each entity may include a category tag (e.g., person, place, or date), a list of all entities in the corpus that are the same as the current entity (e.g., John Smith, Smitty, and Mr. Smith), a sentiment tag (e.g., positive or negative), a salience tag to indicate importance within the source document (e.g., a number on a scale from 1-100), and a list of entities related to the selected entity.
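The per-entity meta-content described above can be modeled as a simple record. The field names below are illustrative assumptions for discussion, not the patent's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class EntityMetaContent:
    """Illustrative record for the machine-generated meta-content on one entity."""
    surface_form: str                 # the entity text as it appears in the document
    category: str                     # e.g., "person", "place", or "date"
    coreferences: list = field(default_factory=list)  # other mentions of the same entity
    sentiment: str = "neutral"        # e.g., "positive" or "negative"
    salience: int = 0                 # importance within the source document, 1-100
    related_entities: list = field(default_factory=list)

smith = EntityMetaContent(
    surface_form="Mr. Smith",
    category="person",
    coreferences=["John Smith", "Smitty"],
    sentiment="positive",
    salience=87,
    related_entities=["Acme Corp."],
)
```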
- a HyLighter “mashup” displays hylights extracted from multiple documents in the margin of a single HyLighter window.
- a primary feature of the mashup is that users navigate from hylight to hylight as if they were navigating through a single document. By clicking on a hylight, the user instantly arrives at the original fragment in its exact location in the source document.
- the capability to quickly view related hylights in context helps users to better understand intended meaning, evaluate veracity, and make associations between related pieces of information distributed across a large collection.
- a human operator (a team leader or “curator”) establishes data collection rules for building and managing a corpus of documents or files for analysis.
- a text analytics platform ingests and processes the corpus.
- the curator and/or assigned others use various approaches to select a subset of promising files from the corpus.
- HyLighter CSS then ingests the subset and displays the machine analytics across the subset as machine hylights (e.g., the system highlights entities identified by the machine in gray and displays related meta-content in the margin as comments and tags).
- the curator may add additional files to the subset by merging in files from previously annotated subsets (also, referred to as collections).
- the curator has the option to pass an entire corpus of any size from the machine analytics platform to HyLighter CSS.
- the machine-processed subset or collection is then subjected to the “HYLIGHTER” social sensemaking process, wherein multiple users independently select and annotate fragments of documents within the collection that are relevant to each user (based on his or her own field of endeavor, personal experience, objectives, etc.). This includes reviewing machine hylights across the collection for accuracy and relevance and highlighting over machine hylights to add context to the machine's selections.
- the “HYLIGHTER” process enables users to access a Report and Mashup Builder and a variety of visualization tools that include provisions for filtering hylights from any number of documents or document collections and displaying the results in various displays and formats.
- each hylight in a display has a link or tether back to its exact location in its source. No matter where a hylight travels (e.g., to a tabular report, mashup, color-coded node graph, or Excel spreadsheet), the hylight maintains its provenance or place of origin. Instant access to the context of a fragment and related meta-content enables users to better understand the intended meaning. Further, by knowing the lineage (where did this information originate from) and the pedigree (how reliable is the source of this information), users are able to efficiently evaluate the veracity of the content.
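The provenance mechanism described above amounts to each hylight carrying its source location wherever it travels. A minimal sketch, in which the record fields and the `hylighter://` URL scheme are hypothetical illustrations:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hylight:
    """A highlighted fragment that keeps its provenance: doc id plus offsets."""
    doc_id: str       # source document identifier
    start: int        # character offset where the fragment begins
    end: int          # character offset where the fragment ends
    text: str         # the highlighted fragment itself

    def source_url(self) -> str:
        # Hypothetical link scheme back to the exact location in the source.
        return f"hylighter://{self.doc_id}#chars={self.start}-{self.end}"

h = Hylight("doc-42", 120, 155, "range of operation for a terrorist cell")
```

Because the record is immutable, the tether back to the source survives no matter which report, mashup, or spreadsheet the hylight is copied into.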
- a final “communication” may be created, with the communication ideally telling the “story” that is emerging from the sensemaking process.
- This communication includes links back to the HyLighter social annotation layer, so that the report remains an active document.
- meta-content assembled in various displays may be packaged and fed back to machine analytics platform that started the whole process.
- the metadata is configured to improve the machine analytics process so that the machine “learns” from the input provided by the human analysts. The process thus described is often iterative, with multiple cycles of machine analytics being run and the “story” continuing to evolve over time.
- FIG. 1 is a schematic view, showing the general operation of the present invention.
- FIG. 2 is a schematic view, showing more details of the operation of the present invention.
- FIG. 3 is a schematic view, showing details of how human users interact with the present invention.
- FIG. 4 is a screen shot, showing how a report used to edit the machine-generated results might appear in an exemplary embodiment of the invention.
- FIG. 5 is a screen shot, showing how the editing functions might be presented.
- FIG. 6 is a screen shot, showing how the editing functions might be presented.
- FIG. 7 is a schematic view, depicting the optional incorporation of libraries of previously annotated documents.
- FIG. 8 is a screen shot, showing how the interface used to facilitate human annotation of the results might appear.
- FIG. 9 is a screen shot, showing a representative visualization tool used to view the data.
- FIG. 10 is a screen shot, showing a second visualization tool (a node graph) used to view the data.
- FIG. 1 depicts a very broad overview of the functions carried out in the present inventive process.
- the first step 10 involves the definition of the body of data to which the process is applied and rules for managing the collection. This will generally be a large body of data; however, the process can apply to small collections too. In any case, some definition of scope is required such as all data produced by recognized media sources over a window of time.
- Machine analytics step 20 involves the application of text analytics to the data corpus.
- the present inventive process can integrate or plug in with a variety of text analytics platforms. Products in this field take a variety of approaches to the text analytics process and provide different sets of capabilities and results. The following is a broad, though not all-inclusive, list of information processing performed by text analytics platforms.
- the text analytics platform ingests files from the corpus, generally as ASCII text, and processes the text as follows:
- the application of the inventive process is not limited to situations where a working hypothesis or hypotheses exist.
- the invention is intended to assist in finding “unknown unknowns.”
- the machine analytics may search for patterns in the data corpus and extract documents relating to those patterns, even when no “working assumption” has been provided by a human operator.
- unknown unknowns are entities that are not in a subset of files exported from a larger collection and are related at some criterion-level of strength to entities that are in the subset.
- The process described in the prior patents is referred to as “HYLIGHTER.”
- a user selects fragments of texts or images that he or she believes to be important. Other users do the same. Text selected by the current user is highlighted in a first color (such as yellow). Text selected by other users is highlighted in a second color (such as blue). Text selected by both the current user and other users is highlighted in a third color (such as green), indicating consensus. As more users highlight a particular element, its highlighting color grows darker (such as darker and darker shades of green indicating increasing consensus). The users are preferably allowed to add annotation notes as well.
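The color scheme above can be expressed as a small function. The shade cap and the color naming are illustrative assumptions; only the yellow/blue/green scheme with darkening consensus comes from the text:

```python
def highlight_color(selected_by_me: bool, other_count: int) -> str:
    """Return a display color for a fragment following the scheme in the text:
    yellow = current user only, blue = other users only, green = consensus,
    with consensus shades growing darker as more users agree."""
    if selected_by_me and other_count > 0:
        # Darker green for stronger consensus; shade index capped for display.
        shade = min(other_count, 3)
        return f"green-{shade}"
    if selected_by_me:
        return "yellow"
    if other_count > 0:
        return "blue"
    return "none"
```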
- the results of the social engine analysis are used to generate visualizations in step 40 .
- the process next considers whether it will be iterative in step 50 . If the process is to be repeated, then meta-content is fed back to the machine analysis via step 70 . The meta-content is used to refine the machine analysis. Additions and amendments made by human operators are fed back into the machines so that their functionality is improved.
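The iterative loop of steps 20 through 70 can be sketched as a simple cycle in which human meta-content refines the next machine pass. The callables below are hypothetical stand-ins, not the patent's components:

```python
def sensemaking_cycle(corpus, machine_analyze, human_annotate, max_rounds=3):
    """Sketch of the iterative loop: machine analytics, human annotation,
    then meta-content fed back to refine the next machine pass."""
    feedback = None
    annotations = []
    for _ in range(max_rounds):
        machine_hylights = machine_analyze(corpus, feedback)
        annotations = human_annotate(machine_hylights)
        feedback = annotations          # meta-content returned to the machine
    return annotations

# Hypothetical stand-ins for the analytics platform and the human analysts.
def machine_analyze(corpus, feedback):
    hylights = [f"entity:{doc}" for doc in corpus]
    if feedback:                        # a refined pass uses prior meta-content
        hylights.append(f"refined-from-{len(feedback)}-notes")
    return hylights

def human_annotate(hylights):
    return [(h, "analyst note") for h in hylights]

result = sensemaking_cycle(["doc1", "doc2"], machine_analyze, human_annotate, max_rounds=2)
```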
- HyLighter CSS, when integrated with smart machines, borrows a key concept of statistical process control and continuous improvement: establish a feedback loop to allow a process to monitor itself, learn from its miscues, and adjust as necessary.
- HyLighter CSS adds a social sensemaking component to machine analytics that makes high speed, high velocity, high variety machine-scale analytics compatible with deliberative, collaborative, human-scale sensemaking practices.
- the system establishes a two-way feedback loop between users and smart machines that enables the co-active emergence and continuous improvement of machine intelligence and human sensemaking performance.
- a communication is produced in step 60 .
- This may assume many forms, with one example being a large text document with embedded links.
- the text document is the result of the “sensemaking” process. It is intended to tell a story that a reader may assimilate and understand.
- the end product includes source links back to the set of documents that were extracted from the data corpus. Thus, if a reader wishes to see the source for a particular portion of the text document, he or she clicks on the link and the source document will then be presented.
- FIG. 2 provides an expanded depiction of the elements shown in FIG. 1 .
- FIG. 3 depicts the interaction of humans within the process.
- the definition of the data corpus is generally done by a team leader, though it may be the result of a group decision. It will also be common for a particular problem or topic of interest to be defined at this point.
- the topic of interest may be the identification of funding sources for a new commercial development in the Middle East, and the corpus may be conventional media and social media sources.
- the process may sometimes be implemented without a topic or working hypothesis.
- observer theory operators may be applied to the data corpus to detect patterns in the data without actually knowing what problem the process is attempting to solve. It will be more common, however, to have at least a defined topic.
- the topic may be defined externally to the process. An agency head may simply direct that a team using the process investigate a particular topic. The team leader would likely still need to select an appropriate data corpus.
- the data corpus can include virtually any data source. Portions may be extracted from the Web using Internet search engines and clustering engines. Portions may come from classified data collected by intelligence agencies.
- Machine analysis 20 may be subdivided into two broad categories—text processing and image processing.
- the text processing engine begins by collecting references (person names, geospatial locations, business names, etc.). Adjacent terms are linked as phrases. If the name of a person is being sought, when the analytics see “Doe” next to “John,” the software links this into a “phrase” defining the name of a person—“John Doe.”
- the analytics recognize that a person's name may appear in different ways: “Jon Doe,” “Doe, Jon,” “Jonathan Doe.”
- the software attempts to create a standard representation of repeating elements.
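The normalization step just described, collapsing variants such as "Jon Doe" and "Doe, Jon" into one canonical representation, can be sketched as follows. Real platforms use far richer coreference resolution; the nickname table here is an assumed illustration:

```python
def normalize_person_name(raw: str) -> str:
    """Collapse name variants ("Jon Doe", "Doe, Jon") into one standard form.
    Handles only comma-inverted names and an assumed nickname alias table."""
    nicknames = {"jon": "jonathan"}    # illustrative alias table
    if "," in raw:
        # "Doe, Jon" -> "Jon Doe"
        last, first = [p.strip() for p in raw.split(",", 1)]
        raw = f"{first} {last}"
    parts = [nicknames.get(p, p) for p in raw.lower().split()]
    return " ".join(p.capitalize() for p in parts)
```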
- the analytics also seek to discern relationships between identified entities and the strength of those relationships. Weaker relationships are filtered out while stronger ones are strengthened.
- the machine analytics create a Resource Description Framework (“RDF”) representation.
- RDF Resource Description Framework
- This framework attempts to organize elements as subject-predicate-object (an “RDF triple”). For example, if it detects elements “A” and “B” it may relate them as “A increases B” or “A interferes with B.”
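The triple structure can be shown directly; the "A increases B" example from the text becomes one statement among many that can be queried:

```python
from collections import namedtuple

# An RDF triple: a subject-predicate-object statement relating two elements.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

triples = [
    Triple("A", "increases", "B"),
    Triple("A", "interferes with", "B"),
]

# Query: all statements whose subject is element "A".
about_a = [t for t in triples if t.subject == "A"]
```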
- Tags may include geographical references (such as latitude and longitude). Tags may also include a “time stamp” indicating when the data was acquired, when it was assimilated, etc. At all times the source information is maintained so that a human operator can link back to the source material.
- the image processing is of course designed to operate on image files (PDF, TIFF, HTML bit maps, images within word processing documents, etc.).
- the machine analytics segment the image in order to identify objects within the image (such as geospatial landmarks).
- the software then creates a searchable tag of what is recognized in the image, including various levels of recognition (“human face,” “human face: male/Caucasian,” “human face: John Doe”).
- the image is then labeled with a matching tag.
- the label is passed to the text processing module which then applies normal text processing.
- the text and image processing modules select a subset of the data corpus for further review.
- the automated processing identifies entities, rolled-up concepts, relationships, and categories.
- a curator 46 takes these results and may refine the selection process in order to create a desired final subset of the data corpus (a “subset”) in step 22 .
- the curator is preferably provided with a computer-based editing tool presented in the form of a graphical user interface.
- the curator selects a subset of the most promising files from the results of the machine analytics.
- the curator is preferably given three options for importing the files into the continued process, depending on the level of fidelity required. These options are:
- the inventive method transforms the machine-generated results into four types of meta-content.
- the meta content is referred to as “gray matter.” It appears as follows:
- a curator 46 or user with administrative privileges 48 performs a preliminary review of the gray matter and performs a “cleansing” function.
- the user creates a report by selecting columns from a menu of available data types.
- FIG. 4 shows a representative report 58 opened in a computer interface window 56 .
- the curator has created a table by selecting column headers 62 .
- the curator is preferably able to arrange the columns in a desired order.
- the curator has the option to search, sort, and filter the report.
- One of the curator's tasks is to review the gray highlights in the docset and consider whether their designation as a highlight is appropriate.
- the curator may select a particular gray highlight and pull up the source material to review the entity in the context of the file from which it was taken.
- the curator then deletes or revises the gray matter as necessary within the report.
- the curator's actions are saved so that they may be used to refine the application of the machine analytics in the future. The changes are used to update the subset.
- FIGS. 5 and 6 show a representative graphical user interface that might be provided to the curator.
- the graphical window has been divided into three portions.
- the portion on the right shows the currently active fragment 66 .
- Machine selections 64 are highlighted in gray (In the actual interface, the highlight would appear as a gray shadow. According to accepted patent drawing standards, this gray shadow is shown in the views as broken horizontal hatching).
- the sub-window in the upper left shows tree representation 68 . This may be used to show the structure of the data groups pertaining to the fragment in the right-hand window.
- FIG. 6 shows an editing window that the curator may open using a pull-down menu. Editing functions 74 are provided so that the user can correct, alter, or delete the machine highlights.
- The actions of step 24 in FIGS. 2 and 3 may be taken by a single curator or by a curator and one or more users having administrative privileges 48 .
- the result is a completed subset on which the remaining steps of the inventive method will operate.
- At library decision point 26 in FIG. 2 , it is possible to bypass the historical data and proceed directly to the social annotation engine. However, as the inventive method stores the results of the sensemaking activities in continually improving social libraries, it will often be desirable to access these libraries.
- FIG. 7 shows how cleansed subset 76 (emerging from steps 22 and 24 ) may be merged with the prior subset libraries.
- the curator is given the option to add files to a subset from the available libraries of annotated documents. Users may juxtapose fragments and comments that come from different subsets, libraries, agencies, organizations, and disciplines. This mechanism supports creative thinking and innovation by increasing the potential of teams to discover knowledge and skills in one context that have value in a different context.
- the subset is then fed into the social annotation engine (step 30 in FIGS. 2 and 3 ).
- Multiple users 52 take the files in the subset, review them, and add highlights, remarks, and links.
- the highlighting process is very significant in the present invention.
- a user reviews each file and applies his or her unique perspective in determining which portions are significant.
- the user employs software to “highlight” the important portions.
- a “highlight” is an emphasizing color that is added to a portion of the file in a manner that is analogous to mechanical highlighting using a fluorescent pen on a paper document.
- the highlighting functions preferably include the following features:
- FIG. 8 shows a screenshot depicting the application of the process thus described.
- the depiction is of the unified results of several users.
- Current user highlight 82 reflects a selection made by the user that was not selected by anyone else (Standard hatching patterns are again used to represent the colors that actually appear in the user interface).
- Other user highlight 86 represents a selection that was made by another user but not by the current user.
- Consensus highlight 88 represents a selection that was made by the current user and at least one other user.
- Current selection 84 represents the current entity selected.
- the annotation window on the left corresponds to the current fragment selected.
- Each user may also establish links, which may be graphically displayed in the annotations section or elsewhere.
- the linking functions are preferably based on a URL for each document or fragment.
- the linking function is more than simply a tie between two elements.
- a user may also add the reason for the link. As an example, if a user links “B” to “A” the motivation may be the fact that B provides evidence for A.
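A typed link of this kind carries its motivation alongside its endpoints. A minimal sketch, in which the field names and URL scheme are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class FragmentLink:
    """A link between two fragments that records not just the tie but
    the reason the user made it."""
    source: str       # URL of fragment B
    target: str       # URL of fragment A
    reason: str       # the user's motivation for the link

link = FragmentLink(
    source="hylighter://doc-7#frag-3",   # fragment B
    target="hylighter://doc-2#frag-9",   # fragment A
    reason="B provides evidence for A",
)
```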
- FIG. 9 shows one example (exemplary report 92 ).
- the columns reflect the document identifier, the title, whether there are new changes pertaining to the document, any images associated with the document, a listing of the selected fragment itself, and comments added by a user.
- a reader of the report can select the fragment itself in order to link back to the source document. This allows a reader to view the fragment in context.
- a reader may also add a link. Note, for instance, the link added in the right-hand column by “User 660 .” The ability to add the link saves the user from having to recreate the underlying rationale and instead links the interested reader (preferably via URL) to another fragment elsewhere.
- the report preferably includes provisions for searching, sorting, and filtering meta-content that is split out of the subset.
- the report is continually updated with real-time data (as users continue to contribute new meta-content to the annotation layer). It is important to keep in mind that the inventive method is not a static process that is run once with a final result. Rather, all the steps and reports will be continually updated as multiple users provide new input in the evolving situation.
- Users are preferably able to run various analytics (such as identifying superior performers by generating metrics on the number of links across a subset by user name). Through the reporting functions, each user has access to a picture of the integrated “story” as it emerges from the collaborative process.
- the report works in concert with a type of “mash-up” feature for efficiently navigating to related fragments and comments that are spread across multiple documents and file types. For example, the user may filter all the fragments in a report related to a specific entity.
- the mash-up feature allows the user to navigate from fragment to fragment using only a next button or an arrow key. The rapid navigation allows users to more quickly perceive relationships.
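The filter-then-step behavior of the mash-up can be sketched as follows. The hylight representation as `(doc_id, fragment_text)` pairs is an assumption for illustration:

```python
def mashup_filter(hylights, entity):
    """Return, in order, the hylights whose fragment mentions the entity,
    as the mash-up does before next-button navigation begins."""
    return [h for h in hylights if entity in h[1]]

hits = mashup_filter(
    [("d1", "John Doe met the supplier"),
     ("d2", "shipment delayed at port"),
     ("d3", "John Doe left the country")],
    "John Doe",
)

# Navigation: a "next" key simply advances an index through the filtered list.
current = 0
current = min(current + 1, len(hits) - 1)
```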
- step 42 shows the next step in the process (viewing visualizations).
- the inventive method preferably includes a visual analytics platform that is capable of representing the annotation layer for selected documents in the form of an RDF graph or a data graph.
- An RDF graph conforms to a World Wide Web Consortium (W3C) specification originally designed as a metadata model.
- users tag fragments across a subset with RDF triples (machine readable tags with three parts including a subject, an object, and a predicate that indicates a relationship between the subject and the object).
- tagging a fragment with a triple is equivalent to placing the fragment and related comments into a container labeled with the node object.
- the node graph itself is simply the presentation of the aggregated RDF triples.
- An exemplary (very simple) node graph is depicted in FIG. 10 . Nodes are presented along with the relationships between them. The thickness of the line between nodes indicates the number of fragments linking them. Color coding may also be used. For instance, a white node is empty. A yellow node is one designated by the current user. A blue node is one designated by another user. A green node is one designated by both the current user and another user.
- a node graph serves as (a) an efficient mechanism for navigating across a subset (i.e., clicking a node opens a report), (b) a coordination mechanism that provides each contributor with information about the combined efforts of all contributors (i.e., the collective visualization of the annotation layer provides a bird's-eye view of the breadth and depth of coverage achieved by the group which may serve to guide the future distribution of individual effort), and (c) a cognitive tool to help teams recognize patterns hidden in data spread across many documents. For example, users engage in “what if” thinking by adding or excluding documents, triples, people, and/or groups from the reports and representing the resulting data as node graphs. Structural changes in node graphs that occur as the result of running different filters are intuitively grasped as a visual change. This is one of the most powerful aspects of node graphs.
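The aggregation step behind such a node graph can be sketched simply: each RDF triple tagged onto a fragment contributes two nodes and increments the weight of the edge between them, and that weight drives the line thickness. The function and field names below are illustrative assumptions, not taken from the patent.

```python
# Hedged sketch: aggregating RDF-style triples from the annotation layer into
# node-graph data, where edge weight (rendered as line thickness) is the
# number of fragments linking two nodes.
from collections import Counter

def build_node_graph(triples):
    """triples: iterable of (subject, predicate, object) tags on fragments."""
    nodes = set()
    edges = Counter()
    for subject, predicate, obj in triples:
        nodes.update([subject, obj])
        # one tagged fragment per triple; repeated pairs thicken the line
        edges[(subject, obj)] += 1
    return nodes, edges

triples = [
    ("FactoryA", "supplies", "RetailerB"),
    ("FactoryA", "ships_to", "RetailerB"),
    ("RetailerB", "competes_with", "RetailerC"),
]
nodes, edges = build_node_graph(triples)
print(sorted(nodes))                     # ['FactoryA', 'RetailerB', 'RetailerC']
print(edges[("FactoryA", "RetailerB")])  # 2 -> drawn with a thicker line
```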
- the system updates the subset for all users. This suggests the idea of distributed sensemaking activities where a small number of specialists use large arrays to work on data coming in to a central location from team members in the field.
- the machines should be configured to “learn” from the exemplars (e.g., targeted fragments, comments, and tags) to identify files and fragments across a corpus that match the exemplars in meaning.
- the machines become more capable at foraging the corpus for the most pertinent information related to the task at hand.
- users become more capable of thinking within the problem space.
- the present invention establishes a two-way feedback loop that accelerates machine and human learning (i.e., co-active evolution).
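One toy way to picture the machine side of this feedback loop is a bag-of-words similarity search: fragments the users have selected as exemplars are pooled, and corpus fragments are ranked by cosine similarity to that pool. A production system would use a real text analytics platform; the functions and sample text below are stand-in assumptions for illustration only.

```python
# Toy stand-in for exemplar-driven foraging: rank corpus fragments by
# bag-of-words cosine similarity to the pooled exemplars, so adding more
# exemplars steers the "machine" toward more pertinent material.
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    overlap = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return overlap / norm if norm else 0.0

def rank_by_exemplars(corpus, exemplars):
    """Return corpus fragments ordered by similarity to the pooled exemplars."""
    target = Counter(" ".join(exemplars).lower().split())
    scored = [(cosine(Counter(text.lower().split()), target), text)
              for text in corpus]
    return [text for score, text in sorted(scored, reverse=True)]

exemplars = ["IED attack near the market", "explosive device recovered"]
corpus = [
    "quarterly earnings grew steadily",
    "an explosive device was found near the market",
    "the supplier shipped new parts",
]
print(rank_by_exemplars(corpus, exemplars)[0])
# an explosive device was found near the market
```

As users feed back more exemplars, the target vector shifts, which is the (greatly simplified) sense in which the machine "learns" to forage the corpus.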
- the present invention is a flexible system that allows a variety of workflows to match requirements of a given task environment.
- a curator assembles a group of people with diverse perspectives to define and solve a challenging problem.
- each member imports relevant documents into the social annotation engine as part of a “seeding” process.
- the results of the seeding process (i.e., selected fragments, comments, tags, and questions)
- the group selects a subset of promising files from the results, brings these files into the social annotation engine with associated gray matter, and repeats the cycle.
- Step 60 in FIGS. 2 and 3 represents an ultimate product of the inventive process.
- the invention preferably includes provisions for creating a three-dimensional publication in either PDF or HTML.
- the top layer is a report authored and reviewed by whoever is responsible for producing a final deliverable based on results of the sensemaking activity.
- the second layer is a table (i.e., a PDF or HTML version of a report) that includes fragments and related meta-content relevant to the report from across the subset.
- the third layer is the subset itself converted to PDF or HTML. A reader can navigate from a fragment in the report to related fragments and discussion threads in the table and, from the table, to the exact location of the fragments in their sources.
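The linkage between the three layers can be sketched as plain HTML anchors: the report links into table rows, and each table row links to the fragment's location in the converted source. The structure below is an assumed illustration of the idea, not the patented format.

```python
# Minimal sketch of a three-layer HTML publication: report -> fragment table
# -> converted sources, connected by anchor links.
def build_publication(report_text, fragments):
    """fragments: list of dicts with 'id', 'text', and 'source' keys."""
    top = '<div id="report">%s</div>' % report_text
    rows = "".join(
        '<tr id="frag-%s"><td>%s</td>'
        '<td><a href="#src-%s">source</a></td></tr>' % (f["id"], f["text"], f["id"])
        for f in fragments
    )
    table = "<table>%s</table>" % rows
    sources = "".join(
        '<p id="src-%s">%s</p>' % (f["id"], f["source"]) for f in fragments
    )
    return "<html><body>%s%s%s</body></html>" % (top, table, sources)

html = build_publication(
    'Findings: see <a href="#frag-1">evidence</a>.',
    [{"id": "1", "text": "key fragment", "source": "full source text"}],
)
print('href="#frag-1"' in html and 'id="src-1"' in html)  # True
```

A reader following the links moves from a claim in the report, to the supporting fragment and discussion in the table, to the fragment's exact location in its source.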
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A system and method allowing machine analytics and multiple human analysts to work together in a collaborative fashion in the analysis of large amounts of data. The collaborative analysis promotes “sensemaking.” Sensemaking is the continued revising and expansion of a developing “story” so that it becomes more comprehensible. As the emerging story is refined, it should successfully incorporate more and more of the observed data and be more defensible with respect to contrary arguments.
Description
- This non-provisional patent application claims the benefit of previously-filed U.S. patent application Ser. No. 13/894,013 (now U.S. Pat. No. 11,669,748). The parent application listed the same inventor. It was filed on May 14, 2013.
- This invention relates to the field of data analysis and sensemaking. More specifically, the invention comprises a system and method for allowing a wide variety of smart machines and virtually any size group of humans to interact collaboratively in seeking to discern hidden meaning and make sense of information spread across virtually any size collection and variety of sources and file types.
- The total quantity of digital information available in the world is growing exponentially. Most of this collection is unstructured data, such as free form text, images, and video (e.g., news feeds, emails, Office files, PDFs, Web pages, images, social media streams, video, sensor data, and reconnaissance data). One of the biggest challenges of the digital information age is how to enable groups of individuals with diverse perspectives to collaborate on making sense of this deluge of information spread across many sources and various file types. Sensemaking, as commonly used within various scientific communities, represents the integration of observations and evidence with human values, intuitions, and story-telling capabilities in the generation of new knowledge and decision making. As of yet, sensemaking skills remain difficult for machines to emulate and are not well understood by scientists.
- Organizations in all sectors of the economy including such areas as financial services, communications, online content, life sciences, education, government, national defense and intelligence desire to use machine analytics to turn “Big Data” (and smaller data sets) into actionable knowledge for making data-driven decisions. Big data is generally defined as high volume, high velocity, high variety data, often from many disparate sources, usually made up of largely unstructured and semi-structured data, and characterized by different levels of veracity or trustworthiness. By tapping into Big Data and the collective intelligence of online workgroups and communities, organizations hope to improve execution of their workflows and achieve other desirable outcomes and competitive advantages. To accomplish these goals, organizations are promoting social networking across the enterprise (e.g., interaction between employees, customers, suppliers, and partners using various forms of social software) and running machine analytics on the stream of conversations and exchange of information to gather business intelligence (BI). This growing trend is sometimes referred to as Social Business.
- In fields such as tactical military intelligence, retrospective analysis has repeatedly demonstrated that patterns within the unstructured data could have been used to predict future events such as terrorist attacks. However, the data is being generated at such a rate that it is impossible for human analysts to review a significant portion within mission timelines. Similarly, in most knowledge-based fields such as Life Sciences, the number of research publications and related content has expanded beyond the capacity of individual practitioners to stay current, even within narrow areas of specialization.
- Increasingly, organizations across all sectors are using machine analytics to tap into “Big Data” for a wide variety of purposes. The results for those organizations with the right mix of leadership, technology, and personnel have been a wealth of valuable insights, as smart machines uncover correlations and patterns hidden deep within the information deluge. However, even the most advanced users of machine analytics have run into marked limitations of current technologies and practices for addressing certain types of complex problems. Such problems, sometimes referred to as “wicked” problems, require the integration of human values, intuitions, and story-telling capabilities. In other words, the solutions to wicked problems are not in the data for smart machines to ferret out. Wicked problems have no right or wrong solutions. Rather, they require people to choose a course of action despite their incomplete understanding of the situation.
- Recently, researchers have documented a severe shortage of data scientists, managers, and analysts with the specialized skills required to understand the results generated by smart machines (e.g., platforms that provide various forms of text analytics, visual analytics, image pattern matching, and machine learning capabilities). This deficiency is expected to persist for several years and is generally seen as a major bottleneck to maximizing return on investment (ROI) for machine analytics. However, from a Social Business perspective (i.e., an approach to business process redesign that engages all the individuals of its ecosystem to maximize the co-created value), even an organization that has the resources and good fortune to hire talented data scientists will miss out on a major competitive advantage. Smart machines lack an integrated social component for obtaining the benefits of sharing diverse perspectives (i.e., the wisdom of the crowd) on the accumulating store of largely unstructured data distributed across the wider organizational ecosystem.
- The lack of this social component in machine analytics is especially limiting when individuals are facing wicked problems that inevitably require human sensemaking skills. “Sensemaking is not about truth and getting it right. Instead, it is about continued redrafting of an emerging story so that it becomes more comprehensible, incorporates more of the observed data, and is more resilient in the face of criticism.” An example of sensemaking is attempting to discern the motivations of various groups in a geopolitical conflict or attempting to predict the impact of a particular course of action on human motivation and behavior. Unguided machine analytics simply do not perform this type of task well.
- Machines, however, perform many other types of analytical tasks orders of magnitude faster and more accurately than humans. Examples include finding people, places, and things in Big Data and finding connections between the identified people, places, and things. For example, when analysts want to uncover the range of operating terrorist networks within an area, they feed data on events involving improvised explosive devices (IEDs) into a machine-based analysis system. The machine combines geospatial data on where the IEDs exploded with the serial numbers, fingerprints, and DNA evidence found on recovered shards of IEDs to establish a range of operation for a terrorist cell.
- However, the machine-based analysis will not determine how best to respond to the terrorist activity. Seeking to answer that question requires the solution of a wicked problem. Unlike the goal of mapping the geospatial extent of a terrorist network, the question of what to do about it requires the application of human sensemaking skills.
- An ideal solution to this problem would shift the focus of smart machines from delivering insights to serving as a differently-abled teammate. The machine as teammate collaborates with analysts and decision makers to support human sensemaking and improve the performance of the team in accomplishing mission critical goals. Further, an ideal solution would allow many humans to simultaneously work on the problem in order to bring diverse perspectives and the wisdom of the crowd into play. The present invention provides such a solution by making high volume, high velocity, high variety machine-scale analytics compatible with deliberative, collaborative, human-scale sensemaking practices.
- The present invention comprises a system and method allowing various smart machines and virtually any number of people to work together in a collaborative fashion in the analysis of virtually any size corpus and variety of file types (i.e., Big Data and smaller data sets too). The invention, the HyLighter Collaborative Sensemaking System (CSS), is the first-in-class instantiation of a Big Data social sensemaking platform. Sensemaking involves the continued revision and expansion of a developing “story” so that it becomes more comprehensible and more defensible with respect to contrary arguments. HyLighter CSS includes the HyLighter sensemaking system as the front-end component for a variety of machines that are integrated to support multimedia, multilingual, machine analytics and collaborative sensemaking.
- The system and its various configurations are constructed from six main building blocks or capabilities. These include the following:
-
- 1. Users Share Color-Coded Hylights
- a. Users add color-coded highlighting to selected snippets or fragments of text, images, and video. The system saves the coordinates of where the emphasis begins and ends.
- b. Users add comments, replies to existing comments, customizable tags, and other meta-content (i.e., content about content) to highlighted fragments.
- c. System adds date/time stamps, user ID information, and other meta-content to highlighted fragments.
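Items a-c above amount to a record per hylight: user-chosen coordinates, color, and meta-content, plus system-added stamps. A minimal sketch of such a record follows; the field names are illustrative assumptions, not taken from the patent.

```python
# Hedged sketch of a stored hylight: the coordinates of where the emphasis
# begins and ends, the user's color and meta-content, and system-added
# date/time and user ID information.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Hylight:
    doc_id: str
    start: int            # offset where the highlighting begins
    end: int              # offset where the highlighting ends
    color: str            # color code chosen by the user
    user_id: str
    comments: list = field(default_factory=list)
    tags: list = field(default_factory=list)
    # system-added meta-content
    created: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

h = Hylight("doc-17", start=120, end=168, color="yellow", user_id="user-660")
h.comments.append("Key admission by the supplier.")
h.tags.append("supply-chain")
print(h.doc_id, h.start, h.end)  # doc-17 120 168
```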
- 2. System Assigns Unique URLs to Each Hylight
- a. Users link related hylights within and between documents.
- b. System provides various search and filtering mechanisms for easy linking of related hylights. This includes tracking user mouse clicks across documents and providing users with access to their own history and the history of others (i.e., shadowing feature).
- c. User sends a hylight (i.e., a selected comment and related fragment) from a document to a recipient by email (or other electronic communication modality such as instant messaging). When the recipient replies to the message (i.e., referred to as a Pinpoint email or instant message), the system posts the response in the margin under the original comment as part of a discussion thread linked to the related fragment within the source document.
- d. User marks a hylight with a due date for completing a task, and the system sends reminders to responsible individuals as the due date draws near.
- e. System has special provisions for working with semi-structured data (e.g., forms, records, surveys, and email) that builds on the color-coding mechanism and the assignment of a unique URL to each highlighted fragment.
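One way to assign each hylight a stable, unique URL (as block 2 describes) is to derive an identifier from the hylight's coordinates, so the same hylight always resolves to the same address. The URL scheme and hashing choice below are assumptions for demonstration only, not the patented mechanism.

```python
# Illustrative sketch: deriving a stable unique URL for each hylight so that
# related hylights can be linked within and between documents.
import hashlib

def hylight_url(base, doc_id, start, end, user_id):
    """Derive a stable fragment identifier from the hylight's coordinates."""
    key = "%s:%d:%d:%s" % (doc_id, start, end, user_id)
    frag = hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]
    return "%s/hylight/%s" % (base, frag)

url_a = hylight_url("https://example.invalid", "doc-17", 120, 168, "user-660")
url_b = hylight_url("https://example.invalid", "doc-17", 120, 168, "user-660")
print(url_a == url_b)  # True: the same hylight always yields the same URL
```

A stable address is what lets a hylight travel into reports, emails, or node graphs while keeping its tether back to the source.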
- 3. System Assembles Hylights as Requested by Users in Various Displays and Applications
- a. Users create customizable reports including fragments, comments, and other meta-content disaggregated from document collections and filtered as required with provisions for linking back to the exact location of each hylight within its source.
- b. Users create customizable “mashups” that enable navigation from hylight to hylight across multiple documents from the same HyLighter window by simply clicking on each fragment displayed in the margin.
- c. Users bookmark key hylights from a mashup, arrange the hylights in a desired order, and save the set to a library that is available to other users.
- d. Users create various visualizations of the meta-content or social annotation layer such as color-coded node graphs—each node contains fragments and related meta-content extracted from the annotation layer with links back to their context.
- e. Users export meta-content to external applications such as Excel.
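The export in item e can be sketched as writing the disaggregated hylights to CSV, which Excel opens directly. The column names are illustrative assumptions.

```python
# Sketch of exporting the disaggregated annotation layer to CSV for use in
# external applications such as Excel.
import csv
import io

def export_hylights(hylights):
    """Serialize hylight records to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["doc_id", "fragment", "comment", "user_id"]
    )
    writer.writeheader()
    writer.writerows(hylights)
    return buf.getvalue()

rows = [
    {"doc_id": "d1", "fragment": "price rose", "comment": "check", "user_id": "u1"},
]
csv_text = export_hylights(rows)
print(csv_text.splitlines()[0])  # doc_id,fragment,comment,user_id
```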
- 4. System Converts Machine Analytics to Machine HyLights
- a. System has provisions (e.g., APIs) for integrating or plugging into a variety of applications for such functions as text analytics, visual analytics, visual search and image pattern matching, machine translation, audio transcription, and data acquisition.
- b. Machine performs recognition, resolution and categorization of entities, and HyLighter CSS converts the machine analytics to machine hylights (i.e., entities are highlighted in the source and related comments, tags, and other meta-content appear in the margin).
- c. Machine identifies various types of relationships between entities, and HyLighter CSS adds the data to machine hylights.
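The conversion in block 4 can be pictured as mapping each machine-recognized entity onto a hylight record: the entity's span becomes a gray highlight, and the machine's tags become margin meta-content. The input format below is an assumption about what a text analytics platform might emit, not a documented API.

```python
# Hedged sketch: converting machine-analytics output into machine hylights,
# where each recognized entity is highlighted in the source and its tags
# appear as meta-content in the margin.
def to_machine_hylights(doc_id, entities):
    """entities: list of dicts with 'text', 'start', 'end', 'category' keys."""
    hylights = []
    for ent in entities:
        hylights.append({
            "doc_id": doc_id,
            "start": ent["start"],
            "end": ent["end"],
            "color": "gray",                 # machine selections are gray
            "author": "machine",
            "margin": {"category": ent["category"], "entity": ent["text"]},
        })
    return hylights

entities = [{"text": "John Smith", "start": 0, "end": 10, "category": "person"}]
hylights = to_machine_hylights("doc-1", entities)
print(hylights[0]["color"], hylights[0]["margin"]["category"])  # gray person
```

Relationship data identified by the machine (item c) would be attached to these same records, so human reviewers see machine and human annotation in one uniform layer.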
- 5. System Establishes a Two-Way Feedback Loop Between Smart Machines and Human Sensemakers
- a. Users select key meta-content from the annotation layer and feed it back to the machine to target relevant content for building a second subset. The process continues through multiple cycles.
- b. The system provides instrumentation to measure the sufficiency and quality of data by comparing various metrics.
- 6. System is Capable of Managing Big Data
- a. The system provides various configurations to manage virtually any size collection (e.g., using a NOSQL approach).
- b. Users create customizable reports, mashups and other displays of any number of hylights disaggregated from any size collection with links back to the location of each hylight within its source.
- HyLighter CSS is a flexible system that can function using various combinations of building blocks or components for a wide variety of use cases and scenarios. These include using HyLighter CSS as a standalone application or in combination with other components (e.g., a text analytics platform) that add additional capabilities to the system. Different contexts have different requirements (e.g., work with very large or relatively small collections, text only or multimedia, one language or multiple languages, single individuals to groups of any size). When combined with a text analytics platform, the machine participates in the collaborative sensemaking process as a “differently-abled” teammate. In a nutshell, the system represents machine intelligence combined with the thinking of any size group across any size collection of any type of file (text, images, audio, video) as a layer of color-coded highlighting and related meta-content. This layer of social annotation includes machine- and human-generated comments/discussion threads, tags, and links between related fragments within and between documents. The system has provisions for displaying the disaggregated hylights in sortable, filterable, searchable, customizable displays (e.g., reports, mashups, node graphs) with links back to the exact locations of highlighted fragments in high-fidelity versions of the native files.
-
- HyLighter CSS provides two main approaches to the integration of smart machines for tapping into Big Data, as described briefly below:
- Incremental “Subset” Approach. This approach allows teams to tap into Big Data by analyzing manageable subsets of selected files in collaboration with smart machines.
- 1. A text analytics platform processes a large number of files.
- 2. Users select a promising and manageable subset for HyLighter to ingest.
- 3. HyLighter CSS displays the files with the machine analytics converted to machine hylights (i.e., all entities are highlighted in gray and related meta-content such as machine-generated tags appears in the margin).
- 4. Users review the machine hylights and add a layer of social annotation or hylights (i.e., highlighted fragments, related comments, replies to existing comments, various tags and tag types, and other meta-content) to the subset.
- 5. Users have numerous options for extracting and filtering hylights from a collection and assembling the results in a variety of displays (e.g., tabular reports, mashups, node graphs).
- 6. Users select relevant meta-content from the social annotation layer as feedback to the machine to target known unknowns (i.e., learn more about a subject of interest) and unknown unknowns (e.g., uncover entities that are not in the current subset but are related to entities in the subset).
- 7. Users repeat the cycle until “sufficiency” is achieved (i.e., the team has reviewed enough information with consideration of time available, the quality or provenance of the information, and the criticality of the problem to make data-driven decisions).
- Analysts review the machine hylights, add additional meta-content across the collection, view the accumulating social annotation layer in reports, mashups, and various visualizations, feed meta-content from the annotation layer back to the machine to improve machine performance, and publish reports that include links back to hylights that provide evidence for conclusions expressed in the publication.
- Direct Approach. This approach, though similar to the incremental approach, enables collaborative sensemaking on Big Data across whole organizations or, even, multiple organizations. HyLighter CSS makes the Direct approach feasible by maintaining consistent performance on documents and displays with any number of machine- and/or human-generated hylights. As a major benefit, each authorized user has access to the historical record of thinking of all users across the enterprise through a continuously improving social library.
-
- 1. A text analytics platform processes millions of files including various types (e.g., text and preprocessed images, audio, and video).
- 2. HyLighter CSS, configured as a NOSQL system, ingests all of the files and converts the machine analytics to machine hylights.
- 3. Any number of analysts from any number of teams and organizations review the machine hylights and add a layer of human hylights across the ingested corpus.
- 4. As the layer of meta-content accumulates (i.e., representing the collective intelligence of the machine combined with human teammates from across the enterprise), users feed selected parts of the social annotation layer to the text analytics platform.
- 5. The machine learns from the meta-content to better support sensemaking activities by, for example, more accurately finding known unknowns and unknown unknowns in current sources and new sources.
- 6. HyLighter ingests the results of new searches, informed by the previous HyLighter session.
- 7. As these cycles run across an organization or multiple organizations, each authorized user has access to the historical record of machine- and human-generated annotation represented in the accumulating annotation layer spread across various collections and disaggregated from sources and assembled in reports, mashups, and other visualizations, in real time.
- As an example use case, suppose that a large manufacturer wants to run competitive business intelligence using open source content and subscription sites from around the world. The user specifies topics of interest (e.g., what is going on in the target market in India, China, and other countries), and the machine automatically pulls in documents and web-based material on these topics. The machine runs analytics including entity extraction and resolution on the data at Big Data scale. HyLighter converts the machine analytics into social annotation, displaying the machine intelligence as highlighted entities and associated comments and tags across the collection. Entities appear highlighted in gray with related meta-content displayed in the margin. For example, each entity may include a category tag (e.g., person, place, or date), a list of all entities in the corpus that are the same as the current entity (e.g., John Smith, Smitty, and Mr. Smith), a sentiment tag (e.g., positive or negative), a salience tag to indicate importance within the source document (e.g., a number on a scale from 1-100), and a list of entities related to the selected entity.
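The per-entity margin meta-content just described can be sketched as a single record. The values and the rendering helper below are invented for illustration; only the categories of meta-content (category tag, coreferent aliases, sentiment, 1-100 salience, related entities) come from the description above.

```python
# Illustrative per-entity margin record for a gray-highlighted entity.
entity_margin = {
    "entity": "John Smith",
    "category": "person",
    "same_as": ["John Smith", "Smitty", "Mr. Smith"],  # coreferent entities
    "sentiment": "positive",
    "salience": 87,          # importance within the source, scale 1-100
    "related": ["Acme Corp", "New Delhi"],
}

def margin_summary(m):
    """Render a one-line margin display for an entity hylight."""
    return "%s (%s, salience %d/100)" % (m["entity"], m["category"], m["salience"])

print(margin_summary(entity_margin))  # John Smith (person, salience 87/100)
```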
- Users filter and extract hylights from the collection and assemble the results in various reports and displays. For example, a HyLighter “mashup” displays hylights extracted from multiple documents in the margin of a single HyLighter window. A primary feature of the mashup is that users navigate from hylight to hylight as if they were navigating through a single document. By clicking on a hylight, the user instantly arrives at the original fragment in its exact location in the source document. The capability to quickly view related hylights in context helps users to better understand intended meaning, evaluate veracity, and make associations between related pieces of information distributed across a large collection.
- As another example of the method as typically practiced, a human operator (a team leader or “curator”) establishes data collection rules for building and managing a corpus of documents or files for analysis. A text analytics platform ingests and processes the corpus. The curator and/or assigned others use various approaches to select a subset of promising files from the corpus. HyLighter CSS then ingests the subset and displays the machine analytics across the subset as machine hylights (e.g., the system highlights entities identified by the machine in gray and displays related meta-content in the margin as comments and tags). The curator may add additional files to the subset by merging in files from previously annotated subsets (also, referred to as collections). Of note, when properly configured, the curator has the option to pass an entire corpus of any size from the machine analytics platform to HyLighter CSS.
- The machine processed subset or collection is then subjected to the “HYLIGHTER” social sensemaking process, wherein multiple users independently select and annotate fragments of documents within the collection that are relevant to each user (based on his or her own field of endeavor, personal experience, objectives, etc.). This includes reviewing machine hylights across the collection for accuracy and relevance and highlighting over machine hylights to add context to the machine's selections. The “HYLIGHTER” process enables users to access a Report and Mashup Builder and a variety of visualization tools that include provisions for filtering hylights from any number of documents or document collections and displaying the results in various displays and formats.
- Significantly, each hylight in a display has a link or tether back to its exact location in its source. No matter where a hylight travels (e.g., to a tabular report, mashup, color-coded node graph, or Excel spreadsheet), the hylight maintains its provenance or place of origin. Instant access to the context of a fragment and related meta-content enables users to better understand the intended meaning. Further, by knowing the lineage (where did this information originate from) and the pedigree (how reliable is the source of this information), users are able to efficiently evaluate the veracity of the content.
- A final “communication” may be created, with the communication ideally telling the “story” that is emerging from the sensemaking process. This communication includes links back to the HyLighter social annotation layer, so that the report remains an active document. Optionally, meta-content assembled in various displays may be packaged and fed back to the machine analytics platform that started the whole process. The metadata is configured to improve the machine analytics process so that the machine “learns” from the input provided by the human analysts. The process thus described is often iterative, with multiple cycles of machine analytics being run and the “story” continuing to evolve over time.
-
FIG. 1 is a schematic view, showing the general operation of the present invention. -
FIG. 2 is a schematic view, showing more details of the operation of the present invention. -
FIG. 3 is a schematic view, showing details of how human users interact with the present invention. -
FIG. 4 is a screen shot, showing how a report used to edit the machine-generated results might appear in an exemplary embodiment of the invention. -
FIG. 5 is a screen shot, showing how the editing functions might be presented. -
FIG. 6 is a screen shot, showing how the editing functions might be presented. -
FIG. 7 is a schematic view, depicting the optional incorporation of libraries of previously annotated documents. -
FIG. 8 is a screen shot, showing how the interface used to facilitate human annotation of the results might appear. -
FIG. 9 is a screen shot, showing a representative visualization tool used to view the data. -
FIG. 10 is a screen shot, showing a second visualization tool (a node graph) used to view the data. -
-
- 10 data definition step
- 20 machine analysis step
- 22 import subset step
- 24 report building step
- 26 library decision point
- 28 library addition step
- 30 social analysis step
- 40 report generation step
- 42 visualization step
- 44 team leader
- 46 curator
- 48 administrative user
- 50 iteration decision point
- 52 user
- 54 reader
- 56 window
- 58 report
- 60 communication production step
- 62 column header
- 64 machine selection
- 66 fragment
- 68 tree presentation
- 70 feedback step
- 72 explanatory note
- 74 editing functions
- 76 cleansed subset
- 78 user annotation
- 82 current user highlight
- 84 current selection
- 86 other user highlight
- 88 consensus highlight
- 92 exemplar report
- 94 node graph
- 96 node
- 98 link indicator
-
FIG. 1 depicts a very broad overview of the functions carried out in the present inventive process. The first step 10 involves the definition of the body of data to which the process is applied and rules for managing the collection. This will generally be a large body of data; however, the process can apply to small collections too. In any case, some definition of scope is required such as all data produced by recognized media sources over a window of time. - Machine analytics step 20 involves the application of text analytics to the data corpus. The present inventive process can integrate or plug in with a variety of text analytics platforms. Products in this field take a variety of approaches to the text analytics process and provide different sets of capabilities and results. The following includes a broad, though not all-inclusive, list of information processing performed by text analytics platforms. The text analytics platform ingests files from the corpus, generally, as ASCII text and processes the text, as follows:
-
- a. Locates and collects entity references (i.e., entity extraction) such as people, places, things, concepts, key terms, and user-defined taxonomies from unstructured, semi-structured, and structured sources.
- b. Determines if any adjacent terms should be co-joined (i.e., lemmatization or chunking) such as “John Smith” or “the President of the United States.”
- c. Rolls up all concepts that refer to the same entity into a global representation of that concept (i.e., concept resolution).
- d. Classifies elements into categories (e.g., “John Smith” is the name of a person).
- e. Determines the strength of relationships between entities, generates graphs representing types of relationships present in the data, and provides a mechanism for filtering out weaker links.
- f. Assesses the salience of each entity within each source and tags the entity with a salience tag (e.g., from 1-100 with 100 indicating the highest level).
- g. Assesses the sentiment associated with each entity within each source and tags the entity with a sentiment tag (e.g., using a Likert scale to indicate positive/negative sentiment or agreement/disagreement).
- h. Finds and extracts subject-object-predicate elements such as A increases B from the text of all processed files. This type of data is sometimes referred to as a resource descriptor framework (RDF) triple (i.e., machine-readable tags with three parts including a subject, an object, and a predicate that denotes a relationship).
- i. Maintains pedigree connections to source file information.
- j. Tags named locations found in the text with their respective geo-coordinates (i.e., latitude and longitude).
- k. Maintains time index information for processing new data without re-indexing existing data.
- l. If the targeted corpus includes image files, documents that include images (e.g., Word, PDF, HTML), or videos, an image pattern matching platform processes the files, as follows:
- Automatically segments each image in a way that discerns the individual objects in the image.
- Creates searchable tags of what it recognized in an image (e.g., sky, vegetation, a face, a building, a car, a map, and, even, text strings that might appear in an image).
- Automatically labels an image/video with matching tags.
- Passes the labels to the data analytics platform which runs the text through the normal data analytics process.
- However, it is important to realize that the application of the inventive process is not limited to situations where a working hypothesis or hypotheses exist. The invention is intended to assist in finding “unknown unknowns.” In that situation, the machine analytics may search for patterns in the data corpus and extract documents relating to those patterns, even when no “working assumption” has been provided by a human operator.
- The inventive process supports a systematic approach to the detection of known unknowns and unknown unknowns hidden in Big Data and smaller datasets. As an operational definition, unknown unknowns are entities that are not in a subset of files exported from a larger collection but are related at some criterion level of strength to entities that are in the subset.
- Once the machine analysis step is completed, the information is fed into the social analysis step 30. In this step, multiple human users review a subset of the data corpus and highlight fragments that they believe are significant. Mechanisms actually used for highlighting are described in detail in my prior U.S. patents (U.S. Pat. Nos. 7,080,317 and 7,921,357). These two prior patents are hereby incorporated by reference. - The process described in the prior patents is referred to as "HYLIGHTER." In general, a user selects fragments of text or images that he or she believes to be important. Other users do the same. Text selected by the current user is highlighted in a first color (such as yellow). Text selected by other users is highlighted in a second color (such as blue). Text selected by both the current user and other users is highlighted in a third color (such as green), indicating consensus. As more users highlight a particular element, its highlighting color grows darker (such as darker and darker shades of green indicating increasing consensus). The users are preferably allowed to add annotation notes as well.
- The results of the social engine analysis are used to generate visualizations in step 40. The process next considers whether it will be iterative in step 50. If the process is to be repeated, then meta-content is fed back to the machine analysis via step 70. The meta-content is used to refine the machine analysis. Additions and amendments made by human operators are fed back into the machines so that their functionality is improved. - HyLighter CSS, when integrated with smart machines, borrows a key concept of statistical process control and continuous improvement: establish a feedback loop to allow a process to monitor itself, learn from its miscues, and adjust as necessary. HyLighter CSS adds a social sensemaking component to machine analytics that makes high-volume, high-velocity, high-variety machine-scale analytics compatible with deliberative, collaborative, human-scale sensemaking practices. The system establishes a two-way feedback loop between users and smart machines that enables the co-active emergence and continuous improvement of machine intelligence and human sensemaking performance.
- Finally, a communication is produced in step 60. This may assume many forms, with one example being a large text document with embedded links. The text document is the result of the "sensemaking" process. It is intended to tell a story that a reader may assimilate and understand. Significantly, the end product includes source links back to the set of documents that were extracted from the data corpus. Thus, if a reader wishes to see the source for a particular portion of the text document, he or she clicks on the link and the source document will then be presented. - The inventive process will now be described in more detail.
FIG. 2 provides an expanded depiction of the elements shown in FIG. 1. FIG. 3 depicts the interaction of humans within the process. The definition of the data corpus is generally done by a team leader, though it may be the result of a group decision. It will also be common for a particular problem or topic of interest to be defined at this point. For example, the topic of interest may be the identification of funding sources for a new commercial development in the Middle East, and the corpus may be conventional media and social media sources. - Alternatively, the process may sometimes be implemented without a topic or working hypothesis. As one example, observer theory operators may be applied to the data corpus to detect patterns in the data without actually knowing what problem the process is attempting to solve. It will be more common, however, to have at least a defined topic. In some instances the topic may be defined externally to the process. An agency head may simply direct that a team using the process investigate a particular topic. The team leader would likely still need to select an appropriate data corpus.
- The data corpus can include virtually any data source. Portions may be extracted from the Web using Internet search engines and clustering engines. Portions may come from classified data collected by intelligence agencies.
-
Machine analysis 20 may be subdivided into two broad categories—text processing and image processing. The text processing engine begins by collecting references (person names, geospatial locations, business names, etc.). Adjacent terms are linked as phrases. If the name of a person is being sought, when the analytics see "Doe" next to "John," the software links this into a "phrase" defining the name of a person—"John Doe." - The analytics recognize that a person's name may appear in different ways: "Jon Doe," "Doe, Jon," "Jonathan Doe." The software attempts to create a standard representation of repeating elements. The analytics also seek to discern relationships between identified entities and the strength of those relationships. Weaker relationships are filtered out while stronger ones are retained.
- The machine analytics create a resource descriptor framework (“RDF”). This framework attempts to organize elements as subject-object-predicate (an “RDF triple”). For example, if it detects elements “A” and “B” it may relate them as “A increases B” or “A interferes with B.”
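The RDF triple structure just described can be sketched in a few lines (Python; `build_graph` and its index shape are hypothetical illustrations of the concept, not the patent's implementation):

```python
from collections import defaultdict

# An RDF triple: (subject, predicate, object), all machine-readable strings.
Triple = tuple[str, str, str]


def build_graph(triples: list[Triple]) -> dict[str, list[tuple[str, str]]]:
    """Index triples by subject so all relationships of an entity can be looked up."""
    graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for subj, pred, obj in triples:
        graph[subj].append((pred, obj))
    return dict(graph)


# Example: elements "A", "B", and "C" related as in the text above.
triples = [("A", "increases", "B"), ("A", "interferes with", "C")]
graph = build_graph(triples)
```

Here `graph["A"]` collects every (predicate, object) pair involving subject "A", which is the aggregation a node graph later presents.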
- A machine-readable tag is placed on each element and relationship. Tags may include geographical references (such as latitude and longitude). Tags may also include a “time stamp” indicating when the data was acquired, when it was assimilated, etc. At all times the source information is maintained so that a human operator can link back to the source material.
- The image processing is of course designed to operate on image files (PDF, TIFF, HTML bit maps, images within word processing documents, etc.). The machine analytics segment the image in order to identify objects within the image (such as geospatial landmarks). The software then creates a searchable tag of what is recognized in the image, including various levels of recognition (“human face,” “human face: male/Caucasian,” “human face: John Doe”). The image is then labeled with a matching tag. The label is passed to the text processing module which then applies normal text processing.
- The text and image processing modules select a subset of the data corpus for further review. The automated processing identifies entities, rolled-up concepts, relationships, and categories. A curator 46 (see FIG. 3) takes these results and may refine the selection process in order to create a desired final subset of the data corpus (a "subset") in step 22. The curator is preferably provided with a computer-based editing tool presented in the form of a graphical user interface. - The curator selects a subset of the most promising files from the results of the machine analytics. The curator is preferably given three options for importing the files into the continued process, depending on the level of fidelity required. These options are:
-
- 1. Low fidelity—The files are imported directly from the data analytics database as ASCII text files. No images or formatting are provided.
- 2. Medium fidelity—The native files are imported by converting to XHTML (provides images and most of the formatting for MS OFFICE files but inconsistent results for HTML and PDF).
- 3. High fidelity—The native files are imported by converting to PDF images. The content usually appears the same as in the original native files, even including footers, headers and page numbers.
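The three fidelity options amount to routing each file through a different converter. A minimal dispatch sketch (Python; the function name, the suffix-based converters, and the `ValueError` behavior are all hypothetical, since the patent does not specify the conversion software):

```python
def import_file(path: str, fidelity: str) -> str:
    """Route a source file to a converter based on the required fidelity level."""
    converters = {
        "low": lambda p: p + ".txt",      # plain ASCII text; no images or formatting
        "medium": lambda p: p + ".xhtml",  # images and most formatting preserved
        "high": lambda p: p + ".pdf",     # page-faithful PDF image of the native file
    }
    if fidelity not in converters:
        raise ValueError(f"unknown fidelity level: {fidelity}")
    return converters[fidelity](path)
```

For example, `import_file("report", "high")` returns `"report.pdf"` under these assumed naming conventions.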
- During the importation process used in creating the "subset," the inventive method transforms the machine-generated results into four types of meta-content. The meta-content is referred to as "gray matter." It appears as follows:
-
- 1. Entities identified by the machine analytics are emphasized with a gray shadow effect.
- 2. Each identified entity is tagged with a list of rolled up concepts (if any).
- 3. Each entity is tagged with a category tag by the machine.
- 4. All entities related to a selected entity are listed as determined by linkage depth filters generated by the machine analytics.
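The four types of gray matter above can be modeled as one record per identified entity (Python sketch; the `GrayMatter` name and field layout are illustrative assumptions, not the patent's data model):

```python
from dataclasses import dataclass, field


@dataclass
class GrayMatter:
    """Machine-generated meta-content attached to one identified entity."""
    entity: str                                        # shown with a gray shadow effect
    rolled_up: list[str] = field(default_factory=list)  # rolled-up concepts, if any
    category: str = ""                                  # machine-assigned category tag
    related: list[str] = field(default_factory=list)    # entities passing the linkage depth filters
```

A curator's cleansing pass (step 24) would then amount to deleting or revising such records before the subset is updated.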
- In step 24, a curator 46 or a user with administrative privileges 48 performs a preliminary review of the gray matter and performs a "cleansing" function. First, the user creates a report by selecting columns from a menu of available data types. FIG. 4 shows a representative report 58 opened in a computer interface window 56. The curator has created a table by selecting column headers 62. The curator is preferably able to arrange the columns in a desired order. Next, the curator has the option to search, sort, and filter the report. - One of the curator's tasks is to review the gray highlights in the docset and consider whether their designation as a highlight is appropriate. The curator may select a particular gray highlight and pull up the source material to review the entity in the context of the file from which it was taken. The curator then deletes or revises the gray matter as necessary within the report. The curator's actions are saved so that they may be used to refine the application of the machine analytics in the future. The changes are used to update the subset.
-
FIGS. 5 and 6 show a representative graphical user interface that might be provided to the curator. In the view of FIG. 5, the graphical window has been divided into three portions. The portion on the right shows the currently active fragment 66. Machine selections 64 are highlighted in gray (in the actual interface, the highlight would appear as a gray shadow; according to accepted patent drawing standards, this gray shadow is shown in the views as broken horizontal hatching). The sub-window in the upper left shows tree presentation 68. This may be used to show the structure of the data groups pertaining to the fragment in the right-hand window. - Finally, the lower left-hand window shows explanatory notes 72. These may be used to explain some details about the machine selections and the machine's motivation(s) for making the selection.
FIG. 6 shows an editing window that the curator may open using a pull-down menu. Editing functions 74 are provided so that the user can correct, alter, or delete the machine highlights. - These actions are shown as step 24 in FIGS. 2 and 3. The actions may be taken by a single curator or by a curator and one or more users having administrative privileges 48. The result is a completed subset on which the remaining steps of the inventive method will operate. - Next, a decision is made regarding whether historical library files will be added to the subset. This is shown as library decision point 26 in FIG. 2. It is possible to bypass the historical data and proceed directly to the social annotation engine. However, as the inventive method stores the results of the sensemaking activities in continually improving social libraries, it will often be desirable to access these libraries.
FIG. 7 shows how cleansed subset 76 (emerging from steps 22 and 24) may be merged with the prior subset libraries. The curator is given the option to add files to a subset from the available libraries of annotated documents. Users may juxtapose fragments and comments that come from different subsets, libraries, agencies, organizations, and disciplines. This mechanism supports creative thinking and innovation by increasing the potential of teams to discover knowledge and skills in one context that have value in a different context. - The subset is then fed into the social annotation engine (step 30 in FIGS. 2 and 3). Multiple users 52 take the files in the subset, review them, and add highlights, remarks, and links. The highlighting process is very significant in the present invention. A user reviews each file and applies his or her unique perspective in determining which portions are significant. The user employs software to "highlight" the important portions. A "highlight" is an emphasizing color that is added to a portion of the file in a manner that is analogous to mechanical highlighting using a fluorescent pen on a paper document. The highlighting functions preferably include the following features:
- 1. An active fragment is highlighted in a first color, such as orange.
- 2. When the current user selects a fragment for emphasis, it changes to a second color, such as yellow. (Note that a fragment may be a portion of text, a portion of an image, or both.)
- 3. When another user has selected a fragment for emphasis that is not selected by the current user, it appears in a third color, such as blue. The blue color grows darker as more and more users select a particular fragment.
- 4. When the current user selects a fragment that has also been selected by another user, it appears in a fourth color such as green. The green color preferably starts out rather faint. As more and more users select the same fragment, the green color gets darker and darker (indicating increasing consensus).
- 5. When a user highlights over a gray shadow added by the machine analytics, the gray shadow remains visible along with the highlighting color.
- 6. When a user is actually performing the highlighting task, he or she is preferably not shown the color coding representing the selections of other users (since this might bias the result).
- 7. The cumulative color coding is generally available to a user after he or she has already contributed selections.
- 8. The user has the opportunity to add a comment explaining the reasoning behind the selection of a fragment. The user may also create a link from one fragment to other things deemed relevant.
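The color rules in the list above reduce to a small decision function. A sketch (Python; the function, the returned style dictionary, and the shade cap are illustrative assumptions rather than the patent's implementation):

```python
def highlight_color(selected_by_me: bool, other_count: int,
                    machine_selected: bool = False) -> dict:
    """Return a display style for a fragment per the highlighting rules above."""
    if selected_by_me and other_count > 0:
        color = "green"        # consensus between current user and others
    elif selected_by_me:
        color = "yellow"       # current user only
    elif other_count > 0:
        color = "blue"         # other users only
    else:
        color = None           # no highlight applied
    # Shade darkens as more users agree; capped here purely for display purposes.
    shade = min(other_count, 10) if color in ("green", "blue") else 0
    return {"color": color, "shade": shade,
            "gray_shadow": machine_selected}   # machine gray stays visible underneath
```

For instance, a fragment chosen by the current user and three others would render green at shade level 3, with any machine-analytics gray shadow preserved beneath the color.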
-
FIG. 8 shows a screenshot depicting the application of the process thus described. The depiction is of the unified results of several users. Current user highlight 82 reflects a selection made by the user that was not selected by anyone else (standard hatching patterns are again used to represent the colors that actually appear in the user interface). Other user highlight 86 represents a selection that was made by another user but not by the current user. Consensus highlight 88 represents a selection that was made by the current user and at least one other user. Current selection 84 represents the currently selected entity. - The annotation window on the left corresponds to the current fragment selected. One may look in the annotation window to see the comments made by various users concerning the currently selected highlight (user annotation 78). Each user may also establish links, which may be graphically displayed in the annotations section or elsewhere. The linking functions are preferably based on a URL for each document or fragment.
- The linking function is more than simply a tie between two elements. A user may also add the reason for the link. As an example, if a user links “B” to “A” the motivation may be the fact that B provides evidence for A.
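A link that carries its own motivation can be modeled as a small immutable record (Python sketch; the class name, URL scheme, and example URLs are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasonedLink:
    """A link between two fragments that also records why the link was made."""
    source: str   # URL of the linking fragment, e.g. fragment "B"
    target: str   # URL of the linked-to fragment, e.g. fragment "A"
    reason: str   # the user's stated motivation for the link


# Example: "B" linked to "A" because B provides evidence for A.
link = ReasonedLink("https://example.org/frag/B",
                    "https://example.org/frag/A",
                    "provides evidence for")
```

Note that (source, reason, target) is itself an RDF-triple-shaped statement, which is what allows these user links to feed the node graphs described below.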
- Once all the relevant users have participated in the highlighting and annotation process, reports reflecting the accumulated annotations are created (step 40 in FIGS. 2 and 3). The reports are useful in the sensemaking process and may assume a virtually endless variety of forms. FIG. 9 shows one example (exemplary report 92). In this example, the columns reflect the document identifier, the title, whether there are new changes pertaining to the document, any images associated with the document, a listing of the selected fragment itself, and comments added by a user. - A reader of the report can select the fragment itself in order to link back to the source document. This allows a reader to view the fragment in context. A reader may also add a link. Note, for instance, the link added in the right-hand column by "User 660." The ability to add the link saves the user from having to recreate the underlying rationale and instead links the interested reader (preferably via URL) to another fragment elsewhere.
- The report preferably includes provisions for searching, sorting, and filtering meta-content that is split out of the subset. The report is continually updated with real-time data (as users continue to contribute new meta-content to the annotation layer). It is important to keep in mind that the inventive method is not a static process that is run once with a final result. Rather, all the steps and reports will be continually updated as multiple users provide new input in the evolving situation.
- Users are preferably able to run various analytics (such as identifying superior performers by generating metrics on the number of links across a subset by user name). Through the reporting functions, each user has access to a picture of the integrated “story” as it emerges from the collaborative process.
- At all times the users are only one click away from seeing a selected fragment in the context of its source document. This capability assures that, no matter where the disaggregated meta-content travels, the data maintains a connection to its place of origin. Users are thereby able to better evaluate lineage and security classifications.
- The report works in concert with a type of “mash-up” feature for efficiently navigating to related fragments and comments that are spread across multiple documents and file types. For example, the user may filter all the fragments in a report related to a specific entity. The mash-up feature allows the user to navigate from fragment to fragment using only a next button or an arrow key. The rapid navigation allows users to more quickly perceive relationships.
- Returning now to FIGS. 2 and 3, step 42 shows the next step in the process (viewing visualizations). The inventive method preferably includes a visual analytics platform that is capable of representing the annotation layer for selected documents in the form of an RDF graph or a data graph. An RDF graph conforms to a World Wide Web Consortium (W3C) specification originally designed as a metadata model. As a prerequisite to generating a node graph, users tag fragments across a subset with RDF triples (machine-readable tags with three parts including a subject, an object, and a predicate that indicates a relationship between the subject and the object). In a sense, tagging a fragment with a triple is equivalent to placing the fragment and related comments into a container labeled with the node object. The node graph itself is simply the presentation of the aggregated RDF triples. - An exemplary node graph is depicted in FIG. 10 (a very simple one). Nodes are presented and the relationships between nodes are presented. The thickness of the line between nodes indicates the number of fragments linking them. Color coding may also be used. For instance, a white node is empty. A yellow node is one designated by the current user. A blue node is one designated by another user. A green node is one designated by both the current user and another user. - A node graph serves as (a) an efficient mechanism for navigating across a subset (i.e., clicking a node opens a report), (b) a coordination mechanism that provides each contributor with information about the combined efforts of all contributors (i.e., the collective visualization of the annotation layer provides a birds-eye view of the breadth and depth of coverage achieved by the group, which may serve to guide the future distribution of individual effort), and (c) a cognitive tool to help teams recognize patterns hidden in data spread across many documents. For example, users engage in "what if" thinking by adding or excluding documents, triples, people, and/or groups from the reports and representing the resulting data as node graphs. Structural changes in node graphs that occur as the result of running different filters are intuitively grasped as a visual change. This is one of the most powerful aspects of node graphs.
- As a user carries out the various actions enabled in the present invention, the system updates the subset for all users. This suggests the idea of distributed sensemaking activities where a small number of specialists use large arrays to work on data coming into a central location from team members in the field.
- Users are preferably given the option to filter meta-content through a report to serve as feedback to smart machines. This step is depicted in FIGS. 2 and 3. - The machines should be configured to "learn" from the exemplars (e.g., targeted fragments, comments, and tags) to identify files and fragments across a corpus that match the exemplars in meaning. By ingesting meta-content from the social annotation layer, the machines become more capable at foraging the corpus for the most pertinent information related to the task at hand. As the machines become smarter at finding relevant content, users become more capable of thinking within the problem space. In effect, the present invention establishes a two-way feedback loop that accelerates machine and human learning (i.e., co-active evolution).
- The present invention is a flexible system that allows a variety of workflows to match requirements of a given task environment. A curator assembles a group of people with diverse perspectives to define and solve a challenging problem. Once the group achieves consensus on a written problem statement, each member imports relevant documents into the social annotation engine as part of a “seeding” process. The results of the seeding process (i.e., selected fragments, comments, tags, and questions) are ingested by smart machines to refine machine analytics on a large corpus. The group selects a subset of promising files from the results, brings these files into the social annotation engine with associated gray matter, and repeats the cycle. This process continues until the team attains a state of sufficiency and moves to take action (e.g., write a report, enact a solution). Sufficiency is achieved when group members agree that they have (a) reviewed a diverse enough set of sources, (b) generated a broad enough range of pertinent questions, and (c) adequately addressed major concerns.
-
Step 60 in FIGS. 2 and 3 represents an ultimate product of the inventive process. The invention preferably includes provisions for creating a three-dimensional publication in either PDF or HTML. The top layer is a report authored and reviewed by whoever is responsible for producing a final deliverable based on the results of the sensemaking activity. The second layer is a table (i.e., a PDF or HTML version of a report) that includes fragments and related meta-content relevant to the report from across the subset. The third layer is the subset itself, converted to PDF or HTML. A reader can navigate from a fragment in the report to related fragments and discussion threads in the table and, from the table, to the exact location of the fragments in their sources. Rather than relying on references to whole documents and footnotes for finding support for the authors' arguments and conclusions, the consumers of the report have access to the evidence in context. This arrangement supports transparency and accountability in communications and maintains an historical record of the thinking of the group as a hyperlinked set of PDFs or HTML files. - Although the preceding description contains significant detail, it should not be construed as limiting the scope of the invention but rather as providing illustrations of the preferred embodiments of the invention. Those skilled in the art will know that many other variations are possible without departing from the scope of the invention. Accordingly, the scope of the invention should properly be determined with respect to the following claims rather than the examples given.
Claims (20)
1. A method for collaborative sensemaking in which a group of human users work with machine analytics, comprising:
(a) defining a data corpus comprising a large body of data;
(b) applying machine analytic software running on a computer to said data corpus, wherein said machine analytic software selects a docset from said data corpus, said docset being a much smaller subset of said data corpus;
(c) wherein said machine analytic software selects said docset on the basis of said docset containing entities that are previously defined as being of interest;
(d) wherein said machine analytic software annotates said docset by highlighting said selected entities previously defined as being of interest within said docset in a first color;
(e) creating a display of highlighted entities of said docset, wherein each highlighted entity includes said highlighting in said first color;
(f) providing a group of human users;
(g) defining criteria for the appropriate selection of entities within said docset by said human users;
(h) providing a software-based system for each of said human users to review said docset and highlight entities within said docset according to said defined criteria, using a second color that is different from said first color;
(i) providing software that creates a report including a visual presentation of said entities, wherein said software automatically applies color graduation to said visual presentation of said entities, with said color graduation indicating an increasing level of consensus as to entities highlighted by said users in said group, but preserving said highlights of said machine analytic software;
(j) wherein said color graduation is not visible to each of said users as said user is highlighting said entities, and said software only makes said color graduation visible after a user has added said highlights and said software has generated said report; and
(k) for each highlighted entity contained within said report, providing a link that leads back to a depiction of a source file from which said entity was originally taken.
2. The method for collaborative sensemaking as recited in claim 1 , wherein said docset includes at least one file type selected from the group consisting of text files, still image files, video image files, and audio files.
3. The method for collaborative sensemaking as recited in claim 1 , wherein said color graduation comprises using a progressively darker shade of a color to indicate an increasing level of consensus.
4. The method for collaborative sensemaking as recited in claim 1 , wherein said link leading back to a depiction of said source file comprises:
(a) assigning a unique uniform resource identifier or uniform resource locator to a fragment containing said highlighted entity, said unique uniform resource identifier or uniform resource locator identifying a larger data source from which said fragment is taken; and
(b) when a user actuates a link in said report, using said unique uniform resource identifier or uniform resource locator to pull up a depiction of said larger data source so that said user may view said highlighted entity in context.
5. The method for collaborative sensemaking as recited in claim 1 , further comprising providing a linking function for said users, whereby each of said users can link one highlighted entity to another highlighted entity.
6. The method for collaborative sensemaking as recited in claim 1 , further comprising after said software selects said docset, allowing a user to review and edit said highlighting selections made by said machine before said docset is provided to said group of users.
7. The method for collaborative sensemaking as recited in claim 1 , further comprising said machine analytic software creating RDF triples using said highlighted entities, each of said RDF triples being a machine-readable tag with three parts including a subject, an object, and a predicate that indicates a relationship between said subject and said object.
8. The method for collaborative sensemaking as recited in claim 7 , further comprising creating a node graph depicting said relationships between said highlighted entities.
9. The method for collaborative sensemaking as recited in claim 5 , further comprising allowing a user to move among a series of linked entities by activating a next function.
10. The method for collaborative sensemaking as recited in claim 1 , wherein said entities to be highlighted include a pictorial image, a graphical image, a portion of text, and a portion of video.
11. A method for collaborative sensemaking in which a group of human users work with machine analytics, comprising:
(a) defining a data corpus;
(b) creating a defined software process for extracting a subset of said data corpus, said defined software process configured to be run on a computer as an automated process;
(c) applying said defined software process running on a computer to said data corpus, wherein said defined software process selects a docset from said data corpus, said docset being a smaller subset of said data corpus;
(d) wherein said defined software process annotates said docset by highlighting said selected entities previously defined as being of interest within said docset in a first color;
(e) creating a display of highlighted entities of said docset, wherein each highlighted entity includes said highlighting in said first color;
(f) providing a group of human users;
(g) defining criteria for the appropriate selection of entities within said docset by said human users;
(h) providing a software-based system for each of said users to review said docset and highlight entities within said docset according to said defined criteria using a second color that is different from said first color;
(i) providing software that creates a report including a visual presentation of said entities, wherein said software automatically applies color graduation to said visual presentation of said entities, with said color graduation indicating an increasing level of consensus as to entities highlighted by said users in said group, while preserving said highlights made by said defined software process;
(j) wherein said color graduation is not visible to each of said users as said user is highlighting said entities, and said software only makes said color graduation visible after a user has added said highlights and said software has generated said report; and
(k) for each highlighted entity contained within said report, providing a link that leads back to a depiction of a source file from which said entity was originally taken.
12. The method for collaborative sensemaking as recited in claim 11 , wherein said docset includes at least one file type selected from the group consisting of text files, still image files, video image files, and audio files.
13. The method for collaborative sensemaking as recited in claim 11 , wherein said color graduation comprises using a progressively darker shade of a color to indicate an increasing level of consensus.
14. The method for collaborative sensemaking as recited in claim 11 , wherein said link leading back to a depiction of said source file comprises:
(a) assigning a unique uniform resource identifier or uniform resource locator to a fragment containing said highlighted entity, said unique uniform resource identifier or uniform resource locator identifying a larger data source from which said fragment is taken; and
(b) when a user actuates a link in said report, using said unique uniform resource identifier or uniform resource locator to pull up a depiction of said larger data source so that said user may view said highlighted entity in context.
15. The method for collaborative sensemaking as recited in claim 11 , further comprising providing a linking function for said users, whereby each of said users can link one highlighted entity to another highlighted entity.
16. The method for collaborative sensemaking as recited in claim 11 ,
further comprising after said software selects said docset, allowing a user to review and edit said highlighting selections made by said machine before said docset is provided to said group of users.
17. The method for collaborative sensemaking as recited in claim 11 , further comprising said defined software process creating RDF triples using said highlighted entities, each of said RDF triples being a machine-readable tag with three parts including a subject, an object, and a predicate that indicates a relationship between said subject and said object.
18. The method for collaborative sensemaking as recited in claim 17 , further comprising creating a node graph depicting said relationships between said highlighted entities.
19. The method for collaborative sensemaking as recited in claim 15 , further comprising allowing a user to move among a series of linked entities by activating a next function.
20. The method for collaborative sensemaking as recited in claim 11 , wherein said entities to be highlighted include a pictorial image, a graphical image, a portion of text, and a portion of video.
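The claims above describe two machine-assistable techniques: representing relationships between highlighted entities as RDF-style subject/predicate/object triples (claims 7 and 17), and darkening a highlight color as more users in the group mark the same entity (claim 13). The following minimal sketch illustrates both ideas in plain Python. It is not the patented implementation; all entity names, user names, predicates, and the 50% maximum-darkening factor are illustrative assumptions.

```python
# Illustrative sketch of two techniques described in the claims:
# (1) RDF-style triples linking highlighted entities, and
# (2) a progressively darker highlight shade as user consensus grows.
# All data and parameter choices below are hypothetical examples.

from collections import Counter

# Each user's set of highlighted entity identifiers (hypothetical data).
user_highlights = {
    "analyst_1": {"Acme Corp", "12 May 2013", "wire transfer"},
    "analyst_2": {"Acme Corp", "wire transfer"},
    "analyst_3": {"Acme Corp"},
}

# RDF-style triples: (subject, predicate, object). Drawn as edges, these
# form the node graph of claims 8 and 18.
triples = [
    ("Acme Corp", "sentFundsVia", "wire transfer"),
    ("wire transfer", "occurredOn", "12 May 2013"),
]

def consensus_shade(entity, highlights, base_rgb=(255, 255, 0)):
    """Return a progressively darker shade of the base highlight color
    as more users mark the same entity (the graduation of claim 13)."""
    votes = sum(entity in marked for marked in highlights.values())
    total = len(highlights)
    # Scale each channel toward black; full consensus halves each channel
    # (the 0.5 darkening limit is an arbitrary choice for this sketch).
    factor = 1.0 - 0.5 * (votes / total)
    return tuple(int(channel * factor) for channel in base_rgb)

counts = Counter(e for marked in user_highlights.values() for e in marked)
for entity, votes in counts.most_common():
    r, g, b = consensus_shade(entity, user_highlights)
    print(f"{entity}: {votes}/{len(user_highlights)} users -> rgb({r}, {g}, {b})")
```

In this sketch the shade is computed only at report time, consistent with claim 11(j), where the graduation stays hidden while users are still highlighting and appears only in the generated report.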
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/206,160 US20230316097A1 (en) | 2013-05-14 | 2023-06-06 | Collaborative sensemaking system and method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/894,013 US11669748B2 (en) | 2013-05-14 | 2013-05-14 | Collaborative sensemaking system and method |
US18/206,160 US20230316097A1 (en) | 2013-05-14 | 2023-06-06 | Collaborative sensemaking system and method |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/894,013 Continuation US11669748B2 (en) | 2013-05-14 | 2013-05-14 | Collaborative sensemaking system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
US20230316097A1 (en) | 2023-10-05 |
Family
ID=51896586
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/894,013 Active 2034-01-14 US11669748B2 (en) | 2013-05-14 | 2013-05-14 | Collaborative sensemaking system and method |
US18/206,160 Pending US20230316097A1 (en) | 2013-05-14 | 2023-06-06 | Collaborative sensemaking system and method |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/894,013 Active 2034-01-14 US11669748B2 (en) | 2013-05-14 | 2013-05-14 | Collaborative sensemaking system and method |
Country Status (1)
Country | Link |
---|---|
US (2) | US11669748B2 (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9886448B2 (en) * | 2013-12-06 | 2018-02-06 | Media Gobbler, Inc. | Managing downloads of large data sets |
US20150324689A1 (en) * | 2014-05-12 | 2015-11-12 | Qualcomm Incorporated | Customized classifier over common features |
US10706371B2 (en) * | 2015-03-12 | 2020-07-07 | Accenture Global Solutions Limited | Data processing techniques |
US9875230B2 (en) * | 2016-04-08 | 2018-01-23 | International Business Machines Corporation | Text analysis on unstructured text to identify a high level of intensity of negative thoughts or beliefs |
US20180081885A1 (en) * | 2016-09-22 | 2018-03-22 | Autodesk, Inc. | Handoff support in asynchronous analysis tasks using knowledge transfer graphs |
US11663235B2 (en) | 2016-09-22 | 2023-05-30 | Autodesk, Inc. | Techniques for mixed-initiative visualization of data |
US11507859B2 (en) | 2019-01-08 | 2022-11-22 | Colorado State University Research Foundation | Trackable reasoning and analysis for crowdsourcing and evaluation |
CN110263265B (en) * | 2019-04-10 | 2024-05-07 | 腾讯科技(深圳)有限公司 | User tag generation method, device, storage medium and computer equipment |
US11500934B2 (en) * | 2020-06-30 | 2022-11-15 | Vesoft. Inc | POI recommendation method and device based on graph database, and storage medium |
US11822599B2 (en) * | 2020-12-16 | 2023-11-21 | International Business Machines Corporation | Visualization resonance for collaborative discourse |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030061201A1 (en) * | 2001-08-13 | 2003-03-27 | Xerox Corporation | System for propagating enrichment between documents |
US20030191608A1 (en) * | 2001-04-30 | 2003-10-09 | Anderson Mark Stephen | Data processing and observation system |
US20060236240A1 (en) * | 2002-05-23 | 2006-10-19 | Lebow David G | Highlighting comparison method |
US20100332360A1 (en) * | 2009-06-30 | 2010-12-30 | Sap Ag | Reconciliation of accounting documents |
US20170111236A1 (en) * | 2015-10-19 | 2017-04-20 | Nicira, Inc. | Virtual Network Management |
US20200301999A1 (en) * | 2019-03-21 | 2020-09-24 | International Business Machines Corporation | Cognitive multiple-level highlight contrasting for entities |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7089222B1 (en) * | 1999-02-08 | 2006-08-08 | Accenture, Llp | Goal based system tailored to the characteristics of a particular user |
US7080317B2 (en) * | 2001-05-31 | 2006-07-18 | Lebow David G | Text highlighting comparison method |
US7614057B2 (en) * | 2003-03-28 | 2009-11-03 | Microsoft Corporation | Entity linking system |
US8200775B2 (en) * | 2005-02-01 | 2012-06-12 | Newsilike Media Group, Inc | Enhanced syndication |
WO2009047570A1 (en) * | 2007-10-10 | 2009-04-16 | Iti Scotland Limited | Information extraction apparatus and methods |
WO2010121422A1 (en) * | 2009-04-22 | 2010-10-28 | Peking University | Connectivity similarity based graph learning for interactive multi-label image segmentation |
US8701087B2 (en) * | 2010-10-26 | 2014-04-15 | Sap Ag | System and method of annotating class models |
- 2013-05-14: US application 13/894,013 (patent US11669748B2), status Active
- 2023-06-06: US application 18/206,160 (US20230316097A1), status Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030191608A1 (en) * | 2001-04-30 | 2003-10-09 | Anderson Mark Stephen | Data processing and observation system |
US20030061201A1 (en) * | 2001-08-13 | 2003-03-27 | Xerox Corporation | System for propagating enrichment between documents |
US20060236240A1 (en) * | 2002-05-23 | 2006-10-19 | Lebow David G | Highlighting comparison method |
US20100332360A1 (en) * | 2009-06-30 | 2010-12-30 | Sap Ag | Reconciliation of accounting documents |
US20170111236A1 (en) * | 2015-10-19 | 2017-04-20 | Nicira, Inc. | Virtual Network Management |
US20200301999A1 (en) * | 2019-03-21 | 2020-09-24 | International Business Machines Corporation | Cognitive multiple-level highlight contrasting for entities |
Also Published As
Publication number | Publication date |
---|---|
US20140344191A1 (en) | 2014-11-20 |
US11669748B2 (en) | 2023-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20230316097A1 (en) | Collaborative sensemaking system and method | |
Zhang et al. | Towards a comprehensive model of the cognitive process and mechanisms of individual sensemaking | |
Chen et al. | Click2annotate: Automated insight externalization with rich semantics | |
Loukissas | Taking Big Data apart: local readings of composite media collections | |
Dadzie et al. | Structuring visual exploratory analysis of skill demand | |
Kang et al. | Characterizing the intelligence analysis process through a longitudinal field study: Implications for visual analytics | |
Chen et al. | A systematic review for MOOC dropout prediction from the perspective of machine learning | |
Lowrance et al. | Template-based structured argumentation | |
Abbas et al. | SOA issues and their solutions through knowledge based techniques—a review | |
Bailey et al. | Cnvis: A web-based visual analytics tool for exploring conference navigator data | |
Correa et al. | End-user development landscape: A tour into tailoring software research | |
Elias | Enhancing User Interaction with Business Intelligence Dashboards | |
Lemieux et al. | Provenance: Past, present and future in interdisciplinary and multidisciplinary perspective | |
Blandford et al. | Conceptual design for sensemaking | |
Ben Sassi et al. | Data Science with Semantic Technologies: Application to Information Systems Development | |
Yalcin | A systematic and minimalist approach to lower barriers in visual data exploration | |
Abughazala | Architecting Data-Intensive Applications: From Data Architecture Design to Its Quality Assurance | |
Schröder | Building Knowledge Graphs from Messy Enterprise Data | |
Adorjan et al. | Towards a human-in-the-loop curation: A qualitative perspective | |
He | Entity-Based Insight Discovery in Visual Data Exploration | |
Morishima et al. | The hondigi/l-crowd joint project: A microtask-based approach for transcribing japanese texts | |
Thullen | From Information Overload to Knowledge Graphs: An Automatic Information Process Model | |
Safadi et al. | One Picture to Study One Thousand Words: Visualization for Qualitative Research in the Age of Digitalization | |
Shrinivasan | Supporting the sensemaking process in visual analytics | |
Wagner | Interorganizational Collaborative Architecture: A Systematic Mapping Study |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | FINAL REJECTION MAILED |