CN107885744B - Conversational data analysis - Google Patents


Info

Publication number
CN107885744B
CN107885744B CN201610867019.5A CN201610867019A CN107885744B CN 107885744 B CN107885744 B CN 107885744B CN 201610867019 A CN201610867019 A CN 201610867019A CN 107885744 B CN107885744 B CN 107885744B
Authority
CN
China
Prior art keywords
data analysis
user
data
information
content item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610867019.5A
Other languages
Chinese (zh)
Other versions
CN107885744A (en
Inventor
侯智涛
楼建光
梁潇
张博
张海东
张冬梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC
Priority to CN202211627592.0A (CN115858730A)
Priority to CN201610867019.5A (CN107885744B)
Priority to US16/338,061 (US11423229B2)
Priority to EP17780278.2A (EP3519988A1)
Priority to PCT/US2017/052839 (WO2018063924A1)
Publication of CN107885744A
Priority to US17/813,435 (US20220405479A1)
Application granted
Publication of CN107885744B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9032 Query formulation
    • G06F16/90332 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present disclosure relate to conversational data analysis. After a data analysis request is received from a user, heuristic information may be determined based on the data analysis request. The heuristic information referred to herein is not the result of analyzing the data in response to the request, but rather information that may be used to guide the session forward. Based on such heuristic information, the user may provide supplemental information associated with the data analysis request, for example by clarifying the meaning of the data analysis request, making a related further analysis request, and so forth. Based on the supplemental information from the user, meaningful data analysis results that meet the user's real needs can be provided. In this way, the data analysis becomes more accurate and efficient, and the user has a good experience while obtaining truly helpful information.

Description

Conversational data analysis
Background
Data analysis plays a very important role in many application fields such as data-driven decision-making systems. A user may submit a data query to a data analysis tool in order to query data from a desired perspective and create a visual report. In order to make data analysis more convenient and easy to use, a scheme of applying natural language processing to a user interface of data analysis has been proposed. Natural language processing refers to a technique for processing human language using a computer, which enables the computer to understand the human language.
Traditional data analysis schemes based on natural language processing mainly follow a single input box approach. When a data analysis request in natural language form is received from a user, the machine performs a corresponding operation and provides a corresponding result. For simple or basic data analysis requests, such schemes are typically capable of obtaining corresponding data analysis results. For a complex data analysis request, however, it is often difficult for existing schemes to correctly understand the real intention of the user, and the data analysis result required by the user cannot be provided.
Disclosure of Invention
To address the above and other potential problems, embodiments of the present disclosure provide a two-way conversational data analysis method and apparatus. According to embodiments of the present disclosure, a user may complete a data analysis request in a dialog with a machine. After a data analysis request is received from a user, heuristic information may be determined based on the data analysis request. The heuristic information referred to herein is not the result of analyzing the data in response to the request, but rather information that may be used to guide the session forward. Based on such heuristic information, the user may provide supplemental information associated with the data analysis request, for example by clarifying the meaning of the data analysis request, making a related further analysis request, and so forth. Based on the supplemental information from the user, meaningful data analysis results that meet the user's real needs can be provided. In this way, the data analysis becomes more accurate and efficient, and the user has a good experience while obtaining truly helpful information.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the disclosure, nor is it intended to be used to limit the scope of the disclosure.
Drawings
The above and other features, advantages and aspects of embodiments of the present disclosure will become more apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings. The same or similar reference numbers in the drawings identify the same or similar elements, of which:
FIG. 1 illustrates a block diagram of a computing environment 100 in which one or more embodiments of the present disclosure may be implemented;
FIG. 2 shows a schematic diagram of a data set 200 for performing data analysis according to an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram 300 of data analysis of a data set 200 according to an embodiment of the present disclosure;
FIG. 4A shows a schematic diagram 400 of data analysis of a data set 200 according to an embodiment of the present disclosure;
FIG. 4B illustrates a data analysis process diagram 450 for a two-way conversation based on the heuristic information of FIG. 4A, in accordance with an embodiment of the present disclosure;
FIG. 5 shows a flow diagram of a method 500 for data analysis according to an embodiment of the present disclosure;
FIG. 6 shows a flow diagram of a method 600 for data analysis according to an embodiment of the present disclosure;
FIG. 7 illustrates a user interface 700 for multiple dialogs according to an embodiment of the present disclosure; and
FIG. 8 illustrates a user interface 800 according to an embodiment of the present disclosure.
Throughout the drawings, the same or similar reference numbers refer to the same or similar elements.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
In general, "data analysis" described in the embodiments of the present disclosure refers to a process of analyzing a large amount of collected data (hereinafter, simply referred to as "data set") with an appropriate statistical analysis method, extracting useful information, and forming a conclusion, thereby studying and summarizing the data in detail.
The term "heuristic information" as used by embodiments of the present disclosure refers to information used to guide a session between a user and a data analysis device, such as information used to guide a user in clarifying a data analysis request, information used to offer the user extended data analysis results, and so forth. The heuristic information is different from a result generated for a data analysis request of a user (hereinafter also referred to as "data analysis result").
The term "content item" as used by embodiments of the present disclosure refers to a semantic unit used to characterize data in a dataset, such as words or phrases relating to location, time, date, event, brand, category, and the like.
The term "code fragment" as used by embodiments of the present disclosure refers to a piece of code for implementing one or more operations associated with a content item. When the piece of code is run with the content item as input, the resulting output may be used as part or all of the result of the data analysis request.
The term "include" and its variants as used in this disclosure are intended to be inclusive, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment". Relevant definitions for other terms will be given in the following description.
Conventionally, data analysis schemes employ a one-way dialog approach, which can only provide corresponding data analysis results for simple or basic data analysis requests. When a user inputs a complex data analysis request, it is often difficult for conventional data analysis schemes to understand such a complex data analysis request, thereby causing the system to report an error or give an erroneous data analysis result. As a result, the user cannot be helped to obtain the data analysis result that he/she really wants to obtain, and the user's demand cannot be satisfied, so that the data analysis loses its meaning.
To this end, the present disclosure proposes a data analysis method and apparatus based on a two-way dialog, which can not only receive a data analysis request from a user, but also generate heuristic information by analyzing the data analysis request. The term "heuristic information" as used herein refers to information that is not itself a data analysis result, but is used to guide the continuation of a data analysis session. For example, the heuristic information may guide the user to further explain or supplement the request so that it becomes a question the device can understand. The heuristic information may also be extended information, relevant to the user's current analysis, that the data analysis device actively recommends to the user. Such extended information can be obtained, for example, by the data analysis device from the analyzed data by means of a data mining method. In this way, the method and apparatus of the embodiments of the present disclosure can provide data analysis results that better meet the user's needs, and the user experience is significantly improved.
The basic principles and several example implementations of the present disclosure are explained below with reference to fig. 1-8. FIG. 1 illustrates a block diagram of a computing environment 100 in which a data analysis device of an embodiment of the present disclosure may be implemented. It should be understood that the computing environment 100 illustrated in FIG. 1 is only exemplary and should not be construed as limiting in any way the functionality and scope of the embodiments described herein.
As shown in FIG. 1, computing environment 100 includes a user 101 and a computing system/server 105 in the form of a general purpose computing device. Computing system/server 105 may be used to implement a data analysis device (hereinafter also referred to as "data analysis device 105") of embodiments of the present disclosure. User 101 may interact with computing system/server 105 to make data analysis request 102 and obtain desired data analysis results 180. Components of computing system/server 105 may include, but are not limited to, one or more processors or processing units 110, memory 120, storage 130, one or more communication units 140, one or more input devices 150, and one or more output devices 160. The processing unit 110 may be a real or virtual processor and can perform various processes according to programs stored in the memory 120. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capability of computing system/server 105.
Computing system/server 105 typically includes a number of computer storage media. Such media may be any available media that is accessible by computing system/server 105 and includes, but is not limited to, volatile and non-volatile media, removable and non-removable media. Memory 120 may be volatile memory (e.g., registers, cache, Random Access Memory (RAM)), non-volatile memory (e.g., Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory), or some combination thereof. Storage 130 may be a removable or non-removable medium and may include a machine-readable medium, such as a flash drive, a magnetic disk, or any other medium that can be used to store information and/or data 170 (e.g., data set 172) and that can be accessed within computing system/server 105. It should be appreciated that the above description is merely exemplary, and that the data set 172 can be stored not only in the storage device 130, but also in a network storage device or any other suitable form of storage.
Computing system/server 105 may further include additional removable/non-removable, volatile/nonvolatile storage media. Although not shown in FIG. 1, a magnetic disk drive for reading from or writing to a removable, nonvolatile magnetic disk and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data media interfaces. Memory 120 may include one or more program products 122 having one or more sets of program modules configured to perform the functions of the various embodiments described herein.
The communication unit 140 enables communication with another computing device over a communication medium. Additionally, the functionality of the components of computing system/server 105 may be implemented in a single computing cluster or multiple computing machines, which are capable of communicating over a communications connection. Thus, computing system/server 105 may operate in a networked environment using logical connections to one or more other servers, a network Personal Computer (PC), or another general network node.
The input device 150 may be one or more of a variety of input devices such as a mouse, keyboard, trackball, voice input device, and the like. Output device 160 may be one or more output devices such as a display, speakers, printer, or the like. Computing system/server 105 may also communicate with one or more external devices (not shown), such as storage devices, display devices, etc., as desired, through communication unit 140, with one or more devices that enable a user to interact with computing system/server 105, or with any device (e.g., network card, modem, etc.) that enables computing system/server 105 to communicate with one or more other computing devices. Such communication may be performed via input/output (I/O) interfaces (not shown).
As shown in FIG. 1, storage device 130 has stored therein data 170 comprising a data set 172 (e.g., annual statistics on shark attacks on humans). Computing system/server 105 is capable of receiving, via input device 150, a data analysis request 102 for data set 172 input by user 101, determining heuristic information 103 for conducting a conversation based on data analysis request 102, and providing heuristic information 103 to user 101 via output device 160 to guide user 101 to provide supplemental information associated with the data analysis request. Computing system/server 105 can then complete the data analysis process based on the supplemental information, resulting in data analysis results 180 that satisfy the user's needs. The data analysis results 180 may be presented using graphics, tables, text, audio, video, or any combination thereof. It should be understood that these forms are merely exemplary; the data analysis results 180 may be presented in any suitable form, and the examples are not intended to limit the scope of the present disclosure.
Embodiments of the present disclosure are further described below by way of specific examples. FIG. 2 shows a schematic diagram of a data set 200 for performing data analysis according to an embodiment of the present disclosure. While the data set 200 is shown in FIG. 2 in the form of a multi-dimensional table, it should be understood that the data set 200 may have any suitable form and the example in FIG. 2 is not intended to limit the scope of the present disclosure. The data set 200 may be implemented as the data set 172 in the data analysis device 105 of fig. 1.
In some embodiments, the data set 200 may be a single table stored in a database, a Comma Separated Values (CSV) file, or any other suitable form of file, or may be derived from a combination of multiple tables. As shown in FIG. 2, in this example, data set 200 is a table containing worldwide shark attack records, having a plurality of rows and a plurality of columns. Each record is a row in the table, and the columns "country" 210, "sex" 220, "lethality" 230, "activity" 240, "number of attacks" 250, and "year" 260 are each a dimension of the data. A data model may be pre-established for the data set 200, which may include one or more content items and one or more operations associated with the content items. The content items may include the dimensions of the data and may also include other content items derived from those content items according to a predetermined algorithm.
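The data model described above can be pictured as a mapping from content items to the operations defined for them. The following is a minimal sketch under stated assumptions: the `ContentItem` class, the `group_sum` helper, the operation name `group_by`, and the sample rows are all hypothetical illustrations, not structures or data taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class ContentItem:
    """A semantic unit of the data set plus the operations defined for it."""
    name: str
    operations: Dict[str, Callable[[List[Row]], object]] = field(default_factory=dict)


def group_sum(dimension: str) -> Callable[[List[Row]], Dict[object, int]]:
    """Build an operation that sums "number of attacks" per value of a dimension."""
    def operation(rows: List[Row]) -> Dict[object, int]:
        totals: Dict[object, int] = {}
        for row in rows:
            totals[row[dimension]] = totals.get(row[dimension], 0) + row["number of attacks"]
        return totals
    return operation


# Hypothetical data model: only "year" and "country" have operations defined.
DATA_MODEL = {
    "year": ContentItem("year", {"group_by": group_sum("year")}),
    "country": ContentItem("country", {"group_by": group_sum("country")}),
}

# Placeholder rows, not real statistics.
rows = [
    {"country": "A", "year": 1960, "number of attacks": 3},
    {"country": "B", "year": 1960, "number of attacks": 2},
    {"country": "A", "year": 1961, "number of attacks": 1},
]

by_year = DATA_MODEL["year"].operations["group_by"](rows)
```

A content item that is absent from `DATA_MODEL` (or present without operations) is exactly the case in which the device has no operation to run and needs to ask the user for clarification.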
The data analysis tasks for the data set 200 may include a variety of online analytical processing (OLAP) techniques, such as aggregation, slicing and dicing, drill-down, and roll-up, among others. In addition, the data analysis tasks may also include pattern mining, such as finding trends, outliers or anomalies, correlations, and so forth. A complex data analysis task may include multiple subtasks. By semantically parsing the data analysis request into operations expressed in a query language (e.g., SQL, DAX, or MDX), such operations can be performed on the data set 200 to arrive at a result for the data analysis request.
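As an illustration of turning identified dimensions into query-language operations, the sketch below builds parameterized SQL for a roll-up and a drill-down and runs it with Python's built-in `sqlite3` module. The table name `attacks`, the column `n`, the `build_query` helper, and the inserted rows are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attacks (country TEXT, activity TEXT, year INTEGER, n INTEGER)")
conn.executemany(
    "INSERT INTO attacks VALUES (?, ?, ?, ?)",
    [  # placeholder rows, not real statistics
        ("X", "fishing", 1960, 4),
        ("X", "surfing", 1960, 1),
        ("Y", "fishing", 1961, 2),
    ],
)

ALLOWED_COLUMNS = {"country", "activity", "year"}


def build_query(group_by, filters=None):
    """Return (sql, params) for an aggregation over one dimension.

    Column names are checked against a whitelist because they cannot be
    passed as bound parameters; filter values are always bound.
    """
    filters = filters or {}
    for column in [group_by, *filters]:
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown dimension: {column}")
    where = ""
    if filters:
        where = " WHERE " + " AND ".join(f"{column} = ?" for column in filters)
    sql = (
        f"SELECT {group_by}, SUM(n) FROM attacks{where} "
        f"GROUP BY {group_by} ORDER BY {group_by}"
    )
    return sql, list(filters.values())


# Roll-up: total number of attacks per year.
sql, params = build_query("year")
roll_up = conn.execute(sql, params).fetchall()

# Drill-down: break one year down by activity.
sql, params = build_query("activity", {"year": 1960})
drill_down = conn.execute(sql, params).fetchall()
```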
According to some embodiments of the present disclosure, data analysis device 105 may receive multiple forms of data analysis request 102 from user 101. Such a data analysis request may be a simple short sentence; it may also be a complex sentence, such as a combination of multiple simple sentences or a long sentence with many qualifiers. FIG. 3 shows a schematic diagram 300 of data analysis for the data set 200 according to an embodiment of the present disclosure. In the embodiment of FIG. 3, the data analysis request 102 input by the user 101 is "please list dangerous countries annually". Upon receiving the data analysis request 102, the data analysis device 105 identifies one or more content items therefrom, such as "year," "danger," "country," and so forth.
The data analysis device 105 then compares the identified content items with a pre-established data model for the data set 200 to determine the operations associated with the identified content items. In this embodiment, both the "year" and "country" content items have associated operations defined in the data model, but the "danger" content item has no corresponding operation defined. Therefore, the data analysis device 105 cannot determine the operation associated with the content item "danger". It will be appreciated that this uncertainty arises not because the meaning of the word "dangerous" itself is unclear, but because it cannot be determined from the word what operations should be performed on the data set.
In this case, the data analysis device 105 can generate a question for the data analysis request 102, such as "Please explain what 'danger' means in 'dangerous country'?". This question is used to prompt the user 101 to provide clarification information about the content item "danger", thereby guiding the dialog between the user 101 and the data analysis device 105.
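The steps just described (identify content items, compare them against the data model, and ask about any item without a defined operation) can be sketched as follows. The `VOCABULARY` lookup, the `KNOWN_OPERATIONS` table, and the question wording are hypothetical simplifications; the disclosure does not prescribe how content items are recognized.

```python
# Hypothetical data model: content items that have an operation defined.
KNOWN_OPERATIONS = {
    "year": "group by year",
    "country": "group by country",
}

# Hypothetical word-to-content-item lookup used in place of real language analysis.
VOCABULARY = {
    "annually": "year",
    "year": "year",
    "countries": "country",
    "country": "country",
    "dangerous": "danger",
    "danger": "danger",
}


def identify_content_items(request):
    """Return the content items mentioned in the request, in order, without repeats."""
    items = []
    for word in request.lower().split():
        item = VOCABULARY.get(word.strip(".,?!\"'"))
        if item is not None and item not in items:
            items.append(item)
    return items


def clarification_questions(request):
    """Produce one question per content item that has no operation in the data model."""
    return [
        f'Please explain what "{item}" means in your request.'
        for item in identify_content_items(request)
        if item not in KNOWN_OPERATIONS
    ]


request = "Please list dangerous countries annually"
items = identify_content_items(request)
questions = clarification_questions(request)
```

When `questions` is empty, every identified content item maps to an operation and the request can be executed directly; otherwise the questions are returned to the user as heuristic information.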
Upon receiving the heuristic information 103 described above, the user 101 may enter clarifying information, for example, "the number of fatal attacks is above 100". The clarification information further explains the meaning of the content item "danger". Thus, according to embodiments of the present disclosure, a data analysis session does not terminate or report errors because the operations corresponding to certain items in an analysis request are uncertain. Instead, the system keeps the data analysis session proceeding normally by prompting the user to enter clarification information.
Since both "fatal" and "number of attacks" are content items in the established data model, the data analysis device 105 can look up the corresponding operations for these content items and perform them on the data set 200. In this embodiment, the data analysis device 105 determines that Australia and the United States are the countries with more than 100 fatal attacks, i.e., the "dangerous countries" referred to by the user 101. In addition, the data analysis device 105 also presents a chart of the number of attacks in the two countries by year, so that the user can conveniently view the relevant information.
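A minimal sketch of how such clarification information might be turned into an executable rule is shown below. The regular expression handles only the single phrasing used in the example, and the `parse_clarification` and `matching_countries` helpers as well as the sample rows are assumptions for illustration; the figures are placeholders, not the data of FIG. 2.

```python
import re
from collections import defaultdict


def parse_clarification(text):
    """Parse 'the number of <lethality> attacks is above <N>' into a rule, else None."""
    match = re.search(r"number of ([\w-]+) attacks is above (\d+)", text)
    if match is None:
        return None
    lethality, threshold = match.groups()
    return {"lethality": lethality, "threshold": int(threshold)}


def matching_countries(rows, rule):
    """Sum attacks of the given lethality per country and keep totals above the threshold."""
    totals = defaultdict(int)
    for row in rows:
        if row["lethality"] == rule["lethality"]:
            totals[row["country"]] += row["number of attacks"]
    return sorted(country for country, total in totals.items() if total > rule["threshold"])


# Placeholder rows, not real statistics.
rows = [
    {"country": "P", "lethality": "fatal", "number of attacks": 70},
    {"country": "P", "lethality": "fatal", "number of attacks": 50},
    {"country": "Q", "lethality": "fatal", "number of attacks": 40},
    {"country": "Q", "lethality": "non-fatal", "number of attacks": 500},
]

rule = parse_clarification("the number of fatal attacks is above 100")
dangerous = matching_countries(rows, rule)
```

Once the rule exists, the content item "danger" can be treated like any other item with a defined operation for the rest of the session.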
With this two-way conversation manner, the data analysis apparatus 105 can supplement the data analysis request 102 by letting the user 101 provide the clarifying information, thereby obtaining a data analysis result that more satisfies the user's needs. This reduces the likelihood that the data analysis device 105 will not obtain a data analysis result or will obtain an erroneous result, significantly improving the user experience.
In addition to, or instead of, prompting the user to provide clarifying information, the data analysis device 105 may also provide the user with heuristic information that extends the data analysis results. FIG. 4A shows a schematic diagram 400 of data analysis for a data set 200 according to an embodiment of the present disclosure. In the embodiment shown in FIG. 4A, the data analysis request 102 entered by the user 101 is "yearly". When the data analysis device 105 receives the data analysis request 102, it identifies the content item "year" from the request and determines one or more operations associated with "year" from the data model. By performing these operations, a curve 410 of the annual number of attacks can be obtained. In addition, the data analysis device 105 applies one or more predefined operation templates to the data analysis result, thereby performing an extended analysis of the outlier 411 in the curve 410 and obtaining the following heuristic information: "Do you want to know about the outlier in 1960?", together with the corresponding options "good" and "no, thanks".
According to embodiments of the present disclosure, the predefined operation template may be a set of one or more operations established according to historical statistics, profiles or preferences of the user 101, access records of multiple users, and the like. In some embodiments, the predefined operation template may be an analysis of outliers, an analysis of data trends, an analysis of the highest or lowest data values, and so forth. It should be understood that the above description of predefined operation templates is merely exemplary and is not intended to limit the scope of the present disclosure in any way. It will be appreciated by those skilled in the art that the predefined operation template may be implemented in any suitable form.
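One possible shape for an outlier-analysis template is sketched below. It uses a simple z-score test from Python's `statistics` module; the disclosure does not specify a detection method, so the test, the `z_threshold` default, the `outlier_template` name, and the sample series are all assumptions.

```python
from statistics import mean, pstdev


def outlier_template(series, z_threshold=2.0):
    """Return one heuristic prompt per point whose z-score exceeds the threshold.

    `series` maps a key (e.g., a year) to a numeric value (e.g., number of attacks).
    """
    values = list(series.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # a flat series has no outliers
    prompts = []
    for key, value in series.items():
        if abs(value - mu) / sigma > z_threshold:
            prompts.append({
                "key": key,
                "question": f"Do you want to know about the outlier in {key}?",
                "options": ["good", "no, thanks"],
            })
    return prompts


# Placeholder yearly counts, not real statistics.
series = {1958: 10, 1959: 11, 1960: 40, 1961: 9, 1962: 10, 1963: 10}
prompts = outlier_template(series)
```

Other templates (trend analysis, highest or lowest value) could share the same interface: take a data analysis result and return zero or more prompts with selectable options.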
FIG. 4B shows a data analysis process diagram 450 for a two-way conversation based on the heuristic information of FIG. 4A, in accordance with an embodiment of the present disclosure. In the embodiment shown in FIG. 4B, the user 101 enters supplemental information based on the heuristic information provided by the data analysis device 105, such as entering "good" or clicking the "good" button. Upon receiving the supplemental information, the data analysis device 105 obtains a corresponding data analysis result using a predefined operation template, or using an operation newly determined from the data model according to the supplemental information input by the user.
Still referring to the example of FIG. 2, the result includes the text 451 "If the outlier is decomposed by activity, the number of 'fishing' attacks ranks first among all activities in 1960", a chart 452, and further heuristic information 453, i.e., the question "Fishing has 2 main aspects; which one do you want to know about?" together with the buttons "male", "non-fatal", and "no, thanks". The user 101 may continue to provide supplemental information based on the further heuristic information 453, such as by selecting one of the three buttons "male", "non-fatal", and "no, thanks", to obtain corresponding data analysis results.
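The decomposition and follow-up options in this example can be sketched as below. How the device chooses the "main aspects" is not stated in the disclosure; this sketch assumes, purely for illustration, that each option is the dominant value of another dimension within the selected slice. The `decompose` and `follow_up_options` helpers and the sample rows are hypothetical.

```python
from collections import Counter


def decompose(rows, fixed, dimension):
    """Sum the number of attacks per value of `dimension` within the slice `fixed`."""
    totals = Counter()
    for row in rows:
        if all(row[key] == value for key, value in fixed.items()):
            totals[row[dimension]] += row["number of attacks"]
    return totals


def follow_up_options(rows, fixed, dimensions):
    """Offer the dominant value of each remaining dimension, plus a way to decline."""
    options = []
    for dimension in dimensions:
        totals = decompose(rows, fixed, dimension)
        if totals:
            options.append(totals.most_common(1)[0][0])
    return options + ["no, thanks"]


# Placeholder rows, not real statistics.
rows = [
    {"year": 1960, "activity": "fishing", "sex": "male", "lethality": "non-fatal", "number of attacks": 6},
    {"year": 1960, "activity": "fishing", "sex": "female", "lethality": "non-fatal", "number of attacks": 1},
    {"year": 1960, "activity": "fishing", "sex": "male", "lethality": "fatal", "number of attacks": 1},
    {"year": 1960, "activity": "surfing", "sex": "male", "lethality": "non-fatal", "number of attacks": 2},
    {"year": 1961, "activity": "fishing", "sex": "male", "lethality": "fatal", "number of attacks": 3},
]

by_activity = decompose(rows, {"year": 1960}, "activity")
top_activity = by_activity.most_common(1)[0][0]
options = follow_up_options(
    rows, {"year": 1960, "activity": top_activity}, ["sex", "lethality"]
)
```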
With this two-way conversation approach, the data analysis device 105 can extend the data analysis results by providing heuristic information to the user 101, so that data analysis results that are more likely to meet the user's further needs can be provided from multiple perspectives. This effectively increases the likelihood that the user obtains the further data analysis results they need, and significantly improves the user experience.
Several example embodiments of data analysis methods and apparatus relating to a two-way conversation approach are described in more detail below. Fig. 5 shows a flow diagram of a method 500 for data analysis according to an embodiment of the present disclosure. It should be appreciated that method 500 may be performed by processing unit 110 as described with reference to fig. 1.
At 510, a data analysis request for a data set is received from a user in a dialog. Taking the embodiment of FIG. 1 as an example, user 101 provides data analysis request 102 to data analysis device 105, for example via input device 150 of data analysis device 105. For example, the user 101 may input the data analysis request 102 in a dialog box in the form of text, voice, or a combination thereof; by clicking or touching a button, drop-down box, graphic, curve, or text; or by dragging a predetermined control, graphic, text, or the like. It should be understood that the above ways of inputting the data analysis request 102 are described for discussion purposes only and are not intended to limit the scope of the present disclosure in any way.
In some embodiments, when the data analysis device 105 receives a data analysis request 102 in the form of text, voice, or a combination thereof from a user, the data analysis device 105 may save it to memory or a predetermined storage space for later use. When a click or touch of a button, drop-down box, graphic, curve, or text by the user 101 is detected, the data analysis device 105 may determine one or more events associated with the click or touch and derive information related to the received data analysis request based on the events. When a drag of a predetermined control, graphic, text, etc. by the user 101 is detected, the data analysis device 105 may determine one or more events associated with the drag and derive information related to the received data analysis request based on the events.
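One way to picture the event handling described above is a dispatch table that maps each kind of input event to a handler producing request-related information. The event dictionary layout, the handler names, and the returned fields below are hypothetical; the disclosure does not define an event format.

```python
def on_text(event):
    """Typed or transcribed input is kept verbatim for later analysis."""
    return {"source": "text", "text": event["text"]}


def on_click(event):
    """A click or touch contributes the content item bound to the clicked element."""
    return {"source": "click", "content_item": event["target"]}


def on_drag(event):
    """A drag contributes both the dragged item and the place where it was dropped."""
    return {"source": "drag", "content_item": event["target"], "onto": event["drop_zone"]}


HANDLERS = {"text": on_text, "click": on_click, "drag": on_drag}


def to_request_info(event):
    """Derive request-related information from a single UI event."""
    handler = HANDLERS.get(event["type"])
    if handler is None:
        raise ValueError(f"unsupported event type: {event['type']}")
    return handler(event)


clicked = to_request_info({"type": "click", "target": "non-fatal"})
dragged = to_request_info({"type": "drag", "target": "year", "drop_zone": "x-axis"})
```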
At 520, heuristic information is determined based on the data analysis request. In an embodiment of the present disclosure, the heuristic information is information, other than the results produced for the user's data analysis request, that is used to guide the session between the user and the data analysis device so that the session is not interrupted and does not fail with an error. For example, the heuristic information may guide the user to clarify certain concepts in the data analysis request that the user entered, or provide the user with other extended information about the data analysis results, and so on.
According to embodiments of the present disclosure, the data analysis device 105 may determine heuristic information in a variety of ways. In some embodiments, the data analysis device 105 may analyze the request by extracting content items from the data analysis request, such as words or phrases relating to location, time, date, event, brand, category, and the like. In some alternative embodiments, the data analysis device 105 may also determine further content items that are related to the extracted content items. Subsequently, the data analysis device 105 may determine whether at least one operation to be applied to the data set can be determined based on the extracted content items and/or the further content items.
For example, the data analysis device 105 may perform linguistic analysis on the data analysis request to determine the part of speech that a word in the data analysis request has, such as "noun", "pronoun", "adverb", and the like, to determine the limiting effect of the word, such as "subject", "adverb", "predicate", and the like, and/or to determine other linguistic properties of the word. It should be understood that the above-mentioned linguistic analysis process may be implemented by using a conventional linguistic analysis algorithm, such as a Part-Of-Speech (POS) tagging algorithm, which will not be described herein.
Optionally, the data analysis device 105 may also detect the context of the data analysis request. In this process, the data analysis device 105 can determine under what circumstances the data analysis request is made, what content item the pronoun in the request refers to, what content is omitted from the request, and the like, by inquiring about the content input by the user within a predetermined period of time or in a predetermined number of sentences before the data analysis request is input.
The data analysis device 105 may then attempt to determine the at least one operation based on the content item, the linguistic analysis result, the context, and the predefined data model obtained above. The predefined data model may include a content item and one or more operations associated with the content item. In one embodiment, each operation of a content item may be associated with a different linguistic pattern, for example. In this case, the data analysis device 105 can determine the linguistic pattern of the content item from the above-described linguistic analysis result and context, and can further determine the operation of the content item having the linguistic pattern.
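The lookup described above can be illustrated with a minimal sketch. The layout of the data model, the pattern labels, and the operation names (`group_by_year`, `filter_by_year`, `group_by_country`) are assumptions made for illustration only and are not specified by the present disclosure.

```python
from typing import List, Optional, Tuple

# Hypothetical data model: content item -> {linguistic pattern -> operation name}.
DATA_MODEL = {
    "year": {"adverbial": "group_by_year", "noun": "filter_by_year"},
    "country": {"noun": "group_by_country"},
}


def determine_operation(content_item: str, pattern: str) -> Optional[str]:
    """Return the operation defined for a content item used with a given pattern."""
    return DATA_MODEL.get(content_item, {}).get(pattern)


def resolve(items: List[Tuple[str, str]]):
    """Split (item, pattern) pairs into resolved operations and unresolved items."""
    resolved, unresolved = [], []
    for item, pattern in items:
        operation = determine_operation(item, pattern)
        if operation is None:
            unresolved.append(item)  # no operation defined: needs clarification
        else:
            resolved.append((item, operation))
    return resolved, unresolved
```

Under these assumptions, a content item such as "dangerous" that has no entry in the model ends up in the unresolved list, which corresponds to the case in which the user is asked for clarification.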
In some embodiments, if no operation can be determined, the data analysis device 105 may consider the identified content item to have an unclear meaning that requires clarification by the user. In this case, the data analysis device 105 may generate a question for the data analysis request based on the identified content item to prompt the user to provide clarification information about the content item.
Alternatively, in other embodiments, if the data analysis device 105 is capable of determining one or more operations based on the identified content item, the data analysis device 105 may determine a code snippet for implementing the at least one operation and determine the aforementioned heuristic information based on the code snippet. According to an embodiment of the present disclosure, a code snippet is, for example, a program or a piece of code for implementing operations associated with a content item. The content item may be, for example, an input or part of an input of this code snippet and may have a different category, purpose or usage. A code snippet may comprise one or more operations that are performed in a given order. A code snippet may be a program that is generated on demand, dynamically and/or automatically, or a predefined program stored in a specific memory. It should be appreciated that code snippets are flexibly configurable and may be implemented in any suitable programming language or format; the examples given are not intended to limit the scope of the present disclosure in any way.
In some embodiments, if the data analysis device 105 determines a plurality of code snippets based on the identified content item, the data analysis device 105 may sort the plurality of code snippets, for example by scoring the code snippets according to the linguistic analysis results and/or contextual information for the data analysis request and then ordering them by score. A higher-scoring code snippet is more likely to satisfy the user's data analysis needs. The data analysis device 105 may provide the user 101 with data analysis results from the code snippet with the highest score. Further, the data analysis device 105 may provide options corresponding to code snippets having slightly lower scores, or results obtained from these code snippets, as heuristic information to the user 101. Such heuristic information includes extended information for the data analysis results, thereby increasing the likelihood of providing data analysis results that meet the user's needs.
According to some alternative embodiments of the present disclosure, the data analysis device 105 may also determine the heuristic information by expanding the data analysis request based on content items in the data analysis request, results for the data analysis request, predetermined expansion rules, and/or the like. In some embodiments, the data analysis device 105 may determine other operations associated with the results for the data analysis request, for example by extracting content items from the results and finding matching operations according to a pre-established data model. The data analysis device 105 may then derive a code snippet based on the determined operation and run the derived code snippet to obtain a result, which is subsequently provided to the user 101 as heuristic information.
Alternatively, in further embodiments, the data analysis device 105 may also apply the content items in the data analysis request, the content items extracted from the results for the data analysis request, and the like to a predetermined expansion rule, resulting in one or more expanded content items associated with the existing content items. The data analysis device 105 may then attempt to determine the operation associated with the expanded content item, obtain the corresponding code snippet, and obtain the extended analysis result by running the code snippet.
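The ranking step described above might be sketched as follows. The `ScoredSnippet` type, the numeric scores, and the `n_alternatives` parameter are hypothetical; the disclosure does not prescribe how scores are computed or how many lower-scoring options are offered.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScoredSnippet:
    name: str     # hypothetical identifier of a candidate code snippet
    score: float  # assumed to come from linguistic analysis and/or context


def rank_snippets(
    snippets: List[ScoredSnippet], n_alternatives: int = 2
) -> Tuple[ScoredSnippet, List[ScoredSnippet]]:
    """Return the best snippet and the next-best ones to offer as heuristic options."""
    if not snippets:
        raise ValueError("at least one candidate snippet is required")
    ordered = sorted(snippets, key=lambda s: s.score, reverse=True)
    return ordered[0], ordered[1:1 + n_alternatives]
```

In this sketch the highest-scoring snippet would produce the data analysis result, while the returned alternatives would be presented to the user as heuristic information.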
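As a rough illustration of applying a predetermined expansion rule, the sketch below maps existing content items to expanded content items. The rule table and its entries are invented for illustration; the disclosure does not state what form the expansion rules take.

```python
from typing import Dict, List

# Hypothetical expansion rules: content item -> expanded content items.
EXPANSION_RULES: Dict[str, List[str]] = {
    "year": ["outliers", "trend"],
    "country": ["region"],
}


def expand(items: List[str], rules: Dict[str, List[str]]) -> List[str]:
    """Return expanded content items that are not already present, without duplicates."""
    expanded: List[str] = []
    for item in items:
        for candidate in rules.get(item, []):
            if candidate not in items and candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

Each expanded content item would then be looked up in the data model in the same way as an extracted content item.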
At 530, heuristic information is provided to the user to enable the user to provide supplemental information associated with the data analysis request based on the heuristic information. In some embodiments, the data analysis device 105 may provide the extended analysis results obtained above to the user as heuristic information for the user to choose from.
In some alternative embodiments, the data analysis device 105 may also provide the user 101 with only an identification of the code snippet associated with the expanded content item as heuristic information, for example presented to the user in the form of a sequence number, a keyword, etc. Only when the user clicks or enters the respective sequence number or keyword does the data analysis device 105 run the corresponding code snippet to obtain the expanded analysis result. In this way, unnecessary consumption of system resources can be reduced, and operating efficiency and speed are improved.
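The deferred execution described above could look roughly like the following sketch. The `LazySnippetMenu` class is hypothetical, and the caching of results is an added assumption that the disclosure does not mention.

```python
from typing import Any, Callable, Dict, List


class LazySnippetMenu:
    """Offer snippet identifiers first; run a snippet only when it is selected."""

    def __init__(self, snippets: Dict[str, Callable[[], Any]]):
        self._snippets = dict(snippets)
        self._cache: Dict[str, Any] = {}  # assumption: results are reused once computed

    def options(self) -> List[str]:
        """Identifiers (e.g. keywords) shown to the user as heuristic information."""
        return list(self._snippets)

    def select(self, key: str) -> Any:
        """Run the chosen snippet on first selection and return its result."""
        if key not in self._cache:
            self._cache[key] = self._snippets[key]()
        return self._cache[key]
```

Only the snippet behind the selected identifier is executed, which is the source of the resource saving noted above.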
In yet further alternative embodiments, at 530, the data analysis device 105 may provide a question generated for the data analysis request based on the identified content item to the user as heuristic information to prompt the user 101 to provide clarification information about the content item.
According to an embodiment of the present disclosure, after receiving the heuristic information, the user 101 may provide supplemental information associated with the data analysis request. The supplemental information may be clarification information answering a question posed by the data analysis device 105 about the data analysis request, or selection information indicating whether to view extended analysis results and which extended analysis result to view. The data analysis device 105, upon receiving the supplemental information from the user, may determine a data analysis result associated with the supplemental information and provide the determined data analysis result to the user. In this way, the likelihood that the user obtains the required data analysis result can be effectively increased, and the user experience is improved.
According to an embodiment of the present disclosure, the data analysis device 105 stores at least one of the data analysis request and the supplemental information as a user profile for subsequent provision of relevant data analysis results based on the user profile. The user profile may include data analysis history information of the user, which may reflect the data analysis habits of the user.
In some embodiments, even if the user does not submit any data analysis requests, the data analysis device 105 may automatically mine the data in the data set according to the user profile, proactively providing heuristic information to the user. In some alternative or additional embodiments, the data analysis device 105 may also obtain profiles of other users and determine a data analysis policy based on the user's profile and/or the profiles of the other users. For example, when a majority of users have asked for outliers in the data over a past period of time, the data analysis policy may be "analyze outliers". It should be understood that this is merely exemplary. In embodiments of the present disclosure, the data analysis policy may also include average results, extrema, or other information of the analyzed data, may also include priorities or execution orders of the analyses, and may also include any other suitable analysis manner. The data analysis device 105 may then analyze the data according to the determined data analysis policy to obtain heuristic information and provide it to the user. The user may determine the data analysis results that the user wants to view based on the heuristic information. In this way, the user can be helped to understand the data better, and the data analysis dialog is effectively guided.
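One simple way to derive such a policy from several user profiles is a majority vote, sketched below. Representing a profile as a list of analysis kinds, and falling back to a default policy when no kind has a majority, are assumptions made for this example.

```python
from collections import Counter
from typing import List


def choose_policy(profiles: List[List[str]], default: str = "average") -> str:
    """Pick the analysis kind requested by more than half of the users, else a default.

    Each profile is assumed to be the list of analysis kinds a user has requested.
    """
    # Count each kind at most once per user so that one heavy user cannot dominate.
    counts = Counter(kind for history in profiles for kind in set(history))
    if not counts:
        return default
    kind, users = counts.most_common(1)[0]
    return kind if users > len(profiles) / 2 else default
```

With three profiles of which two contain "outliers", the sketch selects "outliers", matching the "analyze outliers" example above.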
An embodiment of a data analysis method in connection with a two-way conversation mode according to the present disclosure is now described in more detail in connection with fig. 6. Fig. 6 shows a flow diagram of a method 600 for data analysis according to an embodiment of the present disclosure. It should be appreciated that method 600 may be performed by processing unit 110 as described with reference to fig. 1 and may be considered a particular implementation of method 500. It should also be appreciated that the method 600 is merely exemplary and not limiting, and that the operations in the method 600 may be increased or decreased as appropriate, and may be performed in any other suitable order.
At 610, the data analysis device 105 receives a data analysis request from a user for a data set. Still referring to the embodiment shown in fig. 3, the data analysis request received by the data analysis device 105 from the user 101 is, for example, "please list dangerous countries annually," as shown at 310. At 620, the data analysis device 105 extracts content items from the data analysis request. With continued reference to the embodiment of fig. 3, the content items that the data analysis device 105 extracts from the data analysis request "please list dangerous countries annually" are, for example, "annual", "dangerous", and "country".
At 630, the data analysis device 105 determines whether an operation to be applied to the data set can be determined based on the content items. In the above-described embodiment, the content items "annual" and "country" both have associated operations defined in the data model, but no corresponding operation is defined for the content item "dangerous". The data analysis device 105 therefore cannot determine an operation associated with the content item "dangerous".
At 640, a question for the data analysis request is generated based on the content item as heuristic information. For example, the data analysis device 105 may generate a question for the data analysis request 102, such as "Please explain what 'dangerous' means in 'dangerous countries'?". At 680, the data analysis device 105 provides the heuristic information to the user, i.e., presents the user with the above-described question, as shown at 320 of FIG. 3.
At 690, supplemental information is received from the user. In the embodiment shown in FIG. 3, the user 101 enters the following clarification information: "the number of fatal attacks is greater than 100", as shown at 330, to explain the meaning of the content item "dangerous". The method 600 then returns to 620, where the data analysis device 105 extracts content items from the user's supplemental information. For example, the data analysis device 105 can extract the content items "fatal", "number of attacks", and the like from the supplemental information "the number of fatal attacks is greater than 100".
Then, at 630, the data analysis device 105 determines whether an operation to be applied to the data set can be determined based on the extracted content item. In this embodiment, assuming that both the "fatal" and the "number of attacks" belong to content items in the established data model, the data analysis device 105 can determine, at 640, the operations corresponding thereto to be applied to the data set 200 from the content items. Then, at 650, the data analysis device 105 determines the code snippet to implement the determined operation. At 660, the data analysis device 105 may generate a corresponding data analysis result by executing the code snippet and provide the result to the user 101.
It is to be understood that the above examples are illustrative only and not limiting. In still other embodiments according to the present disclosure, if the data analysis device 105 determines at 630 that the content item (e.g., "fatal" and "number of attacks") determined from the supplemental information still does not result in an operation to be applied to the data set, the data analysis device 105 can continue to provide heuristic information to the user 101 for the user 101 to provide further clarifying information.
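The clarification loop walked through above (extract content items, look up operations, ask a question when an item is undefined, then repeat on the user's answer) can be sketched as a toy example. The whitespace tokenizer, the stop-word list, the flat data model, the `ask_user` callback, and the `max_turns` limit are all assumptions introduced for illustration and are far simpler than the linguistic analysis described in the disclosure.

```python
from typing import Callable, Dict, List, Optional

STOP_WORDS = {"please", "list", "the", "of", "is"}  # hypothetical stop-word list


def extract_items(text: str) -> List[str]:
    """Naive content item extraction: lower-case words minus stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


def run_dialog(
    request: str,
    data_model: Dict[str, str],
    ask_user: Callable[[str], str],
    max_turns: int = 3,
) -> Optional[List[str]]:
    """Collect operations, asking for clarification while items remain undefined."""
    operations: List[str] = []
    pending = request
    for _ in range(max_turns):
        unknown = []
        for item in extract_items(pending):
            if item in data_model:
                operations.append(data_model[item])
            else:
                unknown.append(item)
        if not unknown:
            return operations  # every content item maps to an operation
        # Heuristic information: a question about the first undefined item.
        pending = ask_user(f"Please explain what '{unknown[0]}' means.")
    return None  # still unresolved after max_turns rounds
```

In this sketch, a request containing "dangerous" would trigger one question, and the operations found in the user's answer are appended to those already determined from the original request.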
In still other embodiments according to the present disclosure, if the data analysis device 105 receives a data analysis request at 610 as shown at 420 in fig. 4A, i.e., "yearly", the data analysis device 105 identifies the content item "yearly" from the request at 620. Since the data model has stored therein operations associated with "year," the data analysis device 105 determines 630 that operations to be applied to the data set can be determined based on the content item.
The data analysis device 105 then determines, at 650, the code snippet to implement the above-described operations. At 660, the data analysis device 105 may generate corresponding data analysis results by executing the code snippet and provide the results to the user 101, as shown at 430.
In some embodiments, the data analysis device 105 may also determine heuristic information based on the data analysis results. As shown in the embodiment of fig. 6, after the data analysis device 105 provides, at 660, the data analysis results (i.e., the curve of the number of attacks per year) to the user 101, an extended analysis result may also be determined at 670 based on the data analysis results and provided at 680 to the user 101 as heuristic information. This heuristic information is, for example, "want to learn more about the problem of outliers in 1960?", as shown at 440 in fig. 4, together with the corresponding candidates 441 and 442.
At 690, the data analysis device 105 receives supplemental information from the user, such as "good" as indicated by 443 in FIG. 4B. In some embodiments, through contextual analysis, the data analysis device 105 may determine the supplemental information entered by the user as "want to learn more about the 1960 outliers". The data analysis device 105 then extracts content items, such as "1960" and "outliers" etc., from the supplemental information at 620.
Then, at 630, the data analysis device 105 determines whether an operation to be applied to the data set can be determined based on the extracted content items. In this embodiment, since "1960" and "outliers" both belong to content items in the established data model, the data analysis device 105 can determine, at 640, the corresponding operations to be applied to the data set 200 from these content items. Then, at 650, the data analysis device 105 determines the code snippet to implement the determined operations. At 660, the data analysis device 105 generates corresponding data analysis results by executing the code snippet and provides the results to the user 101, as shown at 451 and 452.
It should be understood that the above examples are illustrative only and not limiting. In still other embodiments according to the present disclosure, after providing the above data analysis results, the data analysis device 105 may also continue to provide heuristic information, such as 453 in FIG. 4B, to the user 101 for the user to view more relevant data analysis results.
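The contextual step in the example above, in which a short reply such as "good" is interpreted as accepting the preceding suggestion, might be approximated as follows. The set of affirmative words and the function name are hypothetical, and real contextual analysis would be considerably richer.

```python
from typing import Optional

AFFIRMATIVE = {"good", "yes", "ok", "sure"}  # hypothetical affirmative replies


def resolve_reply(reply: str, last_suggestion: Optional[str]) -> str:
    """Expand a bare affirmative reply into the suggestion it accepts."""
    if last_suggestion and reply.strip().lower() in AFFIRMATIVE:
        return last_suggestion
    return reply
```

The expanded text can then be passed to content item extraction in the same way as any other supplemental information.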
According to embodiments of the present disclosure, the user interface between the user 101 and the data analysis device 105 may include one or more dialogs. In some embodiments, the dialog described above in connection with the embodiments of figs. 5-6 may be considered the first dialog in the user interface. Upon receiving another data analysis request from the user 101, the data analysis device 105 may establish, in a drag-and-drop manner, a second dialog different from the first dialog, and may provide a data analysis result for the other data analysis request to the user in the second dialog. Fig. 7 illustrates a user interface 700 for multiple dialogs according to an embodiment of the present disclosure. In the embodiment shown in FIG. 7, during the first dialog 710, the user 101 creates a second dialog 720 by clicking or dragging the text or button "gender" on the left side of the user interface (shown at 701). In this second dialog, the data analysis device 105 may provide the results of a data analysis request 721 for "gender," as shown at 722. In this way, a user can perform a plurality of data analysis tasks simultaneously in a plurality of dialogs, which improves working efficiency and makes the device easier to use.
According to further embodiments of the present disclosure, the user may be provided with an option in each dialog to present only the portions related to the data analysis results. Fig. 8 illustrates a user interface 800 according to an embodiment of the disclosure. In FIG. 8, the user makes only the portions 810, 820, and 830 relevant to the data analysis results visible by clicking on the relevant buttons or controls on the user interface. The user can arrange the view, resize these portions, and so on by dragging the portions 810, 820, and 830. At this time, portions related to the data analysis request, the heuristic information, and the like are not visible. In this way, the user can conveniently obtain the required data analysis results, for example by directly generating a report or a chart. This makes the device more convenient to use and improves the user experience.
According to an embodiment of the present disclosure, the data analysis device 105 may establish a user profile by learning from the data analysis requests of the user 101 to determine the user's preferences or interests. The data analysis device 105 may also update and refine the user profile in a dialog with the user 101 to better understand and analyze the user's needs and intentions. For example, with respect to the shark attack record data set 200, if the user 101 is preparing to write an article analyzing shark attacks in Australia, it is likely that the user does not require data analysis about the United States, but only about Australia. In this case, the data analysis device 105 may assign a higher score to code snippets consisting of various operations on the content item "Australia". In this way, the data analysis results provided to the user by the data analysis device 105 are likely to be relevant to Australia, so as to better meet the user's requirements.
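A profile-based score adjustment of this kind could be as simple as the following sketch. The additive `boost` and the representation of a profile as a collection of content items of interest are assumptions; the disclosure only states that such code snippets receive a higher score.

```python
from typing import Iterable


def profile_adjusted_score(
    base_score: float,
    snippet_items: Iterable[str],
    profile_interests: Iterable[str],
    boost: float = 0.5,
) -> float:
    """Raise a snippet's score for each content item that matches the user profile."""
    overlap = set(snippet_items) & set(profile_interests)
    return base_score + boost * len(overlap)
```

With a profile interested in "australia", a snippet operating on "australia" would outrank an otherwise equal snippet operating on "united states".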
The methods and functions described herein may be performed, at least in part, by one or more hardware logic components. By way of example, and not limitation, illustrative types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Some example implementations of the present disclosure are listed below.
Embodiments of the present disclosure include a computer-implemented method. The method comprises the following steps: receiving a data analysis request for a data set from a user in a dialog; determining, based on the data analysis request, heuristic information for directing the conversation that is different from a result for the data analysis request; and providing the heuristic information to the user to enable the user to provide supplemental information associated with the data analysis request based on the heuristic information.
In some embodiments, determining heuristic information comprises: extracting a content item from the data analysis request; determining whether at least one operation to be applied to the data set can be determined based on the identified content item; and in response to failing to determine the at least one operation based on the content item, generating a question for the data analysis request based on the identified content item, the question to motivate the user to provide clarification information about the content item.
In some embodiments, determining whether at least one operation to be applied to the data set can be determined based on the identified content item comprises: performing linguistic analysis on the data analysis request; detecting a context of the data analysis request; and attempting to determine the at least one operation based on the content item, the linguistic analysis result, the context, and a predefined data model.
In some embodiments, the method may further comprise: responsive to being able to determine the at least one operation based on the identified content item, determining a code snippet for implementing the at least one operation; and determining the heuristic information based on the code snippet.
In some embodiments, determining heuristic information comprises: generating the heuristic information by expanding the data analysis request based on at least one of: a content item in the data analysis request, a result for the data analysis request, and a predetermined expansion rule.
In some embodiments, the method may further comprise: in response to receiving the supplemental information from the user, determining a data analysis result associated with the supplemental information; and providing the determined data analysis results to the user.
In some embodiments, the dialog is a first dialog in a user interface, and the method may further include: establishing a second dialog different from the first dialog in a dragging manner in response to receiving a further data analysis request from the user; and providing data analysis results for the further data analysis request to the user in the second dialog.
In some embodiments, the method may further comprise: storing at least one of the data analysis request and the supplemental information as a user profile to provide relevant data analysis results based on the user profile.
Embodiments of the present disclosure include an electronic device comprising: a processing unit; a memory coupled to the processing unit and storing instructions that, when executed by the processing unit, perform the following: receiving a data analysis request for a data set from a user in a dialog; determining heuristic information for directing the dialog based on the data analysis request that is different from a result for the data analysis request; and providing the heuristic information to the user to enable the user to provide supplemental information associated with the data analysis request based on the heuristic information.
In some embodiments, determining heuristic information may comprise: extracting a content item from the data analysis request; determining whether at least one operation to be applied to the data set can be determined based on the identified content item; and in response to failing to determine the at least one operation based on the content item, generating a question for the data analysis request based on the identified content item, the question to motivate the user to provide clarification information about the content item.
In some embodiments, determining whether at least one operation to be applied to the data set can be determined based on the identified content item may include: performing linguistic analysis on the data analysis request; detecting a context of the data analysis request; and attempting to determine the at least one operation based on the content item, the linguistic analysis result, the context, and a predefined data model.
In some embodiments, the actions may further include: responsive to being able to determine the at least one operation based on the identified content item, determining a code snippet for implementing the at least one operation; and determining the heuristic information based on the code snippet.
In some embodiments, determining heuristic information may comprise: generating the heuristic information by expanding the data analysis request based on at least one of: a content item in the data analysis request, a result for the data analysis request, and a predetermined expansion rule.
In some embodiments, the actions may further include: in response to receiving the supplemental information from the user, determining a data analysis result associated with the supplemental information; and providing the determined data analysis results to the user.
In some embodiments, the dialog is a first dialog in a user interface, and the actions may further include: establishing a second dialog different from the first dialog in a dragging manner in response to receiving a further data analysis request from the user; and providing data analysis results for the further data analysis request to the user in the second dialog.
In some embodiments, the actions may further include: storing at least one of the data analysis request and the supplemental information as a user profile to provide relevant data analysis results based on the user profile.
Embodiments of the present disclosure also provide a computer program product stored in a non-transitory computer storage medium and comprising machine executable instructions that, when run in a device, cause the device to: receiving a data analysis request for a data set from a user in a dialog; determining heuristic information for directing the dialog based on the data analysis request that is different from a result for the data analysis request; and providing the heuristic information to the user to enable the user to provide supplemental information associated with the data analysis request based on the heuristic information.
In some embodiments, the machine-executable instructions, when executed in a device, further cause the device to: extracting a content item from the data analysis request; determining whether at least one operation to be applied to the data set can be determined based on the identified content item; and in response to failing to determine the at least one operation based on the content item, generating a question for the data analysis request based on the identified content item, the question to motivate the user to provide clarification information about the content item.
In some embodiments, the machine-executable instructions, when executed in a device, further cause the device to: performing linguistic analysis on the data analysis request; detecting a context of the data analysis request; and attempting to determine the at least one operation based on the content item, the linguistic analysis result, the context, and a predefined data model.
In some embodiments, the machine-executable instructions, when executed in a device, further cause the device to: in response to being able to determine the at least one operation based on the identified content item, determining a code snippet to implement the at least one operation; and determining the heuristic information based on the code snippet.
In some embodiments, the machine-executable instructions, when executed in a device, further cause the device to: generating the heuristic information by expanding the data analysis request based on at least one of: a content item in the data analysis request, a result for the data analysis request, and a predetermined expansion rule.
In some embodiments, the machine-executable instructions, when executed in a device, further cause the device to: in response to receiving the supplemental information from the user, determining a data analysis result associated with the supplemental information; and providing the determined data analysis results to the user.
In some embodiments, the dialog is a first dialog in a user interface, and the machine-executable instructions, when executed in a device, further cause the device to: establishing a second dialog different from the first dialog in a dragging manner in response to receiving a further data analysis request from the user; and providing data analysis results for the further data analysis request to the user in the second dialog.
In some embodiments, the machine-executable instructions, when executed in a device, further cause the device to: store at least one of the data analysis request and the supplemental information as a user profile to provide relevant data analysis results based on the user profile.
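The profile embodiment could be sketched as below, assuming the profile is kept in memory and that supplemental information takes the form of a term and the meaning the user gave it. The `UserProfile` class and its `record` and `resolve` methods are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UserProfile:
    """Hypothetical per-user store of past requests and clarifications."""
    requests: List[str] = field(default_factory=list)
    clarifications: Dict[str, str] = field(default_factory=dict)

    def record(self, request: str, term: str, meaning: str) -> None:
        """Store a request together with the clarification the user supplied for a term."""
        self.requests.append(request)
        self.clarifications[term.lower()] = meaning

    def resolve(self, term: str) -> Optional[str]:
        """Return the stored meaning of a term, or None if the user never clarified it."""
        return self.clarifications.get(term.lower())
```

With such a profile, a later request that reuses a clarified term could be answered directly instead of asking the same question again.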
Although the disclosure has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (20)

1. A computer-implemented method, comprising:
receiving a data analysis request for a data set from a user in a dialog;
extracting a content item from the data analysis request;
comparing the extracted content items to a data model for the data set, the data model including a plurality of content items defined in the data model and one or more operations associated with each of the plurality of content items to be applied to the data set, the one or more operations based on at least one of: historical statistics, a profile or preference of the user, or a record of access by a plurality of users;
based on the comparison, determining that the extracted content item and associated operations to be applied to the data set are undefined in the data model;
in response to the determination, generating heuristic information for directing the dialog to prompt the user to provide clarification information associated with the extracted content item and different from a result of the request for data analysis; and
providing the heuristic information to the user to enable the user to provide supplemental information associated with the data analysis request based on the heuristic information.
2. The method of claim 1, wherein generating the heuristic information for directing the dialog to prompt the user to provide the clarification information associated with the extracted content item comprises:
generating a question for the data analysis request based on the extracted content item.
3. The method of claim 1, further comprising:
performing linguistic analysis on the data analysis request;
detecting a context of the data analysis request; and
generating the heuristic information in response to determining that the associated operation to be applied to the dataset cannot be determined based on the linguistic analysis results, the context, and the associated operation that is not defined in the data model.
4. The method of claim 1, further comprising:
extracting at least one additional content item from the data analysis request;
determining that the at least one additional content item is one of the plurality of content items and has at least one associated operation defined in the data model;
determining a code snippet for implementing the at least one associated operation defined in the data model; and
determining the heuristic information based on the code snippet.
5. The method of claim 1, wherein the heuristic information is generated based on at least one of:
the extracted content item in the data analysis request, and
predefined rules for extending the data analysis request.
6. The method of claim 1, further comprising:
receiving the supplemental information from the user;
in response to receiving the supplemental information from the user, determining a data analysis result associated with the supplemental information; and
providing the determined data analysis results to the user.
7. The method of claim 6, wherein the determined data analysis results are provided to the user as one of:
a graph;
a table;
a text;
audio; and
video.
8. The method of claim 6, wherein the supplemental information received from the user includes at least one content item from the plurality of content items in the data model, and wherein determining the data analysis result associated with the supplemental information includes applying the corresponding one or more operations defined for the at least one content item in the data model to the data set.
9. The method of claim 1, further comprising:
receiving the supplemental information from the user; and
storing at least one of the data analysis request and the supplemental information as a user profile to provide relevant data analysis results based on the user profile.
10. The method of claim 1, wherein generating the heuristic information for directing the dialog to prompt the user to provide the clarification information associated with the extracted content item further comprises:
providing a list of options selectable by the user.
11. The method of claim 1, wherein the data set includes a table having a plurality of rows representing data records and a plurality of columns representing data dimensions of the data records, and the plurality of content items in the data model includes at least the data dimensions of the data records.
12. A computing device, comprising:
a processing unit; and
a memory coupled to the processing unit and storing instructions that, when executed by the processing unit, cause the computing device to perform a set of operations comprising:
receiving a data analysis request for a data set from a user in a dialog;
extracting a content item from the data analysis request;
comparing the extracted content items to a data model for the data set, the data model including a plurality of content items defined in the data model and one or more operations associated with each of the plurality of content items to be applied to the data set, the one or more operations based on at least one of: historical statistics, the user's profile or preferences, or access records of multiple users;
based on the comparison, determining that the extracted content item and associated operation to be applied to the data set are undefined in the data model;
in response to the determination, generating heuristic information for directing the dialog to prompt the user to provide clarification information associated with the extracted content item and different from a result of the data analysis request; and
providing the heuristic information to the user to enable the user to provide supplemental information associated with the data analysis request based on the heuristic information.
13. The computing device of claim 12, wherein generating the heuristic information for directing the dialog to prompt the user to provide the clarification information associated with the extracted content item comprises:
generating a question for the data analysis request based on the content item.
14. The computing device of claim 12, wherein the heuristic information is generated based on at least one of:
the extracted content item in the data analysis request, and
predefined rules for extending the data analysis request.
15. The computing device of claim 12, the set of operations further comprising:
receiving the supplemental information from the user;
in response to receiving the supplemental information from the user, determining a data analysis result associated with the supplemental information; and
providing the determined data analysis results to the user.
16. The computing device of claim 12, wherein the dialog is a first dialog in a user interface, and the set of operations further comprises:
in response to receiving a second data analysis request from the user, establishing a second dialog different from the first dialog in a drag-and-drop manner; and
providing data analysis results for the second data analysis request to the user in the second dialog.
17. The computing device of claim 12, the set of operations further comprising:
receiving the supplemental information from the user; and
storing at least one of the data analysis request and the supplemental information as a user profile to provide relevant data analysis results based on the user profile.
18. The computing device of claim 12, the set of operations further comprising:
performing linguistic analysis on the data analysis request;
detecting a context of the data analysis request; and
generating the heuristic information in response to determining that the associated operation to be applied to the dataset cannot be determined based on the linguistic analysis results, the context, and the associated operation that is not defined in the data model.
19. The computing device of claim 12, the set of operations further comprising:
extracting at least one additional content item from the data analysis request;
determining that the at least one additional content item is one of the plurality of content items and has at least one associated operation defined in the data model;
determining a code snippet for implementing the at least one associated operation defined in the data model; and
determining the heuristic information based on the code snippet.
20. A non-transitory machine-readable medium storing machine-executable instructions that, when executed in a device, cause the device to perform operations comprising:
receiving a data analysis request for a data set from a user in a dialog;
extracting a content item from the data analysis request;
comparing the extracted content items to a data model for the data set, the data model including a plurality of content items defined in the data model and one or more operations associated with each of the plurality of content items to be applied to the data set, the one or more operations based on at least one of: historical statistics, a user's profile or preferences, or access records for multiple users;
based on the comparison, determining that the extracted content item and associated operations to be applied to the data set are undefined in the data model;
in response to the determination, generating heuristic information for directing the dialog to prompt the user to provide clarification information associated with the extracted content item and different from a result of the request for data analysis; and
providing the heuristic information to the user to enable the user to provide supplemental information associated with the data analysis request based on the heuristic information.
CN201610867019.5A 2016-09-29 2016-09-29 Conversational data analysis Active CN107885744B (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
CN202211627592.0A CN115858730A (en) 2016-09-29 2016-09-29 Conversational data analysis
CN201610867019.5A CN107885744B (en) 2016-09-29 2016-09-29 Conversational data analysis
US16/338,061 US11423229B2 (en) 2016-09-29 2017-09-22 Conversational data analysis
EP17780278.2A EP3519988A1 (en) 2016-09-29 2017-09-22 Conversational data analysis
PCT/US2017/052839 WO2018063924A1 (en) 2016-09-29 2017-09-22 Conversational data analysis
US17/813,435 US20220405479A1 (en) 2016-09-29 2022-07-19 Conversational data analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610867019.5A CN107885744B (en) 2016-09-29 2016-09-29 Conversational data analysis

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202211627592.0A Division CN115858730A (en) 2016-09-29 2016-09-29 Conversational data analysis

Publications (2)

Publication Number Publication Date
CN107885744A CN107885744A (en) 2018-04-06
CN107885744B true CN107885744B (en) 2023-01-03

Family

ID=60020626

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202211627592.0A Pending CN115858730A (en) 2016-09-29 2016-09-29 Conversational data analysis
CN201610867019.5A Active CN107885744B (en) 2016-09-29 2016-09-29 Conversational data analysis

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202211627592.0A Pending CN115858730A (en) 2016-09-29 2016-09-29 Conversational data analysis

Country Status (4)

Country Link
US (2) US11423229B2 (en)
EP (1) EP3519988A1 (en)
CN (2) CN115858730A (en)
WO (1) WO2018063924A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10896297B1 (en) 2017-12-13 2021-01-19 Tableau Software, Inc. Identifying intent in visual analytical conversations
US11055489B2 (en) * 2018-10-08 2021-07-06 Tableau Software, Inc. Determining levels of detail for data visualizations using natural language constructs
US11966568B2 (en) 2018-10-22 2024-04-23 Tableau Software, Inc. Generating data visualizations according to an object model of selected data sources
US11314817B1 (en) 2019-04-01 2022-04-26 Tableau Software, LLC Methods and systems for inferring intent and utilizing context for natural language expressions to modify data visualizations in a data visualization interface
US11455339B1 (en) 2019-09-06 2022-09-27 Tableau Software, LLC Incremental updates to natural language expressions in a data visualization user interface
US10997217B1 (en) 2019-11-10 2021-05-04 Tableau Software, Inc. Systems and methods for visualizing object models of database tables
US11714807B2 (en) * 2019-12-24 2023-08-01 Sap Se Platform for conversation-based insight search in analytics systems

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101124578A (en) * 2005-01-14 2008-02-13 国际商业机器公司 Sharable multi-tenant reference data utility and repository, including value enhancement and on-demand data delivery and methods of operation
CN101364229A (en) * 2008-10-06 2009-02-11 中国移动通信集团设计院有限公司 Database host resource prediction method based on time capability analysis
CN103295148A (en) * 2012-02-27 2013-09-11 埃森哲环球服务有限公司 Digital consumer data model and customer analytic record
CN104077347A (en) * 2013-03-26 2014-10-01 国际商业机器公司 Method and a system for profiling social trendsetters on a communications network

Family Cites Families (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070294229A1 (en) 1998-05-28 2007-12-20 Q-Phrase Llc Chat conversation methods traversing a provisional scaffold of meanings
US7798417B2 (en) * 2000-01-03 2010-09-21 Snyder David M Method for data interchange
WO2002073331A2 (en) 2001-02-20 2002-09-19 Semantic Edge Gmbh Natural language context-sensitive and knowledge-based interaction environment for dynamic and flexible product, service and information search and presentation applications
US8015143B2 (en) 2002-05-22 2011-09-06 Estes Timothy W Knowledge discovery agent system and method
US7783486B2 (en) 2002-11-22 2010-08-24 Roy Jonathan Rosser Response generator for mimicking human-computer natural language conversation
WO2007134402A1 (en) 2006-05-24 2007-11-29 Mor(F) Dynamics Pty Ltd Instant messaging system
US8788517B2 (en) * 2006-06-28 2014-07-22 Microsoft Corporation Intelligently guiding search based on user dialog
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US8949377B2 (en) 2008-05-21 2015-02-03 The Delfin Project, Inc. Management system for a conversational system
US8375014B1 (en) * 2008-06-19 2013-02-12 BioFortis, Inc. Database query builder
US9292577B2 (en) * 2010-09-17 2016-03-22 International Business Machines Corporation User accessibility to data analytics
WO2012135226A1 (en) 2011-03-31 2012-10-04 Microsoft Corporation Augmented conversational understanding architecture
US9842168B2 (en) * 2011-03-31 2017-12-12 Microsoft Technology Licensing, Llc Task driven user intents
US20120253789A1 (en) * 2011-03-31 2012-10-04 Microsoft Corporation Conversational Dialog Learning and Correction
US20120260263A1 (en) 2011-04-11 2012-10-11 Analytics Intelligence Limited Method, system and program for data delivering using chatbot
US20120306741A1 (en) 2011-06-06 2012-12-06 Gupta Kalyan M System and Method for Enhancing Locative Response Abilities of Autonomous and Semi-Autonomous Agents
KR101402506B1 (en) 2011-12-01 2014-06-03 라인 가부시키가이샤 System and method for providing information interactively by instant messaging application
US9020824B1 (en) 2012-03-09 2015-04-28 Google Inc. Using natural language processing to generate dynamic content
WO2013155619A1 (en) 2012-04-20 2013-10-24 Sam Pasupalak Conversational agent
US9424233B2 (en) * 2012-07-20 2016-08-23 Veveo, Inc. Method of and system for inferring user intent in search input in a conversational interaction system
US9465833B2 (en) * 2012-07-31 2016-10-11 Veveo, Inc. Disambiguating user intent in conversational interaction system for large corpus information retrieval
US9269354B2 (en) 2013-03-11 2016-02-23 Nuance Communications, Inc. Semantic re-ranking of NLU results in conversational dialogue applications
US10572473B2 (en) 2013-10-09 2020-02-25 International Business Machines Corporation Optimized data visualization according to natural language query
US9189742B2 (en) * 2013-11-20 2015-11-17 Justin London Adaptive virtual intelligent agent
WO2015100362A1 (en) 2013-12-23 2015-07-02 24/7 Customer, Inc. Systems and methods for facilitating dialogue mining
US9335911B1 (en) * 2014-12-29 2016-05-10 Palantir Technologies Inc. Interactive user interface for dynamic data analysis exploration and query processing
US10558688B1 (en) * 2015-04-15 2020-02-11 Arimo, LLC Natural language interface for data analysis
EP3142028A3 (en) * 2015-09-11 2017-07-12 Google, Inc. Handling failures in processing natural language queries through user interactions
CN105512228B (en) * 2015-11-30 2018-12-25 北京光年无限科技有限公司 A kind of two-way question and answer data processing method and system based on intelligent robot
EP3267374A1 (en) * 2016-07-04 2018-01-10 Mu Sigma Business Solutions Pvt. Ltd. Guided analytics system and method
US9807037B1 (en) * 2016-07-08 2017-10-31 Asapp, Inc. Automatically suggesting completions of text

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101124578A (en) * 2005-01-14 2008-02-13 国际商业机器公司 Sharable multi-tenant reference data utility and repository, including value enhancement and on-demand data delivery and methods of operation
CN101364229A (en) * 2008-10-06 2009-02-11 中国移动通信集团设计院有限公司 Database host resource prediction method based on time capability analysis
CN103295148A (en) * 2012-02-27 2013-09-11 埃森哲环球服务有限公司 Digital consumer data model and customer analytic record
CN104077347A (en) * 2013-03-26 2014-10-01 国际商业机器公司 Method and a system for profiling social trendsetters on a communications network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Tools for analyzing qualitative data: the history and relevance of qualitative data analysis of software;Linda S et al.;《Handbook of Research on Educational Communications and Technology》;20130101;221-236 *
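Research on the application of big data technology in the small and micro business of commercial banks;Xiong Fuping;《Commercial Bank Management》;20160910;53-55 *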

Also Published As

Publication number Publication date
WO2018063924A1 (en) 2018-04-05
CN107885744A (en) 2018-04-06
CN115858730A (en) 2023-03-28
EP3519988A1 (en) 2019-08-07
US11423229B2 (en) 2022-08-23
US20190236144A1 (en) 2019-08-01
US20220405479A1 (en) 2022-12-22

Similar Documents

Publication Publication Date Title
CN107885744B (en) Conversational data analysis
Yu et al. FlowSense: A natural language interface for visual data exploration within a dataflow system
CN110888990B (en) Text recommendation method, device, equipment and medium
US10733197B2 (en) Method and apparatus for providing information based on artificial intelligence
US10061766B2 (en) Systems and methods for domain-specific machine-interpretation of input data
US20160140221A1 (en) Display apparatus and method for summarizing of document
US9817821B2 (en) Translation and dictionary selection by context
US20140298199A1 (en) User Collaboration for Answer Generation in Question and Answer System
US20150081277A1 (en) System and Method for Automatically Classifying Text using Discourse Analysis
US20170308571A1 (en) Techniques for utilizing a natural language interface to perform data analysis and retrieval
CN109948121A (en) Article similarity method for digging, system, equipment and storage medium
US10089366B2 (en) Topical analytics for online articles
US11269942B2 (en) Automatic keyphrase extraction from text using the cross-entropy method
US20160171063A1 (en) Modeling actions, consequences and goal achievement from social media and other digital traces
US10073828B2 (en) Updating language databases using crowd-sourced input
US11481733B2 (en) Automated interfaces with interactive keywords between employment postings and candidate profiles
CN114579104A (en) Data analysis scene generation method, device, equipment and storage medium
CN113988057A (en) Title generation method, device, equipment and medium based on concept extraction
CN113010678A (en) Training method of classification model, text classification method and device
WO2019085118A1 (en) Topic model-based associated word analysis method, and electronic apparatus and storage medium
US20220092452A1 (en) Automated machine learning tool for explaining the effects of complex text on predictive results
US20180336242A1 (en) Apparatus and method for generating a multiple-event pattern query
EP2800014A1 (en) Method for searching curriculum vitae's on a job portal website, server and computer program product therefore
Karmaker et al. Performance analysis of frequency and graph theoretic based text summarization
Radu et al. Project initiation and project management approach-an expensive connection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant