CN110837545A - Interactive data analysis method, device, medium and electronic equipment - Google Patents

Info

Publication number
CN110837545A
Authority
CN
China
Prior art keywords
data
user
target
data analysis
query statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911106220.1A
Other languages
Chinese (zh)
Inventor
杜鑫惠
赖昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Xinkai Life Technology Co Ltd
Guizhou Medical Duyun Technology Co Ltd
Original Assignee
Tianjin Xinkai Life Technology Co Ltd
Guizhou Medical Duyun Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Xinkai Life Technology Co Ltd, Guizhou Medical Duyun Technology Co Ltd filed Critical Tianjin Xinkai Life Technology Co Ltd
Priority to CN201911106220.1A priority Critical patent/CN110837545A/en
Publication of CN110837545A publication Critical patent/CN110837545A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results

Abstract

Embodiments of the present disclosure provide an interactive data analysis method, apparatus, medium, and electronic device, and relate to the technical field of natural language processing. The interactive data analysis method includes: receiving a user's natural-language query statement about a target data combination; extracting a keyword sequence from the query statement according to a preset keyword extraction order; determining, according to a preset association relationship, configuration data of the data cube corresponding to the keyword sequence; and generating a target chart according to the configuration data of the data cube, so that the user can analyze the target data combination from the target chart. The technical solution of the embodiments lowers the operating difficulty of data analysis and helps improve data analysis efficiency.

Description

Interactive data analysis method, device, medium and electronic equipment
Technical Field
The present disclosure relates to the field of natural language processing technologies, and in particular, to an interactive data analysis method, an interactive data analysis device, an interactive data analysis medium, and an electronic device.
Background
Natural language processing (NLP) uses technical means to extract and apply valuable information from natural-language text, paragraphs, and the like that a computer cannot process directly. Common natural language processing applications include word segmentation of sentences, key information summarization, similar sentence search, sentiment analysis, and so on.
In Online Analytical Processing (OLAP), a data analyst analyzes and processes data from multiple perspectives through human-computer interaction, which is also generally called multidimensional data analysis. In particular, by providing operations such as roll-up, drill-down, and slicing, OLAP lets a data analyst analyze large-scale data flexibly and in real time, providing reference and support for decisions. It is mainly used in scenarios such as decision support systems, business intelligence, and data warehouses.
However, the interactive data analysis scheme provided by the related art has high operational difficulty, and the data analysis efficiency needs to be improved.
It should be noted that the information disclosed in the above related art section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide an interactive data analysis method, apparatus, system, medium, and electronic device, so as to reduce the operation difficulty at least to a certain extent and improve the data analysis efficiency.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the embodiments of the present disclosure, there is provided an interactive data analysis method, including:
receiving a user's natural-language query statement about a target data combination;
extracting a keyword sequence from the query statement according to a preset keyword extraction order;
determining, according to a preset association relationship, configuration data of the data cube corresponding to the keyword sequence;
and generating a target chart according to the configuration data of the data cube, so that the user can analyze the target data combination from the target chart.
In some embodiments of the present disclosure, based on the foregoing solution, the query statement is received through a data receiving control, and receiving the user's natural-language query statement about the target data combination includes:
in response to the data receiving control being triggered, displaying prompt words to the user in real time, so that the user determines the query statement based on the prompt words.
In some embodiments of the present disclosure, based on the foregoing solution, the query statement is received through a data receiving control, and receiving the user's natural-language query statement about the target data combination includes:
in response to the data receiving control being triggered, displaying a plurality of prompt words to the user in real time according to the keyword extraction order, so that the user determines the query statement based on the plurality of prompt words.
In some embodiments of the present disclosure, based on the foregoing solution, after the plurality of prompt words are displayed to the user in real time according to the keyword extraction order, the method further includes:
in response to the data receiving control receiving a word to be completed whose similarity to a fixed collocation phrase is greater than a preset threshold, completing the word according to the fixed collocation phrase.
In some embodiments of the present disclosure, based on the foregoing solution, the keyword extraction order is, in sequence: display-type keywords, filter keywords, index keywords, connective keywords, and dimension keywords.
In some embodiments of the present disclosure, based on the foregoing solution, the display-type keywords include at least one of "statistics", "display", and "show"; the index keywords are used for displaying data on the row axis of the target chart, and the dimension keywords are used for classifying data on the column axis of the target chart.
In some embodiments of the present disclosure, based on the foregoing scheme, after receiving a query statement in natural language form of a user about a target data combination, the method further includes:
checking the validity of the received query statement according to the keyword extraction order.
According to a second aspect of the embodiments of the present disclosure, there is provided an interactive data analysis apparatus, including:
a receiving module configured to receive a user's natural-language query statement about a target data combination;
an extraction module configured to extract a keyword sequence from the query statement according to a preset keyword extraction order;
a determination module configured to determine, according to a preset association relationship, configuration data of the data cube corresponding to the keyword sequence;
a generating module configured to generate a target chart according to the configuration data of the data cube, so that the user can analyze the target data combination from the target chart.
According to a third aspect of embodiments of the present disclosure, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the interactive data analysis method as described in the first aspect of the embodiments above.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the interactive data analysis method as described in the first aspect of the embodiments above.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
In some embodiments of the present disclosure, a natural-language query statement about a target data combination entered by a user is received. The query statement may be a single sentence, for example: "show the last week's classification by visit type". A keyword sequence is then extracted from the query statement according to a preset keyword extraction order, for example: [show, last week, number of people, by, visit type, classified]. Configuration data of the data cube corresponding to the keyword sequence is then determined according to a preset association relationship, and a chart is generated from that configuration data, so that the user can analyze the target data combination from the visualized chart. With this technical solution, a user analyzing data does not need to understand or operate obscure configuration data, and can analyze the relevant data directly through a natural-language query statement. This lowers the operating difficulty of data analysis and improves data analysis efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 illustrates a system architecture diagram for implementing an interactive data analysis method in an exemplary embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram showing an interface for use of an OLAP product in the related art;
FIG. 3 shows a flow diagram of an interactive data analysis method according to an embodiment of the present disclosure;
FIG. 4 illustrates a schematic structural diagram of a usage interface of an OLAP product according to an embodiment of the present disclosure;
FIG. 5 shows a schematic structural diagram of a usage interface of an OLAP product according to another embodiment of the present disclosure;
FIG. 6 shows a schematic structural diagram of an interactive data analysis apparatus according to an embodiment of the present disclosure;
FIG. 7 shows a schematic structural diagram of a computer storage medium in an exemplary embodiment of the present disclosure; and
FIG. 8 shows a schematic structural diagram of an electronic device in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
In the present exemplary embodiment, the device to be tested refers to an electronic device running an operating system such as iOS, Android, or Firefox, for example a server or a mobile phone.
The present exemplary embodiment first provides a system architecture for implementing an interactive data analysis method, which can be applied to various recognition scenarios, such as image recognition, behavior recognition, and the like. Referring to fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send request instructions or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a photo processing application, a shopping application, a web browser application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, for example a background management server that supports shopping websites browsed by users on the terminal devices 101, 102, 103 (this is only an example). The background management server may analyze and otherwise process received data such as product information query requests, and feed the processing result (for example, target push information or product information; again only an example) back to the terminal device.
With the widespread use of big data technologies, the demand for data analysis keeps growing. OLAP analysis is an important type of data analysis: it needs no preset analysis template, analysis can be performed immediately as required, and it is flexible and fast, so it has been very widely applied.
Referring to the usage interface 200 of an OLAP product in the related art shown in FIG. 2, the user must define the filter setting area 21, the dimension setting area 22, and the index setting area 23, after which the OLAP product obtains the defined configuration information, such as the selected dimensions, indexes, and filters. The OLAP engine then computes and returns a chart based on the configuration information for the user to analyze.
However, the related art demands a high level of expertise from OLAP users, who generally need training to become proficient. For products built around OLAP technology, a target user who is not a data analyst and does not do data analysis work has no concept of the terms and operations involved in OLAP, finds the product hard to use directly the first time, and may become proficient only after a long time.
Based on the system architecture 100 in fig. 1, an interactive data analysis method is provided in this example, which at least to some extent solves the above problems in the related art. Referring to fig. 3, the interactive data analysis method may include the steps of:
step S310, receiving a user's natural-language query statement about a target data combination;
step S320, extracting a keyword sequence from the query statement according to a preset keyword extraction order;
step S330, determining, according to a preset association relationship, configuration data of the data cube corresponding to the keyword sequence; and
step S340, generating a target chart according to the configuration data of the data cube, so that the user can analyze the target data combination from the target chart.
In the technical solution of the embodiment shown in FIG. 3, a user performing data analysis does not need to understand or operate obscure configuration data, but interacts with the OLAP product directly through a natural-language query statement, so that the relevant data can be analyzed conveniently. The technical solution of this embodiment lowers the operating difficulty of data analysis and helps improve data analysis efficiency.
Implementation details of the various steps shown in FIG. 3 are set forth below:
in an exemplary embodiment, referring to fig. 4, a usage interface 400 of an OLAP product according to an embodiment of the present disclosure is shown, wherein the usage interface of the OLAP product has two main elements: a query sentence input box 41 and a chart presentation area 42.
The query statement input box 41 obtains query text that the user fills in or enters in natural language, for example: "count present visit times and total cost" or "display department name by person and gender classification". The system of the OLAP product parses the natural-language text entered by the user to obtain the data cube (Cube) for the relevant configuration information, and then outputs a corresponding trend chart or number card in the chart display area 42 based on that Cube.
Specifically, in the OLAP product, a related Cube is defined for each configuration (including dimensions, indexes, filters, and so on), and different Cubes support different query conditions. Because the data is typically multi-dimensional, it is figuratively called a data cube, or Cube. Here, "dimension" is an OLAP term for the field used to classify data on the horizontal axis (or column axis) of the chart; "index" is an OLAP term for the data values displayed on the vertical axis (or row axis) of the chart; and "filter" is an OLAP term for the condition that screens the data participating in OLAP analysis, so that only data satisfying the filter condition takes part in the calculation.
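The roles of dimension, index, and filter described above can be sketched in a few lines. This is only an illustrative sketch: the record layout, the field names, and the `evaluate_cube` helper are assumptions made for the example, not part of the disclosed OLAP engine.

```python
from collections import defaultdict

# Hypothetical visit records; the field names are illustrative only.
RECORDS = [
    {"month": "2019-10", "department": "A", "visits": 3, "total_cost": 120.0},
    {"month": "2019-11", "department": "A", "visits": 2, "total_cost": 80.0},
    {"month": "2019-11", "department": "B", "visits": 5, "total_cost": 200.0},
]

def evaluate_cube(records, dimension, indexes, filters):
    """Filter rows, group them by the dimension, and sum each index per group."""
    result = defaultdict(lambda: {name: 0 for name in indexes})
    for row in records:
        # Only rows satisfying every filter condition take part in the calculation.
        if all(row.get(field) == value for field, value in filters.items()):
            for name in indexes:
                result[row[dimension]][name] += row[name]
    return dict(result)

summary = evaluate_cube(
    RECORDS,
    dimension="department",
    indexes=["visits", "total_cost"],
    filters={"month": "2019-11"},
)
print(summary)
```

In this sketch the dimension supplies the categories for the horizontal axis, and each summed index supplies a series of values for the vertical axis.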
In an exemplary embodiment, the present solution interacts with the OLAP product by way of natural language processing (including semantic analysis). The interaction mode includes but is not limited to the derivation interaction mode and the use of mutual substitution mode. Since the generalization ability of language analysis is strong, it can be applied to various language expression modes. For example, the query sentence may be input by a speech method or may be input by a text method.
In step S310, the OLAP product receives the user's natural-language query statement about a target data combination. For example, if the OLAP product is provided in a device capable of receiving voice, the user may speak "show last week's classification by visit type" to the device, so that the OLAP product receives the user's natural-language query statement about the target data combination. After receiving the user's spoken query, the OLAP product converts the speech to text and then extracts the keywords through natural language processing. As another example, the user may type the query statement directly into the relevant input box, and the OLAP product parses and interprets the query statement through natural language parsing in order to display the chart in the next step.
In the above embodiment, a query statement about the same target data combination can be expressed in many ways, so different users may provide different query statements for the same target data combination. To further speed up the parsing of query statements, this embodiment also provides prompts for the query statement, which standardize its form of expression and further lower the threshold for using the OLAP product. In other words, to make the product easier to use and reduce the difficulty of natural language parsing, the technical solution supports an instant input prompt function. Specifically:
In an exemplary embodiment, the OLAP product receives the query statement through a data receiving control (such as the query statement input box 41 described above), and a specific implementation of step S310 may be: in response to the data receiving control being triggered, displaying prompt words to the user in real time so that the user can determine the query statement based on the prompt words.
Referring to fig. 3, in step S320, a keyword sequence is extracted in the query sentence according to a preset keyword extraction order.
In this technical solution, the keyword sequence is obtained by following a set keyword extraction order, which improves the parsing accuracy of the query statement and, in turn, the display accuracy of the target chart.
In an exemplary embodiment, in the specific implementation of step S310, a plurality of prompt words may also be displayed to the user in real time according to the keyword extraction order in response to the data receiving control being triggered, so that the user determines the query statement based on those prompt words. A query statement composed this way makes it easier to extract the keyword sequence according to the keyword extraction order in step S320, which improves the extraction efficiency.
In an exemplary embodiment, the keyword extraction order is, in sequence: display-type keywords, filter keywords, index keywords, connective keywords, and dimension keywords. Specifically, the display-type keywords include at least one of "statistics", "display", and "show"; the index keywords are used for displaying data on the row axis of the target chart, and the dimension keywords are used for classifying data on the column axis of the target chart.
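A minimal sketch of order-driven extraction follows, assuming an English query and a small hand-written vocabulary. The `VOCABULARY` table and the `extract_keyword_sequence` function are hypothetical names introduced for illustration, and simple substring search stands in for whatever parsing the real product uses.

```python
# Hypothetical vocabulary per category, listed in the preset extraction order.
EXTRACTION_ORDER = ["display", "filter", "index", "connective", "dimension"]
VOCABULARY = {
    "display": ["statistics", "show"],
    "filter": ["this year", "last month", "last week", "yesterday"],
    "index": ["number of people", "total cost", "medical insurance cost"],
    "connective": ["by"],
    "dimension": ["visit type", "gender", "department name"],
}

def extract_keyword_sequence(query):
    """Scan the query left to right, taking the categories in the preset order."""
    sequence = []
    position = 0
    for category in EXTRACTION_ORDER:
        while True:
            # Find the earliest word of this category at or after `position`.
            hits = [(query.find(word, position), word) for word in VOCABULARY[category]]
            hits = [(index, word) for index, word in hits if index >= 0]
            if not hits:
                break
            index, word = min(hits)
            sequence.append((category, word))
            position = index + len(word)
    return sequence

keywords = extract_keyword_sequence(
    "show last week number of people and total cost by visit type"
)
print(keywords)
```

Because each category is searched only after the position where the previous category ended, a query that follows the preset order yields its keywords in a single left-to-right pass.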
For example, following the keyword extraction order, the instant input prompting in the specific implementation of step S310 may work as follows: based on what the user has entered so far, the system automatically prompts the text the user may want to enter next. For example, when the user has not entered any text, the prompts "statistics" and "show" let the user directly choose whether the intent is to compute statistics or to display data. If the user selects "statistics", the final result presents the search results as number cards (totals or percentages). If the user selects "show", the change of the data over a period of time may be presented as a trend chart.
The following embodiments provide some common scenarios of hinting methods:
1) Scenario in which the target chart is displayed as a number card, including the prompting mode and the keyword-sequence extraction mode. Specifically:
If the display-type keyword the user selects from the prompts is "statistics", a drop-down box can then prompt filter keywords such as this year, last month, today, or yesterday. After the user selects a time-related filter keyword, the prompt box continues with the selectable content, including index (vertical axis) keywords such as number of people, medical insurance cost, and total cost. The word "and" serves as the connective for selecting several indexes.
Referring to FIG. 4, the query statement at this point may be: "statistics this month medical insurance cost and number of people". The user completes the search by pressing Enter, and the chart display area 42 of the usage interface 400 of the OLAP product displays the corresponding statistical number cards, for example: medical insurance cost 2057 thousand, number of people 115605. In addition, after the keyword "statistics" the OLAP product may also prompt commonly used search phrase collocations, such as "number of outpatient visits this month" or "number of emergency visits this month", to help the user determine the query statement. The chart displays the content corresponding to the query statement as statistical totals of two indexes: medical insurance cost and number of people.
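The number-card case can be illustrated with a short sketch. The sample records, the `INDEX_FIELDS` mapping, and the `number_cards` helper are assumptions for the example; the records are taken to be already restricted to the filter period.

```python
# Hypothetical records, assumed to be already restricted to the filter period.
RECORDS = [
    {"medical_insurance_cost": 1200, "people": 3},
    {"medical_insurance_cost": 800, "people": 2},
]

# Assumed mapping from index keywords to record fields.
INDEX_FIELDS = {
    "medical insurance cost": "medical_insurance_cost",
    "number of people": "people",
}

def number_cards(index_phrase, records):
    """Split the index phrase on the connective 'and' and total each index."""
    cards = {}
    for keyword in index_phrase.split(" and "):
        keyword = keyword.strip()
        field = INDEX_FIELDS[keyword]
        cards[keyword] = sum(row[field] for row in records)
    return cards

cards = number_cards("medical insurance cost and number of people", RECORDS)
for name, total in cards.items():
    print(f"{name}: {total}")
```

Each entry of `cards` corresponds to one number card; no dimension is involved, so there is nothing to group by.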
2) Scenario in which the query result is a chart, including the prompting mode and the keyword-sequence extraction mode. Specifically:
If the user selects "show", the prompt words in the drop-down box fall into four categories: filters, dimensions, indexes, and common search phrase collocations. Because natural language follows a word order, the keywords may be extracted in this order: a display word ("show" or "statistics") is selected first; filter conditions are prompted next, typically time (this year, last month, yesterday, and so on); then index options such as number of people, number of visits, and medical insurance cost are prompted. Finally, the word "by" is prompted as a connective to introduce the dimension options, such as gender (male or female) or visit type (outpatient or emergency), and at the end the word "classified" or "distributed" is prompted to close the query statement.
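The staged prompting described above can be sketched as a lookup keyed by how many selections the user has made so far. The `PROMPTS` table and `next_prompts` function are hypothetical, and the sketch makes the simplifying assumption that the user picks exactly one word per stage.

```python
# Assumed prompt vocabulary for each stage, in the order the stages are offered.
PROMPTS = {
    "display": ["statistics", "show"],
    "filter": ["this year", "last month", "yesterday"],
    "index": ["number of people", "number of visits", "medical insurance cost"],
    "connective": ["by"],
    "dimension": ["gender", "visit type"],
    "ending": ["classified", "distributed"],
}
STAGES = list(PROMPTS)  # dicts preserve insertion order

def next_prompts(selected):
    """Return the prompt words for the stage that follows the words chosen so far."""
    stage = len(selected)  # simplification: one selected word per stage
    if stage >= len(STAGES):
        return []  # the query statement is complete
    return PROMPTS[STAGES[stage]]

print(next_prompts([]))
print(next_prompts(["show", "this year", "number of people"]))
```

A fuller version would let the index stage repeat when the user chooses the connective "and", which this sketch omits.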
Referring to FIG. 5, the query statement input box 51 in the OLAP usage interface 500 contains: "show department name distributed by number of people and total cost". The chart display area 52 displays the chart for this query statement, including: department A's number of people and total cost, department B's number of people and total cost, and department C's number of people and total cost. Specifically, the chart displays the content corresponding to the query statement with one dimension (horizontal axis), "department name", and two indexes (left and right vertical axes), number of people and total cost.
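One way to picture the preset association between a keyword sequence and chart configuration (step S330) is a simple mapping from keyword category to configuration slot. The sketch below is an assumption about how such a mapping might look; `build_chart_config` and the configuration keys are hypothetical names.

```python
def build_chart_config(keyword_sequence):
    """Map (category, word) pairs to a chart configuration dictionary."""
    config = {"chart": None, "filters": [], "x_axis": None, "y_axes": []}
    for category, word in keyword_sequence:
        if category == "display":
            # Per the description: "statistics" gives number cards, "show" a chart.
            config["chart"] = "number_card" if word == "statistics" else "trend_chart"
        elif category == "filter":
            config["filters"].append(word)
        elif category == "index":
            config["y_axes"].append(word)  # indexes supply values on the vertical axes
        elif category == "dimension":
            config["x_axis"] = word  # the dimension classifies the horizontal axis
        # Connective words such as "by" carry no configuration of their own.
    return config

config = build_chart_config([
    ("display", "show"),
    ("index", "number of people"),
    ("index", "total cost"),
    ("connective", "by"),
    ("dimension", "department name"),
])
print(config)
```

With two indexes and one dimension, the resulting configuration matches the layout described for FIG. 5: one horizontal axis and two vertical axes.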
3) Scenario of rejecting query statements that contain an unreasonable word order. Specifically:
To keep query statements executable, when the grammar of a query statement is unreasonable, the search box can also show a warning such as "invalid input", reminding the user to check whether the current query statement is reasonable.
For example, a dimension keyword cannot be distributed by an index keyword, as in "show gender distributed by number of people". After "by" is entered in such a sentence, the system can check whether an index word (for example, number of people) appears in the preceding text. The grammar logic can be preset as "index first, dimension after". If the received query statement does not conform to this grammar logic, a prompt such as "invalid input: no index word" warns the user to enter the query statement according to the grammar logic.
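The "index first, dimension after" rule can be checked with a short function. This is a sketch under assumptions: the word lists and `check_word_order` are hypothetical, and the second warning (a missing dimension after "by") is an extrapolation of the rule rather than something the description states.

```python
# Assumed vocabularies for the check.
INDEX_WORDS = {"number of people", "total cost"}
DIMENSION_WORDS = {"gender", "department name"}

def check_word_order(query, connective=" by "):
    """Return None when the word order is valid, otherwise a warning message."""
    if connective not in query:
        return None  # nothing to check until the connective has been entered
    before, after = query.split(connective, 1)
    if not any(word in before for word in INDEX_WORDS):
        return "invalid input: no index word"
    # Assumed extension of the rule: a dimension must follow the connective.
    if not any(word in after for word in DIMENSION_WORDS):
        return "invalid input: no dimension word"
    return None

print(check_word_order("show gender distributed by number of people"))
print(check_word_order("show number of people by gender distributed"))
```

Running the check as soon as the connective is typed lets the warning appear before the user finishes the sentence.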
4) Scenario: completing phrases based on prompt words. Specifically:
in response to the data interface control receiving a word to be completed whose similarity to a fixed collocation phrase is greater than a preset threshold, the word to be completed is completed according to the fixed collocation phrase, where the fixed collocation phrases are preset.
Specifically, the phrase prompting function of the prompt box helps with long dimension words. For example, one of the registration types is "reserved expert number". After the user selects "display" and then types the two characters for "reserved", the prompt box suggests "expert number", "associate professor number", and "ordinary number", so that the long word the user is searching for is completed quickly.
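The threshold-based completion can be sketched with a generic string similarity. The text does not specify the similarity measure, so the use of `difflib.SequenceMatcher`, the threshold value 0.3, the phrase list, and the function name `complete` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical preset fixed collocation phrases.
FIXED_PHRASES = [
    "reserved expert number",
    "reserved associate professor number",
    "reserved ordinary number",
]


def complete(partial, phrases=FIXED_PHRASES, threshold=0.3):
    """Return the phrases whose similarity to `partial` exceeds `threshold`, best match first."""
    scored = []
    for phrase in phrases:
        score = SequenceMatcher(None, partial, phrase).ratio()
        if score > threshold:
            scored.append((score, phrase))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [phrase for _score, phrase in scored]


print(complete("reserved"))
print(complete("xyz"))
```

A prefix match or an edit-distance measure would serve equally well here; the only requirement taken from the text is that a preset threshold gates the suggestions.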
In an exemplary embodiment, after the query statement in natural language form of the user about the target data combination is received in step S310, the present technical solution also checks the validity of the received query statement according to the above keyword extraction order.
Illustratively, after the user completes the input of a query statement step by step according to the prompt words, the OLAP background system matches the query statement against a prefabricated template to extract the dimension, index, and filter words, and uses these keywords to display the target chart. For example, after the user inputs "display the distribution of the number of outpatients and the outpatient cost in each department this year", the keyword sequence extracted in step S320 is: "display [filter: time] [dimension] [index] distribution". The filter condition "this year", the dimension "department", and the indexes "number of outpatients" and "outpatient cost" can then be obtained easily through a regular expression or by other means. By comparing these elements with the elements on the Cube, it is easy to confirm whether the query is legal, that is, whether the specified dimension, indexes, and filter condition all exist on the Cube. A legal query is converted directly into a query statement that OLAP can recognize. For an illegal query, the user is reminded to input the query statement again.
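Template matching followed by the Cube comparison can be sketched as below. The English template, the Cube contents, and the function names `parse_query` and `is_legal` are hypothetical; the source only states that a prefabricated template and a regular expression or similar means may be used.

```python
import re

# Hypothetical Cube elements; a real deployment would read these from the OLAP model.
CUBE = {
    "filters": {"this year", "last month"},
    "dimensions": {"department", "gender"},
    "indexes": {"number of outpatients", "outpatient cost"},
}

# Hypothetical English template:
#   display the distribution of <indexes> by <dimension> for <filter>
TEMPLATE = re.compile(
    r"^display the distribution of (?P<indexes>.+) by (?P<dimension>.+) for (?P<filter>.+)$"
)


def parse_query(query):
    """Extract filter, dimension and indexes; return None if the template does not match."""
    match = TEMPLATE.match(query.strip().lower())
    if match is None:
        return None
    return {
        "filter": match.group("filter"),
        "dimension": match.group("dimension"),
        "indexes": match.group("indexes").split(" and "),
    }


def is_legal(parsed, cube=CUBE):
    """A query is legal only if every extracted element exists on the Cube."""
    return (
        parsed is not None
        and parsed["filter"] in cube["filters"]
        and parsed["dimension"] in cube["dimensions"]
        and all(index in cube["indexes"] for index in parsed["indexes"])
    )


query = ("display the distribution of number of outpatients and outpatient cost "
         "by department for this year")
parsed = parse_query(query)
print(parsed, is_legal(parsed))
```

A query that names a dimension absent from the Cube parses successfully but fails `is_legal`, which is the point at which the user would be asked to input the statement again.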
In an exemplary embodiment, with continued reference to fig. 3, for a query statement that passes the above validity check, step S330 is performed: determining the configuration data of the data cube corresponding to the keyword sequence according to a preset association relationship; and step S340 is performed: generating a target chart according to the configuration data of the data cube, so that the user can analyze the target data combination according to the target chart.
Illustratively, the backend of an OLAP product generates charts from the configuration data of a data cube (Cube), whereas the present technical solution receives a query statement in natural language. To display a chart from such a query statement, an association relationship between the keyword sequence (or individual keywords) in the query statement and the OLAP Cube configuration data is preset. Accordingly, after the keyword sequence corresponding to the query statement is extracted in step S320, the configuration data of the corresponding data cube is determined based on this association relationship. Further, in step S340, a target chart (refer to fig. 4 or 5) may be generated based on the configuration data of the data cube, and the user may analyze the target data combination based on the target chart.
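The preset association relationship can be pictured as a lookup table from keywords to Cube configuration entries. The sketch below is illustrative only: the table contents, the field expressions, and the function name `build_cube_config` are hypothetical and do not reflect any specific OLAP product.

```python
# Hypothetical preset association: keyword -> (role, Cube field expression).
ASSOCIATION = {
    "this year": ("filter", "visit_date.year = CURRENT_YEAR"),
    "department": ("dimension", "department.name"),
    "number of outpatients": ("measure", "count_distinct(patient_id)"),
    "outpatient cost": ("measure", "sum(outpatient_fee)"),
}


def build_cube_config(keywords):
    """Translate an extracted keyword sequence into Cube configuration data."""
    config = {"filter": [], "dimension": [], "measure": []}
    for keyword in keywords:
        # A KeyError here signals a keyword with no preset association.
        role, field = ASSOCIATION[keyword]
        config[role].append(field)
    return config


config = build_cube_config(
    ["this year", "department", "number of outpatients", "outpatient cost"]
)
print(config)
```

The resulting dictionary stands in for the configuration data that the chart-generation step would consume.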
In an exemplary embodiment, the technical solution supports several types of result display and can automatically select the most appropriate presentation according to the user's query intention, or determine the presentation form of the target chart according to the user's selection. For example, if the user only wants one or a few specific numbers, such as "what are the number of outpatients and the outpatient fee this year", the result display shows only simple, direct results such as "the number of outpatients this year is xxxxx people" and "the outpatient fee this year is xxxxxx yuan", mainly in the form of number cards. If the user instead asks about a data distribution or a data trend, the result is displayed directly as a chart. For example, for "distribution of outpatient cost among departments in this hospital" and "monthly trend of outpatient cost in this hospital", the results are displayed, respectively, as a bar chart with departments on the horizontal axis and outpatient cost on the vertical axis, and as a line chart with months on the horizontal axis and outpatient cost on the vertical axis (trends are generally represented by line charts).
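The selection of a presentation form can be sketched as a small rule set over the extracted dimensions. The rule set, the set of time dimensions, and the function name `choose_presentation` are assumptions; the text only gives number cards, bar charts, and line charts as examples.

```python
# Hypothetical set of dimensions that indicate a trend over time.
TIME_DIMENSIONS = {"day", "month", "year"}


def choose_presentation(dimensions, user_choice=None):
    """Pick a presentation form from the query's dimensions, unless the user chose one."""
    if user_choice is not None:
        return user_choice          # an explicit user selection takes precedence
    if not dimensions:
        return "number_card"        # only specific numbers were requested
    if any(dimension in TIME_DIMENSIONS for dimension in dimensions):
        return "line_chart"         # trend over time
    return "bar_chart"              # distribution over a categorical dimension


print(choose_presentation([]))
print(choose_presentation(["department"]))
print(choose_presentation(["month"]))
```

Passing `user_choice="pie_chart"`, for instance, would bypass the automatic rule entirely.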
Diversified result display helps meet the user's data analysis needs and thereby improves the user's experience with OLAP.
The following describes embodiments of the apparatus of the present disclosure that may be used to perform the above-described interactive data analysis methods of the present disclosure.
Fig. 6 shows a schematic structural diagram of an interactive data analysis apparatus according to an embodiment of the present disclosure, and referring to fig. 6, the interactive data analysis apparatus 600 includes: a receiving module 601, an extracting module 602, a determining module 603, and a generating module 604.
The receiving module 601 is configured to: receiving a query statement in natural language form of a user about a target data combination;
the extracting module 602 is configured to: extracting a keyword sequence from the query sentence according to a preset keyword extraction sequence;
the determining module 603 is configured to: determining the configuration data of the data cube corresponding to the keyword sequence according to a preset association relationship;
the generating module 604 is configured to: generating a target chart according to the configuration data of the data cube, so that the user can analyze the target data combination according to the target chart.
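The four-module structure can be sketched as a pipeline in which each module is a replaceable callable. This is a schematic under stated assumptions: the class name `InteractiveAnalysisDevice`, the field names, and the stand-in lambdas are hypothetical and carry none of the real module logic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class InteractiveAnalysisDevice:
    """Hypothetical composition of the receiving, extracting, determining and generating modules."""
    receive: Callable[[str], str]
    extract: Callable[[str], List[str]]
    determine: Callable[[List[str]], Dict[str, List[str]]]
    generate: Callable[[Dict[str, List[str]]], str]

    def run(self, raw_input: str) -> str:
        query = self.receive(raw_input)      # receiving module
        keywords = self.extract(query)       # extracting module
        config = self.determine(keywords)    # determining module
        return self.generate(config)         # generating module


# Stand-in implementations, used only to show the data flow between modules.
device = InteractiveAnalysisDevice(
    receive=str.strip,
    extract=lambda query: query.split(),
    determine=lambda keywords: {"keywords": keywords},
    generate=lambda config: "chart(" + ", ".join(config["keywords"]) + ")",
)
print(device.run("  display cost  "))
```

Because each module is passed in as a callable, two modules can be merged or one module split without changing `run`, which matches the later remark that the module division is not mandatory.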
In some embodiments of the present disclosure, based on the foregoing scheme, the query statement is received through a data receiving control, and the receiving module 601 is specifically configured to: in response to the data interface control being triggered, display prompt words to the user in real time, so that the user determines the query statement based on the prompt words.
In some embodiments of the present disclosure, based on the foregoing scheme, the query statement is received through a data receiving control, and the receiving module 601 is specifically configured to: in response to the data interface control being triggered, display a plurality of prompt words to the user in real time according to the keyword extraction order, so that the user determines the query statement based on the plurality of prompt words.
In some embodiments of the present disclosure, based on the foregoing solution, the interactive data analysis apparatus 600 further includes a completion module.
The completion module is configured to: after a plurality of prompt words are displayed to the user in real time according to the keyword extraction order, in response to the data interface control receiving a word to be completed whose similarity to a fixed collocation phrase is greater than a preset threshold, complete the word to be completed according to the fixed collocation phrase, where the fixed collocation phrases are preset.
In some embodiments of the present disclosure, based on the foregoing scheme, the keyword extraction order includes, in order: display-type keywords, filter keywords, index keywords, connective keywords, and dimension keywords.
In some embodiments of the present disclosure, based on the foregoing scheme, the display-type keywords include at least one of statistics, display, and presentation; the index keywords are used for data display on a row axis of the target chart, and the dimension keywords are used for data classification on a column axis of the target chart.
In some embodiments of the present disclosure, based on the foregoing solution, the interactive data analysis apparatus 600 further includes a validity checking module.
The validity checking module is configured to: after a query statement in natural language form of a user about a target data combination is received, check the validity of the received query statement according to the above keyword extraction order.
For details that are not disclosed in the apparatus embodiments of the present disclosure, please refer to the embodiments of the interactive data analysis method of the present disclosure.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functionality of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer storage medium on which a program product capable of implementing the above-described method of this specification is stored. In some possible embodiments, various aspects of the present disclosure may also be implemented in the form of a program product including program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section above of this specification when the program product is run on the terminal device.
Referring to fig. 7, a program product 700 for implementing the above method according to an embodiment of the present disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product described above may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In addition, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 800 according to this embodiment of the disclosure is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is only an example and should not bring any limitations to the functionality and scope of use of the embodiments of the present disclosure.
As shown in fig. 8, the electronic device 800 is in the form of a general purpose computing device. The components of the electronic device 800 may include, but are not limited to: at least one processing unit 810, at least one storage unit 820, and a bus 830 that couples the various system components including the storage unit 820 and the processing unit 810.
The storage unit 820 stores program code that can be executed by the processing unit 810, so that the processing unit 810 executes the steps according to various exemplary embodiments of the present disclosure described in the "exemplary method" section above in this specification. For example, the processing unit 810 may perform the following as shown in fig. 3: step S310, receiving a query statement in natural language form of a user about a target data combination; step S320, extracting a keyword sequence from the query statement according to a preset keyword extraction order; step S330, determining the configuration data of the data cube corresponding to the keyword sequence according to a preset association relationship; and step S340, generating a target chart according to the configuration data of the data cube, so that the user can analyze the target data combination according to the target chart.
The storage unit 820 may include readable media in the form of volatile memory units such as a random access memory unit (RAM) 8201 and/or a cache memory unit 8202, and may further include a read only memory unit (ROM) 8203.
The storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 830 may be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 800 may also communicate with one or more external devices 900 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 800 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 850. Also, the electronic device 800 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 via the bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. An interactive data analysis method, comprising:
receiving a query statement in natural language form of a user about a target data combination;
extracting a keyword sequence from the query sentence according to a preset keyword extraction order;
determining configuration data of a data cube corresponding to the keyword sequence according to a preset association relationship;
and generating a target chart according to the configuration data of the data cube, so that the user can analyze the target data combination according to the target chart.
2. The interactive data analysis method of claim 1, wherein the query statement is received through a data receiving control, and receiving the query statement in natural language form of the user about the target data combination comprises:
in response to the data interface control being triggered, presenting prompt words to the user in real time, so that the user determines the query statement based on the prompt words.
3. The interactive data analysis method of claim 1, wherein the query statement is received through a data receiving control, and receiving the query statement in natural language form of the user about the target data combination comprises:
in response to the data interface control being triggered, displaying a plurality of prompt words to the user in real time according to the keyword extraction order, so that the user determines the query statement based on the plurality of prompt words.
4. The interactive data analysis method of claim 3, wherein after displaying the plurality of prompt words to the user in real time according to the keyword extraction order, the method further comprises:
in response to the data interface control receiving a word to be completed whose similarity to a fixed collocation phrase is greater than a preset threshold, completing the word to be completed according to the fixed collocation phrase, wherein the fixed collocation phrases are preset.
5. The interactive data analysis method of any of claims 1 to 4, wherein the keyword extraction order comprises, in order: display-type keywords, filter keywords, index keywords, connective keywords, and dimension keywords.
6. The interactive data analysis method of claim 5, wherein the display-type keywords comprise at least one of statistics, display, and presentation; the index keywords are used for data display on a row axis of the target chart, and the dimension keywords are used for data classification on a column axis of the target chart.
7. The interactive data analysis method of claim 1, wherein after receiving the query statement in natural language form of the user about the target data combination, the method further comprises:
checking the validity of the received query statement according to the keyword extraction order.
8. An interactive data analysis device, comprising:
a receiving module to: receiving a query statement in natural language form of a user about a target data combination;
an extraction module to: extracting a keyword sequence from the query sentence according to a preset keyword extraction order;
a determination module to: determining configuration data of a data cube corresponding to the keyword sequence according to a preset association relationship;
a generation module to: generating a target chart according to the configuration data of the data cube, so that the user can analyze the target data combination according to the target chart.
9. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the interactive data analysis method according to any one of claims 1 to 7.
10. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the interactive data analysis method of any one of claims 1 to 7.
CN201911106220.1A 2019-11-13 2019-11-13 Interactive data analysis method, device, medium and electronic equipment Pending CN110837545A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911106220.1A CN110837545A (en) 2019-11-13 2019-11-13 Interactive data analysis method, device, medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN110837545A true CN110837545A (en) 2020-02-25

Family

ID=69574956

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911106220.1A Pending CN110837545A (en) 2019-11-13 2019-11-13 Interactive data analysis method, device, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110837545A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460102A (en) * 2020-03-31 2020-07-28 成都数之联科技有限公司 Chart recommendation system and method based on natural language processing
CN113434568A (en) * 2021-06-01 2021-09-24 深圳市酷开网络科技股份有限公司 Multi-source data processing method and device, intelligent terminal and storage medium
CN114443692A (en) * 2022-02-15 2022-05-06 支付宝(杭州)信息技术有限公司 Data query method and device
CN116842240A (en) * 2023-08-30 2023-10-03 山东海博科技信息系统股份有限公司 Data management and control system based on full-link management and control

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102789487A (en) * 2012-06-29 2012-11-21 用友软件股份有限公司 Data query and retrieval processing device and data query and retrieval processing method
US20130166498A1 (en) * 2011-12-25 2013-06-27 Microsoft Corporation Model Based OLAP Cube Framework
CN104361118A (en) * 2014-12-01 2015-02-18 中国人民大学 Mixed OLAP (on-line analytical processing) inquiring treating method adapting coprocessor
CN105205085A (en) * 2014-06-30 2015-12-30 中兴通讯股份有限公司 Multi-dimensional analysis method and device for mass data
CN106933845A (en) * 2015-12-30 2017-07-07 阿里巴巴集团控股有限公司 The method and apparatus that MDX inquires about effect are realized using SQL
CN108710652A (en) * 2018-05-09 2018-10-26 长城计算机软件与系统有限公司 A kind of data analysing method and system, storage medium based on statistics
CN108763240A (en) * 2018-03-22 2018-11-06 五八有限公司 Data query method, apparatus, equipment and storage medium based on OLAP
CN109710742A (en) * 2018-12-27 2019-05-03 清华大学 A kind of method, system and the equipment of the natural language querying processing of personal share bulletin
CN110222194A (en) * 2019-05-21 2019-09-10 深圳壹账通智能科技有限公司 Data drawing list generation method and relevant apparatus based on natural language processing
CN110287213A (en) * 2019-07-03 2019-09-27 中通智新(武汉)技术研发有限公司 Data query method, apparatus and system based on OLAP system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460102A (en) * 2020-03-31 2020-07-28 成都数之联科技有限公司 Chart recommendation system and method based on natural language processing
CN111460102B (en) * 2020-03-31 2022-09-09 成都数之联科技股份有限公司 Chart recommendation system and method based on natural language processing
CN113434568A (en) * 2021-06-01 2021-09-24 深圳市酷开网络科技股份有限公司 Multi-source data processing method and device, intelligent terminal and storage medium
CN114443692A (en) * 2022-02-15 2022-05-06 支付宝(杭州)信息技术有限公司 Data query method and device
CN114443692B (en) * 2022-02-15 2023-08-04 支付宝(杭州)信息技术有限公司 Data query method and device
CN116842240A (en) * 2023-08-30 2023-10-03 山东海博科技信息系统股份有限公司 Data management and control system based on full-link management and control
CN116842240B (en) * 2023-08-30 2023-12-01 山东海博科技信息系统股份有限公司 Data management and control system based on full-link management and control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200225
