WO2024082754A1

WO2024082754A1 - Insight data generation method and apparatus

Info

Publication number: WO2024082754A1
Application number: PCT/CN2023/109267
Authority: WO
Inventors: 杨昌和; 徐科
Original assignee: 华为云计算技术有限公司
Priority date: 2022-10-18
Filing date: 2023-07-26
Publication date: 2024-04-25
Also published as: CN117951186A

Abstract

Provided in the embodiments of the present application are an insight data generation method and apparatus. The method comprises: presenting a first chart, wherein the first chart comprises M chart visualization elements, and each chart visualization element corresponds to at least one data record in a data source; confirming N chart visualization elements selected from among the M chart visualization elements, wherein M and N are positive integers greater than 1, and M is greater than or equal to N; determining all K data records corresponding to the N chart visualization elements, wherein K is a positive integer greater than 1; and performing joint data analysis on the basis of all the K data records, so as to generate first insight data of the N chart visualization elements. By means of the technical solution provided in the present application, insight data of chart visualization elements selected in batches can be automatically generated, and subsequent exchange and analysis of the insight data are realized, thereby improving the accuracy of insight.

Description

Method and device for generating insight data

This application claims priority to the Chinese patent application filed with the China Patent Office on October 18, 2022, with application number 202211275756.8 and application name “Method and Device for Generating Insight Data”, the entire contents of which are incorporated by reference in this application.

Technical Field

Embodiments of the present application relate to the field of data intelligence, and more specifically, to a method and apparatus for generating insight data.

Background

Automatic insight data generation is a very important capability in business intelligence-assisted analysis and decision-making, and has gradually become one of the core competitive advantages of business intelligence products provided by various manufacturers. Key factors in building the competitiveness of insight data generation technology include: designing appropriate front-end interaction processes based on user-provided data; ensuring back-end data query performance; improving capabilities such as algorithmic feature mining, related case analysis, abnormal pattern definition and cause analysis construction; and finally presenting feedback to users through a simple, attractive front-end display and easy-to-use interaction.

In the application scenarios of automatic intelligent insight generation for charts in current business intelligence analysis platforms, insight data analysis related to single-point data can help users check, discover and gain an in-depth understanding of individual chart visualization elements in visual charts when building, browsing and analyzing data. However, in technical products related to automatic insight data generation, the accuracy of insight data generated at the analysis granularity of a single data point is poor, and the supported freedom of interaction and analysis is limited, so there is still room for further improvement and optimization.

Summary of the invention

The embodiments of the present application provide a method and device for generating insight data, which can realize batch selection of multiple chart visualization elements in a chart to generate insight data, thereby improving the accuracy of the insight data and the degree of interactive freedom.

In a first aspect, a method for generating insight data is provided, comprising: presenting a first chart, the first chart comprising M chart visualization elements, each chart visualization element corresponding to at least one data record in a data source; confirming N chart visualization elements selected from the M chart visualization elements, wherein M and N are positive integers greater than 1, and M is greater than or equal to N; determining all K data records corresponding to the N chart visualization elements, wherein K is a positive integer greater than 1; and performing joint data analysis based on all K data records to generate first insight data for the N chart visualization elements.

According to the technical solution provided in the present application, it is possible to automatically generate insights into batch data that users interactively choose to focus on, analyze and interpret patterns composed of multiple chart visualization elements, and take into account the correlation and integrity of multiple chart visualization elements, thereby improving the accuracy of insights and reducing interaction costs.
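The flow of the first aspect can be illustrated with a minimal sketch. It assumes a small in-memory data source and a chart that groups records by a single dimension; the names `DATA_SOURCE`, `build_elements`, `records_for_selection` and `joint_summary` are hypothetical and are not defined in the application, and the summary statistics merely stand in for the joint data analysis.

```python
from collections import defaultdict

# Hypothetical data source: each dict is one data record.
DATA_SOURCE = [
    {"month": "Jan", "region": "East", "sales": 10.0},
    {"month": "Jan", "region": "West", "sales": 5.0},
    {"month": "Feb", "region": "East", "sales": 7.0},
    {"month": "Mar", "region": "West", "sales": 3.0},
]


def build_elements(records, dimension):
    """Group record indices by dimension value; each group backs one chart element."""
    elements = defaultdict(list)
    for idx, rec in enumerate(records):
        elements[rec[dimension]].append(idx)
    return dict(elements)


def records_for_selection(elements, selected_keys):
    """Return the de-duplicated indices of all K records behind the N selected elements."""
    indices = set()
    for key in selected_keys:
        indices.update(elements[key])
    return sorted(indices)


def joint_summary(records, indices, metric):
    """Placeholder joint analysis: summarize the metric over all K records together."""
    values = [records[i][metric] for i in indices]
    return {"count": len(values), "total": sum(values), "mean": sum(values) / len(values)}


elements = build_elements(DATA_SOURCE, "month")          # M = 3 chart elements
selected = ["Jan", "Feb"]                                # N = 2 selected elements
k_indices = records_for_selection(elements, selected)    # K = 3 data records
summary = joint_summary(DATA_SOURCE, k_indices, "sales")
```

In this sketch the selected elements are analyzed over the union of their records, rather than one element at a time, which is the point of the batch selection described above.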

In combination with the first aspect, in certain implementations of the first aspect, performing joint data analysis based on all K data records includes: determining feature information common to L data records among all K data records, the L data records corresponding to at least two chart visualization elements, where L is a positive integer greater than 1, and K is greater than or equal to L; and performing data analysis based on the L data records, the feature information common to the L data records, and all data records in the data source.

According to the technical solution provided in the present application, it is possible for users to analyze insight data of a local data subset containing multiple chart visualization elements, taking into account the correlation and integrity between the chart visualization elements, thereby improving the accuracy of the insight data.
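As a rough illustration of this implementation, the sketch below treats "feature information common to the L data records" as the dimension values shared by every record in the subset, and then compares that subset against all data records in the data source. This interpretation, the sample data, and the helper names `common_features` and `subset_share` are assumptions made for illustration only.

```python
# Hypothetical data source; "channel" is a dimension, "sales" is a metric.
ALL_RECORDS = [
    {"month": "Jan", "channel": "online", "sales": 10.0},
    {"month": "Feb", "channel": "online", "sales": 6.0},
    {"month": "Mar", "channel": "store", "sales": 4.0},
]


def common_features(records, dimensions):
    """Return the dimension/value pairs that are identical across all given records."""
    if not records:
        return {}
    first = records[0]
    return {
        dim: first[dim]
        for dim in dimensions
        if all(rec.get(dim) == first[dim] for rec in records[1:])
    }


def subset_share(all_records, features, metric):
    """Share of the metric total contributed by records matching the common features."""
    total = sum(rec[metric] for rec in all_records)
    matched = sum(
        rec[metric]
        for rec in all_records
        if all(rec.get(dim) == value for dim, value in features.items())
    )
    return matched / total if total else 0.0


l_records = ALL_RECORDS[:2]  # L = 2 records behind two selected chart elements
features = common_features(l_records, ["month", "channel"])
share = subset_share(ALL_RECORDS, features, "sales")
```

Here the two selected records share only the channel value, and the records with that value account for 80% of total sales in the hypothetical data source.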

In combination with the first aspect, in certain implementations of the first aspect, the first insight data includes at least one of the following insight data types: chart metric aggregation expansion analysis, used to analyze the original data distribution composition of the data records corresponding to the N chart visualization elements; external dimension valid record number analysis, used to analyze the distribution of the number of valid records of the K data records in the dimensions that do not participate in drawing the first chart; external dimension distribution contribution analysis, used to analyze the contribution of the K data records to the chart metrics in the dimensions that do not participate in drawing the first chart; external dimension subspace internal feature analysis, used to analyze the feature distribution inside the data records in the dimensions that do not participate in drawing the first chart; and external high interpretability metric analysis, used to analyze the association between the L data records and the metrics and original data records that do not participate in drawing the first chart.

According to the above technical solution, the method can guide users to explore the analysis content of the associated data, such as the composition of abnormal aggregate values, the potential reasons why the aggregate values of the chart visualization elements show a specific pattern, the potential high contribution dimensions, the value distribution within the subspace, and the impact of user-selected metric distributions and external metrics on highly correlated charts.
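Two of the insight data types listed above, external dimension valid record number analysis and external dimension distribution contribution analysis, can be sketched as follows. The sketch assumes that a "valid" record is one whose external dimension value is not missing, and that contribution is the share of the chart metric total; both are illustrative assumptions, as are the sample records and function names.

```python
from collections import Counter, defaultdict

# Hypothetical K records behind the selected elements. The chart is drawn by
# "month" and "sales", so "region" is an external dimension.
K_RECORDS = [
    {"month": "Jan", "region": "East", "sales": 10.0},
    {"month": "Jan", "region": "West", "sales": 5.0},
    {"month": "Feb", "region": "East", "sales": 5.0},
    {"month": "Feb", "region": None, "sales": 2.0},  # missing external value
]


def valid_record_counts(records, external_dim):
    """Count records per external dimension value, skipping missing values."""
    return Counter(rec[external_dim] for rec in records if rec.get(external_dim) is not None)


def contribution(records, external_dim, metric):
    """Share of the metric total contributed by each external dimension value."""
    totals = defaultdict(float)
    for rec in records:
        if rec.get(external_dim) is not None:
            totals[rec[external_dim]] += rec[metric]
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {value: subtotal / grand_total for value, subtotal in totals.items()}


counts = valid_record_counts(K_RECORDS, "region")
shares = contribution(K_RECORDS, "region", "sales")
```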

In combination with the first aspect, in certain implementations of the first aspect, the first chart is an insight chart generated based on second insight data, and generating the first insight data for the N chart visualization elements includes generating the numerical distribution within the data records corresponding to the N chart visualization elements, or the data record traceability of those data records.

According to the above technical solution, the method supports secondary analysis and exploration of the focused feature subspace of the insight chart and derives insights, optimizes the multi-level subspace analysis and exploration process in the automatic insight generation auxiliary analysis process, and improves the analysis freedom from surface to point, from shallow to deep.

The numerical distribution within the data records corresponding to the N chart visualization elements helps users further explore the patterns of interest in the dimensional distribution chart of the algorithm recommendation insights and explore the reasons for the distribution characteristics; data record tracing can help users conveniently query the original data for abnormal parts of the distribution of the recommended insights and explore the reasons for the distribution characteristics.

In combination with the first aspect, in certain implementations of the first aspect, a priority order of P sub-insight data included in the first insight data is determined; and the P sub-insight data are recommended according to the priority order.

According to the above technical solution, the method can avoid generating a large number of disordered insight charts at one time and presenting them to the user, so that the user can quickly decide where to explore, thereby improving the efficiency of the user in obtaining and analyzing insights.

In combination with the first aspect, in certain implementations of the first aspect, determining the priority order of P sub-insight data also includes: determining a characteristic index value for each sub-insight data in the P sub-insight data, the characteristic index value being used to measure the confidence or significance of each sub-insight data in the P sub-insight data; confirming Q sub-insight data whose characteristic index values are higher than a threshold value of the characteristic index value, where Q is a positive integer greater than 1, and P is greater than Q; determining the number of feature types for each sub-insight data in the Q sub-insight data; and determining the priority order of the Q sub-insight data by sorting in descending order according to the number of feature types of each sub-insight data.

According to the above technical solution, the method comprehensively considers, within the same category of insights, both the confidence of all features possessed by each insight and the feature richness of the insights.
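The prioritization described above (filter by characteristic index value, then sort by the number of feature types in descending order) can be sketched as follows. The `SubInsight` class, the sample values and the threshold of 0.5 are hypothetical. The application does not state how ties are broken; this sketch simply keeps the input order for ties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubInsight:
    name: str
    index_value: float      # confidence or significance of the sub-insight
    feature_types: tuple    # kinds of features detected in the sub-insight


def prioritize(sub_insights, threshold):
    """Keep the sub-insights above the threshold, richest in feature types first."""
    kept = [s for s in sub_insights if s.index_value > threshold]
    # sorted() is stable, so sub-insights with equal feature counts keep input order.
    return sorted(kept, key=lambda s: len(s.feature_types), reverse=True)


candidates = [  # P = 4 sub-insights
    SubInsight("region contribution", 0.9, ("dominance",)),
    SubInsight("channel distribution", 0.7, ("outlier", "skew", "trend")),
    SubInsight("cost correlation", 0.3, ("correlation", "trend")),
    SubInsight("valid record count", 0.6, ("imbalance", "null values")),
]

ranked = prioritize(candidates, threshold=0.5)  # Q = 3 sub-insights remain
ranked_names = [s.name for s in ranked]
```

With these values, the low-confidence "cost correlation" sub-insight is dropped even though it has two feature types, and the remaining three are recommended from the most to the fewest feature types.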

In combination with the first aspect, in certain implementations of the first aspect, determining all K data records corresponding to N chart visualization elements also includes: determining dimensions and metrics in a first chart corresponding to the N chart visualization elements; generating a query request based on the dimensions and metrics in the first chart, the query request being used to query data records in a data source.

According to the above technical solution, the method realizes rapid positioning of chart information contained in a chart visualization element, and the chart information can realize rapid query of data records corresponding to the chart visualization element and selection of focus dimensions/metrics for generating insight data.
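One way to picture the query request is as a parameterized SQL statement built from the chart's dimension, the dimension values of the selected elements, and the chart's metric. The sketch below uses an in-memory SQLite table purely for illustration; the table name, column names and the `build_query` helper are assumptions, and the application does not specify the query language or the data store.

```python
import sqlite3


def build_query(table, dimension, selected_values, metric):
    """Build a parameterized query for the records behind the selected elements."""
    for name in (table, dimension, metric):
        # Identifiers cannot be bound as parameters, so restrict them to safe names.
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    placeholders = ", ".join("?" for _ in selected_values)
    sql = (
        f'SELECT rowid, "{dimension}", "{metric}" FROM "{table}" '
        f'WHERE "{dimension}" IN ({placeholders}) ORDER BY rowid'
    )
    return sql, list(selected_values)


# Hypothetical data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Jan", "East", 10.0), ("Jan", "West", 5.0), ("Feb", "East", 7.0), ("Mar", "West", 3.0)],
)

# The chart uses dimension "month" and metric "amount"; two elements are selected.
sql, params = build_query("sales", "month", ["Jan", "Feb"], "amount")
rows = conn.execute(sql, params).fetchall()
conn.close()
```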

In a second aspect, a device for generating insight data is provided, comprising: an interaction module, for presenting a first chart, the first chart comprising M chart visualization elements, each chart visualization element corresponding to at least one data record in a data source; a processing module, for confirming N chart visualization elements selected from the M chart visualization elements, wherein M and N are positive integers greater than 1, and M is greater than or equal to N, determining all K data records corresponding to the N chart visualization elements, wherein K is a positive integer greater than 1, and performing joint data analysis based on all K data records to generate first insight data for the N chart visualization elements.

In combination with the second aspect, in certain implementations of the second aspect, the processing module is further used to determine feature information common to L data records out of K data records, where the L data records correspond to at least two chart visualization elements, where L is a positive integer greater than 1, and K is greater than or equal to L, and perform data analysis based on the L data records, the feature information common to the L data records, and all data records in the data source.

In combination with the second aspect, in certain implementations of the second aspect, the processing module is also used to generate the numerical distribution or data record traceability within the data records corresponding to N chart visualization elements based on the insight chart generated based on the second insight data.

In combination with the second aspect, in certain implementations of the second aspect, the processing module is also used to determine the priority order of P sub-insight data included in the first insight data, where P is a positive integer greater than 1, and recommend the P sub-insight data in order of priority.

In combination with the second aspect, in certain implementations of the second aspect, the processing module is also used to determine a characteristic index value for each sub-insight data in the P sub-insight data, where the characteristic index value is used to measure the confidence or significance of each sub-insight data in the P sub-insight data, confirm Q sub-insight data whose characteristic index values are higher than a threshold value of the characteristic index value, where Q is a positive integer greater than 1 and P is greater than Q, determine the number of feature types for each sub-insight data in the Q sub-insight data, and determine the priority order of the Q sub-insight data by sorting them in descending order according to the number of feature types of each sub-insight data.

In combination with the second aspect, in certain implementations of the second aspect, the processing module is also used to determine the dimensions and metrics in the first chart corresponding to the N chart visualization elements, and generate a query request based on the dimensions and metrics in the first chart, which is used to query data records in the data source.

In a third aspect, a computing device is provided, comprising a processor and a memory, wherein the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory, so that the computing device executes the method in the first aspect or any possible implementation of the first aspect.

In a fourth aspect, a computing device cluster is provided, comprising at least one computing device, each computing device comprising a processor and a memory, wherein the memory is used to store instructions, and the processor is used to call and execute the instructions from the memory, so that the computing device cluster executes the method in the first aspect or any possible implementation of the first aspect.

Optionally, the processor may be a general-purpose processor, which may be implemented by hardware or software. When implemented by hardware, the processor may be a logic circuit, an integrated circuit, etc.; when implemented by software, the processor may be a general-purpose processor, which is implemented by reading software codes stored in a memory, which may be integrated in the processor or may be located outside the processor and exist independently.

In a fifth aspect, a chip is provided, which obtains instructions and executes the instructions to implement the method in the above-mentioned first aspect or any possible implementation manner of the first aspect.

Optionally, as an implementation, the chip includes a processor and a data interface, and the processor reads instructions stored in the memory through the data interface to execute the method in the above-mentioned first aspect or any possible implementation of the first aspect.

Optionally, as an implementation method, the chip may also include a memory, in which instructions are stored, and the processor is used to execute the instructions stored in the memory. When the instructions are executed, the processor is used to execute the method in the above-mentioned first aspect or any possible implementation method of the first aspect.

In a sixth aspect, a computer program product comprising instructions is provided. When the instructions are executed by a computing device cluster, the computing device cluster executes the method in the above-mentioned first aspect or any possible implementation manner of the first aspect.

In a seventh aspect, a computer-readable storage medium is provided, comprising computer program instructions. When the computer instructions are executed by a computing device cluster, the computing device cluster executes the method in the above-mentioned first aspect or any possible implementation manner of the first aspect.

As examples, these computer-readable storage media include, but are not limited to, one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), flash memory, electrically erasable PROM (EEPROM), and hard drive.

Optionally, as an implementation manner, the above-mentioned storage medium may specifically be a non-volatile storage medium.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of an application scenario for generating insight data provided in an embodiment of the present application.

FIG. 2 is a schematic diagram of another application scenario for generating insight data provided in an embodiment of the present application.

FIG. 3 is a schematic diagram of a system architecture provided in an embodiment of the present application.

FIG. 4 is a schematic diagram of an insight data generation process provided in an embodiment of the present application.

FIG. 5 is a schematic diagram of a sorting strategy provided in an embodiment of the present application.

FIG. 6 is a schematic diagram of an example of an insight data generation process provided in an embodiment of the present application.

FIG. 7 is a schematic diagram of another example of an insight data generation process provided in an embodiment of the present application.

FIG. 8 is a schematic diagram of an example of a sorting strategy provided in an embodiment of the present application.

FIG. 9 is a schematic structural block diagram of an apparatus for generating insight data provided in an embodiment of the present application.

FIG. 10 is a schematic structural block diagram of a computing device provided in an embodiment of the present application.

FIG. 11 is a schematic structural block diagram of a computing device cluster provided in an embodiment of the present application.

FIG. 12 is a schematic structural block diagram of another computing device cluster provided in an embodiment of the present application.

Detailed Description

The following describes the technical solutions in the embodiments of the present application in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the scope of protection of this application.

Unless otherwise specified, all technical and scientific terms used in the embodiments of the present application have the same meaning as those commonly understood by those skilled in the art of the present application. The terms used in this application are only for the purpose of describing specific embodiments and are not intended to limit the scope of this application.

It should be understood that in the various embodiments of the present application, the size of the serial number of each process does not mean the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.

In addition, in the embodiments of the present application, words such as "exemplary" and "for example" are used to indicate examples, illustrations or descriptions. Any embodiment or design described as "exemplary" in the present application should not be interpreted as being more preferred or more advantageous than other embodiments or designs. Rather, use of the word exemplary is intended to present concepts in a concrete way.

In the embodiments of the present application, “corresponding” and “relevant” may sometimes be used interchangeably. It should be noted that when the distinction between them is not emphasized, the meanings they intend to express are consistent.

The network architecture and business scenarios described in the embodiments of the present application are intended to more clearly illustrate the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided in the embodiments of the present application. A person of ordinary skill in the art can appreciate that with the evolution of the network architecture and the emergence of new business scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.

References to "one embodiment" or "some embodiments" etc. described in this specification mean that a particular feature, structure or characteristic described in conjunction with the embodiment is included in one or more embodiments of the present application. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", etc. appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized in other ways. The terms "including", "comprising", "having" and their variations all mean "including but not limited to", unless otherwise specifically emphasized in other ways.

In this application, "at least one" means one or more, and "plurality" means two or more. "And/or" describes the association relationship of associated objects, indicating that three relationships may exist. For example, A and/or B can mean: including the existence of A alone, the existence of A and B at the same time, and the existence of B alone, where A and B can be singular or plural. The character "/" generally indicates that the previous and next associated objects are in an "or" relationship. "At least one of the following" or similar expressions refers to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c can mean: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, c can be single or multiple.

In order to facilitate the understanding of the present application, the terms involved in the present application are first introduced below.

1. Dimension: Dimension is a classification method for fields in a data set. Fields that have a certain classification meaning for data are called dimensions. Usually, the data is in the form of enumerable values, such as "month", "ID", etc.

2. Metrics: Indicator fields with quantifiable data are called metrics, usually in numerical form.

3. Aggregate value: The aggregate value is the summary value or total value generated by a single field in a data set in a filtered data subset after some calculation operations, such as sum aggregation, mean aggregation, etc.

4. Record: refers to one or more rows in a database table that constitutes a data set.

5. Chart Visualization Element: A chart visualization element is a selectable data point in a visualization chart that summarizes some basic record values in the data. The data of a chart visualization element can consist of a single record or multiple records aggregated together. Chart visualization elements in a visualization chart can be displayed in a variety of ways such as points, lines, shapes, etc.

6. Internal and external: Internal means that the dimensions and metrics involved in the analysis participate in drawing the chart that constitutes the user's current analysis; external means that the dimensions and metrics involved in the analysis do not participate in drawing that chart.
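The terms above can be made concrete with a small sketch. It assumes a hypothetical data set whose fields have already been labeled as dimensions or metrics (the labeling itself is not derived here), computes aggregate values per chart visualization element, and splits the fields into internal and external ones for a given chart. All names and values are illustrative.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data set: each dict is one record.
RECORDS = [
    {"month": "Jan", "region": "East", "sales": 10.0, "cost": 4.0},
    {"month": "Jan", "region": "West", "sales": 6.0, "cost": 3.0},
    {"month": "Feb", "region": "East", "sales": 8.0, "cost": 5.0},
]

DIMENSIONS = {"month", "region"}  # fields with a classification meaning
METRICS = {"sales", "cost"}       # quantifiable indicator fields


def aggregate(records, dimension, metric, how=sum):
    """Aggregate a metric per dimension value; each result backs one chart element."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[dimension]].append(rec[metric])
    return {key: how(values) for key, values in groups.items()}


def split_internal_external(chart_fields):
    """Fields used to draw the chart are internal; all remaining fields are external."""
    all_fields = DIMENSIONS | METRICS
    internal = all_fields & set(chart_fields)
    return internal, all_fields - internal


chart_sum = aggregate(RECORDS, "month", "sales", how=sum)     # sum aggregation
chart_mean = aggregate(RECORDS, "month", "sales", how=mean)   # mean aggregation
internal, external = split_internal_external(["month", "sales"])
```

In this sketch the "Jan" element aggregates two records while the "Feb" element corresponds to a single record, matching the definition of a chart visualization element given above.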

In order to better understand the solution of the embodiment of the present application, the possible application scenarios of the embodiment of the present application are briefly introduced below in conjunction with Figure 1.

FIG. 1 shows an insight data generation system, which may include a user device and a data processing device. The user device may include a smart terminal such as a mobile phone, a personal computer or an information processing center. In general, the user device may be used as the initiator of the insight data generation request.

Optionally, the above-mentioned data processing device can be a device or server with a data processing function, such as a cloud server, a network server, an application server or a management server. The data processing device receives, through an interactive interface, the instruction for selecting chart visualization elements from the smart terminal, and then performs data processing such as machine learning, deep learning, search, reasoning and decision-making by means of a memory that stores the data and a processor that processes the data. The memory in the data processing device is a general term, and can be a local storage device storing historical data or a storage manager in the database.

Optionally, in the insight data generation system shown in Figure 1, the user device can receive an instruction from a user to select one or more chart visualization elements in a visualization chart, and then initiate a screening and query request to a data processing device to find out the fine-grained original records of the selected chart visualization elements, so that the data processing device performs data analysis on the original data records corresponding to the one or more chart visualization elements selected by the user device, thereby generating insight data for one or more chart visualization elements.

In Figure 1, the data processing device can execute the insight data generating method of the embodiment of the present application. It should be noted that although the user device and the data processing device are depicted as independent devices in Figure 1, in other embodiments of the present application, the two devices can be implemented by the same device.

FIG. 2 shows another insight data generation system. In FIG. 2, the user device can be directly used as a data processing device. The user device can directly receive input from the user, and the input is processed by the hardware of the user device itself. The specific process is similar to that of FIG. 1; refer to the above description, which is not repeated here.

The user device in FIG. 2 may be a server with data processing capabilities such as a cloud server, a network server, an application server or a management server, or may be an electronic device with data processing capabilities such as a desktop computer, a mobile computer, a tablet computing device or a mobile communication device.

In the insight data generation system shown in Figure 2, the user device can receive an instruction from the user to select one or more chart visualization elements in a visualization chart, and then the user device itself initiates a request to perform data analysis on the selected one or more chart visualization elements, thereby generating insight data for the one or more chart visualization elements.

In FIG. 2, the user device itself can execute the insight data generating method of the embodiment of the present application.

In an embodiment of the present application, the processors in FIG. 1 and FIG. 2 can perform data analysis according to business needs. For example, insight analysis of the chart is performed according to business needs, and a variety of different analysis modes are supported, including statistical value feature analysis, distribution feature analysis, null value alarm analysis, zero value alarm analysis, high correlation metric analysis, global-subset difference analysis, etc., so that different types of insights can be detected at the statistical analysis level and the traditional machine learning level. Features of different categories are generated, and a variety of feature descriptions are customized, to obtain insight analysis of the data of interest behind the chart visualization elements selected by the user.
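Two of the analysis modes named above, null value alarm analysis and zero value alarm analysis, could for instance be realized as simple ratio checks over the selected records. The application does not define the alarm criteria; the ratio-based rule, the default threshold of 0.2 and the function name `value_alarms` in this sketch are assumptions.

```python
def value_alarms(records, metric, max_ratio=0.2):
    """Flag a metric whose share of null or zero values exceeds max_ratio."""
    total = len(records)
    if total == 0:
        return {"null_ratio": 0.0, "zero_ratio": 0.0, "null_alarm": False, "zero_alarm": False}
    nulls = sum(1 for rec in records if rec.get(metric) is None)
    zeros = sum(1 for rec in records if rec.get(metric) is not None and rec[metric] == 0)
    null_ratio = nulls / total
    zero_ratio = zeros / total
    return {
        "null_ratio": null_ratio,
        "zero_ratio": zero_ratio,
        "null_alarm": null_ratio > max_ratio,
        "zero_alarm": zero_ratio > max_ratio,
    }


# Hypothetical records behind the selected chart elements: 1 null and 3 zeros out of 10.
sample = [{"sales": value} for value in (None, 0, 0, 0, 3, 4, 5, 6, 7, 8)]
report = value_alarms(sample, "sales")
```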

As shown in FIG. 3, an embodiment of the present application provides a system architecture 100. The system architecture 100 may include an execution device 110, a database 130, a client device 140, a data storage system 150, and a data acquisition device 160. It should be understood that FIG. 3 is only for illustration, and optionally, the system architecture may include more or fewer databases and execution devices, or other functional modules.

In FIG. 3, the data acquisition device 160 can be used to collect chart data, and in the embodiment of the present application, the chart data can be used to generate a visual chart containing chart visualization elements. After collecting the chart data, the data acquisition device 160 stores the data in the database 130. It should be noted that in actual applications, the chart data maintained in the database 130 does not necessarily all come from the data acquisition device 160, and may also be received from other devices, for example, obtained directly from the cloud or other places. It should also be noted that the execution device 110 does not necessarily generate insights based entirely on the chart data maintained in the database 130, and may also obtain data from the cloud or other places to generate insights. The above description should not be used as a limitation on the embodiments of the present application.

Optionally, the database may be a hardware device, may be integrated in the execution device 110, or may be set up on a cloud or other network server.

The generation of visual charts and insight data can be applied to different systems or devices, for example, applied to the execution device 110 shown in FIG. 3 and presented on the application interface 120. The execution device 110 can be the data processing device in FIG. 1; it can be a terminal, such as a mobile terminal, a tablet computer, a laptop computer, an AR/VR device or a vehicle-mounted terminal; or it can be a server, a cloud, etc. In FIG. 3, the execution device 110 can be configured with an input/output (I/O) interface 112 for data interaction with an external device. The user can input data to the I/O interface 112 through the client device 140, and in the embodiment of the present application the input data can include instructions for selecting one or more chart visualization elements, and the dimensions and metrics of the visualization charts corresponding to the chart visualization elements. The execution device 110 can call the data, code, etc. in the data storage system 150 for corresponding processing, and can also store the data obtained by the corresponding processing in the data storage system 150.

Finally, the I/O interface 112 feeds back the processing result, for example, the generated insight data, to the client device 140. The client device may also be the execution device 110 in FIG. 3, in which case the fed-back insight data is presented on the application interface 120 of the execution device.

The execution device 110 includes an application interface 120. Optionally, the application interface 120 can be an interface of a client application stored locally on the execution device 110, or it can be an interface of a client application located on a remote server and accessible through a network (such as the Internet or an intranet). For example, it can be an application interface that is hosted in a browser-controlled environment or coded in a language supported by the browser and relies on a web browser to perform data calculations.

The application interface 120 may include a visualization chart interface 121 and an insight data interface 125, or the visualization chart interface 121 and the insight data interface 125 may be presented through multiple application interfaces.

The visualization chart interface 121 may include one or more different types of charts and interface configuration information. The interface configuration information may include modules such as dimension options, measurement options, and chart interface setting modules, or elements such as axis configuration information for selecting charts to be drawn, and chart raw data. It should be understood that FIG. 3 is only an example. Optionally, the visualization chart interface 121 may also include more selection modules, such as a chart type selection module.

The insight interface 125 may include one or more insight data 126, 127, and the insight data 126 and the insight data 127 may include insight charts or insight texts. The insight data is obtained according to the chart 122 or the chart 123 in the visualization chart interface 121, and the insight data interface may also include an insight mode selection module or an analysis type for selecting insight data generation. The analysis type may be distribution feature analysis, null value alarm analysis, zero value alarm analysis, high correlation metric analysis, global-subset difference analysis, etc., or may be a customized feature analysis. The analysis results formed may be described by different chart types and corresponding text insight information, and displayed in insight charts or insight texts. It should be understood that FIG. 3 is only an example, and optionally, the insight interface 125 includes more modules, such as an insight data sorting module.

It is worth noting that Figure 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation. For example, in Figure 3, the data storage system 150 is an external memory relative to the execution device 110. In other cases, the data storage system 150 can also be placed in the execution device 110. Optionally, other modules may also be included in the system architecture, such as a chart drawing module. Optionally, the visual chart interface and the insight interface may not be in the same application interface. The scenarios to which the embodiments of the present application can be applied are not limited to those shown in Figure 3.

In the current application scenario of automatic intelligent insight generation of charts in business intelligence analysis platforms, the insight data analysis related to single-point data can help users to inspect, discover and gain in-depth understanding of individual chart visualization elements in a visualization chart when building, browsing and analyzing data. However, when users want to analyze a local data subset containing multiple chart visualization elements, the data records of unselected chart visualization elements will interfere with the insight data analysis formed by the selected single chart visualization element, making it difficult to ensure the accuracy of the overall insight analysis formed by selecting multiple chart visualization elements, and the interaction cost is high.

For example, if there are three outliers in a chart that present the same or similar phenomena, these outliers are the user's interest data, and the user wants to obtain insight data on the causes of the three outliers. If the user only selects one of the outliers to generate insight data, the other two outliers also participate in the analysis process of the insight data. As a result, the insight data may be biased, for example, the selected outlier may be judged as a normal value due to the presence of the other two outliers. Therefore, the unselected chart visualization elements may interfere with the insight data formed by the selected single chart visualization element.

Therefore, existing technical products for automatic insight generation work at the analysis granularity of single data points and lack an assisted insight generation solution based on batch selection of local data guided by the user's interactive focus. The freedom of interaction and analysis that they support is also limited, and there is still room for further improvement and optimization.

In view of this, an embodiment of the present application provides a solution for generating insight data. Figure 4 shows a schematic flowchart of a method 400 for generating insight data provided by an embodiment of the present application. The method of Figure 4 can be executed by the data processing device of Figure 1 or the user device of Figure 2.

Step 410: Present a first chart, the first chart comprising M chart visualization elements, each chart visualization element corresponding to at least one data record in a data source.

Optionally, the first chart may be presented in the visualization chart interface 121 in the application interface 120, or in any visualization interface. The data records for drawing the first chart may be all or part of the data in the database 130, or all or part of the data in one or more tables in any data source.

The first chart includes M chart visualization elements, such as bars of a bar chart, discrete data points of a scatter plot, data points and adjacent lines of a line chart, sectors of a pie chart or a donut chart, and other graphical representations of data records. Each chart visualization element is drawn by a data record. A single chart visualization element may correspond to a single data record, or may correspond to an aggregate value of multiple data records, that is, a summary value or total value finally generated after some calculation operations are performed on multiple data records, such as sum aggregation, mean aggregation, etc.
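As a non-limiting illustration, the correspondence between chart visualization elements and data records can be sketched in Python as follows. The field names (`month`, `sales`) and the helper `build_elements` are hypothetical and chosen only for this example; the embodiments do not prescribe a particular data layout. Each element keeps both its aggregate value and the identifiers of the data records it was drawn from.

```python
from collections import defaultdict

# Hypothetical data source: each dict is one data record.
records = [
    {"id": 1, "month": "Jan", "sales": 10},
    {"id": 2, "month": "Jan", "sales": 15},
    {"id": 3, "month": "Feb", "sales": 7},
]

def build_elements(records, dimension, metric, aggregate=sum):
    """Group records by a dimension; each group becomes one chart element."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[dimension]].append(rec)
    return {
        key: {
            "value": aggregate(r[metric] for r in recs),  # e.g. sum aggregation
            "record_ids": [r["id"] for r in recs],        # records behind the element
        }
        for key, recs in groups.items()
    }

elements = build_elements(records, "month", "sales")
```

In this sketch the "Jan" bar corresponds to two data records, while the "Feb" bar corresponds to a single record.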

Step 420: Confirm N chart visualization elements selected from the M chart visualization elements, where M and N are positive integers greater than 1, and M is greater than or equal to N.

The technical solution of this embodiment can support the analysis and insight data generation after batch selection of N chart visualization elements of the visualization chart to be analyzed. Different from only supporting the analysis of a single chart visualization element, step 420 can support the interactive mode of selecting multiple chart visualization elements and confirm the selected multiple chart visualization elements.

Optionally, step 420 in the technical solution of the present application may also support an interactive mode of selecting a single chart visualization element and confirming the selected single chart visualization element.

For example, the user can click and select one or N chart visualization elements through the visualization chart interface of the application interface. For example, the application interface is an interface of a spreadsheet application on a desktop computer, and the user can drag and select with the mouse to generate a selection box, and one or N chart visualization elements in the selection box are determined as the chart visualization elements selected by the user. The chart visualization elements can be one or N bars in a bar chart, one or N data points in a line chart or scatter chart, or one or N sectors in a pie chart or donut chart.
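As a non-limiting illustration, determining which chart visualization elements fall inside a selection box can be sketched as follows. The `Element` type, its screen coordinates, and the rule that an element counts as selected when its centre lies inside the box are assumptions made for this example only.

```python
from dataclasses import dataclass

@dataclass
class Element:
    key: str   # dimension value the element is bound to
    x: float   # assumed screen-space centre of the element
    y: float

def select_in_box(elements, x0, y0, x1, y1):
    """Return the elements whose centre lies inside the selection box."""
    left, right = sorted((x0, x1))   # the drag may go in either direction
    bottom, top = sorted((y0, y1))
    return [e for e in elements
            if left <= e.x <= right and bottom <= e.y <= top]

bars = [Element("Jan", 10, 40), Element("Feb", 30, 55), Element("Mar", 50, 20)]
selected = select_in_box(bars, 40, 60, 5, 30)  # drag from right to left
```

Normalizing the corners of the box lets the same check serve drags in any direction.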

Optionally, the user may select one or N chart visualization elements by clicking on a single selection or multiple selections. For example, the user may click on one or N chart visualization elements at the same time with a mouse, and the selected one or N chart visualization elements are determined as the chart visualization elements selected by the user.

Optionally, the selected chart visualization elements may be discontinuous in the dimension of the x-axis of the chart, and the selected chart visualization elements may be separated by one or more chart visualization elements.

Optionally, the batch selection of focus data supported by this technical solution can be performed on a variety of different charts, and the highlights selected by the user can be retained when the chart type is switched, ensuring that the highlighted data is always used for generating the user's insights.

It should be understood that the batch selection method in step 420 can be performed on a variety of different charts, such as bar charts, line charts, or scatter charts, and the data that is brushed will be highlighted relative to the data that is not brushed, and will remain highlighted when the type of chart for analyzing the data is switched. For example, when drawing a chart for the same set of data, the chart visualization elements that the user is interested in are brushed on the bar chart, and the chart visualization elements are then highlighted. When the user switches the bar chart to a line chart, the chart visualization elements will still remain highlighted. When the user switches the chart type for the bar chart that has been generated and the visualization elements have been brushed, the visualization elements containing the data records corresponding to the brushed visualization elements will also be highlighted in the new chart.
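As a non-limiting illustration, one way to retain highlights across a chart type switch is to store the brushed selection as a set of data record identifiers rather than as chart elements. The chart configurations and the helper `highlighted_keys` below are hypothetical.

```python
def highlighted_keys(elements, brushed_record_ids):
    """An element is highlighted if it contains any brushed data record."""
    return sorted(key for key, ids in elements.items()
                  if brushed_record_ids & set(ids))

# Same data records under two hypothetical chart configurations
# (element key -> identifiers of the records behind the element).
bar_chart = {"Jan": [1, 2], "Feb": [3], "Mar": [4, 5]}
line_chart = {"Q1-early": [1, 2, 3], "Q1-late": [4, 5]}

brushed = set(bar_chart["Jan"])                 # user brushes the "Jan" bar
before_switch = highlighted_keys(bar_chart, brushed)
after_switch = highlighted_keys(line_chart, brushed)
```

Because the selection is keyed by records, any element of the new chart that contains a brushed record is highlighted without further user input.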

Step 430: Determine all K data records corresponding to the N chart visualization elements, where K is a positive integer greater than 1.

Optionally, the method of confirming all K data records corresponding to N chart visualization elements may be to determine the interactive form of the chart visualization elements of the user interactively selecting the visualization chart. For example, the interactive form of the user interaction may be the operation of the user performing a swiping interaction. When the user performs a batch swiping operation along the horizontal axis of the chart, the interactive form is to swipe the chart visualization element data when the horizontal axis dimension takes a specific value. The dimension data bound to the chart visualization element may be a specific value of the dimension corresponding to the chart visualization element data, and the analysis granularity may be a specific category of the dimension. Furthermore, according to this interactive form, the user is concerned about the characteristics of the chart visualization element when the dimension corresponding to the horizontal axis of the chart takes a specific value, and the subsequent insight data of this solution will also focus on the horizontal axis dimension of the chart.

For example, if the type of the first chart is a bar chart, the x-axis dimension is time, and the interactive operation performed by the user is to swipe the three bar chart visualization elements of the bar chart along the x-axis direction of the visualization chart. The time periods corresponding to the three bar chart visualization elements constitute field A, then the solution analyzes that the user's interactive operation is swiping along the x-axis, the dimension data field bound to the x-axis of the visualization chart is field A of a certain time type, and the analysis granularity is a specific time type, such as month, time period, etc. According to the dimension data field, all data records corresponding to the dimension data field A and the specific time type are filtered out from the data source or database.
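As a non-limiting illustration, filtering the data records by the values of the dimension bound to the x-axis can be sketched as follows. The record fields and the helper `records_for_brush` are hypothetical names used only for this example.

```python
# Hypothetical data records; "month" is the field bound to the x-axis.
records = [
    {"id": 1, "month": "2023-01", "region": "East", "sales": 10},
    {"id": 2, "month": "2023-01", "region": "West", "sales": 15},
    {"id": 3, "month": "2023-02", "region": "East", "sales": 7},
    {"id": 4, "month": "2023-03", "region": "East", "sales": 4},
]

def records_for_brush(records, axis_field, brushed_values):
    """Return every record whose axis_field takes one of the brushed values."""
    wanted = set(brushed_values)
    return [r for r in records if r[axis_field] in wanted]

# The user brushes the bars for January and March.
k_records = records_for_brush(records, "month", ["2023-01", "2023-03"])
```

Here two selected bars (N = 2) resolve to three underlying data records (K = 3).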

Optionally, a method for confirming all K data records corresponding to N chart visualization elements may be to directly extract the enumeration value of the data point corresponding to the interactively selected chart visualization element, and the enumeration value of the data point may be the specific value of the field corresponding to the chart visualization element of the chart. For example, when the dimension in the chart is month, the enumeration value may be one or more different values of month 1 to month 12 corresponding to the selected chart visualization element, or it may be a combination of the value of the external dimension information of the data set bound to the horizontal axis or legend in the selected chart visualization element and the value of the month. Determining the dimension combination related to the chart visualization element in the chart may include the specific value of the external dimension that is not involved in drawing the first chart and the selected multiple enumeration values of the dimension that participates in drawing the first chart, and may also include the dimension combination and measurement related to the chart visualization element, or other relevant information of the chart visualization element.

The technical solution of the present application can integrate the information contained to generate filtering logic for filtering and searching data sets or original data records in databases. The subsequent insight data of this solution will also focus on the dimensions or dimension combinations related to the filtered fine-grained original records and chart visualization elements. Exemplarily, when the enumeration values of the dimensions corresponding to N chart visualization elements are three different fields of dimensions A, B, and C, the logic for filtering the first chart data can be a logical combination of A or B or C with different values.

It should be understood that the above process is only a very simple scenario. This solution can also support multiple different dimension fields to jointly generate the x-axis of the chart, and can also support the legend field and the configuration of the chart itself to superimpose complex filtering conditions. The filtering logic is generated by multiple nested filtering logic modules.
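As a non-limiting illustration, nested filtering logic can be modelled as small composable conditions. The combinators `eq`, `any_of`, and `all_of` are hypothetical helpers introduced for this sketch and are not part of the described solution.

```python
def eq(field, value):
    """Leaf condition: the record's field equals the given value."""
    return lambda rec: rec[field] == value

def any_of(*conditions):
    """OR combination of conditions."""
    return lambda rec: any(c(rec) for c in conditions)

def all_of(*conditions):
    """AND combination of conditions."""
    return lambda rec: all(c(rec) for c in conditions)

# (month is Jan OR month is Feb) AND the legend field region is East
selected_filter = all_of(
    any_of(eq("month", "Jan"), eq("month", "Feb")),
    eq("region", "East"),
)

records = [
    {"id": 1, "month": "Jan", "region": "East"},
    {"id": 2, "month": "Jan", "region": "West"},
    {"id": 3, "month": "Feb", "region": "East"},
    {"id": 4, "month": "Mar", "region": "East"},
]
matching_ids = [r["id"] for r in records if selected_filter(r)]
```

The enumeration values of the x-axis dimension form the inner OR group, and a legend or chart-level condition is superimposed with AND.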

Optionally, this solution can convert one or N chart visualization elements interactively selected by the user on the first chart (corresponding to the aggregated results of one row or part of the rows in the original data set records) into query requests to further query all K data records of the interactively selected chart visualization elements in the original data set or database, and use all K data records for subsequent insights.

Optionally, the query request may be any combination of all information of the request field, a list of filter operators, a list of filter enumeration values, and filter logic, and the object of the request may be a backend module. For example, the Structured Query Language (SQL) supports the generation of a where clause, performs a query on the original table records, and returns the subset of interest data corresponding to the chart visualization element selected by the user to the algorithm module.

Exemplarily, in one embodiment of the present application, the user interest data filter derived from the interactive operation selects all records in the data set participating in the analysis whose dimension A field takes one of the selected values. The SQL query statement finally generated by this solution then combines the multiple values of the dimension A field in the where clause by means of the IN operator or OR logic.
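As a non-limiting illustration, generating such a where clause with the IN operator can be sketched with Python's standard `sqlite3` module. The table name `sales`, its columns, and the helper `build_query` are assumptions made for this example; the embodiments do not specify a database engine.

```python
import sqlite3

def build_query(table, field, values):
    """Build a parameterized SELECT with an IN filter.

    table and field are assumed to come from trusted chart configuration,
    because identifiers cannot be bound as query parameters.
    """
    placeholders = ", ".join("?" for _ in values)
    sql = f"SELECT * FROM {table} WHERE {field} IN ({placeholders})"
    return sql, list(values)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "Jan", 10.0), (2, "Feb", 7.0), (3, "Mar", 4.0), (4, "Jan", 15.0)],
)

# Enumeration values extracted from the selected chart visualization elements.
sql, params = build_query("sales", "month", ["Jan", "Mar"])
rows = conn.execute(sql + " ORDER BY id", params).fetchall()
```

Binding the enumeration values as parameters keeps the generated statement independent of how many elements the user selected.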

Optionally, the solution is not limited to single-table analysis queries of a single data source, but can also support queries for multiple related data tables in the original data source, and can support federated queries and subsequent analysis. Exemplarily, in one embodiment of the present application, the functional bottom layer of the solution is based on a distributed SQL query engine, which then merges multiple tables into a data set level to obtain data.

Step 440: Perform joint data analysis based on the K data records to generate first insight data for N chart visualization elements.

It should be understood that the joint data analysis of the K data records differs both from the data analysis performed when a single chart visualization element is selected and from a process in which multiple chart visualization elements are selected, each is analyzed individually, and the results are then integrated. The joint data analysis can analyze the K data records as a whole, together with other data in the data set, thereby generating insight data for the N chart visualization elements, or for at least two of the N chart visualization elements, in comparison with other data records.

Optionally, the K data records are taken as a whole, and the association relationship of each data record in the K data records is determined, and the association relationship determines the characteristic information shared by the K data records. For example, the K data records may have the same or similar external dimensions, or may be data records with a correlation relationship, or may present the same or opposite measurement value phenomena, and the shared characteristic information is the external dimension or correlation relationship analysis data or measurement value phenomenon corresponding to the K data records. Based on the shared characteristic information, the data records with shared characteristic information in the data source are screened out, and the K data records and the data records with shared characteristic information are subjected to data analysis to form insight data of N chart visualization elements.
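As a non-limiting illustration, determining shared characteristic information and screening the data source with it can be sketched as follows. The sketch covers only the case where the shared characteristic is an identical value of an external dimension; the field names and the helpers `shared_features` and `records_sharing` are hypothetical.

```python
def shared_features(selected, external_dims):
    """Return {dimension: value} for external dimensions on which
    all selected records take the same value."""
    shared = {}
    for dim in external_dims:
        values = {r[dim] for r in selected}
        if len(values) == 1:
            shared[dim] = values.pop()
    return shared

def records_sharing(records, shared):
    """Screen out the records that carry every shared feature.
    With no shared feature, every record trivially qualifies."""
    return [r for r in records if all(r[d] == v for d, v in shared.items())]

records = [
    {"id": 1, "month": "Jan", "channel": "online", "region": "East"},
    {"id": 2, "month": "Feb", "channel": "online", "region": "West"},
    {"id": 3, "month": "Feb", "channel": "store", "region": "East"},
    {"id": 4, "month": "Mar", "channel": "online", "region": "West"},
]
selected = [r for r in records if r["month"] in {"Jan", "Mar"}]  # the K records

shared = shared_features(selected, ["channel", "region"])
related_ids = [r["id"] for r in records_sharing(records, shared)]
```

Record 2 was not selected but shares the characteristic with the selected records, so it joins the analysis alongside them.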

Optionally, L of the K data records may have common feature information, and the L data records correspond to at least two chart visualization elements, where L is a positive integer greater than 1, and K is greater than L. According to the L data records and their common feature information and the data records for comparison, insight data of at least two chart visualization elements corresponding to the L data records are generated. The data records for comparison may be data records corresponding to (M-N) chart visualization elements that are not selected from the M chart visualization elements in step 420, or may be (K-L) data records that are not selected from the K data records.

The technical solution of the present application can realize simultaneous analysis of multiple chart visualization elements and their original data records, and obtain insight data containing the association information of multiple chart visualization elements. The insight data generated by the technical solution of the present application is different from the insight data obtained by selecting a single chart visualization element for analysis and selecting a single chart visualization element for analysis multiple times and then integrating it, which reduces the interference of chart visualization elements that are not selected but have an associated relationship on the selected chart visualization elements during the analysis of insight data.

It should be understood that the number of times the insight data of at least two chart visualization elements corresponding to L data records are generated may be more than once, and the value of L may be different, and finally a plurality of different data candidate sets may be formed. These different data candidate sets may be analyzed by strategies or algorithms to form a plurality of different insight data, and the data candidate sets serve as subspaces for strategy or algorithm analysis.

The present application analyzes the data candidate set in the data subset according to the insight data generation strategy or algorithm to generate insight data that can demonstrate the characteristics of the subspace data.

In one embodiment of the present application, the insight data generation algorithm input may include the full amount of full table data records, the screening conditions corresponding to the visualization elements of the original visualization chart selected by the user interaction, the original records of the data of interest generated by the query request, the common feature information of the original data records, and the algorithm parameters configured by the front-end user interaction. The insight data results produced by the algorithm may include the original chart data, chart type information, axis configuration information, or text insight description information required to draw the insight chart.

Optionally, the insight data generation algorithm can support multiple analysis modes, such as statistical value feature analysis, distribution feature analysis, null value warning analysis, zero value warning analysis, high correlation measurement analysis, global-subset difference analysis, etc. It can also support the detection of different categories of features for different types of insights from the statistical analysis and traditional machine learning levels, and generate a variety of customized feature descriptions.

Optionally, for different types of insight data, the insight data generation algorithm can adaptively use different chart types for display, such as using a bar chart for distribution charts, and using a scatter plot for correlation measurement charts, with a logarithmic axis or a linear axis chosen flexibly according to the data distribution. The textual insight description information of insight data also varies. The textual description can be a description of the characteristics of the insight, a possible a priori pattern analysis composed of various characteristics, a combination of the two, or other descriptions that can explain the characteristics.

In one embodiment of the present application, the insight data generation algorithm can produce multiple different types of insights, support the contribution of metrics or dimensions inside and outside the analysis chart to the generation of patterns of user interest selection, and guide users to explore the analysis content of data records corresponding to the selected chart visualization elements and related data in the data set.

It should be understood that the concepts of internal and external refer to whether the dimensions and measures involved in the analysis are involved in the drawing of the chart that the user is currently analyzing. Considering the characteristics and contributions of external dimensions and measures can help users discover data aggregations or subspaces of interest related to the selected chart visualization elements.

Exemplarily, the types of insights generated by the algorithm may include chart metric aggregation expansion analysis, external dimension valid record number analysis, external dimension distribution contribution analysis, external dimension subspace internal feature analysis, and external high interpretability metric analysis, but are not limited to these types of insights.

Optionally, the chart metric aggregation expansion analysis can focus on decomposing the metric aggregation values bound to the selected chart visualization elements into the distribution of the original data that composes them, helping users understand the composition of the aggregation value. For example, the vertical axis in a common analysis chart is the sum aggregation of a metric, and the user pays attention to the chart visualization elements with higher aggregation values. This type of insight data can help users understand what produces an abnormal aggregation value, such as a single abnormal original record, or an overall distribution that has a certain bias.
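As a non-limiting illustration, expanding a sum aggregation into its original records and checking whether a single record dominates can be sketched as follows. The 0.5 dominance threshold and the helper `expand_aggregate` are assumptions made for this example, not values taken from the embodiments.

```python
def expand_aggregate(values, dominance=0.5):
    """Decompose a sum aggregate into its records (assumed non-negative)
    and report whether one record dominates the total."""
    total = sum(values)
    if total == 0:
        return {"total": 0, "max_share": 0.0, "pattern": "no contribution"}
    max_share = max(values) / total
    pattern = ("single dominant record" if max_share > dominance
               else "spread across records")
    return {"total": total, "max_share": max_share, "pattern": pattern}

outlier_bar = expand_aggregate([1, 2, 97])    # one abnormal original record
biased_bar = expand_aggregate([30, 35, 35])   # overall distribution is high
```

Both bars have the same aggregate height, yet the sketch distinguishes the two compositions described above.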

Optionally, the analysis of the number of valid records in the external dimension can focus on exploring the data records selected by the user interactively, and the distribution of the number of valid records in other external dimensions (not involved in the chart drawing) to analyze the potential reasons why the aggregate values of the visualization elements of the visualization chart selected by the user show a specific pattern. If it is found that the original data records corresponding to a specific pattern are aggregated in a certain dimension in a certain value subspace, this method believes that the aggregation has a greater correlation with the presentation of this pattern.
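As a non-limiting illustration, counting the valid records of the selected data along an external dimension can be sketched as follows. Treating a record as valid when its value is not `None`, and the helper name `valid_record_counts`, are assumptions made for this example.

```python
from collections import Counter

def valid_record_counts(selected, external_dim):
    """Count selected records per value of an external dimension,
    skipping records whose value is missing."""
    return Counter(r[external_dim] for r in selected
                   if r.get(external_dim) is not None)

# Hypothetical records behind the selected chart visualization elements;
# "channel" does not take part in drawing the chart.
selected = [
    {"id": 1, "channel": "online"},
    {"id": 2, "channel": "online"},
    {"id": 3, "channel": None},
    {"id": 4, "channel": "store"},
]
counts = valid_record_counts(selected, "channel")
```

A value subspace that collects most of the valid records, here "online", is a candidate explanation for the pattern the user selected.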

Optionally, external dimension distribution contribution analysis can focus on exploring the contribution distribution of data records selected by users in other external dimensions (not involved in chart drawing) to the chart metrics that users are interested in. This type of explanation essentially disassembles the aggregated value in another direction outside the chart to find potential high-contribution dimension values for users to further explore. When users find a dimensional subspace of interest, they can further use the data explanation subspace distribution exploration function to view the detailed distribution within the subspace.
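As a non-limiting illustration, the contribution of each value of an external dimension to the chart metric can be sketched as a share of the selected total. The field names and the helper `contribution_by_dimension` are hypothetical.

```python
from collections import defaultdict

def contribution_by_dimension(selected, external_dim, metric):
    """Share of the selected metric total contributed by each value
    of an external dimension."""
    totals = defaultdict(float)
    for r in selected:
        totals[r[external_dim]] += r[metric]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {value: subtotal / grand_total for value, subtotal in totals.items()}

selected = [
    {"region": "East", "sales": 10},
    {"region": "East", "sales": 20},
    {"region": "West", "sales": 10},
]
shares = contribution_by_dimension(selected, "region", "sales")
```

Values with a high share are the high-contribution dimension values that the user may explore further.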

Optionally, the internal feature analysis of the external dimension subspace can be highly related to the above-mentioned explanation related to the external dimension, and can support automatic search and recommendation of some subspaces of dimensional values, where the numerical distribution inside such subspaces has certain characteristics for the metric distribution selected by the user. Based on the subspace distribution, the user can further use the traceability original data recording function to analyze the source of the feature pattern.

Optionally, external highly interpretable metric analysis can focus on data patterns measured in a subset of data that the user is interested in, perform highly correlated metric analysis on the full set of data and the subset of data respectively, obtain a batch of explanatory external metric candidates, and further analyze them to obtain metrics with higher surprise, and display the correlation between the metric and the chart metric that the user is interested in through a scatter plot pattern, in order to explore possible insights from the data.
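As a non-limiting illustration, comparing the correlation of an external metric with the chart metric on the full data set and on the subset of interest can be sketched as follows. The embodiments do not define how surprise is computed; the score used here, the absolute difference between the two Pearson correlation coefficients, is an assumption made for this example, as are the field names.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def surprise(records, subset, chart_metric, candidate):
    """Assumed surprise score: how much the subset correlation
    deviates from the correlation on the full data set."""
    def corr(rows):
        return pearson([r[chart_metric] for r in rows],
                       [r[candidate] for r in rows])
    return abs(corr(subset) - corr(records))

records = [
    {"sales": 1, "discount": 1},
    {"sales": 2, "discount": 2},
    {"sales": 3, "discount": 2},
    {"sales": 4, "discount": 1},
]
subset = records[:2]   # data records of the selected elements
score = surprise(records, subset, "sales", "discount")
```

In this sketch the candidate metric is uncorrelated with the chart metric overall but perfectly correlated within the subset, which yields a high score.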

The types of insights in the embodiments of the present application are not limited thereto. In another embodiment of the present application, the algorithm can also generate multiple different types of insights to support the analysis of associations within the data records corresponding to the selected chart visualization elements. Exemplarily, the types of insights generated by the algorithm may include chart visualization element trend analysis, chart visualization element cluster analysis, etc., but the embodiments of the present application are not limited to these types of insights.

Optionally, the trend analysis of the chart visualization element can focus on the trend pattern of the data record corresponding to the selected chart visualization element as the x-axis dimension changes. For example, the periodic change pattern that may exist in the data record is obtained from the numerical high points or numerical low points that may appear in the data record as a whole. The trend analysis of the chart visualization element can also be used for the prediction of data records. For another example, when there are some outliers in the data record showing a specific trend, only non-outliers can be selected for analysis in the process of selecting the chart visualization element, and the outliers can be skipped to improve the accuracy of the trend analysis.
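As a non-limiting illustration, the effect of skipping an outlier during selection can be sketched with an ordinary least-squares slope. The use of a linear fit is an assumption made for this example; the embodiments do not name a specific trend algorithm.

```python
def slope(points):
    """Least-squares slope of y over x for (x, y) points with distinct x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    numerator = sum((x - mx) * (y - my) for x, y in points)
    denominator = sum((x - mx) ** 2 for x, _ in points)
    return numerator / denominator

all_points = [(1, 10), (2, 12), (3, 14), (4, 16), (5, 90)]  # x = 5 is an outlier
selected_points = all_points[:4]   # the user selects only the non-outliers

trend_all = slope(all_points)
trend_selected = slope(selected_points)
```

The slope computed on the selected points reflects the underlying trend, whereas the slope over all points is pulled upward by the outlier.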

Optionally, the cluster analysis of chart visualization elements can focus on the clustering patterns and differences of data records corresponding to multiple chart visualization elements selected in batches. For example, this insight type can classify data records corresponding to chart visualization elements in one or more charts into aggregate classes based on the intrinsic properties of the data, where data records in each aggregate class have the same characteristics, and data records in different aggregate classes have greatly different characteristics. This insight type can analyze data tables in multiple data sources and classify data records corresponding to multiple chart visualization elements as much as possible.
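As a non-limiting illustration, grouping the metric values of the selected data records into classes can be sketched with a simple gap-based grouping. This stands in for whatever clustering algorithm an implementation would actually use; the `max_gap` parameter and the helper `gap_clusters` are assumptions made for this example.

```python
def gap_clusters(values, max_gap):
    """Sort the values and start a new class wherever the gap to the
    previous value exceeds max_gap."""
    ordered = sorted(values)
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] > max_gap:
            clusters.append([v])      # large gap: a different class begins
        else:
            clusters[-1].append(v)    # close to the previous value: same class
    return clusters

# Metric values of records behind chart elements selected in batch.
classes = gap_clusters([10, 52, 11, 50, 12], max_gap=5)
```

Values inside one class are close to each other, while values in different classes differ by more than the chosen gap.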

The automatically generated insight data presentation of the data interpretation function in the technical solution of the present application can be in the form of free expansion and contraction similar to an accordion, and is divided into two layers. The title of the first layer of the accordion marks the names of different insight categories. When the user expands the first layer, the second layer displays the specific insights recommended by all algorithms under this type of insight. When the user expands it again, the text description and chart drawing of this type of insight data will be displayed specifically. When the user expands a specific insight data, other insight data will be folded to ensure the neatness of the front-end interface.
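As a non-limiting illustration, the two-layer accordion behaviour can be sketched as a small state model. The class `InsightAccordion` and the choice to keep only one category open at a time are assumptions made for this example; the description above only requires that expanding one insight folds the others.

```python
class InsightAccordion:
    """Two-layer accordion: insight categories, then insights per category."""

    def __init__(self, insights_by_category):
        self.insights_by_category = insights_by_category
        self.open_category = None
        self.open_insight = None

    def expand_category(self, category):
        """Open a first-layer title and list its second-layer insights."""
        self.open_category = category
        self.open_insight = None
        return list(self.insights_by_category[category])

    def expand_insight(self, insight):
        """Open one insight; any previously open insight is folded."""
        if insight not in self.insights_by_category.get(self.open_category, []):
            raise ValueError("insight is not in the expanded category")
        self.open_insight = insight

    def is_expanded(self, insight):
        return self.open_insight == insight

panel = InsightAccordion({"Distribution": ["by region", "by channel"]})
listed = panel.expand_category("Distribution")
panel.expand_insight("by region")
panel.expand_insight("by channel")   # folds "by region"
```

Holding a single open insight in the state is what guarantees that at most one text description and chart is displayed at a time.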

Optionally, the present technical solution can support users to freely observe the charts and text results recommended by the algorithm in each type of different insights, where all charts generated by the algorithm also support basic interactive methods such as interactive selection, highlighting, and legend switching, thereby optimizing the user's exploration and analysis process experience, and also providing users with the possibility to conduct interactive analysis in the feature subspace of the insight chart.

Optionally, this technical solution can support users in exporting the insight charts of interest generated by data interpretation to the dashboard, where they are displayed at the same level as the original charts, with the insight text information displayed on the right. This function supports associated highlighting, that is, when the user selects an insight chart exported to the dashboard, the chart that generated the insight data will be highlighted synchronously, and the user's filter information at the time the parent chart generated the insight data, that is, the selected interest data, will also be highlighted.

Optionally, this technical solution can also be applied in cloud environment scenarios, and can be compatible with insight saving related functions in the microservices where it is located, and can be saved, previewed, and loaded like ordinary charts.

The technical solution of steps 410-440 can effectively produce accurate and inspiring insights, but when providing data interpretation operations, the local data subset observed by the user cannot be subsequently analyzed, which to some extent limits the user's interactive exploration method.

To avoid the above problems, another embodiment of the present application shows a method 450 for generating insight data, providing further generation of insight subspaces to achieve subsequent analysis of insight data to further insight data. The method includes steps 460-490, which are described in detail below.

Step 460: Present an insight chart in the second insight data, the insight chart comprising W chart visualization elements, each chart visualization element corresponding to at least one data record in the data source.

Optionally, the second insight data presented may be any insight data generated by selecting a chart visualization element, such as the first insight data in step 440, or may be insight data generated by further analyzing any generated insight chart.

Optionally, the type of the second insight data can be any of the insight types mentioned above, such as chart metric aggregation expansion analysis, external dimension valid record number analysis, external dimension distribution contribution analysis, external dimension subspace internal feature analysis, and external high interpretability metric analysis, etc., or it can be other types of insight analysis.

Optionally, the insight chart may further include statistical characteristic values or extreme values corresponding to the insight text descriptions in the insight chart.

Step 470: Confirm J chart visualization elements selected from the W chart visualization elements, where W and J are positive integers greater than 1.

Step 480: Determine all H data records corresponding to the J chart visualization elements, where H is a positive integer greater than 1.

It should be understood that the processes of step 470 and step 480 are substantially the same as the processes of step 420 and step 430 and are not described in detail herein.

Step 490: Generate third insight data based on all H data records, where the third insight data includes data distribution analysis or data record tracing of the H data records.

It should be understood that the present technical solution can realize further interactive exploration and analysis functions configured for different types of insight data to generate further insight data. The first chart in step 410 can be an insight chart in any insight data in step 460, and the insight type of the third insight data in step 490 can be the insight type of the first insight data in step 440, or other insight types can be added based on the insight type of the first insight data.

Optionally, the generated third insight data may be a further analysis of the data record distribution within a subspace of the insight data, aiming to help users further explore patterns of interest in the dimensional distribution chart of the algorithm-recommended insights.

Exemplarily, if the insight chart of step 460 in the present technical solution is a derived insight chart of the external dimension valid record number analysis or external dimension distribution analysis type, when the user interactively swipes or clicks to select the dimensional subspace of interest in the insight feature and performs subspace distribution exploration, the technical solution will again generate a subspace metric distribution insight chart that also supports interaction. The selection of the metric for this insight is related to the metric associated with this type of insight and the metric of the chart that originally generated the data interpretation.

Optionally, the further insight data generated can be original data traceability, aiming to help users easily perform original data queries on abnormal parts of the recommended insight distribution and explore the reasons for the distribution characteristics.

Exemplarily, if the insight chart of step 460 in the present technical solution is an insight chart derived from the aggregation and expansion analysis of chart metrics, the internal feature analysis of the external dimension subspace, and the above-mentioned subspace distribution exploration, since multiple sufficiently fine-grained downward analyses have usually been performed when executing this functional operation, the original data records directly returned are often small in number, but have strong explanatory power. Similarly, for the statistical characteristic values of the textual insight descriptions in the insight data generated by the algorithm, the present technical solution supports convenient tracing of the original records, and both use a consistent display format. Optionally, the present technical solution uses a paginated table to display the original records.

Optionally, the type of the third insight data is not limited to the above two insight types, and may also be any of the insight data types mentioned above, such as the chart metric aggregation expansion analysis, external dimension valid record number analysis, external dimension distribution contribution analysis, etc.

This technical solution allows users to continue performing rich interactive operations on the data interpretation charts derived by the algorithm and to carry out further focused analysis within the insight feature subspace. After constructing the feature subspace of interest, the user clicks the corresponding item in the function menu. This keeps the data interpretation function logically consistent throughout the technical solution and reduces the learning cost.

In current application scenarios of automated intelligent insight generation for charts in business intelligence analysis platforms, insight generation methods based on automatic search and insight mining over global data generate a large number of insight charts at one time and present them all to users. This gives users no focus of attention and makes it difficult for them to decide where to start exploring, which creates a "cold start" problem.

In order to avoid the above problems, this application designs a sorting strategy to determine the priority order of multiple sub-insights in an insight. Multiple sub-insights are recommended in order of priority, and the final result is generated by sorting. Specifically, the sorting strategy is applied after generating the insight chart in steps 440 and 490 and before presenting the insight interface. Figure 5 shows a schematic flowchart of an embodiment of the sorting strategy 500 of the present application, which comprehensively considers two aspects: the confidence of the full amount of features possessed by each insight within the same type of insight and the feature richness of the insight. As shown in Figure 5, the method includes steps 510-540, and steps 510-540 are described in detail below. Assume that the insight data includes P sub-insight data.

Step 510: Determine a characteristic index value of each sub-insight data among the P sub-insight data, where the characteristic index value is used to measure the confidence or significance of each sub-insight data among the P sub-insight data.

Optionally, for different features, the present technical solution formulates different measurement methods respectively.

Exemplarily, statistical features can be described by indicators such as the number of feature values and the degree to which outliers deviate from the full data. Distribution analysis features can be described by indicators such as the unevenness of the distribution and the proportion of the maximum distribution. Alarm-related analysis features can be described by the corresponding proportion of alarm values. Correlation measurement analysis features can be described by the above-mentioned measurement indicators. Difference analysis features can be described by, for example, the KL divergence between discrete distributions after binning.
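As an illustration of the last indicator, the following is a minimal sketch of a KL divergence computed between two binned discrete distributions. The function names, the bin edges, and the small smoothing constant `eps` (used to avoid division by zero for empty bins) are assumptions made for this example and are not specified by the present application.

```python
import math

def binned_distribution(values, edges, eps=1e-9):
    """Bin `values` into the intervals defined by `edges` and return
    a smoothed probability for each bin (hypothetical helper)."""
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            # Bins are half-open [lo, hi); the last bin also includes its upper edge.
            upper_ok = v < edges[i + 1] or (i == last and v == edges[-1])
            if edges[i] <= v and upper_ok:
                counts[i] += 1
                break
    # Additive smoothing (an assumption) keeps every probability non-zero.
    smoothed = [c + eps for c in counts]
    norm = sum(smoothed)
    return [s / norm for s in smoothed]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Example: compare the metric values of a selected subset with the remaining records.
edges = [0, 20, 40]
selected = binned_distribution([10, 12, 11, 30], edges)
remaining = binned_distribution([10, 11, 12, 13], edges)
difference_score = kl_divergence(selected, remaining)
```

A larger `difference_score` would indicate that the selected subset is distributed more differently from the remaining records, so it could serve as one possible feature index value for difference analysis.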

Step 520: Confirm Q sub-insight data whose feature index values are higher than the threshold value of the feature index value, where Q is a positive integer greater than 1, and P is greater than Q.

Optionally, the technical solution filters out feature insights whose feature index values are lower than the threshold value, either by placing them at the very end of the presentation interface queue or by deleting them.

Step 530: Determine the number of feature types of each sub-insight data among the Q sub-insight data.

It should be understood that the number of feature types can be used to describe the feature richness of insight data.

Step 540: Determine the priority order of the Q sub-insight data by sorting in descending order according to the number of feature types of each sub-insight data.

Optionally, for each sub-insight data that has completed the high feature index filtering, the present technical solution can count the description of its feature richness and determine the recommendation priority of different sub-insight data by sorting in descending order.

Optionally, if multiple insights have the same number of feature types, the feature indicators of the features of different insights are sorted in descending order and then compared in sequence to determine the priorities.
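Steps 510-540 and the tie-breaking rule can be sketched as follows. This is only an illustrative sketch: the `SubInsight` structure, its field names, and the choice to store one index value per feature type are assumptions made for the example, not structures defined by the present application.

```python
from dataclasses import dataclass

@dataclass
class SubInsight:
    name: str
    index_value: float     # step 510: confidence or significance of the sub-insight
    feature_scores: dict   # hypothetical: feature type -> feature index value

def rank_sub_insights(sub_insights, threshold):
    """Return the sub-insights that pass the threshold, highest priority first."""
    # Step 520: keep only the sub-insights whose index value exceeds the threshold.
    kept = [s for s in sub_insights if s.index_value > threshold]

    def priority_key(s):
        # Step 530: the number of feature types measures feature richness.
        richness = len(s.feature_scores)
        # Tie-break: per-feature index values sorted in descending order,
        # compared element by element (lists compare lexicographically).
        ordered_scores = sorted(s.feature_scores.values(), reverse=True)
        return (richness, ordered_scores)

    # Step 540: sort in descending order of priority.
    return sorted(kept, key=priority_key, reverse=True)
```

In this sketch, low-scoring sub-insights are deleted. Appending them to the end of the returned list instead would correspond to the other filtering option described above.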

The above method 400 and method 450 can be used alone or in combination. The following describes the method used in combination to achieve insight data generation with a specific example, and describes the implementation of the sorting strategy 500 with the example. Figures 6, 7 and 8 show detailed cases of using the technical solution of the present application to achieve insight data generation and the step-by-step exploration and in-depth process of insight data generation. The elements and data in the cases are all examples, and actual cases include but are not limited to the cases in Figures 6, 7 and 8.

FIG6 shows an application interface 600. The application in this case may be a table application or a data intelligence analysis application. The application interface 600 includes a data table 610, which includes multiple dimensions and metrics. In this case, the data table is the data source. In an actual case, multiple data sources may be included, and each data source may include multiple data tables.

Extract the time dimension 1-5, location dimension A, B and C, unit price dimension a, b and c and sales volume X in the data table 610. The location dimension A, B and C and the unit price dimension a, b and c are summed and aggregated to form the sales volume X, that is, the location dimension and the unit price dimension are external dimensions and do not participate in the drawing of the chart. Their data records are directly accumulated to obtain the summed aggregate value. Take the time dimension 1-5 as the x-axis and the sales volume X as the y-axis to draw a bar chart 620 and present it in the application interface 600. This case selects a bar chart as an example, and in actual cases, it can also be a line chart, a pie chart, etc. Chart 620 contains five chart visualization elements, that is, five columns, and the value of the data point of each chart visualization element is summed and aggregated by multiple data values of the location dimensions A, B and C and the unit price dimensions a, b and c, that is, corresponding to the data records of multiple related dimension values in the data table. In actual cases, the value of the data point of each chart visualization element can also correspond to only one data record. This part of the step process corresponds to step 410 above.
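The aggregation described above can be sketched as follows. The record layout (a list of dictionaries) and the names `build_bars`, `x_dim`, `metric`, and `record_ids` are assumptions for illustration. The sketch shows how each column can keep a reference to the data records that were summed to produce its value.

```python
from collections import defaultdict

def build_bars(records, x_dim, metric):
    """Sum `metric` over all records sharing the same `x_dim` value.

    Returns one entry per column, including the indices of the
    underlying data records (hypothetical structure)."""
    totals = defaultdict(float)
    members = defaultdict(list)
    for idx, rec in enumerate(records):
        totals[rec[x_dim]] += rec[metric]
        members[rec[x_dim]].append(idx)
    return [
        {"x": x, "y": totals[x], "record_ids": members[x]}
        for x in sorted(totals)
    ]

# Example data table: location and unit price are external dimensions
# that are aggregated away when the chart is drawn.
table = [
    {"time": 1, "location": "A", "unit_price": "a", "sales": 5.0},
    {"time": 1, "location": "B", "unit_price": "b", "sales": 3.0},
    {"time": 2, "location": "C", "unit_price": "c", "sales": 4.0},
]
bars = build_bars(table, "time", "sales")
```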

The dotted box in chart 620 is a selection box of application interface 600. The chart visualization elements within the selection box will be highlighted, that is, the two chart visualization elements in the dotted box in chart 620 are filled with diagonal lines. The two chart visualization elements are the objects that need to be analyzed for insights in this case. The selection box in this case is a continuous selection. In actual cases, multiple selection boxes can select multiple discontinuous data, or only one chart visualization element can be selected. This case is not limited. This part of the steps corresponds to step 420 above.

After the selected chart visualization elements are determined, the application background determines the data records in the data table 610 that correspond to them. In this case, the specific values of the x-axis dimension corresponding to the selected chart visualization elements are times 1-3; that is, the selection interaction batch-selects the chart visualization elements with specific values 1-3 along the x-axis dimension of the chart. From the selection interaction and the dimensions that make up the aggregate value on the y-axis, the filtering logic for the data records corresponding to the selected chart visualization elements is generated: the filtering dimension combination is the logical combination (time dimension 1 or 2 or 3) and (location dimension A or B or C) and (unit price dimension a or b or c). The generated filtering logic can be used to query all the original data records in the data table 610, that is, to determine the data records corresponding to the selected chart visualization elements. This part of the steps corresponds to step 430 above. In actual cases, each column may be divided into multiple sub-columns according to different legend values. When only some sub-columns are selected, the external dimensions in the resulting logical combination may take only some of their values.
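The filtering logic above, an AND across dimensions of an OR across the allowed values of each dimension, can be sketched as follows. The function names and the dictionary-based record layout are assumptions for illustration only.

```python
def build_filter(x_dim, selected_x_values, external_dims):
    """Build the filtering logic as {dimension: set of allowed values}.

    Values within one dimension are combined with OR.
    Different dimensions are combined with AND."""
    logic = {x_dim: set(selected_x_values)}
    for dim, values in external_dims.items():
        logic[dim] = set(values)
    return logic

def query_records(records, logic):
    """Return every record that satisfies all dimension constraints."""
    return [
        rec for rec in records
        if all(rec[dim] in allowed for dim, allowed in logic.items())
    ]

# (time 1 or 2 or 3) and (location A or B or C) and (unit price a or b or c)
logic = build_filter(
    "time", [1, 2, 3],
    {"location": ["A", "B", "C"], "unit_price": ["a", "b", "c"]},
)
```

If only some sub-columns were selected, the corresponding external dimension would simply be given a smaller set of allowed values.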

Based on the data records corresponding to the selected chart visualization elements determined above, joint data analysis is performed to generate insight data. The specific insight data depends on the analysis results of different data record subsets, and the following is an exemplary example:

Assume that the values of the sales volume corresponding to the two chart visualization elements selected in this case both present the same phenomenon, that is, they present abnormally high values. In the joint data analysis process, the sales volume corresponding to the two chart visualization elements are compared with the three chart visualization elements not selected in the chart 620 and the remaining data records in the data table at the same time. When the sales volume corresponding to the two chart visualization elements is compared with the three chart visualization elements not selected in the chart 620, if it is found that the data record with the location dimension value of A has a significant contribution to the high sales volume, that is, when the time dimension value is 1 or 2 or 3 and the location dimension value is A, the sales volume is abnormally higher than the sales volume under other dimension values, then the data record with the time dimension value of 1 or 2 or 3 is determined to have an association relationship. The association relationship is that the time dimension value of 1 or 2 or 3 is associated with the location dimension value of A, or the association relationship can be expressed as the data record with the time dimension value of 1 or 2 or 3 and the location dimension value of A has common feature information, that is, the sales volume is abnormally higher than the sales volume under other dimension values. 
Based on the data records with the aforementioned time dimension value of 1 or 2 or 3 and the location dimension value of A, the shared feature information and other data records in the data table, the content of the generated insight data can be the contribution of time, location or unit price to the outlier phenomenon, or it can be the analysis of the data records with the time dimension value of 1 or 2 or 3 and the location dimension value of A along the aggregate value of the unit price dimension, or it can be other insight data related to external dimensions. These insight data can correspond to the insight data 621, 622, etc. in Figure 6. However, more insight data can be generated in actual cases, and there can be more phenomena presented by the selected chart visualization elements, and there can be more insight data generated by each phenomenon, which will not be repeated in this case. The analysis process of these insight data uses the data records corresponding to multiple chart visualization elements, so that users can analyze the observed local data. The above steps correspond to step 440 above.
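One simple way to look for an external dimension value that contributes disproportionately to the selected elements is sketched below. The measure used here, the difference between a value's share of the metric in the selected records and its share in the remaining records, is an assumption chosen for illustration. The present application does not prescribe a specific contribution measure, and the function names are hypothetical.

```python
from collections import defaultdict

def share_by_dimension(records, dim, metric):
    """Fraction of the total `metric` contributed by each value of `dim`."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[dim]] += rec[metric]
    grand_total = sum(totals.values())
    if not grand_total:
        return {value: 0.0 for value in totals}
    return {value: total / grand_total for value, total in totals.items()}

def contribution_shift(selected, others, dim, metric):
    """Rank the values of `dim` by how much larger their share is in the
    selected records than in the other records (largest shift first)."""
    sel = share_by_dimension(selected, dim, metric)
    oth = share_by_dimension(others, dim, metric)
    values = set(sel) | set(oth)
    shifts = [(v, sel.get(v, 0.0) - oth.get(v, 0.0)) for v in values]
    return sorted(shifts, key=lambda item: item[1], reverse=True)
```

In the case above, a large positive shift for location A would be consistent with the observation that records with location dimension value A contribute significantly to the abnormally high sales volume.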

In this case, the insight data 621, 622, etc. generated based on chart 620 include insight charts 631, 632, etc. and text descriptions corresponding to the insight charts, wherein the drawing method of insight charts 631, 632, etc. is the same as that of chart 620, and the text descriptions corresponding to the insight charts may include characteristic values or extreme values in the insight data.

Based on the same method of selecting the chart visualization elements in chart 620 in the above text case, two chart visualization elements in the presented insight charts 631 and 632 are selected. Based on the same analysis steps mentioned above, insight data 641, 642 and other insight data are obtained. Insight data 641, 642 and other insight data are further analyzed insight data in this case, and the analysis steps are not repeated in this case. The insight type of insight data 641, 642 can be the same or similar insight type as the insight data 621, 622 and other insight data mentioned above, or it can be a subspace analysis of the insight data or the tracing of the insight data.

Assuming that the type of insight data 641 is the subspace analysis of insight chart 631, its content may be the analysis of the composition of the original data records corresponding to the aggregated values of the data points constituting the insight chart 631. The x-axis may be the specific value of the dimension of the original data record, and the y-axis is the sales volume. This part can be used to explain the possible outliers or data records with greater contribution in the original data records constituting the insight data 621.

Assuming that the type of insight data 642 is the original record traceability of insight chart 632, its content may be the numerical value of the specific original data record constituting insight data 642 and the specific value of its dimension. The original record traceability presents these original data records through a paginated table. At the same time, the feature values in the text description corresponding to the insight chart in the insight data can also be traced to the original record.
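Presenting traced original records through a paginated table can be sketched as follows. The function name, the default page size, and the shape of the returned dictionary are assumptions for illustration.

```python
def paginate(records, page, page_size=10):
    """Return one page of `records` (pages are numbered from 1)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    # Ceiling division; an empty result still has one (empty) page.
    total_pages = max(1, -(-len(records) // page_size))
    start = (page - 1) * page_size
    return {
        "page": page,
        "total_pages": total_pages,
        "rows": records[start:start + page_size],
    }
```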

Based on the insight chart in insight data 641, this case can also generate insight data according to the aforementioned selection of chart visualization elements and data analysis steps, thereby achieving continuous further drill-down analysis of the insight data, which is not elaborated in this case.

FIG7 shows a schematic diagram of an interface of another detailed case. The detailed case in FIG7 is slightly different from the detailed case in FIG6. The difference is that the dimensions in FIG7 are changed to distance, ID, number, throughput, etc., and the insight data in FIG7 are presented in different application interfaces after being generated. The application interface may be an interface belonging to different applications, that is, the insight data corresponding to different data records may be generated in different applications or application interfaces.

The intermediate analysis process in the detailed case shown in FIG. 7 is similar to that in the detailed case shown in FIG. 6, and will not be described in detail here. Only the situation in which the insight data is presented in different application interfaces after being generated will be described.

According to the data table 710 in the application interface 701, a chart 720 is generated in the application interface 702, and each chart visualization element in the chart 720 corresponds to at least one data record in the data table 710. A selection box is drawn in the chart 720, and the two chart visualization elements in the selection box are used to generate insight data. The generated insight data 721, 722, etc. are presented in the application interface 703. Two chart visualization elements are then selected in the insight chart 731 or the insight chart 723 in the application interface 703 to generate insight data 741, etc., which are presented in the application interface 704.

By analogy, this case supports secondary analysis and exploration of the focused feature subspace of the insight chart and derives insight data, optimizes the multi-level subspace analysis and exploration process in the automatic insight data generation auxiliary analysis process, and improves the analysis freedom from surface to point, from shallow to deep.

Fig. 8 shows a sorting strategy 800 used for multiple insight data in this case. This case is an embodiment of a sorting strategy, and does not limit the process of determining the priority order of multiple insights.

Assume that in this case, in the insight data generation process shown in FIG6 or FIG7, 10 pieces of insight data are obtained, namely insight data 810 to 819. The application background sorts the obtained insight data 810 to 819.

First, determine the feature index values used to arrange different types of insight data. For example, in this case, the confidence score can be selected as the feature index value. According to the different types of features mentioned above, different measurement methods are used. The application background determines the feature index values of insight data 810 to 819 and arranges them in descending order. The resulting arrangement list is shown in Figure 8.

Secondly, determine the threshold of the characteristic index value to filter out some insight data with lower characteristic index values. For example, in this case, a confidence score of 0.95 is selected as the threshold to filter out the insight data with confidence scores lower than 0.95, as shown in FIG8.

Finally, determine the number of feature types for each insight data with a confidence score higher than 0.95 in Figure 8, and arrange the insight data in descending order of the number of feature types to obtain the priority order of the insight data finally presented. The insight data 815, 818, 811 and 810, etc. arranged in descending order of priority in Figure 8 correspond to the insight data 621, 622, etc. presented in Figure 6 or the insight data 721, 722, etc. presented in Figure 7.

The sorting process of the insight data 641, 642, etc. shown in FIG. 6 and the insight data 741, etc. shown in FIG. 7 is the same as the above arrangement process, and is not described in detail in this case.

The steps in this case to prioritize insight data allow users to quickly find the focus of attention and avoid being confused about where to start exploring.

The following describes the apparatus for generating insights according to an embodiment of the present application in conjunction with FIG9. It should be noted that the apparatus shown in FIG9 can execute the methods shown in FIG4 and FIG5. It should be understood that the apparatus described below can execute the methods of the aforementioned embodiments of the present application. In order to avoid unnecessary repetition, repeated descriptions are appropriately omitted when introducing the apparatus of the embodiment of the present application.

FIG9 is a schematic diagram of a device for generating insights according to an embodiment of the present application. The device 900 shown in FIG9 includes an interaction module 910 and a processing module 920.

Specifically, the interaction module is used to: present a first chart, the first chart includes M chart visualization elements, and each chart visualization element corresponds to at least one data record in a data source.

Specifically, the processing module is used to: confirm N chart visualization elements selected from M chart visualization elements, where M and N are positive integers greater than 1, and M is greater than or equal to N, determine all K data records corresponding to the N chart visualization elements, where K is a positive integer greater than 1, and perform joint data analysis based on the K data records to generate first insight data for the N chart visualization elements.

Optionally, as an embodiment, the processing module is also used to determine characteristic information common to L data records out of K data records, where the L data records correspond to at least two chart visualization elements, where L is a positive integer greater than 1, and K is greater than or equal to L, and perform data analysis based on the L data records, the characteristic information common to the L data records, and all data records in the data source.

Optionally, as an embodiment, the processing module is further used to generate a numerical distribution or data record traceability inside data records corresponding to N chart visualization elements according to an insight chart generated based on the second insight data.

Optionally, as an embodiment, the processing module is further used to determine the priority order of P sub-insight data included in the first insight data, where P is a positive integer greater than 1, and recommend the P sub-insight data in order of priority.

Optionally, as an embodiment, the processing module is also used to determine a characteristic index value for each sub-insight data among P sub-insight data, where the characteristic index value is used to measure the confidence or significance of each sub-insight data among the P sub-insight data, confirm Q sub-insight data whose characteristic index values are higher than a threshold value of the characteristic index value, where Q is a positive integer greater than 1, and P is greater than Q, determine the number of characteristic types of each sub-insight data among the Q sub-insight data, and determine the priority order of the Q sub-insight data by sorting them in descending order according to the number of characteristic types of each sub-insight data.

Optionally, as an embodiment, the processing module is further used to determine dimensions and metrics in a first chart corresponding to the N chart visualization elements, and generate a query request based on the dimensions and metrics in the first chart, where the query request is used to query data records in a data source.
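Generating a query request from the chart's dimensions can be sketched as a parameterized SQL statement. The present application does not state that the data source is queried with SQL, so the SQL form, the function name `build_query`, and the table and column names are all assumptions for illustration.

```python
def build_query(table, dimension_filters, columns="*"):
    """Build a parameterized SELECT for the selected dimension values.

    `table`, `columns`, and the dimension names are interpolated directly,
    so this sketch assumes they come from trusted chart metadata.
    Only the dimension values are passed as bound parameters."""
    clauses, params = [], []
    for dim, values in dimension_filters.items():
        values = list(values)
        if not values:
            # No allowed value for this dimension: nothing can match.
            clauses.append("0 = 1")
            continue
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{dim} IN ({placeholders})")
        params.extend(values)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT {columns} FROM {table} WHERE {where}", params

sql, params = build_query(
    "sales_table",
    {"time": [1, 2, 3], "location": ["A", "B", "C"]},
)
```

The `?` placeholder style matches, for example, Python's built-in `sqlite3` module. Other database drivers may use a different placeholder syntax.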

The above modules can be implemented by software or hardware. The implementation of the processing module 920 is described below as an example. The implementation of the interaction module 910 can refer to the implementation of the processing module 920.

As an example of a software functional unit, the processing module 920 may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. There may be one or more computing instances. For example, the processing module 920 may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code may be distributed in the same region or in different regions. Furthermore, the multiple hosts/virtual machines/containers used to run the code may be distributed in the same availability zone (AZ) or in different AZs, each AZ including one data center or multiple data centers with close geographical locations. Generally, a region may include multiple AZs.

Similarly, multiple hosts/virtual machines/containers used to run the code can be distributed in the same virtual private cloud (VPC) or in multiple VPCs. Usually, a VPC is set up in a region. For communication between two VPCs in the same region, and for cross-region communication between VPCs in different regions, a communication gateway needs to be set up in each VPC so that the VPCs are interconnected through the communication gateways.

As an example of a hardware functional unit, the processing module 920 may include at least one computing device, such as a server, etc. Alternatively, the processing module 920 may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logical device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.

The multiple computing devices included in the processing module 920 can be distributed in the same region or in different regions. The multiple computing devices included in the processing module 920 can be distributed in the same AZ or in different AZs. Similarly, the multiple computing devices included in the processing module 920 can be distributed in the same VPC or in multiple VPCs. The multiple computing devices included in the processing module 920 can be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.

The present application also provides a computing device 1000. As shown in FIG10, the computing device 1000 includes: a bus 1002, a processor 1004, a memory 1006, and a communication interface 1008. The processor 1004, the memory 1006, and the communication interface 1008 communicate with each other through the bus 1002. The computing device 1000 may be a server or a terminal device. It should be understood that the present application does not limit the number of processors and memories in the computing device 1000.

The bus 1002 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of representation, the bus is drawn as a single line in FIG. 10, but this does not mean that there is only one bus or only one type of bus. The bus 1002 may include a path for transmitting information between various components of the computing device 1000 (e.g., the memory 1006, the processor 1004, and the communication interface 1008).

Processor 1004 may include any one or more processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP) or a digital signal processor (DSP).

The memory 1006 may include a volatile memory, such as a random access memory (RAM). The memory 1006 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD).

The memory 1006 stores executable program codes, and the processor 1004 executes the executable program codes to respectively implement the functions of the aforementioned interaction module 910 and the processing module 920, thereby implementing the aforementioned method for generating insight data. That is, the memory 1006 stores instructions for executing the aforementioned method for analyzing and generating insight data.

The communication interface 1008 uses a transceiver module such as, but not limited to, a network interface card or a transceiver to implement communication between the computing device 1000 and other devices or communication networks.

The embodiment of the present application also provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device can be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device can also be a terminal device such as a desktop computer, a laptop computer, or a smart phone.

As shown in Fig. 11, the computing device cluster includes at least one computing device 1000. The memory 1006 in one or more computing devices 1000 in the computing device cluster may store the same instructions for executing the above-mentioned insight data generation method.

In some possible implementations, the memory 1006 of one or more computing devices 1000 in the computing device cluster may also respectively store some instructions for executing the above-mentioned method for generating insight data. In other words, the combination of one or more computing devices 1000 may jointly execute instructions for executing the above-mentioned method for generating insight data.

It should be noted that the memory 1006 in different computing devices 1000 in the computing device cluster can store different instructions, which are respectively used to execute part of the functions of the above-mentioned apparatus. That is, the instructions stored in the memory 1006 in different computing devices 1000 can implement the functions of one or more modules in the interaction module and the processing module.

In some possible implementations, one or more computing devices in the computing device cluster can be connected via a network. The network can be a wide area network, a local area network, or the like. Figure 12 shows a possible implementation. As shown in Figure 12, two computing devices 1000A and 1000B are connected via a network. Specifically, each computing device connects to the network through its communication interface. In this type of possible implementation, the memory 1006 in the computing device 1000A stores instructions for executing the functions of the interaction module, and the memory 1006 in the computing device 1000B stores instructions for executing the functions of the processing module.

It should be understood that the functions of the computing device 1000A shown in Figure 12 may also be completed by multiple computing devices 1000. Similarly, the functions of the computing device 1000B may also be completed by multiple computing devices 1000.
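The split between the two devices can be pictured with a minimal sketch. It assumes the interaction module and the processing module exchange JSON messages; the class names, the message fields (`selected_categories`, `record_count`, `total`) and the record layout are hypothetical and are not taken from the application. The direct method call stands in for the network connection between devices 1000A and 1000B.

```python
import json
from dataclasses import dataclass


@dataclass
class ProcessingModule:
    """Hypothetical stand-in for the instructions stored on device 1000B."""
    records: list  # assumed data source: a list of dicts

    def handle(self, request_json: str) -> str:
        # Decode the request, look up the matching records, and aggregate them.
        request = json.loads(request_json)
        selected = set(request["selected_categories"])
        matched = [r for r in self.records if r["category"] in selected]
        total = sum(r["value"] for r in matched)
        return json.dumps({"record_count": len(matched), "total": total})


@dataclass
class InteractionModule:
    """Hypothetical stand-in for the instructions stored on device 1000A."""
    backend: ProcessingModule

    def on_selection(self, categories: list) -> dict:
        # In a real deployment this call would cross the network;
        # here the serialized message is passed directly.
        request_json = json.dumps({"selected_categories": categories})
        return json.loads(self.backend.handle(request_json))
```

Because the two classes only share serialized messages, either one could be replaced by several cooperating devices without changing the other.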

An embodiment of the present application also provides a chip, which includes a processor and a data interface. The processor reads instructions stored in a memory through the data interface to execute the above-mentioned method for generating insight data.

The present application also provides a computer program product including instructions. The computer program product may be software or a program product including instructions that can be run on a computing device or stored in any available medium. When the computer program product is run on at least one computing device, the at least one computing device executes the above-mentioned method for generating insight data.

An embodiment of the present application also provides a computer-readable storage medium. The computer-readable storage medium can be any available medium that a computing device can access, or a data storage device, such as a data center, that contains one or more available media. The available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive). The computer-readable storage medium includes instructions that instruct the computing device to execute the above-mentioned method for generating insight data.

The technical features of the above embodiments may be arbitrarily combined. To make the description concise, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

The above embodiments are only used to illustrate the technical solutions of the present application, rather than to limit them. Although the present application has been described in detail with reference to the aforementioned embodiments, a person skilled in the art should understand that the technical solutions described in the aforementioned embodiments may still be modified, or some of the technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working processes of the systems, devices and modules described above can refer to the corresponding processes in the aforementioned method embodiments and will not be repeated here.

In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are only illustrative. The division of modules is only a division by logical function, and other divisions are possible in actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be electrical, mechanical or in other forms.

The modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules, that is, they may be located in one place or distributed on multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist physically separately, or two or more modules may be integrated into one module.

If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which can be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage media include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

A method for generating insight data, characterized by comprising:

presenting a first chart, the first chart comprising M chart visualization elements, each of the chart visualization elements corresponding to at least one data record in a data source;

confirming N chart visualization elements selected from the M chart visualization elements, wherein M and N are positive integers greater than 1, and M is greater than or equal to N;

determining all K data records corresponding to the N chart visualization elements, wherein K is a positive integer greater than 1; and

performing joint data analysis based on all the K data records to generate first insight data of the N chart visualization elements.
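For illustration only, and not as part of the claim, the steps above can be sketched as follows. The sketch assumes that each chart visualization element is already mapped to the identifiers of its data records, and it uses a comparison of mean values as a placeholder for the joint data analysis. The function names, the mapping and the `metric` field are hypothetical.

```python
from statistics import mean


def records_for_elements(element_to_record_ids, selected_elements):
    """Collect the K record ids behind the N selected elements.

    A record shared by several elements is counted once; order of first
    appearance is preserved.
    """
    seen = []
    for element in selected_elements:
        for record_id in element_to_record_ids[element]:
            if record_id not in seen:
                seen.append(record_id)
    return seen


def joint_analysis(data_source, record_ids, metric):
    """Placeholder joint analysis: analyze all K records together,
    rather than one element at a time, against the whole data source."""
    selected = [data_source[rid][metric] for rid in record_ids]
    everything = [row[metric] for row in data_source.values()]
    return {
        "k": len(selected),
        "selected_mean": mean(selected),
        "overall_mean": mean(everything),
    }
```

The point of the sketch is that the analysis runs once over the union of records, so an insight can reflect what the selected elements have in common instead of repeating a per-element analysis N times.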
The method according to claim 1, characterized in that the performing joint data analysis based on all K data records comprises:

Determining feature information common to L data records of the K data records, the L data records corresponding to at least two chart visualization elements, where L is a positive integer greater than 1, and K is greater than or equal to L;

performing data analysis based on the L data records, the feature information common to the L data records, and all data records in the data source.
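For illustration only, one way to find feature information shared by records of different chart visualization elements is sketched below. It assumes each record is a dictionary that carries a hypothetical `element` key naming its chart element, and it treats a shared (field, value) pair as the common feature information; the claim does not prescribe this representation. Field values are assumed to be hashable.

```python
from collections import defaultdict


def common_features(records, min_elements=2):
    """Return {(field, value): [records]} for every field value shared by
    more than one record, where those records span at least
    `min_elements` distinct chart elements."""
    groups = defaultdict(list)
    for record in records:
        for field, value in record.items():
            if field != "element":
                groups[(field, value)].append(record)

    shared = {}
    for key, group in groups.items():
        elements = {record["element"] for record in group}
        if len(group) > 1 and len(elements) >= min_elements:
            shared[key] = group
    return shared
```

Each returned group plays the role of the L data records, and its key plays the role of their common feature information.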
The method according to claim 1 or 2, characterized in that the first insight data includes at least one of the following insight data types:

chart metric aggregation expansion analysis, wherein the chart metric aggregation expansion analysis is used to analyze the original data distribution composition of the data records corresponding to the N chart visualization elements;

analysis of the number of valid records in an external dimension, wherein the analysis of the number of valid records in an external dimension is used to analyze the distribution of the number of valid records of the K data records in a dimension that is not involved in drawing the first chart;

external dimension distribution contribution analysis, wherein the external dimension distribution contribution analysis is used to analyze the contribution of the K data records to the chart metric in a dimension that is not involved in drawing the first chart;

internal feature analysis of an external dimension subspace, wherein the internal feature analysis of the external dimension subspace is used to analyze the internal feature distribution of the data records in a dimension that is not involved in drawing the first chart; and

external highly interpretable metric analysis, wherein the external highly interpretable metric analysis is used to analyze the correlation between the metrics that are not involved in drawing the first chart and the original data records and the L data records.
The method according to any one of claims 1 to 3, characterized in that

the first chart is an insight chart generated based on second insight data, and the generated first insight data of the N chart visualization elements includes a numerical distribution within the data records corresponding to the N chart visualization elements, or data record tracing.
The method according to any one of claims 1 to 4, characterized in that the first insight data includes P sub-insight data, where P is a positive integer greater than 1, and the method further comprises:

determining the priority order of the P sub-insight data; and

recommending the P sub-insight data according to the priority order.
The method according to claim 5, characterized in that the determining the priority order of the P sub-insight data comprises:

determining a characteristic index value of each sub-insight data in the P sub-insight data, wherein the characteristic index value is used to measure the confidence or significance of each sub-insight data in the P sub-insight data;

confirming Q sub-insight data whose characteristic index values are higher than a threshold of the characteristic index value, wherein Q is a positive integer greater than 1, and P is greater than Q;

determining the number of feature types of each sub-insight data in the Q sub-insight data; and

determining the priority order of the Q sub-insight data by sorting in descending order according to the number of feature types of each sub-insight data.
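For illustration only, the threshold-then-sort ordering above can be sketched as follows. The `SubInsight` class and its fields are hypothetical: `score` stands for the characteristic index value and `feature_types` for the feature types found in a sub-insight. Keeping the original order for ties is a choice of this sketch, not something the claim specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubInsight:
    name: str
    score: float          # characteristic index value (confidence or significance)
    feature_types: tuple  # feature types present in this sub-insight


def prioritize(sub_insights, threshold):
    """Keep sub-insights whose score exceeds the threshold, then sort them
    in descending order of their number of feature types."""
    kept = [s for s in sub_insights if s.score > threshold]
    # sorted() is stable, so ties keep their original relative order.
    return sorted(kept, key=lambda s: len(s.feature_types), reverse=True)
```

Filtering before sorting means a sub-insight with many feature types but a low characteristic index value is never recommended ahead of a reliable one.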
The method according to any one of claims 1 to 6, characterized in that the step of determining all K data records corresponding to the N chart visualization elements comprises:

determining dimensions and metrics in the first chart corresponding to the N chart visualization elements; and

generating a query request based on the dimensions and metrics in the first chart, wherein the query request is used to query the data records in the data source.
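For illustration only, a query request of this kind could take the form of a parameterized SQL statement. The sketch below assumes the data source is a SQL table, that each selected element is described by its values on the chart's dimensions, and that a record counts only if every chart metric is non-null; all three are assumptions of the sketch, as are the function and parameter names. Table and column names are assumed to come from trusted chart metadata, while element values are passed as bound parameters.

```python
def build_query(table, dimensions, metrics, selected_elements):
    """Build a parameterized SELECT that returns the data records behind
    the selected chart elements.

    selected_elements: list of dicts mapping each dimension name to the
    value that identifies the element, e.g. {"region": "east", "year": 2022}.
    """
    if not selected_elements:
        raise ValueError("at least one chart element must be selected")

    # One "(dim1 = ? AND dim2 = ?)" group per selected element, OR-ed together.
    per_element = " AND ".join(f'"{d}" = ?' for d in dimensions)
    element_filter = " OR ".join(f"({per_element})" for _ in selected_elements)

    # Assumption: only records with a value for every chart metric are wanted.
    metric_filter = "".join(f' AND "{m}" IS NOT NULL' for m in metrics)

    sql = f'SELECT * FROM "{table}" WHERE ({element_filter}){metric_filter}'
    params = [element[d] for element in selected_elements for d in dimensions]
    return sql, params
```

The returned pair can be passed to any DB-API style `execute(sql, params)` call that uses `?` placeholders.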
A device for generating insight data, comprising:

an interaction module, configured to present a first chart, wherein the first chart comprises M chart visualization elements, each of the chart visualization elements corresponding to at least one data record in a data source; and

a processing module, configured to: confirm N chart visualization elements selected from the M chart visualization elements, wherein M and N are positive integers greater than 1, and M is greater than or equal to N; determine all K data records corresponding to the N chart visualization elements, wherein K is a positive integer greater than 1; and perform joint data analysis based on all the K data records to generate first insight data of the N chart visualization elements.
The device according to claim 8, characterized in that the processing module is further configured to: determine feature information common to L data records among the K data records, wherein the L data records correspond to at least two chart visualization elements, L is a positive integer greater than 1, and K is greater than or equal to L; and perform data analysis based on the L data records, the feature information common to the L data records, and all data records in the data source.
The device according to claim 8 or 9, characterized in that the processing module is further configured to generate, based on an insight chart generated based on second insight data, a numerical distribution within the data records corresponding to the N chart visualization elements, or data record tracing.
The device according to any one of claims 8 to 10, characterized in that the processing module is further configured to: determine the priority order of P sub-insight data included in the first insight data, wherein P is a positive integer greater than 1; and recommend the P sub-insight data according to the priority order.
The device according to claim 11, characterized in that the processing module is further configured to: determine a characteristic index value of each sub-insight data among the P sub-insight data, wherein the characteristic index value is used to measure the confidence or significance of each sub-insight data among the P sub-insight data; confirm Q sub-insight data whose characteristic index values are higher than a threshold of the characteristic index value, wherein Q is a positive integer greater than 1, and P is greater than Q; determine the number of feature types of each sub-insight data among the Q sub-insight data; and determine the priority order of the Q sub-insight data by sorting in descending order according to the number of feature types of each sub-insight data.
The device according to any one of claims 8 to 12, characterized in that the processing module is further configured to: determine the dimensions and metrics in the first chart corresponding to the N chart visualization elements; and generate a query request based on the dimensions and metrics in the first chart, wherein the query request is used to query the data records in the data source.
A computing device, comprising a processor and a memory, wherein the processor is used to execute instructions stored in the memory so that the computing device executes the method according to any one of claims 1 to 7.
A computing device cluster, characterized by comprising: at least one computing device, each computing device comprising a processor and a memory;

The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster executes the method according to any one of claims 1 to 7.
A computer program product comprising instructions, characterized in that when the instructions are executed by a computing device cluster, the computing device cluster executes the method according to any one of claims 1 to 7.
A computer-readable medium, characterized in that it comprises a computer program, and when the computer program is run on a computer, the computer is caused to execute the method according to any one of claims 1 to 7.