WO2024082754A1 - Method and apparatus for generating insight data - Google Patents

Method and apparatus for generating insight data

Info

Publication number
WO2024082754A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
chart
insight
analysis
sub
Prior art date
Application number
PCT/CN2023/109267
Other languages
English (en)
Chinese (zh)
Inventor
杨昌和
徐科
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为云计算技术有限公司
Publication of WO2024082754A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance

Definitions

  • Embodiments of the present application relate to the field of data intelligence, and more specifically, to a method and apparatus for generating insight data.
  • Automatic insight data generation is a very important capability in business intelligence-assisted analysis and decision-making, and has gradually become one of the core competitive advantages of business intelligence products provided by various manufacturers. How to design appropriate front-end interaction processes based on user-provided data, ensure back-end data query performance, improve algorithm feature mining, related case analysis, abnormal pattern definition, cause analysis construction and other capabilities, and finally integrate simple and beautiful front-end display and easy-to-use interaction, and present feedback to users is a key factor in building the competitiveness of insight data generation technology.
  • Insight data analysis of single-point data helps users check, discover, and understand in depth individual chart visualization elements in visual charts when building, browsing, and analyzing data.
  • However, in existing products for automatic insight data generation, insight data generated at the analysis granularity of a single data point has poor accuracy, and the supported freedom of interaction and analysis is limited, which leaves room for improvement.
  • The embodiments of the present application provide a method and device for generating insight data that allow multiple chart visualization elements in a chart to be selected in a batch to generate insight data, thereby improving the accuracy of the insight data and the degree of interactive freedom.
  • a method for generating insight data comprising: presenting a first chart, the first chart comprising M chart visualization elements, each chart visualization element corresponding to at least one data record in a data source; confirming N chart visualization elements selected from the M chart visualization elements, wherein M and N are positive integers greater than 1, and M is greater than or equal to N; determining all K data records corresponding to the N chart visualization elements, wherein K is a positive integer greater than 1; and performing joint data analysis based on all K data records to generate first insight data for the N chart visualization elements.
  • joint data analysis is performed based on all K data records, including: determining feature information common to L data records among all K data records, the L data records corresponding to at least two chart visualization elements, where L is a positive integer greater than 1, and K is greater than or equal to L; performing data analysis based on the L data records, the feature information common to the L data records, and all data records in the data source.
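The selection and joint-analysis steps described above can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the record layout, the field names (`month`, `region`, `channel`, `sales`), the helper names, and the `min_share` cutoff used to decide that a feature is "common" are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical data source: each data record is a dict of field -> value.
DATA_SOURCE = [
    {"id": 1, "month": "Jan", "region": "North", "channel": "online", "sales": 120},
    {"id": 2, "month": "Jan", "region": "South", "channel": "online", "sales": 80},
    {"id": 3, "month": "Feb", "region": "North", "channel": "online", "sales": 150},
    {"id": 4, "month": "Feb", "region": "South", "channel": "retail", "sales": 60},
    {"id": 5, "month": "Mar", "region": "North", "channel": "retail", "sales": 90},
]


def records_for_elements(records, chart_dimension, selected_values):
    """Return all K records behind the N selected chart visualization elements."""
    selected = set(selected_values)
    return [r for r in records if r[chart_dimension] in selected]


def common_features(records, chart_dimension, candidate_fields, min_share=0.75):
    """Find (field, value) pairs shared by a large subset of the records.

    A pair is kept only if the L records that carry it make up at least
    `min_share` of the K records (an assumed cutoff) and belong to at
    least two different chart visualization elements.
    """
    found = {}
    for field in candidate_fields:
        value, count = Counter(r[field] for r in records).most_common(1)[0]
        elements = {r[chart_dimension] for r in records if r[field] == value}
        if count / len(records) >= min_share and len(elements) >= 2:
            found[field] = value
    return found


# The user selects the "Jan" and "Feb" bars (N = 2) of a sales-by-month chart.
selected = records_for_elements(DATA_SOURCE, "month", ["Jan", "Feb"])
shared = common_features(selected, "month", ["region", "channel"])
print(len(selected), shared)
```

Here the shared feature is drawn from fields that are not used to draw the chart, so a later analysis step could compare the subset sharing that feature against all records in the data source.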
  • the first insight data includes at least one of the following insight data types: chart metric aggregation expansion analysis, used to analyze the original data distribution composition of the data records corresponding to the N chart visualization elements; external dimension valid record number analysis, used to analyze the distribution of the number of valid records of the K data records in the dimensions that do not participate in drawing the first chart; external dimension distribution contribution analysis, used to analyze the contribution of the K data records to the chart metrics in the dimensions that do not participate in drawing the first chart; external dimensional subspace internal feature analysis, used to analyze the feature distribution inside the data records in the dimensions that do not participate in drawing the first chart; and external high interpretability metric analysis, used to analyze the association between the L data records and the metrics and original data records that do not participate in drawing the first chart.
  • the method can guide users to explore the analysis content of the associated data, such as the composition of abnormal aggregate values, the potential reasons why the aggregate values of the visualization chart elements show a specific pattern, the potential high contribution dimensions, and the value distribution within the subspace.
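Two of the insight data types listed above, external dimension valid record number analysis and external dimension distribution contribution analysis, can be sketched in a few lines of Python. This is an illustrative reading only: the function names and sample fields are hypothetical, and treating a record as "valid" when its external dimension value is not `None` is an assumption.

```python
from collections import defaultdict


def valid_record_counts(records, external_dimension):
    """Count records per value of a dimension that is not drawn in the chart.

    Records whose value is None are treated as invalid (an assumption).
    """
    counts = defaultdict(int)
    for record in records:
        value = record.get(external_dimension)
        if value is not None:
            counts[value] += 1
    return dict(counts)


def external_contribution(records, metric, external_dimension):
    """Share of the chart metric contributed by each external dimension value."""
    totals = defaultdict(float)
    for record in records:
        value = record.get(external_dimension)
        if value is not None:
            totals[value] += record[metric]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {value: total / grand_total for value, total in totals.items()}


# Hypothetical K records behind the selected elements of a sales-by-month chart;
# "region" does not participate in drawing that chart.
k_records = [
    {"month": "Jan", "region": "North", "sales": 30.0},
    {"month": "Jan", "region": "South", "sales": 25.0},
    {"month": "Feb", "region": "North", "sales": 45.0},
    {"month": "Feb", "region": None, "sales": 15.0},
]

counts = valid_record_counts(k_records, "region")
shares = external_contribution(k_records, "sales", "region")
print(counts, shares)
```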
  • the first chart is an insight chart generated based on the second insight data
  • generating the first insight data for the N chart visualization elements includes generating the numerical distribution within the data records corresponding to the N chart visualization elements, or the data record traceability of those data records.
  • the method supports secondary analysis and exploration of the focused feature subspace of the insight chart and derives insights, optimizes the multi-level subspace analysis and exploration process in the automatic insight generation auxiliary analysis process, and improves the analysis freedom from surface to point, from shallow to deep.
  • the numerical distribution within the data records corresponding to the N chart visualization elements helps users further explore the patterns of interest in the dimensional distribution chart of the algorithm recommendation insights and explore the reasons for the distribution characteristics; data record tracing can help users conveniently query the original data for abnormal parts of the distribution of the recommended insights and explore the reasons for the distribution characteristics.
  • a priority order of P sub-insight data included in the first insight data is determined; and the P sub-insight data are recommended according to the priority order.
  • the method can avoid generating a large number of disordered insight charts at one time and presenting them to the user, so that the user can quickly decide where to explore, thereby improving the efficiency of the user in obtaining and analyzing insights.
  • determining the priority order of P sub-insight data also includes: determining a characteristic index value for each sub-insight data in the P sub-insight data, the characteristic index value being used to measure the confidence or significance of each sub-insight data in the P sub-insight data; confirming Q sub-insight data whose characteristic index values are higher than a threshold value of the characteristic index value, where Q is a positive integer greater than 1, and P is greater than Q; determining the number of feature types for each sub-insight data in the Q sub-insight data; and determining the priority order of the Q sub-insight data by sorting in descending order according to the number of feature types of each sub-insight data.
  • the method comprehensively considers both the confidence of all features possessed by each insight within the same category of insights and the feature richness of the insights.
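The ranking rule described above (filter by a characteristic index threshold, then sort by the number of feature types in descending order) can be sketched as follows. The `SubInsight` class, its field names, and the sample scores are hypothetical; the source does not say how ties are broken, so this sketch simply keeps the input order for ties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubInsight:
    name: str
    index_value: float        # confidence or significance score (assumed scale 0..1)
    feature_types: frozenset  # distinct feature types carried by this sub-insight


def prioritize(sub_insights, threshold):
    """Keep the Q sub-insights above the threshold, richest in feature types first."""
    kept = [s for s in sub_insights if s.index_value > threshold]
    # sorted() is stable, so sub-insights with equal counts keep their input order.
    return sorted(kept, key=lambda s: len(s.feature_types), reverse=True)


candidates = [  # P = 4 hypothetical sub-insights
    SubInsight("outlier", 0.9, frozenset({"outlier"})),
    SubInsight("trend+seasonality", 0.7, frozenset({"trend", "seasonality"})),
    SubInsight("weak correlation", 0.2, frozenset({"correlation", "trend", "skew"})),
    SubInsight("skew", 0.6, frozenset({"skew"})),
]

ranked = prioritize(candidates, threshold=0.5)  # Q = 3 remain
print([s.name for s in ranked])
```

Filtering before sorting keeps a low-confidence sub-insight from ranking first merely because it lists many feature types.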
  • determining all K data records corresponding to N chart visualization elements also includes: determining dimensions and metrics in a first chart corresponding to the N chart visualization elements; generating a query request based on the dimensions and metrics in the first chart, the query request being used to query data records in a data source.
  • the method realizes rapid positioning of chart information contained in a chart visualization element, and the chart information can realize rapid query of data records corresponding to the chart visualization element and selection of focus dimensions/metrics for generating insight data.
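One way to picture the query request is as a parameterized SQL filter built from the chart's dimensions and the values of the selected elements. The sketch below uses Python's standard `sqlite3` module; the table name, column names, and the shape of the request dictionary are assumptions, not details from the application.

```python
import sqlite3


def build_query_request(table, dimensions, metrics, selections):
    """Build a query for the raw records behind the selected chart elements.

    `selections` holds one dict per selected element, mapping each chart
    dimension to that element's value. Table and column names are interpolated
    directly, so they must come from trusted chart configuration, never from
    free user input; the values are passed as bound parameters.
    """
    clauses, params = [], []
    for selection in selections:
        parts = []
        for dimension in dimensions:
            parts.append(f'"{dimension}" = ?')
            params.append(selection[dimension])
        clauses.append("(" + " AND ".join(parts) + ")")
    sql = f'SELECT * FROM "{table}" WHERE ' + " OR ".join(clauses)
    # The metrics are carried along so later analysis knows which fields to focus on.
    return {"sql": sql, "params": params, "metrics": list(metrics)}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (month TEXT, region TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("Jan", "North", 30.0), ("Jan", "South", 25.0),
     ("Feb", "North", 45.0), ("Mar", "North", 20.0)],
)

request = build_query_request(
    "orders", ["month"], ["sales"], [{"month": "Jan"}, {"month": "Feb"}]
)
rows = conn.execute(request["sql"], request["params"]).fetchall()
print(request["sql"], len(rows))
```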
  • a device for generating insight data comprising: an interaction module, for presenting a first chart, the first chart comprising M chart visualization elements, each chart visualization element corresponding to at least one data record in a data source; a processing module, for confirming N chart visualization elements selected from the M chart visualization elements, wherein M and N are positive integers greater than 1, and M is greater than or equal to N, determining all K data records corresponding to the N chart visualization elements, wherein K is a positive integer greater than 1, and performing joint data analysis based on all K data records to generate first insight data for the N chart visualization elements.
  • the processing module is further used to determine feature information common to L data records out of K data records, where the L data records correspond to at least two chart visualization elements, where L is a positive integer greater than 1, and K is greater than or equal to L, and perform data analysis based on the L data records, the feature information common to the L data records, and all data records in the data source.
  • when the first chart is an insight chart generated based on the second insight data, the processing module is also used to generate the numerical distribution within the data records corresponding to the N chart visualization elements, or the data record traceability of those data records.
  • the processing module is also used to determine the priority order of P sub-insight data included in the first insight data, where P is a positive integer greater than 1, and recommend the P sub-insight data in order of priority.
  • the processing module is also used to determine a characteristic index value for each sub-insight data in the P sub-insight data, where the characteristic index value is used to measure the confidence or significance of each sub-insight data in the P sub-insight data, confirm Q sub-insight data whose characteristic index values are higher than a threshold value of the characteristic index value, where Q is a positive integer greater than 1 and P is greater than Q, determine the number of feature types for each sub-insight data in the Q sub-insight data, and determine the priority order of the Q sub-insight data by sorting them in descending order according to the number of feature types of each sub-insight data.
  • the processing module is also used to determine the dimensions and metrics in the first chart corresponding to the N chart visualization elements, and generate a query request based on the dimensions and metrics in the first chart, which is used to query data records in the data source.
  • a computing device comprising a processor and a memory, wherein the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory, so that the computing device executes the method in the first aspect or any possible implementation of the first aspect.
  • a computing device cluster comprising at least one computing device, each computing device comprising a processor and a memory, wherein the memory is used to store instructions, and the processor is used to call and execute the instructions from the memory, so that the computing device cluster executes the method in the first aspect or any possible implementation of the first aspect.
  • the processor may be implemented by hardware or software.
  • when implemented by hardware, the processor may be a logic circuit, an integrated circuit, etc.; when implemented by software, the processor may be a general-purpose processor that reads software code stored in a memory, and the memory may be integrated in the processor or may exist independently outside the processor.
  • a chip which obtains instructions and executes the instructions to implement the method in the above-mentioned first aspect or any possible implementation manner of the first aspect.
  • the chip includes a processor and a data interface, and the processor reads instructions stored in the memory through the data interface to execute the method in the above-mentioned first aspect or any possible implementation of the first aspect.
  • the chip may also include a memory, in which instructions are stored, and the processor is used to execute the instructions stored in the memory.
  • the processor is used to execute the method in the above-mentioned first aspect or any possible implementation method of the first aspect.
  • a computer program product comprising instructions is provided.
  • the computing device cluster executes the method in the above-mentioned first aspect or any possible implementation manner of the first aspect.
  • a computer-readable storage medium comprising computer program instructions.
  • the computing device cluster executes the method in the above-mentioned first aspect or any possible implementation manner of the first aspect.
  • these computer-readable storage media include, but are not limited to, one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), flash memory, electrically erasable PROM (EEPROM), and hard drive.
  • the above-mentioned storage medium may specifically be a non-volatile storage medium.
  • FIG. 1 is a schematic diagram of an application scenario for generating insight data provided in an embodiment of the present application.
  • FIG. 2 is a schematic diagram of another application scenario for generating insight data provided in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a system architecture provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an insight data generation process provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a sorting strategy provided in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a case study of an insight data generation process provided in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of another example of an insight data generation process provided in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of an example of a sorting strategy provided in an embodiment of the present application.
  • FIG. 9 is a schematic structural block diagram of an apparatus for generating insight data provided in an embodiment of the present application.
  • FIG. 10 is a schematic structural block diagram of a computing device provided in an embodiment of the present application.
  • FIG. 11 is a schematic structural block diagram of a computing device cluster provided in an embodiment of the present application.
  • FIG. 12 is a schematic structural block diagram of another computing device cluster provided in an embodiment of the present application.
  • the size of the serial number of each process does not mean the order of execution.
  • the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the network architecture and business scenarios described in the embodiments of the present application are intended to more clearly illustrate the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided in the embodiments of the present application.
  • a person of ordinary skill in the art can appreciate that with the evolution of the network architecture and the emergence of new business scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
  • references to "one embodiment” or “some embodiments” etc. described in this specification mean that a particular feature, structure or characteristic described in conjunction with the embodiment is included in one or more embodiments of the present application.
  • the phrases “in one embodiment”, “in some embodiments”, “in some other embodiments”, etc. appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean “one or more but not all embodiments”, unless otherwise specifically emphasized.
  • the terms “including”, “comprising”, “having” and their variations all mean “including but not limited to”, unless otherwise specifically emphasized in other ways.
  • “At least one” means one or more.
  • “Plural” means two or more.
  • “And/or” describes the association relationship of associated objects, indicating that three relationships may exist.
  • A and/or B can mean: A alone, both A and B, or B alone, where A and B can be singular or plural.
  • the character “/” generally indicates that the previous and next associated objects are in an “or” relationship.
  • “At least one of the following” or similar expressions refers to any combination of these items, including any combination of single or plural items.
  • At least one of a, b, or c can mean: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, c can be single or multiple.
  • Dimension: a classification of fields in a data set. Fields that have a classification meaning for the data are called dimensions. Their values are usually enumerable, such as "month", "ID", etc.
  • Metric: indicator fields with quantifiable data are called metrics. Their values are usually numerical.
  • the aggregate value is the summary value or total value generated by a single field in a data set in a filtered data subset after some calculation operations, such as sum aggregation, mean aggregation, etc.
  • Record refers to one or more rows in a database table that constitutes a data set.
  • Chart Visualization Element is a selectable data point in a visualization chart that summarizes some basic record values in the data.
  • the data of a chart visualization element can consist of a single record or multiple records aggregated together.
  • Chart visualization elements in a visualization chart can be displayed in a variety of ways such as points, lines, shapes, etc.
  • Internal and external: internal means that the dimensions and metrics involved in the analysis participate in drawing the chart that constitutes the user's current analysis; external means that the dimensions and metrics involved in the analysis do not participate in drawing that chart.
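The terms defined above (dimension, metric, aggregate value, internal and external) can be made concrete with a small Python sketch. The field names and helper functions are hypothetical and only illustrate the definitions.

```python
from collections import defaultdict
from statistics import mean


def aggregate(records, dimension, metric, how=sum):
    """Aggregate one metric per dimension value, e.g. sum or mean aggregation.

    Each resulting (dimension value, aggregate value) pair corresponds to one
    chart visualization element, which may summarize one or several records.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[dimension]].append(record[metric])
    return {key: how(values) for key, values in groups.items()}


def split_internal_external(all_fields, chart_fields):
    """Internal fields take part in drawing the current chart; external ones do not."""
    internal = [f for f in all_fields if f in chart_fields]
    external = [f for f in all_fields if f not in chart_fields]
    return internal, external


records = [
    {"month": "Jan", "region": "North", "sales": 30.0},
    {"month": "Jan", "region": "South", "sales": 25.0},
    {"month": "Feb", "region": "North", "sales": 45.0},
]

sum_by_month = aggregate(records, "month", "sales")         # sum aggregation
mean_by_month = aggregate(records, "month", "sales", mean)  # mean aggregation
internal, external = split_internal_external(
    ["month", "region", "channel", "sales", "profit"], {"month", "sales"}
)
print(sum_by_month, mean_by_month, internal, external)
```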
  • FIG1 shows an insight data generation system, which may include a user device and a data processing device.
  • the user device may include a smart terminal such as a mobile phone, a personal computer or an information processing center.
  • the user device may be used as the initiator of the insight data generation request.
  • the above-mentioned data processing device can be a device or server with data processing function such as a cloud server, a network server, an application server or a management server.
  • the data processing device receives, through the interactive interface, the instruction for selecting chart visualization elements from the intelligent terminal, and then uses the memory that stores the data and the processor that processes the data to perform data processing by means of machine learning, deep learning, search, reasoning, decision-making, etc.
  • the memory in the data processing device can be a general term, which can be a local storage device storing historical data or a storage manager in the database.
  • the user device can receive an instruction from a user to select one or more chart visualization elements in a visualization chart, and then initiate a screening and query request to a data processing device to find out the fine-grained original records of the selected chart visualization elements, so that the data processing device performs data analysis on the original data records corresponding to the one or more chart visualization elements selected by the user device, thereby generating insight data for one or more chart visualization elements.
  • the data processing device can execute the insight data generating method of the embodiment of the present application. It should be noted that although the user device and the data processing device are depicted as independent devices in Figure 1, in other embodiments of the present application, the two devices can be implemented by the same device.
  • FIG2 shows another insight data generation system.
  • the user device can be directly used as a data processing device.
  • the user device can directly receive input from the user and process it directly by the hardware of the user device itself.
  • the specific process is similar to that of FIG. 1; refer to the description above, which is not repeated here.
  • the user device in FIG. 2 may be a server with data processing capabilities such as a cloud server, a network server, an application server or a management server, or may be an electronic device with data processing capabilities such as a desktop computer, a mobile computer, a tablet computing device or a mobile communication device.
  • the user device can receive an instruction from the user to select one or more chart visualization elements in a visualization chart, and then the user device itself initiates a request to perform data analysis on the selected one or more chart visualization elements, thereby generating insight data for the one or more chart visualization elements.
  • the user device itself can execute the insight data generating method of the embodiment of the present application.
  • the processors in FIG. 1 and FIG. 2 can perform data analysis according to business needs.
  • the insight analysis of the chart is performed, and a variety of different analysis modes are supported, including statistical value feature analysis, distribution feature analysis, null value alarm analysis, zero value alarm analysis, high correlation measurement analysis, global-subset difference analysis, etc., and different types of insights can be detected from the statistical analysis and traditional machine learning levels.
  • Features of different categories are generated, and a variety of feature descriptions are customized to obtain insight analysis of the interest data behind the visualization elements of the chart screened by the user.
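Of the analysis modes listed above, the null value alarm and zero value alarm analyses are the simplest to illustrate. The source does not define when an alarm is raised, so the sketch below assumes one plausible rule: raise an alarm when the share of null or zero metric values among the selected records exceeds a hypothetical `max_share` cutoff.

```python
def value_alarms(records, metric, max_share=0.2):
    """Return the null/zero shares of a metric that exceed `max_share`.

    The cutoff and the share-based rule are assumptions for illustration.
    """
    total = len(records)
    if total == 0:
        return {}
    nulls = sum(1 for r in records if r.get(metric) is None)
    zeros = sum(1 for r in records if r.get(metric) == 0)
    shares = {"null": nulls / total, "zero": zeros / total}
    return {kind: share for kind, share in shares.items() if share > max_share}


# Hypothetical records behind the chart visualization elements chosen by the user.
screened = [
    {"month": "Jan", "sales": 10},
    {"month": "Jan", "sales": None},
    {"month": "Feb", "sales": None},
    {"month": "Feb", "sales": 0},
    {"month": "Mar", "sales": 5},
]

alarms = value_alarms(screened, "sales")
print(alarms)
```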
  • the system architecture 100 may include an execution device 110, a database 130, a client device 140, a data storage system 150, and a data acquisition device 160. It should be understood that FIG. 3 is only for illustration, and optionally, the system architecture may include more or fewer databases and execution devices, or other functional modules.
  • the data acquisition device 160 can be used to collect chart data, and in the embodiment of the present application, the chart data can be used to generate a visual chart containing chart visualization elements. After collecting the chart data, the data acquisition device 160 stores the data in the database 130.
  • the data maintained in the database 130 does not necessarily all come from the data acquisition device 160; it may also be received from other devices, for example obtained directly from the cloud or elsewhere.
  • the execution device 110 does not necessarily generate insights based entirely on the data maintained in the database 130; it may also obtain data from the cloud or elsewhere to generate insights. The above description should not be used as a limitation on the embodiments of the present application.
  • the database may be a hardware device, may be integrated in the execution device 110, or may be set up on a cloud or other network server.
  • the generation of visual charts and insight data can be applied to different systems or devices, such as being applied to the execution device 110 shown in FIG. 3 and presented on the application interface 120.
  • the execution device 110 can be the data processing device in FIG. 1, can be a terminal, such as a mobile terminal, a tablet computer, a laptop computer, an AR/VR or a vehicle-mounted terminal, etc., can also be a server or a cloud, etc.
  • the execution device 110 can be configured with an input/output (I/O) interface 112 for data interaction with an external device.
  • the user can input data to the I/O interface 112 through the client device 140, and the input data can include: instructions for selecting one or more chart visualization elements and dimensions and metrics of the visualization charts corresponding to the chart visualization elements in the embodiment of the present application.
  • the execution device 110 can call the data, code, etc. in the data storage system 150 for corresponding processing, and can also store the input data obtained by the corresponding processing in the data storage system 150.
  • the I/O interface 112 feeds back the processing result, for example, the generated insight data, to the client device 140.
  • the client device may also be the execution device 110 in FIG3, and the fed-back insight data is presented on the application interface 120 of the execution device.
  • the execution device 110 includes an application interface 120.
  • the application interface 120 can be an interface of a client application stored locally on the execution device 110, or it can be an interface of a client application located on a remote server and accessible through a network (such as the Internet or an intranet).
  • it can be an application interface that is hosted in a browser-controlled environment or coded in a language supported by the browser and relies on a web browser to perform data calculations.
  • the application interface 120 may include a visualization chart interface 121 and an insight data interface 125 , or the visualization chart interface 121 and the insight data interface 125 may be presented through multiple application interfaces.
  • the visualization chart interface 121 may include one or more different types of charts and interface configuration information.
  • the interface configuration information may include modules such as dimension options, measurement options, and chart interface setting modules, or elements such as axis configuration information for selecting charts to be drawn, and chart raw data. It should be understood that FIG. 3 is only an example.
  • the visualization chart interface 121 may also include more selection modules, such as a chart type selection module.
  • the insight interface 125 may include one or more insight data 126, 127, and the insight data 126 and the insight data 127 may include insight charts or insight texts.
  • the insight data is obtained according to the chart 122 or the chart 123 in the visualization chart interface 121, and the insight data interface may also include an insight mode selection module or an analysis type for selecting insight data generation.
  • the analysis type may be distribution feature analysis, null value alarm analysis, zero value alarm analysis, high correlation metric analysis, global-subset difference analysis, etc., or may be a customized feature analysis.
  • the analysis results formed may be described by different chart types and corresponding text insight information, and displayed in insight charts or insight texts. It should be understood that FIG. 3 is only an example, and optionally, the insight interface 125 includes more modules, such as an insight data sorting module.
  • Figure 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • the data storage system 150 is an external memory relative to the execution device 110. In other cases, the data storage system 150 can also be placed in the execution device 110.
  • other modules may also be included in the system architecture, such as a chart drawing module.
  • the visualization chart interface and the insight interface may not be in the same application interface.
  • the scenarios to which the embodiments of the present application can be applied are not limited to those shown in Figure 3.
  • the insight data analysis related to single-point data can help users to inspect, discover and gain in-depth understanding of individual chart visualization elements in a visualization chart when building, browsing and analyzing data.
  • the data records of unselected chart visualization elements will interfere with the insight data analysis formed by the selected single chart visualization element, making it difficult to ensure the accuracy of the overall insight analysis formed by selecting multiple chart visualization elements, and the interaction cost is high.
  • the outliers are the user's interest data, and the user wants to obtain insight data on the causes of the three outliers. If the user only selects one of the outliers to generate insight data, the other two outliers also participate in the analysis process of the insight data. As a result, the insight data may be biased, for example, the selected outlier may be judged as a normal value due to the presence of the other two outliers. Therefore, the unselected chart visualization elements may interfere with the insight data formed by the selected single chart visualization element.
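  • The interference described above can be illustrated with a small numeric sketch. The data values, the z-score rule, and the threshold of 2 are assumptions chosen purely for illustration and are not taken from the present application.

```python
from statistics import mean, pstdev

def z_score(value, population):
    """Standard score of `value` relative to `population`."""
    return (value - mean(population)) / pstdev(population)

# Hypothetical metric values: ten ordinary points and three outliers.
normal = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11]
outliers = [30, 31, 32]
THRESHOLD = 2.0  # assumed rule: |z| > 2 means "abnormal"

selected = outliers[0]

# Single-element analysis: the two unselected outliers still take part
# in the background data and inflate its mean and spread.
z_with_interference = z_score(selected, normal + outliers)

# Joint analysis: all three outliers are selected together, so the
# background consists of the ordinary points only.
z_joint = z_score(selected, normal)

print(round(z_with_interference, 2), round(z_joint, 2))
```

  • Under these assumed numbers, the selected outlier falls below the threshold when the other two outliers remain in the background, and far above it when the three outliers are treated as one selection.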
  • existing technical products for automatic insight generation analyze at the granularity of single data points and lack an assisted insight generation solution in which the user batch-selects local data of interest through interaction.
  • the freedom of interaction and analysis they support is therefore limited, and there is still room for further improvement and optimization.
  • Figure 4 shows a schematic flowchart of a method 400 for generating insight data provided by an embodiment of the present application.
  • the method of Figure 4 can be executed by the data processing device of Figure 1 or the user device of Figure 2.
  • Step 410 Present a first chart, the first chart comprising M chart visualization elements, each chart visualization element corresponding to at least one data record in a data source.
  • the first chart may be presented in the visualization chart interface 121 in the application interface 120, or in any visualization interface.
  • the data records for drawing the first chart may be all or part of the data in the database 130, or all or part of the data in one or more tables in any data source.
  • the first chart includes M chart visualization elements, such as bars of a bar chart, discrete data points of a scatter plot, data points and adjacent lines of a line chart, sectors of a pie chart or a donut chart, and other graphical representations of data records.
  • Each chart visualization element is drawn from one or more data records.
  • a single chart visualization element may correspond to a single data record, or may correspond to an aggregate value of multiple data records, that is, a summary value or total value finally generated after some calculation operations are performed on multiple data records, such as sum aggregation, mean aggregation, etc.
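  • As a minimal sketch of this correspondence, the following groups data records by a dimension and aggregates a metric, keeping the underlying records attached to each chart visualization element. The field names `month` and `sales` and the helper `build_elements` are hypothetical.

```python
from collections import defaultdict

# Hypothetical data records from a data source.
records = [
    {"month": 1, "sales": 120},
    {"month": 1, "sales": 80},
    {"month": 2, "sales": 150},
    {"month": 3, "sales": 90},
    {"month": 3, "sales": 60},
]

def build_elements(rows, dimension, metric, agg="sum"):
    """Map each dimension value to one chart element (for example, a bar).

    Each element stores its aggregate value and the data records behind it.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row)
    elements = {}
    for key, members in groups.items():
        values = [m[metric] for m in members]
        total = sum(values)
        value = total if agg == "sum" else total / len(values)  # sum or mean
        elements[key] = {"value": value, "records": members}
    return elements

bars = build_elements(records, "month", "sales")
print({month: bar["value"] for month, bar in bars.items()})
```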
  • Step 420 confirming N chart visualization elements selected from the M chart visualization elements, where M and N are positive integers greater than 1, and M is greater than or equal to N.
  • step 420 can support the interactive mode of selecting multiple chart visualization elements and confirm the selected multiple chart visualization elements.
  • step 420 in the technical solution of the present application may also support an interactive mode of selecting a single chart visualization element and confirming the selected single chart visualization element.
  • the user can click and select one or N chart visualization elements through the visualization chart interface of the application interface.
  • the application interface is an interface of a spreadsheet application on a desktop computer, and the user can drag and select with the mouse to generate a selection box, and one or N chart visualization elements in the selection box are determined as the chart visualization elements selected by the user.
  • the chart visualization elements can be one or N bars in a bar chart, one or N data points in a line chart or scatter chart, or one or N sectors in a pie chart or donut chart.
  • the user may select one or N chart visualization elements by single-click or multi-click selection.
  • the user may click on one or N chart visualization elements at the same time with a mouse, and the selected one or N chart visualization elements are determined as the chart visualization elements selected by the user.
  • the selected chart visualization elements may be discontinuous in the dimension of the x-axis of the chart, and the selected chart visualization elements may be separated by one or more chart visualization elements.
  • the batch selection of data of interest supported by this technical solution works on a variety of different chart types, and the highlighted elements selected by the user are retained when the chart type is switched, ensuring that the highlighted data is always used for generating insights for the user.
  • the batch selection method in step 420 can be performed on a variety of different charts, such as bar charts, line charts, or scatter charts, and the data that is brushed will be highlighted relative to the data that is not brushed, and will remain highlighted when the type of chart for analyzing the data is switched. For example, when drawing a chart for the same set of data, the chart visualization elements that the user is interested in are brushed on the bar chart, and the chart visualization elements are then highlighted. When the user switches the bar chart to a line chart, the chart visualization elements will still remain highlighted. When the user switches the chart type for the bar chart that has been generated and the visualization elements have been brushed, the visualization elements containing the data records corresponding to the brushed visualization elements will also be highlighted in the new chart.
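  • One way to keep highlights across chart types, sketched here under the assumption that every data record carries a unique `id`, is to store the selection as a set of record ids rather than as chart elements. The field names and helper functions are hypothetical.

```python
# Hypothetical data records; "id" is an assumed unique record key.
records = [
    {"id": 1, "month": 1, "region": "north"},
    {"id": 2, "month": 1, "region": "south"},
    {"id": 3, "month": 2, "region": "north"},
    {"id": 4, "month": 3, "region": "east"},
    {"id": 5, "month": 3, "region": "south"},
]

def elements_for(rows, dimension):
    """Group record ids by the dimension bound to the chart axis."""
    elements = {}
    for row in rows:
        elements.setdefault(row[dimension], set()).add(row["id"])
    return elements

def highlighted(elements, brushed_ids):
    """Elements that contain at least one brushed data record."""
    return {key for key, ids in elements.items() if ids & brushed_ids}

# The user brushes the bars for months 1 and 2 on a chart drawn by month.
by_month = elements_for(records, "month")
brushed_ids = by_month[1] | by_month[2]

# After switching to a chart drawn by region, the highlight is recomputed
# from the stored record ids.
by_region = elements_for(records, "region")
print(sorted(highlighted(by_region, brushed_ids)))
```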
  • Step 430 Determine all K data records corresponding to the N chart visualization elements, where K is a positive integer greater than 1.
  • one method of determining all K data records corresponding to the N chart visualization elements is to first determine the form of interaction by which the user selected the chart visualization elements of the visualization chart.
  • the form of interaction may be a swiping operation performed by the user.
  • for example, the user swipes across the chart visualization elements at which the horizontal axis dimension takes specific values.
  • the dimension data bound to the chart visualization element may be a specific value of the dimension corresponding to the chart visualization element data, and the analysis granularity may be a specific category of the dimension.
  • the user is concerned about the characteristics of the chart visualization element when the dimension corresponding to the horizontal axis of the chart takes a specific value, and the subsequent insight data of this solution will also focus on the horizontal axis dimension of the chart.
  • for example, suppose the x-axis dimension is time and the interactive operation performed by the user is to swipe across three bars of a bar chart along the x-axis direction of the visualization chart. The time periods corresponding to the three bars are values of field A. The solution determines that the user's interactive operation is a swipe along the x-axis, that the dimension data field bound to the x-axis of the visualization chart is field A of a certain time type, and that the analysis granularity is a specific time unit, such as a month or a time period. All data records corresponding to dimension data field A and the selected time values are then retrieved from the data source or database.
  • a method for confirming all K data records corresponding to N chart visualization elements may be to directly extract the enumeration value of the data point corresponding to the interactively selected chart visualization element, and the enumeration value of the data point may be the specific value of the field corresponding to the chart visualization element of the chart.
  • the enumeration value may be one or more different values of month 1 to month 12 corresponding to the selected chart visualization element, or it may be a combination of the value of the external dimension information of the data set bound to the horizontal axis or legend in the selected chart visualization element and the value of the month.
  • Determining the dimension combination related to the chart visualization element in the chart may include the specific value of the external dimension that is not involved in drawing the first chart and the selected multiple enumeration values of the dimension that participates in drawing the first chart, and may also include the dimension combination and measurement related to the chart visualization element, or other relevant information of the chart visualization element.
  • the technical solution of the present application can integrate the information contained to generate filtering logic for filtering and searching data sets or original data records in databases.
  • the subsequent insight data of this solution will also focus on the dimensions or dimension combinations related to the filtered fine-grained original records and chart visualization elements.
  • the logic for filtering the first chart data can be a logical combination of A or B or C with different values.
  • this solution can convert one or N chart visualization elements interactively selected by the user on the first chart (corresponding to the aggregated results of one row or part of the rows in the original data set records) into query requests to further query all K data records of the interactively selected chart visualization elements in the original data set or database, and use all K data records for subsequent insights.
  • the query request may be any combination of all information of the request field, a list of filter operators, a list of filter enumeration values, and filter logic, and the object of the request may be a backend module.
  • the query request may be expressed in Structured Query Language (SQL): the solution generates a where clause, performs a query on the original table records, and returns the subset of data of interest corresponding to the chart visualization elements selected by the user to the algorithm module.
  • for example, if the data of interest indicated by the user's interactive operation consists of all records in the analyzed data set that match several values of the dimension A field, the SQL query statement finally generated combines those values of the dimension A field in the where clause using the IN operator or OR logic.
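  • The conversion from selected enumeration values to a where clause can be sketched as follows. The example uses Python's standard `sqlite3` module with an in-memory table; the table name `orders`, its columns, and the helper `build_query` are hypothetical, and the present application does not prescribe a particular database engine.

```python
import sqlite3

def build_query(table, field, values):
    """Build a parameterized SELECT whose where clause uses the IN operator.

    Table and field names are assumed to come from trusted chart
    configuration; the selected values are passed as bound parameters.
    """
    placeholders = ", ".join("?" for _ in values)
    sql = f'SELECT * FROM "{table}" WHERE "{field}" IN ({placeholders})'
    return sql, list(values)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (month INTEGER, region TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "north", 120.0),
        (1, "south", 80.0),
        (2, "north", 150.0),
        (3, "south", 90.0),
        (4, "north", 60.0),
    ],
)

# The user selected the bars for months 1 and 3 on the first chart.
sql, params = build_query("orders", "month", [1, 3])
k_records = conn.execute(sql, params).fetchall()
print(sql)
print(k_records)
```

  • Binding the selected values as parameters rather than concatenating them into the statement is a design choice of this sketch, made to avoid quoting problems.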
  • the solution is not limited to single-table analysis queries of a single data source, but can also support queries for multiple related data tables in the original data source, and can support federated queries and subsequent analysis.
  • the underlying layer of the solution is based on a distributed SQL query engine, which merges multiple tables at the data set level to obtain data.
  • Step 440 Perform joint data analysis based on the K data records to generate first insight data for N chart visualization elements.
  • the joint data analysis of the K data records differs both from analyzing a single selected chart visualization element and from selecting multiple chart visualization elements, analyzing each one individually, and then integrating the results.
  • the joint data analysis treats the K data records as a whole and analyzes them together with other data in the data set, thereby generating insight data that compares the N chart visualization elements, or at least two of the N chart visualization elements, with other data records.
  • the K data records are taken as a whole, and the association relationship of each data record in the K data records is determined, and the association relationship determines the characteristic information shared by the K data records.
  • the K data records may have the same or similar external dimensions, or may be data records with a correlation relationship, or may present the same or opposite measurement value phenomena, and the shared characteristic information is the external dimension or correlation relationship analysis data or measurement value phenomenon corresponding to the K data records.
  • the data records with shared characteristic information in the data source are screened out, and the K data records and the data records with shared characteristic information are subjected to data analysis to form insight data of N chart visualization elements.
  • L of the K data records may have common feature information, and the L data records correspond to at least two chart visualization elements, where L is a positive integer greater than 1, and K is greater than L.
  • insight data of at least two chart visualization elements corresponding to the L data records are generated.
  • the data records for comparison may be data records corresponding to (M-N) chart visualization elements that are not selected from the M chart visualization elements in step 420, or may be (K-L) data records that are not selected from the K data records.
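  • The joint analysis described above can be sketched as follows, under simplifying assumptions: the shared characteristic information is taken to be an external dimension value common to all selected records, and the comparison is a difference of metric means. The field names and helper functions are hypothetical and do not represent the algorithm of the present application.

```python
from collections import Counter

def shared_features(selected, external_dims, min_share=1.0):
    """External dimension values held by at least `min_share` of the selection."""
    shared = {}
    for dim in external_dims:
        value, count = Counter(r[dim] for r in selected).most_common(1)[0]
        if count / len(selected) >= min_share:
            shared[dim] = value
    return shared

def mean_of(rows, metric):
    return sum(r[metric] for r in rows) / len(rows) if rows else None

def joint_analysis(all_records, selected, external_dims, metric):
    """Treat the selected records as one whole and compare them with the rest."""
    shared = shared_features(selected, external_dims)
    unselected = [r for r in all_records if r not in selected]
    # Unselected records that carry the same shared characteristic values.
    related = [
        r for r in unselected
        if shared and all(r[d] == v for d, v in shared.items())
    ]
    return {
        "shared": shared,
        "related": related,
        "selected_mean": mean_of(selected, metric),
        "rest_mean": mean_of(unselected, metric),
    }

data = [
    {"month": 1, "channel": "online", "region": "north", "sales": 300},
    {"month": 1, "channel": "online", "region": "south", "sales": 280},
    {"month": 2, "channel": "store", "region": "north", "sales": 100},
    {"month": 3, "channel": "online", "region": "north", "sales": 310},
    {"month": 4, "channel": "store", "region": "south", "sales": 90},
    {"month": 5, "channel": "online", "region": "south", "sales": 120},
]
selected = [r for r in data if r["month"] in (1, 3)]  # the K data records
result = joint_analysis(data, selected, ["channel", "region"], "sales")
print(result["shared"], [r["month"] for r in result["related"]])
```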
  • the technical solution of the present application can realize simultaneous analysis of multiple chart visualization elements and their original data records, and obtain insight data containing the association information of multiple chart visualization elements.
  • the insight data generated by the technical solution of the present application differs from insight data obtained by analyzing a single selected chart visualization element, and from insight data obtained by analyzing single chart visualization elements multiple times and then integrating the results. It reduces the interference that unselected but associated chart visualization elements cause to the selected chart visualization elements during insight analysis.
  • the number of times the insight data of at least two chart visualization elements corresponding to L data records are generated may be more than once, and the value of L may be different, and finally a plurality of different data candidate sets may be formed.
  • These different data candidate sets may be analyzed by strategies or algorithms to form a plurality of different insight data, and the data candidate sets serve as subspaces for strategy or algorithm analysis.
  • the present application analyzes the data candidate set in the data subset according to the insight data generation strategy or algorithm to generate insight data that can demonstrate the characteristics of the subspace data.
  • the insight data generation algorithm input may include the full amount of full table data records, the screening conditions corresponding to the visualization elements of the original visualization chart selected by the user interaction, the original records of the data of interest generated by the query request, the common feature information of the original data records, and the algorithm parameters configured by the front-end user interaction.
  • the insight data results produced by the algorithm may include the original chart data, chart type information, axis configuration information, or text insight description information required to draw the insight chart.
  • the insight data generation algorithm can support multiple analysis modes, such as statistical value feature analysis, distribution feature analysis, null value warning analysis, zero value warning analysis, high correlation measurement analysis, global-subset difference analysis, etc. It can also support the detection of different categories of features for different types of insights from the statistical analysis and traditional machine learning levels, and generate a variety of customized feature descriptions.
  • the insight data generation algorithm can adaptively use different chart types for display, such as a bar chart for distribution charts and, for correlation metric charts, a scatter plot whose axes switch flexibly between logarithmic and linear scales based on the data distribution.
  • the textual insight description information of insight data also varies.
  • the textual description can be a description of the characteristics of the insight, a possible priori pattern analysis composed of various characteristics, a combination of the two, or other descriptions that can explain the characteristics.
  • the insight data generation algorithm can produce multiple different types of insights, support the contribution of metrics or dimensions inside and outside the analysis chart to the generation of patterns of user interest selection, and guide users to explore the analysis content of data records corresponding to the selected chart visualization elements and related data in the data set.
  • the types of insights generated by the algorithm may include chart metric aggregation expansion analysis, external dimension valid record number analysis, external dimension distribution contribution analysis, external dimension subspace internal feature analysis, and external high interpretability metric analysis, but are not limited to these types of insights.
  • the chart metric aggregation expansion analysis can focus on decomposing the metric aggregation values bound to the selected chart visualization elements into the distribution of the original data, helping users understand the composition of the aggregation value.
  • the vertical axis in a common analysis chart is the sum aggregation of metrics, and the user pays attention to the chart visualization elements with higher aggregation values.
  • This type of insight data can help users understand the composition of abnormal aggregation values, such as a single original abnormal record, or the overall distribution has a certain bias.
  • the analysis of the number of valid records in the external dimension can focus on the data records selected by the user interactively and on how their valid records are distributed across other external dimensions (not involved in the chart drawing), in order to analyze the potential reasons why the aggregate values of the chart visualization elements selected by the user show a specific pattern. If the original data records corresponding to a specific pattern are found to be concentrated in a certain value subspace of a certain dimension, this method considers the concentration to be strongly correlated with the presentation of this pattern.
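  • A minimal sketch of this kind of analysis counts the non-null records of the selection per value of one external dimension and reports a value only when it holds a large share. The 0.6 share threshold, the field names, and the helper functions are assumptions for illustration.

```python
from collections import Counter

def valid_record_counts(selected_records, dimension):
    """Count selected records per external dimension value, skipping nulls."""
    return Counter(
        r[dimension] for r in selected_records if r.get(dimension) is not None
    )

def dominant_value(counts, min_share=0.6):
    """Return (value, share) if one value holds at least `min_share` of records."""
    total = sum(counts.values())
    if total == 0:
        return None
    value, count = counts.most_common(1)[0]
    share = count / total
    return (value, share) if share >= min_share else None

# Hypothetical records behind the selected chart elements; "channel" is an
# external dimension that was not used to draw the chart.
selected_records = [
    {"month": 1, "channel": "online"},
    {"month": 1, "channel": "online"},
    {"month": 3, "channel": "online"},
    {"month": 3, "channel": "store"},
    {"month": 3, "channel": None},
]
counts = valid_record_counts(selected_records, "channel")
finding = dominant_value(counts)
print(counts, finding)
```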
  • external dimension distribution contribution analysis can focus on exploring the contribution distribution of data records selected by users in other external dimensions (not involved in chart drawing) to the chart metrics that users are interested in.
  • This type of explanation essentially disassembles the aggregated value in another direction outside the chart to find potential high-contribution dimension values for users to further explore.
  • when users find a dimensional subspace of interest, they can further use the data explanation subspace distribution exploration function to view the detailed distribution within the subspace.
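  • The contribution analysis along an external dimension can be sketched as splitting the aggregated metric of the selection by the values of that dimension and ranking the shares. The field names and the helper `contribution_by_dimension` are hypothetical.

```python
from collections import defaultdict

def contribution_by_dimension(selected_records, dimension, metric):
    """Share of the selection's metric total contributed by each dimension value.

    The result is ordered from the largest contribution to the smallest.
    """
    totals = defaultdict(float)
    for r in selected_records:
        totals[r[dimension]] += r[metric]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {value: subtotal / grand_total for value, subtotal in ranked}

# Hypothetical records behind the selected chart elements; "region" is an
# external dimension that was not used to draw the chart.
selected_records = [
    {"month": 1, "region": "north", "sales": 300},
    {"month": 1, "region": "south", "sales": 280},
    {"month": 3, "region": "north", "sales": 310},
]
shares = contribution_by_dimension(selected_records, "region", "sales")
print(shares)
```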
  • the internal feature analysis of the external dimension subspace is closely related to the above-mentioned explanations involving external dimensions, and can support automatic search and recommendation of subspaces of dimension values within which the numerical distribution of the metric selected by the user has certain characteristics. Based on the subspace distribution, the user can further use the original data record tracing function to analyze the source of the feature pattern.
  • external highly interpretable metric analysis can focus on data patterns measured in a subset of data that the user is interested in, perform highly correlated metric analysis on the full set of data and the subset of data respectively, obtain a batch of explanatory external metric candidates, and further analyze them to obtain metrics with higher surprise, and display the correlation between the metric and the chart metric that the user is interested in through a scatter plot pattern, in order to explore possible insights from the data.
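  • As a rough sketch of this idea, the following computes the Pearson correlation between the chart metric and each candidate external metric on the full data and on the subset, and ranks the candidates by how much the two correlations differ. Using that absolute difference as the "surprise" score is an assumption of this sketch, as are the field names.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def rank_external_metrics(full, subset, chart_metric, candidates):
    """Rank candidates by |r_subset - r_full| (assumed surprise score)."""
    scored = []
    for m in candidates:
        r_full = pearson([r[chart_metric] for r in full], [r[m] for r in full])
        r_sub = pearson([r[chart_metric] for r in subset], [r[m] for r in subset])
        scored.append((m, r_full, r_sub, abs(r_sub - r_full)))
    return sorted(scored, key=lambda s: s[3], reverse=True)

full = [
    {"sales": 10, "discount": 1, "visits": 5},
    {"sales": 20, "discount": 2, "visits": 3},
    {"sales": 30, "discount": 3, "visits": 8},
    {"sales": 40, "discount": 4, "visits": 1},
    {"sales": 50, "discount": 5, "visits": 7},
    {"sales": 60, "discount": 6, "visits": 2},
]
subset = full[:3]  # data records behind the selected chart elements
ranked = rank_external_metrics(full, subset, "sales", ["discount", "visits"])
for name, r_full, r_sub, surprise in ranked:
    print(name, round(r_full, 3), round(r_sub, 3), round(surprise, 3))
```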
  • the algorithm can also generate multiple different types of insights to support the analysis of associations within the data records corresponding to the selected chart visualization elements.
  • the types of insights generated by the algorithm may include chart visualization element trend analysis, chart visualization element cluster analysis, etc., but the embodiments of the present application are not limited to these types of insights.
  • the trend analysis of the chart visualization element can focus on the trend pattern of the data record corresponding to the selected chart visualization element as the x-axis dimension changes.
  • the periodic change pattern that may exist in the data record is obtained from the numerical high points or numerical low points that may appear in the data record as a whole.
  • the trend analysis of the chart visualization element can also be used for the prediction of data records. For another example, when there are some outliers in the data record showing a specific trend, only non-outliers can be selected for analysis in the process of selecting the chart visualization element, and the outliers can be skipped to improve the accuracy of the trend analysis.
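  • The effect of skipping outliers during selection can be sketched with an ordinary least-squares line fit. The data series and the choice of a linear model are assumptions for illustration only.

```python
def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical series: the point at x = 4 is an outlier.
series = [(1, 10), (2, 12), (3, 14), (4, 60), (5, 18), (6, 20)]

# The user selects every element except the outlier.
selected = [p for p in series if p[0] != 4]

slope, intercept = fit_line(selected)
forecast = slope * 7 + intercept  # prediction for the next x value

slope_all, _ = fit_line(series)   # trend distorted by the outlier
print(round(slope, 3), round(forecast, 3), round(slope_all, 3))
```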
  • the cluster analysis of chart visualization elements can focus on the clustering patterns and differences of data records corresponding to multiple chart visualization elements selected in batches.
  • this insight type can classify data records corresponding to chart visualization elements in one or more charts into aggregate classes based on the intrinsic properties of the data, where data records in each aggregate class have the same characteristics, and data records in different aggregate classes have greatly different characteristics.
  • This insight type can analyze data tables in multiple data sources and classify data records corresponding to multiple chart visualization elements as much as possible.
  • the automatically generated insight data presentation of the data interpretation function in the technical solution of the present application can be in the form of free expansion and contraction similar to an accordion, and is divided into two layers.
  • the title of the first layer of the accordion marks the names of different insight categories.
  • the second layer displays the specific insights recommended by all algorithms under this type of insight.
  • the text description and chart drawing of this type of insight data will be displayed specifically.
  • other insight data will be folded to ensure the neatness of the front-end interface.
  • the present technical solution can support users to freely observe the charts and text results recommended by the algorithm in each type of different insights, where all charts generated by the algorithm also support basic interactive methods such as interactive selection, highlighting, and legend switching, thereby optimizing the user's exploration and analysis process experience, and also providing users with the possibility to conduct interactive analysis in the feature subspace of the insight chart.
  • this technical solution can support users to export the insight charts of interest generated by data interpretation to the dashboard, and display them at the same level as the original charts, while displaying the insight text information on the right.
  • This function supports associated highlighting, that is, when the user selects an insight chart exported to the dashboard, the parent chart that generated the insight data is highlighted synchronously, together with the data of interest that the user had selected in the parent chart (the filter information in effect when the insight data was generated).
  • this technical solution can also be applied in cloud environment scenarios, and can be compatible with insight saving related functions in the microservices where it is located, and can be saved, previewed, and loaded like ordinary charts.
  • steps 410-440 can effectively produce accurate and inspiring insights, but when providing data interpretation operations, the local data subset observed by the user cannot be subsequently analyzed, which to some extent limits the user's interactive exploration method.
  • another embodiment of the present application shows a method 450 for generating insight data, which further generates insight subspaces so that insight data can itself be analyzed to produce further insight data.
  • the method includes steps 460-490, which are described in detail below.
  • Step 460 Present an insight chart in the second insight data, the insight chart comprising W chart visualization elements, each chart visualization element corresponding to at least one data record in the data source.
  • the second insight data presented may be any insight data generated by selecting a chart visualization element, such as the first insight data in step 440, or may be insight data generated by further analyzing any generated insight chart.
  • the type of the second insight data can be any of the insight types mentioned above, such as chart metric aggregation expansion analysis, external dimension valid record number analysis, external dimension distribution contribution analysis, external dimension subspace internal feature analysis, and external high interpretability metric analysis, etc., or it can be other types of insight analysis.
  • the insight chart may further include statistical characteristic values or extreme values corresponding to the insight text descriptions in the insight chart.
  • Step 470 confirming J chart visualization elements selected from W chart visualization elements, where W and J are positive integers greater than 1.
  • Step 480 Determine all H data records corresponding to the J chart visualization elements, where H is a positive integer greater than 1.
  • step 470 and step 480 are substantially the same as the processes of step 420 and step 430 and are not described in detail herein.
  • Step 490 Generate third insight data based on all H data records, where the third insight data includes data distribution analysis or data record tracing of the H data records.
  • the first chart in step 410 can be an insight chart in any insight data in step 460
  • the insight type of the third insight data in step 490 can be the insight type of the first insight data in step 440, or other insight types can be added based on the insight type of the first insight data.
  • the generated third insight data may be a further analysis of the data record distribution within a subspace of the insight data, aiming to help users further explore patterns of interest in the dimensional distribution chart of the algorithm-recommended insights.
  • when the insight chart of step 460 in the present technical solution is an insight chart derived from external dimension valid record number analysis or external dimension distribution analysis, the technical solution again generates a subspace metric distribution insight chart that also supports interaction.
  • the selection of the metric for this insight is related to the metric associated with this type of insight and the metric of the chart that originally generated the data interpretation.
  • the further insight data generated can be original data traceability, aiming to help users easily perform original data queries on abnormal parts of the recommended insight distribution and explore the reasons for the distribution characteristics.
  • when the insight chart of step 460 in the present technical solution is derived from chart metric aggregation expansion analysis, external dimension subspace internal feature analysis, or the above-mentioned subspace distribution exploration, multiple sufficiently fine-grained drill-down analyses have usually already been performed by the time this function is executed, so the original data records directly returned are often few in number but have strong explanatory power.
  • the present technical solution supports convenient tracing of the original records, and both use a consistent display format.
  • the present technical solution uses a paginated table to display the original records.
  • the type of the third insight data is not limited to the above two insight types, and may also be any of the insight data types mentioned above, such as the chart metric aggregation expansion analysis, external dimension valid record number analysis, external dimension distribution contribution analysis, etc.
  • This technology can support users in continuing to carry out rich interactive operations on the data interpretation charts derived from the algorithm, and in performing further focused analysis within the insight feature subspace. After constructing the feature subspace of interest, the user clicks the corresponding item in the function menu, which keeps the use of the data interpretation function logically consistent throughout this technical solution and reduces the learning cost.
  • this application designs a sorting strategy to determine the priority order of multiple sub-insights in an insight. Multiple sub-insights are recommended in order of priority, and the final result is generated by sorting.
  • the sorting strategy is applied after generating the insight chart in steps 440 and 490 and before presenting the insight interface.
  • Figure 5 shows a schematic flowchart of an embodiment of the sorting strategy 500 of the present application, which comprehensively considers two aspects: the confidence of the full amount of features possessed by each insight within the same type of insight and the feature richness of the insight.
  • the method includes steps 510-540, and steps 510-540 are described in detail below. Assume that the insight data includes P sub-insight data.
  • Step 510 Determine a characteristic index value of each sub-insight data among the P sub-insight data, where the characteristic index value is used to measure the confidence or significance of each sub-insight data among the P sub-insight data.
  • the present technical solution formulates different measurement methods respectively.
  • statistical features can be described based on indicators such as the number of characteristic values and the degree to which outliers deviate from the full data.
  • Distribution analysis features can be described by indicators such as the unevenness of distribution and the proportion of maximum distribution.
  • Alarm-related analysis features can be described by the corresponding proportion of alarm values.
  • Correlation measurement analysis features can be described by the above-mentioned measurement indicators. Difference analysis can be described based on the KL divergence of discrete distribution after binning, etc.
  • Step 520 Confirm Q sub-insight data whose feature index values are higher than the threshold value of the feature index value, where Q is a positive integer greater than 1, and P is greater than Q.
  • the technical solution filters feature insights whose feature index values are lower than the threshold value, which may be by placing the insights with lower feature indexes at the very end of the presentation interface queue, or by deleting the feature insights.
  • Step 530 Determine the number of feature types of each sub-insight data among the Q sub-insight data.
  • Step 540 Determine the priority order of the Q sub-insight data by sorting in descending order according to the number of feature types of each sub-insight data.
  • the present technical solution can count the description of its feature richness and determine the recommendation priority of different sub-insight data by sorting in descending order.
  • the feature indicators of the features of different insights are sorted in descending order and then compared in sequence to determine the priorities.
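  • Steps 510 to 540 and the tie-breaking rule above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the `SubInsight` class, the use of the strongest feature as the overall characteristic index value, and the example feature names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SubInsight:
    name: str
    # Maps a feature type to its characteristic index value (for example a confidence score).
    features: dict = field(default_factory=dict)

    @property
    def index_value(self):
        # Assumption: the overall index value is the value of the strongest feature.
        return max(self.features.values(), default=0.0)


def rank_sub_insights(sub_insights, threshold):
    # Steps 510-520: keep only sub-insights whose index value is above the threshold.
    kept = [s for s in sub_insights if s.index_value > threshold]

    # Steps 530-540: sort by the number of feature types in descending order.
    # Ties are broken by comparing the feature index values, themselves sorted
    # in descending order, position by position.
    def priority(s):
        return (len(s.features), sorted(s.features.values(), reverse=True))

    return sorted(kept, key=priority, reverse=True)
```

    With a threshold of 0.95, a sub-insight with two feature types scored 0.99 and 0.96 would be ranked ahead of one with two feature types scored 0.98 and 0.97, and both would be ranked ahead of a sub-insight with a single feature type.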
  • the above method 400 and method 450 can be used alone or in combination.
  • the following describes the method used in combination to achieve insight data generation with a specific example, and describes the implementation of the sorting strategy 500 with the example.
  • Figures 6, 7 and 8 show detailed cases of using the technical solution of the present application to achieve insight data generation and the step-by-step exploration and in-depth process of insight data generation.
  • the elements and data in the cases are all examples, and actual cases include but are not limited to the cases in Figures 6, 7 and 8.
  • FIG6 shows an application interface 600.
  • the application in this case may be a table application or a data intelligence analysis application.
  • the application interface 600 includes a data table 610, which includes multiple dimensions and metrics.
  • the data table is the data source.
  • multiple data sources may be included, and each data source may include multiple data tables.
  • the location dimension A, B and C and the unit price dimension a, b and c are summed and aggregated to form the sales volume X, that is, the location dimension and the unit price dimension are external dimensions and do not participate in the drawing of the chart.
  • Their data records are directly accumulated to obtain the summed aggregate value.
  • Chart 620 contains five chart visualization elements, that is, five columns, and the value of the data point of each chart visualization element is summed and aggregated by multiple data values of the location dimensions A, B and C and the unit price dimensions a, b and c, that is, corresponding to the data records of multiple related dimension values in the data table.
  • the value of the data point of each chart visualization element can also correspond to only one data record. This part of the step process corresponds to step 410 above.
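  • The relationship between chart visualization elements and data records described above can be sketched as follows. The record layout, the sample values and the function name are hypothetical; the sketch only shows that each column is a summed aggregate over one or more records and that the contributing records can be remembered for later lookup.

```python
from collections import defaultdict

# Hypothetical rows of a data table such as 610: "time" is the x-axis dimension,
# "location" and "unit_price" are external dimensions, "sales" is the metric.
RECORDS = [
    {"time": 1, "location": "A", "unit_price": "a", "sales": 40.0},
    {"time": 1, "location": "B", "unit_price": "b", "sales": 10.0},
    {"time": 2, "location": "A", "unit_price": "c", "sales": 35.0},
    {"time": 2, "location": "C", "unit_price": "a", "sales": 5.0},
    {"time": 4, "location": "B", "unit_price": "b", "sales": 12.0},
]


def aggregate_for_chart(records, x_dim, metric):
    """Sum the metric per x-axis value and remember which records fed each column."""
    totals = defaultdict(float)
    members = defaultdict(list)
    for index, record in enumerate(records):
        key = record[x_dim]
        totals[key] += record[metric]
        members[key].append(index)
    return dict(totals), dict(members)


totals, members = aggregate_for_chart(RECORDS, "time", "sales")
```

    Here the column for time 1 aggregates two records, while the column for time 4 corresponds to a single record, matching the two situations described above.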
  • the dotted box in chart 620 is a selection box of application interface 600.
  • the chart visualization elements within the selection box will be highlighted, that is, the two chart visualization elements in the dotted box in chart 620 are filled with diagonal lines.
  • the two chart visualization elements are the objects that need to be analyzed for insights in this case.
  • the selection box in this case is a continuous selection. In actual cases, multiple selection boxes can select multiple discontinuous data, or only one chart visualization element can be selected. This case is not limited. This part of the steps corresponds to step 420 above.
  • the background of the application determines the data records in the data table 610 corresponding to the selected chart visualization element.
  • the specific value of the x-axis dimension corresponding to the selected chart visualization element is time 1-3, that is, the operation of selecting the interaction is to batch select chart visualization elements with specific values 1-3 along the dimension of the x-axis of the chart.
  • the filtering logic of the data records corresponding to the selected chart visualization element is generated, that is, the specific value of the filtering dimension combination is a logical combination of (time dimension 1 or 2 or 3) and (location dimension A or B or C) and (unit price dimension a or b or c).
  • the generated filtering logic can be used to query all the original data records in the data table 610, that is, to determine the data records corresponding to the selected chart visualization elements. This part of the steps corresponds to step 430 above. In actual cases, each column may be divided into multiple sub-columns due to different legend values. When only some sub-columns are selected, the external dimensions in the resulting logical combination may also take only some of their values.
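  • The filtering logic of step 430 can be sketched as a conjunction over dimensions in which each dimension accepts any of its listed values. The function names and the sample records are hypothetical and only illustrate the logical combination described above.

```python
# Hypothetical rows of a data table such as 610.
RECORDS = [
    {"time": 1, "location": "A", "unit_price": "a", "sales": 40},
    {"time": 2, "location": "B", "unit_price": "c", "sales": 10},
    {"time": 4, "location": "A", "unit_price": "a", "sales": 7},
    {"time": 3, "location": "D", "unit_price": "b", "sales": 3},
]


def build_filter(x_dim, selected_x_values, external_dims):
    """Return a mapping from each dimension to the set of values it may take.

    Dimensions are combined with AND; the values inside one dimension with OR.
    """
    logic = {x_dim: set(selected_x_values)}
    for dim, values in external_dims.items():
        logic[dim] = set(values)
    return logic


def matching_records(records, logic):
    """Select the records that satisfy every dimension constraint."""
    return [
        record
        for record in records
        if all(record[dim] in allowed for dim, allowed in logic.items())
    ]


logic = build_filter(
    "time",
    [1, 2, 3],
    {"location": ["A", "B", "C"], "unit_price": ["a", "b", "c"]},
)
selected = matching_records(RECORDS, logic)
```

    When only some sub-columns are selected, the same sketch applies with a shorter value list for the affected external dimension.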
  • Based on the data records corresponding to the selected chart visualization elements determined above, joint data analysis is performed to generate insight data.
  • the specific insight data depends on the analysis results of different data record subsets, and the following is an exemplary example:
  • the values of the sales volume corresponding to the two chart visualization elements selected in this case both present the same phenomenon, that is, they present abnormally high values.
  • The sales volumes corresponding to the two selected chart visualization elements are compared both with the three unselected chart visualization elements in chart 620 and with the remaining data records in the data table.
  • When the sales volumes of the two selected chart visualization elements are compared with those of the three unselected chart visualization elements in chart 620, suppose it is found that the data records with the location dimension value A contribute significantly to the high sales volume; that is, when the time dimension value is 1, 2 or 3 and the location dimension value is A, the sales volume is abnormally higher than under other dimension values. The data records with the time dimension value 1, 2 or 3 are then determined to have an association relationship.
  • the association relationship is that the time dimension value of 1 or 2 or 3 is associated with the location dimension value of A, or the association relationship can be expressed as the data record with the time dimension value of 1 or 2 or 3 and the location dimension value of A has common feature information, that is, the sales volume is abnormally higher than the sales volume under other dimension values.
  • the content of the generated insight data can be the contribution of time, location or unit price to the outlier phenomenon, or it can be the analysis of the data records with the time dimension value of 1 or 2 or 3 and the location dimension value of A along the aggregate value of the unit price dimension, or it can be other insight data related to external dimensions.
  • insight data can correspond to the insight data 621, 622, etc. in Figure 6.
  • more insight data can be generated in actual cases, and there can be more phenomena presented by the selected chart visualization elements, and there can be more insight data generated by each phenomenon, which will not be repeated in this case.
  • the analysis process of these insight data uses the data records corresponding to multiple chart visualization elements, so that users can analyze the observed local data. The above steps correspond to step 440 above.
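  • One simple way to approximate the contribution of an external dimension to the selected aggregate is to compute each dimension value's share of the summed metric. The application does not specify the statistic it uses, so the function below, its name and the sample records are assumptions made purely for illustration.

```python
from collections import defaultdict


def dimension_contribution(selected_records, dim, metric):
    """Return each value of `dim` with its share of the total metric.

    An empty dict is returned when the total is zero, since shares are undefined.
    """
    totals = defaultdict(float)
    for record in selected_records:
        totals[record[dim]] += record[metric]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {value: subtotal / grand_total for value, subtotal in totals.items()}


# Hypothetical records behind the two selected columns.
selected = [
    {"time": 1, "location": "A", "sales": 50.0},
    {"time": 2, "location": "A", "sales": 40.0},
    {"time": 3, "location": "B", "sales": 10.0},
]
shares = dimension_contribution(selected, "location", "sales")
top_value = max(shares, key=shares.get)
```

    In this made-up data, location A accounts for most of the selected sales volume, which is the kind of finding that an insight such as 621 could report.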
  • the insight data 621, 622, etc. generated based on chart 620 include insight charts 631, 632, etc. and text descriptions corresponding to the insight charts, wherein the drawing method of insight charts 631, 632, etc. is the same as that of chart 620, and the text descriptions corresponding to the insight charts may include characteristic values or extreme values in the insight data.
  • insight data 641, 642 and other insight data are obtained.
  • Insight data 641, 642 and other insight data are further analyzed insight data in this case, and the analysis steps are not repeated in this case.
  • the insight type of insight data 641, 642 can be the same or similar insight type as the insight data 621, 622 and other insight data mentioned above, or it can be a subspace analysis of the insight data or the tracing of the insight data.
  • If insight data 641 is a subspace analysis of insight chart 631, its content may be an analysis of the composition of the original data records corresponding to the aggregated values of the data points that make up insight chart 631.
  • the x-axis may be the specific value of the dimension of the original data record, and the y-axis is the sales volume. This part can be used to explain the possible outliers or data records with greater contribution in the original data records constituting the insight data 621.
  • If insight data 642 is the original record tracing of insight chart 632, its content may be the values of the specific original data records that make up insight data 642 and the specific values of their dimensions.
  • the original record traceability presents these original data records through a paginated table.
  • the feature values in the text description corresponding to the insight chart in the insight data can also be traced to the original record.
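  • Presenting traced original records through a paginated table can be sketched as follows. The page size, the 1-based page numbering and the shape of the returned dictionary are assumptions of this sketch and are not taken from the application.

```python
def paginate(records, page, page_size=10):
    """Return one page of traced records together with paging metadata.

    `page` is 1-based; `page_size` is an illustrative default.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    # Ceiling division; an empty result set still has one (empty) page.
    total_pages = max(1, -(-len(records) // page_size))
    start = (page - 1) * page_size
    return {
        "page": page,
        "total_pages": total_pages,
        "rows": records[start:start + page_size],
    }
```

    Because drill-down usually leaves only a small number of original records, a single page is often enough, but the same call also covers larger result sets.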
  • this case can also generate insight data according to the aforementioned selection of chart visualization elements and data analysis steps, thereby achieving continuous further drill-down analysis of the insight data, which is not elaborated in this case.
  • FIG7 shows a schematic diagram of an interface of another detailed case.
  • the detailed case in FIG7 is slightly different from the detailed case in FIG6. The difference is that the dimensions in FIG7 are changed to distance, ID, number, throughput, etc., and the insight data in FIG7 are presented in different application interfaces after being generated.
  • the application interface may be an interface belonging to different applications, that is, the insight data corresponding to different data records may be generated in different applications or application interfaces.
  • a chart 720 in the application interface 702 is generated, and the chart visualization element in the chart 720 corresponds to at least one data record in the data table 710.
  • a selection box is selected in the chart 720, and two chart visualization elements in the selection box are used to generate insight data.
  • the insight data 721, 722, etc. generated finally are presented in the application interface 703.
  • Two chart visualization elements are selected in the insight chart 731 or the insight chart 723 in the application interface 703 to generate insight data 741, etc., which are presented in application interface 704.
  • this case supports secondary analysis and exploration of the focused feature subspace of the insight chart and derives insight data, optimizes the multi-level subspace analysis and exploration process in the automatic insight data generation auxiliary analysis process, and improves the analysis freedom from surface to point, from shallow to deep.
  • Fig. 8 shows a sorting strategy 800 used for multiple insight data in this case.
  • This case is an embodiment of a sorting strategy, and does not limit the process of determining the priority order of multiple insights.
  • Assume that in this case, in the insight data generation process shown in FIG6 or FIG7, 10 insight data are obtained, namely insight data 810 to 819.
  • the application background sorts the obtained insight data 810 to 819.
  • the confidence score can be selected as the feature index value.
  • different measurement methods are used.
  • the application background determines the feature index values of insight data 810 to 819 and arranges them in descending order. The resulting arrangement list is shown in Figure 8.
  • The application background determines the threshold of the characteristic index value in order to filter out insight data with lower characteristic index values. For example, in this case a confidence score of 0.95 is selected as the threshold, and insight data with confidence scores lower than 0.95 are filtered out, as shown in FIG8.
  • the insight data 815, 818, 811 and 810, etc. arranged in descending order of priority in Figure 8 correspond to the insight data 621, 622, etc. presented in Figure 6 or the insight data 721, 722, etc. presented in Figure 7.
  • the sorting process of the insight data 641, 642, etc. shown in FIG. 6 and the insight data 741, etc. shown in FIG. 7 is the same as the above arrangement process, and is not described in detail in this case.
  • The apparatus shown in FIG9 can execute the methods shown in FIG4 and FIG5. It should be understood that the apparatus described below can execute the methods of the aforementioned embodiments of the present application. To avoid unnecessary repetition, repeated descriptions are omitted where appropriate when introducing the apparatus of the embodiment of the present application.
  • FIG9 is a schematic diagram of a device for generating insights according to an embodiment of the present application.
  • the device 900 shown in FIG9 includes: an interaction module 910 and a processing module 920 .
  • the interaction module is used to: present a first chart, the first chart includes M chart visualization elements, and each chart visualization element corresponds to at least one data record in a data source.
  • the processing module is used to: confirm N chart visualization elements selected from M chart visualization elements, where M and N are positive integers greater than 1, and M is greater than or equal to N, determine all K data records corresponding to the N chart visualization elements, where K is a positive integer greater than 1, and perform joint data analysis based on the K data records to generate first insight data for the N chart visualization elements.
  • the processing module is also used to determine characteristic information common to L data records out of K data records, where the L data records correspond to at least two chart visualization elements, where L is a positive integer greater than 1, and K is greater than or equal to L, and perform data analysis based on the L data records, the characteristic information common to the L data records, and all data records in the data source.
  • the processing module is further used to generate, according to an insight chart generated based on the second insight data, a numerical distribution within the data records corresponding to the N chart visualization elements, or a tracing of those data records.
  • the processing module is further used to determine the priority order of P sub-insight data included in the first insight data, where P is a positive integer greater than 1, and recommend the P sub-insight data in order of priority.
  • the processing module is also used to determine a characteristic index value for each sub-insight data among P sub-insight data, where the characteristic index value is used to measure the confidence or significance of each sub-insight data among the P sub-insight data, confirm Q sub-insight data whose characteristic index values are higher than a threshold value of the characteristic index value, where Q is a positive integer greater than 1, and P is greater than Q, determine the number of characteristic types of each sub-insight data among the Q sub-insight data, and determine the priority order of the Q sub-insight data by sorting them in descending order according to the number of characteristic types of each sub-insight data.
  • the processing module is further used to determine dimensions and metrics in a first chart corresponding to the N chart visualization elements, and generate a query request based on the dimensions and metrics in the first chart, where the query request is used to query data records in a data source.
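  • A query request built from the dimensions of the first chart could, for example, take the form of a parameterized SQL statement. The application does not state the query language, so the use of SQL, the `?` placeholder style, the table name and the function name below are all assumptions of this sketch.

```python
def build_query_request(table, filters):
    """Build a parameterized SELECT from a mapping of column name to allowed values.

    Returns the SQL text and the list of values to bind to its placeholders.
    """
    # Identifiers cannot be bound as parameters, so they are validated instead.
    for name in [table, *filters]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")

    clauses, params = [], []
    for column, values in filters.items():
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)

    # With no filters, every record in the table is requested.
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT * FROM {table} WHERE {where}", params


sql, params = build_query_request(
    "data_table",  # hypothetical table name
    {"time": [1, 2, 3], "location": ["A"]},
)
```

    Binding the dimension values as parameters rather than concatenating them into the statement keeps user-selected values out of the SQL text itself.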
  • the above modules can be implemented by software or hardware.
  • The implementation of the modules is described below, taking the processing module 920 as an example.
  • the implementation of the interaction module 910 can refer to the implementation of the processing module 920.
  • the processing module 920 may include code running on a computing instance.
  • the computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Furthermore, the computing instance may be one or more.
  • the processing module 920 may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code may be distributed in the same region or in different regions. Furthermore, the multiple hosts/virtual machines/containers used to run the code may be distributed in the same availability zone (AZ) or in different AZs, each AZ including one data center or multiple data centers with close geographical locations. Among them, generally a region may include multiple AZs.
  • multiple hosts/virtual machines/containers used to run the code can be distributed in the same virtual private cloud (VPC) or in multiple VPCs.
  • a VPC is set up in a region.
  • a communication gateway needs to be set up in each VPC to achieve interconnection between VPCs through the communication gateway.
  • the processing module 920 may include at least one computing device, such as a server, etc.
  • the processing module 920 may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
  • the multiple computing devices included in the processing module 920 can be distributed in the same region or in different regions.
  • the multiple computing devices included in the processing module 920 can be distributed in the same AZ or in different AZs.
  • the multiple computing devices included in the processing module 920 can be distributed in the same VPC or in multiple VPCs.
  • the multiple computing devices included in the processing module 920 can be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
  • the present application also provides a computing device 1000.
  • the computing device 1000 includes: a bus 1002, a processor 1004, a memory 1006, and a communication interface 1008.
  • the processor 1004, the memory 1006, and the communication interface 1008 communicate with each other through the bus 1002.
  • the computing device 1000 may be a server or a terminal device. It should be understood that the present application does not limit the number of processors and memories in the computing device 1000.
  • the bus 1002 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus may be divided into an address bus, a data bus, a control bus, etc.
  • For ease of representation, the bus is shown as a single line in FIG. 10, but this does not mean that there is only one bus or one type of bus.
  • the bus 1002 may include a path for transmitting information between various components of the computing device 1000 (e.g., the memory 1006, the processor 1004, and the communication interface 1008).
  • Processor 1004 may include any one or more processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP) or a digital signal processor (DSP).
  • the memory 1006 may include a volatile memory, such as a random access memory (RAM).
  • the memory 1006 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • the memory 1006 stores executable program codes, and the processor 1004 executes the executable program codes to respectively implement the functions of the aforementioned interaction module 910 and the processing module 920, thereby implementing the aforementioned method for generating insight data. That is, the memory 1006 stores instructions for executing the aforementioned method for analyzing and generating insight data.
  • the communication interface 1008 uses a transceiver module such as, but not limited to, a network interface card or a transceiver to implement communication between the computing device 1000 and other devices or communication networks.
  • the embodiment of the present application also provides a computing device cluster.
  • the computing device cluster includes at least one computing device.
  • the computing device can be a server, such as a central server, an edge server, or a local server in a local data center.
  • the computing device can also be a terminal device such as a desktop computer, a laptop computer, or a smart phone.
  • the computing device cluster includes at least one computing device 1000.
  • the memory 1006 in one or more computing devices 1000 in the computing device cluster may store the same instructions for executing the above-mentioned insight data generation method.
  • the memory 1006 of one or more computing devices 1000 in the computing device cluster may also respectively store some instructions for executing the above-mentioned method for generating insight data.
  • the combination of one or more computing devices 1000 may jointly execute instructions for executing the above-mentioned method for generating insight data.
  • the memory 1006 in different computing devices 1000 in the computing device cluster can store different instructions, which are respectively used to execute part of the functions of the above-mentioned apparatus. That is, the instructions stored in the memory 1006 in different computing devices 1000 can implement the functions of one or more modules in the interaction module and the processing module.
  • one or more computing devices in the computing device cluster can be connected via a network.
  • the network can be a wide area network or a local area network, etc.
  • Figure 12 shows a possible implementation. As shown in Figure 12, two computing devices 1000A and 1000B are connected via a network. Specifically, the network is connected through the communication interface in each computing device.
  • the memory 1006 in the computing device 1000A stores instructions for the functions of the interaction module.
  • the memory 1006 in the computing device 1000B stores instructions for executing the functions of the processing module.
  • the functions of the computing device 1000A shown in FIG12 may also be completed by multiple computing devices 1000.
  • the functions of the computing device 1000B may also be completed by multiple computing devices 1000.
  • An embodiment of the present application also provides a chip, which includes a processor and a data interface.
  • the processor reads instructions stored in a memory through the data interface to execute the above-mentioned method for generating insight data.
  • the present application also provides a computer program product including instructions.
  • the computer program product may be software or a program product including instructions that can be run on a computing device or stored in any available medium.
  • When the computer program product runs on at least one computing device, the at least one computing device executes the above-mentioned method for generating insight data.
  • the embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium can be any available medium that a computing device can store data on, or a data storage device, such as a data center, that contains one or more available media.
  • the available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state hard disk).
  • the computer-readable storage medium includes instructions that instruct the computing device to execute the above-mentioned method for generating insight data.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of modules is only a logical function division. There may be other division methods in actual implementation, such as multiple modules or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or modules, which can be electrical, mechanical or other forms.
  • modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules, that is, they may be located in one place or distributed on multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist physically separately, or two or more modules may be integrated into one module.
  • If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions for a computer device (which can be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods of various embodiments of the present application.
  • the aforementioned storage media include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application relate to a method and an apparatus for generating insight data. The method includes: presenting a first chart, the first chart including M chart visualization elements, each chart visualization element corresponding to at least one data record in a data source; confirming N chart visualization elements selected from the M chart visualization elements, M and N being positive integers greater than 1, and M being greater than or equal to N; determining all K data records corresponding to the N chart visualization elements, K being a positive integer greater than 1; and performing joint data analysis based on all K data records to generate first insight data for the N chart visualization elements. With the technical solution provided in the present application, insight data for batch-selected chart visualization elements can be generated automatically, and subsequent interaction with and analysis of the insight data are supported, thereby improving the accuracy of the insight.
PCT/CN2023/109267 2022-10-18 2023-07-26 Procédé et appareil de génération de données d'aperçu WO2024082754A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211275756.8A CN117951186A (zh) 2022-10-18 2022-10-18 见解数据生成的方法和装置
CN202211275756.8 2022-10-18

Publications (1)

Publication Number Publication Date
WO2024082754A1 true WO2024082754A1 (fr) 2024-04-25

Family

ID=90736922

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/109267 WO2024082754A1 (fr) 2022-10-18 2023-07-26 Procédé et appareil de génération de données d'aperçu

Country Status (2)

Country Link
CN (1) CN117951186A (fr)
WO (1) WO2024082754A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200051293A1 (en) * 2015-06-29 2020-02-13 Microsoft Technology Licensing, Llc Multi-dimensional data insight interaction
CN110795458A (zh) * 2019-10-08 2020-02-14 北京百分点信息科技有限公司 交互式数据分析方法、装置、电子设备和计算机可读存储介质
US20210240702A1 (en) * 2020-02-05 2021-08-05 Microstrategy Incorporated Systems and methods for data insight generation and display

Also Published As

Publication number Publication date
CN117951186A (zh) 2024-04-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23878750

Country of ref document: EP

Kind code of ref document: A1