WO2024082754A1 - Insight data generation method and apparatus - Google Patents
- Publication number
- WO2024082754A1 (application PCT/CN2023/109267)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- chart
- insight
- analysis
- sub
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/26—Visual data mining; Browsing structured data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
Definitions
- Embodiments of the present application relate to the field of data intelligence, and more specifically, to a method and apparatus for generating insight data.
- Automatic insight data generation is an important capability in business intelligence-assisted analysis and decision-making, and has gradually become one of the core competitive advantages of business intelligence products from various manufacturers. The competitiveness of insight data generation technology depends on several factors: designing appropriate front-end interaction processes based on user-provided data; ensuring back-end data query performance; improving capabilities such as algorithmic feature mining, related case analysis, abnormal pattern definition, and cause analysis construction; and presenting the results to users through a simple, attractive front-end display with easy-to-use interaction.
- Insight data analysis related to single-point data can help users inspect, discover, and gain an in-depth understanding of individual chart visualization elements in visual charts when building, browsing, and analyzing data.
- In technical products related to automatic insight data generation, insight data generated at the analysis granularity of a single data point has poor accuracy, and the supported interaction and analysis freedom are still limited, leaving room for further improvement and optimization.
- The embodiments of the present application provide a method and device for generating insight data, which enable batch selection of multiple chart visualization elements in a chart to generate insight data, thereby improving the accuracy of the insight data and the degree of interactive freedom.
- a method for generating insight data comprising: presenting a first chart, the first chart comprising M chart visualization elements, each chart visualization element corresponding to at least one data record in a data source; confirming N chart visualization elements selected from the M chart visualization elements, wherein M and N are positive integers greater than 1, and M is greater than or equal to N; determining all K data records corresponding to the N chart visualization elements, wherein K is a positive integer greater than 1; and performing joint data analysis based on all K data records to generate first insight data for the N chart visualization elements.
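The claimed flow (select N of M elements, resolve them to K data records, analyze those records jointly) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the in-memory `DATA_SOURCE`, the `ChartElement` class, and the choice of summary statistics are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical data source: each dict is one data record.
DATA_SOURCE = [
    {"id": 1, "month": "Jan", "region": "East", "sales": 120.0},
    {"id": 2, "month": "Jan", "region": "West", "sales": 80.0},
    {"id": 3, "month": "Feb", "region": "East", "sales": 200.0},
    {"id": 4, "month": "Feb", "region": "West", "sales": 20.0},
    {"id": 5, "month": "Mar", "region": "East", "sales": 90.0},
]


@dataclass(frozen=True)
class ChartElement:
    """One selectable chart visualization element, e.g. a bar (hypothetical)."""
    dimension: str
    value: str


def records_for(elements, source):
    """Collect all K data records that sit behind the N selected elements."""
    return [r for r in source
            if any(r[e.dimension] == e.value for e in elements)]


def joint_analysis(elements, source, metric):
    """Analyze the K records together rather than one element at a time."""
    records = records_for(elements, source)
    values = [r[metric] for r in records]
    return {
        "n_elements": len(elements),
        "k_records": len(records),
        "total": sum(values),
        "mean": mean(values),
        # Share of the whole data source covered by the selection.
        "share_of_source": sum(values) / sum(r[metric] for r in source),
    }


# The user selects N = 2 of the M = 3 monthly bars.
selected = [ChartElement("month", "Jan"), ChartElement("month", "Feb")]
insight = joint_analysis(selected, DATA_SOURCE, "sales")
print(insight)
```

The point of the sketch is only that the statistics are computed over the union of records behind all selected elements, which is what distinguishes joint analysis from per-element analysis.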
- joint data analysis is performed based on all K data records, including: determining feature information common to L data records among all K data records, the L data records corresponding to at least two chart visualization elements, where L is a positive integer greater than 1, and K is greater than or equal to L; performing data analysis based on the L data records, the feature information common to the L data records, and all data records in the data source.
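One way to read "feature information common to L data records" is a field value shared by records that belong to at least two different selected elements. The sketch below follows that reading; the function name `common_features`, the field names, and the sample records are hypothetical.

```python
from collections import defaultdict


def common_features(records, chart_dimension, candidate_fields):
    """Return {(field, value): records} for values shared by records that
    belong to at least two different chart visualization elements."""
    groups = defaultdict(list)
    for rec in records:
        for field in candidate_fields:
            groups[(field, rec[field])].append(rec)
    return {
        key: recs
        for key, recs in groups.items()
        # Keep a feature only if its records span two or more elements.
        if len({r[chart_dimension] for r in recs}) >= 2
    }


# Hypothetical K = 3 records behind the selected "Jan" and "Feb" bars.
K_RECORDS = [
    {"month": "Jan", "region": "East", "channel": "web", "sales": 120.0},
    {"month": "Jan", "region": "West", "channel": "store", "sales": 80.0},
    {"month": "Feb", "region": "East", "channel": "app", "sales": 200.0},
]

shared = common_features(K_RECORDS, "month", ["region", "channel"])
print(list(shared))
```

Here the L = 2 records with `region == "East"` span both selected months, so that feature is retained, while features confined to a single element are dropped.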
- the first insight data includes at least one of the following insight data types: chart metric aggregation expansion analysis, used to analyze the original data distribution composition of the data records corresponding to the N chart visualization elements; external dimension valid record number analysis, used to analyze the distribution of the number of valid records of the K data records in the dimensions that do not participate in drawing the first chart; external dimension distribution contribution analysis, used to analyze the contribution of the K data records to the chart metrics in the dimensions that do not participate in drawing the first chart; external dimensional subspace internal feature analysis, used to analyze the feature distribution inside the data records in the dimensions that do not participate in drawing the first chart; and external high interpretability metric analysis, used to analyze the association between the L data records and the metrics and original data records that do not participate in drawing the first chart.
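Two of the listed insight types lend themselves to a short illustration: the external dimension valid record count and the external dimension distribution contribution. The sketch below assumes that a "valid" record is one with a non-null value in the external dimension and that contribution is a share of the summed metric; both are assumptions, as are the function names and sample data.

```python
from collections import Counter, defaultdict


def valid_record_counts(records, external_dim):
    """Count records with a non-null value, per external-dimension value."""
    return Counter(r[external_dim] for r in records
                   if r.get(external_dim) is not None)


def contribution_by(records, external_dim, metric):
    """Share of the chart metric contributed by each external-dimension value."""
    totals = defaultdict(float)
    for r in records:
        if r.get(external_dim) is not None:
            totals[r[external_dim]] += r[metric]
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {value: total / grand_total for value, total in totals.items()}


# Hypothetical K records; "region" does not participate in drawing the chart.
K_RECORDS = [
    {"month": "Jan", "region": "East", "sales": 120.0},
    {"month": "Jan", "region": "West", "sales": 80.0},
    {"month": "Feb", "region": "East", "sales": 200.0},
    {"month": "Feb", "region": None, "sales": 50.0},
]

counts = valid_record_counts(K_RECORDS, "region")
shares = contribution_by(K_RECORDS, "region", "sales")
print(counts, shares)
```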
- the method can guide users to explore the analysis content of the associated data, such as the composition of abnormal aggregate values, the potential reasons why the aggregate values of the visualization chart elements show a specific pattern, the potential high contribution dimensions, and the value distribution within the subspace.
- the first chart is an insight chart generated based on the second insight data
- generating the first insight data for the N chart visualization elements includes generating the numerical distribution within, or data record tracing for, the data records corresponding to the N chart visualization elements.
- the method supports secondary analysis and exploration of the focused feature subspace of the insight chart and derives insights, optimizes the multi-level subspace analysis and exploration process in the automatic insight generation auxiliary analysis process, and improves the analysis freedom from surface to point, from shallow to deep.
- the numerical distribution within the data records corresponding to the N chart visualization elements helps users further explore the patterns of interest in the dimensional distribution chart of the algorithm recommendation insights and explore the reasons for the distribution characteristics; data record tracing can help users conveniently query the original data for abnormal parts of the distribution of the recommended insights and explore the reasons for the distribution characteristics.
- a priority order of P sub-insight data included in the first insight data is determined; and the P sub-insight data are recommended according to the priority order.
- the method can avoid generating a large number of disordered insight charts at one time and presenting them to the user, so that the user can quickly decide where to explore, thereby improving the efficiency of the user in obtaining and analyzing insights.
- determining the priority order of P sub-insight data also includes: determining a characteristic index value for each sub-insight data in the P sub-insight data, the characteristic index value being used to measure the confidence or significance of each sub-insight data in the P sub-insight data; confirming Q sub-insight data whose characteristic index values are higher than a threshold value of the characteristic index value, where Q is a positive integer greater than 1, and P is greater than Q; determining the number of feature types for each sub-insight data in the Q sub-insight data; and determining the priority order of the Q sub-insight data by sorting in descending order according to the number of feature types of each sub-insight data.
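The two-stage ordering described above (threshold on the characteristic index value, then descending sort by number of feature types) can be sketched as follows. The `SubInsight` class, the threshold of 0.5, and the sample values are hypothetical; the text does not say how ties are broken, so this sketch simply keeps the input order for ties.

```python
from dataclasses import dataclass


@dataclass
class SubInsight:
    name: str
    index_value: float    # confidence or significance (assumed scale 0..1)
    feature_types: tuple  # distinct feature types carried by this sub-insight


def prioritize(sub_insights, threshold):
    """Keep the Q sub-insights above the threshold, then sort them in
    descending order of their number of feature types."""
    kept = [s for s in sub_insights if s.index_value > threshold]
    # sorted() is stable, so ties keep their original relative order.
    return sorted(kept, key=lambda s: len(s.feature_types), reverse=True)


# P = 4 hypothetical sub-insights.
candidates = [
    SubInsight("A", 0.9, ("outlier",)),
    SubInsight("B", 0.4, ("trend", "outlier", "skew")),
    SubInsight("C", 0.8, ("trend", "outlier", "skew")),
    SubInsight("D", 0.7, ("trend", "skew")),
]

ranked = prioritize(candidates, threshold=0.5)
print([s.name for s in ranked])
```

Sub-insight B is rich in features but falls below the threshold, so it is filtered out before richness is considered; this is the sense in which confidence and feature richness are both taken into account.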
- the method comprehensively considers both the confidence of all features possessed by each insight within the same category of insights and the feature richness of the insights.
- determining all K data records corresponding to N chart visualization elements also includes: determining dimensions and metrics in a first chart corresponding to the N chart visualization elements; generating a query request based on the dimensions and metrics in the first chart, the query request being used to query data records in a data source.
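As a sketch of how a query request might be derived from the chart's dimensions and metrics, the example below builds a parameterized SQL filter from the dimension values of the selected elements and runs it against an in-memory SQLite table. The request format, the `build_query_request` helper, and the `sales` table are all hypothetical; the application does not specify a query language.

```python
import sqlite3


def build_query_request(table, selected_values, focus_metrics):
    """Build a query request for the raw records behind the selected elements.

    selected_values maps a chart dimension to the values the user picked.
    Table and column names are assumed to come from trusted chart metadata;
    only the selected values are passed as bound parameters.
    """
    clauses, params = [], []
    for dim, values in selected_values.items():
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{dim} IN ({placeholders})")
        params.extend(values)
    where = " AND ".join(clauses) if clauses else "1=1"
    return {
        "sql": f"SELECT * FROM {table} WHERE {where}",
        "params": params,
        "focus_metrics": list(focus_metrics),
    }


# Hypothetical data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Jan", "East", 120.0), ("Feb", "West", 20.0), ("Mar", "East", 90.0)],
)

request = build_query_request("sales", {"month": ["Jan", "Feb"]}, ["amount"])
rows = conn.execute(request["sql"], request["params"]).fetchall()
print(request["sql"], rows)
```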
- the method realizes rapid positioning of chart information contained in a chart visualization element, and the chart information can realize rapid query of data records corresponding to the chart visualization element and selection of focus dimensions/metrics for generating insight data.
- a device for generating insight data comprising: an interaction module, for presenting a first chart, the first chart comprising M chart visualization elements, each chart visualization element corresponding to at least one data record in a data source; a processing module, for confirming N chart visualization elements selected from the M chart visualization elements, wherein M and N are positive integers greater than 1, and M is greater than or equal to N, determining all K data records corresponding to the N chart visualization elements, wherein K is a positive integer greater than 1, and performing joint data analysis based on all K data records to generate first insight data for the N chart visualization elements.
- the processing module is further used to determine feature information common to L data records out of K data records, where the L data records correspond to at least two chart visualization elements, where L is a positive integer greater than 1, and K is greater than or equal to L, and perform data analysis based on the L data records, the feature information common to the L data records, and all data records in the data source.
- the processing module is also used, when the first chart is an insight chart generated based on the second insight data, to generate the numerical distribution within, or data record tracing for, the data records corresponding to the N chart visualization elements.
- the processing module is also used to determine the priority order of P sub-insight data included in the first insight data, where P is a positive integer greater than 1, and recommend the P sub-insight data in order of priority.
- the processing module is also used to determine a characteristic index value for each sub-insight data in the P sub-insight data, where the characteristic index value is used to measure the confidence or significance of each sub-insight data in the P sub-insight data, confirm Q sub-insight data whose characteristic index values are higher than a threshold value of the characteristic index value, where Q is a positive integer greater than 1 and P is greater than Q, determine the number of feature types for each sub-insight data in the Q sub-insight data, and determine the priority order of the Q sub-insight data by sorting them in descending order according to the number of feature types of each sub-insight data.
- the processing module is also used to determine the dimensions and metrics in the first chart corresponding to the N chart visualization elements, and generate a query request based on the dimensions and metrics in the first chart, which is used to query data records in the data source.
- a computing device comprising a processor and a memory, wherein the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory, so that the computing device executes the method in the first aspect or any possible implementation of the first aspect.
- a computing device cluster comprising at least one computing device, each computing device comprising a processor and a memory, wherein the memory is used to store instructions, and the processor is used to call and execute the instructions from the memory, so that the computing device cluster executes the method in the first aspect or any possible implementation of the first aspect.
- the processor may be a general-purpose processor, which may be implemented by hardware or software.
- when implemented by hardware, the processor may be a logic circuit, an integrated circuit, etc.; when implemented by software, the processor may be a general-purpose processor that is implemented by reading software code stored in a memory, and the memory may be integrated in the processor or may be located outside the processor and exist independently.
- a chip which obtains instructions and executes the instructions to implement the method in the above-mentioned first aspect or any possible implementation manner of the first aspect.
- the chip includes a processor and a data interface, and the processor reads instructions stored in the memory through the data interface to execute the method in the above-mentioned first aspect or any possible implementation of the first aspect.
- the chip may also include a memory, in which instructions are stored, and the processor is used to execute the instructions stored in the memory.
- the processor is used to execute the method in the above-mentioned first aspect or any possible implementation method of the first aspect.
- a computer program product comprising instructions is provided.
- the computing device cluster executes the method in the above-mentioned first aspect or any possible implementation manner of the first aspect.
- a computer-readable storage medium comprising computer program instructions.
- the computing device cluster executes the method in the above-mentioned first aspect or any possible implementation manner of the first aspect.
- these computer-readable storage media include, but are not limited to, one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), Flash memory, electrically erasable PROM (EEPROM), and hard drive.
- the above-mentioned storage medium may specifically be a non-volatile storage medium.
- FIG. 1 is a schematic diagram of an application scenario for generating insight data provided in an embodiment of the present application.
- FIG. 2 is a schematic diagram of another application scenario for generating insight data provided in an embodiment of the present application.
- FIG. 3 is a schematic diagram of a system architecture provided in an embodiment of the present application.
- FIG. 4 is a schematic diagram of an insight data generation process provided in an embodiment of the present application.
- FIG. 5 is a schematic diagram of a sorting strategy provided in an embodiment of the present application.
- FIG. 6 is a schematic diagram of a case study of an insight data generation process provided in an embodiment of the present application.
- FIG. 7 is a schematic diagram of another example of an insight data generation process provided in an embodiment of the present application.
- FIG. 8 is a schematic diagram of an example of a sorting strategy provided in an embodiment of the present application.
- FIG. 9 is a schematic structural block diagram of an apparatus for generating insight data provided in an embodiment of the present application.
- FIG. 10 is a schematic structural block diagram of a computing device provided in an embodiment of the present application.
- FIG. 11 is a schematic structural block diagram of a computing device cluster provided in an embodiment of the present application.
- FIG. 12 is a schematic structural block diagram of another computing device cluster provided in an embodiment of the present application.
- the sequence numbers of the processes do not imply an order of execution.
- the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
- the network architecture and business scenarios described in the embodiments of the present application are intended to more clearly illustrate the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided in the embodiments of the present application.
- a person of ordinary skill in the art can appreciate that with the evolution of the network architecture and the emergence of new business scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
- references to “one embodiment” or “some embodiments” etc. described in this specification mean that a particular feature, structure or characteristic described in conjunction with the embodiment is included in one or more embodiments of the present application.
- the phrases “in one embodiment”, “in some embodiments”, “in some other embodiments”, etc. appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean “one or more but not all embodiments”, unless otherwise specifically emphasized.
- the terms “including”, “comprising”, “having” and their variations all mean “including but not limited to”, unless otherwise specifically emphasized.
- “At least one” means one or more.
- “Plural” means two or more.
- “And/or” describes an association relationship between associated objects and indicates that three relationships may exist.
- For example, “A and/or B” can mean: A alone, both A and B, or B alone, where A and B can be singular or plural.
- the character “/” generally indicates that the previous and next associated objects are in an “or” relationship.
- “At least one of the following” or similar expressions refers to any combination of these items, including any combination of single or plural items.
- At least one of a, b, or c can mean: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, c can be single or multiple.
- Dimension: a way of classifying fields in a data set. Fields that carry a classification meaning for the data are called dimensions. Their values are usually enumerable, such as “month” or “ID”.
- Metric: an indicator field with quantifiable data, usually in numerical form.
- Aggregate value: the summary or total value produced from a single field over a filtered data subset by a calculation operation such as sum aggregation or mean aggregation.
- Record: one or more rows in a database table that constitutes a data set.
- Chart visualization element: a selectable data point in a visualization chart that summarizes some underlying record values in the data.
- the data of a chart visualization element can consist of a single record or multiple records aggregated together.
- Chart visualization elements in a visualization chart can be displayed in a variety of ways such as points, lines, shapes, etc.
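The relationship between records, aggregate values, and chart visualization elements defined above can be sketched as follows: records are grouped by a dimension, and each group becomes one element whose displayed value is an aggregate of a metric. The function `build_elements` and the sample records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def build_elements(records, dimension, metric, agg=sum):
    """Group records by a dimension; each group backs one chart element
    whose displayed value is the aggregate of the metric."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[dimension]].append(rec)
    return {
        key: {"value": agg([r[metric] for r in recs]), "records": recs}
        for key, recs in groups.items()
    }


RECORDS = [
    {"month": "Jan", "sales": 120.0},
    {"month": "Jan", "sales": 80.0},
    {"month": "Feb", "sales": 200.0},
]

bars_sum = build_elements(RECORDS, "month", "sales")             # sum aggregation
bars_mean = build_elements(RECORDS, "month", "sales", agg=mean)  # mean aggregation
print(bars_sum["Jan"]["value"], bars_mean["Jan"]["value"])
```

The "Jan" element aggregates two records while the "Feb" element consists of a single record, matching the two cases named in the definition.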
- Internal and external: “internal” refers to the dimensions and metrics involved in the analysis that participate in drawing the chart the user is currently analyzing; “external” refers to the dimensions and metrics involved in the analysis that do not participate in drawing that chart.
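The internal/external distinction amounts to partitioning the data set's fields by whether the current chart uses them. A minimal sketch, with hypothetical field names:

```python
def split_fields(all_fields, chart_fields):
    """Partition data-set fields into internal (used to draw the chart)
    and external (not used to draw the chart)."""
    internal = [f for f in all_fields if f in chart_fields]
    external = [f for f in all_fields if f not in chart_fields]
    return internal, external


# Hypothetical data set with a chart of "sales" by "month".
all_fields = ["month", "region", "channel", "sales", "profit"]
chart_fields = {"month", "sales"}

internal, external = split_fields(all_fields, chart_fields)
print(internal, external)
```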
- FIG. 1 shows an insight data generation system, which may include a user device and a data processing device.
- the user device may include a smart terminal such as a mobile phone, a personal computer or an information processing center.
- the user device may be used as the initiator of the insight data generation request.
- the above-mentioned data processing device can be a device or server with data processing function such as a cloud server, a network server, an application server or a management server.
- the data processing device receives, through an interactive interface, the instruction from the smart terminal to select chart visualization elements, and then performs data processing such as machine learning, deep learning, search, reasoning, and decision-making by means of a memory that stores the data and a processor that processes the data.
- the memory in the data processing device can be a general term, which can be a local storage device storing historical data or a storage manager in the database.
- the user device can receive an instruction from a user to select one or more chart visualization elements in a visualization chart, and then initiate a screening and query request to a data processing device to find out the fine-grained original records of the selected chart visualization elements, so that the data processing device performs data analysis on the original data records corresponding to the one or more chart visualization elements selected by the user device, thereby generating insight data for one or more chart visualization elements.
- the data processing device can execute the insight data generating method of the embodiment of the present application. It should be noted that although the user device and the data processing device are depicted as independent devices in Figure 1, in other embodiments of the present application, the two devices can be implemented by the same device.
- FIG. 2 shows another insight data generation system.
- the user device can be directly used as a data processing device.
- the user device can directly receive input from the user and process it directly by the hardware of the user device itself.
- the specific process is similar to that of FIG. 1; refer to the above description, which is not repeated here.
- the user device in FIG. 2 may be a server with data processing capabilities such as a cloud server, a network server, an application server or a management server, or may be an electronic device with data processing capabilities such as a desktop computer, a mobile computer, a tablet computing device or a mobile communication device.
- the user device can receive an instruction from the user to select one or more chart visualization elements in a visualization chart, and then the user device itself initiates a request to perform data analysis on the selected one or more chart visualization elements, thereby generating insight data for the one or more chart visualization elements.
- the user device itself can execute the insight data generating method of the embodiment of the present application.
- the processors in FIG. 1 and FIG. 2 can perform data analysis according to business needs.
- insight analysis of the chart is performed in a variety of analysis modes, including statistical value feature analysis, distribution feature analysis, null value alarm analysis, zero value alarm analysis, high-correlation metric analysis, and global-subset difference analysis, so that different types of insights can be detected at the levels of statistical analysis and traditional machine learning.
- features of different categories are generated and a variety of feature descriptions are customized, yielding insight analysis of the data of interest behind the chart visualization elements selected by the user.
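Several of the analysis modes named above reduce to simple statistics. The sketch below gives one plausible reading of four of them: null value ratio, zero value ratio, Pearson correlation between two metrics, and the difference between a subset mean and the global mean. The function names and sample data are hypothetical, and any alarm thresholds applied to these values would be a further assumption that the text does not specify.

```python
from math import sqrt


def null_ratio(records, field):
    """Fraction of records whose field is missing (basis for a null value alarm)."""
    return sum(1 for r in records if r.get(field) is None) / len(records)


def zero_ratio(records, field):
    """Fraction of records whose field equals zero (basis for a zero value alarm)."""
    return sum(1 for r in records if r.get(field) == 0) / len(records)


def pearson(xs, ys):
    """Pearson correlation between two metrics (high-correlation analysis)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def subset_vs_global(subset, source, metric):
    """Difference between the subset mean and the global mean of a metric."""
    sub_mean = sum(r[metric] for r in subset) / len(subset)
    global_mean = sum(r[metric] for r in source) / len(source)
    return sub_mean - global_mean


SOURCE = [
    {"sales": 100.0, "cost": 50.0, "discount": 0},
    {"sales": 200.0, "cost": 100.0, "discount": None},
    {"sales": 300.0, "cost": 150.0, "discount": 0},
    {"sales": 400.0, "cost": 200.0, "discount": 5},
]
selected = SOURCE[:2]  # records behind the user's selected elements

nulls = null_ratio(SOURCE, "discount")
zeros = zero_ratio(SOURCE, "discount")
corr = pearson([r["sales"] for r in SOURCE], [r["cost"] for r in SOURCE])
diff = subset_vs_global(selected, SOURCE, "sales")
print(nulls, zeros, corr, diff)
```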
- the system architecture 100 may include an execution device 110, a database 130, a client device 140, a data storage system 150, and a data acquisition device 160. It should be understood that FIG. 1 is only for illustration, and optionally, the system architecture may include more or fewer databases and execution devices, or other functional modules.
- the data acquisition device 160 can be used to collect chart data, and in the embodiment of the present application, the chart data can be used to generate a visual chart containing chart visualization elements. After collecting the chart data, the data acquisition device 160 stores the data in the database 130.
- the training data maintained in the database 130 does not necessarily all come from the collection of the data acquisition device 160, and may also be received from other devices, for example, it may also be directly obtained from the cloud or other places.
- the execution device 110 does not necessarily generate insights based entirely on the training data maintained by the database 130, and it is also possible to obtain data from the cloud or other places to generate insights. The above description should not be used as a limitation on the embodiments of the present application.
- the database may be a hardware device, may be integrated in the execution device 110, or may be set up on a cloud or other network server.
- the generation of visual charts and insight data can be applied to different systems or devices, such as being applied to the execution device 110 shown in FIG. 3 and presented on the application interface 120.
- the execution device 110 can be the data processing device in FIG. 1, can be a terminal, such as a mobile terminal, a tablet computer, a laptop computer, an AR/VR or a vehicle-mounted terminal, etc., can also be a server or a cloud, etc.
- the execution device 110 can be configured with an input/output (I/O) interface 112 for data interaction with an external device.
- the user can input data to the I/O interface 112 through the client device 140, and the input data can include: instructions for selecting one or more chart visualization elements and dimensions and metrics of the visualization charts corresponding to the chart visualization elements in the embodiment of the present application.
- the execution device 110 can call the data, code, etc. in the data storage system 150 for corresponding processing, and can also store the input data obtained by the corresponding processing in the data storage system 150.
- the I/O interface 112 feeds back the processing result, for example, the generated insight data, to the client device 140.
- the client device may also be the execution device 110 in FIG. 3, and the fed-back insight data is presented on the application interface 120 of the execution device.
- the execution device 110 includes an application interface 120.
- the application interface 120 can be an interface of a client application stored locally on the execution device 110, or it can be an interface of a client application located on a remote server and accessible through a network (such as the Internet or an intranet).
- it can be an application interface that is hosted in a browser-controlled environment or coded in a language supported by the browser and relies on a web browser to perform data calculations.
- the application interface 120 may include a visualization chart interface 121 and an insight data interface 125 , or the visualization chart interface 121 and the insight data interface 125 may be presented through multiple application interfaces.
- the visualization chart interface 121 may include one or more different types of charts and interface configuration information.
- the interface configuration information may include modules such as dimension options, measurement options, and chart interface setting modules, or elements such as axis configuration information for selecting charts to be drawn, and chart raw data. It should be understood that FIG. 3 is only an example.
- the visualization chart interface 121 may also include more selection modules, such as a chart type selection module.
- the insight interface 125 may include one or more insight data 126, 127, and the insight data 126 and the insight data 127 may include insight charts or insight texts.
- the insight data is obtained according to the chart 122 or the chart 123 in the visualization chart interface 121, and the insight data interface may also include an insight mode selection module or an analysis type for selecting insight data generation.
- the analysis type may be distribution feature analysis, null value alarm analysis, zero value alarm analysis, high correlation metric analysis, global-subset difference analysis, etc., or may be a customized feature analysis.
- the analysis results formed may be described by different chart types and corresponding text insight information, and displayed in insight charts or insight texts. It should be understood that FIG. 3 is only an example, and optionally, the insight interface 125 includes more modules, such as an insight data sorting module.
- Figure 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
- the data storage system 150 is an external memory relative to the execution device 110. In other cases, the data storage system 150 can also be placed in the execution device 110.
- other modules may also be included in the system architecture, such as a chart drawing module.
- the visual chart interface and the insight interface may not be in the same application interface.
- the scenarios to which the embodiments of the present application can be applied are not limited to those shown in Figure 3.
- the insight data analysis related to single-point data can help users to inspect, discover and gain in-depth understanding of individual chart visualization elements in a visualization chart when building, browsing and analyzing data.
- the data records of unselected chart visualization elements will interfere with the insight data analysis formed by the selected single chart visualization element, making it difficult to ensure the accuracy of the overall insight analysis formed by selecting multiple chart visualization elements, and the interaction cost is high.
- the outliers are the user's interest data, and the user wants to obtain insight data on the causes of the three outliers. If the user only selects one of the outliers to generate insight data, the other two outliers also participate in the analysis process of the insight data. As a result, the insight data may be biased, for example, the selected outlier may be judged as a normal value due to the presence of the other two outliers. Therefore, the unselected chart visualization elements may interfere with the insight data formed by the selected single chart visualization element.
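The interference described above can be illustrated with a small numeric sketch. It assumes a plain z-score test with a threshold of 3, which is only a stand-in for whatever outlier criterion an implementation actually uses; the values and the names `baseline`, `outliers`, and `THRESHOLD` are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(selected, reference):
    """Score each selected value against the mean and spread of a reference set."""
    mu, sigma = mean(reference), pstdev(reference)
    return [(value - mu) / sigma for value in selected]

# Hypothetical metric values: eight ordinary points and three outliers.
baseline = [10, 11, 9, 10, 12, 10, 9, 11]
outliers = [30, 31, 29]
THRESHOLD = 3.0  # assumed cut-off, not taken from the source

# Single selection: the two unselected outliers stay in the reference data
# and inflate its mean and spread.
single = z_scores([outliers[0]], baseline + outliers[1:])

# Joint selection: all three outliers are analyzed together against the rest.
joint = z_scores(outliers, baseline)

single_flagged = [z > THRESHOLD for z in single]
joint_flagged = [z > THRESHOLD for z in joint]
```

With this data the singly selected point falls below the threshold because its two neighbours distort the reference set, while all three points are flagged once they are selected together.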
- existing technical products for automatic insight generation analyze at the granularity of single data points, and lack an assisted insight generation solution in which the user batch-selects local data of interest through interaction.
- the interaction and analysis freedom supported are still lacking, and there is still room for further improvement and optimization.
- Figure 4 shows a schematic flowchart of a method 400 for generating insight data provided by an embodiment of the present application.
- the method of Figure 4 can be executed by the data processing device of Figure 1 or the user device of Figure 2.
- Step 410 Present a first chart, the first chart comprising M chart visualization elements, each chart visualization element corresponding to at least one data record in a data source.
- the first chart may be presented in the visualization chart interface 121 in the application interface 120, or in any visualization interface.
- the data records for drawing the first chart may be all or part of the data in the database 130, or all or part of the data in one or more tables in any data source.
- the first chart includes M chart visualization elements, such as bars of a bar chart, discrete data points of a scatter plot, data points and adjacent lines of a line chart, sectors of a pie chart or a donut chart, and other graphical representations of data records.
- Each chart visualization element is drawn by a data record.
- a single chart visualization element may correspond to a single data record, or may correspond to an aggregate value of multiple data records, that is, a summary value or total value finally generated after some calculation operations are performed on multiple data records, such as sum aggregation, mean aggregation, etc.
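The correspondence between chart visualization elements and data records can be sketched as follows. This is a minimal illustration, assuming hypothetical records with a `month` dimension and a `sales` metric; the function name `build_elements` is not from the source.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data records; the field names are illustrative only.
records = [
    {"month": 1, "region": "north", "sales": 10.0},
    {"month": 1, "region": "south", "sales": 30.0},
    {"month": 2, "region": "north", "sales": 20.0},
]

def build_elements(rows, dimension, metric, aggregate=sum):
    """Group rows by a dimension value; each group becomes one chart element.

    Every element keeps both its aggregated value (what is drawn) and the
    underlying records (what a later analysis step can look up).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row)
    return {
        key: {"value": aggregate(r[metric] for r in members), "records": members}
        for key, members in groups.items()
    }

bars_sum = build_elements(records, "month", "sales", aggregate=sum)
bars_mean = build_elements(records, "month", "sales", aggregate=mean)
```

Here the bar for month 1 is backed by two data records, while the bar for month 2 is backed by one.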
- Step 420 confirming N chart visualization elements selected from the M chart visualization elements, where M and N are positive integers greater than 1, and M is greater than or equal to N.
- step 420 can support the interactive mode of selecting multiple chart visualization elements and confirm the selected multiple chart visualization elements.
- step 420 in the technical solution of the present application may also support an interactive mode of selecting a single chart visualization element and confirming the selected single chart visualization element.
- the user can click and select one or N chart visualization elements through the visualization chart interface of the application interface.
- the application interface is an interface of a spreadsheet application on a desktop computer, and the user can drag and select with the mouse to generate a selection box, and one or N chart visualization elements in the selection box are determined as the chart visualization elements selected by the user.
- the chart visualization elements can be one or N bars in a bar chart, one or N data points in a line chart or scatter chart, or one or N sectors in a pie chart or donut chart.
- the user may select one or N chart visualization elements by single-select or multi-select clicking.
- the user may click on one or N chart visualization elements at the same time with a mouse, and the selected one or N chart visualization elements are determined as the chart visualization elements selected by the user.
- the selected chart visualization elements may be discontinuous in the dimension of the x-axis of the chart, and the selected chart visualization elements may be separated by one or more chart visualization elements.
- the batch selection of focus data supported by this technical solution works on a variety of different chart types, and the highlights selected by the user are retained when the chart type is switched, ensuring that the highlighted data is always the data used for insight generation.
- the batch selection method in step 420 can be performed on a variety of different charts, such as bar charts, line charts, or scatter charts, and the data that is brushed will be highlighted relative to the data that is not brushed, and will remain highlighted when the type of chart for analyzing the data is switched. For example, when drawing a chart for the same set of data, the chart visualization elements that the user is interested in are brushed on the bar chart, and the chart visualization elements are then highlighted. When the user switches the bar chart to a line chart, the chart visualization elements will still remain highlighted. When the user switches the chart type for the bar chart that has been generated and the visualization elements have been brushed, the visualization elements containing the data records corresponding to the brushed visualization elements will also be highlighted in the new chart.
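One way to keep highlights across a chart type switch is to store the selection as data keys rather than as graphical elements. The sketch below assumes this design; the class `ChartState` and its methods are hypothetical and are not described in the source.

```python
from dataclasses import dataclass, field

@dataclass
class ChartState:
    """Hypothetical view state: the selection lives on the data, not the glyphs."""
    chart_type: str
    dimension: str
    selected_keys: set = field(default_factory=set)

    def brush(self, keys):
        # Record the dimension values of the brushed elements.
        self.selected_keys |= set(keys)

    def switch_type(self, new_type):
        # Only the rendering changes; the selected keys are left untouched.
        self.chart_type = new_type

    def is_highlighted(self, key):
        return key in self.selected_keys

state = ChartState(chart_type="bar", dimension="month")
state.brush([1, 3])
state.switch_type("line")
```

After the switch, a renderer for the line chart can call `is_highlighted` for each data point it draws, so months 1 and 3 stay highlighted.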
- Step 430 Determine all K data records corresponding to the N chart visualization elements, where K is a positive integer greater than 1.
- one way to determine all K data records corresponding to the N chart visualization elements is to identify the form of interaction the user used to select the chart visualization elements of the visualization chart.
- the interactive form of the user interaction may be the operation of the user performing a swiping interaction.
- the interactive form is to swipe the chart visualization element data when the horizontal axis dimension takes a specific value.
- the dimension data bound to the chart visualization element may be a specific value of the dimension corresponding to the chart visualization element data, and the analysis granularity may be a specific category of the dimension.
- the user is concerned about the characteristics of the chart visualization element when the dimension corresponding to the horizontal axis of the chart takes a specific value, and the subsequent insight data of this solution will also focus on the horizontal axis dimension of the chart.
- the x-axis dimension is time
- the interactive operation performed by the user is to swipe the three bar chart visualization elements of the bar chart along the x-axis direction of the visualization chart.
- the time periods corresponding to the three bar chart visualization elements constitute field A
- the solution analyzes that the user's interactive operation is swiping along the x-axis
- the dimension data field bound to the x-axis of the visualization chart is field A of a certain time type
- the analysis granularity is a specific time type, such as month, time period, etc.
- all data records corresponding to the dimension data field A and the specific time type are filtered out from the data source or database.
- a method for confirming all K data records corresponding to N chart visualization elements may be to directly extract the enumeration value of the data point corresponding to the interactively selected chart visualization element, and the enumeration value of the data point may be the specific value of the field corresponding to the chart visualization element of the chart.
- the enumeration value may be one or more different values of month 1 to month 12 corresponding to the selected chart visualization element, or it may be a combination of the value of the external dimension information of the data set bound to the horizontal axis or legend in the selected chart visualization element and the value of the month.
- Determining the dimension combination related to the chart visualization element in the chart may include the specific value of the external dimension that is not involved in drawing the first chart and the selected multiple enumeration values of the dimension that participates in drawing the first chart, and may also include the dimension combination and measurement related to the chart visualization element, or other relevant information of the chart visualization element.
- the technical solution of the present application can integrate the information contained to generate filtering logic for filtering and searching data sets or original data records in databases.
- the subsequent insight data of this solution will also focus on the dimensions or dimension combinations related to the filtered fine-grained original records and chart visualization elements.
- the logic for filtering the first chart data can be a logical combination of A or B or C with different values.
- this solution can convert one or N chart visualization elements interactively selected by the user on the first chart (corresponding to the aggregated results of one row or part of the rows in the original data set records) into query requests to further query all K data records of the interactively selected chart visualization elements in the original data set or database, and use all K data records for subsequent insights.
- the query request may be any combination of all information of the request field, a list of filter operators, a list of filter enumeration values, and filter logic, and the object of the request may be a backend module.
- Structured Query Language (SQL) can be used to generate a WHERE clause, query the original table records, and return to the algorithm module the subset of interest data corresponding to the chart visualization elements selected by the user.
- the user interest data filter derived from the interactive operation covers all records in the data set participating in the analysis whose dimension A field matches the selection. The SQL query statement finally generated by this solution accordingly combines multiple values of the dimension A field in the WHERE clause, using the IN operator or OR logic.
- the solution is not limited to single-table analysis queries of a single data source, but can also support queries for multiple related data tables in the original data source, and can support federated queries and subsequent analysis.
- the functional bottom layer of the solution is based on a distributed SQL query engine, which then merges multiple tables into a data set level to obtain data.
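The conversion of selected chart visualization elements into a query can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` module on an in-memory table rather than a distributed SQL engine; the table `sales`, its columns, and the helper `build_filter_query` are assumptions made for the example.

```python
import sqlite3

def build_filter_query(table, field, values):
    """Return (sql, params) selecting the rows whose field is IN the given values.

    The table and field names are interpolated, so they must come from a
    trusted schema; the selected values are bound as parameters.
    """
    if not values:
        raise ValueError("at least one selected value is required")
    placeholders = ", ".join("?" for _ in values)
    sql = f'SELECT * FROM "{table}" WHERE "{field}" IN ({placeholders})'
    return sql, list(values)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "north", 10.0), (1, "south", 30.0), (2, "north", 20.0),
     (3, "south", 5.0), (3, "north", 7.0)],
)

# The user brushed the bars for months 1 and 3 on the first chart.
sql, params = build_filter_query("sales", "month", [1, 3])
rows = conn.execute(sql, params).fetchall()  # the K underlying data records
```

Two selected bars (N = 2) resolve to four original records (K = 4) in this example, which are then available for the subsequent analysis.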
- Step 440 Perform joint data analysis based on the K data records to generate first insight data for N chart visualization elements.
- the joint data analysis of the K data records differs both from analyzing a single selected chart visualization element and from selecting multiple chart visualization elements, analyzing each one separately, and then integrating the results.
- the joint data analysis can analyze the K data records as a whole, together with other data in the data set, thereby obtaining insight data that compares the N chart visualization elements, or at least two of the N chart visualization elements, with other data records.
- the K data records are taken as a whole, and the association relationship of each data record in the K data records is determined, and the association relationship determines the characteristic information shared by the K data records.
- the K data records may have the same or similar external dimensions, or may be data records with a correlation relationship, or may present the same or opposite measurement value phenomena, and the shared characteristic information is the external dimension or correlation relationship analysis data or measurement value phenomenon corresponding to the K data records.
- the data records with shared characteristic information in the data source are screened out, and the K data records and the data records with shared characteristic information are subjected to data analysis to form insight data of N chart visualization elements.
- L of the K data records may have common feature information, and the L data records correspond to at least two chart visualization elements, where L is a positive integer greater than 1, and K is greater than L.
- insight data of at least two chart visualization elements corresponding to the L data records are generated.
- the data records for comparison may be data records corresponding to (M-N) chart visualization elements that are not selected from the M chart visualization elements in step 420, or may be (K-L) data records that are not selected from the K data records.
- the technical solution of the present application can realize simultaneous analysis of multiple chart visualization elements and their original data records, and obtain insight data containing the association information of multiple chart visualization elements.
- the insight data generated by the technical solution of the present application differs from insight data obtained by analyzing a single selected chart visualization element, or by analyzing single chart visualization elements one at a time and then integrating the results; it reduces the interference that unselected but associated chart visualization elements exert on the selected chart visualization elements during insight analysis.
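The idea of treating the K data records as a whole and extracting their shared characteristic information can be sketched as follows. The sketch assumes the simplest possible notion of a shared characteristic, namely an external dimension on which all selected records agree; the field names and helper functions are hypothetical.

```python
def shared_features(selected_rows, external_dims):
    """Return {dimension: value} for external dimensions on which all rows agree."""
    shared = {}
    for dim in external_dims:
        values = {row[dim] for row in selected_rows}
        if len(values) == 1:
            shared[dim] = values.pop()
    return shared

def rows_sharing(all_rows, features):
    """Screen the data source for records that carry the shared features."""
    return [r for r in all_rows if all(r[d] == v for d, v in features.items())]

# Hypothetical data source; "month" draws the chart, the rest are external.
data_source = [
    {"month": 1, "channel": "online", "region": "north", "sales": 90},
    {"month": 2, "channel": "store", "region": "north", "sales": 20},
    {"month": 3, "channel": "online", "region": "south", "sales": 95},
    {"month": 4, "channel": "online", "region": "south", "sales": 88},
    {"month": 5, "channel": "store", "region": "south", "sales": 25},
]

selected = [r for r in data_source if r["month"] in (1, 3)]   # the K records
shared = shared_features(selected, ["channel", "region"])
related = rows_sharing(data_source, shared)
```

In this example the two selected records share the `online` channel, and screening the data source with that feature also surfaces month 4, which was not selected but can be analyzed together with the selection.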
- the number of times the insight data of at least two chart visualization elements corresponding to L data records are generated may be more than once, and the value of L may be different, and finally a plurality of different data candidate sets may be formed.
- These different data candidate sets may be analyzed by strategies or algorithms to form a plurality of different insight data, and the data candidate sets serve as subspaces for strategy or algorithm analysis.
- the present application analyzes the data candidate set in the data subset according to the insight data generation strategy or algorithm to generate insight data that can demonstrate the characteristics of the subspace data.
- the insight data generation algorithm input may include the full amount of full table data records, the screening conditions corresponding to the visualization elements of the original visualization chart selected by the user interaction, the original records of the data of interest generated by the query request, the common feature information of the original data records, and the algorithm parameters configured by the front-end user interaction.
- the insight data results produced by the algorithm may include the original chart data, chart type information, axis configuration information, or text insight description information required to draw the insight chart.
- the insight data generation algorithm can support multiple analysis modes, such as statistical value feature analysis, distribution feature analysis, null value warning analysis, zero value warning analysis, high correlation measurement analysis, global-subset difference analysis, etc. It can also support the detection of different categories of features for different types of insights from the statistical analysis and traditional machine learning levels, and generate a variety of customized feature descriptions.
- the insight data generation algorithm can adaptively use different chart types for display, such as using a bar chart for distribution charts, and choosing between a logarithmic axis and a linear axis for the scatter plots of correlation measurement charts based on the data distribution.
- the textual insight description information of insight data also varies.
- the textual description can be a description of the characteristics of the insight, a possible priori pattern analysis composed of various characteristics, a combination of the two, or other descriptions that can explain the characteristics.
- the insight data generation algorithm can produce multiple different types of insights, support the contribution of metrics or dimensions inside and outside the analysis chart to the generation of patterns of user interest selection, and guide users to explore the analysis content of data records corresponding to the selected chart visualization elements and related data in the data set.
- the types of insights generated by the algorithm may include chart metric aggregation expansion analysis, external dimension valid record number analysis, external dimension distribution contribution analysis, external dimension subspace internal feature analysis, and external high interpretability metric analysis, but are not limited to these types of insights.
- the chart metric aggregation expansion analysis can focus on decomposing the aggregated metric values bound to the selected chart visualization elements into the distribution of the underlying original data, helping users understand the composition of the aggregation value.
- the vertical axis in a common analysis chart is the sum aggregation of metrics, and the user pays attention to the chart visualization elements with higher aggregation values.
- This type of insight data can help users understand the composition of abnormal aggregation values, such as a single original abnormal record, or the overall distribution has a certain bias.
- the analysis of the number of valid records in the external dimension can focus on exploring the data records selected by the user interactively, and the distribution of the number of valid records in other external dimensions (not involved in the chart drawing) to analyze the potential reasons why the aggregate values of the visualization elements of the visualization chart selected by the user show a specific pattern. If it is found that the original data records corresponding to a specific pattern are aggregated in a certain dimension in a certain value subspace, this method believes that the aggregation has a greater correlation with the presentation of this pattern.
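The record-count comparison described above can be sketched as a simple ratio of shares. The sketch assumes that a "valid" record is one whose external dimension value is not null, and it uses a lift ratio as the concentration measure; both are assumptions for illustration, as is the function name `record_count_lift`.

```python
from collections import Counter

def record_count_lift(all_rows, selected_rows, dim):
    """Share of each dimension value in the selection divided by its overall share.

    A lift well above 1 means the selected records are concentrated in that
    value subspace. Assumes the selected rows are a subset of all rows.
    """
    overall = Counter(r[dim] for r in all_rows if r[dim] is not None)
    chosen = Counter(r[dim] for r in selected_rows if r[dim] is not None)
    n_all, n_sel = sum(overall.values()), sum(chosen.values())
    return {v: (chosen[v] / n_sel) / (overall[v] / n_all) for v in chosen}

# Hypothetical external dimension "channel", not used to draw the chart.
all_rows = [{"channel": c} for c in ["online"] * 4 + ["store"] * 4]
selected_rows = all_rows[:3] + all_rows[4:5]   # three online, one store

lift = record_count_lift(all_rows, selected_rows, "channel")
```

Here `online` makes up half of all records but three quarters of the selection, so its lift exceeds 1 and it would be a candidate explanation for the pattern in the selected elements.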
- external dimension distribution contribution analysis can focus on exploring the contribution distribution of data records selected by users in other external dimensions (not involved in chart drawing) to the chart metrics that users are interested in.
- This type of explanation essentially disassembles the aggregated value in another direction outside the chart to find potential high-contribution dimension values for users to further explore.
- if users find a dimensional subspace of interest, they can further use the data explanation subspace distribution exploration function to view the detailed distribution within the subspace.
- the internal feature analysis of the external dimension subspace can be highly related to the above-mentioned explanation related to the external dimension, and can support automatic search and recommendation of some subspaces of dimensional values, where the numerical distribution inside such subspaces has certain characteristics for the metric distribution selected by the user. Based on the subspace distribution, the user can further use the traceability original data recording function to analyze the source of the feature pattern.
- external highly interpretable metric analysis can focus on data patterns measured in a subset of data that the user is interested in, perform highly correlated metric analysis on the full set of data and the subset of data respectively, obtain a batch of explanatory external metric candidates, and further analyze them to obtain metrics with higher surprise, and display the correlation between the metric and the chart metric that the user is interested in through a scatter plot pattern, in order to explore possible insights from the data.
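The comparison of correlations on the full data set and on the subset of interest can be sketched as follows. The sketch uses the Pearson coefficient and defines "surprise" as the absolute difference between the two coefficients; the source does not specify either choice, so both are assumptions, as are the metric names.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def rank_external_metrics(rows, subset, chart_metric, candidates):
    """Rank candidate metrics by how much their correlation shifts in the subset."""
    scored = []
    for m in candidates:
        r_full = pearson([r[m] for r in rows], [r[chart_metric] for r in rows])
        r_sub = pearson([r[m] for r in subset], [r[chart_metric] for r in subset])
        scored.append((m, r_full, r_sub, abs(r_sub - r_full)))  # assumed surprise
    return sorted(scored, key=lambda t: t[3], reverse=True)

# Hypothetical records: "sales" is the chart metric, the others are external.
sales = [1, 2, 3, 4, 5, 6]
visits = [1, 2, 3, 4, 5, 6]
discount = [1, 2, 3, 6, 5, 4]
rows = [{"sales": s, "visits": v, "discount": d}
        for s, v, d in zip(sales, visits, discount)]
subset = rows[3:]   # records behind the elements the user selected

ranked = rank_external_metrics(rows, subset, "sales", ["visits", "discount"])
```

In this data `discount` moves with `sales` overall but against it inside the subset, so it ranks first, whereas `visits` behaves the same in both and carries no surprise.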
- the algorithm can also generate multiple different types of insights to support the analysis of associations within the data records corresponding to the selected chart visualization elements.
- the types of insights generated by the algorithm may include chart visualization element trend analysis, chart visualization element cluster analysis, etc., but the embodiments of the present application are not limited to these types of insights.
- the trend analysis of the chart visualization element can focus on the trend pattern of the data record corresponding to the selected chart visualization element as the x-axis dimension changes.
- the periodic change pattern that may exist in the data record is obtained from the numerical high points or numerical low points that may appear in the data record as a whole.
- the trend analysis of the chart visualization element can also be used for the prediction of data records. For another example, when there are some outliers in the data record showing a specific trend, only non-outliers can be selected for analysis in the process of selecting the chart visualization element, and the outliers can be skipped to improve the accuracy of the trend analysis.
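The effect of skipping an outlier when selecting elements for trend analysis can be sketched with an ordinary least-squares line. A linear fit is only one possible trend model and is an assumption here; the data points and the function name `fit_line` are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept over (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical monthly values; month 4 is an outlier.
all_points = [(1, 2), (2, 4), (3, 6), (4, 50), (5, 10), (6, 12)]

# The user selects every element except the outlier.
selected_points = [p for p in all_points if p[0] != 4]

slope_all, _ = fit_line(all_points)
slope_sel, intercept_sel = fit_line(selected_points)
forecast_month_7 = slope_sel * 7 + intercept_sel
```

Fitting only the selected elements recovers the underlying trend of about 2 per month, while including the outlier pulls the slope noticeably upward.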
- the cluster analysis of chart visualization elements can focus on the clustering patterns and differences of data records corresponding to multiple chart visualization elements selected in batches.
- this insight type can classify data records corresponding to chart visualization elements in one or more charts into aggregate classes based on the intrinsic properties of the data, where data records in each aggregate class have the same characteristics, and data records in different aggregate classes have greatly different characteristics.
- This insight type can analyze data tables in multiple data sources and classify data records corresponding to multiple chart visualization elements as much as possible.
- the automatically generated insight data presentation of the data interpretation function in the technical solution of the present application can be in the form of free expansion and contraction similar to an accordion, and is divided into two layers.
- the title of the first layer of the accordion marks the names of different insight categories.
- the second layer displays the specific insights recommended by all algorithms under this type of insight.
- the text description and chart drawing of this type of insight data will be displayed specifically.
- other insight data will be folded to ensure the neatness of the front-end interface.
- the present technical solution can support users to freely observe the charts and text results recommended by the algorithm in each type of different insights, where all charts generated by the algorithm also support basic interactive methods such as interactive selection, highlighting, and legend switching, thereby optimizing the user's exploration and analysis process experience, and also providing users with the possibility to conduct interactive analysis in the feature subspace of the insight chart.
- this technical solution can support users to export the insight charts of interest generated by data interpretation to the dashboard, and display them at the same level as the original charts, while displaying the insight text information on the right.
- This function supports associated highlighting: when the user selects an insight chart exported to the dashboard, the chart that generated the insight data is highlighted synchronously, together with the interest data the user had selected, that is, the filter information in effect when the parent chart generated the insight data.
- this technical solution can also be applied in cloud environment scenarios, and can be compatible with insight saving related functions in the microservices where it is located, and can be saved, previewed, and loaded like ordinary charts.
- steps 410-440 can effectively produce accurate and inspiring insights, but when providing data interpretation operations, the local data subset observed by the user cannot be subsequently analyzed, which to some extent limits the user's interactive exploration method.
- another embodiment of the present application shows a method 450 for generating insight data, which further generates insight subspaces so that existing insight data can itself be analyzed to produce further insight data.
- the method includes steps 460-490, which are described in detail below.
- Step 460 Present an insight chart in the second insight data, the insight chart comprising W chart visualization elements, each chart visualization element corresponding to at least one data record in the data source.
- the second insight data presented may be any insight data generated by selecting a chart visualization element, such as the first insight data in step 440, or may be insight data generated by further analyzing any generated insight chart.
- the type of the second insight data can be any of the insight types mentioned above, such as chart metric aggregation expansion analysis, external dimension valid record number analysis, external dimension distribution contribution analysis, external dimension subspace internal feature analysis, and external high interpretability metric analysis, etc., or it can be other types of insight analysis.
- the insight chart may further include statistical characteristic values or extreme values corresponding to the insight text descriptions in the insight chart.
- Step 470 confirming J chart visualization elements selected from W chart visualization elements, where W and J are positive integers greater than 1.
- Step 480 Determine all H data records corresponding to the J chart visualization elements, where H is a positive integer greater than 1.
- step 470 and step 480 are substantially the same as the processes of step 420 and step 430 and are not described in detail herein.
- Step 490 Generate third insight data based on all H data records, where the third insight data includes data distribution analysis or data record tracing of the H data records.
- the first chart in step 410 can be an insight chart in any insight data in step 460.
- the insight type of the third insight data in step 490 can be the insight type of the first insight data in step 440, or other insight types can be added based on the insight type of the first insight data.
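Steps 470 to 490 can be illustrated with a minimal Python sketch. The record layout, the `INSIGHT_CHART` mapping and both function names are hypothetical and chosen only for illustration; the application does not prescribe a data structure. The sketch collects every record behind the selected elements and then computes one possible form of data distribution analysis.

```python
from collections import Counter

# Hypothetical data source: each record is a dict; field names are illustrative.
RECORDS = [
    {"id": 0, "time": 1, "location": "A", "sales": 50},
    {"id": 1, "time": 1, "location": "B", "sales": 5},
    {"id": 2, "time": 2, "location": "A", "sales": 40},
    {"id": 3, "time": 2, "location": "C", "sales": 4},
    {"id": 4, "time": 3, "location": "B", "sales": 6},
]

# Hypothetical insight chart: element label -> ids of the records behind it.
INSIGHT_CHART = {"time=1": [0, 1], "time=2": [2, 3], "time=3": [4]}


def records_for_elements(chart, selected, records):
    """Steps 470-480: collect every record behind the selected elements."""
    by_id = {r["id"]: r for r in records}
    seen, result = set(), []
    for element in selected:
        for rid in chart[element]:
            if rid not in seen:  # a record may back more than one element
                seen.add(rid)
                result.append(by_id[rid])
    return result


def distribution(records, dimension, metric):
    """Step 490 (one option): total of a metric per value of a dimension."""
    totals = Counter()
    for r in records:
        totals[r[dimension]] += r[metric]
    return dict(totals)


subset = records_for_elements(INSIGHT_CHART, ["time=1", "time=2"], RECORDS)
by_location = distribution(subset, "location", "sales")
```

Here `subset` plays the role of the H data records, and `by_location` is a simple distribution over an external dimension that could back a further insight chart.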
- the generated third insight data may be a further analysis of the data record distribution within a subspace of the insight data, aiming to help users further explore patterns of interest in the dimensional distribution chart of the algorithm-recommended insights.
- When the insight chart of step 460 in the present technical solution is a derived insight chart of the external dimension valid record number analysis type or the external dimension distribution analysis type, the technical solution again generates a subspace metric distribution insight chart that also supports interaction.
- the selection of the metric for this insight is related to the metric associated with this type of insight and the metric of the chart that originally generated the data interpretation.
- the further insight data generated can be original data traceability, aiming to help users easily perform original data queries on abnormal parts of the recommended insight distribution and explore the reasons for the distribution characteristics.
- When the insight chart of step 460 in the present technical solution is an insight chart derived from the aggregation and expansion analysis of chart metrics, the internal feature analysis of the external dimension subspace, or the above-mentioned subspace distribution exploration, multiple sufficiently fine-grained drill-down analyses have usually already been performed by the time this operation is executed, so the original data records returned directly are often few in number but have strong explanatory power.
- the present technical solution supports convenient tracing of the original records, and both use a consistent display format.
- the present technical solution uses a paginated table to display the original records.
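A paginated view of the traced original records could be sketched as follows. The function name `paginate`, the 1-based page numbering and the default page size are assumptions made for this example; the application only states that a paginated table is used.

```python
import math


def paginate(rows, page, page_size=10):
    """Return (rows on the requested 1-based page, total number of pages)."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    total_pages = max(1, math.ceil(len(rows) / page_size))
    if not 1 <= page <= total_pages:
        raise ValueError("page out of range")
    start = (page - 1) * page_size
    return rows[start:start + page_size], total_pages


# Usage with 25 hypothetical original records, shown 10 per page.
original_records = list(range(25))
last_page, pages = paginate(original_records, page=3)
```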
- the type of the third insight data is not limited to the above two insight types, and may also be any of the insight data types mentioned above, such as the chart metric aggregation expansion analysis, external dimension valid record number analysis, external dimension distribution contribution analysis, etc.
- This technology supports users in continuing to carry out rich interactive operations on the data interpretation charts derived by the algorithm, and realizes further focused analysis within the insight feature subspace. After constructing the feature subspace of interest, the user clicks the corresponding item in the function menu, which keeps the data interpretation function logically consistent throughout this technical solution and reduces the learning cost.
- this application designs a sorting strategy to determine the priority order of multiple sub-insights in an insight. Multiple sub-insights are recommended in order of priority, and the final result is generated by sorting.
- the sorting strategy is applied after generating the insight chart in steps 440 and 490 and before presenting the insight interface.
- Figure 5 shows a schematic flowchart of an embodiment of the sorting strategy 500 of the present application, which comprehensively considers two aspects: the confidence of all the features possessed by each insight within the same insight type, and the feature richness of the insight.
- the method includes steps 510-540, and steps 510-540 are described in detail below. Assume that the insight data includes P sub-insight data.
- Step 510 Determine a characteristic index value of each sub-insight data among the P sub-insight data, where the characteristic index value is used to measure the confidence or significance of each sub-insight data among the P sub-insight data.
- the present technical solution formulates different measurement methods respectively.
- Statistical features can be described based on indicators such as the number of characteristic values and the degree to which outliers deviate from the full data.
- Distribution analysis features can be described by indicators such as the unevenness of distribution and the proportion of maximum distribution.
- Alarm-related analysis features can be described by the corresponding proportion of alarm values.
- Correlation measurement analysis features can be described by the above-mentioned measurement indicators. Difference analysis can be described based on the KL divergence of discrete distribution after binning, etc.
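The difference measure mentioned above, the KL divergence of discrete distributions after binning, can be sketched in plain Python. The bin edges, the additive smoothing constant `eps` and both function names are assumptions introduced for this example; the application does not specify them.

```python
import math


def bin_counts(values, edges):
    """Count values in half-open bins [edges[i], edges[i+1]); last bin is closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) in nats; eps smoothing avoids log(0) for empty bins."""
    p_total = sum(p_counts) + eps * len(p_counts)
    q_total = sum(q_counts) + eps * len(q_counts)
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + eps) / p_total
        q = (qc + eps) / q_total
        kl += p * math.log(p / q)
    return kl


# Hypothetical metric values: a selected subset versus the full data.
edges = [0, 5, 10]
subset_bins = bin_counts([1, 2, 3, 10], edges)
full_bins = bin_counts([1, 6, 7, 8], edges)
difference = kl_divergence(subset_bins, full_bins)
```

A larger `difference` indicates that the subset is distributed less like the full data, which is one way a difference analysis feature could be scored.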
- Step 520 Confirm Q sub-insight data whose feature index values are higher than the threshold value of the feature index value, where Q is a positive integer greater than 1, and P is greater than Q.
- The technical solution filters out feature insights whose feature index values are lower than the threshold value, either by placing those insights at the very end of the presentation interface queue or by deleting them.
- Step 530 Determine the number of feature types of each sub-insight data among the Q sub-insight data.
- Step 540 Determine the priority order of the Q sub-insight data by sorting in descending order according to the number of feature types of each sub-insight data.
- the present technical solution can count the description of its feature richness and determine the recommendation priority of different sub-insight data by sorting in descending order.
- the feature indicators of the features of different insights are sorted in descending order and then compared in sequence to determine the priorities.
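Steps 510 to 540 can be sketched as follows. This is one plausible reading rather than the application's implementation: the `SubInsight` class, the use of the strongest feature as a sub-insight's overall index value, deletion (rather than demotion) of filtered items, and the tie-break by comparing descending feature values in sequence are all assumptions labeled in the code.

```python
from dataclasses import dataclass, field


@dataclass
class SubInsight:
    name: str
    # Hypothetical: feature type -> feature index value (confidence or significance).
    features: dict = field(default_factory=dict)

    @property
    def index_value(self):
        # Assumption: the overall index value is that of the strongest feature.
        return max(self.features.values(), default=0.0)


def rank(sub_insights, threshold):
    """Filter by threshold (step 520), then sort by feature richness (steps 530-540)."""
    kept = [s for s in sub_insights if s.index_value > threshold]

    def key(s):
        # Primary: number of feature types. Tie-break (assumed): feature values
        # sorted in descending order and compared element by element.
        return (len(s.features), sorted(s.features.values(), reverse=True))

    return sorted(kept, key=key, reverse=True)


candidates = [
    SubInsight("a", {"outlier": 0.99}),
    SubInsight("b", {"outlier": 0.97, "skew": 0.96}),
    SubInsight("c", {"trend": 0.90}),
    SubInsight("d", {"outlier": 0.98, "skew": 0.96}),
]
ordered = [s.name for s in rank(candidates, threshold=0.95)]
```

In this example `c` is dropped by the threshold, `d` and `b` outrank `a` because they carry two feature types, and `d` precedes `b` on the tie-break.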
- the above method 400 and method 450 can be used alone or in combination.
- the following describes the method used in combination to achieve insight data generation with a specific example, and describes the implementation of the sorting strategy 500 with the example.
- Figures 6, 7 and 8 show detailed cases of using the technical solution of the present application to achieve insight data generation and the step-by-step exploration and in-depth process of insight data generation.
- the elements and data in the cases are all examples, and actual cases include but are not limited to the cases in Figures 6, 7 and 8.
- FIG6 shows an application interface 600.
- the application in this case may be a table application or a data intelligence analysis application.
- the application interface 600 includes a data table 610, which includes multiple dimensions and metrics.
- the data table is the data source.
- multiple data sources may be included, and each data source may include multiple data tables.
- the location dimension A, B and C and the unit price dimension a, b and c are summed and aggregated to form the sales volume X, that is, the location dimension and the unit price dimension are external dimensions and do not participate in the drawing of the chart.
- Their data records are directly accumulated to obtain the summed aggregate value.
- Chart 620 contains five chart visualization elements, that is, five columns, and the value of the data point of each chart visualization element is summed and aggregated by multiple data values of the location dimensions A, B and C and the unit price dimensions a, b and c, that is, corresponding to the data records of multiple related dimension values in the data table.
- the value of the data point of each chart visualization element can also correspond to only one data record. This part of the step process corresponds to step 410 above.
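The sum aggregation behind chart 620 can be illustrated with a short sketch. The rows, field names and the function `sum_aggregate` are hypothetical; they only show how each column's value and its backing data records relate.

```python
from collections import defaultdict

# Hypothetical rows of a data table like 610; field names are illustrative.
TABLE = [
    {"time": 1, "location": "A", "unit_price": "a", "sales": 30},
    {"time": 1, "location": "B", "unit_price": "b", "sales": 12},
    {"time": 2, "location": "A", "unit_price": "c", "sales": 25},
    {"time": 4, "location": "C", "unit_price": "a", "sales": 8},
]


def sum_aggregate(rows, x_dim, metric):
    """Sum a metric per x-axis value and remember which rows back each column."""
    totals = defaultdict(int)
    backing = defaultdict(list)
    for index, row in enumerate(rows):
        totals[row[x_dim]] += row[metric]
        backing[row[x_dim]].append(index)
    return dict(totals), dict(backing)


columns, backing_rows = sum_aggregate(TABLE, "time", "sales")
```

The external dimensions (location and unit price) do not appear in `columns`, yet `backing_rows` keeps the link from each column to its records, including the case where a column corresponds to a single record.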
- the dotted box in chart 620 is a selection box of application interface 600.
- the chart visualization elements within the selection box will be highlighted, that is, the two chart visualization elements in the dotted box in chart 620 are filled with diagonal lines.
- the two chart visualization elements are the objects that need to be analyzed for insights in this case.
- the selection box in this case is a continuous selection. In actual cases, multiple selection boxes can select multiple discontinuous data, or only one chart visualization element can be selected. This case is not limited. This part of the steps corresponds to step 420 above.
- the background of the application determines the data records in the data table 610 corresponding to the selected chart visualization element.
- the specific value of the x-axis dimension corresponding to the selected chart visualization element is time 1-3, that is, the operation of selecting the interaction is to batch select chart visualization elements with specific values 1-3 along the dimension of the x-axis of the chart.
- the filtering logic of the data records corresponding to the selected chart visualization element is generated, that is, the specific value of the filtering dimension combination is a logical combination of (time dimension 1 or 2 or 3) and (location dimension A or B or C) and (unit price dimension a or b or c).
- the generated filtering logic can be used to query all the original data records in the data table 610, that is, to determine the data records corresponding to the selected chart visualization elements. This part of the steps corresponds to step 430 above. In actual cases, each column may be divided into multiple sub-columns due to different legend values. When only some sub-columns are selected, the external dimensions in the obtained logical combination may also have partial values.
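The filtering logic described above (OR within a dimension, AND across dimensions) can be sketched as a predicate over records. The function name `selection_filter` and the record fields are assumptions for illustration only.

```python
def selection_filter(x_dim, selected_x_values, external_domains):
    """Build a predicate: OR within each dimension, AND across dimensions."""
    allowed = {x_dim: set(selected_x_values)}
    for dim, values in external_domains.items():
        allowed[dim] = set(values)

    def matches(record):
        return all(record[dim] in values for dim, values in allowed.items())

    return matches


# Hypothetical records; field names are illustrative.
rows = [
    {"time": 1, "location": "A", "unit_price": "a", "sales": 30},
    {"time": 3, "location": "C", "unit_price": "b", "sales": 9},
    {"time": 5, "location": "A", "unit_price": "a", "sales": 4},
]

# Whole columns selected: external dimensions keep all of their values.
full = selection_filter(
    "time", [1, 2, 3],
    {"location": ["A", "B", "C"], "unit_price": ["a", "b", "c"]},
)
selected_rows = [r for r in rows if full(r)]

# Only some sub-columns selected: an external dimension takes partial values.
partial = selection_filter("time", [1, 2, 3], {"location": ["A"]})
partial_rows = [r for r in rows if partial(r)]
```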
- Based on the data records corresponding to the selected chart visualization elements determined above, joint data analysis is performed to generate insight data.
- the specific insight data depends on the analysis results of different data record subsets, and the following is an exemplary example:
- the values of the sales volume corresponding to the two chart visualization elements selected in this case both present the same phenomenon, that is, they present abnormally high values.
- The sales volumes corresponding to the two selected chart visualization elements are compared simultaneously with the three unselected chart visualization elements in chart 620 and with the remaining data records in the data table.
- If this comparison finds that the data records with the location dimension value A contribute significantly to the high sales volume, that is, when the time dimension value is 1, 2, or 3 and the location dimension value is A the sales volume is abnormally higher than under other dimension values, then the data records with the time dimension value 1, 2, or 3 are determined to have an association relationship.
- the association relationship is that the time dimension value of 1 or 2 or 3 is associated with the location dimension value of A, or the association relationship can be expressed as the data record with the time dimension value of 1 or 2 or 3 and the location dimension value of A has common feature information, that is, the sales volume is abnormally higher than the sales volume under other dimension values.
- the content of the generated insight data can be the contribution of time, location or unit price to the outlier phenomenon, or it can be the analysis of the data records with the time dimension value of 1 or 2 or 3 and the location dimension value of A along the aggregate value of the unit price dimension, or it can be other insight data related to external dimensions.
- insight data can correspond to the insight data 621, 622, etc. in Figure 6.
- more insight data can be generated in actual cases, and there can be more phenomena presented by the selected chart visualization elements, and there can be more insight data generated by each phenomenon, which will not be repeated in this case.
- the analysis process of these insight data uses the data records corresponding to multiple chart visualization elements, so that users can analyze the observed local data. The above steps correspond to step 440 above.
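One simple way to surface a finding such as "location A contributes significantly to the high sales volume" is to compute each external-dimension value's share of the selected records' total. The sketch below is an assumption-laden illustration: the share-based measure, the `min_share` cutoff and both function names are hypothetical and are not stated in the application.

```python
def contribution_shares(records, dimension, metric):
    """Share of the metric total contributed by each value of a dimension."""
    totals = {}
    for r in records:
        totals[r[dimension]] = totals.get(r[dimension], 0) + r[metric]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {value: amount / grand_total for value, amount in totals.items()}


def dominant_values(shares, min_share=0.5):
    """Values whose share reaches a (hypothetical) significance cutoff."""
    return sorted(v for v, s in shares.items() if s >= min_share)


# Hypothetical records behind the two selected columns.
selected = [
    {"time": 1, "location": "A", "sales": 60},
    {"time": 1, "location": "B", "sales": 10},
    {"time": 2, "location": "A", "sales": 30},
]
shares = contribution_shares(selected, "location", "sales")
dominant = dominant_values(shares)
```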
- the insight data 621, 622, etc. generated based on chart 620 include insight charts 631, 632, etc. and text descriptions corresponding to the insight charts, wherein the drawing method of insight charts 631, 632, etc. is the same as that of chart 620, and the text descriptions corresponding to the insight charts may include characteristic values or extreme values in the insight data.
- insight data 641, 642 and other insight data are obtained.
- Insight data 641, 642 and other insight data are further analyzed insight data in this case, and the analysis steps are not repeated in this case.
- the insight type of insight data 641, 642 can be the same or similar insight type as the insight data 621, 622 and other insight data mentioned above, or it can be a subspace analysis of the insight data or the tracing of the insight data.
- the type of insight data 641 is the subspace analysis of insight chart 631
- its content may be the analysis of the composition of the original data records corresponding to the aggregated values of the data points constituting the insight chart 631.
- the x-axis may be the specific value of the dimension of the original data record, and the y-axis is the sales volume. This part can be used to explain the possible outliers or data records with greater contribution in the original data records constituting the insight data 621.
- insight data 642 is the original record traceability of insight chart 632
- its content may be the numerical value of the specific original data record constituting insight data 642 and the specific value of its dimension.
- the original record traceability presents these original data records through a paginated table.
- the feature values in the text description corresponding to the insight chart in the insight data can also be traced to the original record.
- this case can also generate insight data according to the aforementioned selection of chart visualization elements and data analysis steps, thereby achieving continuous further drill-down analysis of the insight data, which is not elaborated in this case.
- FIG7 shows a schematic diagram of an interface of another detailed case.
- the detailed case in FIG7 is slightly different from the detailed case in FIG6. The difference is that the dimensions in FIG7 are changed to distance, ID, number, throughput, etc., and the insight data in FIG7 are presented in different application interfaces after being generated.
- the application interface may be an interface belonging to different applications, that is, the insight data corresponding to different data records may be generated in different applications or application interfaces.
- a chart 720 in the application interface 702 is generated, and the chart visualization element in the chart 720 corresponds to at least one data record in the data table 710.
- a selection box is selected in the chart 720, and two chart visualization elements in the selection box are used to generate insight data.
- the insight data 721, 722, etc. generated finally are presented in the application interface 703.
- Two chart visualization elements are selected in the insight chart 731 or the insight chart 723 in the application interface 703 to generate insight data 741, etc., which are presented in application interface 704.
- this case supports secondary analysis and exploration of the focused feature subspace of the insight chart and derives insight data, optimizes the multi-level subspace analysis and exploration process in the automatic insight data generation auxiliary analysis process, and improves the analysis freedom from surface to point, from shallow to deep.
- Fig. 8 shows a sorting strategy 800 used for multiple insight data in this case.
- This case is an embodiment of a sorting strategy, and does not limit the process of determining the priority order of multiple insights.
- Assume that in this case, in the insight data generation process shown in FIG6 or FIG7, 10 insight data are obtained, namely insight data 810 to 819.
- the application background sorts the obtained insight data 810 to 819.
- the confidence score can be selected as the feature index value.
- different measurement methods are used.
- the application background determines the feature index values of insight data 810 to 819 and arranges them in descending order. The resulting arrangement list is shown in Figure 8.
- A threshold of the characteristic index value is then determined in order to filter out insight data with lower characteristic index values. For example, in this case, a confidence score of 0.95 is selected as the threshold, and insight data with confidence scores lower than 0.95 are filtered out, as shown in FIG8.
- The insight data 815, 818, 811 and 810, etc., arranged in descending order of priority in Figure 8, correspond to the insight data 621, 622, etc. presented in Figure 6 or the insight data 721, 722, etc. presented in Figure 7.
- The sorting process of the insight data 641, 642, etc. shown in FIG. 6 and the insight data 741, etc. shown in FIG. 7 is the same as the above arrangement process, and is not described in detail in this case.
- The apparatus shown in FIG9 can execute the methods shown in FIG4 and FIG5. It should be understood that the apparatus described below can execute the methods of the aforementioned embodiments of the present application. In order to avoid unnecessary repetition, repeated descriptions are appropriately omitted when introducing the apparatus of the embodiment of the present application.
- FIG9 is a schematic diagram of a device for generating insights according to an embodiment of the present application.
- the device 900 shown in FIG9 includes: an interaction module 910 and a processing module 920 .
- the interaction module is used to: present a first chart, the first chart includes M chart visualization elements, and each chart visualization element corresponds to at least one data record in a data source.
- the processing module is used to: confirm N chart visualization elements selected from M chart visualization elements, where M and N are positive integers greater than 1, and M is greater than or equal to N, determine all K data records corresponding to the N chart visualization elements, where K is a positive integer greater than 1, and perform joint data analysis based on the K data records to generate first insight data for the N chart visualization elements.
- the processing module is also used to determine characteristic information common to L data records out of K data records, where the L data records correspond to at least two chart visualization elements, where L is a positive integer greater than 1, and K is greater than or equal to L, and perform data analysis based on the L data records, the characteristic information common to the L data records, and all data records in the data source.
- the processing module is further used to generate a numerical distribution or data record traceability inside data records corresponding to N chart visualization elements according to an insight chart generated based on the second insight data.
- the processing module is further used to determine the priority order of P sub-insight data included in the first insight data, where P is a positive integer greater than 1, and recommend the P sub-insight data in order of priority.
- the processing module is also used to determine a characteristic index value for each sub-insight data among P sub-insight data, where the characteristic index value is used to measure the confidence or significance of each sub-insight data among the P sub-insight data, confirm Q sub-insight data whose characteristic index values are higher than a threshold value of the characteristic index value, where Q is a positive integer greater than 1, and P is greater than Q, determine the number of characteristic types of each sub-insight data among the Q sub-insight data, and determine the priority order of the Q sub-insight data by sorting them in descending order according to the number of characteristic types of each sub-insight data.
- the processing module is further used to determine dimensions and metrics in a first chart corresponding to the N chart visualization elements, and generate a query request based on the dimensions and metrics in the first chart, where the query request is used to query data records in a data source.
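Generating a query request from the chart's dimensions can be sketched with the standard `sqlite3` module. The table name `sales`, its columns and the helper `build_query` are hypothetical; the application does not state that SQL is used, so this is only one possible realization. Dimension names are checked against a known column list because SQL identifiers cannot be passed as bound parameters.

```python
import sqlite3


def build_query(table, columns, filters):
    """Return (sql, params) selecting raw records where each dimension is IN its values."""
    allowed = set(columns)
    clauses, params = [], []
    for dim, values in filters.items():
        if dim not in allowed:
            raise ValueError(f"unknown dimension: {dim}")
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f'"{dim}" IN ({placeholders})')
        params.extend(values)
    where = " AND ".join(clauses) or "1=1"
    # Assumption: `table` comes from trusted configuration, not from user input.
    return f'SELECT * FROM "{table}" WHERE {where}', params


# Hypothetical in-memory data source standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE sales ("time" INTEGER, "location" TEXT, "amount" INTEGER)')
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "A", 50), (2, "B", 5), (4, "A", 7)],
)

sql, params = build_query(
    "sales",
    ["time", "location", "amount"],
    {"time": [1, 2, 3], "location": ["A", "B", "C"]},
)
rows = conn.execute(sql, params).fetchall()
```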
- the above modules can be implemented by software or hardware.
- the implementation of the processing module 920 is described below by taking the processing module 920 as an example.
- the implementation of the interaction module 910 can refer to the implementation of the processing module 920.
- the processing module 920 may include code running on a computing instance.
- the computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Furthermore, the computing instance may be one or more.
- the processing module 920 may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code may be distributed in the same region or in different regions. Furthermore, the multiple hosts/virtual machines/containers used to run the code may be distributed in the same availability zone (AZ) or in different AZs, each AZ including one data center or multiple data centers with close geographical locations. Generally, a region may include multiple AZs.
- multiple hosts/virtual machines/containers used to run the code can be distributed in the same virtual private cloud (VPC) or in multiple VPCs.
- a VPC is set up in a region.
- a communication gateway needs to be set up in each VPC to achieve interconnection between VPCs through the communication gateway.
- the processing module 920 may include at least one computing device, such as a server, etc.
- the processing module 920 may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
- the PLD may be a complex programmable logical device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
- the multiple computing devices included in the processing module 920 can be distributed in the same region or in different regions.
- the multiple computing devices included in the processing module 920 can be distributed in the same AZ or in different AZs.
- the multiple computing devices included in the processing module 920 can be distributed in the same VPC or in multiple VPCs.
- the multiple computing devices included in the processing module 920 can be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
- the present application also provides a computing device 1000.
- the computing device 1000 includes: a bus 1002, a processor 1004, a memory 1006, and a communication interface 1008.
- the processor 1004, the memory 1006, and the communication interface 1008 communicate with each other through the bus 1002.
- the computing device 1000 may be a server or a terminal device. It should be understood that the present application does not limit the number of processors and memories in the computing device 1000.
- the bus 1002 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
- the bus may be divided into an address bus, a data bus, a control bus, etc.
- For ease of representation, the bus is shown as only one line in FIG. 10, but this does not mean that there is only one bus or one type of bus.
- the bus 1002 may include a path for transmitting information between various components of the computing device 1000 (e.g., the memory 1006, the processor 1004, and the communication interface 1008).
- Processor 1004 may include any one or more processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP) or a digital signal processor (DSP).
- the memory 1006 may include a volatile memory, such as a random access memory (RAM).
- the memory 1006 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
- the memory 1006 stores executable program codes, and the processor 1004 executes the executable program codes to respectively implement the functions of the aforementioned interaction module 910 and the processing module 920, thereby implementing the aforementioned method for generating insight data. That is, the memory 1006 stores instructions for executing the aforementioned method for analyzing and generating insight data.
- the communication interface 1008 uses a transceiver module such as, but not limited to, a network interface card or a transceiver to implement communication between the computing device 1000 and other devices or communication networks.
- the embodiment of the present application also provides a computing device cluster.
- the computing device cluster includes at least one computing device.
- the computing device can be a server, such as a central server, an edge server, or a local server in a local data center.
- the computing device can also be a terminal device such as a desktop computer, a laptop computer, or a smart phone.
- the computing device cluster includes at least one computing device 1000.
- the memory 1006 in one or more computing devices 1000 in the computing device cluster may store the same instructions for executing the above-mentioned insight data generation method.
- the memory 1006 of one or more computing devices 1000 in the computing device cluster may also respectively store some instructions for executing the above-mentioned method for generating insight data.
- the combination of one or more computing devices 1000 may jointly execute instructions for executing the above-mentioned method for generating insight data.
- the memory 1006 in different computing devices 1000 in the computing device cluster can store different instructions, which are respectively used to execute part of the functions of the above-mentioned apparatus. That is, the instructions stored in the memory 1006 in different computing devices 1000 can implement the functions of one or more modules in the interaction module and the processing module.
- one or more computing devices in the computing device cluster can be connected via a network.
- the network can be a wide area network or a local area network, etc.
- Figure 12 shows a possible implementation. As shown in Figure 12, two computing devices 1000A and 1000B are connected via a network. Specifically, each computing device connects to the network through its communication interface.
- the memory 1006 in the computing device 1000A stores instructions for the functions of the interaction module.
- the memory 1006 in the computing device 1000B stores instructions for executing the functions of the processing module.
- the functions of the computing device 1000A shown in FIG12 may also be completed by multiple computing devices 1000.
- the functions of the computing device 1000B may also be completed by multiple computing devices 1000.
- An embodiment of the present application also provides a chip, which includes a processor and a data interface.
- the processor reads instructions stored in a memory through the data interface to execute the above-mentioned method for generating insight data.
- the present application also provides a computer program product including instructions.
- the computer program product may be software or a program product including instructions that can be run on a computing device or stored in any available medium.
- When the computer program product runs on at least one computing device, the at least one computing device executes the above-mentioned method for generating insight data.
- the embodiment of the present application also provides a computer-readable storage medium.
- the computer-readable storage medium can be any available medium that can be stored by a computing device or a data storage device such as a data center containing one or more available media.
- the available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state hard disk).
- the computer-readable storage medium includes instructions that instruct the computing device to execute the above-mentioned method for generating insight data.
- the disclosed systems, devices and methods can be implemented in other ways.
- the device embodiments described above are only schematic.
- the division of modules is only a logical function division. There may be other division methods in actual implementation, such as multiple modules or components can be combined or integrated into another system, or some features can be ignored or not executed.
- Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or modules, which can be electrical, mechanical or other forms.
- modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules, that is, they may be located in one place or distributed on multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist physically separately, or two or more modules may be integrated into one module.
- If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- The technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
- the computer software product is stored in a storage medium and includes several instructions for a computer device (which can be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods of various embodiments of the present application.
- the aforementioned storage media include: U disk, mobile hard disk, read-only memory (ROM), random access memory (RAM), disk or optical disk, and other media that can store program codes.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Provided in the embodiments of the present application are an insight data generation method and apparatus. The method comprises: presenting a first chart, wherein the first chart comprises M chart visualization elements, and each chart visualization element corresponds to at least one data record in a data source; confirming N chart visualization elements selected from among the M chart visualization elements, wherein M and N are positive integers greater than 1, and M is greater than or equal to N; determining all K data records corresponding to the N chart visualization elements, wherein K is a positive integer greater than 1; and performing joint data analysis on the basis of all the K data records, so as to generate first insight data of the N chart visualization elements. By means of the technical solution provided in the present application, insight data of chart visualization elements selected in batches can be automatically generated, and subsequent exchange and analysis of the insight data are realized, thereby improving the accuracy of insight.
Description
This application claims priority to Chinese patent application No. 202211275756.8, filed with the China Patent Office on October 18, 2022 and entitled "Method and Device for Generating Insight Data", the entire contents of which are incorporated herein by reference.
Embodiments of the present application relate to the field of data intelligence, and more specifically, to a method and apparatus for generating insight data.
Automated insight data generation is a very important capability in business-intelligence-assisted analysis and decision-making, and has gradually become one of the core competitive strengths of the business intelligence products offered by vendors. The key factors in making insight data generation technology competitive are how, based on user-provided data, to design an appropriate front-end interaction flow, ensure back-end data query performance, improve capabilities such as algorithmic feature mining, related-case analysis, abnormal-pattern definition, and cause-analysis construction, and finally integrate the results into a concise, attractive front-end display with easy-to-use interaction that is presented back to the user.
In the automated intelligent insight generation scenarios for charts in current business intelligence analysis platforms, insight data analysis of a single data point helps users inspect, discover, and gain a deeper understanding of an individual chart visualization element in a visual chart while they build, browse, and analyze data. However, in existing products for automatic insight data generation, insight data generated at the analysis granularity of a single data point has poor accuracy, and the supported freedom of interaction and analysis is still lacking, so there is room for further improvement and optimization.
Summary of the invention
The embodiments of the present application provide a method and apparatus for generating insight data, which allow multiple chart visualization elements in a chart to be selected in a batch and insight data to be generated for them, thereby improving the accuracy of the insight data and the freedom of interaction.
In a first aspect, a method for generating insight data is provided, comprising: presenting a first chart, the first chart comprising M chart visualization elements, each chart visualization element corresponding to at least one data record in a data source; confirming N chart visualization elements selected from the M chart visualization elements, wherein M and N are positive integers greater than 1, and M is greater than or equal to N; determining all K data records corresponding to the N chart visualization elements, wherein K is a positive integer greater than 1; and performing joint data analysis based on all K data records to generate first insight data for the N chart visualization elements.
According to the technical solution provided in the present application, insights can be generated automatically for the batch of data that the user interactively selects as being of interest. Patterns formed by multiple chart visualization elements can be analyzed and explained, and the correlation among the chart visualization elements and their integrity as a whole are taken into account, which improves the accuracy of the insights and reduces the interaction cost.
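As a non-limiting illustration, the flow of the first aspect can be sketched in Python as follows. The data source, the field names (`month`, `region`, `sales`) and the helper functions `build_chart`, `records_for_selection` and `joint_summary` are hypothetical and are not defined in the present application; the summary statistics merely stand in for the joint data analysis, whose concrete algorithm is not specified at this point.

```python
from statistics import mean

# Hypothetical data source: each dict is one data record.
DATA_SOURCE = [
    {"id": 1, "month": "Jan", "region": "East", "sales": 120.0},
    {"id": 2, "month": "Jan", "region": "West", "sales": 80.0},
    {"id": 3, "month": "Feb", "region": "East", "sales": 200.0},
    {"id": 4, "month": "Feb", "region": "West", "sales": 40.0},
    {"id": 5, "month": "Mar", "region": "East", "sales": 90.0},
]


def build_chart(records, dimension):
    """Group records into chart visualization elements, one per dimension value."""
    elements = {}
    for record in records:
        elements.setdefault(record[dimension], []).append(record)
    return elements


def records_for_selection(chart, selected_keys):
    """Collect all K data records behind the N selected elements."""
    return [record for key in selected_keys for record in chart[key]]


def joint_summary(selected_records, all_records, metric):
    """Analyze the selected records together, relative to the whole data source."""
    selected_total = sum(r[metric] for r in selected_records)
    overall_total = sum(r[metric] for r in all_records)
    return {
        "record_count": len(selected_records),
        "selected_mean": mean(r[metric] for r in selected_records),
        "share_of_total": selected_total / overall_total,
    }


chart = build_chart(DATA_SOURCE, "month")                 # M = 3 elements
selected = records_for_selection(chart, ["Jan", "Feb"])   # N = 2 elements, K = 4 records
summary = joint_summary(selected, DATA_SOURCE, "sales")
```

The point of the sketch is that the N selected elements are analyzed as one set of K records rather than one element at a time.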
With reference to the first aspect, in certain implementations of the first aspect, performing joint data analysis based on all K data records includes: determining feature information common to L data records among the K data records, the L data records corresponding to at least two chart visualization elements, where L is a positive integer greater than 1, and K is greater than or equal to L; and performing data analysis based on the L data records, the feature information common to the L data records, and all data records in the data source.
According to the technical solution provided in the present application, users can analyze insight data for a local data subset that spans multiple chart visualization elements. The correlation among the chart visualization elements and their integrity as a whole are taken into account, which improves the accuracy of the insight data.
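A minimal sketch of one way to determine such common feature information is given below. It assumes that a "feature" is a (field, value) pair taken from fields not used to draw the chart, and that the L data records are those sharing the pair while spanning at least two selected chart visualization elements. The function name `shared_features` and the sample fields are hypothetical.

```python
from collections import defaultdict

# Hypothetical K records behind two selected elements ("Jan" and "Feb").
SELECTED_RECORDS = [
    {"month": "Jan", "region": "East", "channel": "online", "sales": 120.0},
    {"month": "Jan", "region": "West", "channel": "store", "sales": 80.0},
    {"month": "Feb", "region": "East", "channel": "online", "sales": 200.0},
    {"month": "Feb", "region": "East", "channel": "phone", "sales": 40.0},
]


def shared_features(records, chart_dimension, candidate_fields):
    """Map each (field, value) pair to the records sharing it, keeping only
    pairs whose records come from at least two chart visualization elements."""
    groups = defaultdict(list)
    for record in records:
        for field in candidate_fields:
            groups[(field, record[field])].append(record)
    return {
        feature: members
        for feature, members in groups.items()
        if len({r[chart_dimension] for r in members}) >= 2
    }


common = shared_features(SELECTED_RECORDS, "month", ["region", "channel"])
```

Here `common` keeps `("region", "East")` and `("channel", "online")`, because only those values occur in records from both selected months.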
With reference to the first aspect, in certain implementations of the first aspect, the first insight data includes at least one of the following insight data types: chart metric aggregation expansion analysis, used to analyze the distribution and composition of the original data in the data records corresponding to the N chart visualization elements; external dimension valid record count analysis, used to analyze how the number of valid records among the K data records is distributed over dimensions that are not used to draw the first chart; external dimension distribution contribution analysis, used to analyze how much the K data records contribute to the chart metric along dimensions that are not used to draw the first chart; external dimension subspace internal feature analysis, used to analyze the feature distribution inside the data records along dimensions that are not used to draw the first chart; and external high-interpretability metric analysis, used to analyze how metrics and original data records that are not used to draw the first chart are associated with the L data records.
According to the above technical solution, the method can guide users to explore analysis content for the associated data, such as the composition of abnormal aggregate values, the potential reasons why the aggregate values of chart visualization elements exhibit a specific pattern, potential high-contribution dimensions, the influence of the value distribution within a subspace on the distribution of the user-selected metric, and external metrics that are highly correlated with the chart.
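Two of the insight types listed above, the external dimension valid record count analysis and the external dimension distribution contribution analysis, can be illustrated with the following hedged sketch. It assumes that a record is "valid" when its metric value is present and that "contribution" means a share of the summed metric; neither definition is fixed by the present application, and all names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical K records; "region" is an external dimension (not drawn in the chart).
SELECTED_RECORDS = [
    {"month": "Jan", "region": "East", "sales": 120.0},
    {"month": "Jan", "region": "West", "sales": 80.0},
    {"month": "Feb", "region": "East", "sales": 200.0},
    {"month": "Feb", "region": "West", "sales": None},  # treated as an invalid record
]


def valid_record_counts(records, external_dimension, metric):
    """Count records with a metric value, per value of the external dimension."""
    return Counter(
        r[external_dimension] for r in records if r[metric] is not None
    )


def contribution_shares(records, external_dimension, metric):
    """Share of the summed metric contributed by each external dimension value."""
    totals = defaultdict(float)
    for r in records:
        if r[metric] is not None:
            totals[r[external_dimension]] += r[metric]
    grand_total = sum(totals.values())
    return {value: subtotal / grand_total for value, subtotal in totals.items()}


counts = valid_record_counts(SELECTED_RECORDS, "region", "sales")
shares = contribution_shares(SELECTED_RECORDS, "region", "sales")
```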
With reference to the first aspect, in certain implementations of the first aspect, the first chart is an insight chart generated based on second insight data, and generating the first insight data for the N chart visualization elements includes generating the value distribution inside the data records corresponding to the N chart visualization elements, or tracing those data records back to their source.
According to the above technical solution, the method supports secondary analysis and exploration of the focused feature subspace of an insight chart, from which further insights are derived. This optimizes the multi-level subspace exploration flow in analysis assisted by automatic insight generation and increases the freedom of analysis, moving from the overall view to individual points and from shallow to deep.
The value distribution inside the data records corresponding to the N chart visualization elements helps users dig further into patterns of interest in the dimension distribution chart of an algorithm-recommended insight and look for the causes of the distribution characteristics. Data record tracing lets users conveniently query the original data for abnormal local regions in the distribution of a recommended insight and look for the causes of the distribution characteristics.
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: determining a priority order of P sub-insight data included in the first insight data; and recommending the P sub-insight data according to the priority order.
According to the above technical solution, the method avoids presenting the user with a large number of unordered insight charts all at once, so that the user can quickly decide where to start exploring, which improves the efficiency with which the user obtains and analyzes insights.
With reference to the first aspect, in certain implementations of the first aspect, determining the priority order of the P sub-insight data further includes: determining a feature index value for each of the P sub-insight data, the feature index value being used to measure the confidence or significance of that sub-insight data; confirming Q sub-insight data whose feature index values are higher than a threshold of the feature index value, where Q is a positive integer greater than 1, and P is greater than Q; determining the number of feature types of each of the Q sub-insight data; and determining the priority order of the Q sub-insight data by sorting them in descending order of the number of feature types.
According to the above technical solution, the method takes into account both the confidence of the full set of features of each insight within the same insight category and the feature richness of each insight.
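The two-stage ordering described above (filter by feature index value, then sort by the number of feature types) can be sketched as follows. The `SubInsight` class, its field names, the sample scores and the threshold of 0.5 are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubInsight:
    name: str
    score: float     # feature index value (confidence or significance)
    features: tuple  # feature types detected in this sub-insight


def rank_sub_insights(sub_insights, threshold):
    """Keep sub-insights scoring above the threshold, then order them by
    the number of distinct feature types, richest first."""
    kept = [s for s in sub_insights if s.score > threshold]
    # sorted() is stable, so ties keep their original relative order.
    return sorted(kept, key=lambda s: len(set(s.features)), reverse=True)


candidates = [  # P = 4 hypothetical sub-insights
    SubInsight("A", 0.9, ("outlier", "trend")),
    SubInsight("B", 0.4, ("outlier", "trend", "seasonality")),
    SubInsight("C", 0.8, ("outlier", "trend", "seasonality")),
    SubInsight("D", 0.7, ("trend",)),
]
ranked = rank_sub_insights(candidates, threshold=0.5)  # Q = 3 remain
ranked_names = [s.name for s in ranked]
```

In this sketch sub-insight B is dropped despite having the most feature types, because its feature index value does not exceed the threshold; the remaining three are ordered C, A, D.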
With reference to the first aspect, in certain implementations of the first aspect, determining all K data records corresponding to the N chart visualization elements further includes: determining the dimensions and metrics of the first chart that correspond to the N chart visualization elements; and generating a query request based on those dimensions and metrics, the query request being used to query data records in the data source.
According to the above technical solution, the method quickly locates the chart information carried by a chart visualization element. This chart information enables fast querying of the data records corresponding to the chart visualization element, and supports selection of the dimensions and metrics of interest for insight data generation.
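As one possible illustration, the query request may be a parameterized SQL statement built from the chart's dimension and metric. The sketch below uses Python's standard `sqlite3` module with an in-memory table. The table name `sales`, its columns, and the function `build_record_query` are hypothetical, and skipping rows whose metric is NULL is an assumption rather than something stated in the present application.

```python
import sqlite3


def build_record_query(table, dimension, selected_values, metric):
    """Build a parameterized query for the raw records behind the selected elements.

    Identifiers cannot be bound as parameters, so `table`, `dimension` and
    `metric` are assumed to come from trusted chart metadata.
    """
    placeholders = ", ".join("?" for _ in selected_values)
    sql = (
        f'SELECT * FROM "{table}" '
        f'WHERE "{dimension}" IN ({placeholders}) '
        f'AND "{metric}" IS NOT NULL'  # assumption: rows without a metric value are skipped
    )
    return sql, list(selected_values)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, month TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "Jan", "East", 120.0),
        (2, "Jan", "West", 80.0),
        (3, "Feb", "East", 200.0),
        (4, "Feb", "West", None),
        (5, "Mar", "East", 90.0),
    ],
)

# The user selected the "Jan" and "Feb" elements of a month/amount chart.
sql, params = build_record_query("sales", "month", ["Jan", "Feb"], "amount")
rows = conn.execute(sql, params).fetchall()
record_ids = sorted(row[0] for row in rows)
```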
In a second aspect, an apparatus for generating insight data is provided, comprising: an interaction module, configured to present a first chart, the first chart comprising M chart visualization elements, each chart visualization element corresponding to at least one data record in a data source; and a processing module, configured to confirm N chart visualization elements selected from the M chart visualization elements, wherein M and N are positive integers greater than 1, and M is greater than or equal to N, to determine all K data records corresponding to the N chart visualization elements, wherein K is a positive integer greater than 1, and to perform joint data analysis based on all K data records to generate first insight data for the N chart visualization elements.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to determine feature information common to L data records among the K data records, the L data records corresponding to at least two chart visualization elements, where L is a positive integer greater than 1, and K is greater than or equal to L, and to perform data analysis based on the L data records, the feature information common to the L data records, and all data records in the data source.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to generate, from an insight chart generated based on second insight data, the value distribution inside the data records corresponding to the N chart visualization elements, or to trace those data records back to their source.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to determine a priority order of P sub-insight data included in the first insight data, where P is a positive integer greater than 1, and to recommend the P sub-insight data according to the priority order.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to determine a feature index value for each of the P sub-insight data, the feature index value being used to measure the confidence or significance of that sub-insight data, to confirm Q sub-insight data whose feature index values are higher than a threshold of the feature index value, where Q is a positive integer greater than 1 and P is greater than Q, to determine the number of feature types of each of the Q sub-insight data, and to determine the priority order of the Q sub-insight data by sorting them in descending order of the number of feature types.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to determine the dimensions and metrics of the first chart that correspond to the N chart visualization elements, and to generate a query request based on those dimensions and metrics, the query request being used to query data records in the data source.
In a third aspect, a computing device is provided, comprising a processor and a memory, wherein the memory is configured to store instructions, and the processor is configured to execute the instructions stored in the memory, so that the computing device performs the method in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, a computing device cluster is provided, comprising at least one computing device, each computing device comprising a processor and a memory, wherein the memory is configured to store instructions, and the processor is configured to call the instructions from the memory and run them, so that the computing device cluster performs the method in the first aspect or any possible implementation of the first aspect.
Optionally, the processor may be a general-purpose processor, and may be implemented in hardware or in software. When implemented in hardware, the processor may be a logic circuit, an integrated circuit, or the like. When implemented in software, the processor may be a general-purpose processor that is implemented by reading software code stored in a memory; the memory may be integrated in the processor or may exist independently outside the processor.
In a fifth aspect, a chip is provided, which obtains instructions and executes them to implement the method in the first aspect or any possible implementation of the first aspect.
Optionally, as an implementation, the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface to perform the method in the first aspect or any possible implementation of the first aspect.
Optionally, as an implementation, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor performs the method in the first aspect or any possible implementation of the first aspect.
In a sixth aspect, a computer program product containing instructions is provided. When the instructions are run by a computing device cluster, the computing device cluster performs the method in the first aspect or any possible implementation of the first aspect.
In a seventh aspect, a computer-readable storage medium is provided, comprising computer program instructions. When the computer program instructions are executed by a computing device cluster, the computing device cluster performs the method in the first aspect or any possible implementation of the first aspect.
As examples, these computer-readable storage media include, but are not limited to, one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), flash memory, electrically erasable PROM (EEPROM), and hard drive.
Optionally, as an implementation, the above storage medium may specifically be a non-volatile storage medium.
FIG. 1 is a schematic diagram of an application scenario for insight data generation provided in an embodiment of the present application.
FIG. 2 is a schematic diagram of another application scenario for insight data generation provided in an embodiment of the present application.
FIG. 3 is a schematic diagram of a system architecture provided in an embodiment of the present application.
FIG. 4 is a schematic diagram of an insight data generation process provided in an embodiment of the present application.
FIG. 5 is a schematic diagram of a sorting strategy provided in an embodiment of the present application.
FIG. 6 is a schematic diagram of an example of an insight data generation process provided in an embodiment of the present application.
FIG. 7 is a schematic diagram of another example of an insight data generation process provided in an embodiment of the present application.
FIG. 8 is a schematic diagram of an example of a sorting strategy provided in an embodiment of the present application.
FIG. 9 is a schematic structural block diagram of an apparatus for generating insight data provided in an embodiment of the present application.
FIG. 10 is a schematic structural block diagram of a computing device provided in an embodiment of the present application.
FIG. 11 is a schematic structural block diagram of a computing device cluster provided in an embodiment of the present application.
FIG. 12 is a schematic structural block diagram of another computing device cluster provided in an embodiment of the present application.
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
Unless otherwise specified, all technical and scientific terms used in the embodiments of the present application have the same meaning as commonly understood by those skilled in the technical field of the present application. The terms used in the present application are only for the purpose of describing specific embodiments and are not intended to limit the scope of the present application.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the processes do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and the numbering should not constitute any limitation on the implementation of the embodiments of the present application.
In addition, in the embodiments of the present application, words such as "exemplary" and "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as an "example" in the present application should not be interpreted as being more preferred or more advantageous than other embodiments or designs. Rather, the word "example" is used to present a concept in a concrete manner.
In the embodiments of the present application, the terms "相应的" (corresponding, relevant) and "对应的" (corresponding) may sometimes be used interchangeably. It should be noted that, when the distinction between them is not emphasized, they are intended to express the same meaning.
The network architecture and business scenarios described in the embodiments of the present application are intended to illustrate the technical solutions of the embodiments more clearly, and do not limit the technical solutions provided in the embodiments. A person of ordinary skill in the art will appreciate that, as network architectures evolve and new business scenarios emerge, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
References in this specification to "one embodiment", "some embodiments", and the like mean that a particular feature, structure, or characteristic described in connection with that embodiment is included in one or more embodiments of the present application. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "including", "comprising", "having", and their variants all mean "including but not limited to", unless specifically emphasized otherwise.
In the present application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" can mean: A alone, both A and B, or B alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression refers to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, or c can mean: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.
To facilitate understanding of the present application, the terms used in the present application are introduced first.
1. Dimension: a dimension is a way of classifying the fields in a data set. A field that serves to categorize the data is called a dimension. Its values are usually enumerable, for example "month" or "ID".
2. Metric: an indicator field that holds quantifiable data is called a metric. Its values are usually numeric.
3. Aggregate value: an aggregate value is the summary or total value that results from applying a calculation, such as sum aggregation or mean aggregation, to a single field of the data set over a filtered data subset.
4. Record: one or more rows in the database table that makes up a data set.
5. Chart visualization element: a chart visualization element is a selectable data point in a visual chart that summarizes some underlying record values in the data. The data of a chart visualization element can consist of a single record or of multiple records aggregated together. Chart visualization elements can be displayed in a variety of ways, such as points, lines, and shapes.
6. Internal and external: "internal" means that a dimension or metric involved in the analysis is used to draw the chart the user is currently analyzing; "external" means that a dimension or metric involved in the analysis is not used to draw that chart.
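The terms above can be made concrete with a small, purely illustrative Python example. The records, the field names and the `aggregate` helper are hypothetical: `month` plays the role of the chart dimension, `sales` the chart metric, and `region` an external dimension.

```python
from collections import defaultdict
from statistics import mean

# Four records of a hypothetical data set.
RECORDS = [
    {"month": "Jan", "region": "East", "sales": 120.0},
    {"month": "Jan", "region": "West", "sales": 80.0},
    {"month": "Feb", "region": "East", "sales": 200.0},
    {"month": "Feb", "region": "West", "sales": 40.0},
]


def aggregate(records, dimension, metric, func):
    """Compute one aggregate value of `metric` per value of `dimension`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[dimension]].append(record[metric])
    return {key: func(values) for key, values in groups.items()}


# Each key would be drawn as one chart visualization element (e.g. one bar).
totals = aggregate(RECORDS, "month", "sales", sum)      # sum aggregation
averages = aggregate(RECORDS, "month", "sales", mean)   # mean aggregation

# Fields used to draw the chart are internal; the remaining ones are external.
internal_fields = {"month", "sales"}
external_fields = set(RECORDS[0]) - internal_fields
```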
自动化的见解数据生成是商业智能辅助分析决策中非常重要的能力,逐渐成为各厂商提供的商业智能产品中的核心竞争力之一。如何基于用户提供的数据,设计恰当的前端交互流程,保证后端数据查询性能,提升算法特征挖掘、关联案例分析、异常模式定义、成因分析构建等能力,最终整合以简洁美观的前端展示和易用的交互,反馈呈现给用户是见解数据生成类技术的竞争力构建的关键因素。Automatic insight data generation is a very important capability in business intelligence-assisted analysis and decision-making, and has gradually become one of the core competitive advantages of business intelligence products provided by various manufacturers. How to design appropriate front-end interaction processes based on user-provided data, ensure back-end data query performance, improve algorithm feature mining, related case analysis, abnormal pattern definition, cause analysis construction and other capabilities, and finally integrate simple and beautiful front-end display and easy-to-use interaction, and present feedback to users is a key factor in building the competitiveness of insight data generation technology.
为了更好地理解本申请实施例的方案,下面先结合图1对本申请实施例可能的应用场景进行简单的介绍。In order to better understand the solution of the embodiment of the present application, the possible application scenarios of the embodiment of the present application are briefly introduced below in conjunction with Figure 1.
图1示出了一种见解数据生成系统,该见解数据生成系统可包括用户设备以及数据处理设备。其中,用户设备可包括手机、个人电脑或者信息处理中心等智能终端。通常情况下用户设备可以作为见解数据生成请求的发起端。FIG1 shows an insight data generation system, which may include a user device and a data processing device. The user device may include a smart terminal such as a mobile phone, a personal computer or an information processing center. In general, the user device may be used as the initiator of the insight data generation request.
Optionally, the data processing device may be a device or server with data processing capability, such as a cloud server, a network server, an application server, or a management server. The data processing device receives, through an interaction interface, an instruction from the smart terminal to select chart visualization elements, and then performs data processing such as machine learning, deep learning, search, reasoning, and decision-making by means of a memory that stores data and a processor that processes data. The memory in the data processing device is a general term; it may be a local storage device that stores historical data, or a storage manager in a database.
Optionally, in the insight data generation system shown in Figure 1, the user device may receive a user instruction selecting one or more chart visualization elements in a visualization chart, and then send a filtering and query request to the data processing device to retrieve the fine-grained original records behind the selected chart visualization elements. The data processing device then performs data analysis on the original data records corresponding to the selected chart visualization elements, thereby generating insight data for the one or more chart visualization elements.
In Figure 1, the data processing device may execute the insight data generation method of the embodiments of the present application. Note that although Figure 1 depicts the user device and the data processing device as separate devices, in other embodiments of the present application the two may be implemented by the same apparatus.
Figure 2 shows another insight data generation system. In Figure 2, the user device itself serves as the data processing device: it receives input directly from the user and processes it with its own hardware. The specific process is similar to that of Figure 1; refer to the description above, which is not repeated here.
The user device in Figure 2 may be a server with data processing capability, such as a cloud server, a network server, an application server, or a management server, or an electronic device with data processing capability, such as a desktop computer, a mobile computer, a tablet computing device, or a mobile communication device.
In the insight data generation system shown in Figure 2, the user device may receive a user instruction selecting one or more chart visualization elements in a visualization chart; the user device itself then initiates the request and performs data analysis on the selected chart visualization elements, thereby generating insight data for the one or more chart visualization elements.
In Figure 2, the user device itself may execute the insight data generation method of the embodiments of the present application.
In the embodiments of the present application, the processors in Figure 1 and Figure 2 may perform data analysis according to business needs. For example, they may perform insight analysis on a chart, supporting multiple analysis modes, including statistical value feature analysis, distribution feature analysis, null value alert analysis, zero value alert analysis, high-correlation measure analysis, and global-subset difference analysis. Using statistical analysis and traditional machine learning, different categories of features can be detected for different types of insights, and varied feature descriptions can be generated in a customized way, yielding an insight analysis of the data of interest behind the chart visualization elements the user has selected.
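As an illustration of two of the analysis modes listed above, the following minimal Python sketch flags a measure column whose share of null or zero values exceeds a threshold. The function name `value_alerts` and the 20% threshold are assumptions introduced here for illustration; the application does not specify how these alerts are computed.

```python
from typing import List, Optional, Tuple


def value_alerts(values: List[Optional[float]],
                 max_ratio: float = 0.2) -> List[Tuple[str, float]]:
    """Return (alert kind, ratio) pairs for null and zero values.

    `max_ratio` is an assumed threshold, not a value from the application.
    """
    n = len(values)
    if n == 0:
        return []
    null_ratio = sum(v is None for v in values) / n
    zero_ratio = sum(v is not None and v == 0 for v in values) / n
    alerts = []
    if null_ratio > max_ratio:
        alerts.append(("null", null_ratio))
    if zero_ratio > max_ratio:
        alerts.append(("zero", zero_ratio))
    return alerts


# Two of five values are null (ratio 0.4); one is zero (ratio 0.2).
measure = [1.0, None, None, 0.0, 5.0]
alerts = value_alerts(measure)
```

In this sketch only the null alert is raised, because the zero ratio equals the threshold rather than exceeding it.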
As shown in Figure 3, an embodiment of the present application provides a system architecture 100. The system architecture 100 may include an execution device 110, a database 130, a client device 140, a data storage system 150, and a data acquisition device 160. It should be understood that Figure 3 is only illustrative; optionally, the system architecture may include more or fewer databases and execution devices, or other functional modules.
In Figure 3, the data acquisition device 160 may be used to collect chart data; in the embodiments of the present application, chart data may be used to generate a visualization chart containing chart visualization elements. After collecting the chart data, the data acquisition device 160 stores it in the database 130. Note that in practice the training data maintained in the database 130 does not necessarily all come from the data acquisition device 160; it may also be received from other devices, for example obtained directly from the cloud or elsewhere. Note also that the execution device 110 does not necessarily generate insights based entirely on the training data maintained in the database 130; it may also obtain data from the cloud or elsewhere to generate insights. The above description should not be taken as limiting the embodiments of the present application.
Optionally, the database may be a hardware device; it may be integrated into the execution device 110, or deployed on the cloud or on another network server.
The generation of visualization charts and insight data may be applied in different systems or devices, for example on the execution device 110 shown in Figure 3, with the results presented on the application interface 120. The execution device 110 may be the data processing device in Figure 1; it may be a terminal, such as a mobile phone, a tablet computer, a laptop computer, an AR/VR device, or a vehicle-mounted terminal; or it may be a server or the cloud. In Figure 3, the execution device 110 may be configured with an input/output (I/O) interface 112 for exchanging data with external devices. A user may input data to the I/O interface 112 through the client device 140. In the embodiments of the present application, the input data may include an instruction selecting one or more chart visualization elements, and the dimensions and measures of the visualization chart to which those elements belong. The execution device 110 may call data, code, and the like in the data storage system 150 for the corresponding processing, and may also store input data and the like obtained from that processing in the data storage system 150.
Finally, the I/O interface 112 returns the processing result, for example the generated insight data, to the client device 140. The client device may also be the execution device 110 in Figure 3, in which case the returned insight data is presented on the application interface 120 of the execution device.
The execution device 110 includes an application interface 120. Optionally, the application interface 120 may be the interface of a client application stored locally on the execution device 110, or the interface of a client application located on a remote server and accessible over a network (such as the Internet or an intranet), for example an application interface that is hosted in a browser-controlled environment or coded in a browser-supported language and that relies on a web browser to perform data computation.
The application interface 120 may include a visualization chart interface 121 and an insight data interface 125; alternatively, the visualization chart interface 121 and the insight data interface 125 may be presented through multiple application interfaces.
The visualization chart interface 121 may include one or more charts of different types, together with interface configuration information. The interface configuration information may include modules such as dimension options, measure options, and a chart interface settings module, or elements such as the axis configuration information used to select the chart to draw and the chart's raw data. It should be understood that Figure 3 is only an example; optionally, the visualization chart interface 121 may include further selection modules, such as a chart type selection module.
The insight interface 125 may include one or more pieces of insight data 126, 127; insight data 126 and insight data 127 may each include an insight chart or insight text. The insight data is derived from chart 122 or chart 123 in the visualization chart interface 121. The insight data interface may also include an insight mode selection module, or an element for selecting the analysis type used to generate insight data. The analysis type may be distribution feature analysis, null value alert analysis, zero value alert analysis, high-correlation measure analysis, global-subset difference analysis, and so on, or a customized feature analysis. The resulting analysis may be described using different chart types and corresponding textual insight information, and displayed as insight charts or insight text. It should be understood that Figure 3 is only an example; optionally, the insight interface 125 may include further modules, such as an insight data sorting module.
Note that Figure 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application; the positional relationships among the devices, components, and modules shown do not constitute any limitation. For example, in Figure 3 the data storage system 150 is external memory relative to the execution device 110; in other cases the data storage system 150 may be placed inside the execution device 110. Optionally, the system architecture may include other modules, such as a chart drawing module. Optionally, the visualization chart interface and the insight interface need not be in the same application interface. The scenarios to which the embodiments of the present application apply are not limited to that shown in Figure 3.
In current scenarios for automated intelligent insight generation on charts in business intelligence analysis platforms, insight analysis of single data points helps users inspect, discover, and understand individual chart visualization elements in a visualization chart as they build, browse, and analyze data. However, when a user wants to analyze a local data subset containing multiple chart visualization elements, the data records of the unselected chart visualization elements interfere with the insight analysis of each individually selected element. As a result, the accuracy of an overall insight analysis assembled from repeated single-element selections is hard to guarantee, and the interaction cost is high.
For example, suppose a chart contains three outliers that exhibit the same or a similar phenomenon. These outliers are the user's data of interest, and the user wants insight data on what caused them. If the user selects only one of the outliers for insight data generation, the other two outliers still take part in the analysis. The insight data may then be biased; for example, the selected outlier may be judged normal because of the presence of the other two. Unselected chart visualization elements can therefore interfere with the insight data generated for a single selected chart visualization element.
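The masking effect described above can be reproduced with a small numerical sketch. The z-score rule, the threshold of 3, and the sample values below are assumptions chosen for illustration; the application does not prescribe a particular outlier test.

```python
from statistics import mean, pstdev
from typing import List


def is_outlier(value: float, baseline: List[float],
               threshold: float = 3.0) -> bool:
    """Flag `value` if it lies more than `threshold` standard deviations
    from the mean of `baseline` (an assumed z-score rule)."""
    spread = pstdev(baseline)
    if spread == 0:
        return value != mean(baseline)
    return abs(value - mean(baseline)) / spread > threshold


normal = [100, 102, 98, 101, 99, 100, 103, 97, 100]
outliers = [300, 310, 305]

# Single-point selection: the two unselected outliers stay in the baseline
# and inflate its mean and spread.
single = is_outlier(outliers[0], normal + outliers[1:])

# Joint selection: all three outliers are excluded from the baseline.
joint = [is_outlier(v, normal) for v in outliers]
```

Under these assumptions the single-point check does not flag the selected value, while the joint check flags all three, which is the interference the paragraph above describes.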
Consequently, existing automatic insight generation products analyze data at single-point granularity and lack an auxiliary insight generation scheme for a local portion of the data that the user selects in batch to indicate interactive interest. The interaction and analysis freedom they support is also limited, leaving room for further improvement and optimization.
In view of this, an embodiment of the present application provides an insight data generation solution. Figure 4 shows a schematic flowchart of an insight data generation method 400 provided by an embodiment of the present application. The method of Figure 4 may be executed by the data processing device of Figure 1 or by the user device of Figure 2.
Step 410: Present a first chart. The first chart includes M chart visualization elements, each corresponding to at least one data record in a data source.
Optionally, the first chart may be presented in the visualization chart interface 121 of the application interface 120, or in any other visualization interface. The data records used to draw the first chart may be all or part of the data in the database 130, or all or part of the data in one or more tables of any data source.
The first chart includes M chart visualization elements, which are graphical representations of data records, such as the bars of a bar chart, the discrete data points of a scatter plot, the data points and adjacent line segments of a line chart, or the sectors of a pie or donut chart. Each chart visualization element is drawn from data records. A single chart visualization element may correspond to a single data record, or to an aggregate value of multiple data records, that is, a summary or total value produced by applying a computation to multiple data records, such as sum aggregation or mean aggregation.
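The relationship between data records and aggregated chart visualization elements can be sketched as follows. The field layout (month, sales) and the helper name `aggregate` are hypothetical and serve only to illustrate sum and mean aggregation.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical raw records: (dimension value, measure value).
records: List[Tuple[str, float]] = [
    ("Jan", 10.0), ("Jan", 30.0),
    ("Feb", 5.0),
    ("Mar", 8.0), ("Mar", 12.0), ("Mar", 40.0),
]


def aggregate(rows: List[Tuple[str, float]], how: str) -> Dict[str, float]:
    """Group rows by dimension value and reduce the measure with `how`."""
    groups = defaultdict(list)
    for dim_value, measure in rows:
        groups[dim_value].append(measure)
    reducers = {"sum": sum, "mean": mean}
    return {dim: reducers[how](values) for dim, values in groups.items()}


# Each key is one bar; "Feb" is backed by one record, "Mar" by three.
bars_sum = aggregate(records, "sum")
bars_mean = aggregate(records, "mean")
```

Here the six records yield M = 3 bars, so selecting a bar implicitly selects every record behind it.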
Step 420: Confirm N chart visualization elements selected from the M chart visualization elements, where M and N are positive integers greater than 1 and M is greater than or equal to N.
The technical solution of this embodiment supports analysis and insight data generation after N chart visualization elements of the visualization chart to be analyzed are selected in batch. Rather than supporting analysis of only a single chart visualization element, step 420 supports interactions that select multiple chart visualization elements, and confirms the selected elements.
Optionally, step 420 may also support interactions that select a single chart visualization element, and confirm the selected element.
For example, the user may click or brush to select one or N chart visualization elements in the visualization chart interface of the application interface. For instance, if the application interface is that of a spreadsheet application on a desktop computer, the user may drag the mouse to brush out a selection box; the one or N chart visualization elements inside the selection box are determined to be the user's selection. The chart visualization elements may be one or N bars in a bar chart, one or N data points in a line chart or scatter plot, or one or N sectors in a pie or donut chart.
Optionally, the user may also select one or N chart visualization elements by single-click or multi-select clicking. For example, the user may multi-select one or N chart visualization elements by clicking with the mouse; the clicked elements are determined to be the user's selection.
Optionally, the selected chart visualization elements may be non-contiguous along the dimension of the chart's x-axis; selected elements may be separated by one or more other chart visualization elements.
Optionally, batch selection of data of interest is supported on multiple chart types, and the user's brushed highlight is preserved when the chart type is switched, so that the data used for insight generation stays highlighted throughout.
It should be understood that the batch selection in step 420 can be performed on multiple chart types, such as bar charts, line charts, and scatter plots. Brushed data is highlighted relative to unbrushed data, and stays highlighted when the chart type used to analyze the data is switched. For example, when charts are drawn from the same data, brushing the chart visualization elements of interest on a bar chart highlights them; when the user switches the bar chart to a line chart, those elements remain highlighted. Likewise, when the user switches the chart type of a bar chart on which elements have been brushed, the visualization elements in the new chart that contain the data records of the brushed elements are highlighted.
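One way to keep the highlight stable across chart types is to store the selection as record identifiers rather than as chart elements. The sketch below assumes this representation; the names `highlighted`, `bar_chart`, and `scatter` are hypothetical and not taken from the application.

```python
from typing import Dict, Set


def highlighted(elements: Dict[str, Set[int]],
                selected_ids: Set[int]) -> Set[str]:
    """Names of elements that contain at least one selected record id."""
    return {name for name, ids in elements.items() if ids & selected_ids}


# Bar chart: one bar per month, each aggregating one or more records.
bar_chart = {"Jan": {1, 2}, "Feb": {3}, "Mar": {4, 5}}
# Scatter plot of the same data: one point per record.
scatter = {f"p{i}": {i} for i in range(1, 6)}

# The user brushes the Jan and Mar bars; the selection is kept as record ids.
selected_ids = bar_chart["Jan"] | bar_chart["Mar"]

bars_on = highlighted(bar_chart, selected_ids)
points_on = highlighted(scatter, selected_ids)
```

Because the selection is tied to records, any element in the new chart that contains a brushed record is highlighted, whatever the chart type.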
Step 430: Determine all K data records corresponding to the N chart visualization elements, where K is a positive integer greater than 1.
Optionally, all K data records corresponding to the N chart visualization elements may be determined by identifying the form of interaction with which the user selected the chart visualization elements. For example, the form of interaction may be the user's brushing operation. When the user brushes in batch along the chart's horizontal axis, the interaction selects the chart visualization element data for which the horizontal-axis dimension takes particular values; the dimension data bound to the chart visualization elements may be those particular values of the corresponding dimension, and the analysis granularity may be the specific category of the dimension. From this form of interaction it follows that the user is interested in the features of the chart visualization elements at particular values of the horizontal-axis dimension, so the subsequent insight data will also focus on that dimension.
For example, suppose the first chart is a bar chart whose x-axis dimension is time, and the user brushes three bars along the x-axis of the visualization chart. The time periods corresponding to the three bars constitute field A. The solution then determines that the user's interaction is brushing along the x-axis, that the dimension data field bound to the x-axis is field A of some time type, and that the analysis granularity is the specific time type, such as month or time period. Based on the dimension data field, all data records corresponding to dimension data field A and the specific time type are filtered from the data source or database.
Optionally, all K data records corresponding to the N chart visualization elements may be determined by directly extracting the enumeration values of the data points corresponding to the interactively selected chart visualization elements. An enumeration value of a data point may be the specific value of the field corresponding to the chart visualization element. For example, when the chart's dimension is month, the enumeration values may be one or more of the values month 1 through month 12 corresponding to the selected elements, or combinations of month values with the values of external dimension information of the data set that is bound to the horizontal axis or legend of the selected elements. The dimension combination determined for the chart visualization elements may include the specific values of external dimensions that do not take part in drawing the first chart and the selected enumeration values of the dimensions that do; it may also include the measures related to the elements, or other related information.
The technical solution of the present application may combine this information into filter logic used to filter and look up the original data records in the data set or database. The subsequent insight data will also focus on the filtered fine-grained original records and on the dimension or dimension combination related to the chart visualization elements. For example, when the enumeration values of the dimensions corresponding to the N chart visualization elements belong to three different fields A, B, and C, the logic for filtering the first chart's data may be an OR combination of conditions on the values of A, B, and C.
It should be understood that the above flow is a very simple scenario. The solution also supports composing the chart's x-axis from multiple dimension fields, as well as legend fields and complex filter conditions configured on the chart itself; in such cases the filter logic is composed of multiple nested filter logic modules.
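The nesting of filter logic described above can be sketched with composable predicates. The helpers `equals`, `any_of`, and `all_of`, and the sample fields `month` and `region`, are hypothetical; they illustrate one possible way to combine an OR over brushed values with a chart-level condition.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]


def equals(field: str, value: Any) -> Predicate:
    """Leaf condition: the record's field equals the given value."""
    return lambda record: record.get(field) == value


def any_of(*predicates: Predicate) -> Predicate:
    """OR combination of nested conditions."""
    return lambda record: any(p(record) for p in predicates)


def all_of(*predicates: Predicate) -> Predicate:
    """AND combination of nested conditions."""
    return lambda record: all(p(record) for p in predicates)


records = [
    {"id": 1, "month": "Jan", "region": "East"},
    {"id": 2, "month": "Jan", "region": "West"},
    {"id": 3, "month": "Feb", "region": "East"},
    {"id": 4, "month": "Mar", "region": "East"},
]

# OR over the brushed x-axis values, nested inside an AND with a
# filter assumed to be configured on the chart itself.
brushed = any_of(equals("month", "Jan"), equals("month", "Mar"))
chart_filter = equals("region", "East")
combined = all_of(chart_filter, brushed)

matching_ids = [r["id"] for r in records if combined(r)]
```

Each helper returns a predicate, so arbitrarily deep nesting is expressed by passing one combination as an argument to another.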
Optionally, the solution may convert the one or N chart visualization elements the user interactively selects on the first chart (each corresponding to one row, or to the aggregate of some rows, of the original data set) into a query request that retrieves all K data records of the selected elements from the original data set or database; all K data records are then used for subsequent insight generation.
Optionally, the query request may be any combination of: all information about the requested fields, a list of filter operators, a list of filter enumeration values, and the filter logic. The request may be addressed to a back-end module. For example, a Structured Query Language (SQL) WHERE clause may be generated, the original table records queried, and the subset of data of interest corresponding to the user-selected chart visualization elements returned to the algorithm module.
For example, in one embodiment of the present application, if the user-interest filter derived from the interaction is all records in the analyzed data set that match the selected values of the dimension A field, then the SQL query finally generated contains a WHERE clause in which the multiple values of the dimension A field are combined by OR logic implemented with the IN operator.
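The generation of such a WHERE clause can be sketched with Python's standard `sqlite3` module. The table `sales`, its columns, and the helper `build_in_query` are hypothetical; an embodiment might equally target a distributed SQL engine. The table and field names are assumed to come from trusted chart configuration, since SQL identifiers cannot be passed as bound parameters.

```python
import sqlite3
from typing import List, Sequence, Tuple


def build_in_query(table: str, field: str,
                   values: Sequence[str]) -> Tuple[str, List[str]]:
    """Build a parameterized SELECT whose WHERE clause ORs the selected
    values of one dimension field through the IN operator."""
    if not values:
        raise ValueError("at least one selected value is required")
    placeholders = ", ".join("?" for _ in values)
    sql = f'SELECT * FROM "{table}" WHERE "{field}" IN ({placeholders})'
    return sql, list(values)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (month, amount) VALUES (?, ?)",
    [("Jan", 10.0), ("Jan", 30.0), ("Feb", 5.0), ("Mar", 8.0)],
)

# The user brushed the Jan and Mar bars.
sql, params = build_in_query("sales", "month", ["Jan", "Mar"])
rows = conn.execute(sql + " ORDER BY id", params).fetchall()
conn.close()
```

The selected values are bound as parameters rather than interpolated into the SQL text, which keeps the generated query safe when the values originate from user interaction.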
Optionally, the solution is not limited to single-table queries on a single data source; it may also support queries across multiple related tables in the original data source, and may support federated queries and subsequent analysis. For example, in one embodiment of the present application, the underlying functionality is based on a distributed SQL query engine that merges multiple tables at the data set level to obtain the data.
Step 440: Perform joint data analysis based on the K data records to generate first insight data for the N chart visualization elements.
It should be understood that joint data analysis of the K data records differs both from analyzing a single selected chart visualization element and from selecting multiple elements, analyzing each one separately, and then merging the results. Joint data analysis treats the K data records as a whole and analyzes them together against the other data in the data set, producing insight data obtained by comparing the N chart visualization elements, or at least two of them, with the other data records.
Optionally, the K data records are treated as a whole and the association among them is determined; this association determines the feature information the K data records share. For example, the K data records may have the same or similar external dimensions, may be correlated with one another, or may exhibit the same or opposite measure-value phenomena; the shared feature information is then the corresponding external dimension, correlation analysis data, or measure-value phenomenon. Based on the shared feature information, the data records in the data source that have the shared feature information are filtered out, and the K data records are analyzed together with them to form the insight data for the N chart visualization elements.
Optionally, L of the K data records may share feature information, the L data records corresponding to at least two chart visualization elements, where L is a positive integer greater than 1 and K is greater than L. Insight data for the at least two chart visualization elements is generated from the L data records, their shared feature information, and a set of data records used for comparison. The comparison records may be the data records corresponding to the (M-N) chart visualization elements not selected in step 420, or the (K-L) of the K data records that are not among the L.
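A minimal sketch of this joint analysis follows: it looks for an external dimension on which all selected records agree and then compares the selected records with the unselected ones. The field names, the sample data, and the use of a difference of means as the comparison are assumptions for illustration, not the analysis prescribed by the application.

```python
from statistics import mean
from typing import Any, Dict, List

Record = Dict[str, Any]


def shared_features(records: List[Record],
                    candidate_fields: List[str]) -> Dict[str, Any]:
    """Return {field: value} for fields on which all records agree."""
    shared = {}
    for name in candidate_fields:
        values = {r[name] for r in records}
        if len(values) == 1:
            shared[name] = values.pop()
    return shared


# `channel` is an external dimension: it is not used to draw the chart.
records = [
    {"month": "Jan", "channel": "store", "sales": 100.0},
    {"month": "Feb", "channel": "online", "sales": 300.0},
    {"month": "Mar", "channel": "store", "sales": 98.0},
    {"month": "Apr", "channel": "online", "sales": 310.0},
    {"month": "May", "channel": "store", "sales": 102.0},
    {"month": "Jun", "channel": "online", "sales": 305.0},
]

selected_months = {"Feb", "Apr", "Jun"}
selected = [r for r in records if r["month"] in selected_months]
rest = [r for r in records if r["month"] not in selected_months]

common = shared_features(selected, ["channel"])
gap = mean(r["sales"] for r in selected) - mean(r["sales"] for r in rest)
```

Treating the three selected bars as one group exposes both what they have in common and how far they sit from the comparison records, which a one-at-a-time analysis would not show.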
The technical solution of the present application can analyze multiple chart visualization elements and their original data records simultaneously, yielding insight data that contains the association information among those elements. This insight data differs from insight data obtained by analyzing a single selected chart visualization element, or by analyzing single elements several times and then merging the results. It reduces the interference that unselected but associated chart visualization elements exert on the selected ones during insight analysis.
It should be understood that insight data for at least two chart visualization elements corresponding to L data records may be generated more than once, and the value of L may differ each time, so that multiple different data candidate sets are eventually formed. These candidate sets can be analyzed by a strategy or algorithm to form multiple different pieces of insight data, with each data candidate set serving as a subspace for the strategy or algorithm analysis.
The present application analyzes the data candidate sets in the data subset with an insight data generation strategy or algorithm to generate insight data that reveals the data features of the subspace.
In one embodiment of the present application, the input of the insight data generation algorithm may include the full set of table data records, the original records of interest produced by the filter conditions and query requests that result from the chart visualization elements the user interactively selected, the feature information shared by those original records, and algorithm parameters that the front end lets the user configure interactively. The insight data produced by the algorithm may include the raw chart data needed to draw the insight chart, chart type information, axis configuration information, textual insight descriptions, and so on.
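The inputs and outputs listed above can be pictured as two plain data containers. The following is a minimal sketch only; the class names `InsightInput` and `InsightResult` and all field names are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import Any

Record = dict[str, Any]  # one row of the data table, keyed by column name


@dataclass
class InsightInput:
    """Hypothetical container for the algorithm inputs described above."""
    all_records: list[Record]            # full table data records
    selection_filter: dict[str, set]     # filter conditions derived from the user's selection
    focus_records: list[Record]          # original records of interest returned by the query
    shared_features: dict[str, Any]      # feature information shared by the focus records
    params: dict[str, Any] = field(default_factory=dict)  # user-configured algorithm parameters


@dataclass
class InsightResult:
    """Hypothetical container for one generated piece of insight data."""
    insight_type: str                    # e.g. "external dimension distribution contribution"
    chart_type: str                      # e.g. "bar" or "scatter"
    chart_data: list[Record]             # raw data needed to draw the insight chart
    axis_config: dict[str, str]          # e.g. {"x": "location", "y": "sales"}
    description: str = ""                # textual insight description
```

Keeping the chart data, chart type, and axis configuration together lets a front end draw the insight chart without querying the data source again.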
Optionally, the insight data generation algorithm may support multiple analysis modes, such as statistical value feature analysis, distribution feature analysis, null-value alert analysis, zero-value alert analysis, high-correlation metric analysis, and global-subset difference analysis. It may also, at the level of statistical analysis and traditional machine learning, detect different categories of features for different insight types and generate varied, customized feature descriptions.
Optionally, for different types of insight data, the insight data generation algorithm may adaptively choose different chart types for display. For example, distribution charts use bar charts, and correlation metric charts use scatter plots with a logarithmic or linear axis chosen according to the data distribution. The textual insight descriptions also differ: a description may state the features of the insight, may analyze a possible prior pattern composed of those features, may combine the two, or may take any other form that explains the features.
In one embodiment of the present application, the insight data generation algorithm may produce multiple insight types. It supports analyzing how metrics or dimensions inside and outside the chart contribute to the pattern the user selected as interesting, and guides the user to explore how the data records corresponding to the selected chart visualization elements relate to the associated data in the data set.
It should be understood that the terms internal and external refer to whether the dimensions and metrics taking part in the analysis are used to draw the chart the user is currently analyzing. Considering the features and contributions of external dimensions and metrics can help the user discover data aggregations or subspaces of interest that are associated with the brush-selected chart visualization elements.
Exemplarily, the insight types generated by the algorithm may include chart metric aggregation expansion analysis, external dimension valid record count analysis, external dimension distribution contribution analysis, external dimension subspace internal feature analysis, and external high-interpretability metric analysis, but are not limited to these types.
Optionally, chart metric aggregation expansion analysis focuses on breaking down the aggregated metric value bound to a notable chart visualization element into its original data distribution, helping the user understand how the aggregated value is composed. For example, the vertical axis of a common analysis chart is a sum aggregation of a metric. When the user notices a chart visualization element with a high aggregated value, this type of insight data helps explain the composition of the abnormal value, for instance whether it stems from a single abnormal original record or from an overall distribution that is skewed.
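The aggregation expansion described above can be sketched as splitting a summed value back into the shares contributed by its original records. This is an illustrative sketch, not the application's algorithm: the function name `expand_aggregate` and the `dominance` threshold are assumptions, and the metric values are assumed to be non-negative.

```python
def expand_aggregate(values, dominance=0.5):
    """Break a sum-aggregated value into the shares of its original records.

    `dominance` is an assumed threshold: if one record alone supplies at
    least this share of the total, the aggregate is attributed to that record.
    """
    total = sum(values)
    if total == 0:
        return {"total": 0, "shares": [], "cause": "empty"}
    shares = [v / total for v in values]
    # Distinguish "one abnormal record" from "the whole distribution is high".
    cause = "single_record" if max(shares) >= dominance else "overall_distribution"
    return {"total": total, "shares": shares, "cause": cause}
```

For a bar whose records are `[1, 1, 98]` the sketch reports a single dominant record, while `[30, 35, 35]` is reported as an overall distribution effect.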
Optionally, external dimension valid record count analysis examines how the number of valid records among the user's interactively selected data records is distributed over other external dimensions (those not used to draw the chart), in order to analyze potential reasons why the aggregated values of the selected chart visualization elements show a particular pattern. If the original data records corresponding to a particular pattern are found to concentrate in a specific value subspace of some dimension, the method regards that concentration as strongly associated with the pattern.
Optionally, external dimension distribution contribution analysis examines, for the data records the user interactively selected, how the contribution to the chart metric of interest is distributed over other external dimensions (those not used to draw the chart). In essence, this type of explanation breaks the aggregated value down along another direction outside the chart and finds potentially high-contribution dimension values for the user to explore further. Once the user finds a dimension subspace of interest, the subspace distribution exploration function of data interpretation can be used to view the detailed distribution within that subspace.
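The two external-dimension analyses above amount to a group-by over a dimension that is not on the chart: one counts valid records per dimension value, the other computes each value's share of the metric total. A minimal sketch follows; the function name is hypothetical, and treating a record as "valid" when its metric is not `None` is an assumption made for illustration.

```python
from collections import Counter, defaultdict


def external_dimension_profile(records, dimension, metric):
    """Return (valid record counts, contribution shares) per dimension value."""
    counts = Counter()
    sums = defaultdict(float)
    for record in records:
        value = record.get(metric)
        if value is None:          # assumed definition of an invalid record
            continue
        counts[record[dimension]] += 1
        sums[record[dimension]] += value
    total = sum(sums.values())
    # Share of the aggregated metric contributed by each dimension value.
    contribution = {k: (v / total if total else 0.0) for k, v in sums.items()}
    return dict(counts), contribution
```

A dimension value with a high share, or one in which most valid records concentrate, is a candidate subspace for further exploration.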
Optionally, external dimension subspace internal feature analysis is closely related to the external-dimension explanations above. It can automatically search for and recommend subspaces of dimension values whose internal value distribution shows certain features with respect to the metric distribution the user selected. Based on the subspace distribution, the user can then use the original data record tracing function to analyze the source of the feature pattern.
Optionally, external high-interpretability metric analysis focuses on the data pattern of the metric in the data subset the user is interested in. It performs high-correlation metric analysis on both the full data set and the subset to obtain a batch of explanatory external metric candidates, further analyzes them to find metrics with a high degree of surprise, and displays the correlation between each such metric and the chart metric of interest as a scatter plot, with the aim of uncovering possible insight data.
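One way to picture the high-interpretability metric analysis is sketched below. The application does not define how correlation or surprise is computed, so this sketch assumes Pearson correlation and defines surprise as the absolute difference between the subset correlation and the full-set correlation. Both choices, the function names, and the `min_corr` threshold are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def surprising_metrics(full, subset, target, candidates, min_corr=0.7):
    """Rank external metrics that correlate strongly with `target` in the subset.

    Assumed surprise score: |subset correlation - full-set correlation|.
    """
    results = []
    for metric in candidates:
        r_sub = pearson([r[metric] for r in subset], [r[target] for r in subset])
        r_full = pearson([r[metric] for r in full], [r[target] for r in full])
        if abs(r_sub) >= min_corr:
            results.append((metric, r_sub, abs(r_sub - r_full)))
    return sorted(results, key=lambda item: item[2], reverse=True)
```

Each returned metric could then be plotted against the chart metric as a scatter plot, as the paragraph above describes.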
The insight types in the embodiments of the present application are not limited to the above. In another embodiment of the present application, the algorithm may also produce insight types that support association analysis within the data records corresponding to the selected chart visualization elements. Exemplarily, these insight types may include chart visualization element trend analysis and chart visualization element cluster analysis, but the embodiments of the present application are not limited to these types.
Optionally, chart visualization element trend analysis focuses on how the data records corresponding to the selected chart visualization elements move as the x-axis dimension changes. For example, a possible periodic pattern in the data records can be derived from the numerical highs or lows that appear across the data records as a whole. Trend analysis can also be used to predict data records. As another example, when data records that follow a particular trend contain some outliers, only the non-outlier elements may be selected for analysis, skipping the outliers to improve the accuracy of the trend analysis.
Optionally, chart visualization element cluster analysis focuses on the clustering patterns and differences among the data records corresponding to multiple chart visualization elements selected in a batch. For example, based on the intrinsic properties of the data, this insight type can divide the data records corresponding to chart visualization elements in one or more charts into clusters, such that records within a cluster share the same characteristics while records in different clusters differ substantially. This insight type can analyze data tables from multiple data sources and classify the data records corresponding to multiple chart visualization elements as far as possible.
In the technical solution of the present application, the insight data automatically generated by the data interpretation function may be presented in an accordion-like form that expands and collapses freely and has two levels. The titles of the first level give the names of the different insight categories. When the user expands a first-level item, the second level lists all the specific insights the algorithm recommends under that category. When the user expands one of these, the textual description and chart of that insight are displayed. When the user expands a particular insight, the other insights are collapsed to keep the front-end interface tidy.
Optionally, the technical solution lets the user freely inspect the charts and text results that the algorithm recommends in each insight category. All charts generated by the algorithm also support basic interactions such as interactive selection, highlighting, and legend toggling, which improves the user's exploratory analysis experience and makes it possible to perform interactive analysis in the feature subspace of an insight chart.
Optionally, the technical solution lets the user export insight charts of interest produced by data interpretation to the dashboard, where they are displayed at the same level as the original charts, with the insight text shown on the right. This function supports linked highlighting: when the user selects an insight chart that has been exported to the dashboard, the chart that produced the insight data is highlighted at the same time, together with the data of interest the user had filtered in that parent chart when the insight data was generated.
Optionally, the technical solution can also be applied in a cloud environment, where it is compatible with the insight-saving functions of the microservice in which it runs, so that insight charts can be saved, previewed, and loaded like ordinary charts.
The technical solution of steps 410-440 can effectively produce accurate and inspiring insights, but the data interpretation operation it provides does not allow follow-up analysis of the local data subsets the user observes, which to some extent limits the ways the user can explore interactively.
To avoid this problem, another embodiment of the present application provides an insight data generation method 450, which further generates insight subspaces so that existing insight data can be analyzed to produce further insight data. The method includes steps 460-490, each described in detail below.
Step 460: Present the insight chart in the second insight data, where the insight chart includes W chart visualization elements and each chart visualization element corresponds to at least one data record in the data source.
Optionally, the presented second insight data may be any insight data generated when chart visualization elements were selected, such as the first insight data in step 440, or insight data generated by further analysis of any previously generated insight chart.
Optionally, the type of the second insight data may be any of the insight types described above, such as chart metric aggregation expansion analysis, external dimension valid record count analysis, external dimension distribution contribution analysis, external dimension subspace internal feature analysis, and external high-interpretability metric analysis, or may be another type of insight analysis.
Optionally, the insight chart may further include the statistical feature values or extreme values that correspond to the insight text description in the insight chart.
Step 470: Confirm J chart visualization elements selected from the W chart visualization elements, where W and J are positive integers greater than 1.
Step 480: Determine all H data records corresponding to the J chart visualization elements, where H is a positive integer greater than 1.
It should be understood that steps 470 and 480 are substantially the same as steps 420 and 430 and are not described again here.
Step 490: Generate third insight data from all H data records, where the third insight data includes a data distribution analysis of the H data records or data record tracing.
It should be understood that the technical solution can equip different types of insight data with further interactive exploration and analysis functions to generate further insight data. The first chart in step 410 may be the insight chart in any of the insight data of step 460. The insight type of the third insight data in step 490 may be the insight type of the first insight data in step 440, or other insight types may be added on top of it.
Optionally, the generated third insight data may be an analysis of the data record distribution within a subspace of the insight data, intended to help the user dig deeper into patterns of interest in the dimension distribution chart of an algorithm-recommended insight.
Exemplarily, if the insight chart in step 460 is a derived insight chart of the external dimension valid record count analysis or external dimension distribution analysis type, then when the user brush-selects or clicks a dimension subspace of interest among the insight features and runs subspace distribution exploration, the technical solution generates a further subspace metric distribution insight chart that also supports interaction. The metric chosen for this insight depends on the metric associated with that insight type and on the metric of the chart from which the data interpretation was originally generated.
Optionally, the further insight data generated may be original data tracing, intended to help the user conveniently query the original data behind abnormal parts of the recommended insight distribution and find the cause of the distribution features.
Exemplarily, the insight chart in step 460 may be an insight chart of the chart metric aggregation expansion analysis or external dimension subspace internal feature analysis type, or one derived a second time through the subspace distribution exploration described above. Because several sufficiently fine-grained drill-downs have usually been performed by the time this function is invoked, the original data records it returns directly tend to be few but highly explanatory. Similarly, for the statistical feature values in the textual insight descriptions of algorithm-generated insight data, the technical solution supports convenient tracing of the original records, and both use the same display form. Optionally, the technical solution displays the original records in a paginated table.
Optionally, the type of the third insight data is not limited to the two insight types above and may be any of the insight data types mentioned earlier, such as chart metric aggregation expansion analysis, external dimension valid record count analysis, and external dimension distribution contribution analysis.
The technology lets the user continue to perform rich interactive operations on the data interpretation charts derived by the algorithm, enabling further focused analysis inside the insight feature subspace. After constructing the feature subspace of interest, the user clicks the corresponding item in the function menu. This keeps the use of the data interpretation function logically continuous and lowers the learning cost.
In current scenarios of automated intelligent insight generation for charts on business intelligence analysis platforms, insight generation based on automatic search and insight mining over global data produces a large number of insight charts at once. Presented with all of them, the user has no focus of attention and finds it hard to decide where to start exploring, which amounts to a "cold start" problem.
To avoid this problem, the present application designs a sorting strategy that determines the priority order of the multiple sub-insights within an insight, recommends the sub-insights in that order, and produces the final sorted result. Specifically, the sorting strategy is applied after the insight charts are generated in steps 440 and 490 and before the insight interface is presented. Figure 5 is a schematic flowchart of one embodiment of the sorting strategy 500, which jointly considers two aspects: the confidence of all the features of each insight within the same insight category, and the feature richness of each insight. As shown in Figure 5, the method includes steps 510-540, each described in detail below. Assume that the insight data includes P sub-insight data.
Step 510: Determine the feature index value of each of the P sub-insight data, where the feature index value measures the confidence or significance of that sub-insight data.
Optionally, the technical solution defines a different measure for each kind of feature.
Exemplarily, statistical features may be described by indicators such as the number of feature values and how far the outliers deviate from the full data; distribution analysis features may be described by indicators such as the unevenness of the distribution and the share of the largest component; alert-related analysis features may be described by the share of the corresponding alert values; correlation metric analysis features may be described by the metric indicators mentioned above; and difference analysis may be described by the KL divergence of the discrete distributions obtained after binning.
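The last indicator, KL divergence of binned distributions, can be sketched as follows. The application does not specify the binning scheme or the direction of the divergence, so equal-width bins, the small smoothing constant `eps`, and the direction D(subset || full) are assumptions made for illustration.

```python
import math


def binned_kl_divergence(subset, full, bins=5, eps=1e-9):
    """KL divergence D(subset || full) over equal-width bins (assumed scheme)."""
    values = list(subset) + list(full)
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal

    def histogram(data):
        counts = [0] * bins
        for v in data:
            index = min(int((v - lo) / width), bins - 1)  # clamp the maximum value
            counts[index] += 1
        return [c / len(data) for c in counts]

    p, q = histogram(subset), histogram(full)
    # eps keeps the ratio finite if a bin is empty in the full distribution.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)
```

A value near zero means the selected subset is distributed like the full data, while a larger value indicates a more distinctive subset.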
Step 520: Confirm the Q sub-insight data whose feature index values are higher than the feature index threshold, where Q is a positive integer greater than 1 and P is greater than Q.
Optionally, the technical solution filters out feature insights whose feature index values are below the threshold, either by placing them at the very end of the presentation queue or by deleting them.
Step 530: Determine the number of feature kinds of each of the Q sub-insight data.
It should be understood that the number of feature kinds can be used to describe the feature richness of the insight data.
Step 540: Determine the priority order of the Q sub-insight data by sorting them in descending order of the number of feature kinds.
Optionally, for each sub-insight data that has passed the feature index filtering, the technical solution counts its feature richness and determines the recommendation priority of the different sub-insight data by sorting in descending order.
Optionally, if several insights have the same number of feature kinds, the feature index values of each insight's features are sorted in descending order and then compared one by one to determine the priority.
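Steps 510-540 and the tie-break rule can be sketched as a filter followed by a sort. In this sketch each sub-insight is represented as a mapping from feature kind to feature index value, and a sub-insight passes the threshold when its highest feature index value exceeds it. The representation, that aggregation rule, and the function name are assumptions, not details given by the application.

```python
def rank_sub_insights(sub_insights, threshold):
    """Return sub-insight names in recommendation order.

    `sub_insights` maps a name to {feature kind: feature index value}.
    """
    # Steps 510-520: keep sub-insights whose index value exceeds the threshold.
    # Assumption: the index value of a sub-insight is its highest feature score.
    kept = {
        name: features
        for name, features in sub_insights.items()
        if features and max(features.values()) > threshold
    }

    # Steps 530-540: more feature kinds first; ties are broken by comparing
    # the feature scores, sorted in descending order, one by one.
    def priority(name):
        scores = sorted(kept[name].values(), reverse=True)
        return (len(scores), scores)

    return sorted(kept, key=priority, reverse=True)


order = rank_sub_insights(
    {
        "a": {"outlier": 0.9, "skew": 0.8},
        "b": {"outlier": 0.95},
        "c": {"outlier": 0.2},
        "d": {"outlier": 0.9, "skew": 0.85},
    },
    threshold=0.5,
)
```

Here "c" is filtered out, "d" precedes "a" because its second-highest score is larger, and "b" comes last because it has only one feature kind. This sketch deletes low-scoring sub-insights; appending them to the end of the queue is the other option the text mentions.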
Method 400 and method 450 above may be used separately or in combination. The following describes, with a specific example, how they are combined to generate insight data, and uses the same example to describe the implementation of the sorting strategy 500. Figures 6, 7 and 8 show a detailed case in which the technical solution of the present application is used to generate insight data and to explore it progressively. The elements and data in the case are examples only, and actual cases include but are not limited to the situations shown in Figures 6, 7 and 8.
Figure 6 shows an application interface 600. The application in this case may be a spreadsheet application or a data intelligence analysis application. The application interface 600 includes a data table 610, which contains multiple dimensions and metrics. In this case the data table is the data source; an actual case may include multiple data sources, each of which may contain multiple data tables.
The time dimension values 1-5, the location dimension values A, B and C, the unit price dimension values a, b and c, and the sales volume X are extracted from the data table 610. The sales volume X is the sum aggregation over the location values A, B and C and the unit price values a, b and c. In other words, the location and unit price dimensions are external dimensions that do not take part in drawing the chart, and their data records are simply accumulated to obtain the summed value. With the time dimension values 1-5 on the x-axis and the sales volume X on the y-axis, a bar chart 620 is drawn and presented in the application interface 600. A bar chart is used here as an example; in an actual case the chart may also be a line chart, a pie chart, and so on. Chart 620 contains five chart visualization elements, namely five bars. The data point value of each chart visualization element is the sum aggregation of multiple data values over the location values A, B and C and the unit price values a, b and c, so it corresponds to multiple data records in the data table with the relevant dimension values. In an actual case, the data point of a chart visualization element may also correspond to only one data record. This part of the process corresponds to step 410 above.
The dashed box in chart 620 is the selection box of the application interface 600. The chart visualization elements within the selection box are highlighted, which is why the two chart visualization elements inside the dashed box in chart 620 are filled with diagonal lines. These two chart visualization elements are the objects of insight analysis in this case. The selection here is contiguous; in an actual case, multiple selection boxes may select multiple non-contiguous data items, or only one chart visualization element may be selected, and this case imposes no limitation. This part of the process corresponds to step 420 above.
After the selected chart visualization elements are determined, the application back end determines the data records in the data table 610 that correspond to them. In this case, the x-axis dimension values of the selected chart visualization elements are times 1-3, that is, the selection interaction is a batch selection, along the x-axis dimension of the chart, of the chart visualization elements whose values are 1-3. From the selection interaction and the dimensions that make up the aggregated y-axis value, the filter logic for the corresponding data records is generated: the logical combination (time dimension is 1, 2 or 3) AND (location dimension is A, B or C) AND (unit price dimension is a, b or c). The generated filter logic can then be used to query all the original data records in the data table 610, which determines the data records corresponding to the selected chart visualization elements. This part of the process corresponds to step 430 above. In an actual case, each bar may also be split into several sub-bars by legend value; when only some of the sub-bars are selected, the external dimensions in the resulting logical combination may take only some of their values.
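The filter logic described above is a conjunction over dimensions with a disjunction over the allowed values of each dimension. A minimal sketch follows; the column names, the sample rows, and the helper `build_filter` are illustrative assumptions rather than the contents of data table 610.

```python
def build_filter(conditions):
    """Build a predicate from {dimension name: set of allowed values}."""
    def matches(record):
        # AND across dimensions, OR across the allowed values of each dimension.
        return all(record.get(dim) in allowed for dim, allowed in conditions.items())
    return matches


# Hypothetical rows standing in for the original data records.
table = [
    {"time": 1, "location": "A", "unit_price": "a", "sales": 10},
    {"time": 2, "location": "B", "unit_price": "c", "sales": 7},
    {"time": 4, "location": "A", "unit_price": "b", "sales": 3},
    {"time": 3, "location": "D", "unit_price": "a", "sales": 5},
]

# (time is 1, 2 or 3) AND (location is A, B or C) AND (unit price is a, b or c)
conditions = {
    "time": {1, 2, 3},
    "location": {"A", "B", "C"},
    "unit_price": {"a", "b", "c"},
}
selected = [row for row in table if build_filter(conditions)(row)]

# Partial selection of sub-bars: an external dimension takes only some values.
partial = [row for row in table if build_filter({**conditions, "location": {"A"}})(row)]
```

Narrowing one of the value sets, as in `partial`, models the case where only some sub-bars of a bar are selected.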
Joint data analysis is then performed on the data records corresponding to the selected chart visualization elements, as determined above, to generate insight data. The specific insight data depends on the analysis results for the different subsets of data records. An illustrative example follows.
Assume that the sales volumes corresponding to the two selected chart visualization elements show the same phenomenon, namely abnormally high values. During joint data analysis, the sales volumes of the two selected elements are compared both with the three unselected chart visualization elements in chart 620 and with the remaining data records in the data table. Suppose the comparison with the three unselected elements shows that data records whose location dimension is A contribute heavily to the high sales volume; that is, when the time dimension is 1, 2, or 3 and the location dimension is A, the sales volume is abnormally higher than under other dimension values. The data records whose time dimension is 1, 2, or 3 are then determined to have an association relationship. The association relationship is that the time dimension values 1, 2, or 3 are associated with the location dimension value A. Equivalently, the data records whose time dimension is 1, 2, or 3 and whose location dimension is A share common feature information: their sales volume is abnormally higher than under other dimension values. Based on these data records, their shared feature information, and the other data records in the data table, the generated insight data may be the contribution of time, location, or unit price to the outlier phenomenon; it may be an analysis that expands these data records along the aggregate value of the unit price dimension; or it may be other insight data related to external dimensions. Such insight data may correspond to insight data 621, 622, and so on in FIG. 6. In practice, more insight data may be generated, the selected chart visualization elements may show more than one phenomenon, and each phenomenon may yield more than one item of insight data; these variations are not detailed here. The analysis behind each item of insight data uses the data records of multiple chart visualization elements, so the user can analyze the local data they have observed. These steps correspond to step 440 above.
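One simple way to make "contribution of a dimension value to the phenomenon" concrete is the share of the measure that each value accounts for within the selected records. The sketch below uses that share as a stand-in; the text does not specify the contribution metric, so the function name, field names, and sample numbers are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable

Record = Dict[str, Any]


def contribution_by_value(records: Iterable[Record], dimension: str, measure: str) -> Dict[Any, float]:
    """Share of the total measure attributable to each value of one dimension."""
    totals: Dict[Any, float] = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record[measure]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {value: total / grand_total for value, total in totals.items()}


# Hypothetical records behind the selected elements (time is 1, 2, or 3).
selected_records = [
    {"time": 1, "location": "A", "sales": 120},
    {"time": 2, "location": "A", "sales": 100},
    {"time": 2, "location": "B", "sales": 30},
    {"time": 3, "location": "C", "sales": 50},
]

shares = contribution_by_value(selected_records, "location", "sales")
top_value = max(shares, key=shares.get)  # the location value that dominates sales
```

In this made-up data, location A accounts for 220 of 300 units, which is the kind of dominance that would flag "location is A" as the shared feature of the selected records.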
In this example, the insight data 621, 622, and so on generated from chart 620 include insight charts 631, 632, and so on, together with a text description for each insight chart. Insight charts 631, 632, and so on are drawn in the same way as chart 620, and the text description of an insight chart may contain characteristic values or extreme values from the insight data.
Using the same method by which chart visualization elements were selected in chart 620, two chart visualization elements are selected in the presented insight charts 631 and 632. The same analysis steps then yield insight data 641, 642, and so on, which are the further-analysis insight data of this example; their analysis steps are not repeated here. The insight type of insight data 641 and 642 may be the same as or similar to that of insight data 621, 622, and so on, or it may be a subspace analysis of the insight data or a tracing of the insight data back to its source.
Assume that insight data 641 is a subspace analysis of insight chart 631. Its content may analyze the composition of the raw data records behind the aggregate values of the data points that make up insight chart 631. The x-axis may be the specific dimension values of the raw data records, and the y-axis is the sales volume. This analysis can explain outliers, or records with a large contribution, that may exist among the raw data records behind insight data 621.
Assume that insight data 642 is a raw-record trace of insight chart 632. Its content may be the measure values of the specific raw data records that make up insight data 642, together with their dimension values. Raw-record tracing presents these raw data records in a paginated table. The characteristic values in the text description of an insight chart can also be traced back to the raw records.
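The paginated presentation of traced raw records can be sketched as a simple page slice. This is only an illustration under assumed conventions (1-based page numbers, a caller-supplied page size); the text does not describe how the table is actually paged.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def paginate(records: Sequence[T], page_size: int, page: int) -> List[T]:
    """Return the records shown on one page of the trace table (pages are 1-based)."""
    if page_size <= 0 or page < 1:
        raise ValueError("page_size must be positive and page must be >= 1")
    start = (page - 1) * page_size
    return list(records[start:start + page_size])


# Hypothetical raw records traced from an insight chart, two rows per page.
traced = [{"id": i, "sales": 10 * i} for i in range(1, 6)]
last_page = paginate(traced, page_size=2, page=3)
```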
Starting from the insight chart in insight data 641, the same steps of selecting chart visualization elements and analyzing data can again generate insight data, so that the insight data can be drilled down into repeatedly. The details are not repeated here.
FIG. 7 is a schematic diagram of the interface in another detailed example. It differs slightly from the example in FIG. 6. The dimensions in FIG. 7 are distance, ID, number, throughput, and so on, and the generated insight data are presented in different application interfaces. These interfaces may belong to different applications; that is, the insight data for different data records may be produced in different applications or application interfaces.
The intermediate analysis in the example of FIG. 7 is similar to that of FIG. 6 and is not repeated here. Only the presentation of the generated insight data in different application interfaces is described.
Chart 720 in application interface 702 is generated from data table 710 in application interface 701, and each chart visualization element in chart 720 corresponds to at least one data record in data table 710. A selection box is brushed in chart 720, and the two chart visualization elements inside it are used to generate insight data. The resulting insight data 721, 722, and so on are presented in application interface 703. Two chart visualization elements are then selected in insight chart 731 or insight chart 723 in application interface 703 to generate insight data 741 and so on, which are presented in application interface 704.
By extension, this example supports secondary analysis and exploration of a focused feature subspace of an insight chart, and derives further insight data from it. This streamlines multi-level subspace exploration in analysis assisted by automatic insight generation, and gives the analyst more freedom to move from the overall picture to individual points and from shallow to deep.
FIG. 8 shows the ranking strategy 800 that this example applies to multiple items of insight data. This example is one embodiment of a ranking strategy and does not limit how the priority order of multiple insights is determined.
Assume that the insight generation process shown in FIG. 6 or FIG. 7 yields 10 items of insight data, namely insight data 810 to 819. The application back end ranks insight data 810 to 819.
First, the feature index value used to rank insight data of different types is determined; in this example the confidence score may be chosen. Applying the different measures described earlier for different feature types, the application back end determines the feature index values of insight data 810 to 819 and sorts them in descending order. The resulting list is shown in FIG. 8.
Second, a threshold on the feature index value is determined in order to filter out insight data with low feature index values. In this example a confidence score of 0.95 is chosen as the threshold, and the insight data in FIG. 8 whose confidence scores are below 0.95 are filtered out.
Finally, the number of feature types is determined for each item of insight data in FIG. 8 whose confidence score is above 0.95, and sorting by that number in descending order gives the priority order of the insight data that are finally presented. The insight data 815, 818, 811, 810, and so on, listed in descending priority in FIG. 8, correspond to insight data 621, 622, and so on presented in FIG. 6, or to insight data 721, 722, and so on presented in FIG. 7.
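The three ranking steps above (sort by confidence, drop items below the threshold, then order by number of feature types) can be sketched as follows. The class and function names, the confidence scores, and the feature-type counts are invented for illustration; only the 0.95 threshold and the resulting order 815, 818, 811, 810 come from the text. Treating a score exactly equal to the threshold as kept is also an assumption, since the text speaks only of scores below and above 0.95.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Insight:
    name: str
    confidence: float   # feature index value (here: a confidence score)
    feature_kinds: int  # number of feature types the insight exhibits


def rank_insights(insights: Iterable[Insight], threshold: float = 0.95) -> List[Insight]:
    # Step 1: sort by feature index value, descending.
    by_confidence = sorted(insights, key=lambda i: i.confidence, reverse=True)
    # Step 2: filter out insights whose value is below the threshold.
    kept = [i for i in by_confidence if i.confidence >= threshold]
    # Step 3: sort by number of feature types, descending. sorted() is stable,
    # so ties keep the confidence order established in step 1.
    return sorted(kept, key=lambda i: i.feature_kinds, reverse=True)


# Hypothetical scores; the reference numerals follow FIG. 8.
candidates = [
    Insight("810", 0.96, 1),
    Insight("811", 0.97, 2),
    Insight("812", 0.80, 5),  # many feature types, but filtered out by confidence
    Insight("815", 0.99, 4),
    Insight("818", 0.98, 3),
]

ranked_names = [insight.name for insight in rank_insights(candidates)]
```

Filtering before the second sort matters: an insight with many feature types but low confidence, such as the made-up 812 here, never reaches the final list.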
The insight data 641, 642, and so on shown in FIG. 6 and the insight data 741 and so on shown in FIG. 7 are ranked in the same way, so the process is not repeated here.
Ranking the insight data by priority lets users quickly find what deserves their attention and spares them the difficulty of deciding where to begin exploring.
The apparatus for generating insights according to an embodiment of this application is described below with reference to FIG. 9. The apparatus shown in FIG. 9 can perform the methods shown in FIG. 4 and FIG. 5. Because the apparatus described below can perform the methods of the foregoing embodiments, repeated descriptions are omitted where appropriate to avoid unnecessary repetition.
FIG. 9 is a schematic diagram of an apparatus for generating insights according to an embodiment of this application. The apparatus 900 shown in FIG. 9 includes an interaction module 910 and a processing module 920.
Specifically, the interaction module is configured to present a first chart. The first chart includes M chart visualization elements, and each chart visualization element corresponds to at least one data record in a data source.
Specifically, the processing module is configured to: confirm N chart visualization elements selected from the M chart visualization elements, where M and N are positive integers greater than 1 and M is greater than or equal to N; determine all K data records corresponding to the N chart visualization elements, where K is a positive integer greater than 1; and perform joint data analysis based on the K data records to generate first insight data for the N chart visualization elements.
Optionally, in an embodiment, the processing module is further configured to: determine feature information shared by L of the K data records, where the L data records correspond to at least two chart visualization elements, L is a positive integer greater than 1, and K is greater than or equal to L; and perform data analysis based on the L data records, the feature information shared by the L data records, and all data records in the data source.
Optionally, in an embodiment, the processing module is further configured to generate, from an insight chart generated based on second insight data, the distribution of values within the data records corresponding to the N chart visualization elements, or a trace of those data records.
Optionally, in an embodiment, the processing module is further configured to determine a priority order of P items of sub-insight data included in the first insight data, where P is a positive integer greater than 1, and to recommend the P items of sub-insight data in that priority order.
Optionally, in an embodiment, the processing module is further configured to: determine a feature index value for each of the P items of sub-insight data, where the feature index value measures the confidence or significance of each item; confirm the Q items of sub-insight data whose feature index values are above a threshold, where Q is a positive integer greater than 1 and P is greater than Q; determine the number of feature types of each of the Q items; and determine the priority order of the Q items by sorting them in descending order of that number.
Optionally, in an embodiment, the processing module is further configured to determine the dimensions and measures in the first chart that correspond to the N chart visualization elements, and to generate a query request based on those dimensions and measures. The query request is used to query data records in the data source.
Each of the above modules may be implemented in software or in hardware. The implementation of the processing module 920 is described next as an example; the interaction module 910 may be implemented in the same ways.
As an example of a module implemented as a software functional unit, the processing module 920 may include code running on a computing instance. The computing instance may be at least one of a physical host (computing device), a virtual machine, or a container, and there may be one or more such instances. For example, the processing module 920 may include code running on multiple hosts, virtual machines, or containers. The hosts, virtual machines, or containers that run the code may be located in the same region or in different regions. They may also be located in the same availability zone (AZ) or in different AZs, where each AZ includes one data center or several geographically close data centers. A region usually includes multiple AZs.
Similarly, the hosts, virtual machines, or containers that run the code may be located in the same virtual private cloud (VPC) or in multiple VPCs. A VPC is typically deployed within one region. For communication between two VPCs in the same region, and for cross-region communication between VPCs in different regions, a communication gateway must be set up in each VPC, and the VPCs are interconnected through these gateways.
As an example of a module implemented as a hardware functional unit, the processing module 920 may include at least one computing device, such as a server. Alternatively, the processing module 920 may be a device implemented with an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The computing devices included in the processing module 920 may be located in the same region or in different regions, in the same AZ or in different AZs, and in the same VPC or in multiple VPCs. They may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
This application further provides a computing device 1000. As shown in FIG. 10, the computing device 1000 includes a bus 1002, a processor 1004, a memory 1006, and a communication interface 1008. The processor 1004, the memory 1006, and the communication interface 1008 communicate with one another over the bus 1002. The computing device 1000 may be a server or a terminal device. This application does not limit the number of processors or memories in the computing device 1000.
The bus 1002 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be classified as address buses, data buses, control buses, and so on. For ease of illustration, FIG. 10 shows a single line, but this does not mean that there is only one bus or only one type of bus. The bus 1002 may include a path that carries information between the components of the computing device 1000 (for example, the memory 1006, the processor 1004, and the communication interface 1008).
The processor 1004 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), a digital signal processor (DSP), or similar processors.
The memory 1006 may include volatile memory, such as random access memory (RAM). The memory 1006 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
The memory 1006 stores executable program code, and the processor 1004 executes that code to implement the functions of the interaction module 910 and the processing module 920, thereby implementing the method for generating insight data described above. In other words, the memory 1006 stores instructions for performing that method.
The communication interface 1008 uses a transceiver module, such as but not limited to a network interface card or a transceiver, to enable communication between the computing device 1000 and other devices or communication networks.
An embodiment of this application further provides a computing device cluster. The cluster includes at least one computing device. The computing device may be a server, such as a central server, an edge server, or a local server in an on-premises data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a laptop, or a smartphone.
As shown in FIG. 11, the computing device cluster includes at least one computing device 1000. The memories 1006 of one or more computing devices 1000 in the cluster may store the same instructions for performing the method for generating insight data described above.
In some possible implementations, the memories 1006 of one or more computing devices 1000 in the cluster may instead each store part of the instructions for performing the method. In other words, a combination of one or more computing devices 1000 may jointly execute the instructions for performing the method for generating insight data.
The memories 1006 of different computing devices 1000 in the cluster may store different instructions, each used to perform part of the functions of the apparatus. That is, the instructions stored in the memories 1006 of different computing devices 1000 may implement the functions of one or more of the interaction module and the processing module.
In some possible implementations, one or more computing devices in the cluster may be connected through a network, which may be a wide area network, a local area network, or the like. FIG. 12 shows one such implementation, in which two computing devices 1000A and 1000B are connected through a network. Specifically, each computing device connects to the network through its communication interface. In this type of implementation, the memory 1006 in computing device 1000A stores instructions for performing the functions of the interaction module, and the memory 1006 in computing device 1000B stores instructions for performing the functions of the processing module.
The functions of computing device 1000A shown in FIG. 12 may also be performed by multiple computing devices 1000, and likewise for the functions of computing device 1000B.
An embodiment of this application further provides a chip. The chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface to perform the method for generating insight data described above.
An embodiment of this application further provides a computer program product containing instructions. The computer program product may be software or a program product that contains instructions and can run on a computing device or be stored in any available medium. When the computer program product runs on at least one computing device, it causes the at least one computing device to perform the method for generating insight data described above.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium may be any available medium on which a computing device can store data, or a data storage device, such as a data center, that contains one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state drive). The computer-readable storage medium includes instructions that direct a computing device to perform the method for generating insight data described above.
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。The technical features of the above embodiments may be arbitrarily combined. To make the description concise, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的保护范围。The above embodiments are only used to illustrate the technical solutions of the present application, rather than to limit them. Although the present application has been described in detail with reference to the aforementioned embodiments, a person skilled in the art should understand that the technical solutions described in the aforementioned embodiments may still be modified, or some of the technical features may be replaced by equivalents. However, these modifications or replacements do not deviate the essence of the corresponding technical solutions from the protection scope of the technical solutions of the embodiments of the present application.
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和模块的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working processes of the systems, devices and modules described above can refer to the corresponding processes in the aforementioned method embodiments and will not be repeated here.
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个模块或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或模块的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are only schematic. For example, the division of modules is only a logical function division. There may be other division methods in actual implementation, such as multiple modules or components can be combined or integrated into another system, or some features can be ignored or not executed. Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or modules, which can be electrical, mechanical or other forms.
作为分离部件说明的模块可以是或者也可以不是物理上分开的,作为模块显示的部件可以是或者也可以不是物理模块,即可以位于一个地方,或者也可以分布到多个网络模块上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。The modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules, that is, they may be located in one place or distributed on multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
另外,在本申请各个实施例中的各功能模块可以集成在一个处理模块中,也可以是各个模块单独物理存在,也可以两个或两个以上模块集成在一个模块中。In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist physically separately, or two or more modules may be integrated into one module.
功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例的方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application can be essentially or partly embodied in the form of a software product that contributes to the prior art. The computer software product is stored in a storage medium and includes several instructions for a computer device (which can be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods of various embodiments of the present application. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (ROM), random access memory (RAM), disk or optical disk, and other media that can store program codes.
The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (17)
- A method for generating insight data, characterized by comprising: presenting a first chart, the first chart comprising M chart visualization elements, each of the chart visualization elements corresponding to at least one data record in a data source; confirming N chart visualization elements selected from the M chart visualization elements, wherein M and N are positive integers greater than 1, and M is greater than or equal to N; determining all K data records corresponding to the N chart visualization elements, wherein K is a positive integer greater than 1; and performing joint data analysis based on all K data records to generate first insight data for the N chart visualization elements.
- The method according to claim 1, characterized in that performing the joint data analysis based on all K data records comprises: determining feature information common to L data records among the K data records, the L data records corresponding to at least two chart visualization elements, wherein L is a positive integer greater than 1, and K is greater than or equal to L; and performing data analysis based on the L data records, the feature information common to the L data records, and all data records in the data source.
- The method according to claim 1 or 2, characterized in that the first insight data comprises at least one of the following insight data types: chart metric aggregation expansion analysis, used to analyze the original data distribution composition of the data records corresponding to the N chart visualization elements; external dimension valid record count analysis, used to analyze the distribution of the number of valid records of the K data records over dimensions not involved in drawing the first chart; external dimension distribution contribution analysis, used to analyze the contribution of the K data records to the chart metric over dimensions not involved in drawing the first chart; external dimension subspace internal feature analysis, used to analyze the feature distribution within the data records in dimensions not involved in drawing the first chart; and external highly interpretable metric analysis, used to analyze how metrics not involved in drawing the first chart, and the original data records, are associated with the L data records.
- The method according to any one of claims 1 to 3, characterized in that the first chart is an insight chart generated based on second insight data, and generating the first insight data for the N chart visualization elements comprises generating the numerical distribution within the data records corresponding to the N chart visualization elements, or tracing the source of those data records.
- The method according to any one of claims 1 to 4, characterized in that the first insight data comprises P sub-insight data, wherein P is a positive integer greater than 1, and the method further comprises: determining a priority order of the P sub-insight data; and recommending the P sub-insight data according to the priority order.
- The method according to claim 5, characterized in that determining the priority order of the P sub-insight data comprises: determining a characteristic index value of each of the P sub-insight data, the characteristic index value being used to measure the confidence or significance of each of the P sub-insight data; confirming Q sub-insight data whose characteristic index values are higher than a threshold for the characteristic index value, wherein Q is a positive integer greater than 1, and P is greater than Q; determining the number of feature types of each of the Q sub-insight data; and determining the priority order of the Q sub-insight data by sorting them in descending order of the number of feature types of each sub-insight data.
- The method according to any one of claims 1 to 6, characterized in that determining all K data records corresponding to the N chart visualization elements comprises: determining the dimensions and metrics in the first chart that correspond to the N chart visualization elements; and generating a query request based on the dimensions and metrics in the first chart, the query request being used to query the data records in the data source.
- An apparatus for generating insight data, characterized by comprising: an interaction module, configured to present a first chart, the first chart comprising M chart visualization elements, each of the chart visualization elements corresponding to at least one data record in a data source; and a processing module, configured to confirm N chart visualization elements selected from the M chart visualization elements, wherein M and N are positive integers greater than 1, and M is greater than or equal to N, to determine all K data records corresponding to the N chart visualization elements, wherein K is a positive integer greater than 1, and to perform joint data analysis based on all K data records to generate first insight data for the N chart visualization elements.
- The apparatus according to claim 8, characterized in that the processing module is further configured to determine feature information common to L data records among the K data records, the L data records corresponding to at least two chart visualization elements, wherein L is a positive integer greater than 1, and K is greater than or equal to L, and to perform data analysis based on the L data records, the feature information common to the L data records, and all data records in the data source.
- The apparatus according to claim 8 or 9, characterized in that the processing module is further configured to generate, from an insight chart generated based on second insight data, the numerical distribution within the data records corresponding to the N chart visualization elements, or a trace of the source of those data records.
- The apparatus according to any one of claims 8 to 10, characterized in that the processing module is further configured to determine a priority order of P sub-insight data comprised in the first insight data, wherein P is a positive integer greater than 1, and to recommend the P sub-insight data according to the priority order.
- The apparatus according to claim 11, characterized in that the processing module is further configured to: determine a characteristic index value of each of the P sub-insight data, the characteristic index value being used to measure the confidence or significance of each of the P sub-insight data; confirm Q sub-insight data whose characteristic index values are higher than a threshold for the characteristic index value, wherein Q is a positive integer greater than 1, and P is greater than Q; determine the number of feature types of each of the Q sub-insight data; and determine the priority order of the Q sub-insight data by sorting them in descending order of the number of feature types of each sub-insight data.
- The apparatus according to any one of claims 8 to 12, characterized in that the processing module is further configured to determine the dimensions and metrics in the first chart that correspond to the N chart visualization elements, and to generate a query request based on the dimensions and metrics in the first chart, the query request being used to query the data records in the data source.
- A computing device, characterized by comprising a processor and a memory, wherein the processor is configured to execute instructions stored in the memory so that the computing device performs the method according to any one of claims 1 to 7.
- A computing device cluster, characterized by comprising at least one computing device, each computing device comprising a processor and a memory, wherein the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device so that the computing device cluster performs the method according to any one of claims 1 to 7.
- A computer program product comprising instructions, characterized in that, when the instructions are run by a computing device cluster, the computing device cluster is caused to perform the method according to any one of claims 1 to 7.
- A computer-readable medium, characterized by comprising a computer program, wherein, when the computer program is run on a computer, the computer is caused to perform the method according to any one of claims 1 to 7.
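
As a rough illustration of the flow recited in claims 1 and 6, the following Python sketch resolves the data records behind a set of selected chart elements and then orders candidate sub-insights. It is a minimal sketch under stated assumptions, not the applicant's implementation: the names `SubInsight`, `records_for_selection`, and `rank_sub_insights`, the dictionary-based record layout, and all sample values are hypothetical, and the joint data analysis itself is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubInsight:
    """Hypothetical container for one sub-insight (not defined in the claims)."""
    name: str
    index_value: float   # characteristic index value (confidence or significance)
    feature_types: int   # number of feature types the sub-insight covers


def records_for_selection(records, dimension, selected_values):
    """Return every record whose `dimension` value matches a selected chart element.

    Assumes each chart element is identified by one value of a single dimension.
    """
    wanted = set(selected_values)
    return [r for r in records if r[dimension] in wanted]


def rank_sub_insights(sub_insights, threshold):
    """Keep sub-insights above the threshold, then sort by feature-type count, descending.

    sorted() is stable, so sub-insights with equal counts keep their input order.
    """
    kept = [s for s in sub_insights if s.index_value > threshold]
    return sorted(kept, key=lambda s: s.feature_types, reverse=True)


# Hypothetical data source: one record per sale.
records = [
    {"region": "East", "channel": "web", "sales": 10},
    {"region": "East", "channel": "store", "sales": 5},
    {"region": "West", "channel": "web", "sales": 7},
    {"region": "North", "channel": "web", "sales": 3},
]

# The user selects N = 2 bars ("East" and "West") of a sales-by-region chart.
selected_records = records_for_selection(records, "region", ["East", "West"])

# Hypothetical candidates that a joint analysis step might have produced.
candidates = [
    SubInsight("valid record count by channel", 0.9, 2),
    SubInsight("contribution by channel", 0.8, 3),
    SubInsight("weak association", 0.2, 5),
]
ranked = rank_sub_insights(candidates, threshold=0.5)
print(len(selected_records), [s.name for s in ranked])
```

In this sketch the threshold filter runs before the sort, so a sub-insight with many feature types but a low index value is never recommended, which matches the order of steps in claim 6.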
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211275756.8A CN117951186A (en) | 2022-10-18 | 2022-10-18 | Method and device for generating insight data |
CN202211275756.8 | 2022-10-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024082754A1 (en) | 2024-04-25 |
Family
ID=90736922
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/109267 WO2024082754A1 (en) | 2022-10-18 | 2023-07-26 | Insight data generation method and apparatus |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117951186A (en) |
WO (1) | WO2024082754A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200051293A1 (en) * | 2015-06-29 | 2020-02-13 | Microsoft Technology Licensing, Llc | Multi-dimensional data insight interaction |
CN110795458A (en) * | 2019-10-08 | 2020-02-14 | 北京百分点信息科技有限公司 | Interactive data analysis method, device, electronic equipment and computer readable storage medium |
US20210240702A1 (en) * | 2020-02-05 | 2021-08-05 | Microstrategy Incorporated | Systems and methods for data insight generation and display |
- 2022-10-18: CN application CN202211275756.8A, published as CN117951186A (status: active, pending)
- 2023-07-26: WO application PCT/CN2023/109267, published as WO2024082754A1 (status: unknown)
Also Published As
Publication number | Publication date |
---|---|
CN117951186A (en) | 2024-04-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23878750; Country of ref document: EP; Kind code of ref document: A1 |