CN117453805B - Visual analysis method for uncertainty data - Google Patents


Info

Publication number
CN117453805B
Authority
CN
China
Prior art keywords
data
analysis
target
uncertainty
target data
Legal status
Active
Application number
CN202311774201.2A
Other languages
Chinese (zh)
Other versions
CN117453805A (en)
Inventor
卢智嘉
韩明
杨蓓
王现彬
杨丽
Current Assignee
Shijiazhuang University
Original Assignee
Shijiazhuang University
Application filed by Shijiazhuang University
Priority to CN202311774201.2A
Publication of CN117453805A
Application granted
Publication of CN117453805B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a visual analysis method for uncertainty data and belongs to the technical field of visual analysis. The method comprises the following steps: acquiring target data, dividing it into data sets by data type, and judging each data set against matching rule features; determining the historical application services and historical processing means of the target data; performing a qualitative analysis of the target data based on all the judgment and determination results; if the qualitative analysis shows that the target data is uncertainty data, acquiring the valid types involved in the target data and its analysis requirement, and selecting a visual analysis mode accordingly; and performing visual analysis on the target data in the selected mode and displaying the analysis result. This solves the problem that data serving different requirements usually has to be processed identically in one prescribed mode.

Description

Visual analysis method for uncertainty data
Technical Field
The invention relates to the technical field of visual analysis, and in particular to a visual analysis method for uncertainty data.
Background
With the advent of the big data era, uncertainty data has become very common in practical applications such as financial markets, weather forecasting and medical diagnosis. Large amounts of uncertainty data, such as random variables and missing values, are collected and stored, and analyzing this data effectively to extract valuable information plays a vital role in decision making. Conventional data analysis methods usually apply the same prescribed processing to data regardless of the requirement. For example, when there are two different requirements, a prescribed visualization mode 1 may analyze requirement 1 effectively but be far less effective for requirement 2. Because mode 1 is nevertheless applied to both requirements, the visualization produced for requirement 2 ultimately fails to meet the analysis need. How to perform targeted visual analysis of uncertainty data has therefore become both a research hotspot and an urgent practical need.
Therefore, the invention provides a visual analysis method of uncertainty data.
Disclosure of Invention
The invention provides a visual analysis method for uncertainty data. Target data is acquired and divided into data sets by data type, and the corresponding data sets are judged according to matching rule features. The historical application services and historical processing means of the target data are determined, and a qualitative analysis of the target data is performed based on the judgment and determination results. If the qualitative analysis shows that the target data is uncertainty data, the valid types involved in the target data and its analysis requirement are acquired, and a visual analysis mode is selected. This solves the problem described in the background, namely that data serving different requirements has to be visualized identically in the same prescribed mode.
The invention provides a visual analysis method for uncertainty data, comprising the following steps:
step 1: acquiring target data, dividing it into data sets by data type, and judging each data set against matching rule features;
step 2: determining the historical application services and historical processing means of the target data;
step 3: performing a qualitative analysis of the target data based on all the judgment results and determination results;
step 4: if the qualitative analysis result is uncertainty data, acquiring the valid types involved in the target data and the analysis requirement of the target data, and selecting a visual analysis mode;
step 5: performing visual analysis on the target data in the selected visual analysis mode, and displaying the analysis result.
Preferably, acquiring the target data, dividing it into data sets by data type, and judging the corresponding data sets according to type rule features comprises:
analyzing each item of sub-data in the target data with a feature extraction model to determine its data features, and obtaining the data type of each item of sub-data from a feature-type mapping table;
grouping all sub-data of the same data type into one class to obtain a data set;
acquiring the rule features matched with each data type from the data-rule feature database, and judging the corresponding data set.
Preferably, determining the historical application services and historical processing means of the target data comprises:
tracing the data source of the target data from its source log, and determining the service module to which the target data belongs;
retrieving the historical application services matching the target data from the service database of each service module;
performing operation analysis on the historical application services to determine how the target data was used and the processing logic applied to it;
determining the historical processing means of the target data from its usage and the processing logic.
Preferably, performing the qualitative analysis of the target data based on all the judgment results and determination results comprises:
determining from all the judgment results whether the data set is a coarse-grained data set, and if so, confirming that the target data is uncertainty data;
otherwise, judging whether the target data has undergone missing value processing or data integration processing; if so, confirming that the target data is uncertainty data, and if not, confirming that it is deterministic data.
Preferably, acquiring the valid types involved in the target data and the analysis requirement of the target data to select a visual analysis mode comprises:
if the qualitative analysis result is uncertainty data, acquiring the valid types involved in the corresponding target data;
determining the proportion of each type of data among those valid types;
inferring backwards, from the proportion of each type of data, the analysis purpose and analysis target of the target data;
determining the analysis requirement of the target data from the analysis purpose and analysis target, and determining a visual index from that requirement;
selecting a visual analysis mode from the index-mode mapping table based on the visual index.
Preferably, performing visual analysis on the target data in the visual analysis mode and displaying the analysis result comprises:
determining a presentation form for the target data based on the visual analysis mode;
determining a visualization processing tool for the target data based on the presentation form;
cleaning and converting the target data to obtain processed data;
performing visual analysis on the processed data with the visualization processing tool, and obtaining and displaying the analysis result.
Preferably, acquiring the type rule features matched with each data type based on the type-rule feature database comprises:
defining the data domain of each type of data;
performing domain feature analysis on each type of data within its data domain to obtain first analysis features;
standardizing the first analysis features of each type of data to obtain second analysis features;
inputting the second analysis features of each type of data into the data-rule feature database to obtain the rule features of that type of data.
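The text does not say which standardization turns the first analysis features into second analysis features. A minimal sketch, assuming per-feature z-score standardization (a common choice that the source does not confirm), could look like this. The function name `standardize` and the sample feature values are hypothetical.

```python
import statistics


def standardize(first_features: list[list[float]]) -> list[list[float]]:
    """Z-score standardization of each feature column (assumed method).

    Each row holds the first analysis features of one item of a given data
    type. The returned rows are the corresponding second analysis features.
    A column with zero spread is mapped to 0.0 to avoid dividing by zero.
    """
    columns = list(zip(*first_features))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in first_features
    ]


# Hypothetical first analysis features: two features per item, three items.
second_features = standardize([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
print(second_features)
```

Standardizing puts features of different scales on a common footing before they are matched against the rule feature database. Whether the patent uses z-scores, min-max scaling or something else is left open by the text.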
Preferably, retrieving the historical application services matching the target data from the service database of each service module comprises:
determining, from the service database, the historical service applications of the target data based on the corresponding service module;
screening out the successful applications from the historical service applications, and determining the service factors and scene factors of the successful applications;
determining the historical application service of the target data based on the service factors and scene factors.
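The text does not specify how the service and scene factors decide the historical application service. The sketch below is a minimal illustration that assumes the most frequent (service factor, scene factor) pair among the successful applications is chosen. The `HistoricalApplication` record layout and the sample records are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class HistoricalApplication:
    """Hypothetical record layout; the source does not define this schema."""
    name: str
    succeeded: bool
    service_factor: str  # e.g. the business the data served
    scene_factor: str    # e.g. the scenario in which it was used


def determine_historical_service(history):
    """Screen out successful applications, then return the most common
    (service factor, scene factor) pair among them, or None if none succeeded."""
    successful = [app for app in history if app.succeeded]
    if not successful:
        return None
    pairs = Counter((app.service_factor, app.scene_factor) for app in successful)
    return pairs.most_common(1)[0][0]


history = [
    HistoricalApplication("report-q1", True, "sales statistics", "monthly report"),
    HistoricalApplication("report-q2", True, "sales statistics", "monthly report"),
    HistoricalApplication("forecast", False, "demand forecast", "dashboard"),
]
print(determine_historical_service(history))  # ('sales statistics', 'monthly report')
```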
Preferably, before acquiring the target data, dividing it into data sets by data type, and judging the corresponding data sets according to the matching rule features, the method further comprises:
dividing the target data into a plurality of complete sub-data;
determining the data type of each complete sub-data, and selecting a suitable delivery service domain according to that data type;
inputting each complete sub-data into the corresponding delivery service domain to obtain a service accuracy estimation result for that sub-data;
checking whether the service accuracy estimation result conforms to a data format standard; if so, confirming that the complete sub-data preliminarily meets the standard, and if not, confirming that it does not;
after confirming that a complete sub-data preliminarily meets the standard, dividing it into a plurality of stage data;
performing singular value decomposition on each stage data to obtain its singular data components;
acquiring the sample entropy of the singular data components of each stage data;
acquiring the data characteristic related parameters of each stage data from the sample entropy of its singular data components;
arranging and combining the data characteristic related parameters of all stage data to obtain a data characteristic related parameter set for the target data;
obtaining, with a vector analysis model, the feature vector corresponding to that parameter set;
building a model from the feature vector to obtain a target data estimation model;
performing parameter tracking on the target data with the target data estimation model to obtain a tracking result;
confirming the correlation index of each data parameter in the target data according to the tracking result;
judging the timeliness of the target data according to the correlation index of each data parameter;
confirming whether the timeliness is greater than or equal to a preset threshold; if so, confirming that the target data further meets the standard and proceeding with data type division; otherwise, confirming that the target data does not meet the standard and re-acquiring it.
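The stage-data steps (splitting into stages, singular value decomposition, sample entropy) use standard techniques that can be sketched. The text does not say how SVD is applied to one-dimensional stage data or what a "singular data component" is. This sketch assumes a singular-spectrum-analysis style approach: each stage is embedded in a Hankel (trajectory) matrix, and the rank-1 reconstruction from the largest singular value is treated as the singular data component. The window size, `m = 2` and tolerance `r = 0.2 * std` are conventional sample-entropy defaults, not values from the source. The later steps (vector analysis model, estimation model, timeliness) are not specified enough to sketch.

```python
import numpy as np


def trajectory_matrix(stage: np.ndarray, window: int) -> np.ndarray:
    """Hankel (trajectory) matrix: row i holds stage[i : i + k]."""
    k = len(stage) - window + 1
    return np.array([stage[i:i + k] for i in range(window)])


def diagonal_average(matrix: np.ndarray) -> np.ndarray:
    """Turn a window x k matrix back into a series of length window + k - 1."""
    rows, cols = matrix.shape
    total = np.zeros(rows + cols - 1)
    counts = np.zeros(rows + cols - 1)
    for i in range(rows):
        for j in range(cols):
            total[i + j] += matrix[i, j]
            counts[i + j] += 1
    return total / counts


def sample_entropy(x, m: int = 2, r_factor: float = 0.2) -> float:
    """SampEn = -ln(A / B) with tolerance r = r_factor * std(x).

    B counts template pairs of length m within tolerance (Chebyshev distance),
    A does the same for length m + 1; both use the same n - m templates.
    """
    x = np.asarray(x, dtype=float)
    n, r = len(x), r_factor * np.std(x)

    def matching_pairs(length: int) -> int:
        templates = np.array([x[i:i + length] for i in range(n - m)])
        pairs = 0
        for i in range(len(templates) - 1):
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            pairs += int(np.sum(dist <= r))
        return pairs

    b, a = matching_pairs(m), matching_pairs(m + 1)
    return float("inf") if a == 0 or b == 0 else float(-np.log(a / b))


def stage_sample_entropies(signal, n_stages: int = 4, window: int = 8, m: int = 2):
    """Split the signal into stages, take the leading SVD component of each
    stage's trajectory matrix, and return its sample entropy per stage."""
    entropies = []
    for stage in np.array_split(np.asarray(signal, dtype=float), n_stages):
        u, s, vt = np.linalg.svd(trajectory_matrix(stage, window), full_matrices=False)
        leading = s[0] * np.outer(u[:, 0], vt[0])  # assumed "singular data component"
        entropies.append(sample_entropy(diagonal_average(leading), m=m))
    return entropies


rng = np.random.default_rng(0)
t = np.linspace(0, 20, 400)
signal = np.sin(t) + 0.1 * rng.standard_normal(400)
print(stage_sample_entropies(signal))
```

The per-stage entropies are one plausible form of the "data characteristic related parameters". Lower values indicate a more regular, predictable component.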
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification; together with the embodiments of the invention, they serve to explain the invention. In the drawings:
FIG. 1 is a flow chart of a method for visual analysis of uncertainty data in an embodiment of the present invention;
FIG. 2 is a flow chart of type feature determination for a data set in an embodiment of the invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
Example 1:
The invention provides a visual analysis method for uncertainty data which, as shown in FIG. 1, comprises the following steps:
step 1: acquiring target data, dividing it into data sets by data type, and judging each data set against matching rule features;
step 2: determining the historical application services and historical processing means of the target data;
step 3: performing a qualitative analysis of the target data based on all the judgment results and determination results;
step 4: if the qualitative analysis result is uncertainty data, acquiring the valid types involved in the target data and the analysis requirement of the target data, and selecting a visual analysis mode;
step 5: performing visual analysis on the target data in the selected visual analysis mode, and displaying the analysis result.
In this embodiment, data types are the types used in computer programming to store and process data; common data types include integer, floating point, character and string.
In this embodiment, a data set is a structure for storing and organizing data, typically data from different sources or of different types, that allows the data to be processed, analyzed and manipulated; it may be an array, a list or a dictionary.
In this embodiment, matching rule features are criteria for describing and matching similar features between two or more data elements. A matching rule may be based on different features, such as the length of an element, the range of an element or the data type of an element. The corresponding data set can be judged according to the type rule features, for example by checking whether each element in the data set complies with a type rule, such as whether the data set is numeric or whether its elements comply with a string length rule.
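The rule checks described above (data type, element length, value range) can be sketched as follows. The rule dictionary keys `type`, `min_len`, `max_len`, `min` and `max` are a hypothetical encoding chosen for illustration; the patent does not define a rule format. A data set is taken to satisfy a rule when every element does.

```python
def matches_rule(element, rule: dict) -> bool:
    """Check one element against a hypothetical rule dictionary.

    Supported keys (all optional, illustrative only):
      'type'              expected Python type, e.g. int or str
      'min_len', 'max_len' string length bounds
      'min', 'max'         numeric value range
    """
    expected = rule.get("type")
    if expected is not None and not isinstance(element, expected):
        return False
    if isinstance(element, str):
        if not rule.get("min_len", 0) <= len(element) <= rule.get("max_len", float("inf")):
            return False
    elif isinstance(element, (int, float)):
        if not rule.get("min", float("-inf")) <= element <= rule.get("max", float("inf")):
            return False
    return True


def judge_data_set(data_set, rule: dict) -> bool:
    """A data set satisfies a rule when every one of its elements does."""
    return all(matches_rule(element, rule) for element in data_set)


print(judge_data_set([3, 7, 12], {"type": int, "min": 0, "max": 100}))  # True
print(judge_data_set(["ab", "abcdef"], {"type": str, "max_len": 4}))     # False
```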
In this embodiment, the historical application service refers to the service or scenario in which a piece of data has been used.
In this embodiment, the historical processing means may be data cleaning, data conversion or data grouping.
In this embodiment, qualitative analysis means determining whether the data is deterministic or uncertain.
Uncertainty data refers to data produced by a process that contains uncertain factors or random variables, so that the result is unknown and unpredictable; the uncertain factors may be generated by a random number generator or caused by natural phenomena that are difficult to predict. Uncertainty data is generally described by probability distributions, statistics, confidence intervals and variances.
Deterministic data refers to data whose results are known and predictable, such as experimental data, historical data and mathematical model data.
In this embodiment, the analysis requirement refers to what the analysis of the target data is ultimately meant to achieve, such as computing statistics or observing fluctuations.
In this embodiment, the visual analysis modes include line charts, bar charts, pie charts and scatter plots.
The beneficial effects of the technical scheme are as follows: the data sets divided by data type are judged through the matching rule features, and the historical application services and processing means of the target data are determined, so that the target data can be analyzed qualitatively. If the target data is uncertainty data, a suitable visual analysis mode is selected according to its valid types and its analysis requirement. This avoids having to process data serving different requirements identically in one prescribed mode.
Example 2:
The invention provides a visual analysis method for uncertainty data in which, as shown in FIG. 2, acquiring the target data, dividing it into data sets by data type, and judging the corresponding data sets according to type rule features comprises the following steps:
s01: analyzing each item of sub-data in the target data with a feature extraction model to determine its data features, and obtaining the data type of each item of sub-data from a feature-type mapping table;
s02: grouping all sub-data of the same data type into one class to obtain a data set;
s03: acquiring the rule features matched with each data type from the data-rule feature database, and judging the corresponding data set.
In this embodiment, a feature extraction model is a technique widely used in machine learning that automatically extracts features useful to a model from raw data.
In this embodiment, sub-data refers to the data obtained by dividing the target data so that its data types can be distinguished further. Each sub-data type has a corresponding parent data type, which refers to the original data entity being divided.
In this embodiment, data features are attributes or characteristics of a data object or data set that provide useful information for classification, clustering, dimensionality reduction and encoding. Common data features include:
Numerical features: integers, floating-point numbers and double-precision floating-point numbers, suitable for numeric data.
Character features: text strings, suitable for text data.
Boolean features: Boolean values, suitable for binary data.
In this embodiment, the feature-type mapping table is a data structure that represents the mapping between data features and data types, and is generally used to store data whose attributes are of multiple types.
In this embodiment, data types are the types used in computer programming to store and process data; common data types include integer, floating point, character and string.
In this embodiment, a data set is a structure for storing and organizing data, typically data from different sources or of different types, that allows the data to be processed, analyzed and manipulated; it may be an array, a list or a dictionary.
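Steps s01 and s02 can be sketched end to end. The patent's feature extraction model is a machine-learning model that the text does not specify, so a few parsing rules stand in for it here. `FEATURE_TYPE_MAP` is a hypothetical feature-type mapping table, and the function names are illustrative.

```python
from collections import defaultdict

# Hypothetical feature-type mapping table: extracted feature -> data type.
FEATURE_TYPE_MAP = {
    "boolean": "bool",
    "integer": "int",
    "decimal": "float",
    "single_char": "char",
    "text": "string",
}


def extract_feature(raw: str) -> str:
    """Rule-based stand-in for the feature extraction model (assumption)."""
    if raw.lower() in ("true", "false"):
        return "boolean"
    for parser, feature in ((int, "integer"), (float, "decimal")):
        try:
            parser(raw)
            return feature
        except ValueError:
            pass
    return "single_char" if len(raw) == 1 else "text"


def divide_into_data_sets(sub_data: list[str]) -> dict[str, list[str]]:
    """Group sub-data of the same data type into one data set (step s02)."""
    data_sets = defaultdict(list)
    for raw in sub_data:
        data_sets[FEATURE_TYPE_MAP[extract_feature(raw)]].append(raw)
    return dict(data_sets)


print(divide_into_data_sets(["12", "3.5", "x", "hello", "true", "7"]))
# {'int': ['12', '7'], 'float': ['3.5'], 'char': ['x'], 'string': ['hello'], 'bool': ['true']}
```

Each resulting data set can then be judged against the rule features of its type (step s03).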
The beneficial effects of the technical scheme are as follows: the data features of each item of sub-data in the target data are determined, and its data type is obtained from the feature-type mapping table. Sub-data of the same type are grouped into one data set, which enables data sharing and operation, and the data sets are then judged. The types and features of the data can thus be determined quickly, laying a foundation for the subsequent qualitative analysis of the target data.
Example 3:
The invention provides a visual analysis method for uncertainty data in which determining the historical application services and historical processing means of the target data comprises the following steps:
tracing the data source of the target data from its source log, and determining the service module to which the target data belongs;
retrieving the historical application services matching the target data from the service database of each service module;
performing operation analysis on the historical application services to determine how the target data was used and the processing logic applied to it;
determining the historical processing means of the target data from its usage and the processing logic.
In this embodiment, the source log is the record of the target data generated in the computer system, and generally includes operation information, error information and warning information.
In this embodiment, the data source refers to the log in which the target data appears, for example the data of a particular operation process within the operation information.
In this embodiment, the service module of the data refers to a module that processes, stores and manages the data, such as a data acquisition module, a data processing module or a data storage module.
In this embodiment, the service database is a database management system that gives database applications a way to access the database; an application can use the data in it by calling an object or module that connects to the database.
In this embodiment, the processing logic of data generally refers to how an application processes the data. Processing involves a series of operations such as reading, writing, modifying and deleting, each of which needs specific processing logic to determine the final state of the data. For example, creating a new data object requires data initialization; reading data requires reading and verifying it; and modifying data requires reading, verifying, updating and writing it.
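Tracing from the source log can be sketched under a few assumptions, since the text defines no log schema. Each record is assumed to be a dictionary with hypothetical keys `data_id`, `source`, `module` and `operation`, and the log is assumed to be in chronological order. The earliest record for a data item gives its data source and service module, and the recorded operations, most frequent first, approximate its historical processing means.

```python
from collections import Counter

# Hypothetical source-log records; the text does not define a log schema.
source_log = [
    {"data_id": "d1", "source": "operation information",
     "module": "data acquisition module", "operation": "data cleaning"},
    {"data_id": "d2", "source": "error information",
     "module": "data storage module", "operation": "data grouping"},
    {"data_id": "d1", "source": "operation information",
     "module": "data processing module", "operation": "data conversion"},
    {"data_id": "d1", "source": "operation information",
     "module": "data processing module", "operation": "data cleaning"},
]


def trace_origin(log, data_id):
    """Return (data source, service module) from the earliest record of
    data_id, assuming chronological order; None if the data never appears."""
    for record in log:
        if record["data_id"] == data_id:
            return record["source"], record["module"]
    return None


def processing_means(log, data_id):
    """List the operations applied to data_id, most frequent first."""
    counts = Counter(r["operation"] for r in log if r["data_id"] == data_id)
    return [operation for operation, _ in counts.most_common()]


print(trace_origin(source_log, "d1"))      # ('operation information', 'data acquisition module')
print(processing_means(source_log, "d1"))  # ['data cleaning', 'data conversion']
```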
The beneficial effects of the technical scheme are as follows: the data source of the target data is traced from its source log, and its service module is determined so that the historical application services of the data can be retrieved from the service database. The usage and service module of the target data can thus be determined quickly and its history understood. The historical processing means of the target data can then be determined, difficulties in processing the data can be identified, and data processing efficiency can be improved.
Example 4:
The invention provides a visual analysis method for uncertainty data in which performing the qualitative analysis of the target data based on all the judgment results and determination results comprises the following steps:
determining from all the judgment results whether the data set is a coarse-grained data set, and if so, confirming that the target data is uncertainty data;
otherwise, judging whether the target data has undergone missing value processing or data integration processing; if so, confirming that the target data is uncertainty data, and if not, confirming that it is deterministic data.
In this embodiment, coarse-grained data refers to data containing missing, inaccurate or erroneous values.
In this embodiment, data integration processing refers to merging multiple data sources or data tables by some method to obtain a complete data set.
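The two-stage judgment of Example 4 maps directly onto a small decision function. This is a minimal sketch under two assumptions. First, only missing values (None or NaN) are detected as coarse-grained, because detecting inaccurate or erroneous values would need domain rules the text does not give. Second, the target data counts as coarse-grained if any of its data sets is. The processing-means labels are hypothetical strings.

```python
import math

# Hypothetical labels for the processing means that imply uncertainty.
UNCERTAINTY_PROCESSING = {"missing value processing", "data integration processing"}


def is_coarse_grained(data_set) -> bool:
    """Stand-in check: a data set is coarse-grained if any value is missing
    (None or NaN). Inaccurate or erroneous values are not detected here."""
    return any(v is None or (isinstance(v, float) and math.isnan(v)) for v in data_set)


def qualitative_analysis(data_sets, processing_means) -> str:
    """Return 'uncertainty' or 'deterministic' following the two-stage judgment."""
    if any(is_coarse_grained(ds) for ds in data_sets):
        return "uncertainty"
    if UNCERTAINTY_PROCESSING & set(processing_means):
        return "uncertainty"
    return "deterministic"


print(qualitative_analysis([[1, 2], [3, None]], []))                         # uncertainty
print(qualitative_analysis([[1.0, 2.0]], ["data integration processing"]))  # uncertainty
print(qualitative_analysis([[1, 2]], ["data cleaning"]))                     # deterministic
```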
The beneficial effects of the technical scheme are as follows: whether the data set is a coarse-grained data set is judged first, and if so, the target data is confirmed to be uncertainty data. Otherwise, whether the target data has undergone such processing is judged to decide further whether it is uncertainty data. This double judgment improves the accuracy of the result.
Example 5:
The invention provides a visual analysis method for uncertainty data in which acquiring the valid types involved in the target data and the analysis requirement of the target data to select a visual analysis mode comprises the following steps:
if the qualitative analysis result is uncertainty data, acquiring the valid types involved in the corresponding target data;
determining the proportion of each type of data among those valid types;
inferring backwards, from the proportion of each type of data, the analysis purpose and analysis target of the target data;
determining the analysis requirement of the target data from the analysis purpose and analysis target, and determining a visual index from that requirement;
selecting a visual analysis mode from the index-mode mapping table based on the visual index.
In this embodiment, the valid types may be integer, floating point, character and string.
In this embodiment, the proportion is, for example, 50%, with 25 items of integer data included.
In this embodiment, inferring backwards the analysis purpose and analysis target of the target data means, for example, that if the analysis purpose is to determine product sales, the integer data needs attention, and the analysis target is then sales volume and revenue.
In this embodiment, the index-mode mapping table is a table describing the mapping between an index and its influencing factors. It is generally used to analyze the composition of an index and determine which factors influence it most. For example, to analyze the relationship between product cost and production efficiency, a table may be created containing the following:
Index: product cost.
Influencing factors: worker wages, raw material costs and equipment maintenance costs.
Weights: worker wages account for 40% of the total cost, raw material costs for 30%, and equipment maintenance costs for 10%.
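The proportion step and the final table lookup can be sketched as follows. The text does not specify how proportions lead to an analysis purpose and visual index, so that inference is left to the caller. `INDEX_MODE_MAP` is a hypothetical index-mode mapping table built from the four modes listed in Example 1, and its index names (`trend`, `comparison` and so on) are illustrative. The sample data reproduces the 50% / 25-integer example above with an assumed total of 50 items.

```python
from collections import Counter

# Hypothetical index-mode mapping table: visual index -> visual analysis mode,
# using the four modes listed in Example 1.
INDEX_MODE_MAP = {
    "trend": "line chart",
    "comparison": "bar chart",
    "composition": "pie chart",
    "correlation": "scatter plot",
}


def type_proportions(valid_types: list[str]) -> dict[str, float]:
    """Share of each data type among the valid data items."""
    counts = Counter(valid_types)
    total = sum(counts.values())
    return {data_type: count / total for data_type, count in counts.items()}


def select_mode(visual_index: str) -> str:
    """Look the visual index up in the mapping table."""
    return INDEX_MODE_MAP[visual_index]


proportions = type_proportions(["int"] * 25 + ["float"] * 15 + ["string"] * 10)
print(proportions)           # {'int': 0.5, 'float': 0.3, 'string': 0.2}
print(select_mode("trend"))  # line chart
```

A requirement such as "observing fluctuation" would plausibly map to a trend index and a line chart, but that pairing is an illustration, not the patent's table.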
In this embodiment, acquiring the valid types involved in the corresponding target data comprises:
constructing an uncertainty data set from the target data, and representing each uncertainty data object in the set by a data interval and statistical analysis;
creating an ordered queue and a result queue, and associating them one-to-one;
inputting each uncertainty data object into the ordered queue, and obtaining its first output attribute value in the result queue;
calculating the association index between each uncertainty data object and every other uncertainty data object from the first output attribute value of the object in the result queue and the second output attribute value of the other object in the result queue:
(The formula is not recoverable from the source text.) Its quantities are: the association index between the i-th and j-th uncertainty data objects; the first output attribute value of the i-th uncertainty data object; the second output attribute value of the j-th uncertainty data object; an association limiting index factor; a data type consistency factor between the i-th and j-th uncertainty data objects; the data interval distance between the i-th and j-th uncertainty data objects in the uncertainty data set; the default reference interval distance between data intervals of the same data type; and a limiting factor on that reference interval distance.
screening out the target uncertainty data objects whose association index is greater than or equal to a preset threshold, and confirming them as valid data of the uncertainty data;
acquiring the current data type of each item of valid data, and taking the union of all these current data types to obtain the valid types involved in the corresponding target data.
In this embodiment, statistical analysis refers to detailed analysis of each uncertainty data object using statistical methods, such as descriptive statistics, parameter estimation, to determine the statistical value of each data object.
In this embodiment, the ordered queue is the data input structure and stores the attributes of each data object.
In this embodiment, the result queue is the data output structure and stores the processed attribute results, so the associated output value can be obtained directly from the corresponding input.
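The one-to-one correspondence between the ordered queue and the result queue can be sketched with two index-aligned deques. This is a minimal sketch: `PairedQueues` and its `process` callable (the attribute-processing step) are hypothetical, since the text does not say what processing produces the output attribute values.

```python
from collections import deque
from typing import Any, Callable, Deque

class PairedQueues:
    """Ordered (input) queue and result (output) queue kept index-aligned:
    the k-th input always corresponds to the k-th output."""

    def __init__(self, process: Callable[[Any], Any]) -> None:
        self.process = process  # hypothetical attribute-processing step
        self.ordered: Deque[Any] = deque()
        self.results: Deque[Any] = deque()

    def push(self, obj: Any) -> Any:
        """Enqueue an object and its processed output at the same position."""
        out = self.process(obj)
        self.ordered.append(obj)
        self.results.append(out)
        return out

    def output_of(self, index: int) -> Any:
        """Look up the output for the input at `index` directly."""
        return self.results[index]
```

Keeping both queues aligned by position is what lets an output value be read back directly from the position of its input, as the paragraph above describes.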
In this embodiment, the preset threshold is a value set in advance.
The beneficial effects of the technical scheme are as follows: by acquiring the effective types related to the uncertainty data and the proportion of each type of data within them, the composition of the data types can be determined quickly and effectively. The analysis purpose and analysis target of the data can then be deduced in reverse, the visual index of the target data determined, and a visual analysis mode selected, making the data more intuitive and easier to understand. Computing an association index over the uncertainty data makes the effective types easy to determine, which facilitates the subsequent selection of a visual analysis mode.
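The proportion step and the index-mode lookup described above can be sketched as follows. The text does not say how the analysis purpose is deduced from the proportions, so the rule "the dominant data type picks the visual index" and both lookup tables (`INDEX_BY_TYPE`, `INDEX_MODE_TABLE`) are hypothetical placeholders, not the patent's index-mode mapping table.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical lookup: dominant data type -> visual index (analysis requirement).
INDEX_BY_TYPE = {"numeric": "relationship", "categorical": "comparison"}
# Hypothetical index-mode mapping table: visual index -> visual analysis mode.
INDEX_MODE_TABLE = {
    "relationship": "scatter plot",
    "comparison": "bar chart",
    "distribution": "histogram",
}

def type_proportions(types: List[str]) -> Dict[str, float]:
    """Share of each data type among the effective data."""
    counts = Counter(types)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def choose_mode(proportions: Dict[str, float]) -> str:
    """Assumed rule: the dominant type decides the visual index, then the mode."""
    dominant = max(proportions, key=proportions.get)
    index = INDEX_BY_TYPE.get(dominant, "distribution")  # fallback is an assumption
    return INDEX_MODE_TABLE[index]
```

For instance, three numeric items and one categorical item give proportions of 0.75 and 0.25, and the hypothetical tables then select a scatter plot.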
Example 6:
The invention provides a visual analysis method for uncertainty data, in which performing visual analysis on the target data based on the visual analysis mode and displaying the analysis results comprises the following steps:
determining a display form of the target data based on a visual analysis mode;
determining a visualization processing tool for the target data based on the presentation form;
performing data cleaning and conversion on the target data to obtain processed data;
and performing visual analysis on the processed data with the visualization processing tool, obtaining the analysis result, and displaying it.
In this embodiment, the presentation form may be a table, a statistical chart, or a scatter plot.
In this embodiment, the visualization processing tool may be a data conversion tool, a data visualization tool, or a data export tool.
The beneficial effects of the technical scheme are as follows: the visualization processing tool is determined from the presentation form of the target data, the data are cleaned and converted, and the processed data are analyzed visually. This makes it possible to inspect the distribution of the data quickly, such as its range and central tendency, and to display its characteristics and trends more intuitively.
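The cleaning, conversion, and distribution-summary steps can be sketched as below, assuming the target data is a flat list of mixed raw values (an assumption). The actual chart rendering by a visualization tool is omitted; `clean_and_convert` and `distribution_summary` are hypothetical names.

```python
import statistics
from typing import Dict, Iterable, List, Union

Raw = Union[str, float, int, None]

def clean_and_convert(values: Iterable[Raw]) -> List[float]:
    """Cleaning: drop missing or unparseable entries. Conversion: cast to float."""
    cleaned: List[float] = []
    for v in values:
        if v is None:
            continue  # missing value
        try:
            cleaned.append(float(v))
        except ValueError:
            continue  # entry cannot be parsed as a number
    return cleaned

def distribution_summary(data: List[float]) -> Dict[str, float]:
    """Range and central tendency, the quantities a chart is meant to reveal."""
    lo, hi = min(data), max(data)
    return {
        "min": lo,
        "max": hi,
        "range": hi - lo,
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
    }
```

For example, `clean_and_convert(["1", 2, None, "x", 3.5])` yields `[1.0, 2.0, 3.5]`, whose range is 2.5 and median 2.0.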
Example 7:
The invention provides a visual analysis method for uncertainty data, in which acquiring the type rule features matching each data type based on the type-rule feature database comprises the following steps:
defining the data domain of each type of data;
performing domain feature analysis on each type of data based on its data domain to obtain first analysis features;
standardizing the first analysis features of each type of data to obtain second analysis features;
and inputting the second analysis features of each type of data into the data-rule feature database to obtain the rule features of that type of data.
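The standardization and database lookup steps can be sketched as follows. Two assumptions are made that the text does not state: standardization is taken to be z-score scaling, and the data-rule feature database is modeled as a dictionary of rule signatures matched by nearest Euclidean distance. `standardize`, `lookup_rule`, and `rule_db` are hypothetical.

```python
import math
from typing import Dict, List

def standardize(features: List[float]) -> List[float]:
    """Z-score standardization, (x - mean) / std, using the population std."""
    n = len(features)
    mean = sum(features) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in features) / n)
    if std == 0:
        return [0.0] * n  # constant features carry no spread to scale
    return [(x - mean) / std for x in features]

def lookup_rule(second_features: List[float],
                rule_db: Dict[str, List[float]]) -> str:
    """Return the rule whose stored signature is closest to the second
    analysis features; a stand-in for querying the data-rule feature database."""
    return min(rule_db, key=lambda rule: math.dist(second_features, rule_db[rule]))
```

Standardizing first puts features measured on different scales on a common footing, so the distance comparison in `lookup_rule` is not dominated by whichever feature has the largest raw values.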
In this embodiment, the data domain refers to the specific meaning or category of information contained in data; it is generally used to describe the attributes or structure of the data, including its category, structure, relationships, hierarchy, and features. For example:
the data domain of text data includes the category, topic, keywords, and semantics of the text.
The data domain of image data includes the category, style, theme, color, and texture of the image.
In this embodiment, domain feature analysis refers to analyzing and extracting domain features so that machine learning models can better understand and describe domain data. For example, for text data, domain feature analysis may extract keywords, topics, and sentiment information to support text classification and sentiment analysis.
In computer vision, domain feature analysis can extract categories, features, scenes, and similar information from image data to support image classification and object detection.
In this embodiment, data-rule features are rules obtained by classifying, clustering, or generating rules from domain features. For example, in natural language processing, data-rule features may capture rules in text data such as naming rules and grammar rules; in computer vision, they may capture rules in image data such as shape rules and size rules.
The beneficial effects of the technical scheme are as follows: by defining the data domain of each type of data, performing domain feature analysis to obtain first analysis features, and standardizing them into second analysis features, the rule features of each type of data are obtained. This can improve data clustering and classification, and facilitates the extraction and verification of rules.
Example 8:
The invention provides a visual analysis method for uncertainty data, in which retrieving the historical application services matching the target data from the service database of each service module comprises the following steps:
determining, from the service database, the historical service applications of the target data based on the corresponding service module;
screening out the successful applications from the historical service applications, and determining the business factors and scenario factors of those successful applications;
and determining the historical application services of the target data based on the business factors and scenario factors.
In this embodiment, a historical service application may be processing, querying, or retrieving data.
In this embodiment, a business factor can be understood as a business feature or business variable, generally a factor that affects business performance in a given business scenario. Such factors may be objective or subjective, for example personnel, equipment, processes, markets, and technology.
In this embodiment, a scenario factor is a factor of the scenario in which an application succeeded that affects its outcome during application, such as the system version or software version.
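Screening the successful historical applications of one service module and collecting their business and scenario factors can be sketched as below. The record layout (`Application` with a `succeeded` flag and factor sets) and the function `historical_profile` are hypothetical; the text does not define how the service database stores applications.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Application:
    """Hypothetical record of one historical service application."""
    module: str
    succeeded: bool
    business_factors: Set[str] = field(default_factory=set)
    scenario_factors: Set[str] = field(default_factory=set)

def historical_profile(apps: List[Application], module: str) -> Dict[str, Set[str]]:
    """Keep only successful applications of the module, then pool their factors."""
    successful = [a for a in apps if a.module == module and a.succeeded]
    return {
        "business": set().union(*(a.business_factors for a in successful)),
        "scenario": set().union(*(a.scenario_factors for a in successful)),
    }
```

Failed applications are discarded before pooling, so factors such as a market condition seen only in failures do not enter the profile.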
The beneficial effects of the technical scheme are as follows: the historical service applications of the service module corresponding to the target data are retrieved, the successful applications are screened out, and their scenario factors and business factors are determined, from which the historical application services of the target data are derived. This allows the service module of the target data to be identified quickly, reveals patterns and trends in the data, and shows how the target data is applied.
Example 9:
The invention provides a visual analysis method for uncertainty data which, before acquiring the target data, dividing it into data sets according to data type, and judging the corresponding data sets according to matching rule features, further comprises the following steps:
dividing the target data into a plurality of complete sub-data;
determining the data type of each complete sub-data, and selecting a suitable service delivery field according to that data type;
inputting each complete sub-data into its corresponding service delivery field to obtain a service precision estimation result for that sub-data;
confirming whether the service precision estimation result conforms to the data format standard; if so, confirming that the corresponding complete sub-data preliminarily meets the standard, and if not, confirming that it does not meet the standard;
after the corresponding complete sub-data is confirmed to preliminarily meet the standard, dividing it into a plurality of stage data;
performing singular value decomposition on each stage data to obtain its singular data components;
obtaining the sample entropy of the singular data components of each stage data;
obtaining the data-feature-related parameters of each stage data from the sample entropy of its singular data components;
arranging and combining the data-feature-related parameters of all stage data to obtain the data-feature-related parameter set of the target data;
obtaining, based on a vector analysis model, the feature vector corresponding to the data-feature-related parameter set;
building a model from the feature vector to obtain a target data estimation model;
performing parameter tracking on the target data with the target data estimation model to obtain a tracking result;
determining the correlation index of each data parameter in the target data according to the tracking result;
judging the timeliness of the target data according to the correlation indexes of its data parameters;
and confirming whether the timeliness is greater than or equal to a preset threshold; if so, confirming that the target data further meets the standard and carrying out data type division; otherwise, confirming that the target data does not meet the standard and re-acquiring the target data.
In this embodiment, a suitable service delivery field is selected mainly through data analysis: the field is determined from the data characteristics and potential requirements of each data type, for example selecting a target audience according to user characteristics and needs, or selecting popular products according to product characteristics and demand.
In this embodiment, service precision estimation refers to the accuracy of the estimates or predictions made for the sub-data, which is often affected by many factors, such as market changes, the competitive situation, and company policies.
In this embodiment, the data format standard covers the data structure, data types, data exchange format, data storage format, and data access and processing rules; depending on the requirements of the specific application field, it may use formats such as XML, JSON, CSV, or XML-RPC.
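In this embodiment, singular value decomposition (SVD), which factors a data matrix into orthogonal singular vectors weighted by non-negative singular values, is used to detect abnormal or heterogeneous behavior in a data set automatically when processing large amounts of data.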
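For a one-dimensional stage series, one common way to obtain singular data components is to embed the series in a trajectory (Hankel) matrix and decompose it, as in singular spectrum analysis. The sketch below assumes that approach and a caller-chosen `window` length; neither is specified in the patent, and `trajectory_matrix` and `singular_components` are hypothetical helpers built on `numpy.linalg.svd`.

```python
import numpy as np
from typing import List, Tuple

def trajectory_matrix(series: np.ndarray, window: int) -> np.ndarray:
    """Stack lagged windows of a 1-D series as columns (window x K Hankel matrix)."""
    k = len(series) - window + 1
    return np.column_stack([series[i:i + window] for i in range(k)])

def singular_components(series, window: int) -> Tuple[np.ndarray, List[np.ndarray]]:
    """Singular values (descending) and rank-one components s_i * u_i v_i^T.

    Assumption: the embedding step and `window` are illustrative choices."""
    x = trajectory_matrix(np.asarray(series, dtype=float), window)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    components = [s[i] * np.outer(u[:, i], vt[i]) for i in range(len(s))]
    return s, components
```

The components sum back to the trajectory matrix. Components whose singular values stand out from the rest are natural candidates for the "singular data components" described below; that selection rule is likewise an assumption.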
In this embodiment, a singular data component is a component, identified in component analysis, whose distribution, value range, order of magnitude, or sign differs from those of the other components. Singular data components may arise from an irregular data distribution or from differences in value range or order of magnitude; they can make statistical analysis results inaccurate and lead to different assumptions or conclusions.
In this embodiment, sample entropy, a measure of the irregularity of a data sequence, is used as an indicator of the uncertainty and dispersion of the data set.
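Sample entropy has a standard definition, SampEn(m, r) = -ln(A / B), where B counts pairs of length-m templates within tolerance r (Chebyshev distance, self-matches excluded) and A counts the same for length m + 1. A minimal sketch follows; the defaults m = 2 and r = 0.2 times the standard deviation are common conventions, not values taken from the patent.

```python
import math
import statistics
from typing import Optional, Sequence

def sample_entropy(x: Sequence[float], m: int = 2, r: Optional[float] = None) -> float:
    """SampEn = -ln(A / B). Lower values mean a more regular series."""
    n = len(x)
    if r is None:
        r = 0.2 * statistics.pstdev(x)  # common default tolerance

    def count_matches(length: int) -> int:
        # Use the same n - m starting points for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance: largest element-wise difference.
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined: no matches at one of the lengths
    return -math.log(a / b)
```

A strictly periodic sequence such as `[1.0, 2.0] * 10` gives 0.0, because every template match of length m extends to length m + 1.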
In this embodiment, data-feature-related parameters generally refer to the parameters used to describe data features in machine learning, for example:
Features: one or more variables in the data that describe certain of its properties; in an image recognition task, for example, the pixel values of an image can be regarded as features.
Feature engineering: transforming and extracting the original data to create new features; in natural language processing tasks, for example, techniques such as word vectorization can be used to extract features from text data.
In this embodiment, the data-feature-related parameters of each stage data are extracted and stored in a list or array. The parameter lists of all stages are then arranged and combined into a single list or array representing the feature-related parameters of all stage data, and the result is stored as a new data set, which is the data-feature-related parameter set corresponding to the target data.
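The stage split and the arrange-and-combine step can be sketched as below. In the patent the per-stage parameters come from the sample entropy of singular data components; here the per-stage extractor is a caller-supplied placeholder, and splitting into near-equal consecutive segments with `numpy.array_split` is an assumption. `parameter_set` is a hypothetical name.

```python
import numpy as np
from typing import Callable, List

def parameter_set(target, n_stages: int,
                  stage_params: Callable[[np.ndarray], List[float]]) -> np.ndarray:
    """Split the series into consecutive stages, extract each stage's
    parameters, and concatenate them in stage order into one vector."""
    stages = np.array_split(np.asarray(target, dtype=float), n_stages)
    return np.concatenate([np.asarray(stage_params(s), dtype=float) for s in stages])
```

For example, `parameter_set(np.arange(6), 3, lambda s: [s.mean(), s.std()])` splits the data into [0, 1], [2, 3], [4, 5] and returns the six values `[0.5, 0.5, 2.5, 0.5, 4.5, 0.5]`.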
In this embodiment, the vector analysis model is a statistical method for analyzing multivariate data; it mainly reveals the relationships between variables by comparing their correlation coefficients and uses them for prediction. Common applications of vector analysis include cluster analysis, factor analysis, and multiple regression analysis.
In this embodiment, the feature vector corresponding to the data-feature-related parameter set is formed by extracting the data-feature-related parameters of each stage data set and linearly combining them; it represents the relevance and direction of the data features in each stage data set.
In this embodiment, the target data estimation model is a model for predicting the target data set, that is, the data set to be predicted, such as a medical or financial data set.
In this embodiment, parameter tracking is a technique for following each parameter in a program, revealing the values and changes of each parameter as the program performs different operations.
In this embodiment, the correlation index is a statistic measuring the strength of the linear relationship between two variables. Its value ranges from -1 to 1: values close to 1 indicate a strong positive linear relationship, values close to -1 a strong negative linear relationship, and values close to 0 little or no linear relationship.
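The correlation index described above matches the Pearson correlation coefficient, r = cov(x, y) / (std(x) std(y)). A minimal sketch, assuming two equal-length numeric sequences with non-zero spread (`correlation_index` is a hypothetical name):

```python
import math
from typing import Sequence

def correlation_index(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson r: +1 strong positive, -1 strong negative, 0 no linear relation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # the 1/n factors cancel between numerator and denominator
```

For x = [1, 2, 3, 4, 5], the sequence [2, 4, 6, 8, 10] gives r = 1, the reversed sequence [10, 8, 6, 4, 2] gives r = -1 (a strong, not weak, linear relationship), and the V-shaped [2, 1, 0, 1, 2] gives r = 0 despite an obvious non-linear pattern.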
In this embodiment, data timeliness describes the degree and speed with which data change over time; it is typically used to characterize how quickly data change and how often they are updated.
The beneficial effects of the technical scheme are as follows: a suitable service delivery field is selected according to the data type of each complete sub-data and a service precision estimation result is obtained, so that each sub-data can be checked against the data format standard and only data that preliminarily meet it are kept, reducing the data volume. The qualifying sub-data are then divided into stage data, and singular value decomposition is applied to each stage to obtain its singular data components, which shows whether the data distribution is regular and how any irregularity affects the data. The data-feature-related parameters of all stages are arranged and combined into a parameter set for the target data, from which a target data estimation model is built; tracking parameters with this model and judging the timeliness of the target data selects better data for the subsequent data type division and improves its quality.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (6)

1. A method for visual analysis of uncertainty data, the method comprising:
step 1: acquiring target data, dividing the target data into data sets according to data types, and judging the corresponding data sets according to matching rule features;
step 2: carrying out qualitative analysis on the target data according to all the judgment results;
step 3: if the qualitative analysis result is uncertainty data, acquiring the effective types related to the target data and the analysis requirements of the target data, and selecting a visual analysis mode;
step 4: performing visual analysis on the target data based on the visual analysis mode, and displaying the analysis results;
wherein step 3 comprises:
if the qualitative analysis result is uncertainty data, acquiring the effective types related to the corresponding target data;
determining the proportion of each type of data within the effective types related to the target data;
deducing in reverse the analysis purpose and analysis target of the target data based on the proportion of each type of data;
determining the analysis requirements of the target data based on the analysis purpose and analysis target, and determining a visual index according to those analysis requirements;
selecting a visual analysis mode from the index-mode mapping table based on the visual index;
wherein acquiring the effective types related to the corresponding target data comprises:
constructing an uncertainty data set from the target data, and representing each uncertainty data object in the uncertainty data set by a data interval and statistical analysis;
creating an ordered queue and a result queue, and associating their entries one-to-one;
inputting each uncertainty data object into the ordered queue, and obtaining its first output attribute value from the result queue;
calculating the association index between each uncertainty data object and every other uncertainty data object from the first output attribute value of the former and the second output attribute value of the latter in the result queue:
[formula not reproduced in this text], in which the terms are, in order: the association index between the i-th and j-th uncertainty data objects; the first output attribute value of the i-th uncertainty data object; the second output attribute value of the j-th uncertainty data object; the associated limiting index factor; the data type consistency factor between the i-th and j-th uncertainty data objects; the data interval distance between the i-th and j-th uncertainty data objects in the uncertainty data set; the default reference interval distance for data of the same type; and the limiting factor of that reference interval distance for data of the same type;
screening out the target uncertainty data objects whose association index is greater than or equal to a preset threshold, and confirming them as effective data of the uncertainty data;
and obtaining the current data type of each piece of effective data, and merging all the current data types to obtain the effective types related to the corresponding target data.
2. The visual analysis method for uncertainty data according to claim 1, wherein acquiring target data, dividing the target data into data sets according to data types, and judging the corresponding data sets according to type rule features comprises:
analyzing each piece of sub-data in the target data with a feature extraction model to determine its data features, and obtaining the data type of each piece of sub-data from a feature-type mapping table;
grouping all sub-data of the same data type into the same class of data to obtain a data set;
and acquiring the rule features matching each data type based on the data-rule feature database, and judging the corresponding data sets.
3. The visual analysis method for uncertainty data according to claim 1, wherein carrying out qualitative analysis on the target data according to all the judgment results comprises:
determining from all the judgment results whether the data set is a coarse-grained data set, and if so, confirming that the target data is uncertainty data;
otherwise, judging whether the target data has undergone missing-value processing or data integration processing; if so, confirming that the target data is uncertainty data, and if not, confirming that the target data is deterministic data.
4. The visual analysis method for uncertainty data according to claim 1, wherein performing visual analysis on the target data based on the visual analysis mode and displaying the analysis results comprises:
determining a presentation form of the target data based on the visual analysis mode;
determining a visualization processing tool for the target data based on the presentation form;
performing data cleaning and conversion on the target data to obtain processed data;
and performing visual analysis on the processed data with the visualization processing tool, obtaining the analysis result, and displaying it.
5. The visual analysis method for uncertainty data according to claim 2, wherein acquiring the type rule features matching each data type based on a type-rule feature database comprises:
defining the data domain of each type of data;
performing domain feature analysis on each type of data based on its data domain to obtain first analysis features;
standardizing the first analysis features of each type of data to obtain second analysis features;
and inputting the second analysis features of each type of data into the data-rule feature database to obtain the rule features of that type of data.
6. The visual analysis method for uncertainty data according to claim 1, further comprising, before acquiring the target data, dividing the target data into data sets according to data types, and judging the corresponding data sets according to matching rule features:
dividing the target data into a plurality of complete sub-data;
determining the data type of each complete sub-data, and selecting a suitable service delivery field according to that data type;
inputting each complete sub-data into its corresponding service delivery field to obtain a service precision estimation result for that sub-data;
confirming whether the service precision estimation result conforms to the data format standard; if so, confirming that the corresponding complete sub-data preliminarily meets the standard, and if not, confirming that it does not meet the standard;
after the corresponding complete sub-data is confirmed to preliminarily meet the standard, dividing it into a plurality of stage data;
performing singular value decomposition on each stage data to obtain its singular data components;
obtaining the sample entropy of the singular data components of each stage data;
obtaining the data-feature-related parameters of each stage data from the sample entropy of its singular data components;
arranging and combining the data-feature-related parameters of all stage data to obtain the data-feature-related parameter set of the target data;
obtaining, based on a vector analysis model, the feature vector corresponding to the data-feature-related parameter set;
building a model from the feature vector to obtain a target data estimation model;
performing parameter tracking on the target data with the target data estimation model to obtain a tracking result;
determining the correlation index of each data parameter in the target data according to the tracking result;
judging the timeliness of the target data according to the correlation indexes of its data parameters;
and confirming whether the timeliness is greater than or equal to a preset threshold; if so, confirming that the target data further meets the standard and carrying out data type division; otherwise, confirming that the target data does not meet the standard and re-acquiring the target data.
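The qualitative-analysis rule of claim 3 reduces to a short decision procedure, sketched below. The three boolean inputs are hypothetical: the claims do not specify how a data set is recognized as coarse-grained or how prior missing-value or integration processing is detected.

```python
def qualitative_analysis(is_coarse_grained: bool,
                         had_missing_value_processing: bool,
                         had_data_integration: bool) -> str:
    """Decision order of claim 3; the boolean inputs are assumed to be known."""
    if is_coarse_grained:
        return "uncertainty"  # a coarse-grained data set means uncertainty data
    if had_missing_value_processing or had_data_integration:
        return "uncertainty"  # imputed or integrated data is treated as uncertain
    return "deterministic"
```

Only data that is neither coarse-grained nor previously imputed or integrated is classified as deterministic; everything else proceeds to step 3 of claim 1.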
CN202311774201.2A 2023-12-22 2023-12-22 Visual analysis method for uncertainty data Active CN117453805B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311774201.2A CN117453805B (en) 2023-12-22 2023-12-22 Visual analysis method for uncertainty data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311774201.2A CN117453805B (en) 2023-12-22 2023-12-22 Visual analysis method for uncertainty data

Publications (2)

Publication Number Publication Date
CN117453805A (en) 2024-01-26
CN117453805B (en) 2024-03-15

Family

ID=89591382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311774201.2A Active CN117453805B (en) 2023-12-22 2023-12-22 Visual analysis method for uncertainty data

Country Status (1)

Country Link
CN (1) CN117453805B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102081764A (en) * 2011-01-11 2011-06-01 上海海洋大学 ULDB (Databases with Uncertainty and Lineage)-based marine environmental monitored data management system
CN102800128A (en) * 2012-08-09 2012-11-28 中国人民解放军信息工程大学 Method for establishing geomorphic description precision model by utilizing gplotmatrix and regression analysis
CN106919755A (en) * 2017-03-01 2017-07-04 清华大学 Data-based method and device for quantitative uncertainty analysis of cloud manufacturing systems
CN113314100A (en) * 2021-07-29 2021-08-27 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for evaluating and displaying results of spoken language test
CN115757689A (en) * 2022-09-21 2023-03-07 中国人民解放军军事科学院军事科学信息研究中心 Information query system, method and equipment
CN116796233A (en) * 2023-06-30 2023-09-22 北京字跳网络技术有限公司 Data analysis method, data analysis device, computer readable medium and electronic equipment

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
CN106951682B8 (en) * 2017-03-01 2019-05-10 大连理工大学 Uncertainty-based method for matching mountain-area hydrological models with data precision
CN110441823B (en) * 2019-08-09 2021-06-01 浙江财经大学 Stratum contrast uncertainty visualization method based on multi-source data fusion
CN111024484B (en) * 2019-11-28 2021-07-13 上海交通大学 Method for predicting random mechanical property of fiber reinforced composite material
CN114036347B (en) * 2021-11-18 2022-06-03 北京中关村软件园发展有限责任公司 Cloud platform supporting digital fusion service and working method
CN114742289A (en) * 2022-03-31 2022-07-12 大连理工大学 Gaussian process robust optimization method for production process parameters
CN116415840B (en) * 2023-02-02 2023-12-05 北京三维天地科技股份有限公司 Automatic index early warning method and system based on machine learning model
CN116756616A (en) * 2023-06-26 2023-09-15 北京字跳网络技术有限公司 Data processing method, device, computer readable medium and electronic equipment
CN116993298A (en) * 2023-08-25 2023-11-03 中原工学院 Information collaborative management method and system based on manufacturing industry complete value chain
CN117114206B (en) * 2023-10-23 2024-01-26 北京联创高科信息技术有限公司 Calculation method for coal mine water damage index data trend

Non-Patent Citations (3)

Title
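Visualization and Visual Knowledge Discovery from Big Uncertain Data; Carson K. Leung et al.; 2022 26th International Conference Information Visualisation (IV); 2023-01-23; full text *
Research on Uncertain Data Clustering Algorithms and Their Parallelization (不确定性数据聚类算法及其并行化研究); He Shaoyuan (何少元); China Master's Theses Full-text Database, Information Science and Technology; 2020-02-15; full text *
Visual Correlation Analysis of Uncertainty in Multidimensional Data (多维数据的不确定性可视相关分析); Zhang Yi (张怡) et al.; Journal of Computer-Aided Design & Computer Graphics; 2018-06-15; Vol. 30, No. 6; full text *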

Also Published As

Publication number Publication date
CN117453805A (en) 2024-01-26

Similar Documents

Publication Publication Date Title
JP7090936B2 (en) ESG-based corporate evaluation execution device and its operation method
WO2018184518A1 (en) Microblog data processing method and device, computer device and storage medium
JPH0877010A (en) Method and device for data analysis
CN111427974A (en) Data quality evaluation management method and device
CN116894152B (en) Multisource data investigation and real-time analysis method
CN101000624A (en) Method, system and device for implementing data mining model conversion and application
CN116542800A (en) Intelligent financial statement analysis system based on cloud AI technology
CN116049379A (en) Knowledge recommendation method, knowledge recommendation device, electronic equipment and storage medium
CN117453764A (en) Data mining analysis method
Goyle et al. Dataassist: A machine learning approach to data cleaning and preparation
CN117764724A (en) Intelligent credit rating report construction method and system
CN112069314B (en) Specific field situation analysis system based on scientific and technical literature data
CN117726166A (en) Artificial intelligence enterprise customer risk information analysis and evaluation method and system based on large language model
CN110874366A (en) Data processing and query method and device
CN117592450A (en) Panoramic archive generation method and system based on employee information integration
CN116933130A (en) Enterprise industry classification method, system, equipment and medium based on big data
CN112631889A (en) Portrayal method, device and equipment for application system and readable storage medium
CN117453805B (en) Visual analysis method for uncertainty data
CN116595418A (en) Multi-dimensional image construction method for scientific and technological achievements
CN116523301A (en) System for predicting risk rating based on big data of electronic commerce
CN113420153B (en) Topic making method, device and equipment based on topic library and event library
CN115034762A (en) Post recommendation method and device, storage medium, electronic equipment and product
CN114528378A (en) Text classification method and device, electronic equipment and storage medium
Yalaoui et al. A survey on data quality: principles, taxonomies and comparison of approaches
CN117556118B (en) Visual recommendation system and method based on scientific research big data prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant