CN113743506A - Data processing method and device and electronic equipment - Google Patents

Publication number
CN113743506A
Authority
CN
China
Prior art keywords
data
feature
characteristic
display
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111037955.0A
Other languages
Chinese (zh)
Inventor
张俊丽
王奇刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/451 Execution arrangements for user interfaces

Abstract

The application discloses a data processing method, a data processing device, and electronic equipment. The method comprises: obtaining multiple pieces of data to be analyzed; determining the feature values of the multiple data features of each piece of data; constructing, according to the feature values of the multiple data features of the data, a feature display diagram of the data, wherein the feature display diagram comprises a plurality of feature display branches, each feature display branch represents one data feature, and the branch length of each feature display branch represents the feature value of the corresponding data feature; and displaying the feature display diagram of each piece of data. The scheme can show more intuitively how multiple pieces of data differ on different features.

Description

Data processing method and device and electronic equipment
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, and an electronic device.
Background
In the big data era, application scenarios of data mining and analysis are increasing.
In a data application scenario, a data consumer often needs to know the difference between different data. However, in the case of a large amount of data, it is difficult for a data user to intuitively and quickly know the difference between different data.
Disclosure of Invention
The application provides a data processing method and device and electronic equipment.
The data processing method comprises the following steps:
obtaining multiple pieces of data to be analyzed;
determining the feature values of the multiple data features of each piece of data;
constructing a feature display diagram of the data according to the feature values of the multiple data features of the data, wherein the feature display diagram comprises a plurality of feature display branches, each feature display branch represents one data feature, and the branch length of each feature display branch represents the feature value of the data feature corresponding to that branch;
displaying the feature display diagram of each piece of data.
In a possible implementation manner, the obtaining multiple pieces of data to be analyzed includes:
obtaining a plurality of data cluster sets produced by clustering a data set to be analyzed, wherein each data cluster set comprises at least one piece of data clustered into the same category;
the determining the feature values of the multiple data features of each piece of data includes:
determining the feature value of each data feature of each data cluster set, wherein the feature value of a data cluster set on a data feature is the average of the feature values, on that data feature, of the at least one piece of data in the data cluster set.
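The averaging step for data cluster sets can be sketched as follows. This is a minimal illustration only: the function name `cluster_feature_means` and the assumption that cluster labels are already available from some earlier clustering step are hypothetical and are not specified by the application.

```python
from collections import defaultdict


def cluster_feature_means(rows, labels):
    """Average each feature over the rows that share a cluster label.

    rows   -- list of feature-value lists, one per piece of data
    labels -- cluster label of each row (assumed to come from a prior clustering step)
    """
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for i, value in enumerate(row):
            sums[label][i] += value
        counts[label] += 1
    # One averaged feature vector per data cluster set.
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


rows = [[1.0, 4.0], [3.0, 8.0], [10.0, 20.0]]
labels = ["A", "A", "B"]
means = cluster_feature_means(rows, labels)
```

Each resulting vector can then be treated like the feature values of a single piece of data when the feature display diagram is constructed.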
In another possible implementation manner, the method further includes:
obtaining the importance degree of the data characteristics;
the constructing of the feature display graph of the data according to the feature values of the plurality of data features of the data comprises:
and constructing a feature display graph of the data according to the feature values of various data features of the data and the importance degree of the data features.
In another possible implementation manner, the constructing a feature display diagram of the data according to the feature values of the plurality of data features of the data and the importance degrees of the data features includes:
determining the number of the feature display branches in the feature display graph of the data and the length of each feature display branch according to the feature values of various data features of the data;
constructing the feature display diagram according to the importance degree of the data feature represented by each feature display branch, the number of feature display branches in the feature display diagram of the data, and the length of each feature display branch, wherein the feature display diagram has a plurality of feature display branches arranged in sequence, and the higher the importance degree of the data feature represented by a feature display branch, the earlier that branch is placed in the sequence.
In another possible implementation manner, the obtaining the importance degree of the data feature includes:
obtaining the importance degree of the data characteristics set by a user;
or,
and combining the characteristic values of the multiple data characteristics of each piece of data, and determining the importance degree of each data characteristic based on a principal component analysis algorithm.
In yet another possible implementation manner, the determining the characteristic value of the plurality of data characteristics that each piece of the data has includes:
obtaining characteristic values of various original data characteristics of each piece of data;
and combining the characteristic values of the multiple original data characteristics of each data, and performing dimensionality reduction on the multiple original data characteristics to obtain the characteristic values of the multiple data characteristics of each data.
In another possible implementation manner, before performing the dimension reduction processing on the multiple original data features by combining feature values of the multiple original data features of each piece of data, the method further includes:
determining at least one raw data feature group selected and combined by a user from the multiple raw data features, wherein each raw data feature group comprises at least two raw data features;
the combining the characteristic values of the multiple original data characteristics of each data to perform dimensionality reduction processing on the multiple original data characteristics to obtain the characteristic values of the multiple data characteristics of each data includes:
and performing dimensionality reduction on the multiple original data features based on the at least one original data feature group and the feature values of the multiple original data features of each piece of data.
In another possible implementation manner, before constructing the feature display graph of the data according to the feature values of the plurality of data features of the data, the method further includes:
and normalizing the characteristic value of the plurality of data on each data characteristic.
The data processing device comprises:
a data obtaining unit for obtaining a plurality of data to be analyzed;
a feature determination unit configured to determine a feature value of a plurality of kinds of data features that each of the data has;
the graph construction unit is used for constructing a feature display graph of the data according to feature values of various data features of the data, wherein the feature display graph comprises a plurality of feature display branches, each feature display branch is used for representing one data feature, and the branch length of each feature display branch can represent the feature value of the data feature corresponding to the feature display branch;
and the graph display unit is used for displaying the characteristic display graph of each piece of data.
The electronic equipment comprises:
a memory and a processor;
the processor is used for executing the data processing method;
the memory is used for storing programs needed by the processor to execute the operation.
According to the scheme, after a plurality of data to be analyzed are obtained, the characteristic values of a plurality of data characteristics of each data are determined. On this basis, a feature display diagram of the data can be constructed according to feature values of various data features of the data, and because the feature display diagram of the data shows the feature display branches corresponding to the various data features respectively, and the branch lengths of the feature display branches can represent the feature values of the data features corresponding to the feature display branches, the differences of different data on different data features can be intuitively reflected through the feature display diagram of each data, so that a user can quickly know the differences among different data.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of another data processing method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a feature display diagram provided by an embodiment of the present application;
Fig. 4 is another schematic diagram of a feature display diagram provided in an embodiment of the present application;
Fig. 5 is a schematic illustration of the feature display diagrams of a plurality of data presented in an embodiment of the present application;
Fig. 6 is a schematic flowchart of another data processing method according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 8 is a schematic diagram of a composition architecture of an electronic device according to an embodiment of the present disclosure.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be practiced otherwise than as specifically illustrated.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without inventive step, are within the scope of the present disclosure.
As shown in fig. 1, which shows a schematic flow chart of a data processing method provided in an embodiment of the present application, the method of the present embodiment may include:
S101, obtaining multiple pieces of data to be analyzed.
It will be appreciated that the specific data format for each piece of data to be analyzed may be varied, depending on the context in which the data is analyzed.
For example, each piece of data may be structured data, e.g., each piece of data is a data record in a data table.
As another example, each piece of data may be an image, i.e., image data.
It can be understood that the multiple pieces of data to be analyzed are data of the same type; for example, they are all access data of different users to a certain website. Accordingly, each piece of data is in the same data form; for example, multiple data records to be analyzed or multiple images to be analyzed may be obtained. Of course, the present application also applies if the multiple pieces of data are in different data forms, and no limitation is imposed on this.
S102, determining characteristic values of various data characteristics of each data.
Wherein, each data has a plurality of data characteristics which are the characteristics of the data on a plurality of different dimensions.
To make comparison possible, the different pieces of data share the same set of data features. The data features may be determined by performing feature extraction on the multiple pieces of data, which finally yields the multiple data features of each piece of data. Of course, the data features of each piece of data may also be specified manually.
There are many possibilities for the data features of the data. For example, in the case that each piece of data is a piece of structured data, each attribute in the structured data can be a data feature of the structured data, and the attribute value of each attribute is the feature value of that data feature. For example, if one piece of structured data is the basic information of a notebook computer, the basic information may include data in different dimensions, such as the memory, the CPU, and the graphics card of the notebook computer, so that specific values in the three dimensions of memory, CPU, and graphics card can be extracted.
For another example, in the case that each piece of data is an image, the data characteristics of the image may be the characteristics of color, gradient, and chromatic aberration of the image.
It can be understood that, since each piece of data has a plurality of data characteristics, a characteristic value of the piece of data on the plurality of data characteristics needs to be determined for each piece of data.
For example, for a piece of data a, the feature value of data feature 1 and the feature value of data feature 2 need to be determined, and correspondingly, for a piece of data B, the feature value of data feature 1 and the feature value of data feature 2 also need to be determined.
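For structured data, step S102 can be sketched as reading the same attributes from every record. The record fields below (`memory_gb`, `cpu_cores`, `gpu_memory_gb`) are hypothetical stand-ins for the memory, CPU, and graphics card dimensions of the notebook example; the application does not define concrete field names.

```python
def extract_feature_values(records, feature_names):
    """Return one list of feature values per record, in a fixed feature order."""
    return [[float(record[name]) for name in feature_names] for record in records]


# Hypothetical notebook records; only the listed features are extracted.
notebooks = [
    {"model": "A", "memory_gb": 16, "cpu_cores": 8, "gpu_memory_gb": 4},
    {"model": "B", "memory_gb": 32, "cpu_cores": 6, "gpu_memory_gb": 8},
]
FEATURES = ["memory_gb", "cpu_cores", "gpu_memory_gb"]
values = extract_feature_values(notebooks, FEATURES)
```

Using one shared feature list for every record keeps the feature order identical across data, which is what makes the later branch-by-branch comparison meaningful.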
S103, constructing a feature display diagram of the data according to the feature values of the multiple data features of the data.
The feature display graph comprises a plurality of feature display branches, each feature display branch is used for representing one data feature, and the branch length of each feature display branch can represent the feature value of the data feature corresponding to the feature display branch.
For example, assuming that each piece of data has 5 data features, the constructed feature representation map needs to include 5 feature representation branches, and each feature representation branch represents one of the 5 data features.
The branch length of the feature display branch corresponds to the size of the feature value of the data feature identified by the feature display branch, so that when the feature values of the data feature are different, the branch lengths of the feature display branches representing the data feature are different.
For example, in one possible case, the branch length of a feature display branch is positively correlated with the feature value of the data feature represented by that branch. Correspondingly, for a data feature of a piece of data, the larger the feature value of the data feature, the longer the feature display branch for that data feature.
Of course, this is only an example. The present application equally applies if a shorter branch length indicates a larger feature value of the data feature corresponding to the feature display branch.
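Both mappings between feature value and branch length can be sketched with one small function. This is an illustrative assumption rather than the application's formula: the name `branch_length`, the linear scaling, and the premise that feature values have already been scaled to the range 0 to 1 are all hypothetical.

```python
def branch_length(value, max_length=100.0, inverse=False):
    """Map a feature value in [0, 1] to a branch length.

    inverse=False -- larger value gives a longer branch
    inverse=True  -- larger value gives a shorter branch
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("value is assumed to be normalized to [0, 1]")
    scaled = 1.0 - value if inverse else value
    return scaled * max_length


direct = branch_length(0.25)                 # longer branch means larger value
inverted = branch_length(0.25, inverse=True) # shorter branch means larger value
```

Whichever direction is chosen, it has to be the same for every piece of data, otherwise branch lengths in different diagrams cannot be compared.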
It can be understood that feature display branches corresponding to various data features of the data are displayed through the feature display graph, and the branch lengths of the feature display branches can represent feature values of the corresponding data features, so that the feature values of each piece of data on the plurality of data features can be intuitively understood through the feature display graph of each piece of data.
It will be appreciated that the feature display diagram may take many specific forms in the present application. For example, the feature display diagram may be a bar chart that includes a plurality of bars, where each bar is a feature display branch and the length (also referred to as the height) of each bar characterizes the feature value of the data feature corresponding to the bar.
As another example, each feature display diagram may be a snowflake-shaped feature display diagram, as shown in fig. 3, which illustrates a schematic diagram of a snowflake-shaped feature display diagram provided by the present application. As can be seen from fig. 3, the snowflake-shaped feature display diagram looks similar to a snowflake. It comprises a plurality of branches that radiate outward from the center, including a branch 301, a branch 302, a branch 303, a branch 304, a branch 305, and a branch 306. Each branch resembles one arm of a snowflake, each branch is a feature display branch representing one data feature, and the length of each branch reflects the size of the feature value of the data feature corresponding to the branch.
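The geometry of a snowflake-shaped diagram can be sketched by placing the branches at equally spaced angles around a common center. The equal spacing, the clockwise direction, and the function name `snowflake_branches` are assumptions made for illustration; the application only requires that the branches radiate from the center.

```python
import math


def snowflake_branches(lengths):
    """Return the (x, y) end point of each branch, with the center at (0, 0).

    Branch k is rotated clockwise by k * 360 / n degrees from the positive
    horizontal axis (y axis pointing up), so branch 0 points to the right.
    """
    n = len(lengths)
    step = 2 * math.pi / n
    end_points = []
    for k, length in enumerate(lengths):
        angle = -k * step  # negative angle = clockwise rotation
        end_points.append((length * math.cos(angle), length * math.sin(angle)))
    return end_points


end_points = snowflake_branches([1.0, 2.0, 3.0, 4.0])
```

Drawing a line from the center to each end point with any plotting tool would then give one snowflake per piece of data.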
S104, displaying the feature display diagram of each piece of data.
In step S104, the feature display graph of each of the multiple data sets can be displayed at the same time, so that the user can intuitively know the specific situation of the feature value of each data set on different data features according to the feature display graph of each data set, and can intuitively know the difference situation of the feature value of each data set on different data features.
From the above, after obtaining multiple data sets to be analyzed, the present application determines characteristic values of multiple data features of each data set. On this basis, a feature display diagram of the data can be constructed according to feature values of various data features of the data, and because the feature display diagram of the data shows the feature display branches corresponding to the various data features respectively, and the branch lengths of the feature display branches can represent the feature values of the data features corresponding to the feature display branches, the differences of different data on different data features can be intuitively reflected through the feature display diagram of each data, so that a user can quickly know the differences among different data.
It can be understood that, in order to enable the user to intuitively know the importance degree of different data characteristics, the importance degree of each data characteristic can be obtained. In addition, when constructing the feature representation map of the data, the feature representation map of the data may be constructed according to the feature values of the plurality of data features of the data and the importance degree of each data feature.
Wherein, the constructed feature display diagram can represent the importance degree of various data features.
In one possible case, the appearance of the feature display branch in the feature display map can represent the importance degree of the data feature corresponding to the feature display branch. For example, the degree of importance of the data feature corresponding to the feature display branch is characterized by the color depth of the feature display branch, wherein the darker the color of the feature display branch, the higher the degree of importance of the data feature corresponding to the feature display branch is. For another example, the importance degree of the data feature corresponding to the feature display branch is represented by the thickness degree of the feature display branch, wherein the thicker the feature display branch is, the higher the importance degree of the data feature corresponding to the feature display branch is.
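Encoding importance in the appearance of a branch can be sketched as a mapping from an importance score to a line width and a gray level. The score range of 0 to 1, the width limits, and the name `branch_style` are hypothetical choices for this sketch; the application does not prescribe any particular mapping.

```python
def branch_style(importance, min_width=1.0, max_width=5.0):
    """Map an importance score in [0, 1] to (line_width, gray_level).

    gray_level 0.0 is black; more important features are drawn darker and
    thicker. The lightest gray is capped at 0.8 so every branch stays visible.
    """
    if not 0.0 <= importance <= 1.0:
        raise ValueError("importance is assumed to lie in [0, 1]")
    width = min_width + importance * (max_width - min_width)
    gray = 0.8 * (1.0 - importance)
    return width, gray


most_important = branch_style(1.0)
least_important = branch_style(0.0)
```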
In yet another possible case, the arrangement of the plurality of feature display branches representing the plurality of data features in the feature display diagram may characterize the importance degree of the data feature corresponding to the feature display branch.
For example, the feature branch corresponding to the more important data feature is shown in the central area of the feature display diagram.
For another example, the constructed feature display diagram has a plurality of feature display branches arranged in sequence, and the higher the importance degree of the data feature represented by a feature display branch, the earlier that branch is placed in the sequence. For ease of understanding, the data processing method of the present application is described below by taking this implementation as an example.
As shown in fig. 2, which shows another schematic flow chart of the data processing method of the present application, the method of this embodiment may include:
S201, obtaining multiple pieces of data to be analyzed.
S202, determining characteristic values of various data characteristics of each data.
The above steps S201 to S202 can refer to the related description of the previous embodiment, and are not described herein again.
S203, obtaining the importance degree of each data characteristic.
The importance degree of a data feature represents how strongly that data feature influences the data; for example, data features with a higher importance degree better reflect the essential characteristics of the data.
The specific manner of determining the importance of the data features may be various. In one possible case, the degree of importance of each of the plurality of data features may be set by the user, and accordingly, the degree of importance of each of the data features set by the user may be obtained. This situation is applicable to a scenario in which a user can specify data features of important interest.
In yet another possible case, the feature values of the multiple data features of each piece of data may be combined, and the importance degree of each data feature may be determined based on a principal component analysis algorithm.
The principal component analysis (PCA) algorithm is a data dimension reduction method. Its basic principle is to find the directions along which the data vary the most and to represent the original data using those directions, so the principal component analysis algorithm can be used to rank the data features by importance.
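One way to derive an importance ranking from PCA is sketched below. The application does not say how importance is read off the PCA result; using the absolute weight of each feature in the first principal component is an assumption of this sketch, as are all function names. The component is found by power iteration on the covariance matrix so that only the standard library is needed.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of the feature columns."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    d = len(cov)
    # Uneven start vector; a start exactly orthogonal to the answer would fail.
    vec = [float(i + 1) for i in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return vec


def feature_importance(rows):
    """Assumed importance: absolute weight in the first principal component."""
    return [abs(w) for w in first_principal_component(covariance_matrix(rows))]


rows = [[1.0, 10.0, 5.0], [2.0, 20.0, 5.1], [3.0, 30.0, 4.9], [4.0, 40.0, 5.0]]
importance = feature_importance(rows)
ranking = sorted(range(len(importance)), key=lambda j: -importance[j])
```

Because PCA on raw values favors features with large numeric ranges, the feature values would usually be put on a common scale before this step.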
S204, aiming at each piece of data, determining the number of the characteristic display branches in the characteristic display diagram of the piece of data and the length of each characteristic display branch according to the characteristic values of various data characteristics of the piece of data.
For example, the number of feature display branches required in the feature display diagram is determined to be the same as the number of data features of each piece of data. For example, if each piece of data has 8 data features, the feature display diagram needs 8 feature display branches.
In the present application, the length of a feature representation branch is related to the feature value of the data feature of the feature representation branch. There are many possibilities for the association between the length of the feature representation branch and the feature value that the data feature represented by the feature representation branch has.
For example, in one possible implementation, the larger the feature value a data feature has, the longer the length of the feature presentation branch representing that data feature. For example, the length required for the feature display branch may be determined according to the correspondence between different feature value intervals and lengths and the feature value of the data feature corresponding to the feature display branch, where the greater the feature value corresponding to the feature value interval, the longer the length corresponding to the feature value interval.
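The correspondence between feature value intervals and lengths can be sketched as a lookup table. The interval boundaries and lengths below are made-up illustrative numbers, and `length_for_value` is a hypothetical name.

```python
import bisect

# Hypothetical correspondence: upper bounds of the first three intervals and
# the branch length assigned to each of the four intervals.
INTERVAL_UPPER_BOUNDS = [0.25, 0.5, 0.75]
INTERVAL_LENGTHS = [10, 20, 30, 40]


def length_for_value(value):
    """Return the branch length of the interval that contains value.

    Intervals are [.., 0.25), [0.25, 0.5), [0.5, 0.75), [0.75, ..).
    """
    index = bisect.bisect_right(INTERVAL_UPPER_BOUNDS, value)
    return INTERVAL_LENGTHS[index]


lengths = [length_for_value(v) for v in (0.1, 0.25, 0.6, 0.9)]
```

Intervals with larger feature values map to longer lengths, matching the rule stated above.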
S205, constructing a feature display diagram according to the importance degree of the data features represented by the feature display branches, the number of the feature display branches in the feature display diagram of the data and the length of each feature display branch.
The feature display diagram has a plurality of feature display branches arranged in sequence, and the higher the importance degree of the data feature represented by a feature display branch, the earlier that branch is placed in the sequence.
For example, after determining the number of feature display branches required to be present in the feature display diagram and the length of each feature display branch, a plurality of feature display branches in the feature display diagram can be constructed. Then, according to the sequence of the importance degree of the data characteristics represented by the characteristic display branches from high to low, the characteristic display branches are sequentially ranked, and the characteristic display graph is obtained.
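The ordering step can be sketched as a sort of the branches by the importance of the feature each one represents. The dictionaries and the name `order_branches` are illustrative assumptions.

```python
def order_branches(lengths, importance):
    """Return (feature, length) pairs from most to least important.

    lengths    -- branch length per feature name
    importance -- importance score per feature name (higher is more important)
    """
    ranked = sorted(lengths, key=lambda name: importance[name], reverse=True)
    return [(name, lengths[name]) for name in ranked]


ordered = order_branches(
    {"a": 3.0, "b": 1.0, "c": 2.0},
    {"a": 0.2, "b": 0.9, "c": 0.5},
)
```

The same feature order should be reused for every piece of data so that branches at the same position always represent the same data feature.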
There may be many possibilities for the ordering of the feature display branches in the feature display diagram.
For example, in a possible implementation manner, the plurality of feature display branches in the feature display diagram may be arranged in parallel to one another in the same row, and the order of the plurality of feature display branches is consistent with the order of importance of the plurality of data features corresponding to the plurality of feature display branches from high to low.
As shown in fig. 4, which shows yet another schematic diagram of a feature display diagram of the data in the present application. In fig. 4, one piece of data has 3 data features, namely feature a, feature b, and feature c, and the three data features, in order of importance from high to low, are feature a, feature b, and feature c. In fig. 4, the feature display branches corresponding to the data features are therefore arranged in sequence in the same row, in the order: a feature display branch 401 corresponding to the feature c, a feature display branch 402 corresponding to the feature b, and a feature display branch 403 corresponding to the feature a.
As another example, in yet another possible implementation, the feature display diagram may be the snowflake-shaped feature display diagram shown in fig. 3. In this case, the plurality of feature display branches in the feature display diagram may be ordered clockwise, with the positive direction of the horizontal coordinate axis as the starting direction, and the higher the importance degree of the data feature corresponding to a feature display branch, the earlier that branch is placed in the order. The positive direction of the horizontal coordinate axis is the horizontal rightward direction in fig. 3.
In this implementation, the plurality of feature display branches are distributed around the snowflake center over 0 degrees to 360 degrees. Therefore, the more important the data feature corresponding to a feature display branch, the smaller the clockwise angle between that branch and the positive direction of the horizontal coordinate axis, that is, the earlier the branch is placed in the order.
As shown in fig. 3, in the scenario of fig. 3, it is assumed that one piece of data has 6 data features, referred to as feature 1, feature 2, feature 3, feature 4, feature 5, and feature 6. Accordingly, feature display branches corresponding to the 6 data features need to be constructed, namely: branch 301 corresponding to feature 1, branch 302 corresponding to feature 2, branch 303 corresponding to feature 3, branch 304 corresponding to feature 4, branch 305 corresponding to feature 5, and branch 306 corresponding to feature 6.
It is assumed that the importance degrees of the 6 data features, ranked from high to low, are feature 6, feature 4, feature 3, feature 1, feature 2, and feature 5. Then, in the feature display diagram shown in fig. 3, the feature display branch corresponding to feature 6, i.e. branch 306, lies along the positive direction of the horizontal coordinate axis, i.e. its included angle with the horizontal coordinate axis is 0 degrees. On this basis, the remaining 5 feature display branches are arranged clockwise in descending order of the importance of the remaining 5 features. As shown in fig. 3, branch 306 is followed in the clockwise direction by branch 304, branch 303, branch 301, branch 302, and branch 305.
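The clockwise placement in this example can be sketched as follows. Spacing the branches evenly at 360 / n degrees is an assumption of the sketch (the application only fixes the clockwise order and the 0 degree start), and the function names are hypothetical.

```python
import math


def clockwise_angles(ranked_features):
    """Assign each feature a clockwise angle in degrees from the positive
    horizontal axis; the most important feature gets 0 degrees."""
    step = 360.0 / len(ranked_features)
    return {feature: rank * step for rank, feature in enumerate(ranked_features)}


def end_point(length, clockwise_degrees):
    """End point of a branch in a coordinate system whose y axis points up."""
    theta = math.radians(clockwise_degrees)
    return (length * math.cos(theta), -length * math.sin(theta))


# Importance order from the example: feature 6, 4, 3, 1, 2, 5.
angles = clockwise_angles([6, 4, 3, 1, 2, 5])
x, y = end_point(1.0, 90.0)  # a branch pointing straight down
```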
It should be understood that, the above description is made by taking several sorting manners of the feature display branches in the feature display diagram as examples, in practical applications, there may be other possibilities for sorting and arranging the plurality of feature display branches in the feature display diagram, and this is not limited thereto.
S206, displaying the feature display diagram of each piece of data.
Fig. 5 is a schematic diagram of the feature display diagrams of different pieces of data. In fig. 5, the feature display diagram of each piece of data is a snowflake-shaped diagram, and the feature display diagrams of 5 pieces of data are shown, so fig. 5 contains five snowflakes. The data feature characterized by each feature display branch in each snowflake-shaped feature display diagram is fixed and known. On this basis, by comparing the lengths of the feature display branches at the same position in the feature display diagrams of different data, the relative sizes of the feature values of the different data on the corresponding data feature can be known. Meanwhile, according to the arrangement order of the feature display branches in the feature display diagram of the data, the importance degree of each data feature can be known, so that the user can conveniently make a comprehensive comparison focused on the data features of higher importance.
It can be understood that, in this embodiment, the order of the feature display branches in the feature display diagram is determined according to the importance degrees of the data features, so the importance degree of the data feature characterized by a feature display branch can be determined from the position of that branch in the order. As a result, in addition to intuitively seeing the feature value of each data feature of the data, the user can also intuitively see the importance degree of each data feature based on the feature display diagram.
It can be understood that, for a given data feature, the feature values of different data may not be suitable for direct comparison. For ease of comparison, the present application may, for each data feature, normalize the feature values of the multiple pieces of data on that data feature.
In practical applications, if the value of the data on a certain data feature is not numerical, the value of each piece of data on that data feature is first converted into a numerical feature value before normalization.
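As an illustration of the conversion and normalization just described, the following sketch encodes non-numerical values and then normalizes each data feature across all pieces of data. The application does not specify the encoding or the normalization formula; the order-of-first-appearance encoding and the min-max formula used here, as well as all function names, are assumptions chosen for the example.

```python
def encode_categories(values):
    """Convert non-numerical values to numbers by order of first appearance.
    This encoding scheme is a hypothetical choice, not taken from the text."""
    codes = {}
    return [float(codes.setdefault(value, len(codes))) for value in values]


def min_max_normalize(column):
    """Scale one feature's values across all data into the range [0, 1]."""
    low, high = min(column), max(column)
    if high == low:
        # All data share the same value, so there is nothing to compare.
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]


def normalize_features(rows):
    """rows: one list of feature values per piece of data.
    Each feature (column) is normalized across all pieces of data."""
    prepared = []
    for column in zip(*rows):
        column = list(column)
        if not all(isinstance(value, (int, float)) for value in column):
            column = encode_categories(column)
        prepared.append(min_max_normalize(column))
    # Transpose back so that each row again describes one piece of data.
    return [list(row) for row in zip(*prepared)]


rows = [[10, "low"], [20, "high"], [30, "low"]]
normalized = normalize_features(rows)
print(normalized)
```

After this step every feature value lies between 0 and 1, so the branch lengths of different data can be compared directly.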
It is understood that the data of the present application may have various data characteristics that are determined directly from the data, either manually or by a computer.
In order to effectively present the data characteristics of the data, the present application may further determine a plurality of data characteristics from the data, then screen the plurality of data characteristics, and then determine the data characteristics to be analyzed or processed.
For example, in one possible case, feature values of a plurality of original data features of each data may be obtained first. The original data features can be understood as data features directly extracted or determined from data. On the basis, the characteristic values of the multiple original characteristics of each data can be combined to perform dimensionality reduction processing on the multiple original data characteristics to obtain the characteristic values of the multiple data characteristics of each data.
There are multiple possible ways to reduce the dimensions of the original data features. For example, a singular value decomposition (SVD) algorithm or a principal component analysis (PCA) algorithm may be used to reduce the dimensions of the multiple original data features, so as to obtain the multiple data features after dimension reduction.
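A compact sketch of PCA-based dimension reduction is given below. It is only an illustration under stated assumptions: the principal components are found by power iteration with deflation (a simple substitute for a library eigen-decomposition or SVD), and the share of variance explained by each component is used as a stand-in for the importance degree of the resulting data feature. The application does not state how importance is derived from PCA, and all function names are hypothetical.

```python
import math
import random


def covariance_matrix(rows):
    """Center the data and return (centered rows, sample covariance matrix)."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return centered, cov


def top_component(cov, iterations=500):
    """Power iteration: return (eigenvalue, unit eigenvector) of the largest
    eigenvalue of a symmetric positive semi-definite matrix."""
    d = len(cov)
    rng = random.Random(0)  # fixed seed keeps the sketch deterministic
    v = [rng.uniform(-1, 1) for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            return 0.0, v  # no variance left in the matrix
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v


def pca(rows, k):
    """Reduce rows to k data features; also return each feature's share of
    the total variance, used here as an assumed measure of importance."""
    centered, cov = covariance_matrix(rows)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))
    components, ratios = [], []
    for _ in range(k):
        eigenvalue, v = top_component(cov)
        components.append(v)
        ratios.append(eigenvalue / total if total else 0.0)
        # Deflation: remove the component just found from the matrix.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    reduced = [[sum(c[j] * v[j] for j in range(d)) for v in components]
               for c in centered]
    return reduced, ratios


# Three original data features reduced to two data features.
rows = [[1.0, 1.0, 5.0], [2.0, 2.1, 5.0], [3.0, 2.9, 5.0], [4.0, 4.0, 5.0]]
reduced, ratios = pca(rows, 2)
print(ratios)
```

In this example the first two original features are strongly correlated and the third is constant, so almost all of the variance falls on the first reduced data feature.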
For ease of understanding, the following description will be made in conjunction with one implementation of determining various data characteristics of data. As shown in fig. 6, which shows a schematic flow chart of another embodiment of the data processing method provided in the present application, the method of this embodiment may include:
S601, obtaining a plurality of pieces of data to be analyzed.
The plurality of pieces of data are of the same type.
S602, obtaining feature values of a plurality of original data features of each piece of data.
For example, for each piece of data, the feature values of the data on the various original data features can be extracted by performing feature extraction on the data. It can be understood that, since the pieces of data are of the same type, the types of the original data features extracted from different pieces of data are the same.
It can be understood that, if the value of the data on a certain data feature is not numerical, that value needs to be converted into a numerical value to obtain the feature value of the data on that data feature.
S603, for each original data feature, normalizing the feature values of the plurality of pieces of data on that original data feature.
S604, combining the characteristic values of the multiple original data characteristics of each data, performing dimensionality reduction processing on the multiple original data characteristics to obtain the characteristic values of the multiple data characteristics of each data.
Wherein the plurality of data features have dimensions that are less than the dimensions of the plurality of raw data features.
For example, the feature values of the multiple original data features of each piece of data can be combined, and a principal component analysis algorithm can be used to reduce the dimensions of the multiple original data features, so as to obtain the feature values of each piece of data on the multiple data features after dimension reduction. In the process of reducing dimensions with the principal component analysis algorithm, two or more original data features may be merged, and the dimension reduction finally yields the data features that best reflect the essence of the data.
It can be understood that, in order to perform dimension reduction in a more targeted manner, before step S604, at least one original data feature group selected and combined by a user from the plurality of original data features may be determined, where each original data feature group includes at least two original data features. For example, assuming that the user needs to combine original data features 1 and 2 into a data feature of one dimension, original data features 1 and 2 can be selected as one original data feature group.
Correspondingly, the multiple original data features are subjected to dimensionality reduction processing based on the at least one original data feature group and the feature values of the multiple original data features of each data. For example, the principal component analysis algorithm is used to combine the feature values of the original data features to be combined in the original data feature group set by the user and the feature values of other original data features except the original data feature group, so as to perform dimension reduction on the multiple original data features of the data, and finally obtain the feature values of each data on the multiple data features after dimension reduction.
S605, the importance degree of each of the plurality of data features is obtained.
The implementation of determining the importance of the data features can be seen from the related description of the previous embodiments.
In particular, if the plurality of data features are obtained based on the principal component analysis algorithm, the process of reducing the dimensions of the original data features yields not only the plurality of data features but also their respective importance degrees.
S606, constructing a feature display diagram of the data according to the feature values of the various data features of the data and the importance degrees of the data features.
The feature display diagram comprises a plurality of feature display branches, each feature display branch is used for characterizing one data feature, and the branch length of each feature display branch characterizes the size of the feature value of the data feature corresponding to that branch. Meanwhile, the higher the importance degree of the data feature characterized by a feature display branch, the higher that branch is ranked in the feature display diagram.
S607, displaying the feature display diagram of each piece of data.
The above steps S606 and S607 can refer to the related description of the previous embodiment, and are not described herein again.
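Steps S606 and S607 can be illustrated with a short sketch. It is a simplified stand-in rather than the snowflake rendering of the application: each feature display branch is drawn as a row of `#` characters whose count is proportional to the normalized feature value, and the rows are ordered by importance. The function names, the feature names `f1` to `f3`, and the sample values are hypothetical.

```python
def build_branches(values, importance):
    """Return (feature, length) pairs ordered from most to least important.

    values: feature name -> normalized feature value in [0, 1].
    importance: feature name -> importance score (higher is more important).
    """
    ranked = sorted(values, key=lambda name: importance[name], reverse=True)
    return [(name, values[name]) for name in ranked]


def render_text(branches, width=20):
    """Draw each branch as a bar whose length reflects the feature value."""
    lines = []
    for name, length in branches:
        bar = "#" * round(length * width)
        lines.append(f"{name:<10}|{bar}")
    return lines


# Hypothetical importance degrees and normalized feature values.
importance = {"f1": 0.2, "f2": 0.7, "f3": 0.1}
data = {
    "item A": {"f1": 0.5, "f2": 1.0, "f3": 0.0},
    "item B": {"f1": 1.0, "f2": 0.25, "f3": 0.5},
}
for label, values in data.items():
    print(label)
    print("\n".join(render_text(build_branches(values, importance))))
```

Because every piece of data uses the same branch order, bars in the same row of different displays can be compared directly, and the rows nearest the top correspond to the most important data features.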
It can be understood that the scheme of the application not only can be used for analyzing the difference of a plurality of independent data on different data characteristics, but also can be used for analyzing whether a plurality of clusters clustered by a data set are reasonable or not.
Under the condition that whether a plurality of clusters clustered by the data set are reasonable or not needs to be analyzed, the plurality of data to be analyzed in the application can be a plurality of data cluster sets clustered by one data set. The data set may include multiple data, and each data may be structured data or image data, and the like, which may be referred to in the foregoing description. And each data cluster set comprises at least one piece of data clustered into the same category.
Accordingly, for each data cluster set, a characteristic value of the data cluster set over a plurality of data characteristics, respectively, may be determined. The characteristic value of the data cluster set on one data characteristic is the average value of the characteristic values of at least one piece of data in the data cluster set on the data characteristic.
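The averaging described above can be written as a few lines of code. The sketch below assumes that every piece of data in a data cluster set is given as a list of feature values in the same feature order; the function name and the sample clusters are hypothetical.

```python
def cluster_feature_values(cluster):
    """Return, for each data feature, the mean of the feature values of all
    pieces of data in one data cluster set.

    cluster: non-empty list of per-data feature-value lists.
    """
    count = len(cluster)
    return [sum(column) / count for column in zip(*cluster)]


# Two hypothetical data cluster sets, each piece of data having two features.
clusters = {
    "cluster 0": [[0.25, 1.0], [0.75, 0.5]],
    "cluster 1": [[1.0, 0.0]],
}
cluster_values = {
    name: cluster_feature_values(members) for name, members in clusters.items()
}
print(cluster_values)
```

The resulting per-cluster feature values can then be used in place of the feature values of a single piece of data when constructing a feature display diagram.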
Similar to the construction of the feature display diagram of a piece of data, for each data cluster set, the feature display diagram of the data cluster set may be constructed according to the feature values of the plurality of data features of the data cluster set. For example, the feature display diagram of a data cluster set may include a plurality of feature display branches, each feature display branch characterizes one data feature, and the length of each feature display branch represents the feature value of the data cluster set on that data feature.
The process of constructing the feature display diagram of the data cluster set is similar to the process of constructing the feature display diagram of the data, and is not repeated here.
It can be understood that, in the case of analyzing data cluster sets, the feature display diagram of a data cluster set may also be constructed by combining the feature values of the data features of the data cluster set with the importance degree of each data feature. For example, the higher the importance degree of a data feature, the higher the feature display branch characterizing that data feature is ranked in the feature display diagram.
The specific implementation of constructing the feature display diagram of the data cluster set in combination with the importance degrees of the data features is the same as that of constructing the feature display diagram of the data in combination with the importance degrees of the data features. Reference may be made to the foregoing description, and details are not repeated here.
It can be understood that, after the feature display diagrams of the data cluster sets are displayed, the size relationship of the feature values of different data cluster sets on certain data features (such as important data features) can be seen intuitively from the feature display diagrams, which helps the user analyze whether any data has been clustered into the wrong data cluster set.
The application also provides a data processing device corresponding to the data processing method. As shown in fig. 7, which shows a schematic diagram of a component structure of a data processing apparatus according to the present application, the apparatus of this embodiment may include:
a data obtaining unit 701 configured to obtain a plurality of pieces of data to be analyzed;
a feature determination unit 702 configured to determine a feature value of a plurality of kinds of data features that each of the data has;
a graph constructing unit 703, configured to construct a feature display graph of the data according to feature values of multiple data features of the data, where the feature display graph includes multiple feature display branches, each feature display branch is used to characterize one of the data features, and a branch length of each feature display branch can characterize a feature value of a data feature corresponding to the feature display branch;
a graph display unit 704 for displaying a feature display graph of each of the data.
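The cooperation of units 701 to 704 can be pictured with the following structural sketch. It is only an illustration of how the four units hand data to one another: the class name, the method names, the feature extractor passed to the constructor, and the textual output format are all hypothetical and are not taken from the application.

```python
class DataProcessingDevice:
    """Toy pipeline mirroring the four units of fig. 7."""

    def __init__(self, feature_extractor):
        # Hypothetical callable: one piece of data -> {feature name: value}.
        self.feature_extractor = feature_extractor

    def obtain_data(self, source):
        """Data obtaining unit 701: collect the pieces of data to analyze."""
        return list(source)

    def determine_features(self, items):
        """Feature determination unit 702: feature values of each piece."""
        return [self.feature_extractor(item) for item in items]

    def construct_graph(self, values):
        """Graph construction unit 703: one branch per data feature, with
        the branch length taken from the feature value."""
        return [{"feature": name, "length": value}
                for name, value in values.items()]

    def display(self, graphs):
        """Graph display unit 704: here a plain-text rendering."""
        return [", ".join(f"{b['feature']}={b['length']}" for b in graph)
                for graph in graphs]

    def run(self, source):
        items = self.obtain_data(source)
        graphs = [self.construct_graph(v) for v in self.determine_features(items)]
        return self.display(graphs)


# Hypothetical extractor: text length and vowel count as the two features.
device = DataProcessingDevice(
    lambda text: {"length": len(text),
                  "vowels": sum(ch in "aeiou" for ch in text)}
)
output = device.run(["data", "feature"])
print(output)
```

Keeping each unit as a separate method makes it straightforward to add the optional units described below, such as importance obtaining or normalization, as further stages of the same pipeline.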
In one possible implementation, the data obtaining unit may include:
a class obtaining unit, configured to obtain a plurality of data cluster sets clustered from a data set to be analyzed, where each data cluster set includes at least one piece of data clustered into the same category;
the feature determination unit includes:
a class feature determination unit, configured to determine a feature value of each data feature of each data cluster set, where the feature value of a data cluster set on a data feature is the average of the feature values, on that data feature, of the at least one piece of data in the data cluster set.
In yet another possible implementation manner, the apparatus further includes:
an importance obtaining unit for obtaining an importance degree of the data feature;
the graph construction unit is specifically configured to construct a feature display graph of the data according to feature values of a plurality of data features of the data and importance degrees of the data features.
In an alternative, the graph building unit includes:
the branch determining subunit is used for determining the number of the feature display branches in the feature display graph of the data and the length of each feature display branch according to the feature values of various data features of the data;
the graph constructing subunit is configured to construct the feature display graph according to the importance degree of the data features represented by the feature display branches, the number of the feature display branches in the feature display graph of the data, and the length of each feature display branch, wherein the feature display graph has a plurality of sequentially ordered feature display branches, and the higher the importance degree of the data feature represented by a feature display branch, the higher that feature display branch is ranked.
In an alternative, the obtaining the importance of the data feature comprises:
obtaining the importance degree of the data characteristics set by a user;
or,
and combining the characteristic values of the multiple data characteristics of each piece of data, and determining the importance degree of each data characteristic based on a principal component analysis algorithm.
In another possible implementation manner, the feature determining unit includes:
an original feature acquisition unit configured to acquire feature values of a plurality of kinds of original data features possessed by each of the data;
and the dimension reduction processing unit is used for carrying out dimension reduction processing on the various original data characteristics by combining the characteristic values of the various original data characteristics of each data to obtain the characteristic values of the various data characteristics of each data.
In yet another possible implementation manner, the apparatus further includes:
the combination determining unit is used for determining at least one original data feature group selected and combined by a user in the multiple original data features before the dimension reduction processing unit performs dimension reduction processing on the multiple original data features, wherein each original data feature group comprises at least two original data features;
the dimension reduction processing unit is specifically configured to perform dimension reduction processing on multiple original data features based on the at least one original data feature group and feature values of multiple original data features of each piece of data.
In yet another possible implementation manner, the apparatus further includes:
and the normalization unit is used for normalizing the characteristic values of the multiple data on each data characteristic before the graph construction unit constructs the characteristic display graph of the data.
In yet another aspect, the present application further provides an electronic device, as shown in fig. 8, which shows a schematic structural diagram of the electronic device, and the electronic device may be any type of electronic device, and the electronic device at least includes a memory 801 and a processor 802;
wherein the processor 802 is adapted to perform the data processing method as in any of the above embodiments.
The memory 801 is used to store programs needed for the processor to perform operations.
It is to be understood that the electronic device may further include a display unit 803 and an input unit 804.
Of course, the electronic device may have more or less components than those shown in fig. 8, which is not limited thereto.
In another aspect, the present application further provides a computer-readable storage medium, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by a processor to implement the data processing method according to any one of the above embodiments.
The present application also proposes a computer program comprising computer instructions stored in a computer readable storage medium. The computer program is for performing the data processing method as in any of the above embodiments when run on an electronic device.
It should be noted that the embodiments in this specification are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other. Meanwhile, the features described in the embodiments of this specification may be replaced or combined with each other, so that those skilled in the art can implement or use the present application. Since the device embodiments are basically similar to the method embodiments, they are described relatively briefly, and for the relevant points reference may be made to the corresponding description of the method embodiments.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of data processing, comprising:
obtaining a plurality of data to be analyzed;
determining a characteristic value of a plurality of data characteristics possessed by each piece of the data;
according to the characteristic values of various data characteristics of the data, constructing a characteristic display diagram of the data, wherein the characteristic display diagram comprises a plurality of characteristic display branches, each characteristic display branch is used for representing one data characteristic, and the branch length of each characteristic display branch can represent the characteristic value of the data characteristic corresponding to the characteristic display branch;
displaying the characteristic display diagram of each piece of the data.
2. The method of claim 1, the obtaining a plurality of copies of data to be analyzed, comprising:
obtaining a plurality of data cluster sets clustered by data sets to be analyzed, wherein each data cluster set comprises at least one piece of data clustered into the same category;
the determining the characteristic value of the plurality of data characteristics of each piece of the data comprises the following steps:
and determining the characteristic value of each data feature of each data cluster set, wherein the characteristic value of each data feature of the data cluster sets is the average value of the characteristic values of the at least one piece of data on the data feature in the data cluster sets.
3. The method of claim 1, further comprising:
obtaining the importance degree of the data characteristics;
the constructing of the feature display graph of the data according to the feature values of the plurality of data features of the data comprises:
and constructing a feature display graph of the data according to the feature values of various data features of the data and the importance degree of the data features.
4. The method according to claim 3, wherein the constructing the feature display graph of the data according to the feature values of the plurality of data features of the data and the importance degrees of the data features comprises:
determining the number of the feature display branches in the feature display graph of the data and the length of each feature display branch according to the feature values of various data features of the data;
according to the importance degree of the data features characterized by the feature display branches, the number of the feature display branches in the feature display diagram of the data and the length of each feature display branch, constructing the feature display diagram, wherein the feature display diagram has a plurality of feature display branches which are sequentially ordered, and the higher the importance degree of the data features characterized by the feature display branches is, the higher the ordering order of the feature display branches is.
5. The method of claim 3, the obtaining the importance of the data features comprising:
obtaining the importance degree of the data characteristics set by a user;
or,
and combining the characteristic values of the multiple data characteristics of each piece of data, and determining the importance degree of each data characteristic based on a principal component analysis algorithm.
6. The method of claim 1, said determining a characteristic value of a plurality of data characteristics that each of said data has, comprising:
obtaining characteristic values of various original data characteristics of each piece of data;
and combining the characteristic values of the multiple original data characteristics of each data, and performing dimensionality reduction on the multiple original data characteristics to obtain the characteristic values of the multiple data characteristics of each data.
7. The method of claim 6, further comprising, before said performing dimension reduction processing on a plurality of raw data features in combination with feature values of the plurality of raw data features of each of the data, the step of:
determining at least one raw data feature group selected and combined by a user from the multiple raw data features, wherein each raw data feature group comprises at least two raw data features;
the combining the characteristic values of the multiple original data characteristics of each data to perform dimensionality reduction processing on the multiple original data characteristics to obtain the characteristic values of the multiple data characteristics of each data includes:
and performing dimensionality reduction on the multiple original data features based on the at least one original data feature group and the feature values of the multiple original data features of each piece of data.
8. The method according to claim 1, further comprising, before the constructing the feature display map of the data according to the feature values of the plurality of data features that the data has:
and normalizing the characteristic value of the plurality of data on each data characteristic.
9. A data processing apparatus comprising:
a data obtaining unit for obtaining a plurality of data to be analyzed;
a feature determination unit configured to determine a feature value of a plurality of kinds of data features that each of the data has;
the graph construction unit is used for constructing a feature display graph of the data according to feature values of various data features of the data, wherein the feature display graph comprises a plurality of feature display branches, each feature display branch is used for representing one data feature, and the branch length of each feature display branch can represent the feature value of the data feature corresponding to the feature display branch;
and the graph display unit is used for displaying the characteristic display graph of each piece of data.
10. An electronic device, comprising:
a memory and a processor;
the processor for performing the data processing method of any one of claims 1 to 8;
the memory is used for storing programs needed by the processor to execute the operation.
CN202111037955.0A 2021-09-06 2021-09-06 Data processing method and device and electronic equipment Pending CN113743506A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111037955.0A CN113743506A (en) 2021-09-06 2021-09-06 Data processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111037955.0A CN113743506A (en) 2021-09-06 2021-09-06 Data processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN113743506A true CN113743506A (en) 2021-12-03

Family

ID=78735898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111037955.0A Pending CN113743506A (en) 2021-09-06 2021-09-06 Data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113743506A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109688437A (en) * 2018-12-10 2019-04-26 未来电视有限公司 A kind of method for exhibiting data, device, electronic equipment and readable storage medium storing program for executing
WO2020057145A1 (en) * 2018-09-21 2020-03-26 Boe Technology Group Co., Ltd. Method and device for generating painting display sequence, and computer storage medium
CN112580674A (en) * 2019-09-27 2021-03-30 阿里巴巴集团控股有限公司 Picture identification method, computer equipment and storage medium
CN113094581A (en) * 2021-03-30 2021-07-09 联想(北京)有限公司 Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination