CN108319715B - Parallel coordinate improvement method for multi-dimensional integer data set - Google Patents


Info

Publication number
CN108319715B
Authority
CN
China
Prior art keywords
data
coordinate axis
dimension
integer
value
Prior art date
Legal status
Active
Application number
CN201810131947.4A
Other languages
Chinese (zh)
Other versions
CN108319715A (en)
Inventor
陈红倩
程中娟
Current Assignee
Dragon Totem Technology Hefei Co ltd
Original Assignee
Beijing Technology and Business University
Priority date
Filing date
Publication date
Application filed by Beijing Technology and Business University
Priority to CN201810131947.4A
Publication of CN108319715A
Application granted
Publication of CN108319715B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data

Abstract

The invention relates to a parallel coordinate improvement method for multidimensional integer data sets, in the technical field of computer graphics and visualization. The method is implemented as follows: count the number of distinct data values in each integer data dimension of the data set; count the proportion of records taken by each data value, build segmented coordinate axes from these proportions, and show each proportion as the relative height of the corresponding axis segment; and, for each record in the data set, continuously update an offset value during visual mapping with an offset mapping method, so that the integer values of different records are mapped to different heights on an axis, which solves the same-point mapping problem. The invention shows directly the proportion of records for each data value in each data dimension, allows the associations between dimensions and their strength to be analyzed quickly, and improves the capability for visual analysis of multidimensional integer data sets.

Description

Parallel coordinate improvement method for multi-dimensional integer data set
Technical Field
The invention relates to a parallel coordinate improvement method for a multi-dimensional integer data set, and belongs to the technical field of computer graphics and visualization.
Background Art
"Integer-valued data" is common in data sets. It is categorical data such as gender (male, female), region (province, city) or pesticide toxicity (highly toxic, low toxicity), which becomes discrete integer data rather than continuous data when it is converted to numbers for data analysis.
The parallel coordinate method is a widely used technique for visualizing multidimensional data, but it runs into difficulties with multidimensional integer data. In a parallel coordinate plot, identical values of an integer dimension are all mapped to the same position on the coordinate axis (referred to in this invention as "same-point mapping"). As a result, the visualization cannot show the distribution and quantity of the data or the strength of the association between categorical dimensions, which greatly reduces the effectiveness of cross-correlation analysis.
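To make same-point mapping concrete, the following minimal sketch applies a plain linear axis mapping, as a conventional parallel coordinate plot would, to a small hypothetical "month" column. The function name `conventional_heights` and the sample values are illustrative assumptions, not part of the patent.

```python
def conventional_heights(values, height):
    """Linearly map each value onto [0, height], as a conventional parallel-coordinate axis does."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # guard against division by zero when every value is equal
    return [(v - lo) / span * height for v in values]


# Hypothetical "month" column: 8 records but only 3 distinct integers.
months = [2, 2, 11, 2, 3, 11, 2, 3]
heights = conventional_heights(months, 520)
print(len(months), "records ->", len(set(heights)), "distinct axis positions")
# prints: 8 records -> 3 distinct axis positions
```

Every polyline through the same month passes through the same point, so the axis itself carries no information about how many records share each value.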
The invention provides an improved parallel coordinate visualization method for multidimensional integer data sets. It introduces "segmented coordinate axes" into the parallel coordinate system, together with an "offset mapping" method under which different records of integer data are mapped to different positions on an axis even when their data values are identical. This solves the same-point mapping problem and improves the capability and efficiency of analyzing multidimensional integer data sets, or data sets that contain integer data dimensions.
No similar visualization technique for multidimensional integer data sets has been found in the published literature.
Disclosure of Invention
The invention provides a parallel coordinate improvement method for a multidimensional integer data set, or for a data set containing integer data dimensions. The method comprises the following steps:
Step 1: Count the number of distinct data values in each integer data dimension of the data set, and calculate the proportion of records taken by each data value.
For one integer data dimension, denoted $D_i$, the calculation is as follows:
Step 1.1: Extract the values of integer data dimension $D_i$ as a vector, denoted $V_i$. If the data set contains $T$ records, $V_i$ has $T$ components.
Step 1.2: Count the number of distinct data values in $V_i$, denoted $NV_i$.
Step 1.3: Count the number of records for each data value in $V_i$ and sort the values by record count in descending order. Following this order, convert the data values in $V_i$ to the integers $1$ to $NV_i$. The data value whose conversion value is $j$ is denoted $V_{ij}$, and the number of records in $D_i$ satisfying $V_i = V_{ij}$ is denoted $NV_{ij}$.
In this invention, the converted value of each data value in $V_i$ is called its "conversion value"; conversion values range from $1$ to $NV_i$.
Step 1.4: Calculate the record ratio of each data value in $V_i$. The ratio $R_{ij}$ of records satisfying $V_i = V_{ij}$ is given by formula (1):
$$R_{ij} = \frac{NV_{ij}}{T} \qquad (1)$$
where $T$ is the total number of records in the data set, as described in Step 1.1.
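Step 1 can be sketched in Python as follows. This is a minimal sketch that assumes a dimension is given as a plain list of integers, one per record; the function name `value_statistics` is hypothetical, and breaking ties between equally frequent values by the value itself is an assumption, since the method does not say how ties are ordered.

```python
from collections import Counter


def value_statistics(column):
    """Step 1 sketch for one integer dimension V_i.

    Returns (conversion, nv, ratios):
      conversion[value] -> conversion value j (1 = most frequent value)
      nv[j - 1]         -> NV_ij, the number of records with that value
      ratios[j - 1]     -> R_ij = NV_ij / T, formula (1)
    """
    T = len(column)                      # Step 1.1: number of records
    counts = Counter(column)             # record count per distinct value
    # Step 1.3: most records first; ties broken by value (assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    conversion = {value: j for j, (value, _) in enumerate(ranked, start=1)}
    nv = [n for _, n in ranked]
    ratios = [n / T for n in nv]         # Step 1.4
    return conversion, nv, ratios


conversion, nv, ratios = value_statistics([2, 11, 2, 3, 2, 11])
print(len(nv), conversion, nv)  # 3 {2: 1, 11: 2, 3: 3} [3, 2, 1]
```

`len(nv)` is $NV_i$ from Step 1.2.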
Step 2: Build the coordinate axes according to the data distribution of every integer data dimension in the data set. If the data set also contains non-integer data dimensions, their coordinate axes are built with the conventional method.
For integer data dimension $D_i$, the coordinate axis is built as follows:
Divide the coordinate axis into $NV_i$ segments, called "coordinate axis segments", one for each data value of $D_i$; the height of each segment is proportional to the record ratio of the corresponding data value. Because an axis built this way is composed of the segments of its different data values, it is called a "segmented coordinate axis" in this invention.
The information for each segment is calculated as follows:
Step 2.1: From the height of the coordinate axes of the parallel coordinate system in the final visualization (denoted $height$), calculate the height of the axis segment for each data value.
For integer data dimension $D_i$, the height of the segment for data value $V_{ij}$ is given by formula (2):
$$H_{ij} = height \times R_{ij} \qquad (2)$$
where $R_{ij}$ is the ratio of records satisfying $V_i = V_{ij}$ obtained in Step 1.4.
Step 2.2: Calculate the starting height and ending height of each axis segment. Segments are stacked from the bottom of the axis in order of conversion value. For the segment of data value $V_{ij}$, the starting height is given by formula (3):
$$Hstart_{ij} = \begin{cases} 0, & j = 1 \\ \sum_{k=1}^{j-1} H_{ik}, & j > 1 \end{cases} \qquad (3)$$
and the ending height by formula (4):
$$Hend_{ij} = Hstart_{ij} + H_{ij} \qquad (4)$$
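Steps 2.1 and 2.2 can be sketched as one loop over the record ratios. The function name `axis_segments` is hypothetical; the sketch assumes the ratios are already ordered by conversion value, so segment 1 (the most frequent value) starts at height 0, in line with formula (12) of the embodiment.

```python
def axis_segments(ratios, height):
    """Step 2 sketch: (H_ij, Hstart_ij, Hend_ij) for j = 1..NV_i.

    `ratios` must be ordered by conversion value j, so segment 1
    (the most frequent value) sits at the bottom of the axis.
    """
    segments = []
    start = 0.0                          # formula (3): the first segment starts at 0
    for r in ratios:
        h = height * r                   # formula (2)
        segments.append((h, start, start + h))  # formula (4): Hend = Hstart + H
        start += h                       # the next segment starts where this one ends
    return segments


for h, s, e in axis_segments([0.5, 0.25, 0.25], 520):
    print(f"H={h:.1f} start={s:.1f} end={e:.1f}")
```

Because the ratios sum to 1, the last segment ends at `height` (up to floating-point rounding), so the segments tile the whole axis.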
Step 3: For each segment of every integer-dimension coordinate axis, calculate the offset height of one data record. For the segment of data value $V_{ij}$, the offset height of one record is given by formula (5):
$$I_{ij} = \frac{H_{ij}}{NV_{ij}} \qquad (5)$$
where $H_{ij}$ is the segment height for data value $V_{ij}$ obtained in Step 2.1, and $NV_{ij}$ is the number of records satisfying $V_i = V_{ij}$ obtained in Step 1.3.
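Combining formulas (1), (2) and (5) gives $I_{ij} = height \times (NV_{ij}/T) / NV_{ij} = height / T$, so every record occupies the same vertical spacing regardless of its segment. The hypothetical helper below checks this numerically; the record counts in the usage lines are made up for illustration.

```python
def offset_heights(counts, height):
    """Step 3 sketch: I_ij = H_ij / NV_ij for every value of one integer dimension.

    counts: record counts NV_ij ordered by conversion value j.
    """
    T = sum(counts)                                  # every record has exactly one value
    return [(height * n / T) / n for n in counts]    # H_ij from formulas (1)-(2), divided by NV_ij


print(offset_heights([3, 2, 1], 6))  # [1.0, 1.0, 1.0]
```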
Step 4: Based on the adjacency of the coordinate axes established above, calculate the basic data for data value mapping on every coordinate axis that corresponds to an integer data dimension.
The invention maps the same data value in different records of an integer data dimension to different heights, which solves the same-point mapping problem and effectively reduces crossings between polylines. This mapping method is called the "offset mapping" method in this invention.
In the offset mapping method, the mapping height of a data value depends on two factors: first, the order of the record in the data set; second, the mapping height of the record's value on the left adjacent axis (this factor is ignored when the current dimension's axis is the leftmost axis).
The mapping therefore distinguishes two cases. If the data dimension of the left adjacent axis is an integer data dimension, continue with Step 5. If it is a non-integer data dimension, or the current dimension's axis is the leftmost axis, jump to Step 6.
Step 5: This step calculates the basic data for data value mapping when the data dimension of the left adjacent axis is an integer data dimension.
For the current coordinate axis, the basic data for data value mapping are calculated as follows:
Step 5.1: Let $D_u$ be the integer data dimension whose axis is immediately to the left of the current integer dimension's axis, let $V_u$ be the vector extracted from $D_u$, and let $NV_u$ be the number of distinct data values in $V_u$ (calculated as in Step 1.2).
Step 5.2: Count the joint distribution of integer data dimensions $D_u$ and $D_i$: for every conversion value $p$ of $V_u$ and every conversion value $q$ of $V_i$, count the records satisfying both $V_u = V_{up}$ and $V_i = V_{iq}$, denoted $NV_{up;iq}$.
Here $V_{up}$ is the data value of $V_u$ whose conversion value is $p$, and $V_{iq}$ is the data value of $V_i$ whose conversion value is $q$.
Step 5.3: Divide the segment for data value $V_{iq}$ on the current axis into $NV_u$ "axis sub-segments", one per data value of $V_u$.
Step 5.4: Calculate the heights of all axis sub-segments on the current axis.
For the sub-segment corresponding to $V_u = V_{up}$ and $V_i = V_{iq}$, the height $H_{up;iq}$ is given by formula (6):
$$H_{up;iq} = H_{iq} \times \frac{NV_{up;iq}}{NV_{iq}} \qquad (6)$$
where $NV_{iq}$ is the number of records satisfying $V_i = V_{iq}$ obtained in Step 1.3, $H_{iq}$ is the segment height for $V_i = V_{iq}$ obtained in Step 2.1, and $NV_{up;iq}$ is the number of records satisfying $V_u = V_{up}$ and $V_i = V_{iq}$ obtained in Step 5.2.
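Steps 5.2 to 5.4 amount to a joint count followed by a proportional split of each segment. In this minimal sketch, the function name `subsegment_heights` and the input layout (per-record conversion values plus dictionaries keyed by $q$) are assumptions made for illustration; the sample data is hypothetical.

```python
from collections import Counter


def subsegment_heights(p_of, q_of, H, NV):
    """Steps 5.2-5.4 sketch.

    p_of[r], q_of[r]: conversion values of record r on the left axis (D_u)
                      and on the current axis (D_i).
    H[q], NV[q]:      segment height H_iq (Step 2.1) and count NV_iq (Step 1.3).
    Returns {(p, q): H_{up;iq}}; pairs that never occur have height 0 and are omitted.
    """
    joint = Counter(zip(p_of, q_of))                               # Step 5.2: NV_{up;iq}
    return {(p, q): H[q] * n / NV[q] for (p, q), n in joint.items()}  # formula (6)


sub = subsegment_heights([1, 1, 2, 1, 3], [1, 1, 1, 2, 1], H={1: 400.0, 2: 100.0}, NV={1: 4, 2: 1})
print(sub)  # {(1, 1): 200.0, (2, 1): 100.0, (1, 2): 100.0, (3, 1): 100.0}
```

Within each segment $q$, the sub-segment heights add up to $H_{iq}$, because the joint counts for $q$ add up to $NV_{iq}$.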
Step 5.5: Calculate the starting heights of all axis sub-segments on the current axis.
For the sub-segment corresponding to $V_u = V_{up}$ and $V_i = V_{iq}$, the starting height $Hstart_{up;iq}$ is given by formula (7):
$$Hstart_{up;iq} = Hstart_{iq} + \sum_{k=1}^{p-1} H_{uk;iq} \qquad (7)$$
where $Hstart_{iq}$ is the starting height of the segment for data value $V_{iq}$ obtained in Step 2.2, and $H_{uk;iq}$ is the height of the sub-segment satisfying $V_u = V_{uk}$ and $V_i = V_{iq}$ obtained in Step 5.4.
Step 5.6: Set the "next mapping height" of each axis sub-segment on the current axis.
For the sub-segment corresponding to $V_u = V_{up}$ and $V_i = V_{iq}$, the next mapping height $Hnext_{up;iq}$ is initialized to the sub-segment's starting height:
$$Hnext_{up;iq} = Hstart_{up;iq}$$
Jump to Step 7.
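Steps 5.5 and 5.6 stack the sub-segments of each segment in order of the left-axis conversion value $p$, which is what the sum over $k < p$ in formula (7) expresses. The hypothetical helper below initializes every $Hnext_{up;iq}$ from sub-segment heights such as those produced in Step 5.4; the input numbers are illustrative only.

```python
def initial_next_heights(sub_heights, Hstart, NV_u):
    """Steps 5.5-5.6 sketch: Hnext_{up;iq} initialised to Hstart_{up;iq}.

    sub_heights: {(p, q): H_{up;iq}} from Step 5.4 (missing pairs count as 0).
    Hstart:      {q: Hstart_iq} from Step 2.2.
    NV_u:        number of distinct values on the left axis.
    """
    hnext = {}
    for q, seg_start in Hstart.items():
        start = seg_start                              # the first sub-segment starts at Hstart_iq
        for p in range(1, NV_u + 1):                   # sub-segments ordered by left-axis conversion value
            hnext[(p, q)] = start                      # formula (7)
            start += sub_heights.get((p, q), 0.0)
    return hnext


hnext = initial_next_heights(
    {(1, 1): 200.0, (2, 1): 100.0, (3, 1): 100.0, (1, 2): 100.0},
    Hstart={1: 0.0, 2: 400.0},
    NV_u=3,
)
print(hnext)  # {(1, 1): 0.0, (2, 1): 200.0, (3, 1): 300.0, (1, 2): 400.0, (2, 2): 500.0, (3, 2): 500.0}
```

Empty sub-segments (here $(2, 2)$ and $(3, 2)$) get zero height and share their starting height with the next non-empty one; no record ever maps to them.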
Step 6: This step calculates the basic data for data value mapping when the left adjacent axis corresponds to a non-integer data dimension, or when the current dimension's axis is the leftmost axis.
Because there is no integer-dimension axis on the left, the segments of the current axis do not need to be divided into sub-segments; the next mapping height of every segment is set directly.
For $V_i = V_{iq}$, the next mapping height $Hnext_{iq}$ of the corresponding segment is set to $Hstart_{iq}$, the starting height of the segment for data value $V_{iq}$ obtained in Step 2.2.
Step 7: Calculate the mapping height of every record's value in each dimension on the corresponding coordinate axis.
For each record, if the current data dimension is a non-integer data dimension, calculate the mapping height of the data value on its axis with the conventional method;
if the current data dimension is an integer data dimension and the axis adjacent to the left of its axis also corresponds to an integer data dimension, continue with Step 7.1;
if the current data dimension is an integer data dimension and its axis is the leftmost axis, or the left adjacent axis corresponds to a non-integer data dimension, continue with Step 7.3.
Step 7.1: Take the record's value $V_i = V_{iq}$ in integer data dimension $D_i$. Look up the record's value in the data dimension vector of the left axis ($V_u$), say $V_{up}$; that is, the record satisfies $V_u = V_{up}$ and $V_i = V_{iq}$.
The current next mapping height $Hnext_{up;iq}$ of the sub-segment for $V_u = V_{up}$ and $V_i = V_{iq}$, set up in Step 5, is the mapping height of this record on the axis of $D_i$.
Step 7.2: Using the one-record offset height $I_{iq}$ of the segment for data value $V_{iq}$ obtained in Step 3, update $Hnext_{up;iq}$ as in formula (8):
$$Hnext_{up;iq} = Hnext_{up;iq} + I_{iq} \qquad (8)$$
Jump to Step 8.
Step 7.3: Take the record's value $V_i = V_{iq}$ in integer data dimension $D_i$. The current next mapping height $Hnext_{iq}$ of the segment for $V_{iq}$, set up in Step 6, is the mapping height of this record on the axis of $D_i$.
Step 7.4: Using the one-record offset height $I_{iq}$ of the segment for data value $V_{iq}$ obtained in Step 3, update $Hnext_{iq}$ as in formula (9):
$$Hnext_{iq} = Hnext_{iq} + I_{iq} \qquad (9)$$
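Steps 1 to 7 for a single integer axis can be combined into one compact sketch following formulas (1) to (9). It assumes raw values are given as Python lists with one entry per record. The function names `rank_values` and `offset_map` are hypothetical, and breaking frequency ties by value is an assumption. Passing `left=None` selects the Step 6 branch; otherwise the Step 5 sub-segments are used.

```python
from collections import Counter


def rank_values(column):
    """Step 1.3: map each raw value to its conversion value (1 = most records).
    Ties are broken by value, which is an assumption."""
    counts = Counter(column)
    order = sorted(counts, key=lambda v: (-counts[v], v))
    return {v: j for j, v in enumerate(order, start=1)}, counts


def offset_map(cur, height, left=None):
    """Offset-mapping sketch for one integer axis (Steps 1-7).

    cur:  raw values of the current integer dimension D_i, one per record.
    left: raw values of the left neighbouring integer dimension D_u, or None when
          the left axis is non-integer or the current axis is leftmost (Step 6).
    Returns the mapped height of every record, in data-set order.
    """
    T = len(cur)
    conv, counts = rank_values(cur)
    NV = {conv[v]: n for v, n in counts.items()}           # NV_iq
    H = {q: height * n / T for q, n in NV.items()}         # formulas (1)-(2)
    I = {q: H[q] / NV[q] for q in H}                       # formula (5)
    Hstart, acc = {}, 0.0
    for q in sorted(H):                                    # formula (3)
        Hstart[q], acc = acc, acc + H[q]

    q_of = [conv[v] for v in cur]
    if left is None:                                       # Step 6: one slot per segment
        keys = q_of
        hnext = dict(Hstart)
    else:                                                  # Step 5: one slot per sub-segment
        lconv, _ = rank_values(left)
        keys = [(lconv[u], q) for u, q in zip(left, q_of)]
        joint = Counter(keys)                              # NV_{up;iq}
        hnext = {}
        for q in sorted(H):
            start = Hstart[q]
            for p in range(1, len(lconv) + 1):
                hnext[(p, q)] = start                      # formula (7)
                start += H[q] * joint[(p, q)] / NV[q]      # formula (6)

    heights = []
    for key, q in zip(keys, q_of):                         # Step 7, records in data-set order
        heights.append(hnext[key])
        hnext[key] += I[q]                                 # formulas (8) and (9)
    return heights


print(offset_map([2, 2, 11, 2, 3, 11], height=6))                          # [0.0, 1.0, 3.0, 2.0, 5.0, 4.0]
print(offset_map([2, 2, 11, 2, 3, 11], height=6, left=[1, 2, 1, 1, 2, 2]))  # [0.0, 2.0, 3.0, 1.0, 5.0, 4.0]
```

In both calls every record gets its own height. With a left axis, the second record (left value 2) moves up into the second sub-segment, so lines coming from the same left value stay grouped, and fewer lines cross.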
Step 8: To distinguish the segments of each integer-dimension axis, a different texture can be assigned to each segment, for example distinctive colors or hatching patterns.
Step 9: Draw the improved parallel coordinate visualization of the current data set from the axis information obtained in Steps 1 to 8, the mapping heights of all records, and the segment textures.
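Steps 8 and 9 can be sketched with nothing but the standard library by emitting SVG. This is a minimal sketch under stated assumptions: the function name `render_svg` and its layout parameters (`gap`, `margin`) are hypothetical, and alternating grey fills stand in for the diagonal-stripe and cross-hatch textures used in the embodiment.

```python
def render_svg(axes_heights, axes_segments, height=520, gap=150, margin=20):
    """Steps 8-9 sketch: draw segmented axes and record polylines as an SVG string.

    axes_heights[a][r]: mapped height of record r on axis a (Step 7).
    axes_segments[a]:   [(Hstart, Hend), ...] for an integer axis, [] otherwise.
    """
    def y(h):
        return margin + height - h      # SVG y grows downward; the axis origin is at the bottom

    width = 2 * margin + gap * (len(axes_heights) - 1)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height + 2 * margin}">']
    for a, segments in enumerate(axes_segments):
        x = margin + a * gap
        for k, (start, end) in enumerate(segments):
            # Alternating grey fills stand in for the stripe textures of Step 8.
            fill = "#bbbbbb" if k % 2 == 0 else "#666666"
            parts.append(f'<rect x="{x - 4}" y="{y(end)}" width="8" height="{end - start}" fill="{fill}"/>')
        parts.append(f'<line x1="{x}" y1="{y(0)}" x2="{x}" y2="{y(height)}" stroke="black"/>')
    for r in range(len(axes_heights[0])):
        points = " ".join(f"{margin + a * gap},{y(hs[r])}" for a, hs in enumerate(axes_heights))
        parts.append(f'<polyline points="{points}" fill="none" stroke="steelblue" stroke-opacity="0.4"/>')
    parts.append("</svg>")
    return "\n".join(parts)


svg = render_svg([[0.0, 1.0, 3.0], [0.0, 2.0, 1.0]], [[(0.0, 2.0), (2.0, 3.0)], [(0.0, 3.0)]], height=3, gap=100)
```

The returned string can be written to a `.svg` file and opened in a browser; a real tool would more likely use a plotting library, but SVG keeps the sketch dependency-free.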
Advantageous effects
With the parallel coordinate improvement method provided by the invention, the proportion of records taken by each data value in each data dimension can be read directly from the relative heights of the segments on each axis; during interactive data screening, the associations between dimensions and their strength can be identified quickly; and the capability for visual analysis of multidimensional integer data sets is improved.
Drawings
FIG. 1 is a flow chart of the parallel coordinate improvement method for a multidimensional integer data set according to an embodiment of the invention;
FIG. 2 shows the visualization obtained by applying the method to a (desensitized and declassified) pesticide residue detection result data set;
FIG. 3 shows the visualization after interactive screening, based on the visualization of FIG. 2.
Detailed Description
The invention is further described below with reference to the accompanying drawings and examples.
Take a pesticide residue detection result data set as an example. Its data dimensions include region, year, month, agricultural product and pesticide, and it contains 1241 data records; the first 10 records are shown in Table 1.
Table 1 Example data from the pesticide residue detection result data set
The original data was converted to integer data as shown in Table 2.
Table 2 Example data from the pesticide residue detection result data set converted to an integer data set
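The patent does not state the encoding rule used to produce Table 2. The sketch below therefore assumes a simple rule: each distinct category gets an integer code in order of first appearance. The function name and sample values are hypothetical. The exact codes hardly matter to the method, because Step 1.3 re-ranks the values by record count anyway.

```python
def encode_categories(column):
    """Assign each distinct category an integer code, in order of first appearance (assumed rule)."""
    codes = {}
    # setdefault keeps an existing code and otherwise inserts the next unused one.
    encoded = [codes.setdefault(value, len(codes) + 1) for value in column]
    return encoded, codes


encoded, codes = encode_categories(["cucumber", "peach", "cucumber"])
print(encoded, codes)  # [1, 2, 1] {'cucumber': 1, 'peach': 2}
```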
FIG. 1 shows the implementation flow of the parallel coordinate improvement method for a multidimensional integer data set in this embodiment. The specific operations are described below using the pesticide residue detection result data set:
Step 1: Count the number of distinct data values in each integer data dimension of the data set, and calculate the proportion of each data value.
For the integer data dimension "month" (denoted $D_3$), the calculation is as follows:
Step 1.1: Extract the values of integer data dimension $D_3$ as a vector $V_3 = (11, 3, 1, \ldots, 2, 1)$. The data set contains $T = 1241$ records, so $V_3$ has 1241 components.
Step 1.2: The number of distinct data values in $V_3$ is $NV_3 = 9$.
Step 1.3: Count the number of records for each data value in $V_3$ and sort the values by record count in descending order. Following this order, convert the data values in $V_3$ to $1$ to $NV_3$ ($NV_3$ from Step 1.2). The value with conversion value 1 is February (month 2, denoted $V_{3,1}$), and the number of records in $D_3$ satisfying $V_3 = V_{3,1}$ is $NV_{3,1} = 283$.
The converted value of each data value in $V_3$ is its "conversion value"; conversion values in $V_3$ range from $1$ to $NV_3$ (from Step 1.2).
Step 1.4: Calculate the record ratio of each data value in $V_3$. Using formula (1), the ratio $R_{3,1}$ of records satisfying $V_3 = V_{3,1}$ is given by formula (10):
$$R_{3,1} = \frac{NV_{3,1}}{T} = \frac{283}{1241} \approx 0.2280 \qquad (10)$$
where $T = 1241$ is the total number of records in the data set, as described in Step 1.1.
Step 2: Build the coordinate axes according to the data distribution of all integer data dimensions in the data set. If the data set also contains non-integer data dimensions, their axes are built with the conventional method.
For integer data dimension $D_3$, the coordinate axis is built as follows:
Divide the axis into $NV_3$ segments, one for each data value of $D_3$, with the height of each segment proportional to the record ratio of its data value. An axis composed of segments in this way is called a "segmented coordinate axis" in this invention.
The information for each segment is calculated as follows:
Step 2.1: From the axis height of the parallel coordinate system in the final visualization, $height = 520$, calculate the segment height for each data value.
For integer data dimension $D_3$, the segment height for data value $V_{3,1}$ follows from formula (2) as formula (11):
$$H_{3,1} = height \times R_{3,1} = 520 \times \frac{283}{1241} \approx 118.58 \qquad (11)$$
where $R_{3,1}$ is the ratio of records satisfying $V_3 = V_{3,1}$ obtained in Step 1.4.
Step 2.2: Calculate the starting and ending heights of each axis segment. For the segment of data value $V_{3,1}$, the starting height follows from formula (3) as formula (12):
$$Hstart_{3,1} = 0 \quad (j = 1) \qquad (12)$$
and the ending height follows from formula (4) as formula (13):
$$Hend_{3,1} = Hstart_{3,1} + H_{3,1} \approx 118.58 \qquad (13)$$
Step 3: For each segment of every integer-dimension axis, calculate the offset height of one record. For data value $V_{3,1}$, the offset height follows from formula (5) as formula (14):
$$I_{3,1} = \frac{H_{3,1}}{NV_{3,1}} = \frac{520 \times 283 / 1241}{283} = \frac{520}{1241} \approx 0.419 \qquad (14)$$
where $H_{3,1}$ is the segment height for data value $V_{3,1}$ obtained in Step 2.1, and $NV_{3,1}$ is the number of records satisfying $V_3 = V_{3,1}$ obtained in Step 1.3.
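The arithmetic of formulas (10) to (14) can be reproduced directly from the values given in this embodiment ($T = 1241$, $NV_{3,1} = 283$, $height = 520$). The variable names are illustrative only.

```python
T = 1241          # records in the pesticide data set
NV_3_1 = 283      # records whose month is February (conversion value 1)
height = 520      # axis height used in the embodiment

R_3_1 = NV_3_1 / T                 # formula (10)
H_3_1 = height * R_3_1             # formula (11)
Hstart_3_1 = 0.0                   # formula (12), j = 1
Hend_3_1 = Hstart_3_1 + H_3_1      # formula (13)
I_3_1 = H_3_1 / NV_3_1             # formula (14), equal to height / T

print(f"R={R_3_1:.4f} H={H_3_1:.2f} Hend={Hend_3_1:.2f} I={I_3_1:.4f}")
# R=0.2280 H=118.58 Hend=118.58 I=0.4190
```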
Step 4: Based on the adjacency of the coordinate axes established above, calculate the basic data for data value mapping on every coordinate axis that corresponds to an integer data dimension.
The invention maps the same data value in different records of an integer data dimension to different heights, which solves the same-point mapping problem and effectively reduces crossings between polylines. This mapping method is called the "offset mapping" method in this invention.
In the offset mapping method, the mapping height of a data value depends on two factors: first, the order of the record in the data set; second, the mapping height of the record's value on the left adjacent axis (this factor is ignored when the current dimension's axis is the leftmost axis).
The mapping therefore distinguishes two cases. If the data dimension of the left adjacent axis is an integer data dimension, continue with Step 5. If it is a non-integer data dimension, or the current dimension's axis is the leftmost axis, jump to Step 6.
Step 5: This step calculates the basic data for data value mapping when the data dimension of the left adjacent axis is an integer data dimension.
For the current coordinate axis, the basic data for mapping each record onto the integer data dimension are calculated as follows:
Step 5.1: The integer data dimension whose axis is immediately to the left of the current axis is $D_2$; the vector extracted from it is $V_2$, and the number of distinct data values in $V_2$ is $NV_2 = 3$ (calculated as in Step 1.2).
Step 5.2: Count the joint distribution of integer data dimensions $D_2$ and $D_3$. For example, for conversion value $p = 1$ of $V_2$ and conversion value $q = 1$ of $V_3$, count the records satisfying $V_2 = V_{2,1}$ and $V_3 = V_{3,1}$, denoted $NV_{2,1;3,1}$,
where $V_{2,1}$ is the data value of $V_2$ with conversion value $p = 1$ and $V_{3,1}$ is the data value of $V_3$ with conversion value $q = 1$.
Step 5.3: Since $V_2$ has $NV_2 = 3$ distinct data values, divide the segment for data value $V_{3,1}$ on the current axis into 3 axis sub-segments.
Step 5.4: Calculate the heights of all sub-segments on the current axis.
For $V_2 = V_{2,1}$ and $V_3 = V_{3,1}$, the height $H_{2,1;3,1}$ of the corresponding sub-segment follows from formula (6) as formula (15):
$$H_{2,1;3,1} = H_{3,1} \times \frac{NV_{2,1;3,1}}{NV_{3,1}} \qquad (15)$$
where $NV_{3,1} = 283$ is the number of records satisfying $V_3 = V_{3,1}$ obtained in Step 1.3, $H_{3,1}$ is the segment height for $V_3 = V_{3,1}$ obtained in Step 2.1, and $NV_{2,1;3,1}$ is the number of records satisfying $V_2 = V_{2,1}$ and $V_3 = V_{3,1}$ obtained in Step 5.2.
Step 5.5: Calculate the starting heights of all sub-segments on the current axis.
For $V_2 = V_{2,1}$ and $V_3 = V_{3,1}$, the starting height $Hstart_{2,1;3,1}$ of the corresponding sub-segment follows from formula (7) as formula (16); because $p = 1$, the sum in formula (7) is empty:
$$Hstart_{2,1;3,1} = Hstart_{3,1} = 0 \qquad (16)$$
where $Hstart_{3,1}$ is the starting height of the segment for data value $V_{3,1}$ obtained in Step 2.2.
Step 5.6: Set the "next mapping height" of each sub-segment on the current axis.
For $V_2 = V_{2,1}$ and $V_3 = V_{3,1}$, the next mapping height $Hnext_{2,1;3,1}$ of the corresponding sub-segment is initialized to the sub-segment's starting height:
$$Hnext_{2,1;3,1} = Hstart_{2,1;3,1}$$
Jump to Step 7.
Step 6: This step calculates the basic data for data value mapping when the left adjacent axis corresponds to a non-integer data dimension, or when the current dimension's axis is the leftmost axis.
Because there is no integer-dimension axis on the left, the segments of the current axis do not need to be divided into sub-segments; the next mapping height of every segment is set directly.
On the leftmost axis ($D_1$), for $V_1 = V_{1,1}$, the next mapping height $Hnext_{1,1}$ of the corresponding segment is set to $Hstart_{1,1} = 0$, the starting height of the segment for data value $V_{1,1}$ obtained in Step 2.2.
Step 7: Calculate the mapping height of every record's value in each dimension on the corresponding coordinate axis.
For each record, if the current data dimension is a non-integer data dimension, calculate the mapping height of the data value on its axis with the conventional method;
if the current data dimension is an integer data dimension and the axis adjacent to the left of its axis also corresponds to an integer data dimension, continue with Step 7.1;
if the current data dimension is an integer data dimension and its axis is the leftmost axis, or the left adjacent axis corresponds to a non-integer data dimension, continue with Step 7.3.
Step 7.1: Take the value $V_3 = V_{3,1}$ of integer data dimension $D_3$ as an example. Look up the record's value in the data dimension vector of the left axis ($V_2$), here $V_{2,1}$; that is, the record satisfies $V_2 = V_{2,1}$ and $V_3 = V_{3,1}$.
The current next mapping height $Hnext_{2,1;3,1}$ of the corresponding sub-segment, set up in Step 5, is the mapping height of this record on the axis of $D_3$.
Step 7.2: Using the one-record offset height $I_{3,1}$ of the segment for data value $V_{3,1}$ obtained in Step 3, update $Hnext_{2,1;3,1}$ according to formula (8), as in formula (17):
$$Hnext_{2,1;3,1} = Hnext_{2,1;3,1} + I_{3,1} \qquad (17)$$
Jump to Step 8.
Step 7.3: Take the value $V_3 = V_{3,1}$ of integer data dimension $D_3$ as an example. The current next mapping height $Hnext_{3,1}$ of the segment for $V_{3,1}$, set up in Step 6, is the mapping height of this record on the axis of $D_3$.
Step 7.4: Using the one-record offset height $I_{3,1}$ of the segment for data value $V_{3,1}$ obtained in Step 3, update $Hnext_{3,1}$ according to formula (9).
Step 8: To distinguish the segments of each integer-dimension axis, a different texture can be assigned to each segment, for example distinctive colors or hatching patterns.
In this embodiment, diagonal stripes and cross-hatching are used as the segment textures.
Step 9: Draw the improved parallel coordinate visualization of the current data set from the axis information obtained in Steps 1 to 8, the mapping heights of all records, and the segment textures.
FIG. 2 is a parallel coordinate improvement method for a multidimensional integer value type data set, which is applied to the visualization effect of a pesticide residue detection result data set (desensitization and decryption). From the visualization result, the visual analysis conclusion of multi-dimensional comparison on the pesticide residue detection data set example data comprises the following steps:
(1) after the method is applied in the process of drawing the parallel coordinates, each coordinate axis is divided into a plurality of sections, the height of each section represents the data record number of the data value of the section, and the comparison of the data record number is realized. In the "region" dimension, the number of data records in the sunny region is the highest, and the number of data records in the mountain region is the lowest. In the "years" dimension, 2012's of data records are the most, followed by 2014 and finally 2013. The number of data records for february is the largest and the number of data records for february is the smallest in the "month" dimension. The number of records in the "day" dimension is the largest for number five and the smallest for number eighteen. The cucumber records are the most and the peach records are the least in the "agricultural products" dimension. In the "pesticide" dimension, the most pesticide was not detected, indicating that pesticide use in most agricultural products is standard.
(2) The association between any two dimensions can be analyzed by reordering the coordinate axes so that their axes are adjacent. For example, the "year" axis can be placed next to the "region", "month", or "day" axis to analyze its association with that dimension.
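Conclusions (1) and (2) amount to reading record counts per data value on one axis, and joint counts for two axes placed side by side. A minimal sketch of those two counts follows, assuming each record is a Python dict keyed by dimension name; the function names and the toy records are hypothetical and are not the pesticide residue data set.

```python
from collections import Counter

def segment_counts(records, dim):
    """Record count per data value of one dimension (proportional to segment height)."""
    return Counter(r[dim] for r in records)

def most_and_least(records, dim):
    """Data values with the tallest and the shortest segment on one axis."""
    ranked = segment_counts(records, dim).most_common()
    return ranked[0][0], ranked[-1][0]

def cross_tab(records, left_dim, right_dim):
    """Joint record counts for two dimensions whose axes are placed side by side."""
    return Counter((r[left_dim], r[right_dim]) for r in records)

# Hypothetical toy records for illustration only.
records = [
    {"region": "A", "year": 2012},
    {"region": "A", "year": 2012},
    {"region": "A", "year": 2014},
    {"region": "B", "year": 2013},
]
print(most_and_least(records, "region"))                  # ('A', 'B')
print(cross_tab(records, "region", "year")[("A", 2012)])  # 2
```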
In this invention, filtering on a data value of one dimension shows how the data values of the other dimensions are distributed for that value. FIG. 3 shows the result of filtering the visualization of FIG. 2 on the sunny region: only data records whose "region" value is "sunny region" are displayed, and the rest are hidden. The filtered result supports the following conclusions: in the "year" dimension, 2012 has the most records, while 2014 and 2013 are roughly equal; in the "month" dimension, October has the most records and September the fewest; in the "day" dimension, day 10 has the most records and day 14 the fewest.
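The filtering described for FIG. 3 can be sketched as keeping only the records that match the selected value and recounting the other dimensions. This assumes records are dicts keyed by dimension name; the function names and toy data are hypothetical, and the sketch does not settle whether the patent rescales the axis segments after filtering or keeps the unfiltered layout.

```python
from collections import Counter

def filter_records(records, dim, value):
    """Keep only records whose value in `dim` equals `value`; the rest are hidden."""
    return [r for r in records if r[dim] == value]

def value_distribution(records, dim):
    """Record count per data value of `dim` within the (filtered) records."""
    return Counter(r[dim] for r in records)

# Hypothetical toy records for illustration only.
records = [
    {"region": "A", "year": 2012},
    {"region": "A", "year": 2012},
    {"region": "A", "year": 2014},
    {"region": "B", "year": 2013},
]
selected = filter_records(records, "region", "A")
print(value_distribution(selected, "year"))
```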

Claims (1)

1. A parallel coordinate improvement method for a multidimensional integer-valued data set, comprising the following steps:
Step 1: Count the number of distinct data values in each integer data dimension of the data set, and calculate the proportion of each data value;
For one integer data dimension $D_i$, the calculation is as follows:
Step 1.1: Extract the data of integer data dimension $D_i$ as a vector $V_i$; if the data set contains $T$ data records, $V_i$ has $T$ components;
Step 1.2: Count the number of distinct data values $NV_i$ in vector $V_i$;
Step 1.3: Count the records for each data value in $V_i$, sort the data values by record count in descending order, and convert them in that order to $1, 2, \ldots, NV_i$; name the data value whose conversion value is $j$ as $V_{ij}$, and name the number of records of integer data dimension $D_i$ satisfying $V_i = V_{ij}$ as $NV_{ij}$;
The converted value of each data value in $V_i$ is called its "conversion value"; the conversion values in $V_i$ range from 1 to $NV_i$;
Step 1.4: Calculate the record proportion of each data value in $V_i$;
Step 2: Establish the coordinate axes according to the data distribution of all integer data dimensions in the data set; if the data set also contains non-integer data dimensions, their coordinate axes are established by the traditional method, unchanged;
For integer data dimension $D_i$, the corresponding coordinate axis is established as follows:
Divide the coordinate axis into $NV_i$ segments, each called a "coordinate axis segment"; each segment corresponds to one data value of integer data dimension $D_i$, and its height is related to the proportion of that data value;
The information of each segment on the coordinate axis is calculated as follows:
Step 2.1: From the height of the coordinate axis in the parallel coordinate system of the final visualization, calculate the height $H_{ij}$ of the coordinate axis segment corresponding to each data value;
Step 2.2: Calculate the start height $Hstart_{ij}$ and end height $Hend_{ij}$ of each coordinate axis segment;
Step 3: For each coordinate axis segment of every integer data dimension, calculate the per-record offset height; the per-record offset height of the segment with conversion value $j$ in integer data dimension $D_i$ is named $I_{ij}$;
Step 4: Calculate the data value mapping base data of every coordinate axis corresponding to an integer data dimension, according to the adjacency of the coordinate axes established in Step 3;
The calculation has two cases: if the data dimension of the left-adjacent coordinate axis is an integer data dimension, continue with Step 5; if the data dimension of the left-adjacent coordinate axis is a non-integer data dimension, or the coordinate axis of the current data dimension is the leftmost axis, skip to Step 6;
Step 5: Calculate the data value mapping base data when the data dimension of the left-adjacent coordinate axis is an integer data dimension;
For the current integer data dimension $D_i$, the data value mapping base data of its coordinate axis are calculated as follows:
Step 5.1: Let the coordinate axis adjacent to the left of the axis of the current integer data dimension $D_i$ correspond to integer data dimension $D_u$; the vector extracted from $D_u$ is $V_u$, and the number of distinct data values in $V_u$ is $NV_u$, which can be calculated as in Step 1.2;
Step 5.2: Count the joint distribution of integer data dimensions $D_u$ and $D_i$: for each conversion value $p$ of $D_u$ and each conversion value $q$ of $D_i$, the number of records in the data set satisfying "the $D_u$ conversion value is $p$ and the $D_i$ conversion value is $q$" is named [formula image: FSB0000186370040000021];
Step 5.3: According to the number $NV_u$ of distinct data values in $V_u$, divide the coordinate axis segment of data value $V_{iq}$ on the current axis into $NV_u$ "coordinate axis sub-segments";
Step 5.4: Calculate the heights of all coordinate axis sub-segments on the current axis; the height of the sub-segment for conversion value $p$ of integer data dimension $D_u$ and conversion value $q$ of integer data dimension $D_i$ is named [formula image: FSB0000186370040000031];
Step 5.5: Calculate the start heights of all coordinate axis sub-segments on the current axis; the start height of the sub-segment for conversion value $p$ of integer data dimension $D_u$ and conversion value $q$ of integer data dimension $D_i$ is recorded as [formula image: FSB0000186370040000032];
Step 5.6: For each coordinate axis sub-segment of the current axis, set a "next mapping height", initialized to the start height of that sub-segment;
Step 6: Calculate the data value mapping base data when the left-adjacent coordinate axis of integer data dimension $D_i$ corresponds to a non-integer data dimension, or when the axis of $D_i$ is the leftmost axis;
For integer data dimension $D_i$, the next mapping height $Hnext_{iq}$ of the coordinate axis segment with conversion value $q$ is initialized to $Hstart_{iq}$, the start height of the segment of data value $V_{iq}$ obtained in Step 2.2;
Step 7: For each record in the data set, calculate the mapping height of its value in each dimension on the corresponding coordinate axis;
For each record, if the current data dimension is a non-integer data dimension, calculate the mapping height of the data value on the corresponding coordinate axis by the traditional method;
If the current data dimension is an integer data dimension and the left-adjacent coordinate axis also corresponds to an integer data dimension, continue with Step 7.1;
If the current data dimension is an integer data dimension and its coordinate axis is the leftmost axis, or the left-adjacent axis corresponds to a non-integer data dimension, continue with Step 7.3;
Step 7.1: For a data record whose conversion value is $p$ in the left integer data dimension $D_u$ and $q$ in the current integer data dimension $D_i$, take the next mapping height of the corresponding coordinate axis sub-segment obtained in Step 5, [formula image: FSB0000186370040000041], as the mapping height of this record on the coordinate axis of $D_i$;
Step 7.2: According to the current record's conversion value $q$ in integer data dimension $D_i$, take the per-record offset height $I_{iq}$ of the corresponding coordinate axis segment obtained in Step 3 and update [formula image: FSB0000186370040000042];
Skip to Step 8;
Step 7.3: For a data record whose conversion value in the current integer data dimension $D_i$ is $q$, take the next mapping height $Hnext_{iq}$ of the coordinate axis segment of $V_{iq}$ obtained in Step 6 as the mapping height of this record on the coordinate axis of $D_i$;
Step 7.4: According to the current record's conversion value $q$ in integer data dimension $D_i$, take the per-record offset height $I_{iq}$ of the corresponding coordinate axis segment obtained in Step 3 and update $Hnext_{iq} = Hnext_{iq} + I_{iq}$;
Step 8: To distinguish the coordinate axis segments on the axes of integer-valued data dimensions, a different texture can be set for each segment, chosen from distinguishable colors or shading patterns;
Step 9: Draw the improved parallel coordinate visualization of the current data set from the coordinate axis information obtained in Steps 1 to 8, the mapping heights of all records, and the segment textures.
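The claim reproduces the formulas for $H_{ij}$, $I_{ij}$ and the sub-segment heights and start heights only as images, so the Python sketch of Steps 1 to 7 below fills them in with stated assumptions: segment height is axis height × record proportion $NV_{ij}/T$; segments are stacked bottom-up in conversion-value order; $I_{ij} = H_{ij}/NV_{ij}$; and the sub-segments of segment $q$ are stacked bottom-up in order of the left axis's conversion value $p$, each just tall enough to hold its joint record count at spacing $I_{iq}$. Min-max scaling stands in for the "traditional method" on non-integer axes. All function names are hypothetical.

```python
from collections import Counter

def conversion_values(column):
    """Steps 1.2-1.3: rank distinct values by record count (most frequent -> 1)."""
    counts = Counter(column)
    conv = {v: j for j, (v, _) in enumerate(counts.most_common(), start=1)}
    nv = {conv[v]: n for v, n in counts.items()}  # NV_ij keyed by conversion value j
    return conv, nv

def integer_axis_layout(column, axis_height):
    """Steps 1.4-3 for one integer dimension: segment start/end and per-record offset."""
    conv, nv = conversion_values(column)
    total, start, seg = len(column), 0.0, {}
    for j in range(1, len(nv) + 1):
        h = axis_height * nv[j] / total  # ASSUMED: H_ij = axis height * NV_ij / T
        seg[j] = {"start": start, "end": start + h,
                  "offset": h / nv[j]}   # ASSUMED: I_ij = H_ij / NV_ij
        start += h
    return conv, seg

def mapping_heights(records, dims, int_dims, axis_height=1.0):
    """Steps 4-7: mapping height of every record on every axis, axes ordered as `dims`."""
    heights = [[0.0] * len(dims) for _ in records]
    layouts = {}
    for k, d in enumerate(dims):
        column = [r[d] for r in records]
        if d not in int_dims:
            # Stand-in for the traditional method: linear min-max mapping.
            lo, hi = min(column), max(column)
            for i, x in enumerate(column):
                heights[i][k] = axis_height * (x - lo) / (hi - lo) if hi > lo else 0.0
            continue
        layouts[d] = integer_axis_layout(column, axis_height)
        conv, seg = layouts[d]
        left = dims[k - 1] if k > 0 else None
        nxt = {}  # next mapping height per segment (Step 6) or sub-segment (Step 5)
        if left in int_dims:
            # Step 5: split segment q into sub-segments, one per left conversion value p.
            left_conv = layouts[left][0]
            keys = [(left_conv[r[left]], conv[r[d]]) for r in records]
            joint = Counter(keys)  # Step 5.2 joint record counts
            for q in seg:
                start = seg[q]["start"]
                for p in sorted(left_conv.values()):  # ASSUMED: stacked by p, bottom up
                    nxt[(p, q)] = start               # Step 5.6 initial value
                    start += joint[(p, q)] * seg[q]["offset"]
        else:
            # Step 6: leftmost axis, or left neighbour is a non-integer dimension.
            keys = [conv[r[d]] for r in records]
            nxt = {q: seg[q]["start"] for q in seg}
        for i, (r, key) in enumerate(zip(records, keys)):
            heights[i][k] = nxt[key]               # Steps 7.1 / 7.3
            nxt[key] += seg[conv[r[d]]]["offset"]  # Steps 7.2 / 7.4
    return heights

# Hypothetical toy records for illustration only.
records = [
    {"region": "A", "year": 2012},
    {"region": "A", "year": 2012},
    {"region": "A", "year": 2014},
    {"region": "B", "year": 2012},
]
print(mapping_heights(records, ["region", "year"], {"region", "year"}))
```

Under these assumptions, records that share the same pair $(p, q)$ occupy one contiguous band inside segment $q$ of the right axis, consecutively spaced by $I_{iq}$.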
CN201810131947.4A 2018-02-09 2018-02-09 Parallel coordinate improvement method for multi-dimensional integer data set Active CN108319715B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810131947.4A CN108319715B (en) 2018-02-09 2018-02-09 Parallel coordinate improvement method for multi-dimensional integer data set

Publications (2)

Publication Number Publication Date
CN108319715A CN108319715A (en) 2018-07-24
CN108319715B true CN108319715B (en) 2020-05-22

Family

ID=62903250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810131947.4A Active CN108319715B (en) 2018-02-09 2018-02-09 Parallel coordinate improvement method for multi-dimensional integer data set

Country Status (1)

Country Link
CN (1) CN108319715B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102332056A (en) * 2011-09-27 2012-01-25 Zhejiang University of Technology Information visualization technology-based house property data visualization system
CN106021529A (en) * 2016-05-25 2016-10-12 Zhejiang University of Technology Visualization method for circulations of large files based on parallel coordinate system
CN106951903A (en) * 2016-10-31 2017-07-14 Zhejiang University Visualization method for crowd movement patterns
CN107067427A (en) * 2017-05-18 2017-08-18 Beijing Technology and Business University Polar-coordinate layout visualization method for pesticide residue detection data

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8374431B2 (en) * 2010-07-27 2013-02-12 Aerotec, Llc Method and apparatus for direct detection, location, analysis, identification, and reporting of vegetation clearance violations

Non-Patent Citations (1)

Title
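Multi-view collaborative visual analysis method based on parallel coordinates; Chen Yi et al.; Journal of System Simulation; 2013-01-31; Vol. 25, No. 1; pp. 81-86 *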

Also Published As

Publication number Publication date
CN108319715A (en) 2018-07-24

Similar Documents

Publication Publication Date Title
JP5390506B2 (en) Video detection system and video detection method
Geng et al. Angular histograms: Frequency-based visualizations for large, high dimensional data
US8022952B2 (en) Generating a visualization to show mining results produced from selected data items and attribute(s) in a selected focus area and other portions of a data set
Krstajic et al. Cloudlines: Compact display of event episodes in multiple time-series
US20090310861A1 (en) Image processing
CN106844664B (en) Time series data index construction method based on abstract
CN1379887A (en) System for analyzing and improving pharmaceutical and other capital-intensive manufacturing processes
US20030214504A1 (en) Method for visualizing graphical data sets having a non-uniform graphical density for display
US7643029B2 (en) Method and system for automated visual comparison based on user drilldown sequences
JP2004118867A (en) Graph generation support program, recording medium recorded with the same, and automatic graph generation method
EP1635277A2 (en) System and methods for visualizing and manipulating multiple data values with graphical views of biological relationships
US7760203B1 (en) Graphic color-pixel-based visual-analytic representations for datasets
Filzmoser et al. Robust statistical analysis
CN101300576A (en) Image comparison
WO2019200739A1 (en) Data fraud identification method, apparatus, computer device, and storage medium
CN102013049A (en) Virtual organization-based KPI analysis method and statistical analysis system
CN110795463B (en) Mass time series data visualization method for transient analysis of power system
CN109543525B (en) Table extraction method for general table image
CN108319715B (en) Parallel coordinate improvement method for multi-dimensional integer data set
CN112819918A (en) Intelligent generation method and device of visual chart
CN112825084A (en) Multidimensional data visualization method based on parallel coordinate optimization
CN108648245B (en) Information extraction method and device for well logging interpretation curve
US8122056B2 (en) Interactive aggregation of data on a scatter plot
CN110866689A (en) Method for selecting maximum scanning window in space scanning statistics
US8924843B1 (en) Visualizing a plurality of times series in corresponding cell-based lines of a display region

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240220

Address after: 230000 floor 1, building 2, phase I, e-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province

Patentee after: Dragon totem Technology (Hefei) Co.,Ltd.

Country or region after: China

Address before: 100048 Beijing Haidian District Fucheng Road 33 Beijing University of Industry and Commerce

Patentee before: BEIJING TECHNOLOGY AND BUSINESS University

Country or region before: China