CN113505798A

CN113505798A - Time-varying data feature extraction and tracking method

Info

Publication number: CN113505798A
Application number: CN202110692086.9A
Authority: CN
Inventors: 马骥; 陈金金
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2021-06-22
Filing date: 2021-06-22
Publication date: 2021-10-15
Anticipated expiration: 2041-06-22
Also published as: CN113505798B

Abstract

A characteristic extraction and tracking method of time-varying data, let users choose their interesting characteristic on two slices of a certain time step of time-varying data at first, construct a series of algorithms on the basis of this and obtain a series of optimization GMM criterions that can be used for extracting the characteristic; secondly, extracting all features similar to the features selected by the user from each time step of the time-varying data by utilizing an optimized GMM (Gaussian mixture model) criterion; thirdly, constructing a global tracking graph for all the extracted features in all the time steps to record all the tracking information among the features; and finally, visualizing the tracking characteristics and the environment in which the tracking characteristics are located in an animation mode by utilizing a volume rendering algorithm. The invention can track the feature in the whole time-varying data only by providing a small amount of feature information (only feature information on two slices) by the user; the extracted features can be tracked from a global angle, so that tracking errors generated by using a local tracking method can be avoided, and the accuracy of feature tracking is improved.

Description

Time-varying data feature extraction and tracking method

Technical Field

The patent relates to the field of visualization and visual analysis, and in particular relates to a method for extracting and tracking characteristics of time-varying data by using an optimized Gaussian Mixture Model (GMM) criterion and a global tracking map.

Background

Scientific simulations often produce a wide variety of time-varying data because the natural or technical phenomena studied by these scientific simulations are time-dependent. Examples of such simulations are many, such as weather forecasting, computational fluid dynamics, combustion science, computational cosmology, climate pattern research, and the like. These generated time-varying data tend to be complex, large-scale, contain many variables and features, span large spaces and times. This data is originally useless to scientists, but can help scientists understand and gain insight into these complex time-varying phenomena as long as we can discover and reveal the trends and features hidden behind them. This is the object of time-varying data visualization. However, efficient feature extraction, feature tracking, and feature visualization of these time-varying data is not a simple task. Over the past two decades, many scholars have continually proposed various approaches to try to solve this problem.

In a recent review of the study, Bai et al systematically reviewed a number of Visualization techniques on time-varying data (references 1Z.H.Bai, Y.B.Tao, H.Lin.time-varying volume Visualization: a survey.journal of Visualization,23:745-761,2020. Z.H.Bai, Y.B.Tao, H.Lin. time-varying volume Visualization: review. Visualization journal, 23:745-761,2020.), and summarized and analyzed the respective techniques. From this review, it is clear that many of the proposed feature extraction and tracking methods require the user to provide a large amount of feature data (e.g., a volume of data) to their model in order to search, extract and track the feature over the entire simulated time span. In addition, when tracking features, these methods typically track the feature of interest locally based on two consecutive time steps. However, such local tracking methods sometimes result in erroneous tracking results (e.g., erroneously tracking one feature to another), and are susceptible to noise.

Disclosure of Invention

In order to solve the two problems, the invention provides a method for extracting and tracking features of time-varying data, which only needs a user to select two slices (instead of one slice) from the time-varying data and manually mark features of interest on the two slices, and then the features can be automatically extracted in all time steps. Furthermore, we propose a global tracking method that can track the extracted features from a global perspective, thereby avoiding tracking errors that can be generated with local tracking methods.

The technical scheme of the invention is as follows:

a feature extraction and tracking method of time-varying data, the method comprising the following four steps:

1) the optimized GMM criteria are generated as follows:

1.1, for original time-varying data, applying an automatic contrast enhancement method based on a histogram to enhance the contrast of the original time-varying data, and normalizing the original time-varying data into a range of [0,1] by utilizing a global maximum value and a global minimum value;

1.2, the user needs to observe the time-varying data with enhanced contrast, select a time step containing the feature of interest, and choose two slices from the time step and freely mark the feature of interest on the slices by using a mouse;

1.3, for each voxel marked as a feature by a user, finding a neighborhood which takes the voxel as a center and takes 11 multiplied by 11 as a window size, and calculating GMM of data in the neighborhood by utilizing an offline expection knowledge (EM) algorithm, wherein the GMM can simply represent the data distribution condition in the voxel neighborhood; the Gaussian mixture models generated by all the voxels marked as features form a set, which is called as a candidate GMM criterion;

1.4, applying a genetic algorithm to the candidate GMM criteria to filter out GMM criteria that may produce false positives, thereby retaining a set of GMM criteria that may produce true positives, which are referred to as optimized GMM criteria;

further, the process of 1.4 is as follows:

1.4.1, encoding the candidate GMM criterion into a binary character string s, wherein each bit of s corresponds to a specific candidate GMM criterion, if a certain bit of s is 1, the candidate GMM criterion corresponding to the bit is selected as an optimized GMM criterion, and if the bit of s is 0, the candidate GMM criterion is not selected as the optimized GMM criterion;

1.4.2, based on the encoding, a set of binary strings s of the parent population can be generated, where each bit of s is randomly assigned to 0 or 1; for each binary character string s in the father population, the fitness (fitness) is provided, and the higher the fitness is, the better the GMM criterion combination corresponding to the character s can predict the target characteristic; on the contrary, if the fitness is lower, the GMM criterion combination corresponding to the representative s cannot well predict the target feature; let v represent the foreground voxels on two selected slices, n_s(v)Representing the number of GMM criteria in a binary string s that a voxel v can match, t representing a user selected feature, then the following set is defined:

wherein, TP_sRepresenting a true positive (true positive) set in which v not only belongs to the labeled features, but also matches the GMM criterion in s; TN (twisted nematic)_sRepresenting a true negative (true negative) set in which v does not belong to a signature feature nor matches any GMM criterion in s; FP_sRepresents a false positive (false positive) set in which v does not belong to a signature but it matches the GMM criterion in s; FN (FN)_sRepresenting a false negative (false negative) set in which v belongs to a labeled feature but it does not match any GMM criterion in s, P represents a set of voxels belonging to a labeled feature, N represents a set of voxels that are not a feature; with the above sets, the fitness of each string s is calculated using equation (2):

1.4.3, using the perception Selection algorithm to randomly select binary strings in the parent population that possess high fitness and apply crossover and variation to them to obtain a set of binary strings s for the children, where equations (1) and (2) are again used to calculate the fitness of each binary string s for that child;

1.4.4, changing the filial generation into the parent generation, and using them to continuously generate the next generation;

1.4.5, repeating 1.4.3 and 1.4.4 until the maximum fitness of each generation converges, and finally, obtaining the optimized GMM criterion by decoding the binary string s of the last generation with the maximum fitness score.

2) Global feature extraction, the process is as follows:

2.1, calculating the Babbitt distance d (v) of the GMM of each foreground voxel neighborhood and the optimized GMM criterion by using the formulas (3) and (4):

wherein, w and w' respectively represent two Gaussian component weights; μ, μ' represents the average of two gaussian components; Σ, Σ' represents the variance of two gaussian components;

2.2, converting the babbit distance into probability by using the formula (5):

wherein exp () represents an exponential function, p (v) represents the probability that the voxel v belongs to the feature, and the larger the value of p (v), the larger the probability that the voxel v belongs to the feature; conversely, if the smaller the value of p (v), the lower the probability that a voxel v belongs to a feature, D is calculated by equation (6):

here, MD represents a matching degree parameter, which is specified by a user and is used to control the severity of a foreground voxel v belonging to a feature, and the larger MD value, the larger the foreground voxel having d (v) may also belong to the feature; conversely, if the MD value is smaller, foreground voxels with larger d (v) are unlikely to belong to the feature;

2.3, filtering out the foreground voxels with smaller probability values p (v) by adopting a threshold method; so far, for each time step of the time-varying data, features similar to the user marks are extracted from the time-varying data;

3) global feature tracking, the process is as follows:

3.1, applying 3D connected component analysis to probability data p (v) corresponding to each time step, thereby filtering out the characteristics with smaller connected components, namely, if a characteristic connected component is less than a threshold value, setting the probability to be 0; meanwhile, in the process of applying the 3D connected domain, all the characteristics of each time step are correspondingly labeled;

3.2, for any two features at every two consecutive time steps, e.g. a certain feature f at time step t_tAnd a certain characteristic f of time step t +1_t+1We calculate the Euclidean distance d between their centroids_c：

Wherein the content of the first and second substances,

representing a feature f_tThe centroid vector of (a) is,

representing a feature f_t+1The centroid vector of (a);

3.3, calculating the similarity d between the Chi-Squared histogram distance shown in the formula (8)_h：

Wherein

And

respectively represent histograms hf_tAnd hf_t+1The ith column of (1); further, d is normalized using formula (9)_h：

Wherein sf_tAnd sf_t+1Representing a feature f_tAnd f_t+1A set of voxels of (a);

3.4 in characteristic f_tAnd f_t+1A directed edge e (f) is established between_t,f_t+1) Let the weight we (f) of the edge_t,f_t+1)＝d_hWeight we (f) of the edge_t,f_t+1) Is represented by the feature f_tTracing to feature f_t+1The higher the weight, the feature f_tTracing to feature f_t+1The lower the probability of (a); conversely, if the weight is lower, the feature f_tTracing to feature f_t+1The higher the probability of (c); to this end, a directed acyclic graph is created in which each node represents an independent feature at a certain time step, and the weight d of the directed edge between the features_hRepresenting inter-feature tracking possibilities; since the graph records the possibility of tracing among all features in all time steps, the graph is called a global tracing graph GTG; to make the GTG more sparse, we set up a condition: if d is_cIf the value is less than a threshold value, establishing the edge, otherwise, not establishing the edge, wherein the condition accords with an assumption that the characteristic can slowly move between two continuous time steps;

3.5, applying Djikstra algorithm on GTG to track the features selected by the user; to do this, the user needs to point out two nodes on the GTG: one is a feature start node and the other is a feature end node, and based on the two nodes, the Djikstra algorithm can automatically track the feature in a global angle;

4) visualization, the process is as follows:

the tracked features and their environment are visualized in animation using volume rendering.

Further, in the step 4), in order to avoid introducing a new color during the volume rendering process, nearest neighbor interpolation is used.

The beneficial effects of the invention are as follows: the user need only provide little feature-related information (features on only two slices) to track out the features of interest in the time-varying data. In addition, the invention provides a global tracking method, which can track the extracted features from a global angle, thereby avoiding tracking errors generated by using a local tracking method and improving the accuracy of feature tracking.

Drawings

FIG. 1 is an overall flow chart of the present invention.

Fig. 2 is an optimized GMM criterion generated from user-tagged features in a 3D Flow dataset.

Fig. 3 is the result of tracking and visualizing the user-marked features in the 3D Flow dataset using the method of the present invention (where the black objects indicated by the black box arrows are the extracted and tracked features).

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to fig. 1, a method for extracting and tracking features of time-varying data includes four steps: optimized GMM criterion generation, global feature extraction, global feature tracking and visualization; in the following, we will describe each of these four steps in detail.

1) The optimized GMM criteria are generated as follows:

1.2, the user needs to observe the contrast enhanced time varying data, select a time step from them that contains the feature of their interest, and choose two slices from this time step and freely mark the feature of their interest on these slices with the mouse.

1.3, for each voxel marked as a feature by a user, finding a neighborhood with the voxel as a center and with a window size of 11 × 11, and calculating the GMM of the data in the neighborhood by using an offline expection knowledge (EM) algorithm, wherein the GMM can simply represent the data distribution in the voxel neighborhood. The Gaussian mixture models generated by all the voxels marked as features form a set, which is called as a candidate GMM criterion;

1.4, applying a genetic algorithm to the candidate GMM criteria to filter out GMM criteria that may produce false positives, thereby retaining a set of GMM criteria that may produce true positives, which are referred to as optimized GMM criteria; FIG. 2 shows optimized GMM criteria generated from characteristics of user tags.

Further, the process of 1.4 is as follows:

1.4.2, based on the encoding, a set of binary strings s of the parent population can be generated, where each bit of s is randomly assigned to 0 or 1; for each binary character string s in the father population, the fitness (fitness) is provided, and the higher the fitness is, the better the GMM criterion combination corresponding to the character s can predict the target characteristic; on the contrary, if the fitness is lower, the GMM criterion combination corresponding to the representative s cannot well predict the target feature; let v represent the foreground voxels on two selected slices, n_s(v)Representing a binary string to which a voxel v can be matchedThe number of GMM criteria in s, t representing the user selected feature, then define the set:

TP_s＝{v:n_s(v)＝1&label(v)＝t}

wherein, TP_sRepresenting a true positive set, wherein v not only belongs to the labeled features, but also matches the GMM criterion in s; TN (twisted nematic)_sRepresenting true negative set, in which v does not belong to the labeled feature and does not match any GMM criterion in s; FP_sRepresents a false positive set in which v does not belong to the annotated feature, but it matches the GMM criterion in s; FN (FN)_sRepresenting a false negative set in which v belongs to the annotation feature but it does not match any GMM criterion in s, P representing a set of voxels belonging to the annotation feature and N representing a set of voxels that are not a feature; with the above sets, the fitness of each string s is calculated using equation (2):

2) Global feature extraction, the process is as follows:

2.2, converting the babbit distance into probability by using the formula (5):

3) global feature tracking, the process is as follows:

Wherein the content of the first and second substances,

representing a feature f_tThe centroid vector of (a) is,

representing a feature f_t+1The centroid vector of (a);

Wherein

And

3.5, applying Djikstra algorithm on GTG to track the features selected by the user; to do this, the user needs to point out two nodes on the GTG: one is a feature start node and the other is a feature end node; based on these two nodes, the Djikstra algorithm can automatically track the feature from a global perspective;

4) visualization, the process is as follows:

visualizing the tracked features and the environment in an animation mode by using volume rendering; fig. 3 shows the tracking result of tracking a feature in a 3D Flow data set by using the method of the present invention (the black object indicated by the black box arrow is the tracking feature), from which the whole evolution process of the feature from appearance to disappearance can be clearly seen.

The embodiments described in this specification are merely illustrative of implementations of the inventive concept and the scope of the present invention should not be considered limited to the specific forms set forth in the embodiments but rather by the equivalents thereof as may occur to those skilled in the art upon consideration of the present inventive concept.

Claims

1. A method for extracting and tracking characteristics of time-varying data is characterized in that: the method comprises the following steps:

1) the optimized GMM criteria are generated as follows:

2) global feature extraction, the process is as follows:

2.2, converting the babbit distance into probability by using the formula (5):

3) global feature tracking, the process is as follows:

Wherein the content of the first and second substances,

representing a feature f_tThe centroid vector of (a) is,

representing a feature f_t+1The centroid vector of (a);

Wherein

And

4) visualization, the process is as follows:

2. The method of claim 1, wherein the method comprises: in the step 4), in order to avoid introducing new colors during the volume rendering process, nearest neighbor interpolation is used.

3. A method for feature extraction and tracking of time-varying data as claimed in claim 1 or 2, wherein: the process of 1.4 is as follows: