CN116595399B

CN116595399B - Analysis method for inconsistent element correlation problem in coal

Info

Publication number: CN116595399B
Application number: CN202310704173.0A
Authority: CN
Inventors: 许娜; 汪茹; 朱伟; 李强; 王志玮
Original assignee: China University of Mining and Technology Beijing CUMTB
Current assignee: China University of Mining and Technology Beijing CUMTB
Priority date: 2023-06-14
Filing date: 2023-06-14
Publication date: 2024-01-05
Anticipated expiration: 2043-06-14
Also published as: CN116595399A

Abstract

The invention relates to the technical field of analyzing elemental data in coal and discloses an analysis method for the problem of inconsistent element correlations in coal, comprising the following steps: step one, processing the data on elements in coal with a method based on symmetric pivot coordinates; step two, adding weight coefficients to the symmetric pivot coordinates to form weighted pivot coordinates; step three, improving the weighted pivot coordinate method by constructing an orthogonal coordinate system, calculating the weighting coefficients of the coordinates, defining a variation matrix, obtaining the relationship between the weight values, and calculating the weighted symmetric pivot coordinates; and step four, performing hierarchical clustering analysis on the processed elemental data of the coal. The method can therefore resolve the inconsistency between correlations obtained on the two reporting bases (ash basis and whole-coal basis), and the geochemistry of the coal can be re-analyzed and interpreted intuitively from the clustering results.

Description

Analysis method for inconsistent element correlation problem in coal

Technical Field

The invention relates to the technical field of analyzing elemental data in coal, and in particular to an analysis method for the problem of inconsistent element correlations in coal.

Background

The modes of occurrence of elements in coal are of great significance both for coal mining and for research on coal-hosted mineral resources. Elements in coal are mainly divided into two categories: major elements and trace (minor) elements. The significance of research on trace elements in coal lies mainly in three respects; for example, trace elements often include valuable strategic metals. Metal elements in coal strongly affect both the industrial use of coal and the environment. Studying their concentration distributions and modes of occurrence has important theoretical and practical value for correctly evaluating their positive and negative effects on the industrial utilization of coal, for preventing harmful trace metals in coal from damaging the environment, and for protecting the ecological environment. For example, Finkelman considers that determining the mode of occurrence of an element in coal helps to evaluate its environmental impact. The mode of occurrence of an element refers to its physical and chemical state at a given stage of geochemical migration, together with its association with co-occurring elements. From a geochemical point of view, the mode of occurrence mainly refers to the binding state of the element, that is, the form in which it exists.

Methods for investigating the modes of occurrence of elements in coal can be roughly classified into physical experimental methods, chemical experimental methods and mathematical analysis methods. The mathematical methods mainly include correlation analysis (including correlation with ash yield, with the various forms of sulfur, and with major elements), cluster analysis, factor analysis, discriminant analysis, and so on; correlation analysis and cluster analysis are the most common. Several standard methods require the coal to be ashed before geochemical analysis, yet researchers are usually interested in the composition of the whole coal rather than of its ash. Geochemical data for any coal sample can be converted between the ash basis and the whole-coal basis. For the same set of samples, however, univariate statistics (mean, variance, distribution, etc.) and bivariate statistics (correlation coefficients, etc.) can differ significantly depending on which basis is used. These differences are not real; they are artifacts of the compositional nature of most geochemical data. Because compositional data are forced to a constant sum, e.g. 100% or 1,000,000 ppm, their sample space has a curved (non-Euclidean) geometry, so the Euclidean principles on which most statistical tests depend are unsuitable and can lead to erroneous results. Applying suitable transformations allows the data to be represented in Euclidean space without producing mathematically inconsistent results.
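As a minimal illustration of this basis dependence, the Python sketch below converts hypothetical ash-basis concentrations to the whole-coal basis using the standard relation (whole-coal concentration = ash-basis concentration × ash yield / 100) and compares the Pearson correlations on the two bases. The data and the helper names `pearson` and `ash_to_whole_coal` are illustrative assumptions, not values or code from this patent.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def ash_to_whole_coal(conc_ash, ash_yield_pct):
    """Convert an ash-basis concentration to the whole-coal basis."""
    return conc_ash * ash_yield_pct / 100.0


# Hypothetical toy data for five samples (not from the patent).
ash_yield = [10.0, 20.0, 30.0, 40.0, 50.0]        # ash yield, %
elem_a_ash = [500.0, 400.0, 300.0, 250.0, 200.0]  # ppm, ash basis
elem_b_ash = [100.0, 110.0, 120.0, 130.0, 140.0]  # ppm, ash basis

elem_a_whole = [ash_to_whole_coal(c, a) for c, a in zip(elem_a_ash, ash_yield)]
elem_b_whole = [ash_to_whole_coal(c, a) for c, a in zip(elem_b_ash, ash_yield)]

r_ash = pearson(elem_a_ash, elem_b_ash)
r_whole = pearson(elem_a_whole, elem_b_whole)
# For these toy values the sign flips: r_ash is about -0.98, r_whole about +0.88.
print(f"ash basis: r = {r_ash:+.3f}; whole-coal basis: r = {r_whole:+.3f}")
```

The samples themselves are identical. Only the reporting basis changed, yet the correlation reverses sign. This is the kind of inconsistency the method addresses.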

As research has deepened, many researchers in China and abroad have investigated the modes of occurrence, material sources, enrichment mechanisms and origins of metal elements that are enriched in coal, and in its by-products during migration. Correlation analysis studies the degree of correlation between two or more groups of elements. It can be used to infer the modes of occurrence of elements in coal from the correlation coefficients between trace-element contents and either the ash yield of the coal or the major-element contents. For example, when the mode of occurrence is inferred from the Pearson correlation coefficient between an element and the ash yield, a positive correlation indicates that the element is in an inorganic binding state, and a negative correlation indicates that it is in an organic binding state.
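The sign rule described above can be written as a short function. This is a minimal sketch. The data are hypothetical, `infer_affinity` is an illustrative name, and the "undetermined" label for r = 0 is an added assumption.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def infer_affinity(element_conc, ash_yield):
    """Apply the sign rule to Pearson r between an element and ash yield.

    r > 0 suggests inorganic binding; r < 0 suggests organic binding.
    """
    r = pearson(element_conc, ash_yield)
    if r > 0:
        return r, "inorganic"
    if r < 0:
        return r, "organic"
    return r, "undetermined"


# Hypothetical whole-coal data for five samples (not from the patent).
ash_yield = [8.0, 15.0, 22.0, 30.0, 41.0]    # %
element_1 = [5.0, 9.0, 12.0, 18.0, 25.0]     # ppm, rises with ash yield
element_2 = [30.0, 26.0, 21.0, 17.0, 12.0]   # ppm, falls with ash yield

for name, conc in [("element_1", element_1), ("element_2", element_2)]:
    r, label = infer_affinity(conc, ash_yield)
    print(f"{name}: r = {r:+.2f} -> {label}")
```

This rule of thumb is exactly what becomes unreliable when r itself is a basis-dependent artifact of closed data.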

The mode of occurrence of trace elements in coal is important from both a scientific and an environmental point of view, because the behavior of a trace element in coal depends not only on its content but also on its chemical form, that is, its mode of occurrence. Statistical methods are among the most commonly used indirect methods for interpreting modes of occurrence. Many researchers have shown that the sum of the element contents in coal is a constant, and Aitchison pointed out that data subject to such a constant-sum constraint are compositional data. In coal, the constant-sum constraint appears at two levels. On the whole-coal basis, the sum of the element contents (excluding organic C, H, N and S) and the loss on ignition (LOI, where LOI = 100% - ash yield) is 100%. On the ash basis, the sum of the contents of the major and trace elements is 100%.
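The constant-sum constraint can be made concrete with the closure operation, which rescales positive parts to a fixed total. The sketch below is illustrative only. The analysis values are hypothetical, `closure` and `loss_on_ignition` are assumed helper names, and LOI follows the definition given in the text. It also shows why closure biases correlations: raising one part and re-closing forces every other share down.

```python
def closure(parts, total=100.0):
    """Rescale positive parts so that they sum to `total` (the closure operation)."""
    s = sum(parts)
    return [total * p / s for p in parts]


def loss_on_ignition(ash_yield_pct):
    """LOI = 100% - ash yield, as defined in the text."""
    return 100.0 - ash_yield_pct


# Hypothetical ash-basis analysis whose raw values do not sum exactly to 100%.
raw_ash_analysis = [55.2, 30.1, 8.4, 3.3]  # e.g. four oxide contents, %
closed = closure(raw_ash_analysis)
print([round(v, 2) for v in closed], round(sum(closed), 10))

# Whole-coal basis: inorganic contents plus LOI make up 100%.
print(loss_on_ignition(23.5))  # 76.5

# Doubling the first part and re-closing lowers every other share,
# so the parts cannot vary independently of one another.
bumped = closure([v * (2.0 if i == 0 else 1.0) for i, v in enumerate(raw_ash_analysis)])
assert all(b < c for b, c in zip(bumped[1:], closed[1:]))
```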

The concept of compositional data was proposed in 1866, but for a long time afterwards there was little progress in methods for processing such data. In 1897, Pearson, in an article on spurious correlations, pointed out that compositional data are comparatively complex to process and that correlation analysis applied directly to them may give erroneous results. When compositional data are processed, the relationships between the parts cannot be ignored. Traditional statistical methods are designed for unconstrained data, and applying them directly to compositional data can yield erroneous results. It was not until 1986 that Aitchison pointed out that compositional data lie in a simplex space. The simplex differs from Euclidean space in that data in Euclidean space can take any real values, whereas compositional data in the simplex are subject to a constant-sum constraint. Because of this constraint, D-part compositional data actually carry only D-1 dimensions of information. For example, the space of three-part compositions can be represented in only two dimensions, as an equilateral triangle. Likewise, two-part compositions carry only one-dimensional information, and their space is a line segment. The distance commonly used in Euclidean space is the Euclidean distance, whereas Aitchison considered the appropriate distance in the simplex to be the Aitchison distance. To analyze compositional data with statistical methods designed for Euclidean space, the data must first be transformed into Euclidean space, that is, the constant-sum constraint between the parts must be removed.
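The difference between the two distances can be shown with a short sketch. It uses the well-known identity that the Aitchison distance equals the Euclidean distance between centered log-ratio (clr) vectors. The compositions are hypothetical and the function names are illustrative. Expressing the same compositions as percentages or as fractions leaves the Aitchison distance unchanged, but scales the ordinary Euclidean distance by 100.

```python
import math


def clr(x):
    """Centered log-ratio: ln(x_i / g(x)), with g the geometric mean of x."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]


def aitchison_distance(x, y):
    """Aitchison distance, computed as the Euclidean distance between clr vectors."""
    return math.dist(clr(x), clr(y))


def euclidean_distance(x, y):
    return math.dist(x, y)


# Hypothetical three-part compositions: the *_pct and *_frac versions carry
# the same relative information, in percent and in fractions.
x_pct, x_frac = [60.0, 30.0, 10.0], [0.6, 0.3, 0.1]
y_pct, y_frac = [50.0, 25.0, 25.0], [0.5, 0.25, 0.25]

print(aitchison_distance(x_pct, y_pct), aitchison_distance(x_frac, y_frac))  # equal up to rounding
print(euclidean_distance(x_pct, y_pct), euclidean_distance(x_frac, y_frac))  # differ by a factor of 100
```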

Regarding transformations of compositional data, in 1986 Aitchison proposed the additive log-ratio transformation (alr), which takes any one part as the common denominator. The transformed data are free of the constant-sum constraint and take values over the real numbers. Building on alr, Aitchison also proposed the centered log-ratio transformation (clr), which uses the geometric mean of all parts as the denominator of the log ratios. However, the clr values sum to zero, so a constant-sum constraint remains. In 2002, Wang Huiwen, Liu Jiang et al. proposed a spherical coordinate transformation which, unlike the two transformations above, tolerates zero values and is therefore more broadly applicable. In 2003, the alr and clr transformations were improved on the basis of the geometry of compositional data, yielding the isometric log-ratio transformation (ilr). Its core idea is to define new coordinates with respect to an orthonormal basis, so that the Aitchison distance between two compositions in the simplex equals the Euclidean distance between their transformed coordinates in Euclidean space.
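The two classical transformations just described can be sketched in a few lines. This is an illustrative implementation on a hypothetical composition, and the function names are assumptions. `alr` returns D-1 unconstrained values whose values depend on which part is chosen as the denominator. `clr` keeps all D values, but they always sum to zero.

```python
import math


def alr(x, denom=-1):
    """Additive log-ratio: ln(x_i / x_denom) for every part except the denominator."""
    d = denom % len(x)
    return [math.log(v / x[d]) for i, v in enumerate(x) if i != d]


def clr(x):
    """Centered log-ratio: ln(x_i / g(x)); the coordinates always sum to zero."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]


x = [0.5, 0.3, 0.15, 0.05]  # hypothetical 4-part composition
print(alr(x))               # 3 values, denominator = last part
print(alr(x, denom=0))      # 3 different values for another denominator
print(sum(clr(x)))          # ~0: clr retains a zero-sum constraint
```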

In studying transformations of compositional data, Filzmoser treated a special case of the ilr transformation as the pivot coordinate method (PC). In 2009, Filzmoser showed that the results of computing means and analyzing variances of compositional data are affected by the constant-sum constraint. In 2010, Filzmoser reached the same conclusion for correlation analysis of compositional data. He also proposed a stability measure, based on the ilr transformation, for quantifying the relationship between compositional parts. In 2013, Geboy applied the stability measure to elements in the Bond Creek coal in the United States. This showed that stability is not purely a measure of correlation and is strongly affected by the difference between the two variables. In 2017, Hron proposed the weighted pivot coordinate method (WPC), which introduces weight coefficients into the pivot coordinate method. However, like the pivot coordinate method, it reduces the data from the original D dimensions to D-1 dimensions. The symmetric pivot coordinate method (SPC), proposed in the same year, expresses the strength of association between each part and the other parts through a particular choice of orthogonal coordinates. In 2021, Hron et al. proposed the weighted symmetric pivot coordinate method (WSPC), which assigns a coordinate system to each part. Each coordinate system represents the information of that part through the log ratios between it and the other parts. WSPC also introduces weight coefficients that reduce the weights of parts with larger variance, thereby suppressing their influence on the other parts.

WSPC has several drawbacks, including the following:

Difficulty with high-dimensional data: the WSPC algorithm is a sample-point-based projection method. It requires selecting a few representative sample points and then projecting all data points onto the orthogonal axes of these sample points. When the data dimension is high, representative sample points become harder to select, and the projected results may contain errors.

Insufficient scalability: the WSPC algorithm has difficulty handling large-scale data because it must process each sample point and compare it with the center point. This computational load grows with the size of the data, so the algorithm does not scale well.

Sensitivity to noise and outliers: because the WSPC algorithm is a sample-point-based projection method, noise or outliers in the sample points can strongly influence its results and make the dimension-reduction result inaccurate. Data quality therefore requires attention when the WSPC algorithm is used.

Disclosure of Invention

The invention aims to provide an analysis method for the problem of inconsistent element correlations in coal, so as to solve the problems described in the Background.

To achieve the above object, the present invention provides an analysis method for the problem of inconsistent element correlations in coal, comprising the following steps:

step one, processing the data on elements in coal with a method based on symmetric pivot coordinates;

step two, adding weight coefficients to the symmetric pivot coordinates to form weighted pivot coordinates;

step three, improving the weighted pivot coordinate method by constructing an orthogonal coordinate system, calculating the weighting coefficients of the coordinates, defining a variation matrix, obtaining the relationship between the weight values, and calculating the weighted symmetric pivot coordinates;

and step four, performing hierarchical clustering analysis on the processed elemental data of the coal.

Preferably, in step one, the data on elements in coal are processed and analyzed with a method based on symmetric pivot coordinates;

elemental contents in coal are compositional data. For a D-part composition x = (x_1, ..., x_D), any two parts form a log ratio, so the D parts together form D(D-1)/2 log ratios:

$$\xi_{ij} = \ln\frac{x_i}{x_j}, \quad 1 \le i < j \le D$$

where ξ_ij is the log ratio, x_i is the i-th part of the D-part composition, and x_j is the j-th part.

Preferably, the symmetric pivot coordinates arise from choosing a particular orthonormal basis for the isometric log-ratio transformation; the pivot coordinates are expressed as:

$$z_i = \sqrt{\frac{D-i}{D-i+1}}\,\ln\frac{x_i}{\sqrt[D-i]{\prod_{k=i+1}^{D} x_k}}, \quad i = 1, \ldots, D-1$$

where z_i is the i-th pivot coordinate, D is the number of parts, x_i is the i-th part of the D-part composition, and x_k is the k-th part (k being the part index);

for x_1, the first pivot coordinate is expressed as:

$$z_1 = \sqrt{\frac{D-1}{D}}\,\ln\frac{x_1}{\sqrt[D-1]{\prod_{i=2}^{D} x_i}}$$

where z_1 is the first pivot coordinate, D is the number of parts, x_1 is the first part of the D-part composition, and i indexes the remaining parts, starting from the second part x_2;

the log ratios of element x_1 with the other elements are, respectively:

$$\ln\frac{x_1}{x_2},\ \ln\frac{x_1}{x_3},\ \ldots,\ \ln\frac{x_1}{x_D}$$

where x_1, x_2, x_3 and x_D are the 1st, 2nd, 3rd and D-th parts of the D-part composition.

Preferably, in step two, weight coefficients are added to the symmetric pivot coordinates to form the weighted pivot coordinate method, with one weight coefficient attached to each log ratio, namely:

where α_2, ..., α_D are the weight coefficients,

subject to α_2 + ... + α_D = 1. The weights w_i of the weighted pivot coordinate method are calculated from the α_k in the formula above; for the coordinate z_i of the weighted pivot coordinate method, the weights corresponding to x_1 are expressed as:

where w_i is the weight vector, w_i = (w_{i1}, ..., w_{iD})', whose entries are weight coefficients;

for x_i, the weight set is as follows:

where the entries are weight coefficients.

Preferably, in step three, the weighted symmetric pivot coordinate method is obtained by further improving the weighted pivot coordinate method. An orthogonal coordinate system is constructed whose first two coordinates correspond to x_1 and x_2 of the composition X = (x_1, ..., x_n). The weighting coefficients of these two coordinates are defined as follows, with γ, α and β calculated from the composition X:

γ, α and β are intermediate variables used to calculate the normalized weights;

where the variation matrix of the compositional data is defined as:

$$T = \left[\operatorname{var}\left(\ln\frac{x_i}{x_j}\right)\right]_{i,j=1}^{n}$$

where T is the variation matrix, whose entry t_ij is the variance of the log ratio of the i-th element x_i to the j-th element x_j;

the normalized weight α* of x_1 and the normalized weight β* of x_2 are expressed as:

γ and c are then calculated from α* and β*:

the resulting relationship between the weight values is:

The weighted symmetric pivot coordinates are calculated from the above expressions.

Preferably, in step four, hierarchical clustering is performed on the processed elemental data of the coal, and the modes of occurrence of the elements in coal are analyzed by combining the clustering result graphs.

The analysis method for the problem of inconsistent element correlations in coal therefore has the following beneficial effects:

(1) Using computer techniques such as machine learning, the invention provides a method that intuitively resolves the consistency problem. It resolves the inconsistency between the two reporting bases, and the geochemistry of the coal can be re-analyzed and interpreted intuitively from the clustering results.

(2) The method provides a more reasonable geochemical interpretation and reduces or eliminates the spurious results that traditional statistical methods produce on low-quality data sets. It can therefore genuinely assist the analysis of modes of occurrence in coal, improve the interpretability of statistical analysis results, and provide valuable reference information for the mining and utilization of coal.

The technical solution of the invention is described in further detail below with reference to the drawings and embodiments.

Drawings

FIG. 1 is a schematic flow chart of the method for analyzing the problem of inconsistent element correlations in coal according to the present invention;

FIG. 2 is a graph of elemental data processing for coal from the large moat mining area according to an embodiment of the present invention;

FIG. 3 is a graph of elemental data processing for coal from the African helminth mining area according to an embodiment of the present invention;

FIG. 4 is a graph of clustering results for the large moat mining area according to an embodiment of the present invention;

FIG. 5 is a graph of clustering results for the African helminth mining area according to an embodiment of the present invention.

Detailed Description

The technical scheme of the invention is further described below through the attached drawings and the embodiments.

Unless defined otherwise, technical or scientific terms used herein have the ordinary meaning understood by one of ordinary skill in the art to which this invention belongs. The terms "first," "second," and the like, as used herein, do not denote any order, quantity, or importance; they are used only to distinguish one element from another. The word "comprising" or "comprises," and the like, means that the elements or items preceding the word include the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "disposed," "mounted," and "connected" are to be construed broadly. A connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two elements. "Upper," "lower," "left," "right," etc. are used merely to indicate relative positional relationships, which may change when the absolute position of the described object changes.

Examples

FIG. 1 is a schematic flow chart of the method for analyzing the problem of inconsistent element correlations in coal according to the present invention; FIG. 2 is a graph of elemental data processing for coal from the large moat mining area according to an embodiment of the present invention; FIG. 3 is a graph of elemental data processing for coal from the African helminth mining area according to an embodiment of the present invention; FIG. 4 is a graph of clustering results for the large moat mining area according to an embodiment of the present invention; FIG. 5 is a graph of clustering results for the African helminth mining area according to an embodiment of the present invention.

As shown in FIG. 1, the method for analyzing the problem of inconsistent element correlations in coal according to the invention comprises the following steps:

Step one: process the data on elements in coal with a method based on symmetric pivot coordinates.

Elemental contents in coal are compositional data, which are mainly handled by log-ratio transformations. Conventional transformations of compositional data were studied, including the alr, clr and ilr transformations, weighted symmetric pivot coordinates and the stability method, and the present invention improves on these conventional methods. The invention provides a new method, based on the symmetric pivot coordinate method, for analyzing geochemical compositional data of coal.

For a D-part composition x = (x_1, ..., x_D), any two parts form a log ratio, giving D(D-1)/2 log ratios in total, which are linear combinations of one another:

$$\xi_{ij} = \ln\frac{x_i}{x_j}, \quad 1 \le i < j \le D$$

Because of the constant-sum constraint of compositional data, these log ratios contain redundant information; in fact, only D-1 log ratios are needed to infer the relationships among all parts. Choosing those D-1 log ratios is difficult, however. The alr transformation arbitrarily sacrifices one part, and the ilr transformation sacrifices the last part. The pivot coordinate method handles this problem in a similar way to the ilr transformation.
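The redundancy argument can be checked numerically. This minimal Python sketch uses a hypothetical composition, and its helper names are illustrative. It enumerates all D(D-1)/2 log ratios and rebuilds each of them from only the D-1 log ratios taken against a single reference part, using ln(x_i/x_j) = ln(x_i/x_ref) - ln(x_j/x_ref).

```python
import math
from itertools import combinations


def pairwise_log_ratios(x):
    """All D(D-1)/2 log ratios xi_ij = ln(x_i / x_j) for i < j (0-based indices)."""
    return {(i, j): math.log(x[i] / x[j]) for i, j in combinations(range(len(x)), 2)}


def from_reference_ratios(x, ref=0):
    """Rebuild every xi_ij from the D-1 log ratios against one reference part."""
    base = [math.log(v / x[ref]) for v in x]  # base[ref] == 0, so D-1 informative values
    return {(i, j): base[i] - base[j] for i, j in combinations(range(len(x)), 2)}


x = [42.0, 18.5, 7.2, 3.1, 0.9]  # hypothetical 5-part composition
full = pairwise_log_ratios(x)
rebuilt = from_reference_ratios(x, ref=2)
print(len(full))                                      # 10 = 5 * 4 / 2
print(max(abs(full[k] - rebuilt[k]) for k in full))   # ~0 (floating-point rounding only)
```

Any choice of `ref` works equally well. The difficulty the text describes is not information loss but the arbitrariness of which part is singled out.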

The symmetric pivot coordinates arise from choosing a particular orthonormal basis for the isometric log-ratio transformation; the pivot coordinates are expressed as:

$$z_i = \sqrt{\frac{D-i}{D-i+1}}\,\ln\frac{x_i}{\sqrt[D-i]{\prod_{k=i+1}^{D} x_k}}, \quad i = 1, \ldots, D-1$$

where z_i is the i-th pivot coordinate, D is the number of parts, x_i is the i-th part of the D-part composition, and x_k is the k-th part (k being the part index).

For example, when the symmetric pivot method is used to compute the transformed data corresponding to x_1, x_1 appears only in the coordinate z_1 and not in the other coordinates. If another part is of interest, for example x_2, then x_2 is placed in the first position and the pivot coordinate transformation z_i is applied to x_2 and the remaining parts. In this way, a set of (D-1)-dimensional pivot coordinates can be constructed for each part of the composition. These sets are rotations of one another, and only the first coordinate of each is used to interpret the corresponding part. The biggest difference between this approach and the ilr transformation is that it does not reduce dimensionality: taken together, the first pivot coordinates of all parts still give D dimensions. The transformation is therefore one-to-many, and transforming back to the space of elemental contents in coal is complicated.
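The reordering scheme can be sketched in Python as follows. The sketch assumes the standard pivot-coordinate formula z_i = sqrt((D-i)/(D-i+1)) ln(x_i / g(x_{i+1}, ..., x_D)), with g the geometric mean. The function names and the composition are hypothetical. `first_pivot_for` moves the part of interest to the front and keeps only the first coordinate, which is the value used to interpret that part.

```python
import math


def pivot_coordinates(x):
    """Pivot coordinates z_1..z_{D-1} of a D-part composition x.

    z_i = sqrt((D-i)/(D-i+1)) * ln(x_i / geometric mean of x_{i+1}..x_D)
    """
    D = len(x)
    z = []
    for i in range(D - 1):          # Python index i corresponds to z_{i+1}
        rest = x[i + 1:]            # the parts after the pivot part
        log_gm = sum(math.log(v) for v in rest) / len(rest)
        k = len(rest)               # equals D - (i + 1)
        z.append(math.sqrt(k / (k + 1)) * (math.log(x[i]) - log_gm))
    return z


def first_pivot_for(x, j):
    """Move part j to the front and return only the first pivot coordinate."""
    reordered = [x[j]] + x[:j] + x[j + 1:]
    return pivot_coordinates(reordered)[0]


x = [42.0, 18.5, 7.2, 3.1, 0.9]  # hypothetical 5-part composition
z = pivot_coordinates(x)
print(len(z))  # D - 1 = 4 coordinates
print([round(first_pivot_for(x, j), 4) for j in range(len(x))])  # one value per part
```

Because z_1 depends only on the pivot part and the geometric mean of the others, the order of the remaining parts does not affect it.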

For x_1, the first pivot coordinate can in turn be expressed as:

$$z_1 = \sqrt{\frac{D-1}{D}}\,\ln\frac{x_1}{\sqrt[D-1]{\prod_{i=2}^{D} x_i}}$$

where z_1 is the first pivot coordinate, D is the number of parts, x_1 is the first part of the D-part composition, and i indexes the remaining parts, starting from the second part x_2.

If only element x_1 is considered, its log ratios with the other elements are, respectively:

$$\ln\frac{x_1}{x_2},\ \ln\frac{x_1}{x_3},\ \ldots,\ \ln\frac{x_1}{x_D}$$
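The similarity between these two expressions can be made explicit. The derivation below assumes the standard pivot-coordinate form for z_1. Expanding the geometric mean shows that z_1 is a rescaled sum of exactly the log ratios listed above:

```latex
\begin{aligned}
z_1 &= \sqrt{\frac{D-1}{D}}\,\ln\frac{x_1}{\left(\prod_{k=2}^{D} x_k\right)^{1/(D-1)}}
     = \sqrt{\frac{D-1}{D}}\cdot\frac{1}{D-1}\sum_{k=2}^{D}\ln\frac{x_1}{x_k} \\
    &= \frac{1}{\sqrt{D(D-1)}}\sum_{k=2}^{D}\ln\frac{x_1}{x_k}.
\end{aligned}
```

Each log ratio ln(x_1/x_k) therefore enters with the same share 1/(D-1) before the common scale factor. As far as the text indicates, replacing these equal shares with weights α_2, ..., α_D is the idea behind the weighted pivot coordinates of step two.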

Step two: add weight coefficients to the symmetric pivot coordinates to form weighted pivot coordinates.

Comparing the two expressions above shows that they are very similar and differ only by a constant factor. Weight coefficients are therefore added to the symmetric pivot coordinates to obtain the weighted pivot coordinate method, with one weight coefficient attached to each log ratio, namely:

where α_2, ..., α_D are the weight coefficients,

subject to α_2 + ... + α_D = 1. The weights w_i of the weighted pivot coordinate method are calculated from the α_k in the formula above; for x_1 in the transformed coordinates of the weighted pivot coordinate method, the corresponding weights are expressed as:

For x_i, the weight set is as follows:

where the entries are weight coefficients.
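The weighting idea can be sketched under explicit assumptions, since the exact formulas are not reproduced in this text. The sketch below is *not* the patent's weighted pivot coordinate. It builds a unit-norm weighted log-contrast c · Σ α_k ln(x_1/x_k) with Σ α_k = 1, where c = 1/||(1, -α_2, ..., -α_D)||, so that equal weights reproduce the ordinary first pivot coordinate. The inverse-variance weighting is a hypothetical stand-in for "smaller weight for larger variance", and all names and data are illustrative.

```python
import math
import statistics


def weighted_first_coordinate(x, alphas):
    """Unit-norm weighted log-contrast of x[0] against the other parts (illustrative)."""
    total = sum(alphas)
    a = [w / total for w in alphas]                     # enforce alpha_2 + ... + alpha_D = 1
    c = 1.0 / math.sqrt(1.0 + sum(w * w for w in a))    # norm of (1, -alpha_2, ..., -alpha_D)
    return c * sum(w * math.log(x[0] / v) for w, v in zip(a, x[1:]))


def inverse_variance_weights(samples):
    """Hypothetical weighting: down-weight log ratios ln(x_1/x_k) with large variance."""
    D = len(samples[0])
    variances = [statistics.variance([math.log(s[0] / s[k]) for s in samples])
                 for k in range(1, D)]
    return [1.0 / v for v in variances]


samples = [  # hypothetical 4-part compositions (rows = samples)
    [40.0, 30.0, 20.0, 10.0],
    [42.0, 25.0, 22.0, 11.0],
    [38.0, 35.0, 19.0, 8.0],
    [45.0, 20.0, 21.0, 14.0],
]
alphas = inverse_variance_weights(samples)
print([round(weighted_first_coordinate(s, alphas), 4) for s in samples])
```

With equal weights (α_k = 1/(D-1)), c equals sqrt((D-1)/D), so the function returns the unweighted first pivot coordinate.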

Step three: improve the weighted pivot coordinate method by constructing an orthogonal coordinate system, calculating the weighting coefficients of the coordinates, defining a variation matrix, obtaining the relationship between the weight values, and calculating the weighted symmetric pivot coordinates.

The core of the weighted pivot coordinate method is to introduce weight coefficients that express the differing importance of the data components, while still constructing an orthonormal coordinate system.

The weighted symmetric pivot coordinate method is obtained by further improving the weighted pivot coordinate method. An orthogonal coordinate system is constructed whose first two coordinates correspond to x_1 and x_2 of the composition X = (x_1, ..., x_n) and carry their weighting coefficients, where γ, α and β can be calculated from the composition X:

γ, α and β are intermediate variables used to calculate the normalized weights.

First, the first two coordinates of the weighted symmetric pivot coordinates are calculated for the first two elements x_1 and x_2 of the elemental composition X = (x_1, ..., x_n), which contains n elements in total, and the correspondence between the coordinates is determined. The two elements whose association is to be calculated (e.g. x_i and x_j) are assigned to positions x_1 and x_2. The variation matrix of the compositional data set X is then constructed; it forms the basis of the weights of the weighted symmetric pivot coordinates. The variation matrix is defined as follows:

$$T = \left[\operatorname{var}\left(\ln\frac{x_i}{x_j}\right)\right]_{i,j=1}^{n}$$

where T is the variation matrix, whose entry t_ij is the variance of the log ratio of the i-th element x_i to the j-th element x_j.
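The two preparatory operations in this paragraph can be sketched as follows: computing the variation matrix over a set of samples, and moving the pair of interest to positions 1 and 2. This is a minimal illustration on hypothetical data. The function names are assumptions, and the sketch uses the sample variance (n - 1 denominator) from Python's `statistics` module.

```python
import math
import statistics


def variation_matrix(samples):
    """Variation matrix T with t_ij = var(ln(x_i / x_j)) across samples."""
    D = len(samples[0])
    T = [[0.0] * D for _ in range(D)]
    for i in range(D):
        for j in range(i + 1, D):
            t_ij = statistics.variance([math.log(s[i] / s[j]) for s in samples])
            T[i][j] = T[j][i] = t_ij  # symmetric, zero diagonal
    return T


def move_pair_to_front(x, i, j):
    """Reorder a composition so parts i and j occupy the first two positions."""
    rest = [v for k, v in enumerate(x) if k not in (i, j)]
    return [x[i], x[j]] + rest


samples = [  # hypothetical 4-part compositions (rows = samples)
    [40.0, 30.0, 20.0, 10.0],
    [42.0, 25.0, 22.0, 11.0],
    [38.0, 35.0, 19.0, 8.0],
    [45.0, 20.0, 21.0, 14.0],
]
T = variation_matrix(samples)
for row in T:
    print([round(v, 4) for v in row])

# Study the association between parts 3 and 1 (0-based indices 2 and 0).
reordered = [move_pair_to_front(s, 2, 0) for s in samples]
print(reordered[0])  # [20.0, 40.0, 30.0, 10.0]
```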

The resulting relationship between the weight values is:

Step four: analyze the processed elemental data of the coal and calculate the Pearson correlation coefficients for the two mining areas.

The analysis of elemental data from the large moat mining area is shown in FIG. 2 as an example.

The analysis of elemental data from the African helminth mining area is shown in FIG. 3 as an example.

Hierarchical clustering is performed on the Pearson correlation coefficients obtained for the two mining areas, and the modes of occurrence of the elements in coal are analyzed by combining the clustering result graphs, as shown in FIGS. 4 and 5.
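The linkage method and distance measure are not stated here. The following is therefore only a sketch under two assumptions: average linkage, and the distance d = 1 - r between elements, so that strongly positively correlated elements merge first. The correlation matrix and element labels are hypothetical, and the naive pure-Python loop stands in for a library routine.

```python
def average_linkage(dist, labels):
    """Naive agglomerative clustering with average linkage.

    dist   -- symmetric matrix (list of lists) of pairwise distances
    labels -- one name per row of dist
    Returns the merge history as (members_a, members_b, height) tuples.
    """
    clusters = {i: [i] for i in range(len(labels))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)  # average distance between the two clusters
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append(([labels[i] for i in clusters[a]], [labels[i] for i in clusters[b]], d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


labels = ["E1", "E2", "E3", "E4"]  # hypothetical elements
r = [  # hypothetical Pearson correlation matrix
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, -0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, -0.1, 0.8, 1.0],
]
dist = [[1.0 - v for v in row] for row in r]  # strongly correlated -> close
for members_a, members_b, height in average_linkage(dist, labels):
    print(members_a, "+", members_b, f"at d = {height:.2f}")
```

The merge heights are what a dendrogram such as those in the clustering result graphs would display. In practice a library routine would replace the naive O(n³) loop.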

The analysis method for the problem of inconsistent element correlations in coal can therefore resolve the inconsistency between the two reporting bases, and the geochemistry of the coal can be re-analyzed and interpreted intuitively from the clustering results.

Finally, it should be noted that the above embodiments only illustrate the technical solution of the present invention and do not limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that the technical solution of the invention may be modified or equivalently substituted, and that such modifications or substitutions do not depart from the spirit and scope of the technical solution of the invention.

Claims

1. An analysis method for the problem of inconsistent element correlations in coal, characterized by comprising the following steps:

ξ_ij represents a log ratio, x_i represents the i-th part of the D-part composition, and x_j represents the j-th part of the D-part composition;

when the symmetric pivot coordinate method is used to calculate the transformed data corresponding to x_1, x_1 appears only in the coordinate z_1 and not in the other coordinates; if another part is analyzed, for example x_2, then x_2 is placed in the first position and the pivot coordinate transformation z_i is applied to x_2 and the remaining parts, thereby constructing (D-1)-dimensional pivot coordinates for the compositional data that are rotations of one another, wherein only the first coordinate is used to interpret the respective part;

for x_1, the first pivot coordinate is expressed as:

$$z_1 = \sqrt{\frac{D-1}{D}}\,\ln\frac{x_1}{\sqrt[D-1]{\prod_{i=2}^{D} x_i}}$$

the log ratios of element x_1 with the other elements are, respectively:

$$\ln\frac{x_1}{x_2},\ \ln\frac{x_1}{x_3},\ \ldots,\ \ln\frac{x_1}{x_D}$$

x_1, x_2, x_3 and x_D represent the 1st, 2nd, 3rd and D-th parts of the D-part composition;

in step two, weight coefficients are added to the symmetric pivot coordinates to form weighted pivot coordinates, one weight coefficient being attached to each log ratio:

where α_2, ..., α_D are the weight coefficients,

subject to α_2 + ... + α_D = 1; the weights w_i of the weighted pivot coordinate method are calculated from the α_k in the formula, and for the coordinate z_i of the weighted pivot coordinate method the weights corresponding to x_1 are expressed as:

for x_i, the weight set is as follows:

where the entries are weight coefficients;

in step three,

the weighted symmetric pivot coordinate method is obtained by improving the weighted pivot coordinate method: an orthogonal coordinate system is constructed whose first two coordinates correspond to x_1 and x_2 of the composition X = (x_1, ..., x_n), and the weighting coefficients of these two coordinates are defined as follows, with γ, α and β calculated from the composition X:

wherein the variation matrix of the compositional data is defined as:

$$T = \left[\operatorname{var}\left(\ln\frac{x_i}{x_j}\right)\right]_{i,j=1}^{n}$$

the resulting relationship between the weight values is:

the weighted symmetric pivot coordinates are calculated from the above expressions;

step four, performing hierarchical clustering analysis on the processed elemental data of the coal, and analyzing the modes of occurrence of the elements in coal by combining the clustering result graphs.