CN113222328B - Air quality monitoring equipment point arrangement and site selection method based on road section pollution similarity

Info

Publication number: CN113222328B
Application number: CN202110320263.0A
Authority: CN
Inventors: 康宇; 陈杰; 曹洋; 许镇义; 夏秀山; 李兵兵
Original assignee: Anhui Ecological Environment Monitoring Center Anhui Heavy Pollution Weather Forecast And Early Warning Center; Institute of Advanced Technology University of Science and Technology of China
Current assignee: Anhui Ecological Environment Monitoring Center Anhui Heavy Pollution Weather Forecast And Early Warning Center; Institute of Advanced Technology University of Science and Technology of China
Priority date: 2021-03-25
Filing date: 2021-03-25
Publication date: 2022-02-25
Anticipated expiration: 2041-03-25
Also published as: CN113222328A

Abstract

The invention discloses an air quality monitoring equipment point arrangement and site selection method based on road segment pollution similarity. The method comprises: constructing a graph convolution network model and obtaining K tail gas pollution categories of the roads of an urban road network by using the driving tracks of motor vehicles in the network combined with the topological connectivity of the urban road structure; and, according to the tail gas pollution category of each road, calculating the distribution information entropy of the roads belonging to each category and marking the construction priority, so that suitable point locations are selected. The invention introduces a graph convolution network, combines road topological connectivity, and integrates external factors such as meteorological conditions and POI distribution to extract spatial features. It makes full use of the topological structure information and road connectivity information of the traffic network, combined with easily obtained external traffic information, to obtain the road section arrangement priority and to recommend arrangement point locations to the supervision department, which gives it stronger applicability.

Description

Air quality monitoring equipment point arrangement and site selection method based on road section pollution similarity

Technical Field

The invention relates to the technical field of environmental monitoring, in particular to an air quality monitoring equipment point arrangement and site selection method based on road section pollution similarity.

Background

Accurately predicting the air quality distribution of urban areas is of great significance for government environmental governance and for people's daily health protection. Siting the monitoring equipment is a prerequisite for predicting urban atmospheric pollutants: whether the air quality of a whole city can be predicted accurately with a small number of monitoring devices depends on whether those devices are reasonably placed.

Over the years, research on site selection has mainly used mathematical models and physical knowledge to solve the siting problem through computational simulation. These methods rely on a range of optimization techniques, including statistical methods such as correlation analysis and cluster analysis, and multi-objective optimization. To reach a steady state, the simulation process not only requires complex system programming but also consumes a large amount of computing power, and unrealistic assumptions and simplifications in the modeling further reduce model efficiency. In addition, these methods typically ignore the road network topology of urban areas and external factors that influence the air quality distribution.

Disclosure of Invention

The invention provides a method for arranging and selecting sites of air quality monitoring equipment based on road section pollution similarity, which can solve the technical problems described above.

In order to achieve the purpose, the invention adopts the following technical scheme:

a method for arranging and selecting sites of air quality monitoring equipment based on road section pollution similarity comprises the following steps:

constructing a graph convolution network model, and obtaining K tail gas pollution categories of urban road network roads by using the driving tracks of the motor vehicles in the urban road network and combining the topological connectivity of the urban road structure;

according to the tail gas pollution categories of the roads, the distribution information entropy of the roads belonging to each category is calculated, and the establishment priority is marked, so that a proper point location is selected.

Further, the method for obtaining K tail gas pollution categories of urban road network roads by using the running track of the motor vehicle in the urban road network and combining the topological connectivity of the urban road structure specifically comprises the following steps:

S1, acquiring the running track data of the motor vehicles in the urban road network, and carrying out road matching according to the longitude and latitude information;

S2, dividing each track into a series of sub-tracks according to the intersection distribution of the road sections, namely the starting points of the road sections, and combining the sub-tracks related to the same road section to form a group of initial clusters, expressed as $IC = \{ic_1, ic_2, \ldots, ic_n\}$, where $n$ is the total number of road sections and $ic_i = \{ic_{i1}, ic_{i2}, \ldots, ic_{iJ}\}$, i.e. the track cluster of each road section is composed of the sub-tracks of all vehicle types;

S3, calculating the road pollution emission of the fleet, $X = \{ice_1, ice_2, \ldots, ice_N\}$, by using localized emission factors according to the emission factor model, where $N$ is the total number of road sections, $ice_i = \{ice_{i1}, ice_{i2}, \ldots, ice_{id}\}$, and $d$ is the total number of emission time steps calculated for a road section, wherein the total fleet emission of road section i at time t is:

$$ice_{it} = \sum_{m=1}^{J} Ef_m \cdot density(ic_{im}) \cdot len(ic_i)$$

wherein $Ef_m$ represents the emission factor of the m-th vehicle type, $density(ic_{im})$ represents the number of sub-tracks of the m-th vehicle type on road section i at time t, $len(ic_i)$ represents the road length of section i, and $J$ represents the number of vehicle types;

the S3 further includes a road section conversion that abstracts the road sections into nodes, where a connection between two nodes represents the connectivity of the two corresponding road sections in the road network. Given a road section connectivity graph $G = (V_0, \varepsilon, W)$, where $V_0$ is a finite set of vertices made up of $N$ road sections, $\varepsilon$ is the set of edges representing connectivity between the road sections, and $W \in \mathbb{R}^{N \times N}$ is the adjacency weight matrix of graph G, each vertex $v_i \in V_0$ contains geographical location information. The adjacency matrix of the mobile source pollution graph is calculated from the road section connectivity, and the final spatially weighted adjacency matrix can be recorded as:

wherein $dist_{ij}$ represents the normalized distance between the geographical locations of road sections $v_i$ and $v_j$; $link(i, j)$ represents the connectivity of road sections $v_i$ and $v_j$: if $link(i, j) = 1$, then $v_i$ and $v_j$ are connected, otherwise they are not; $\theta^2 = 0.05$ is used to control the scale and sparsity of the adjacency matrix;

the S3 further comprises inputting the exhaust emission $X \in \mathbb{R}^{N \times d}$ of the urban road network into an end-to-end structural clustering framework to obtain the clustering result of the road sections, dividing the road sections of the whole road network into several clustering sub-clusters $(X_1, X_2, \ldots, X_K)$, wherein $K$ represents the number of clusters;

each row $x_i \in X$ represents the i-th sample, i.e. the i-th road section, $N$ is the total number of road sections, and $d$ is the dimension, i.e. the total number of emission time steps.

Furthermore, the graph convolution network model comprises a BERT module for extracting pollution emission rules, a GCN module for extracting a road network structure, a DNN module for extracting external factors, and a double self-supervision module for unifying emission rule feature extraction and road network structure extraction.

Further, the BERT module extracts the emission rules of the exhaust emission $X \in \mathbb{R}^{N \times d}$. A Transformer encoder iteratively calculates, at each layer $l$, a hidden representation $h_{it}^{(l)} \in \mathbb{R}^{d_0}$ for every position $ice_{it}$ of the emission sequence $ice_i = \{ice_{i1}, ice_{i2}, \ldots, ice_{id}\}$ of each road section. Stacking the $h_{it}^{(l)}$ forms the matrix $H_1^{(l)} \in \mathbb{R}^{d \times d_0}$, wherein $d_0$ represents the feature dimension and $d$ represents the length of the time series. The Transformer encoder is composed of two sublayers: a multi-head self-attention sublayer and a position-wise fully connected feed-forward network;

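the multi-head attention sublayer projects Q, K and V through h different linear transformations and finally concatenates the attention results of the different heads:

$$MH(H_1^{(l)}) = [head_1; head_2; \ldots; head_h]\,W^O$$

$$head_i = \mathrm{Attention}(H_1^{(l)} W_i^Q,\ H_1^{(l)} W_i^K,\ H_1^{(l)} W_i^V)$$

wherein the projection matrices of each head, $W_i^Q, W_i^K, W_i^V \in \mathbb{R}^{d_0 \times d_0/h}$, and $W^O \in \mathbb{R}^{d_0 \times d_0}$ are all learnable parameters, the parameters are not shared between layers, and the Attention function uses the scaled dot-product attention mechanism:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_0/h}}\right)V$$

wherein the query Q, key K and value V are projected from the same matrix $H_1^{(l)}$, and the temperature $\sqrt{d_0/h}$ is introduced to produce a softer attention distribution, avoiding extremely small gradients;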

the self-attention sublayer is mainly based on linear projections; to give the model nonlinearity, a position-wise fully connected feed-forward network (PFFN) is applied to the output of the self-attention sublayer. The network is identical at each position and consists of two affine transformations with a Gaussian Error Linear Unit (GELU) activation in between:

$$FFN(x) = GELU(xW^{(1)} + b^{(1)})W^{(2)} + b^{(2)}$$

$$GELU(x) = x\,\Phi(x)$$

where $\Phi(x)$ is the cumulative distribution function of the standard Gaussian distribution, and $W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)}$ are learnable parameters shared across all positions;

the Transformer sublayers are stacked to form a Transformer encoder layer; dropout is applied to the output of each sublayer, a residual connection is used around each sublayer, and layer normalization follows:

$$Trm(H_1^{(l-1)}) = LN\left(A^{(l-1)} + Dropout(PFFN(A^{(l-1)}))\right)$$

$$A^{(l-1)} = LN\left(H_1^{(l-1)} + Dropout(MH(H_1^{(l-1)}))\right)$$

wherein LN is the layer normalization function;

position embeddings are injected into the input: the input representation $h_{it}^{(0)}$ is constructed as the sum of the corresponding sequence value embedding and the position embedding:

$$h_{it}^{(0)} = ice_{it} + p_t$$

wherein $ice_{it} \in ice_i$ is the $d_0$-dimensional embedding of the emission at time t, and $p_t$ is the $d_0$-dimensional position embedding; fixed sinusoidal embeddings are used instead of learnable position embeddings;

in the decoder, an encoder-decoder attention sublayer is added between the multi-head self-attention sublayer and the position-wise fully connected feed-forward network. In the encoder-decoder attention, Q comes from the previous output of the decoder, while K and V come from the output of the encoder; the calculation is the same as in the encoder. The decoder outputs a reconstructed vector X' with the following objective function:

Furthermore, the DNN module is used for extracting external factors. The external factors, comprising meteorological data and urban point-of-interest data $Y \in \mathbb{R}^{N \times d'}$, provide additional information for feature extraction of road mobile source emissions, wherein $d'$ represents the dimension of the external factor data. An autoencoder is adopted to learn the features of the external factors. Assuming that the autoencoder has L layers in total and $l$ denotes a specific layer, the features $H_2^{(l)}$ learned by the l-th layer of the encoding part are expressed as:

$$H_2^{(l)} = \phi\left(W_e^{(l)} H_2^{(l-1)} + b_e^{(l)}\right)$$

where $\phi$ is the activation function of the fully connected layer, such as the ReLU or Sigmoid function, $W_e^{(l)}$ and $b_e^{(l)}$ are respectively the weight matrix and bias of the l-th layer of the encoder, and $H_2^{(0)}$ denotes the original data Y;

the encoding part is followed by a decoder that reconstructs the input through several fully connected layers, expressed as:

$$H_2^{(l)} = \phi\left(W_d^{(l)} H_2^{(l-1)} + b_d^{(l)}\right)$$

wherein $W_d^{(l)}$ and $b_d^{(l)}$ respectively represent the weight matrix and bias of the l-th layer of the decoder;

the decoder outputs reconstructed extrinsic factor data Y' with the following objective function:

Further, the graph convolution network model further comprises a feature fusion module, in which the external factor features and the emission rule features of each layer are added to form the fusion features:

$$H^{(l)} = H_1^{(l)} + H_2^{(l)}$$

Furthermore, the GCN module is used for extracting the road network structure. According to the road network connectivity graph $G = (V_0, \varepsilon, W)$, the fusion features are integrated into the GCN, so that the representation learned by the GCN accommodates three kinds of information: the emission data itself, the external factors, and the relationships between the data. The representation learned by the l-th GCN layer is obtained by the following convolution operation:

wherein $\tilde{W} = W + I$ and $\tilde{D}_{ii} = \sum_j \tilde{W}_{ij}$; $I$ is the identity matrix that adds a self-loop to each node of the adjacency matrix W. $Z^{(l-1)}$ is propagated through the normalized adjacency matrix $\tilde{D}^{-1/2}\tilde{W}\tilde{D}^{-1/2}$ to obtain the new representation $Z^{(l)}$. Considering that $H^{(l-1)}$ combines the time-series features of the emissions and the external factor features, $Z^{(l-1)}$ and $H^{(l-1)}$ are combined to give a stronger representation:

$$\tilde{Z}^{(l-1)} = (1 - \epsilon)\,Z^{(l-1)} + \epsilon\,H^{(l-1)}$$

wherein $\epsilon$ is the balance coefficient; in this way, the BERT module, the DNN module and the GCN module are connected layer by layer;

the combined representation is used as the input of the l-th GCN layer to generate the representation $Z^{(l)}$:

the fusion feature $H^{(l-1)}$ is propagated through the normalized adjacency matrix $\tilde{D}^{-1/2}\tilde{W}\tilde{D}^{-1/2}$. Because the information represented by the fusion features differs from layer to layer, the fusion features of each layer are passed to the corresponding GCN layer for information propagation, and the propagation operator works L times in the whole model;

the input of the first GCN layer is the raw emission data X:

the last layer of the GCN module is a multi-class classification layer with a softmax function:

the result $z_{ij} \in Z$ indicates the probability that sample i belongs to cluster center j, and Z is a probability distribution.

Further, the probability distribution Z is supervised using the target distribution P:

the overall loss function is:

wherein $\alpha$ is a coefficient that controls the interference of the GCN module with the embedding space.

Further, the calculating, according to the tail gas pollution categories of the roads, the distribution information entropy of the road belonging to each category, and marking the establishment priority, thereby selecting a suitable point location specifically includes:

setting cluster labels for the road sections: the probability distribution Z is selected as the final clustering result, and for each road section i the cluster with the highest probability in Z is assigned as its cluster label:

$$r_i = \arg\max_j z_{ij}$$

i.e. the road sections of the whole road network can be divided into several clustering sub-clusters $(X_1, X_2, \ldots, X_K)$, wherein $K$ denotes the number of clusters, and the sub-clusters $(X_1, X_2, \ldots, X_K)$ actually represent K categories of urban tail gas pollution;

for the several clustering sub-clusters $(X_1, X_2, \ldots, X_K)$ obtained by dividing the road sections of the whole road network, the number $N_K$ of roads contained in each sub-cluster $X_K$ is counted, and the number $n_i$ of air quality monitoring stations to be constructed in each sub-cluster $X_K$ is further calculated:

for each sub-cluster $X_K$, the distribution information entropy of the probability distribution Z of each road i is calculated, which is defined as follows:

$$H(Z) = -\sum \left[ Z \log(Z) + (1 - Z)\log(1 - Z) \right]$$

the construction priority of each road section i is marked according to the value of the calculated distribution information entropy of each road i; the smaller the value of the distribution information entropy, the higher the relevance between the road and the other roads of the sub-cluster $X_K$ to which it belongs, i.e. the more representative the road is of the corresponding pollution category sub-cluster $X_K$;

in each sub-cluster $X_K$, the construction priority is marked according to the value of the distribution information entropy of each road i, and a road with a larger information entropy value is given a higher construction priority;

in each sub-cluster $X_K$, $n_i$ air quality monitoring stations are constructed according to the construction priority of the roads.

Further, the balance coefficients $\epsilon$ are all set to 0.5.

According to the above technical scheme, the air quality monitoring equipment point arrangement and site selection method based on road section pollution similarity introduces a graph convolution network, combines road topological connectivity, and integrates external factors such as meteorological conditions and POI distribution to extract spatial features. The tail gas pollution categories of all roads in a city are obtained from the running tracks of the motor vehicles in the urban road network; then, according to the tail gas pollution category of each road, the distribution information entropy of the roads belonging to each category is calculated and the construction priority is marked, so as to recommend locations for newly built stations. The invention makes full use of the topological structure information and the road connectivity information of the traffic network, combined with easily obtained external traffic information, to obtain the road section arrangement priority and to recommend arrangement point locations to the supervision department, which gives it stronger applicability.

The method is different from the traditional monitoring station position recommendation method, defines the urban road network as a graph structure by combining external complex environment characteristics, introduces a graph convolution network, and captures spatial characteristics by combining road topological connectivity. According to the obtained pollution category of each road in the urban road network, the road section arrangement priority is obtained, recommendation of arrangement point locations is provided for supervision departments, the newly-established air observation station can represent the air quality distribution of each category to the greatest extent, the cost is saved, and the method has certain value significance in practical application.

Drawings

FIG. 1 is a flow chart of a method of the present invention;

FIG. 2 is a schematic diagram of the structure of the present invention;

fig. 3 is a diagram of an application example of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention.

As shown in fig. 1 and fig. 2, the air quality monitoring device stationing and addressing method based on road segment pollution similarity according to the embodiment has a main task of selecting K positions from candidate areas to newly build an air quality monitoring device, and the K positions can improve the characterization capability of the air quality distribution of the whole city to the greatest extent. Firstly, acquiring K tail gas pollution categories of urban road networks by utilizing the running tracks of the motor vehicles in the urban road networks and combining the topological connectivity of the urban road structures; further, the establishment priority of each road category is calculated according to the distribution information entropy of the road, so that a proper point is selected. The method comprises the following specific steps:

In the fleet emission formula of step S3, $Ef_m$ represents the emission factor of the m-th vehicle type, $density(ic_{im})$ represents the number of sub-tracks of the m-th vehicle type on road section i at time t, $len(ic_i)$ represents the road length of section i, and $J$ represents the number of vehicle types.
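The definitions above suggest how the per-section emission can be computed. The following is a minimal sketch, assuming the fleet emission is the sum over vehicle types of emission factor × sub-track count × road length; the function name, the units and the sample values are hypothetical and are not taken from the patent.

```python
from typing import Sequence


def fleet_emission(emission_factors: Sequence[float],
                   subtrack_counts: Sequence[int],
                   road_length: float) -> float:
    """Total fleet emission of one road section at one time step.

    emission_factors[m]: emission factor Ef_m of vehicle type m (assumed per vehicle per unit length)
    subtrack_counts[m]:  number of sub-tracks of type m on the section at time t
    road_length:         length of the road section
    """
    if len(emission_factors) != len(subtrack_counts):
        raise ValueError("one emission factor is needed per vehicle type")
    # Sum the contribution of each of the J vehicle types.
    return sum(ef * n * road_length
               for ef, n in zip(emission_factors, subtrack_counts))


# Hypothetical example with J = 3 vehicle types on a section of length 2.0.
ef = [0.5, 1.2, 3.0]
counts = [10, 4, 1]
total = fleet_emission(ef, counts, road_length=2.0)
```

Evaluating this for every road section and every time step would fill the N × d emission matrix X used by the later modules.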

Further, the road section conversion abstracts the road sections into nodes, where a connection between two nodes represents the connectivity of the two corresponding road sections in the road network. Given a road section connectivity graph $G = (V_0, \varepsilon, W)$, $V_0$ is a finite set of vertices consisting of $N$ road sections, $\varepsilon$ is the set of edges representing connectivity between road sections, and $W \in \mathbb{R}^{N \times N}$ is the adjacency weight matrix of graph G. Each vertex $v_i \in V_0$ contains geographical location information. The adjacency matrix of the mobile source pollution graph is calculated from the road section connectivity, and the final spatially weighted adjacency matrix can be recorded as:

wherein $dist_{ij}$ represents the normalized distance between the geographical locations of road sections $v_i$ and $v_j$; $link(i, j)$ represents the connectivity of road sections $v_i$ and $v_j$: if $link(i, j) = 1$, then $v_i$ and $v_j$ are connected, otherwise they are not. $\theta^2 = 0.05$ is used to control the scale and sparsity of the adjacency matrix.
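The exact weighting formula is not reproduced in this text. As an illustration only, the sketch below assumes a Gaussian kernel on the normalized distance, $\exp(-dist_{ij}^2/\theta^2)$, applied to connected road sections and zero elsewhere; this kernel form, the function name and the sample matrices are assumptions, and only $\theta^2 = 0.05$ and the role of $link(i, j)$ come from the description.

```python
import numpy as np


def spatial_adjacency(dist, link, theta_sq=0.05):
    """Spatially weighted adjacency matrix (assumed Gaussian-kernel form).

    dist: (N, N) normalized distances between road sections
    link: (N, N) 0/1 connectivity indicator, link[i, j] == 1 if sections are connected
    """
    dist = np.asarray(dist, dtype=float)
    link = np.asarray(link)
    weights = np.exp(-dist ** 2 / theta_sq)   # closer sections get larger weights
    return np.where(link == 1, weights, 0.0)  # unconnected pairs get weight 0


# Hypothetical three-section network: 0-1 and 1-2 are connected, 0-2 is not.
dist = np.array([[0.0, 0.1, 0.9],
                 [0.1, 0.0, 0.2],
                 [0.9, 0.2, 0.0]])
link = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
W = spatial_adjacency(dist, link)
```

A smaller `theta_sq` makes the weights decay faster with distance, which is one way the parameter could control the scale and sparsity mentioned above.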

Furthermore, the exhaust emission $X \in \mathbb{R}^{N \times d}$ of the urban road network is input into an end-to-end structural clustering framework to obtain the clustering result of the road sections, dividing the road sections of the whole road network into several clustering sub-clusters $(X_1, X_2, \ldots, X_K)$, where $K$ represents the number of clusters. Each row $x_i \in X$ represents the i-th sample, i.e. the i-th road section, $N$ is the total number of road sections, and $d$ is the dimension, i.e. the total number of emission time steps. Specifically, the framework comprises a BERT module for extracting pollution emission rules, a GCN module for extracting the road network structure, a DNN module for extracting external factors, and a dual self-supervision module for unifying emission rule feature extraction and road network structure extraction:

S4, the BERT module extracts the emission rules of the exhaust emission $X \in \mathbb{R}^{N \times d}$. A Transformer encoder iteratively calculates, at each layer $l$, a hidden representation $h_{it}^{(l)} \in \mathbb{R}^{d_0}$ for every position $ice_{it}$ of the emission sequence $ice_i = \{ice_{i1}, ice_{i2}, \ldots, ice_{id}\}$ of each road section. Stacking the $h_{it}^{(l)}$ forms the matrix $H_1^{(l)} \in \mathbb{R}^{d \times d_0}$, wherein $d_0$ represents the feature dimension and $d$ represents the length of the time series. The Transformer encoder is composed of two sublayers: a multi-head self-attention sublayer and a position-wise fully connected feed-forward network.

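$$MH(H_1^{(l)}) = [head_1; head_2; \ldots; head_h]\,W^O$$

$$head_i = \mathrm{Attention}(H_1^{(l)} W_i^Q,\ H_1^{(l)} W_i^K,\ H_1^{(l)} W_i^V)$$

wherein the projection matrices of each head, $W_i^Q, W_i^K, W_i^V \in \mathbb{R}^{d_0 \times d_0/h}$, and $W^O \in \mathbb{R}^{d_0 \times d_0}$ are all learnable parameters, and the Attention function uses the scaled dot-product attention mechanism:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_0/h}}\right)V$$

wherein the query Q, key K and value V are projected from the same matrix $H_1^{(l)}$, and the temperature $\sqrt{d_0/h}$ is introduced to produce a softer attention distribution, avoiding extremely small gradients.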
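The multi-head self-attention sublayer can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions rather than the patented implementation: the per-head projection size $d_0/h$, the random weights and all function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(Q, K, V, temperature):
    """Scaled dot-product attention with an explicit temperature."""
    scores = Q @ K.T / temperature
    return softmax(scores, axis=-1) @ V


def multi_head(H, Wq, Wk, Wv, Wo):
    """Project H with h heads, attend per head, concatenate, then mix with Wo."""
    h = len(Wq)
    d0 = H.shape[1]
    temperature = np.sqrt(d0 / h)
    heads = [attention(H @ Wq[i], H @ Wk[i], H @ Wv[i], temperature)
             for i in range(h)]
    return np.concatenate(heads, axis=1) @ Wo


# Hypothetical sizes: d = 6 time steps, d0 = 8 features, h = 2 heads.
rng = np.random.default_rng(0)
d, d0, h = 6, 8, 2
H = rng.normal(size=(d, d0))
Wq = rng.normal(size=(h, d0, d0 // h))
Wk = rng.normal(size=(h, d0, d0 // h))
Wv = rng.normal(size=(h, d0, d0 // h))
Wo = rng.normal(size=(d0, d0))
out = multi_head(H, Wq, Wk, Wv, Wo)
```

The output keeps the shape of the input matrix, so sublayers of this kind can be stacked layer by layer.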

$$FFN(x) = GELU(xW^{(1)} + b^{(1)})W^{(2)} + b^{(2)}$$

$$GELU(x) = x\,\Phi(x)$$

where $\Phi(x)$ is the cumulative distribution function of the standard Gaussian distribution, and $W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)}$ are learnable parameters shared across all positions.
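The feed-forward formula translates directly into code. The sketch below uses the exact GELU, with $\Phi$ written through the error function; the hidden width of 32 and the random parameters are hypothetical choices for illustration.

```python
import math

import numpy as np


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    erf = np.vectorize(math.erf)
    return x * 0.5 * (1.0 + erf(x / math.sqrt(2.0)))


def ffn(x, W1, b1, W2, b2):
    """Two affine transformations with a GELU in between, applied row by row."""
    return gelu(x @ W1 + b1) @ W2 + b2


# Hypothetical sizes: 5 positions, feature dimension 8, hidden width 32.
rng = np.random.default_rng(1)
x = rng.normal(size=(5, 8))
W1, b1 = rng.normal(size=(8, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 8)), np.zeros(8)
y = ffn(x, W1, b1, W2, b2)
```

Because the same parameters are used for every row of `x`, the network is identical at each position, as the description requires.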

$$Trm(H_1^{(l-1)}) = LN\left(A^{(l-1)} + Dropout(PFFN(A^{(l-1)}))\right)$$

$$A^{(l-1)} = LN\left(H_1^{(l-1)} + Dropout(MH(H_1^{(l-1)}))\right)$$

where LN is the layer normalization function.

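Position embeddings are injected into the input: the input representation $h_{it}^{(0)}$ is the sum of the corresponding sequence value embedding and the position embedding, $h_{it}^{(0)} = ice_{it} + p_t$,

wherein $ice_{it} \in ice_i$ is the $d_0$-dimensional embedding of the emission at time t, and $p_t$ is the $d_0$-dimensional position embedding. Fixed sinusoidal embeddings are used instead of learnable position embeddings.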
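The text only states that fixed sinusoidal position embeddings are used. The sketch below assumes the common scheme from the original Transformer (sine on even feature indices, cosine on odd ones, base 10000); that scheme and the function names are assumptions, not details confirmed by the patent.

```python
import numpy as np


def sinusoidal_positions(length, d0):
    """Fixed sinusoidal position embeddings of shape (length, d0)."""
    if d0 % 2:
        raise ValueError("d0 must be even for paired sine/cosine features")
    pos = np.arange(length)[:, None]          # time index t
    i = np.arange(d0 // 2)[None, :]           # index of each frequency pair
    angle = pos / np.power(10000.0, 2 * i / d0)
    P = np.zeros((length, d0))
    P[:, 0::2] = np.sin(angle)
    P[:, 1::2] = np.cos(angle)
    return P


def add_positions(E):
    """Add the position embedding p_t to each emission embedding ice_it."""
    return E + sinusoidal_positions(*E.shape)


# Hypothetical emission embeddings: d = 4 time steps, d0 = 6 features.
E = np.zeros((4, 6))
H0 = add_positions(E)
```

Since the embeddings are fixed, they add no trainable parameters and can be precomputed once for the sequence length d.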

In the decoder, an encoder-decoder attention sublayer is added between the multi-head self-attention sublayer and the position-wise fully connected feed-forward network. In the encoder-decoder attention, Q comes from the previous output of the decoder, while K and V come from the output of the encoder; the calculation is the same as in the encoder. The decoder outputs a reconstructed vector X' with the following objective function:

S5, the DNN module extracts external factors. The external factors, such as meteorological data and urban point-of-interest data $Y \in \mathbb{R}^{N \times d'}$, provide additional information for feature extraction of road mobile source emissions, where $d'$ is the dimension of the external factor data. An autoencoder is adopted to learn the features of the external factors. Assuming that the autoencoder has L layers in total and $l$ denotes a specific layer, the features $H_2^{(l)}$ learned by the l-th layer of the encoding part are expressed as:

$$H_2^{(l)} = \phi\left(W_e^{(l)} H_2^{(l-1)} + b_e^{(l)}\right)$$

where $\phi$ is the activation function of the fully connected layer, $W_e^{(l)}$ and $b_e^{(l)}$ are respectively the weight matrix and bias of the l-th layer of the encoder, and $H_2^{(0)}$ denotes the original data Y.

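The encoding part is followed by a decoder that reconstructs the input through several fully connected layers:

$$H_2^{(l)} = \phi\left(W_d^{(l)} H_2^{(l-1)} + b_d^{(l)}\right)$$

wherein $W_d^{(l)}$ and $b_d^{(l)}$ respectively represent the weight matrix and bias of the l-th layer of the decoder.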

S6, feature fusion: the external factor features and the emission rule features of each layer are added to form the fusion features:

$$H^{(l)} = H_1^{(l)} + H_2^{(l)}$$

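S7, the GCN module extracts the road network structure. According to the road network connectivity graph $G = (V_0, \varepsilon, W)$, the fusion features are integrated into the GCN, so that the representation learned by the GCN accommodates three kinds of information: the emission data itself, the external factors, and the relationships between the data. The representation learned by the l-th GCN layer is obtained by a convolution operation in which $Z^{(l-1)}$ is propagated through the normalized adjacency matrix $\tilde{D}^{-1/2}\tilde{W}\tilde{D}^{-1/2}$, with $\tilde{W} = W + I$ and $\tilde{D}_{ii} = \sum_j \tilde{W}_{ij}$. To obtain a stronger representation, $Z^{(l-1)}$ and the fusion feature $H^{(l-1)}$ are combined:

$$\tilde{Z}^{(l-1)} = (1 - \epsilon)\,Z^{(l-1)} + \epsilon\,H^{(l-1)}$$

where $\epsilon$ is the balance coefficient, which the invention sets to 0.5 in all layers. In this way, the invention connects the BERT module, the DNN module and the GCN module layer by layer.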
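One GCN layer with the balance coefficient can be sketched as below. The symmetric normalization with self-loops follows the usual graph convolution formulation; the ReLU activation, the name `theta` for the learnable layer weights and the toy inputs are assumptions made for illustration, since the convolution formula itself is not reproduced in this text.

```python
import numpy as np


def normalize_adjacency(W):
    """Symmetric normalization D^-1/2 (W + I) D^-1/2 with self-loops added."""
    W_tilde = W + np.eye(W.shape[0])
    deg = W_tilde.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return d_inv_sqrt[:, None] * W_tilde * d_inv_sqrt[None, :]


def gcn_layer(A_norm, Z_prev, H_prev, theta, eps=0.5):
    """Mix the GCN representation with the fusion feature, then propagate."""
    Z_mix = (1.0 - eps) * Z_prev + eps * H_prev   # balance coefficient eps
    return np.maximum(A_norm @ Z_mix @ theta, 0.0)  # ReLU is an assumption


# Hypothetical two-section network with three features per section.
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])
A_norm = normalize_adjacency(W)
Z_prev = np.ones((2, 3))
H_prev = np.zeros((2, 3))
theta = np.eye(3)
Z_next = gcn_layer(A_norm, Z_prev, H_prev, theta)
```

With `eps=0.5`, the layer input is the plain average of the previous GCN representation and the fused BERT/DNN feature of the same depth.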

The combined representation is used as the input of the l-th GCN layer to generate the representation $Z^{(l)}$:

The fusion feature $H^{(l-1)}$ is propagated through the normalized adjacency matrix $\tilde{D}^{-1/2}\tilde{W}\tilde{D}^{-1/2}$. Because the information represented by the fusion features differs from layer to layer, the invention passes the fusion features of each layer to the corresponding GCN layer for information propagation in order to keep as much information as possible; as shown in FIG. 1, the propagation operator works L times in the whole model.

The input of the first GCN layer is the raw emission data X:

The invention uses the target distribution P to supervise the probability distribution Z:

The overall loss function of the framework of the invention is:

S8, after training yields a stable result, cluster labels are set for the road sections: the probability distribution Z is selected as the final clustering result, and for each road section i the cluster with the highest probability in Z is assigned as its cluster label, $r_i = \arg\max_j z_{ij}$,

i.e. the road sections of the whole road network can be divided into several clustering sub-clusters $(X_1, X_2, \ldots, X_K)$, where $K$ represents the number of clusters.

The sub-clusters $(X_1, X_2, \ldots, X_K)$ actually represent K categories of urban tail gas pollution. The method only needs to find, within each of the K basic categories, the roads that best represent the category and then build the corresponding number of air quality monitoring devices there, so that the air quality monitoring devices to be built reflect the air quality distribution of the whole city to the greatest extent.
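Assigning cluster labels from the probability distribution Z amounts to a row-wise argmax. A minimal sketch follows, with a hypothetical 4 × 3 matrix Z and illustrative function names:

```python
import numpy as np


def cluster_labels(Z):
    """Label each road section with its most probable cluster."""
    return np.argmax(Z, axis=1)


def split_into_clusters(labels, K):
    """Return, for each cluster k, the indices of the road sections in it."""
    return [np.flatnonzero(labels == k) for k in range(K)]


# Hypothetical soft assignments of 4 road sections to K = 3 clusters.
Z = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4],
              [0.6, 0.3, 0.1]])
labels = cluster_labels(Z)
clusters = split_into_clusters(labels, K=3)
```

Each entry of `clusters` corresponds to one sub-cluster, i.e. one tail gas pollution category.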

S9, for the several clustering sub-clusters $(X_1, X_2, \ldots, X_K)$ obtained by dividing the road sections of the whole road network, the number $N_K$ of roads contained in each sub-cluster $X_K$ is counted, and the number $n_i$ of air quality monitoring stations to be constructed in each sub-cluster $X_K$ is further calculated:

S10, for each sub-cluster $X_K$, the distribution information entropy of the probability distribution Z of each road i is calculated, which is defined as follows:

$$H(Z) = -\sum \left[ Z \log(Z) + (1 - Z)\log(1 - Z) \right]$$

S11, the construction priority of each road section i is marked according to the value of the calculated distribution information entropy of each road i. The smaller the value of the distribution information entropy, the higher the relevance between the road and the other roads of the sub-cluster $X_K$ to which it belongs, i.e. the more representative the road is of the corresponding pollution category sub-cluster $X_K$. The purpose of choosing roads for new air quality monitoring stations is to improve the accuracy of the air quality distribution of the whole city to the greatest extent, so new monitoring equipment should preferentially be established on roads with high relevance, which are more representative and characterize the air quality distribution of the whole city more directly. Therefore, in each sub-cluster $X_K$, the construction priority is marked according to the value of the distribution information entropy of each road i. The road with the larger information entropy value has the higher priority.

S12, in each sub-cluster $X_K$, $n_i$ air quality monitoring stations are constructed according to the construction priority of the roads.
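Steps S10 to S12 can be sketched as follows. The entropy follows the definition in S10, summed over the clusters for each road. The number of stations per cluster is passed in as an input because its formula is not reproduced in this text. The description states both that a smaller entropy marks a more representative road and that a larger entropy value has higher priority, so the sketch exposes the ordering as a parameter instead of fixing one reading. All names and sample values are hypothetical.

```python
import numpy as np


def distribution_entropy(Z, tiny=1e-12):
    """Per-road entropy: -sum_j [z log z + (1 - z) log(1 - z)]."""
    Zc = np.clip(Z, tiny, 1.0 - tiny)   # avoid log(0)
    return -np.sum(Zc * np.log(Zc) + (1.0 - Zc) * np.log(1.0 - Zc), axis=1)


def select_sites(Z, labels, stations_per_cluster, largest_first=True):
    """Pick the top-priority roads in each cluster.

    stations_per_cluster: dict mapping cluster index to the number of stations
    largest_first:        True ranks larger entropy first, False the reverse
    """
    H = distribution_entropy(Z)
    chosen = {}
    for k, n_k in stations_per_cluster.items():
        members = np.flatnonzero(labels == k)
        order = members[np.argsort(H[members], kind="stable")]
        if largest_first:
            order = order[::-1]
        chosen[k] = order[:n_k].tolist()
    return chosen


# Hypothetical soft assignments of 4 roads to 2 clusters, one station each.
Z = np.array([[0.90, 0.10],
              [0.60, 0.40],
              [0.20, 0.80],
              [0.45, 0.55]])
labels = np.argmax(Z, axis=1)
sites = select_sites(Z, labels, {0: 1, 1: 1})
```

Calling `select_sites(..., largest_first=False)` instead ranks the lowest-entropy road of each cluster first.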

FIG. 3 shows the air quality monitoring sites recommended for Beijing. The green markers (g) are the existing air quality monitoring sites in Beijing, and the orange markers (o) labeled 1, 2, 3, 4 and 5 are the optimal sites in the corresponding candidate areas recommended by this patent.

In summary, the air quality monitoring device point arrangement and site selection method based on the road section pollution similarity has the advantages that: different from the traditional monitoring station position recommendation method, the method combines external complex environment characteristics, defines the urban road network as a graph structure, introduces a graph convolution network, and combines road topology connectivity to capture spatial characteristics. According to the obtained pollution category of each road in the urban road network, the road section arrangement priority is obtained, recommendation of arrangement point locations is provided for supervision departments, the newly-established air observation station can represent the air quality distribution of each category to the greatest extent, the cost is saved, and the method has certain value significance in practical application.

In another aspect, the present invention discloses a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method as described above.

It is understood that the system provided by the embodiment of the present invention corresponds to the method provided by the embodiment of the present invention, and the explanation, the example and the beneficial effects of the related contents can refer to the corresponding parts in the method.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for arranging and selecting sites of air quality monitoring equipment based on road section pollution similarity is characterized by comprising the following steps,

according to the tail gas pollution categories of roads, calculating the distribution information entropy of the roads belonging to each category, and marking the construction priority level, thereby selecting a proper point location;

the method for obtaining K tail gas pollution categories of the urban road network road by utilizing the running track of the motor vehicle in the urban road network and combining the topological connectivity of the urban road structure specifically comprises the following steps:

wherein Ef_m represents the emission factor of the m-th vehicle type, severity(ic_im) represents the number of sub-tracks of the m-th vehicle type at time t on road section i, len(ic_i) represents the road length of road section i, and J represents the number of vehicle types;
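The symbols defined above suggest a per-section emission that combines, for each of the J vehicle types, the emission factor, the sub-track count and the road length. The formula itself is not reproduced in this text, so the summed-product form below is an assumption, as are the function name and the sample values.

```python
def segment_emission(emission_factors, subtrack_counts, road_length):
    """Assumed form: sum over the J vehicle types of Ef_m * count_im * len_i."""
    if len(emission_factors) != len(subtrack_counts):
        raise ValueError("need one emission factor and one count per vehicle type")
    return sum(ef * n * road_length
               for ef, n in zip(emission_factors, subtrack_counts))


# Hypothetical inputs: J = 3 vehicle types on one road section of length 2.0
ef = [0.5, 1.0, 2.0]      # Ef_m for each vehicle type
counts = [4, 2, 1]        # sub-track counts at time t on this section
emission = segment_emission(ef, counts, 2.0)
```

Repeating the call for each time step t would give one row of the emission matrix X used by the later modules.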

the S3 further includes a road section conversion that abstracts road sections into nodes, where a connection between two nodes represents the connectivity of the two corresponding road sections in the road network; given a road section connectivity graph G = (V₀, ε, W), where V₀ is a finite set of vertices consisting of N road sections, ε is a set of edges representing connectivity between road sections, and W ∈ R^(N×N) is the adjacency weight matrix of graph G, the adjacency matrix of the mobile source pollution graph is calculated for the vertices v_i ∈ V₀ through road section connectivity, and the final spatially weighted adjacency matrix can be recorded as:

wherein dist_ij represents the normalized distance between the geographical locations of road sections v_i and v_j; link(i, j) represents the connectivity of road sections v_i and v_j, such that if link(i, j) is 1, v_i and v_j are connected, and otherwise they are not; and θ² and 0.05 are used to control the scale and sparsity of the adjacency matrix;
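The weighting formula is not reproduced in this text. A minimal sketch, assuming a thresholded Gaussian kernel (a common choice for road graphs) in which θ² sets the kernel width and 0.05 is the sparsity cutoff; the kernel form, the function name and the θ² value used in the call are assumptions:

```python
import math


def spatial_adjacency(dist, link, theta2, threshold=0.05):
    """Assumed rule: w_ij = exp(-dist_ij^2 / theta2) when i != j, link(i, j) == 1
    and the weight reaches the threshold; otherwise w_ij = 0."""
    n = len(dist)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or link[i][j] != 1:
                continue  # no self-loops, no weight between unconnected sections
            weight = math.exp(-dist[i][j] ** 2 / theta2)
            if weight >= threshold:
                w[i][j] = weight
    return w


# Hypothetical normalized distances and connectivity for three road sections
dist = [[0.0, 0.1, 0.9],
        [0.1, 0.0, 0.2],
        [0.9, 0.2, 0.0]]
link = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
W = spatial_adjacency(dist, link, theta2=0.1)
```

Under these assumptions, sections that are far apart or not connected receive zero weight, which keeps the matrix sparse.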

2. The air quality monitoring device point placement and site selection method based on road segment pollution similarity according to claim 1, characterized by comprising the following steps: the graph convolution network model comprises a BERT module for extracting pollution emission laws, a GCN module for extracting a road network structure, a DNN module for extracting external factors and a double self-supervision module for unifying emission law feature extraction and road network structure extraction.

3. The air quality monitoring device point placement and site selection method based on road segment pollution similarity according to claim 2, characterized by comprising the following steps:

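the BERT module extracts the emission law from the exhaust emissions X ∈ R^(N×d), using a Transformer encoder to iteratively calculate, at each layer l, the hidden representation of each position ice_it of the emission sequence ice_i = {ice_i1, ice_i2, ..., ice_id} of each road section;

stacking these hidden representations forms the matrix H₁ ^(l), to which the multi-head operation MH is applied: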

MH(H₁ ^(l))＝[head₁；head₂；...；head_h]W^O

wherein the projection matrix of each head

FFN(x)＝GELU(xW⁽¹⁾+b⁽¹⁾)W⁽²⁾+b⁽²⁾

GELU(x)＝xΦ(x)

wherein W⁽¹⁾, b⁽¹⁾, W⁽²⁾ and b⁽²⁾ are learnable parameters shared across all positions;

Trm(H₁ ^(l-1))＝LN(A^(l-1)+Dropout(PFFN(A^(l-1))))

A^(l-1)＝LN(H₁ ^(l-1)+Dropout(MH(H₁ ^(l-1))))

wherein LN is a defined layer normalization function;

embedding a location into an entry, the entry of which
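The FFN, GELU and Trm formulas of this claim can be sketched in plain Python. The sketch makes several assumptions: dropout is treated as the identity (inference time), LN has no learnable gain or bias, and, because the projection matrices of each head are not reproduced in this text, an unprojected single-head self-attention stands in for MH. All function names and weights are hypothetical.

```python
import math


def gelu(x):
    # GELU(x) = x * Phi(x), Phi being the standard normal CDF
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def affine(x, w, b):
    # x has d_in entries, w is d_in x d_out, b has d_out entries
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]


def ffn(x, w1, b1, w2, b2):
    # FFN(x) = GELU(x W1 + b1) W2 + b2, applied to one position
    return affine([gelu(v) for v in affine(x, w1, b1)], w2, b2)


def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def self_attention(h):
    # Stand-in for MH: softmax(H H^T / sqrt(d)) H without learned projections
    d = len(h[0])
    out = []
    for q in h:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in h]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v[j] for e, v in zip(exps, h))
                    for j in range(d)])
    return out


def trm(h, mh, ffn_params):
    # A = LN(H + MH(H)); Trm(H) = LN(A + PFFN(A)); dropout omitted
    attended = mh(h)
    a = [layer_norm([x + y for x, y in zip(row, att)])
         for row, att in zip(h, attended)]
    return [layer_norm([x + y for x, y in zip(row, ffn(row, *ffn_params))])
            for row in a]


# Hypothetical input: 3 positions with hidden size 2
h = [[1.0, 2.0], [0.5, -1.0], [0.0, 3.0]]
params = ([[0.1, -0.2, 0.3, 0.0], [0.05, 0.4, -0.1, 0.2]], [0.0] * 4,
          [[0.2, -0.1], [0.0, 0.3], [-0.3, 0.1], [0.1, 0.1]], [0.0, 0.0])
out = trm(h, self_attention, params)
```

Passing the attention function in as an argument keeps the residual and normalization wiring of Trm separate from the choice of attention, so a projected multi-head version could be swapped in.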

4. The air quality monitoring device point placement and site selection method based on road segment pollution similarity according to claim 3, characterized by comprising the following steps:

the DNN module is used for extracting external factors, wherein the external factors comprise meteorological data and urban point-of-interest data Y ∈ R^(N×d′), which provide additional information for feature extraction of road mobile source emissions, d′ being the dimension of the external factor data; an autoencoder is adopted to learn the features of the external factors; assuming that the autoencoder has L layers in total and l denotes a specific layer, the features learned by the l-th layer of the encoding part are expressed as:

and

wherein

and

respectively represent the weight matrix and the bias of the decoder;
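The encoding and decoding formulas are not reproduced in this text. A minimal sketch of an autoencoder forward pass, assuming each layer is an affine map followed by ReLU, with a linear final decoder layer; the activation, the function names and the toy weights are all assumptions:

```python
def relu(values):
    return [max(0.0, v) for v in values]


def dense(x, w, b, activation=None):
    # Each row of w holds the weights of one output unit
    out = [sum(wi * xi for wi, xi in zip(row, x)) + bi
           for row, bi in zip(w, b)]
    return activation(out) if activation else out


def autoencoder_forward(y, encoder, decoder):
    """Return the features learned by each encoder layer and the reconstruction."""
    features = []
    h = y
    for w, b in encoder:
        h = dense(h, w, b, relu)
        features.append(h)          # encoder feature of this layer
    for idx, (w, b) in enumerate(decoder):
        is_last = idx == len(decoder) - 1
        h = dense(h, w, b, None if is_last else relu)
    return features, h


# Hypothetical external-factor vector for one road section (d' = 3)
y = [1.0, 2.0, 3.0]
encoder = [([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], [0.0, 0.0])]
decoder = [([[1.0, 0.0], [0.0, 0.5], [0.0, 0.5]], [0.0, 0.0, 0.0])]
features, reconstruction = autoencoder_forward(y, encoder, decoder)
```

The per-layer encoder features are kept because the later fusion step combines them layer by layer with the emission-law features.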

5. The air quality monitoring device point placement and site selection method based on road segment pollution similarity according to claim 4, wherein the method comprises the following steps: the graph convolution network model further comprises a feature fusion module, and the feature fusion module is used for adding the external factor features and the emission rule features of each layer to form fusion features:

H^(l)＝H₁ ^(l)+H₂ ^(l).

6. The air quality monitoring device point placement and site selection method based on road segment pollution similarity according to claim 5, wherein the method comprises the following steps:

the GCN module is used for extracting the road network structure; according to the road network connectivity graph G = (V₀, ε, W), the fusion features are integrated into the GCN, so that the representation learned by the GCN accommodates three different kinds of information: the emission data itself, the external factors, and the relationships between the data; the representation learned by the l-th GCN layer is obtained by the following convolution operation:

wherein

the input to the l-th GCN layer is then used to generate the representation Z^(l):

the fusion feature H^(l-1) is propagated through the normalized matrix

the inputs to the first layer GCN are raw emission data X:
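The convolution formulas of this claim are not reproduced in this text. A minimal sketch of one GCN layer, assuming the widely used symmetric normalization D^(-1/2)(W + I)D^(-1/2), a ReLU activation, and a convex mix (1 - e)·Z + e·H of the previous GCN output with the fusion feature, e being the balance coefficient; these forms and all names are assumptions:

```python
import math


def normalize_adjacency(w):
    # Add self-loops, then scale each entry by 1 / sqrt(deg_i * deg_j)
    n = len(w)
    a = [[w[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(a_hat, z_prev, weight, h_prev=None, e=0.5):
    # Optionally mix in the fusion feature before propagating over the graph
    if h_prev is not None:
        z_prev = [[(1 - e) * z + e * h for z, h in zip(z_row, h_row)]
                  for z_row, h_row in zip(z_prev, h_prev)]
    out = matmul(matmul(a_hat, z_prev), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical two-section road graph with identity layer weights
a_hat = normalize_adjacency([[0.0, 1.0], [1.0, 0.0]])
x = [[1.0, 2.0], [3.0, 4.0]]                    # raw emission data X
identity = [[1.0, 0.0], [0.0, 1.0]]
z1 = gcn_layer(a_hat, x, identity)              # first layer takes X directly
h1 = [[0.0, 1.0], [4.0, 3.0]]                   # hypothetical fusion feature
z2 = gcn_layer(a_hat, z1, identity, h_prev=h1)  # later layers mix in H
```

The first call mirrors the statement that the first layer takes the raw emission data X as input; later layers receive the mixed representation.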

7. The air quality monitoring device point placement and site selection method based on road segment pollution similarity according to claim 6, wherein the method comprises the following steps:

the probability distribution Z is supervised using the target distribution P:

the overall loss function is:

8. The air quality monitoring device point placement and site selection method based on road segment pollution similarity according to claim 7, characterized by comprising the following steps:

calculating the distribution information entropy of the roads belonging to each category according to the tail gas pollution categories of the roads, and marking the construction priority so as to select suitable point locations, specifically comprises the following steps:

for the plurality of clustering sub-clusters (X₁, X₂, ..., X_K) obtained by dividing the road sections of the whole road network, counting the number N_K of roads included in each sub-cluster X_K, and further calculating the number n_i of air quality monitoring stations to be constructed in each sub-cluster X_K:

H(Z)＝{-∑[Zlog(Z)+(1-Z)log(1-Z)]}
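The entropy formula above can be evaluated directly. The sketch below computes H(Z) for the category probabilities of each road and orders the roads by the result; the road names and probabilities are hypothetical, and the sort direction is an assumption, since this text does not state how entropy maps to construction priority.

```python
import math


def distribution_entropy(z, eps=1e-12):
    """H(Z) = -sum[Z log Z + (1 - Z) log(1 - Z)] over the entries of Z."""
    total = 0.0
    for p in z:
        p = min(max(p, eps), 1.0 - eps)   # clamp so the logarithms stay finite
        total += p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
    return -total


# Hypothetical category probabilities (K = 3) for two road sections
roads = {
    "road_a": [0.98, 0.01, 0.01],   # confidently assigned to one category
    "road_b": [0.40, 0.35, 0.25],   # ambiguous assignment
}
entropies = {name: distribution_entropy(z) for name, z in roads.items()}
ranked = sorted(entropies, key=entropies.get)   # ascending entropy (assumed order)
```

A road whose probabilities are concentrated on one category yields a low entropy, while an ambiguous road yields a high one.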

9. The air quality monitoring device point placement and site selection method based on road segment pollution similarity according to claim 6, wherein the method comprises the following steps: the balance coefficients e are all set to 0.5.