CN113239075B

CN113239075B - Construction data self-checking method and system

Info

Publication number: CN113239075B
Application number: CN202110524398.9A
Authority: CN
Inventors: 张艳红; 侯芸; 董元帅; 闫旭亮; 樊永伟; 钱振宇
Original assignee: Checsc Highway Maintenance And Test Technology Co ltd; China Highway Engineering Consultants Corp
Current assignee: Checsc Highway Maintenance And Test Technology Co ltd; China Highway Engineering Consultants Corp; CHECC Data Co Ltd
Priority date: 2021-05-13
Filing date: 2021-05-13
Publication date: 2023-05-12
Anticipated expiration: 2041-05-13
Also published as: CN113239075A

Abstract

The invention provides a construction data self-checking method and system. The method comprises the following steps: acquiring all construction data in a construction scene and constructing a construction data matrix; calculating the correlation degree among all data fields in the construction data matrix to obtain a signature matrix; extracting features of the signature matrix by using a pre-trained data anomaly detection model to obtain an output matrix; calculating the mean square error between the output matrix and the signature matrix; and determining the self-checking result of all construction data in the construction scene according to the mean square error. The method and system extract the association relationships among the construction data in the same construction scene, construct a data anomaly detection model, and automatically detect abnormal data according to the mean square error between the input and the output of the model. This guards against false data and invalid data, and improves both the precision and the efficiency of construction data self-checking.

Description

Construction data self-checking method and system

Technical Field

The invention relates to the technical field of data processing, in particular to a construction data self-checking method and system.

Background

In highway engineering, with the continuous development of electronic informatization, data generated during construction management must be recorded and archived for acceptance inspection, and must be recorded accurately and objectively. However, because some construction technicians have limited professional skill or a weak sense of responsibility, field data is occasionally not filled in properly, and false data, invalid data and the like easily appear. In actual construction acceptance inspection, false or invalid data cannot truly reflect the construction quality and the various process records. As a result, acceptance fails, the whole construction project has to be reworked and re-inspected repeatedly, and the construction progress is seriously affected.

Verifying the construction self-checking data is therefore of great significance: it ensures that the data is reliable and truly reflects the actual quality of the engineering, prevents false data from spreading and damaging the engineering quality of the construction project, and provides true and reliable original quality data for later maintenance operations.

Currently, the common detection methods are all based on statistical tests, which check the validity of the data using statistical indicators such as the variance and the mean. However, methods based on statistical tests cannot effectively use the association relationships between data to check data validity, so their detection efficiency is low and their detection precision is poor.

Disclosure of Invention

Aiming at the defects existing in the process of realizing the data self-checking in the prior art, the embodiment of the invention provides a construction data self-checking method and a construction data self-checking system, which can effectively improve the precision of the construction data self-checking.

The invention provides a construction data self-checking method, which comprises the following steps: acquiring all construction data in a construction scene, and constructing a construction data matrix; calculating the correlation degree among all data fields in the construction data matrix to obtain a signature matrix; extracting features of the signature matrix by using a pre-trained data anomaly detection model to obtain an output matrix; calculating a mean square error between the output matrix and the signature matrix; and determining self-checking results of all construction data under the construction scene according to the mean square error.

According to the construction data self-checking method provided by the invention, the correlation degree is the Pearson correlation coefficient, and calculating the correlation degree among all data fields in the construction data matrix comprises:

$$\rho_{XY} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$$

wherein X is the record data of data field X, Y is the record data of data field Y, $\sigma_X$ is the standard deviation of X, $\sigma_Y$ is the standard deviation of Y, and cov(X, Y) is the covariance of X and Y.

According to the construction data self-checking method provided by the invention, before the feature extraction of the signature matrix is carried out by using the pre-trained data anomaly detection model, the method further comprises pre-training the pre-constructed data anomaly detection model by unsupervised learning; the objective function of the unsupervised learning is:

$$\min_{\Theta} \sum_{i=1}^{M} \mathrm{MSE}\left(S^{i}, \hat{S}^{i}\right)$$

wherein $S^{i}$ is a signature matrix sample, $\hat{S}^{i}$ is the corresponding output matrix sample, M is the number of training iterations, Θ denotes the parameters to be trained of the data anomaly detection model, and i is an intermediate (index) parameter.

According to the construction data self-checking method provided by the invention, the data anomaly detection model is constructed based on an encoder-decoder framework comprising an encoding layer and a decoding layer. The encoding layer comprises a convolution layer L₁, a pooling layer L₂, a fully connected layer L₃, a fully connected layer L₄ and a fully connected layer L₅. The convolution layer L₁ has a convolution kernel size of 3×3, 6 output channels and 1 input channel; the pooling layer L₂ has a kernel size of 2×2; the fully connected layer L₃ has an input dimension of (n-2)/2, rounded to an integer, and an output dimension of 64, where n is the number of fields of the construction data recorded in the construction data matrix; the fully connected layer L₄ has an input dimension of 64 and an output dimension of 32; the fully connected layer L₅ has an input dimension of 32 and an output dimension of 8. Correspondingly, the decoding layer comprises a fully connected layer L₆, a fully connected layer L₇, a fully connected layer L₈, a convolution layer L₉ and a pooling layer L₁₀. The fully connected layer L₆ has an input dimension of 8 and an output dimension of 32; the fully connected layer L₇ has an input dimension of 32 and an output dimension of 64; the fully connected layer L₈ has an input dimension of 64 and an output dimension of (n-2)/2; the convolution layer L₉ has a kernel size of 2×2; the pooling layer L₁₀ has a kernel size of 3×3, 6 output channels and 1 input channel.

According to the construction data self-checking method provided by the invention, calculating the mean square error between the output matrix and the signature matrix comprises:

$$LossE = \mathrm{MSE}\left(\hat{S}^{a}, S^{a}\right)$$

wherein $\hat{S}^{a}$ is the output matrix, $S^{a}$ is the signature matrix, and LossE is the mean square error between the output matrix and the signature matrix.

According to the construction data self-checking method provided by the invention, before the self-checking results of all construction data in the construction scene are determined, the error mean value of normal construction data is determined as a reference value; and determining self-checking results of all construction data in the construction scene according to the mean square error, wherein the self-checking results comprise: under the condition that the mean square error is larger than the reference value, determining the self-checking result as abnormal; and under the condition that the mean square error is not greater than the reference value, determining that the self-checking result is normal.

According to the construction data self-checking method provided by the invention, after the pre-built data anomaly detection model is pre-trained by adopting an unsupervised learning mode, the construction data self-checking method further comprises the following steps: constructing a verification data set which consists of normal verification data generated by an autoregressive system and abnormal data after partial normal verification data are modified; determining the precision and recall of the pre-trained data anomaly detection model by using the verification data set; and determining the credibility of the pre-trained data anomaly detection model according to the precision and recall ratio.

The invention also provides a construction data self-checking system, which comprises: the initial data acquisition unit is used for acquiring all construction data in a construction scene and constructing a construction data matrix; the correlation calculation unit is used for calculating the correlation among the data fields in the construction data matrix so as to obtain a signature matrix; the feature extraction unit is used for extracting features of the signature matrix by utilizing a pre-trained data anomaly detection model to obtain an output matrix; the error calculation unit is used for calculating the mean square error between the output matrix and the signature matrix; and the self-checking identification unit is used for determining self-checking results of all construction data under the construction scene according to the mean square error.

The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the construction data self-checking method according to any one of the above.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the construction data self-checking method as described in any one of the above.

The construction data self-checking method and system provided by the invention extract the association relationships among the construction data in the same construction scene, construct a data anomaly detection model, and automatically detect abnormal data according to the mean square error between the input and the output of the model. This guards against false data and invalid data, and improves both the precision and the efficiency of construction data self-checking.

Drawings

In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of a construction data self-checking method provided by the invention;

FIG. 2 is a second flow chart of the construction data self-checking method provided by the invention;

FIG. 3 is a schematic diagram of a data anomaly detection model provided by the present invention;

FIG. 4 is a schematic diagram of sampling all construction data provided by the present invention;

FIG. 5 is a schematic diagram of a construction data self-checking system provided by the invention;

FIG. 6 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be noted that in the description of embodiments of the present invention, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. The orientation or positional relationship indicated by the terms "upper", "lower", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of description and to simplify the description, and are not indicative or implying that the apparatus or elements in question must have a specific orientation, be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.

In construction data, logical relationships also exist between the individual data items. For example, the compactness and the thickness of an asphalt pavement are related: each total thickness corresponds to one upper-layer thickness and one compactness value, so the data within the same construction scene are interrelated. However, the statistical-test-based methods currently in use cannot effectively exploit the association relationships between construction data. In view of this, the present invention provides a neural-network-based construction data self-checking method. The construction data self-checking method and system provided by the embodiments of the present invention are described below with reference to FIG. 1 to FIG. 6.

Fig. 1 is a schematic flow chart of a construction data self-checking method provided by the present invention, as shown in fig. 1, including but not limited to the following steps:

step S1: acquiring all construction data in a construction scene, and constructing a construction data matrix;

step S2: calculating the correlation degree among all data fields in the construction data matrix to obtain a signature matrix;

step S3: extracting features of the signature matrix by using a pre-trained data anomaly detection model to obtain an output matrix;

step S4: calculating a mean square error between the output matrix and the signature matrix;

step S5: and determining self-checking results of all construction data under the construction scene according to the mean square error.

A construction data matrix L, $L \in \mathbb{R}^{n \times m}$, is constructed from all the construction data collected in the construction scene to be detected. Here L is an n × m matrix, n is the number of fields of the construction data recorded in L, m is the field length of each construction data field recorded in L, and $\mathbb{R}$ denotes the set of real numbers, with the superscript giving the matrix dimensions.

It should be noted that, the construction data self-checking method provided by the invention takes all the construction data as a whole to comprehensively detect whether the acquired data is correct or not in the construction scene.

Furthermore, according to the construction data self-checking method provided by the invention, the correlation among all construction data is considered, so that the correlation among all construction data recorded in the construction data matrix is required to be calculated.

As an alternative embodiment, the correlation degree may be the Pearson correlation coefficient, in which case the correlation degree between the data fields in the construction data matrix is calculated as:

$$\rho_{XY} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} \qquad \text{(Equation 1)}$$

wherein X is the record data of data field X; if the length of the record data is T, then $X \in \mathbb{R}^{1 \times T}$. Y is the record data of data field Y, also of length T, i.e. $Y \in \mathbb{R}^{1 \times T}$. $\sigma_X$ is the standard deviation of X, $\sigma_Y$ is the standard deviation of Y, and cov(X, Y) is the covariance of X and Y.

After the correlation degree between the construction data fields in the construction data matrix is calculated using Equation 1 above, the signature matrix $S^{a}$, $S^{a} \in \mathbb{R}^{n \times n}$, is obtained. Each element $S_{xy}$ of the signature matrix is expressed as:

$$S_{xy} = \mathrm{abs}(\rho_{XY}) \qquad \text{(Equation 2)}$$

wherein the function abs(·) takes the absolute value.
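The calculation of Equations 1 and 2 can be illustrated with a minimal sketch in plain Python. The function names, the toy data and the handling of constant fields (treated as uncorrelated) are assumptions made for illustration; the source does not specify them. The population form of the covariance and standard deviation is used, which gives the same coefficient as the sample form.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient rho_XY = cov(X, Y) / (sigma_X * sigma_Y)."""
    t = len(x)
    mean_x = sum(x) / t
    mean_y = sum(y) / t
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / t
    sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / t)
    sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / t)
    if sigma_x == 0 or sigma_y == 0:
        # ASSUMPTION: a constant field is treated as uncorrelated;
        # the source does not cover this case.
        return 0.0
    return cov / (sigma_x * sigma_y)


def signature_matrix(fields):
    """Build the n x n matrix S with S[x][y] = abs(rho_XY) from n equal-length fields."""
    n = len(fields)
    return [[abs(pearson(fields[i], fields[j])) for j in range(n)] for i in range(n)]


# Hypothetical toy data: n = 3 fields, each of length T = 5.
fields = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with field 0
    [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as field 0 rises
]
S = signature_matrix(fields)
```

Because Equation 2 takes the absolute value, a strongly negative correlation (fields 0 and 2 here) produces the same entry as a strongly positive one, so the signature matrix records the strength of each relationship but not its direction.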

The signature matrix $S^{a}$ describes the correlation characteristics of the data fields. To better extract the latent features of the entered construction data, the construction data self-checking method provided by the invention uses the pre-trained data anomaly detection model to further extract features from the signature matrix $S^{a}$ and obtain the corresponding output matrix $\hat{S}^{a}$.

To a certain extent, the output matrix $\hat{S}^{a}$ can be regarded as a reconstruction of the signature matrix $S^{a}$. After the signature matrix $S^{a}$ of the construction data to be detected is input into the data anomaly detection model, the corresponding output matrix $\hat{S}^{a}$ is obtained.

The mean square error between the output matrix $\hat{S}^{a}$ and the signature matrix $S^{a}$ can then be calculated and is denoted LossE, i.e. $LossE = \mathrm{MSE}(\hat{S}^{a}, S^{a})$.

Finally, whether all the construction data in the construction scene are accurate is judged according to the mean square error LossE between the output matrix $\hat{S}^{a}$ and the signature matrix $S^{a}$.

According to the construction data self-checking method and system, the association relation among the construction data under the same construction scene is extracted, the data anomaly detection model is constructed, the anomaly data is automatically detected according to the mean square error between the input and the output of the model, false data and invalid data are prevented, the self-checking precision of the construction data is effectively improved, and the self-checking efficiency is greatly improved.

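Based on the foregoing embodiment, as an optional embodiment, before the features of the signature matrix are extracted by using the pre-trained data anomaly detection model, the method further includes pre-training the pre-constructed data anomaly detection model by unsupervised learning; the objective function of the unsupervised learning is:

$$\min_{\Theta} \sum_{i=1}^{M} \mathrm{MSE}\left(S^{i}, \hat{S}^{i}\right) \qquad \text{(Equation 3)}$$

wherein $S^{i}$ is a signature matrix sample, $\hat{S}^{i}$ is the corresponding output matrix sample, M is the number of training iterations, Θ denotes the parameters to be trained of the data anomaly detection model, and i is an intermediate (index) parameter.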

FIG. 2 is a second schematic flow chart of the construction data self-checking method provided by the invention. As shown in FIG. 2, the data anomaly detection model must be pre-trained before it is used for feature extraction.

As an alternative embodiment, the invention adopts unsupervised learning with the mean square error as the loss function: the error between the output matrix $\hat{S}^{a}$ and the signature matrix $S^{a}$ is calculated, and pre-training drives this error to be as small as possible. The objective function used for pre-training is therefore the one shown in Equation 3.

Optionally, stochastic gradient descent may be used throughout the pre-training process, with the learning rate set to 0.01 and the number of training iterations set to 500 (both may be adjusted according to the actual situation). Training continues until the result converges, yielding the trained data anomaly detection model.
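For reference, one gradient descent step with these settings can be written as below. This is the textbook update rule, not a formula taken from the source; $\mathcal{L}$ is a stand-in symbol for the pre-training objective, $\eta$ is the learning rate, and k is an assumed iteration index.

$$\Theta_{k+1} = \Theta_{k} - \eta \, \nabla_{\Theta} \mathcal{L}(\Theta_{k}), \qquad \eta = 0.01, \quad k = 0, 1, \dots, 499$$

In the stochastic variant, the gradient at each step is estimated from a single sample or a small batch of signature matrices rather than from the whole training set.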

Then, the obtained signature matrix is input into a trained data anomaly detection model, and an output matrix output by the model is obtained.

And finally, calculating the mean square error between the signature matrix and the output matrix, and taking the mean square error as a self-checking basis of the construction data.

FIG. 3 is a schematic diagram of the data anomaly detection model provided by the present invention. As shown in FIG. 3, the data anomaly detection model is constructed based on an encoder-decoder framework comprising an encoding layer and a decoding layer. The encoding layer comprises a convolution layer L₁, a pooling layer L₂, a fully connected layer L₃, a fully connected layer L₄ and a fully connected layer L₅. The convolution layer L₁ has a convolution kernel size of 3×3, 6 output channels and 1 input channel; the pooling layer L₂ has a kernel size of 2×2; the fully connected layer L₃ has an input dimension of (n-2)/2, rounded to an integer, and an output dimension of 64, where n is the number of fields of the construction data recorded in the construction data matrix; the fully connected layer L₄ has an input dimension of 64 and an output dimension of 32; the fully connected layer L₅ has an input dimension of 32 and an output dimension of 8.

Correspondingly, the decoding layer comprises a fully connected layer L₆, a fully connected layer L₇, a fully connected layer L₈, a convolution layer L₉ and a pooling layer L₁₀. The fully connected layer L₆ has an input dimension of 8 and an output dimension of 32; the fully connected layer L₇ has an input dimension of 32 and an output dimension of 64; the fully connected layer L₈ has an input dimension of 64 and an output dimension of (n-2)/2; the convolution layer L₉ has a kernel size of 2×2; the pooling layer L₁₀ has a kernel size of 3×3, 6 output channels and 1 input channel.

In general, the input of the data anomaly detection model is a signature matrix, a feature vector E is obtained through an encoding layer, then the feature vector E is used as the input of a decoding layer, and finally an output matrix is obtained.

The data operation process of the anomaly detection model specifically comprises the following steps:

First, the signature matrix $S^{a} \in \mathbb{R}^{n \times n}$ is obtained by calculating the association relationships of the construction data matrix L in the target construction scene.

Then, the signature matrix $S^{a}$ is used as the input of the convolution layer L₁ (3×3 convolution kernel). After passing through L₁, the output data dimension becomes (n-3+1) × (n-3+1), i.e., (n-2) × (n-2).

At the pooling layer L₂ (2×2 kernel), the output data dimension is reduced further. A 2×2 kernel means that one pooling operation is performed on every block of 2 rows and 2 columns (maximum pooling may be used here), so the data dimension is divided by 2. If the dimension is not divisible by 2, i.e. a row or column is left over, that row or column is by default not processed at this step. After pooling, the data dimension therefore becomes m′ × m′, where m′ is (n-2)/2 rounded down.

Pooling (Pooling) can be regarded as normalizing the feature map values in a Pooling window, and randomly sampling and selecting according to the normalized probability value of the feature map, i.e. the selected probability of large element values is also large.

The data pooled by L₂ is then passed to the fully connected layer L₃, whose input dimension is (n-2)/2, rounded down.

Here, the input dimension and the output dimension refer to the length of a row. For example, the input data of the fully connected layer L₃ has size m′ × m′, and its input dimension is the second m′. If data of size 3×4 (3 rows and 4 columns) is input to a fully connected layer, the input dimension of that layer is 4.

As can be seen from the above process, in the encoding layer, after the data passes through the pooling layer, the output dimension of each layer changes as follows: (n-2)/2 (rounded down) → 64 → 32 → 8. That is, throughout the encoding layer the data is transformed from a high dimension to a low dimension.
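The dimension changes in the encoding layer can be traced with a short sketch. It assumes a stride of 1 and no padding for the convolution (which is what the expression n-3+1 implies), non-overlapping 2×2 pooling that drops a leftover row or column, and fully connected layers that act on the last dimension only. The function name and the returned layout are hypothetical.

```python
def encoder_shapes(n):
    """Trace data dimensions through the encoding layer for an n x n signature matrix.

    Shapes of the convolution and pooling stages are (channels, rows, columns);
    for the fully connected layers only the output dimension (row length) is listed.
    """
    if n < 4:
        raise ValueError("n must be at least 4 so that pooling leaves at least one row")
    conv = n - 3 + 1    # L1: 3x3 kernel, stride 1, no padding (assumed)
    pooled = conv // 2  # L2: 2x2 pooling; a leftover row/column is dropped
    return [
        ("input", (1, n, n)),
        ("L1 conv 3x3", (6, conv, conv)),
        ("L2 pool 2x2", (6, pooled, pooled)),
        ("L3 fc", 64),
        ("L4 fc", 32),
        ("L5 fc", 8),
    ]


# With n = 10 fields the pooled size is 4; with n = 11 one row and column are dropped.
shapes_10 = dict(encoder_shapes(10))
shapes_11 = dict(encoder_shapes(11))
```

For n = 10 this gives a 10×10 input, an 8×8 map after L₁ and a 4×4 map after L₂, so the input dimension of L₃ is 4. The n = 11 case yields the same pooled size because the ninth row and column are left unprocessed.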

Accordingly, after the feature vector E is input to the decoding layer, the whole data processing process and the data processing process in the encoding layer are symmetrical inverse processes, which is not described in detail herein.

It should be noted that, when convolution is performed, the input data is a matrix (the number of channels is 1). To represent higher-dimensional feature information, the matrix is generally expanded into a tensor with multiple channels. For example, if the current input is a 5×5 matrix, its dimension can be expressed as 1×5×5 once the number of channels is included. After convolution, the feature dimension becomes n×m×m, where, in this example, n is the number of output channels set by the convolution layer and the value of m depends on the convolution kernel size.

Here, convolution refers to the mathematical operator that generates a third function from two functions f and g; it characterizes the integral, over the overlap length, of the product of the overlapping values of f and of the flipped and translated g.

In addition, the input dimension and the output dimension of each convolution layer in the encoding layer and the decoding layer in the anomaly detection model can be correspondingly set according to different construction scenes or according to the data characteristics of the collected construction data, and the anomaly detection method is not particularly limited.

According to the construction data self-checking method, the signature matrix containing the relevance among all construction data is subjected to feature extraction by constructing the anomaly detection model, so that the self-checking precision can be effectively improved.

Based on the content of the above embodiment, as an alternative embodiment, before determining the self-checking results of all the construction data in the construction scene, determining an error mean value of the normal construction data as a reference value; and determining self-checking results of all construction data in the construction scene according to the mean square error, wherein the self-checking results comprise: under the condition that the mean square error is larger than the reference value, determining the self-checking result as abnormal; and under the condition that the mean square error is not greater than the reference value, determining that the self-checking result is normal.

Specifically, the data anomaly detection model of the present invention is obtained after pre-training. The invention considers that, after abnormal data is input into the model, the mean square error between the obtained output matrix $\hat{S}^{a}$ and the signature matrix $S^{a}$ (denoted LossE) is larger.

Assume that all the construction data include abnormal data $X^{E}$ of length K. After the corresponding signature matrix is input into the trained data anomaly detection model, the error LossE can be calculated, and LossE will be greater than LossM, where LossM is the error mean of normal construction data. Whether the data is abnormal can therefore be judged by whether LossE is greater than LossM. When the error LossE is greater than LossM, it can be determined that abnormal data exists among the construction data; when the error LossE is not greater than LossM, it can be judged that all the construction data are normal.
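The comparison of LossE with LossM can be sketched as follows. The sketch assumes that the mean square error is averaged over all matrix entries; the matrices, the reference value 0.1 and the function names are hypothetical and only serve to show the decision rule.

```python
def mean_square_error(s, s_hat):
    """Mean of the squared element-wise differences between two equally shaped matrices."""
    if len(s) != len(s_hat) or any(len(a) != len(b) for a, b in zip(s, s_hat)):
        raise ValueError("matrices must have the same shape")
    squared = [(a - b) ** 2 for row_s, row_h in zip(s, s_hat) for a, b in zip(row_s, row_h)]
    return sum(squared) / len(squared)


def self_check(loss_e, loss_m):
    """Return 'abnormal' if LossE exceeds the reference value LossM, otherwise 'normal'."""
    return "abnormal" if loss_e > loss_m else "normal"


# Hypothetical 2 x 2 signature matrix and two hypothetical model outputs.
signature = [[1.0, 0.5], [0.5, 1.0]]
good_output = [[1.0, 0.5], [0.5, 1.0]]  # perfect reconstruction
bad_output = [[0.0, 0.5], [0.5, 1.0]]   # one entry reconstructed badly

loss_m = 0.1  # hypothetical reference value (error mean of normal data)
result_good = self_check(mean_square_error(signature, good_output), loss_m)
result_bad = self_check(mean_square_error(signature, bad_output), loss_m)
```

Note that the rule uses a strict comparison: a LossE exactly equal to LossM is classified as normal, matching the "not greater than" wording above.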

Based on the foregoing embodiment, as an alternative embodiment, after pre-training the pre-built data anomaly detection model by using an unsupervised learning manner, the method further includes:

constructing a verification data set which consists of normal verification data generated by an autoregressive system and abnormal data after partial normal verification data are modified; determining the precision and recall of the pre-trained data anomaly detection model by using the verification data set; and determining the credibility of the pre-trained data anomaly detection model according to the precision and recall ratio.

In order to further ensure the precision of construction data self-checking, the construction data self-checking method provided by the invention further verifies the data anomaly detection model by creating a test data set after the pre-training of the model, and if the verification result is qualified, the method is applied to actual detection.

The verification indices mainly comprise the precision and the recall. The precision is calculated as:

$$Precision = \frac{TP}{TP + FP}$$

The recall is calculated as:

$$Recall = \frac{TP}{TP + FN}$$

wherein TP represents the number of test results for which the verification sample data is abnormal and the model classification result is abnormal, FP represents the number of test results for which the verification sample data is normal and the model classification result is abnormal, and FN represents the number of test results for which the verification sample data is abnormal and the model classification result is normal.
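As an illustration of these two indices, the sketch below counts TP, FP and FN from a list of true labels and a list of model classifications, with "abnormal" treated as the positive class. The label encoding (1 for abnormal, 0 for normal), the sample labels and the choice to return 0.0 when a denominator is zero are assumptions for this example.

```python
def precision_recall(y_true, y_pred):
    """Compute (precision, recall) with 1 = abnormal (positive) and 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # ASSUMPTION: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical verification labels and model classifications.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
```

In this toy example TP = 3, FP = 1 and FN = 1, so both the precision and the recall are 0.75.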

The verification data set of the present invention contains two types of data: normal data and abnormal data. The normal data is simulation data generated by the following autoregressive system.

An autoregressive system is a system in which the current value of a variable depends on values at past moments.

As an alternative embodiment, the present invention provides a method for acquiring normal data:

$x_{1,t} = 0.5\,x_{1,t-1} + \varepsilon_{1,t}$;

$x_{2,t} = 0.6\cos(x_{2,t-1}) + \varepsilon_{2,t}$;

$x_{4,t} = 0.8\,x_{7,t-1} + \varepsilon_{4,t}$;

$x_{5,t} = 0.9\,x_{8,t-1} + \varepsilon_{5,t}$;

$x_{7,t} = 2\cos(x_{2,t-1}) + 0.6\sin(x_{10,t-1}) + \varepsilon_{7,t}$;

$x_{8,t} = 0.8\cos(x_{3,t-1}) + \cos(x_{6,t-1}) + 1 + \varepsilon_{8,t}$;

$x_{9,t} = \sin(x_{4,t-1}) - 0.8\,x_{7,t-1} + \varepsilon_{9,t}$;

In total, the above embodiment yields the values of the variables $x_{1}$ to $x_{10}$ at time t, i.e. 10 normal data $x_{1,t}$ to $x_{10,t}$ are generated.

Further, the abnormal data may be obtained by randomly modifying part of the normal data. For example, modifying the values of any three variables, such as $x_{1}$, $x_{4}$ and $x_{9}$, yields the following three abnormal values:

$x'_{1,t} = 0.5\,x_{1,t-1} + \epsilon_{1,t}$;

$x'_{4,t} = 0.8\,x_{7,t-1} + \epsilon_{4,t}$;

$x'_{9,t} = \sin(x_{4,t-1}) - 0.8\,x_{7,t-1} + \epsilon_{9,t}$;

where $\varepsilon$ (in the normal data) is Gaussian white noise and $\epsilon$ (in the abnormal data) is a random number between 0 and 1.
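The generation of normal and abnormal verification data can be sketched as follows. Several choices here are assumptions rather than details from the source: the update rules for x₃, x₆ and x₁₀ do not appear in the text above, so the sketch fills them with pure noise as a placeholder; the Gaussian noise is given unit standard deviation; all variables start at 0; and the function and parameter names are hypothetical.

```python
import math
import random


def next_values(prev, noise):
    """Advance all ten variables by one time step.

    prev maps a variable index (1..10) to its value at time t-1;
    noise(i) returns the noise term of variable i at time t.
    """
    x = {
        1: 0.5 * prev[1] + noise(1),
        2: 0.6 * math.cos(prev[2]) + noise(2),
        4: 0.8 * prev[7] + noise(4),
        5: 0.9 * prev[8] + noise(5),
        7: 2 * math.cos(prev[2]) + 0.6 * math.sin(prev[10]) + noise(7),
        8: 0.8 * math.cos(prev[3]) + math.cos(prev[6]) + 1 + noise(8),
        9: math.sin(prev[4]) - 0.8 * prev[7] + noise(9),
    }
    for i in (3, 6, 10):
        # ASSUMPTION: placeholder rule (noise only); these equations are not given above.
        x[i] = noise(i)
    return x


def generate(length, abnormal=(), seed=0, sigma=1.0):
    """Generate `length` time steps; variables listed in `abnormal` use uniform noise."""
    rng = random.Random(seed)

    def noise(i):
        if i in abnormal:
            return rng.random()       # random number in [0, 1) for abnormal variables
        return rng.gauss(0.0, sigma)  # Gaussian white noise (sigma is an assumption)

    prev = {i: 0.0 for i in range(1, 11)}  # ASSUMPTION: zero initial state
    series = []
    for _ in range(length):
        prev = next_values(prev, noise)
        series.append(prev)
    return series


normal_series = generate(20, seed=42)
abnormal_series = generate(20, abnormal=(1, 4, 9), seed=42)
```

Each element of the returned list is a dictionary holding the ten variable values at one time step, so a field of the construction data matrix corresponds to one variable index read across all time steps.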

And constructing a verification data set by using the normal data and the abnormal data, and verifying the trained data abnormal detection model, wherein the obtained verification result is that the precision rate can reach 0.88 and the recall ratio can reach 0.83.

A set of thresholds may be set, for example a precision threshold and a recall threshold of 0.8 each. The trained data anomaly detection model is considered qualified only when both the precision and the recall in the verification result are greater than their corresponding thresholds.
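The qualification rule above amounts to a strict comparison on both metrics. A short sketch follows; the function name `is_qualified` is hypothetical, and the default thresholds simply reuse the example value of 0.8.

```python
def is_qualified(precision, recall, precision_threshold=0.8, recall_threshold=0.8):
    """Return True only if both metrics strictly exceed their thresholds."""
    return precision > precision_threshold and recall > recall_threshold


# Values reported for the verification result in this embodiment.
qualified = is_qualified(0.88, 0.83)
print(qualified)
```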

According to the method, in a given construction scene, the entered data that has passed inspection is collated to obtain an original construction data set. Assume that the data set has 10 fields, each 1000 bytes in length.

FIG. 4 is a schematic diagram of sampling all construction data provided by the present invention. As shown in FIG. 4, preprocessing the original construction data set includes sampling each field of the original construction data set with a window of length T (T = 45). During sampling, if the starting point of the previous sample is denoted $s_p$, the starting point of the next sample is $s_p + 5$ (the starting point of the first sample is taken to be 0).

With this method, 192 construction data samples are obtained. All samples may be partitioned at a ratio of 7:3 to construct a training sample set and a verification sample set, respectively.
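The sample count can be checked with a short calculation: with a field length of 1000, a window length of 45 and a step of 5, the starting points run from 0 to 955, which gives (1000 - 45) / 5 + 1 = 192 windows. The sketch below reproduces this and a 7:3 split. The helper names are hypothetical, and shuffling before the split is an assumption of the sketch, since the text does not say how samples are assigned to the two sets.

```python
import random


def window_starts(length, window, stride):
    """Starting indices of all full windows of size `window` taken every `stride`."""
    return list(range(0, length - window + 1, stride))


def split_7_3(items, seed=0):
    """Shuffle (an assumption) and split the items at a ratio of 7:3."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * 0.7)
    return items[:cut], items[cut:]


starts = window_starts(length=1000, window=45, stride=5)
train_starts, verify_starts = split_7_3(starts)
print(len(starts), len(train_starts), len(verify_starts))
```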

A construction data matrix is generated for the training sample set, and the correlation degree between the data fields in the matrix is calculated using formula 1 to obtain a signature matrix.
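Since claim 1 identifies the correlation degree as the Pearson correlation coefficient, the signature matrix of one sample can be sketched with NumPy's `corrcoef`. The layout (rows are records, columns are data fields) and the function name `signature_matrix` are assumptions of this sketch; the random input merely stands in for one sampled window of 45 records over 10 fields.

```python
import numpy as np


def signature_matrix(sample):
    """Pearson correlation between every pair of columns (data fields).

    sample: array of shape (T, n); returns an (n, n) matrix.
    """
    return np.corrcoef(sample, rowvar=False)


rng = np.random.default_rng(0)
sample = rng.normal(size=(45, 10))   # stand-in for one sampled window
S = signature_matrix(sample)
print(S.shape)
```

Note that a field that is constant within the window has zero standard deviation, so its correlations are undefined and `corrcoef` yields NaN for them; a real pipeline would need to decide how to handle such fields.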

The signature matrix is input into the data anomaly detection model to be trained to obtain an output matrix. The data anomaly detection model is then pre-trained using the unsupervised training method corresponding to formula 3. Training may use stochastic gradient descent with a learning rate of 0.01 and 500 iterations.
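The training loop can be illustrated with a deliberately simplified stand-in. The sketch below is not the convolutional coding and decoding model of the invention: it trains a single-hidden-layer autoencoder in plain NumPy on flattened signature matrices, and it assumes a reconstruction objective (squared error between input and output, averaged over the batch) in place of formula 3. Only the learning rate (0.01), the iteration count (500) and the bottleneck width (8) are taken from the text; the batch size, initialisation and random stand-in data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in training data: 192 signature matrices (10 x 10) from random fields.
fields = rng.normal(size=(1000, 10))
S = np.stack([np.corrcoef(fields[s:s + 45], rowvar=False)
              for s in range(0, 1000 - 45 + 1, 5)])
X = S.reshape(len(S), -1)             # flatten each matrix to a 100-vector

d, h = X.shape[1], 8                  # 8 = bottleneck width
W1 = rng.normal(0.0, 0.1, (d, h))
b1 = np.zeros(h)
W2 = rng.normal(0.0, 0.1, (h, d))
b2 = np.zeros(d)


def forward(batch):
    """Encode with tanh, decode linearly; returns (hidden, output)."""
    hidden = np.tanh(batch @ W1 + b1)
    return hidden, hidden @ W2 + b2


def loss(batch):
    """Squared reconstruction error per sample, averaged over the batch."""
    _, out = forward(batch)
    return float(np.mean(np.sum((out - batch) ** 2, axis=1)))


initial_loss = loss(X)
lr, iterations, batch_size = 0.01, 500, 16
for _ in range(iterations):
    batch = X[rng.choice(len(X), size=batch_size, replace=False)]
    hidden, out = forward(batch)
    d_out = 2.0 * (out - batch) / batch_size            # gradient w.r.t. output
    d_hidden = (d_out @ W2.T) * (1.0 - hidden ** 2)     # back through tanh
    W2 -= lr * hidden.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * batch.T @ d_hidden
    b1 -= lr * d_hidden.sum(axis=0)
final_loss = loss(X)
print(initial_loss, final_loss)
```

Drawing a random mini-batch at every iteration is what makes the gradient descent stochastic; with the full layer stack described in the claims, a deep learning framework would normally replace the hand-written gradients.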

Then, after the data anomaly detection model has been trained, it is verified using the verification sample set.

Once the verification result is determined to be qualified, the construction data to be detected can be sampled using the method in the above embodiment to generate a signature matrix, which is then input into the trained data anomaly detection model, and the error rate LossM is finally computed. If the LossE value is larger than LossM, the construction data to be detected is abnormal; otherwise, the construction data to be detected is normal.
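The final comparison can be sketched as follows. The sketch assumes that the mean square error is the element-wise mean of squared differences between the output matrix and the signature matrix, and that the reference value is the mean error over normal data, as stated in claim 4. The function names, the stand-in `model` (which simply scales its input by 0.9) and the reference value of 0.005 are hypothetical.

```python
import numpy as np


def mean_square_error(output, signature):
    """Element-wise mean of squared differences between two matrices."""
    output = np.asarray(output, dtype=float)
    signature = np.asarray(signature, dtype=float)
    return float(np.mean((output - signature) ** 2))


def reference_value(normal_signatures, model):
    """Mean error of the model over signature matrices of normal data."""
    return float(np.mean([mean_square_error(model(s), s) for s in normal_signatures]))


def self_check(signature, model, reference):
    """Return 'abnormal' if the error exceeds the reference, else 'normal'."""
    error = mean_square_error(model(signature), signature)
    return "abnormal" if error > reference else "normal"


def model(s):
    """Hypothetical stand-in for a trained model: scales its input by 0.9."""
    return 0.9 * np.asarray(s, dtype=float)


result_a = self_check(np.eye(3), model, reference=0.005)
result_b = self_check(np.ones((3, 3)), model, reference=0.005)
print(result_a, result_b)
```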

Fig. 5 is a schematic structural diagram of the construction data self-checking system provided by the present invention. As shown in Fig. 5, the system includes, but is not limited to, an initial data acquisition unit 501, a correlation calculation unit 502, a feature extraction unit 503, an error calculation unit 504, and a self-checking identification unit 505, wherein:

the initial data acquisition unit 501 is mainly used for acquiring all construction data in a construction scene and constructing a construction data matrix;

the correlation calculation unit 502 is mainly used for calculating the correlation degree between each data field in the construction data matrix to obtain a signature matrix;

the feature extraction unit 503 is mainly configured to perform feature extraction on the signature matrix by using a pre-trained data anomaly detection model, so as to obtain an output matrix;

the error calculation unit 504 is mainly configured to calculate a mean square error between the output matrix and the signature matrix;

the self-checking identification unit 505 is mainly configured to determine self-checking results of all construction data under the construction scene according to the mean square error.

It should be noted that, when the construction data self-checking system provided in the embodiment of the present invention is specifically executed, the construction data self-checking system may be implemented based on the construction data self-checking method described in any one of the foregoing embodiments, which is not described in detail in this embodiment.

According to the construction data self-checking system provided by the invention, the association relations among the construction data in the same construction scene are extracted, a data anomaly detection model is constructed, and abnormal data is detected automatically according to the mean square error between the input and the output of the model. False data and invalid data are thereby prevented, the self-checking precision of the construction data is effectively improved, and the self-checking efficiency is greatly improved.

Fig. 6 is a schematic structural diagram of an electronic device according to the present invention, and as shown in fig. 6, the electronic device may include: processor 610, communication interface (Communications Interface) 620, memory 630, and communication bus 640, wherein processor 610, communication interface 620, and memory 630 communicate with each other via communication bus 640. Processor 610 may invoke logic instructions in memory 630 to perform a construction data self-test method comprising: acquiring all construction data in a construction scene, and constructing a construction data matrix; calculating the correlation degree among all data fields in the construction data matrix to obtain a signature matrix; extracting features of the signature matrix by using a pre-trained data anomaly detection model to obtain an output matrix; calculating a mean square error between the output matrix and the signature matrix; and determining self-checking results of all construction data under the construction scene according to the mean square error.

Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the construction data self-checking method provided by the above methods, the method comprising: acquiring all construction data in a construction scene, and constructing a construction data matrix; calculating the correlation degree among all data fields in the construction data matrix to obtain a signature matrix; extracting features of the signature matrix by using a pre-trained data anomaly detection model to obtain an output matrix; calculating a mean square error between the output matrix and the signature matrix; and determining self-checking results of all construction data under the construction scene according to the mean square error.

In still another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the construction data self-checking method provided in the above embodiments, the method comprising: acquiring all construction data in a construction scene, and constructing a construction data matrix; calculating the correlation degree among all data fields in the construction data matrix to obtain a signature matrix; extracting features of the signature matrix by using a pre-trained data anomaly detection model to obtain an output matrix; calculating a mean square error between the output matrix and the signature matrix; and determining self-checking results of all construction data under the construction scene according to the mean square error.

The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. The construction data self-checking method is characterized by comprising the following steps of:

acquiring all construction data in a construction scene, and constructing a construction data matrix;

calculating the correlation degree among all data fields in the construction data matrix to obtain a signature matrix;

the correlation degree is a Pearson correlation coefficient, and calculating the correlation degree between the data fields in the construction data matrix comprises:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$

wherein X is the record data of data field X, Y is the record data of data field Y, $\sigma_X$ is the standard deviation of X, $\sigma_Y$ is the standard deviation of Y, and cov(X, Y) is the covariance of X and Y;

extracting features of the signature matrix by using a pre-trained data anomaly detection model to obtain an output matrix;

calculating a mean square error between the output matrix and the signature matrix;

according to the mean square error, determining self-checking results of all construction data in the construction scene;

the data anomaly detection model is constructed based on a coding and decoding framework; the coding and decoding framework comprises a coding layer and a decoding layer;

the coding layer comprises a convolution layer L₁, a pooling layer L₂, a full connection layer L₃, a full connection layer L₄ and a full connection layer L₅; the convolution kernel size of the convolution layer L₁ is 3×3, the output channel is 6, and the input channel is 1; the kernel size of the pooling layer L₂ is 2×2; the input dimension of the full connection layer L₃ is (n-2)/2 rounded, the output dimension is 64, and n is the number of fields of the construction data recorded in the construction data matrix; the input dimension of the full connection layer L₄ is 64, and the output dimension is 32; the input dimension of the full connection layer L₅ is 32, and the output dimension is 8;

correspondingly, the decoding layer comprises a full connection layer L₆, a full connection layer L₇, a full connection layer L₈, a convolution layer L₉ and a pooling layer L₁₀; the input dimension of the full connection layer L₆ is 8, and the output dimension is 32; the input dimension of the full connection layer L₇ is 32, and the output dimension is 64; the input dimension of the full connection layer L₈ is 64, and the output dimension is (n-2)/2 rounded; the convolution kernel size of the convolution layer L₉ is 2×2; the kernel size of the pooling layer L₁₀ is 3×3, the output channel is 6, and the input channel is 1.

2. The construction data self-checking method according to claim 1, further comprising pre-training a pre-constructed data anomaly detection model by means of unsupervised learning before feature extraction of the signature matrix by using the pre-trained data anomaly detection model; the objective function of the unsupervised learning is:

wherein Sⁱ is a signature matrix sample,

3. The construction data self-checking method according to claim 1, wherein said calculating a mean square error between the output matrix and the signature matrix comprises:

wherein ,

4. A construction data self-checking method according to claim 3, wherein, before determining self-checking results of all construction data in the construction scene, an error mean value of normal construction data is determined as a reference value; and determining self-checking results of all construction data in the construction scene according to the mean square error, wherein the self-checking results comprise:

under the condition that the mean square error is larger than the reference value, determining the self-checking result as abnormal;

and under the condition that the mean square error is not greater than the reference value, determining that the self-checking result is normal.

5. The construction data self-checking method according to claim 1, further comprising, after pre-training the pre-constructed data anomaly detection model by means of unsupervised learning:

constructing a verification data set which consists of normal verification data generated by an autoregressive system and abnormal data after partial normal verification data are modified;

determining the precision and recall of the pre-trained data anomaly detection model by using the verification data set;

and determining the credibility of the pre-trained data anomaly detection model according to the precision and recall ratio.

6. A construction data self-checking system, comprising:

the initial data acquisition unit is used for acquiring all construction data in a construction scene and constructing a construction data matrix;

the correlation calculation unit is used for calculating the correlation among the data fields in the construction data matrix so as to obtain a signature matrix;

the feature extraction unit is used for extracting features of the signature matrix by utilizing a pre-trained data anomaly detection model to obtain an output matrix;

the coding layer comprises a convolution layer L₁, a pooling layer L₂, a full connection layer L₃, a full connection layer L₄ and a full connection layer L₅; the convolution kernel size of the convolution layer L₁ is 3×3, the output channel is 6, and the input channel is 1; the kernel size of the pooling layer L₂ is 2×2; the input dimension of the full connection layer L₃ is (n-2)/2 rounded, the output dimension is 64, and n is the number of fields of the construction data recorded in the construction data matrix; the input dimension of the full connection layer L₄ is 64, and the output dimension is 32; the input dimension of the full connection layer L₅ is 32, and the output dimension is 8;

accordingly, the decoding layer comprises a full connection layer L₆, a full connection layer L₇, a full connection layer L₈, a convolution layer L₉ and a pooling layer L₁₀; the input dimension of the full connection layer L₆ is 8, and the output dimension is 32; the input dimension of the full connection layer L₇ is 32, and the output dimension is 64; the input dimension of the full connection layer L₈ is 64, and the output dimension is (n-2)/2 rounded; the convolution kernel size of the convolution layer L₉ is 2×2; the kernel size of the pooling layer L₁₀ is 3×3, the output channel is 6, and the input channel is 1;

an error calculation unit for calculating a mean square error between the output matrix and the signature matrix;

and the self-checking identification unit is used for determining self-checking results of all construction data under the construction scene according to the mean square error.

7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the construction data self-checking method steps of any one of claims 1 to 5.

8. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the construction data self-checking method steps of any one of claims 1 to 5.