CN113239075B - Construction data self-checking method and system - Google Patents


Info

Publication number
CN113239075B
Authority
CN
China
Prior art keywords
data
construction
matrix
layer
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110524398.9A
Other languages
Chinese (zh)
Other versions
CN113239075A (en
Inventor
张艳红
侯芸
董元帅
闫旭亮
樊永伟
钱振宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Checsc Highway Maintenance And Test Technology Co ltd
China Highway Engineering Consultants Corp
CHECC Data Co Ltd
Original Assignee
Checsc Highway Maintenance And Test Technology Co ltd
China Highway Engineering Consultants Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Checsc Highway Maintenance And Test Technology Co ltd, China Highway Engineering Consultants Corp filed Critical Checsc Highway Maintenance And Test Technology Co ltd
Priority to CN202110524398.9A priority Critical patent/CN113239075B/en
Publication of CN113239075A publication Critical patent/CN113239075A/en
Application granted granted Critical
Publication of CN113239075B publication Critical patent/CN113239075B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention provides a construction data self-checking method and a construction data self-checking system, wherein the construction data self-checking method comprises the following steps: acquiring all construction data in a construction scene, and constructing a construction data matrix; calculating the correlation degree among all data fields in the construction data matrix to obtain a signature matrix; extracting features of the signature matrix by using a pre-trained data anomaly detection model to obtain an output matrix; calculating a mean square error between the output matrix and the signature matrix; and determining self-checking results of all construction data under the construction scene according to the mean square error. According to the construction data self-checking method and system, the association relations among the construction data under the same construction scene are extracted, a data anomaly detection model is constructed, and abnormal data is detected automatically according to the mean square error between the input and the output of the model, so that false data and invalid data are prevented, and both the precision and the efficiency of the construction data self-checking are improved.

Description

Construction data self-checking method and system
Technical Field
The invention relates to the technical field of data processing, in particular to a construction data self-checking method and system.
Background
In highway engineering, with the continuous development of electronic informatization, data generated in the construction management process must be recorded and archived for acceptance inspection, and must be recorded accurately and objectively. However, because some construction technicians have limited professional skills or a weak sense of responsibility, field data is occasionally not filled in properly, and false data and invalid data easily appear. In actual construction acceptance testing, false or invalid data cannot truly reflect the construction quality and the process records, so the acceptance fails, the whole project has to be reworked and retested repeatedly, and the construction progress is seriously affected.
Verifying the construction self-checking data is therefore of great significance: it ensures that the self-checking data is reliable and truly reflects the actual quality of the engineering, prevents false data from spreading and damaging the engineering quality of the project, and provides true and reliable original quality data for later maintenance operations.
Currently, the common detection methods are all based on statistical tests, which check the validity of data using statistical indicators such as variance and mean. However, methods based on statistical tests cannot effectively exploit the association relations between data when checking data validity, so their detection efficiency is low and their detection precision is poor.
Disclosure of Invention
Aiming at the defects existing in the process of realizing the data self-checking in the prior art, the embodiment of the invention provides a construction data self-checking method and a construction data self-checking system, which can effectively improve the precision of the construction data self-checking.
The invention provides a construction data self-checking method, which comprises the following steps: acquiring all construction data in a construction scene, and constructing a construction data matrix; calculating the correlation degree among all data fields in the construction data matrix to obtain a signature matrix; extracting features of the signature matrix by using a pre-trained data anomaly detection model to obtain an output matrix; calculating a mean square error between the output matrix and the signature matrix; and determining self-checking results of all construction data under the construction scene according to the mean square error.
According to the construction data self-checking method provided by the invention, the correlation degree is the Pearson correlation coefficient, and calculating the correlation degree among all data fields in the construction data matrix comprises:
$$\rho_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$
wherein X is the record data of data field X, Y is the record data of data field Y, $\sigma_X$ is the standard deviation of X, $\sigma_Y$ is the standard deviation of Y, and cov(X, Y) is the covariance of X and Y.
According to the construction data self-checking method provided by the invention, before the feature extraction of the signature matrix is carried out by utilizing the pre-trained data anomaly detection model, the method further comprises the step of pre-training the pre-constructed data anomaly detection model in an unsupervised learning mode; the objective function of the unsupervised learning is:
$$\min_{\Theta} \; \frac{1}{M} \sum_{i=1}^{M} \mathrm{MSE}\left(\hat{S}_i, S_i\right)$$
wherein $S_i$ is a signature matrix sample, $\hat{S}_i$ is the corresponding output matrix sample, MSE(·,·) is the mean square error between two matrices, M is the number of training iterations, $\Theta$ denotes the parameters of the data anomaly detection model to be trained, and i is the summation index.
According to the construction data self-checking method provided by the invention, the data anomaly detection model is constructed based on a coding and decoding framework comprising a coding layer and a decoding layer. The coding layer comprises a convolution layer $L_1$, a pooling layer $L_2$, a full connection layer $L_3$, a full connection layer $L_4$ and a full connection layer $L_5$. The convolution layer $L_1$ has a 3×3 convolution kernel, 6 output channels and 1 input channel; the pooling layer $L_2$ has a 2×2 kernel; the full connection layer $L_3$ has an input dimension of (n-2)/2, rounded to an integer, and an output dimension of 64, where n is the number of fields of construction data recorded in the construction data matrix; the full connection layer $L_4$ has an input dimension of 64 and an output dimension of 32; the full connection layer $L_5$ has an input dimension of 32 and an output dimension of 8. Correspondingly, the decoding layer comprises a full connection layer $L_6$, a full connection layer $L_7$, a full connection layer $L_8$, a convolution layer $L_9$ and a pooling layer $L_{10}$. The full connection layer $L_6$ has an input dimension of 8 and an output dimension of 32; the full connection layer $L_7$ has an input dimension of 32 and an output dimension of 64; the full connection layer $L_8$ has an input dimension of 64 and an output dimension of (n-2)/2, rounded to an integer; the convolution layer $L_9$ has a 2×2 kernel; the pooling layer $L_{10}$ has a 3×3 kernel, 6 output channels and 1 input channel.
According to the construction data self-checking method provided by the invention, the calculation of the mean square error between the output matrix and the signature matrix comprises the following steps:
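$$\mathrm{LossE} = \frac{1}{n^2} \sum_{x=1}^{n} \sum_{y=1}^{n} \left(\hat{S}_{xy} - S_{xy}\right)^2$$
wherein $\hat{S}_a$ is the output matrix with entries $\hat{S}_{xy}$, $S_a$ is the signature matrix with entries $S_{xy}$, n is the number of data fields (both matrices are n×n), and LossE is the mean square error between the output matrix and the signature matrix.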
According to the construction data self-checking method provided by the invention, before the self-checking results of all construction data in the construction scene are determined, the error mean value of normal construction data is determined as a reference value; and determining self-checking results of all construction data in the construction scene according to the mean square error, wherein the self-checking results comprise: under the condition that the mean square error is larger than the reference value, determining the self-checking result as abnormal; and under the condition that the mean square error is not greater than the reference value, determining that the self-checking result is normal.
According to the construction data self-checking method provided by the invention, after the pre-built data anomaly detection model is pre-trained by adopting an unsupervised learning mode, the construction data self-checking method further comprises the following steps: constructing a verification data set which consists of normal verification data generated by an autoregressive system and abnormal data after partial normal verification data are modified; determining the precision and recall of the pre-trained data anomaly detection model by using the verification data set; and determining the credibility of the pre-trained data anomaly detection model according to the precision and recall ratio.
The invention also provides a construction data self-checking system, which comprises: the initial data acquisition unit is used for acquiring all construction data in a construction scene and constructing a construction data matrix; the correlation calculation unit is used for calculating the correlation among the data fields in the construction data matrix so as to obtain a signature matrix; the feature extraction unit is used for extracting features of the signature matrix by utilizing a pre-trained data anomaly detection model to obtain an output matrix; the error calculation unit is used for calculating the mean square error between the output matrix and the signature matrix; and the self-checking identification unit is used for determining self-checking results of all construction data under the construction scene according to the mean square error.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the construction data self-checking method according to any one of the above.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the construction data self-checking method as described in any one of the above.
According to the construction data self-checking method and system, the association relations among the construction data under the same construction scene are extracted, a data anomaly detection model is constructed, and abnormal data is detected automatically according to the mean square error between the input and the output of the model, so that false data and invalid data are prevented, and both the precision and the efficiency of the construction data self-checking are improved.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a construction data self-checking method provided by the invention;
FIG. 2 is a second flow chart of the construction data self-checking method provided by the invention;
FIG. 3 is a schematic diagram of a data anomaly detection model provided by the present invention;
FIG. 4 is a schematic diagram of sampling all construction data provided by the present invention;
FIG. 5 is a schematic diagram of a construction data self-checking system provided by the invention;
fig. 6 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that in the description of embodiments of the present invention, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. The orientation or positional relationship indicated by the terms "upper", "lower", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of description and to simplify the description, and are not indicative or implying that the apparatus or elements in question must have a specific orientation, be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In construction data, logical relationships also exist between the individual data items. For example, for the compactness and thickness of an asphalt pavement, each total thickness value corresponds to one upper-layer thickness value and one compactness value, so the data in the same construction scene are related to one another. However, the statistical-test-based methods currently in use cannot effectively exploit the association relations between construction data. In view of this, the invention provides a neural-network-based construction data self-checking method; the method and system provided by the embodiments of the invention are described below with reference to FIG. 1 to FIG. 6.
Fig. 1 is a schematic flow chart of a construction data self-checking method provided by the present invention, as shown in fig. 1, including but not limited to the following steps:
step S1: acquiring all construction data in a construction scene, and constructing a construction data matrix;
step S2: calculating the correlation degree among all data fields in the construction data matrix to obtain a signature matrix;
step S3: extracting features of the signature matrix by using a pre-trained data anomaly detection model to obtain an output matrix;
step S4: calculating a mean square error between the output matrix and the signature matrix;
step S5: and determining self-checking results of all construction data under the construction scene according to the mean square error.
For all the collected construction data in the construction scene to be detected, a construction data matrix $L \in \mathbb{R}^{n \times m}$ is constructed, where L is an n×m matrix, n is the number of fields of construction data recorded in L, m is the record length of each field recorded in L, and $\mathbb{R}^{n \times m}$ denotes the set of real n×m matrices.
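As a minimal sketch of step S1, assume the collected records arrive as a mapping from field name to a list of numeric readings. The field names and values below are invented for illustration and do not come from the invention. The matrix L can then be assembled with one row per field:

```python
# Hypothetical records: field name -> m readings taken in one construction scene.
records = {
    "total_thickness_cm": [18.2, 18.0, 17.9, 18.4, 18.1],
    "upper_layer_thickness_cm": [4.1, 4.0, 3.9, 4.2, 4.0],
    "compactness_pct": [96.5, 96.1, 95.8, 97.0, 96.3],
}


def build_data_matrix(records):
    """Return (field_names, L) where L has n rows (fields) and m columns (readings)."""
    lengths = {len(values) for values in records.values()}
    if len(lengths) != 1:
        raise ValueError("every data field must have the same record length m")
    field_names = list(records)
    matrix = [list(map(float, records[name])) for name in field_names]
    return field_names, matrix


field_names, L = build_data_matrix(records)
n, m = len(L), len(L[0])  # n fields, m readings per field
```

Rejecting fields of unequal length is a choice made in this sketch; the correlation calculation that follows requires both fields to have the same length T.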
It should be noted that, the construction data self-checking method provided by the invention takes all the construction data as a whole to comprehensively detect whether the acquired data is correct or not in the construction scene.
Furthermore, according to the construction data self-checking method provided by the invention, the correlation among all construction data is considered, so that the correlation among all construction data recorded in the construction data matrix is required to be calculated.
As an alternative embodiment, the correlation degree may be the Pearson correlation coefficient, and the correlation degree between the data fields in the construction data matrix is calculated as:
$$\rho_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} \quad \text{(Equation 1)}$$
wherein X is the record data of data field X; if the length of the record data is T, then $X \in \mathbb{R}^{1 \times T}$; Y is the record data of data field Y, whose length is also T, i.e. $Y \in \mathbb{R}^{1 \times T}$; $\sigma_X$ is the standard deviation of X, $\sigma_Y$ is the standard deviation of Y, and cov(X, Y) is the covariance of X and Y.
After the correlation degree between the data fields in the construction data matrix is calculated using Equation 1, the signature matrix $S_a \in \mathbb{R}^{n \times n}$ is obtained. Each entry $S_{xy}$ of the signature matrix is expressed as:
$$S_{xy} = \operatorname{abs}(\rho_{XY}) \quad \text{(Equation 2)}$$
where the function abs(·) takes the absolute value.
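The two equations can be illustrated with a short standard-library sketch. The function names `pearson` and `signature_matrix` are hypothetical, and returning 0 for a constant field (whose standard deviation is zero) is an assumption of this sketch, not something the description specifies:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient: cov(X, Y) / (sigma_X * sigma_Y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length T >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # The 1/T factors of the covariance and standard deviations cancel out.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant field carries no correlation information.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def signature_matrix(data_matrix):
    """Build the n x n matrix with entries S_xy = abs(rho_XY) from an n x m matrix."""
    n = len(data_matrix)
    return [
        [abs(pearson(data_matrix[i], data_matrix[j])) for j in range(n)]
        for i in range(n)
    ]


L = [
    [1.0, 2.0, 3.0, 4.0],  # field A
    [2.0, 4.0, 6.0, 8.0],  # field B, perfectly correlated with A
    [4.0, 3.0, 2.0, 1.0],  # field C, perfectly anti-correlated with A
]
S_a = signature_matrix(L)
```

Because of the absolute value, the anti-correlated field C yields the same entry as the correlated field B, so the signature matrix records the strength of the association but not its sign.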
The signature matrix $S_a$ describes the correlation characteristics of the data fields. To better extract the latent features of the entered construction data, the construction data self-checking method provided by the invention uses the pre-trained data anomaly detection model to further extract features from the signature matrix $S_a$ and obtain the corresponding output matrix $\hat{S}_a$.
To a certain extent, the output matrix $\hat{S}_a$ can be regarded as a reproduction of the signature matrix $S_a$. After the signature matrix $S_a$ of the construction data to be detected is input to the data anomaly detection model, the corresponding output matrix $\hat{S}_a$ is obtained. The mean square error between the output matrix $\hat{S}_a$ and the signature matrix $S_a$, denoted LossE, can then be calculated:
$$\mathrm{LossE} = \frac{1}{n^2} \sum_{x=1}^{n} \sum_{y=1}^{n} \left(\hat{S}_{xy} - S_{xy}\right)^2$$
where $\hat{S}_{xy}$ and $S_{xy}$ are the entries of the two n×n matrices.
Finally, whether all the construction data in the construction scene are accurate can be judged according to the mean square error LossE between the output matrix $\hat{S}_a$ and the signature matrix $S_a$.
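The error calculation of step S4 can be sketched as follows, assuming that "mean square error" means the average of the squared element-wise differences over all entries of the two matrices. The function name and the sample values are hypothetical:

```python
def mean_square_error(output_matrix, signature_matrix):
    """Average of squared element-wise differences between two equally sized matrices."""
    rows = len(signature_matrix)
    cols = len(signature_matrix[0])
    if len(output_matrix) != rows or any(len(r) != cols for r in output_matrix):
        raise ValueError("matrices must have the same shape")
    total = sum(
        (o - s) ** 2
        for out_row, sig_row in zip(output_matrix, signature_matrix)
        for o, s in zip(out_row, sig_row)
    )
    return total / (rows * cols)


S_a = [[1.0, 0.8], [0.8, 1.0]]    # signature matrix
S_hat = [[1.0, 0.6], [0.6, 1.0]]  # stand-in for a model output, not a real prediction
loss_e = mean_square_error(S_hat, S_a)  # two entries differ by 0.2, so about 0.02
```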
According to the construction data self-checking method and system, the association relation among the construction data under the same construction scene is extracted, the data anomaly detection model is constructed, the anomaly data is automatically detected according to the mean square error between the input and the output of the model, false data and invalid data are prevented, the self-checking precision of the construction data is effectively improved, and the self-checking efficiency is greatly improved.
Based on the foregoing embodiment, as an optional embodiment, before extracting the features of the signature matrix by using a pre-trained data anomaly detection model, pre-training the pre-constructed data anomaly detection model by using an unsupervised learning method is further included; the objective function of the unsupervised learning is:
$$\min_{\Theta} \; \frac{1}{M} \sum_{i=1}^{M} \mathrm{MSE}\left(\hat{S}_i, S_i\right) \quad \text{(Equation 3)}$$
wherein $S_i$ is a signature matrix sample, $\hat{S}_i$ is the corresponding output matrix sample, MSE(·,·) is the mean square error between two matrices, M is the number of training iterations, $\Theta$ denotes the parameters of the data anomaly detection model to be trained, and i is the summation index.
Fig. 2 is a second flow chart of the construction data self-checking method provided by the invention, as shown in fig. 2, before the feature extraction of the construction data matrix is performed by using the data anomaly detection model, the data anomaly detection model is further required to be pre-trained.
As an alternative embodiment, the invention performs the pre-training in an unsupervised learning mode, using the mean square error as the loss function to measure the error between the output matrix $\hat{S}_a$ and the signature matrix $S_a$ and making that error as small as possible; the objective function adopted for the pre-training is shown in Equation 3.
Optionally, stochastic gradient descent may be used throughout the pre-training, with the learning rate set to 0.01 and the number of training iterations set to 500 (both may be adjusted according to the actual situation); training continues until the result converges, yielding the trained data anomaly detection model.
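To make the training procedure concrete, the sketch below runs stochastic gradient descent with the stated learning rate of 0.01 and 500 iterations on a deliberately tiny linear encoder/decoder with a one-dimensional code. This toy model is an assumption chosen so that the gradients can be written by hand; it is not the convolutional codec of the invention, and the training samples are invented:

```python
import random


def reconstruct(x, w, v):
    """Encode x to a scalar code z = w . x, then decode to x_hat = z * v."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return z, [z * vj for vj in v]


def mse(x_hat, x):
    return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)


def mean_loss(samples, w, v):
    return sum(mse(reconstruct(x, w, v)[1], x) for x in samples) / len(samples)


def train(samples, learning_rate=0.01, iterations=500, seed=0):
    rng = random.Random(seed)
    d = len(samples[0])
    w = [0.1] * d  # encoder weights
    v = [0.1] * d  # decoder weights
    for _ in range(iterations):
        order = samples[:]
        rng.shuffle(order)
        for x in order:  # one sample per update: stochastic gradient descent
            z, x_hat = reconstruct(x, w, v)
            err = [(a - b) * 2.0 / d for a, b in zip(x_hat, x)]  # dLoss/dx_hat
            grad_z = sum(e * vj for e, vj in zip(err, v))         # dLoss/dz
            grad_v = [e * z for e in err]                         # dLoss/dv
            grad_w = [grad_z * xi for xi in x]                    # dLoss/dw
            v = [vj - learning_rate * g for vj, g in zip(v, grad_v)]
            w = [wi - learning_rate * g for wi, g in zip(w, grad_w)]
    return w, v


# Invented training samples: scaled copies of one base pattern.
base = [1.0, 0.5, 0.8, 0.3]
samples = [[s * b for b in base] for s in (0.5, 0.75, 1.0, 1.25, 1.5)]
initial_loss = mean_loss(samples, [0.1] * 4, [0.1] * 4)
w, v = train(samples)
final_loss = mean_loss(samples, w, v)
```

Training is unsupervised in the sense used above: the target of each update is the input itself, so no normal/abnormal labels are needed. A real implementation would more likely rely on a deep learning framework than on hand-written gradients.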
Then, the obtained signature matrix is input into a trained data anomaly detection model, and an output matrix output by the model is obtained.
And finally, calculating the mean square error between the signature matrix and the output matrix, and taking the mean square error as a self-checking basis of the construction data.
FIG. 3 is a schematic diagram of the data anomaly detection model provided by the invention. As shown in FIG. 3, the data anomaly detection model is constructed based on a coding and decoding framework comprising a coding layer and a decoding layer. The coding layer comprises a convolution layer $L_1$, a pooling layer $L_2$, a full connection layer $L_3$, a full connection layer $L_4$ and a full connection layer $L_5$. The convolution layer $L_1$ has a 3×3 convolution kernel, 6 output channels and 1 input channel; the pooling layer $L_2$ has a 2×2 kernel; the full connection layer $L_3$ has an input dimension of (n-2)/2, rounded to an integer, and an output dimension of 64, where n is the number of fields of construction data recorded in the construction data matrix; the full connection layer $L_4$ has an input dimension of 64 and an output dimension of 32; the full connection layer $L_5$ has an input dimension of 32 and an output dimension of 8.
Correspondingly, the decoding layer comprises a full connection layer $L_6$, a full connection layer $L_7$, a full connection layer $L_8$, a convolution layer $L_9$ and a pooling layer $L_{10}$. The full connection layer $L_6$ has an input dimension of 8 and an output dimension of 32; the full connection layer $L_7$ has an input dimension of 32 and an output dimension of 64; the full connection layer $L_8$ has an input dimension of 64 and an output dimension of (n-2)/2, rounded to an integer; the convolution layer $L_9$ has a 2×2 kernel; the pooling layer $L_{10}$ has a 3×3 kernel, 6 output channels and 1 input channel.
In general, the input of the data anomaly detection model is a signature matrix, a feature vector E is obtained through an encoding layer, then the feature vector E is used as the input of a decoding layer, and finally an output matrix is obtained.
The data operation process of the anomaly detection model specifically comprises the following steps:
Firstly, the signature matrix $S_a \in \mathbb{R}^{n \times n}$ is obtained by calculating the association relations of the construction data matrix L in the target construction scene.
Then, the signature matrix $S_a$ is used as the input of the convolution layer $L_1$ (3×3 convolution kernel). After the convolution layer $L_1$, the output data dimension becomes (n-3+1)×(n-3+1), i.e. (n-2)×(n-2).
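The shrinking from n×n to (n-2)×(n-2) can be checked with a small unpadded, stride-1 convolution. This sketch handles a single channel and uses an arbitrary averaging kernel for illustration, whereas the layer described above has 6 output channels with learned kernels. Like most deep learning frameworks, it slides the kernel without flipping it:

```python
def conv2d_valid(matrix, kernel):
    """Slide `kernel` over `matrix` without padding (stride 1) and return the result."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows, out_cols = n_rows - k_rows + 1, n_cols - k_cols + 1
    if out_rows < 1 or out_cols < 1:
        raise ValueError("kernel is larger than the input")
    return [
        [
            sum(
                matrix[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


n = 6
# Illustrative signature-like matrix: 1.0 on the diagonal, 0.5 elsewhere.
S_a = [[1.0 if r == c else 0.5 for c in range(n)] for r in range(n)]
averaging_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]  # one illustrative 3x3 kernel
feature_map = conv2d_valid(S_a, averaging_kernel)       # shape (n-2) x (n-2)
```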
At the pooling layer $L_2$ (2×2 kernel), the output data dimension is reduced further. A kernel size of 2×2 means that one pooling operation is performed on every block of 2 rows and 2 columns (maximum pooling may be employed here); thus, the data dimension is divided by 2. If the dimension is not divisible by 2, i.e. a row or column is left unpooled, that row or column is not processed by default. Therefore, after pooling, the data dimension becomes m'×m', where m' = (n-2)/2, rounded to an integer.
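A minimal sketch of 2×2 maximum pooling with stride 2 follows. It assumes that "not processed" means a trailing odd row or column is simply dropped, which corresponds to rounding (n-2)/2 down:

```python
def max_pool_2x2(matrix):
    """2x2 max pooling with stride 2; a trailing odd row or column is left out."""
    out_rows, out_cols = len(matrix) // 2, len(matrix[0]) // 2
    return [
        [
            max(
                matrix[2 * r][2 * c], matrix[2 * r][2 * c + 1],
                matrix[2 * r + 1][2 * c], matrix[2 * r + 1][2 * c + 1],
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


feature_map = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25],
]
pooled = max_pool_2x2(feature_map)  # 5x5 -> 2x2; the fifth row and column are dropped
```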
Pooling can be regarded as normalizing the feature map values within a pooling window and then randomly sampling according to the normalized probability values, so that elements with large values are also more likely to be selected.
Then the data pooled by $L_2$ is passed into the full connection layer $L_3$, whose input dimension is (n-2)/2.
The input dimension and the output dimension refer to the length of one row. For example, the input data of the full connection layer $L_3$ is m'×m', and its input dimension is the second m'. If data of size 3×4 (i.e. 3 rows and 4 columns) is input to a full connection layer, the input dimension of that layer is 4.
As can be seen from the above data processing procedure, in the coding layer, after the data passes through the pooling layer, the output data dimension of each layer changes as follows: (n-2)/2 (rounded) → 64 → 32 → 8; that is, the coding layer as a whole transforms the data from a high dimension to a low dimension.
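The dimension changes in the coding layer can be tabulated with a small helper. The function name is hypothetical, and rounding (n-2)/2 down is an assumption, based on the statement that a row or column left over by pooling is not processed:

```python
def encoder_dimensions(n):
    """Return the per-layer output sizes for an n x n signature matrix."""
    if n < 4:
        raise ValueError("n must be at least 4 so that pooling leaves one element")
    conv_side = n - 3 + 1         # L1: 3x3 convolution, no padding
    pooled_side = conv_side // 2  # L2: 2x2 pooling, rounded down (assumption)
    return {
        "L1_conv": (conv_side, conv_side),
        "L2_pool": (pooled_side, pooled_side),
        "L3_fc": 64,
        "L4_fc": 32,
        "L5_fc": 8,
    }


dims_even = encoder_dimensions(10)  # (n-2)/2 = 4 exactly
dims_odd = encoder_dimensions(9)    # (n-2)/2 = 3.5, rounded down to 3
```

The full connection sizes 64, 32 and 8 are fixed, so only the first two layers depend on the number of data fields n.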
Accordingly, after the feature vector E is input to the decoding layer, the whole data processing process and the data processing process in the encoding layer are symmetrical inverse processes, which is not described in detail herein.
It should be noted that, when the convolution is performed, the input data is a matrix L (the number of channels is 1); to represent higher-dimensional feature information, a matrix is generally expanded into a tensor (with multiple channels). For example, if the current input is a 5×5 matrix, its dimension can be expressed as 1×5×5 once the number of channels is included. After convolution, the feature dimension becomes n×m×m, where n here denotes the number of output channels set by the convolution layer, and the value of m depends on the convolution kernel size.
Convolution refers to a mathematical operator that generates a third function from two functions f and g; it characterizes the integral, over the overlap length, of the product of the overlapping values of f and of g after g has been flipped and translated.
In addition, the input dimension and the output dimension of each convolution layer in the encoding layer and the decoding layer in the anomaly detection model can be correspondingly set according to different construction scenes or according to the data characteristics of the collected construction data, and the anomaly detection method is not particularly limited.
According to the construction data self-checking method, the signature matrix containing the relevance among all construction data is subjected to feature extraction by constructing the anomaly detection model, so that the self-checking precision can be effectively improved.
Based on the content of the above embodiment, as an alternative embodiment, before determining the self-checking results of all the construction data in the construction scene, determining an error mean value of the normal construction data as a reference value; and determining self-checking results of all construction data in the construction scene according to the mean square error, wherein the self-checking results comprise: under the condition that the mean square error is larger than the reference value, determining the self-checking result as abnormal; and under the condition that the mean square error is not greater than the reference value, determining that the self-checking result is normal.
Specifically, after pre-training, the data anomaly detection model of the invention is obtained. The invention considers that, after abnormal data is input into the model, the mean square error between the obtained output matrix $\hat{S}_a$ and the signature matrix $S_a$ (denoted LossE) is larger.
Assume that all the construction data includes abnormal data $X_E$ of length K. After the corresponding signature matrix is input to the trained data anomaly detection model, the error LossE can be calculated, and LossE will be greater than LossM, where LossM is the error mean of normal construction data. Whether the data is abnormal can therefore be judged by whether LossE is greater than LossM: when LossE is greater than LossM, it can be determined that abnormal data exists in the construction data; when LossE is not greater than LossM, it can be judged that all the construction data are normal.
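The decision rule above can be sketched in a few lines of Python. This is an illustration rather than the patented implementation: the function names are hypothetical, and the mean square error is assumed to be averaged over all matrix elements, since the exact formula is only available as an image in the source.

```python
import numpy as np

def mean_square_error(output_matrix, signature_matrix):
    """LossE: assumed here to be the element-wise squared difference, averaged."""
    diff = np.asarray(output_matrix, dtype=float) - np.asarray(signature_matrix, dtype=float)
    return float(np.mean(diff ** 2))

def reference_value(normal_errors):
    """LossM: the mean of the errors measured on normal construction data."""
    return float(np.mean(normal_errors))

def self_check(loss_e, loss_m):
    """Abnormal if LossE is greater than LossM, otherwise normal."""
    return "abnormal" if loss_e > loss_m else "normal"

# Hypothetical errors observed on three normal samples.
loss_m = reference_value([0.010, 0.014, 0.012])

signature = np.eye(3)
good_output = signature + 0.05   # close reconstruction, LossE = 0.0025
bad_output = signature + 0.50    # poor reconstruction, LossE = 0.25
print(self_check(mean_square_error(good_output, signature), loss_m))
print(self_check(mean_square_error(bad_output, signature), loss_m))
```

The comparison is strict, so a LossE exactly equal to LossM is reported as normal, matching the "not greater than" condition in the text.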
Based on the foregoing embodiment, as an alternative embodiment, after pre-training the pre-built data anomaly detection model by using an unsupervised learning manner, the method further includes:
constructing a verification data set which consists of normal verification data generated by an autoregressive system and abnormal data after partial normal verification data are modified; determining the precision and recall of the pre-trained data anomaly detection model by using the verification data set; and determining the credibility of the pre-trained data anomaly detection model according to the precision and recall ratio.
In order to further ensure the precision of construction data self-checking, the construction data self-checking method provided by the invention further verifies the data anomaly detection model by creating a test data set after the pre-training of the model, and if the verification result is qualified, the method is applied to actual detection.
The verification indices mainly comprise the precision and the recall, where the precision is calculated as:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
and the recall is calculated as:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
wherein TP represents the number of test results for which the verification sample data is abnormal and the model classification result is abnormal, FP represents the number of test results for which the verification sample data is normal and the model classification result is abnormal, and FN represents the number of test results for which the verification sample data is abnormal and the model classification is normal.
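As a small worked example of these two indices, the sketch below counts TP, FP and FN from paired labels, where `True` means abnormal. The function name and the choice to return 0.0 when a denominator is zero are assumptions made for illustration.

```python
def precision_recall(actual, predicted):
    """Compute precision and recall; True marks an abnormal sample."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)        # abnormal, flagged abnormal
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)    # normal, flagged abnormal
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)    # abnormal, flagged normal
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

actual = [True, True, True, False, False, False]
predicted = [True, True, False, True, False, False]
precision, recall = precision_recall(actual, predicted)
print(precision, recall)
```

In this example TP = 2, FP = 1 and FN = 1, so both the precision and the recall equal 2/3.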
The verification data set of the present invention contains two types of data: normal data and abnormal data. The normal data is simulation data generated by the following autoregressive system.
An autoregressive system is a system in which the current value of a variable is related to the values at past moments.
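As an alternative embodiment, the present invention provides a method for acquiring normal data:
$$x_{1,t} = 0.5\,x_{1,t-1} + \varepsilon_{1,t}$$
$$x_{2,t} = 0.6\cos(x_{2,t-1}) + \varepsilon_{2,t}$$
(equation for $x_{3,t}$: formula image, Figure BDA0003065262640000123)
$$x_{4,t} = 0.8\,x_{7,t-1} + \varepsilon_{4,t}$$
$$x_{5,t} = 0.9\,x_{8,t-1} + \varepsilon_{5,t}$$
(equation for $x_{6,t}$: formula image, Figure BDA0003065262640000124)
$$x_{7,t} = 2\cos(x_{2,t-1}) + 0.6\sin(x_{10,t-1}) + \varepsilon_{7,t}$$
$$x_{8,t} = 0.8\cos(x_{3,t-1}) + \cos(x_{6,t-1}) + 1 + \varepsilon_{8,t}$$
$$x_{9,t} = \sin(x_{4,t-1}) - 0.8\,x_{7,t-1} + \varepsilon_{9,t}$$
(equation for $x_{10,t}$: formula image, Figure BDA0003065262640000125)
In total, the above embodiment yields the values of the variables $x_1$ to $x_{10}$ at time t, that is, 10 normal data values $x_{1,t}$ to $x_{10,t}$ are generated.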
Further, the abnormal data may be obtained by randomly modifying part of the normal data. For example, modifying the values of any three variables, such as $x_1$, $x_4$ and $x_9$, yields the following three abnormal values:
$$x'_{1,t} = 0.5\,x_{1,t-1} + \epsilon_{1,t}$$
$$x'_{4,t} = 0.8\,x_{7,t-1} + \epsilon_{4,t}$$
$$x'_{9,t} = \sin(x_{4,t-1}) - 0.8\,x_{7,t-1} + \epsilon_{9,t}$$
where $\varepsilon$ (used in the normal-data equations) is Gaussian white noise and $\epsilon$ is a random number between 0 and 1.
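A minimal simulation of this autoregressive system might look as follows. Several details are assumptions and not taken from the source: the update rules for $x_3$, $x_6$ and $x_{10}$ are only available as images, so the sketch uses pure noise as a hypothetical stand-in for them; the noise standard deviation of 0.1, the zero initial state and the function name `simulate` are also illustrative. Abnormal variables simply replace the Gaussian noise term with a uniform random number in [0, 1).

```python
import numpy as np

def simulate(steps, anomalous=(), noise_std=0.1, seed=0):
    """Generate a (steps, 10) series; columns 0..9 hold x1..x10.

    anomalous: column indices whose Gaussian noise is replaced by a
    uniform random number in [0, 1), mimicking the modified variables.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros((steps, 10))
    for t in range(1, steps):
        p = x[t - 1]
        gaussian = rng.normal(0.0, noise_std, size=10)
        uniform = rng.uniform(0.0, 1.0, size=10)
        det = np.zeros(10)
        det[0] = 0.5 * p[0]
        det[1] = 0.6 * np.cos(p[1])
        det[2] = 0.0  # hypothetical stand-in: source equation for x3 is an image
        det[3] = 0.8 * p[6]
        det[4] = 0.9 * p[7]
        det[5] = 0.0  # hypothetical stand-in: source equation for x6 is an image
        det[6] = 2.0 * np.cos(p[1]) + 0.6 * np.sin(p[9])
        det[7] = 0.8 * np.cos(p[2]) + np.cos(p[5]) + 1.0
        det[8] = np.sin(p[3]) - 0.8 * p[6]
        det[9] = 0.0  # hypothetical stand-in: source equation for x10 is an image
        noise = gaussian.copy()
        for k in anomalous:
            noise[k] = uniform[k]
        x[t] = det + noise
    return x

normal = simulate(200)
abnormal = simulate(200, anomalous=(0, 3, 8))  # x1, x4 and x9 modified
print(normal.shape, abnormal.shape)
```

Because both noise vectors are drawn at every step regardless of `anomalous`, the two runs share the same random stream, so variables that do not depend on a modified variable (for example $x_2$) are identical in both series.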
A verification data set is constructed from the normal data and the abnormal data and used to verify the trained data anomaly detection model; in the verification result obtained, the precision reaches 0.88 and the recall reaches 0.83.
A set of thresholds may be set, such as setting the precision threshold and the recall threshold to 0.8, and only when the precision and recall in the verification result are both greater than the corresponding thresholds, the trained data anomaly detection model is considered to be qualified.
According to the method, in a given construction scene, the input data that has passed inspection is collated to obtain an original construction data set. Assume that the data set has 10 fields, each having a length of 1000.
FIG. 4 is a schematic diagram of sampling all construction data provided by the present invention. As shown in FIG. 4, preprocessing the original construction data set includes sampling each field of the original construction data set with length T (T = 45). During sampling, if the starting point of the previous sample is denoted $s_p$, the starting point of the next sample is $s_p + 5$ (the starting point of the first sample is 0).
With this method, 192 construction data samples can be obtained. All samples may be split 7:3 to construct a training sample set and a validation sample set, respectively.
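The sampling scheme can be reproduced with a short NumPy sketch, which also confirms the sample count. The function name `sample_windows`, the placeholder data and the decision to split the samples in order (without shuffling) are assumptions for illustration.

```python
import numpy as np

def sample_windows(dataset, length=45, step=5):
    """Cut overlapping windows of `length` rows, starting at rows 0, 5, 10, ..."""
    starts = range(0, dataset.shape[0] - length + 1, step)
    return np.stack([dataset[s:s + length] for s in starts])

# Placeholder data set: 1000 records for each of 10 fields.
dataset = np.arange(1000 * 10, dtype=float).reshape(1000, 10)
windows = sample_windows(dataset)

# 7:3 split into training and validation samples.
split = int(len(windows) * 0.7)
train_set, validation_set = windows[:split], windows[split:]
print(windows.shape, len(train_set), len(validation_set))
```

With 1000 records, window length 45 and step 5, the last window starts at record 955, giving (1000 - 45) / 5 + 1 = 192 samples, which the 7:3 split divides into 134 training and 58 validation samples.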
And generating a construction data matrix aiming at the training sample set, and calculating the correlation degree among all data in the matrix by adopting a formula 1 to obtain a signature matrix.
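Since the claims identify the correlation degree as the Pearson correlation coefficient, the signature matrix of one sample can be sketched with NumPy's `corrcoef`. The function names and the random sample are illustrative; `pearson` spells out the covariance-over-standard-deviations form for a single pair of fields.

```python
import numpy as np

def signature_matrix(sample):
    """Pearson correlation between every pair of fields.

    sample: shape (T, n), T records for each of n fields
    returns: shape (n, n), symmetric with ones on the diagonal
    """
    return np.corrcoef(sample, rowvar=False)

def pearson(x, y):
    """cov(X, Y) / (sigma_X * sigma_Y) for two fields."""
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())

rng = np.random.default_rng(0)
sample = rng.normal(size=(45, 10))  # one sample: 45 records, 10 fields
signature = signature_matrix(sample)
print(signature.shape)
```

A field that is constant within a sample has zero standard deviation, so its correlations are undefined (NaN); such fields would need separate handling before the matrix is fed to a model.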
The signature matrix is input into the data anomaly detection model to be trained to obtain an output matrix. The data anomaly detection model is then pre-trained using the unsupervised training method corresponding to formula 3. Training can be performed by stochastic gradient descent, with a learning rate of 0.01 and 500 iterations.
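To make the training loop concrete, the sketch below runs stochastic gradient descent on the reconstruction error with the stated learning rate (0.01) and iteration count (500). It is deliberately simplified and is not the architecture of the invention: a two-layer linear encoder-decoder on the flattened signature matrix stands in for the convolutional model, and the squared reconstruction error is an assumed form of the objective, since formula 3 is only available as an image. All names are hypothetical.

```python
import numpy as np

def train_autoencoder(signatures, hidden=8, lr=0.01, iterations=500, seed=0):
    """Train a linear encoder-decoder with single-sample SGD.

    signatures: shape (N, n, n), the training signature matrices
    returns: encoder weights, decoder weights, per-step mean square errors
    """
    rng = np.random.default_rng(seed)
    x = signatures.reshape(len(signatures), -1)   # flatten each matrix
    d = x.shape[1]
    w_enc = rng.normal(0.0, 0.1, size=(d, hidden))
    w_dec = rng.normal(0.0, 0.1, size=(hidden, d))
    losses = []
    for _ in range(iterations):
        s = x[rng.integers(len(x))][np.newaxis]   # one random sample, shape (1, d)
        code = s @ w_enc                          # encode
        out = code @ w_dec                        # decode (reconstruction)
        err = out - s
        losses.append(float(np.mean(err ** 2)))
        grad_out = 2.0 * err                      # gradient of the summed squared error
        grad_dec = code.T @ grad_out
        grad_enc = s.T @ (grad_out @ w_dec.T)
        w_dec -= lr * grad_dec
        w_enc -= lr * grad_enc
    return w_enc, w_dec, losses

rng = np.random.default_rng(1)
signatures = np.stack([
    np.corrcoef(rng.normal(size=(45, 10)), rowvar=False) for _ in range(30)
])
w_enc, w_dec, losses = train_autoencoder(signatures)
print(np.mean(losses[:50]), np.mean(losses[-50:]))
```

The per-step errors recorded in `losses` on normal data are also the kind of values from which a reference value such as LossM could be averaged once training has finished.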
Then, after training of the data anomaly detection model is achieved, it is verified using a verification sample set.
Once the verification result is determined to be qualified, the construction data to be detected can be sampled using the method in the above embodiment to generate a signature matrix, which is then input into the trained data anomaly detection model, and the error LossE is finally computed. If LossE is greater than LossM, the construction data to be detected is abnormal; otherwise, it is normal.
Fig. 5 is a schematic structural diagram of the construction data self-checking system provided by the present invention. As shown in Fig. 5, the system includes, but is not limited to, an initial data acquisition unit 501, a correlation calculation unit 502, a feature extraction unit 503, an error calculation unit 504, and a self-checking identification unit 505, wherein:
the initial data acquisition unit 501 is mainly used for acquiring all construction data in a construction scene and constructing a construction data matrix;
the correlation calculation unit 502 is mainly used for calculating the correlation degree between each data field in the construction data matrix to obtain a signature matrix;
the feature extraction unit 503 is mainly configured to perform feature extraction on the signature matrix by using a pre-trained data anomaly detection model, so as to obtain an output matrix;
the error calculation unit 504 is mainly configured to calculate a mean square error between the output matrix and the signature matrix;
the self-checking identification unit 505 is mainly configured to determine self-checking results of all construction data under the construction scene according to the mean square error.
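The division into five units can be mirrored in code as one small class with a method per unit. This is a structural sketch only: the class and method names are hypothetical, the trained model is passed in as an arbitrary callable, and the mean square error is assumed to be averaged over all matrix elements.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

@dataclass
class SelfCheckSystem:
    model: Callable[[np.ndarray], np.ndarray]  # stands in for the pre-trained model
    reference: float                           # error mean of normal data

    def acquire(self, records):                # unit 501: build the data matrix
        return np.asarray(records, dtype=float)

    def correlate(self, matrix):               # unit 502: signature matrix
        return np.corrcoef(matrix, rowvar=False)

    def extract(self, signature):              # unit 503: output matrix
        return self.model(signature)

    def error(self, output, signature):        # unit 504: mean square error
        return float(np.mean((output - signature) ** 2))

    def identify(self, mse):                   # unit 505: self-checking result
        return "abnormal" if mse > self.reference else "normal"

    def run(self, records):
        matrix = self.acquire(records)
        signature = self.correlate(matrix)
        output = self.extract(signature)
        return self.identify(self.error(output, signature))

rng = np.random.default_rng(0)
records = rng.normal(size=(20, 4))             # 20 records, 4 fields

perfect = SelfCheckSystem(model=lambda s: s, reference=0.01)
useless = SelfCheckSystem(model=np.zeros_like, reference=0.01)
print(perfect.run(records), useless.run(records))
```

A model that reproduces its input has zero error and reports normal, while one that outputs all zeros has an error of at least 1/n (from the unit diagonal) and reports abnormal.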
It should be noted that, when the construction data self-checking system provided in the embodiment of the present invention is specifically executed, the construction data self-checking system may be implemented based on the construction data self-checking method described in any one of the foregoing embodiments, which is not described in detail in this embodiment.
According to the construction data self-checking system provided by the invention, the association relations among the construction data in the same construction scene are extracted, a data anomaly detection model is constructed, and abnormal data is detected automatically according to the mean square error between the input and the output of the model. This prevents false or invalid data from persisting, effectively improves the self-checking precision of the construction data, and greatly improves the self-checking efficiency.
Fig. 6 is a schematic structural diagram of an electronic device according to the present invention, and as shown in fig. 6, the electronic device may include: processor 610, communication interface (Communications Interface) 620, memory 630, and communication bus 640, wherein processor 610, communication interface 620, and memory 630 communicate with each other via communication bus 640. Processor 610 may invoke logic instructions in memory 630 to perform a construction data self-test method comprising: acquiring all construction data in a construction scene, and constructing a construction data matrix; calculating the correlation degree among all data fields in the construction data matrix to obtain a signature matrix; extracting features of the signature matrix by using a pre-trained data anomaly detection model to obtain an output matrix; calculating a mean square error between the output matrix and the signature matrix; and determining self-checking results of all construction data under the construction scene according to the mean square error.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the construction data self-checking method provided by the above methods, the method comprising: acquiring all construction data in a construction scene, and constructing a construction data matrix; calculating the correlation degree among all data fields in the construction data matrix to obtain a signature matrix; extracting features of the signature matrix by using a pre-trained data anomaly detection model to obtain an output matrix; calculating a mean square error between the output matrix and the signature matrix; and determining self-checking results of all construction data under the construction scene according to the mean square error.
In still another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the construction data self-checking method provided in the above embodiments, the method comprising: acquiring all construction data in a construction scene, and constructing a construction data matrix; calculating the correlation degree among all data fields in the construction data matrix to obtain a signature matrix; extracting features of the signature matrix by using a pre-trained data anomaly detection model to obtain an output matrix; calculating a mean square error between the output matrix and the signature matrix; and determining self-checking results of all construction data under the construction scene according to the mean square error.
The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. The construction data self-checking method is characterized by comprising the following steps of:
acquiring all construction data in a construction scene, and constructing a construction data matrix;
calculating the correlation degree among all data fields in the construction data matrix to obtain a signature matrix;
the correlation degree is a Pearson correlation coefficient, and the correlation degree between the data fields in the construction data matrix is calculated as:
$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X\,\sigma_Y}$$
wherein X is the record data of data field X, Y is the record data of data field Y, $\sigma_X$ is the standard deviation of X, $\sigma_Y$ is the standard deviation of Y, and $\operatorname{cov}(X, Y)$ is the covariance of X and Y;
extracting features of the signature matrix by using a pre-trained data anomaly detection model to obtain an output matrix;
calculating a mean square error between the output matrix and the signature matrix;
according to the mean square error, determining self-checking results of all construction data in the construction scene;
the data anomaly detection model is constructed based on a coding and decoding framework; the coding and decoding framework comprises a coding layer and a decoding layer;
the coding layer comprises a convolution layer $L_1$, a pooling layer $L_2$, a fully connected layer $L_3$, a fully connected layer $L_4$ and a fully connected layer $L_5$; the convolution kernel size of the convolution layer $L_1$ is 3×3, its output channel number is 6, and its input channel number is 1; the kernel size of the pooling layer $L_2$ is 2×2; the input dimension of the fully connected layer $L_3$ is (n-2)/2 rounded to an integer and its output dimension is 64, where n is the number of fields of the construction data recorded in the construction data matrix; the input dimension of the fully connected layer $L_4$ is 64 and its output dimension is 32; the input dimension of the fully connected layer $L_5$ is 32 and its output dimension is 8;
correspondingly, the decoding layer comprises a fully connected layer $L_6$, a fully connected layer $L_7$, a fully connected layer $L_8$, a convolution layer $L_9$ and a pooling layer $L_{10}$; the input dimension of the fully connected layer $L_6$ is 8 and its output dimension is 32; the input dimension of the fully connected layer $L_7$ is 32 and its output dimension is 64; the input dimension of the fully connected layer $L_8$ is 64 and its output dimension is (n-2)/2 rounded to an integer; the convolution kernel size of the convolution layer $L_9$ is 2×2; the convolution kernel size of the pooling layer $L_{10}$ is 3×3, its output channel number is 6, and its input channel number is 1.
2. The construction data self-checking method according to claim 1, further comprising, before feature extraction is performed on the signature matrix by using the pre-trained data anomaly detection model, pre-training a pre-constructed data anomaly detection model by means of unsupervised learning; the objective function of the unsupervised learning is:
(formula image: Figure FDA0004117968770000021)
wherein $S_i$ is a signature matrix sample, the symbol shown in Figure FDA0004117968770000022 is the output matrix sample, M is the number of training iterations, Θ is the parameter to be trained of the data anomaly detection model, and i is an intermediate parameter.
3. The construction data self-checking method according to claim 1, wherein said calculating a mean square error between the output matrix and the signature matrix comprises:
(formula image: Figure FDA0004117968770000023)
wherein the symbol shown in Figure FDA0004117968770000024 is the output matrix, $S_a$ is the signature matrix, and LossE is the mean square error between the output matrix and the signature matrix.
4. A construction data self-checking method according to claim 3, wherein, before determining self-checking results of all construction data in the construction scene, an error mean value of normal construction data is determined as a reference value; and determining self-checking results of all construction data in the construction scene according to the mean square error, wherein the self-checking results comprise:
under the condition that the mean square error is larger than the reference value, determining the self-checking result as abnormal;
and under the condition that the mean square error is not greater than the reference value, determining that the self-checking result is normal.
5. The construction data self-checking method according to claim 1, further comprising, after pre-training the pre-constructed data anomaly detection model by means of unsupervised learning:
constructing a verification data set which consists of normal verification data generated by an autoregressive system and abnormal data after partial normal verification data are modified;
determining the precision and recall of the pre-trained data anomaly detection model by using the verification data set;
and determining the credibility of the pre-trained data anomaly detection model according to the precision and recall ratio.
6. A construction data self-checking system, comprising:
the initial data acquisition unit is used for acquiring all construction data in a construction scene and constructing a construction data matrix;
the correlation calculation unit is used for calculating the correlation among the data fields in the construction data matrix so as to obtain a signature matrix;
the correlation degree is a Pearson correlation coefficient, and the correlation degree between the data fields in the construction data matrix is calculated as:
$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X\,\sigma_Y}$$
wherein X is the record data of data field X, Y is the record data of data field Y, $\sigma_X$ is the standard deviation of X, $\sigma_Y$ is the standard deviation of Y, and $\operatorname{cov}(X, Y)$ is the covariance of X and Y;
the feature extraction unit is used for extracting features of the signature matrix by utilizing a pre-trained data anomaly detection model to obtain an output matrix;
the data anomaly detection model is constructed based on a coding and decoding framework; the coding and decoding framework comprises a coding layer and a decoding layer;
the coding layer comprises a convolution layer $L_1$, a pooling layer $L_2$, a fully connected layer $L_3$, a fully connected layer $L_4$ and a fully connected layer $L_5$; the convolution kernel size of the convolution layer $L_1$ is 3×3, its output channel number is 6, and its input channel number is 1; the kernel size of the pooling layer $L_2$ is 2×2; the input dimension of the fully connected layer $L_3$ is (n-2)/2 rounded to an integer and its output dimension is 64, where n is the number of fields of the construction data recorded in the construction data matrix; the input dimension of the fully connected layer $L_4$ is 64 and its output dimension is 32; the input dimension of the fully connected layer $L_5$ is 32 and its output dimension is 8;
correspondingly, the decoding layer comprises a fully connected layer $L_6$, a fully connected layer $L_7$, a fully connected layer $L_8$, a convolution layer $L_9$ and a pooling layer $L_{10}$; the input dimension of the fully connected layer $L_6$ is 8 and its output dimension is 32; the input dimension of the fully connected layer $L_7$ is 32 and its output dimension is 64; the input dimension of the fully connected layer $L_8$ is 64 and its output dimension is (n-2)/2 rounded to an integer; the convolution kernel size of the convolution layer $L_9$ is 2×2; the convolution kernel size of the pooling layer $L_{10}$ is 3×3, its output channel number is 6, and its input channel number is 1;
an error calculation unit for calculating a mean square error between the output matrix and the signature matrix;
and the self-checking identification unit is used for determining self-checking results of all construction data under the construction scene according to the mean square error.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the construction data self-checking method steps of any one of claims 1 to 5.
8. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the construction data self-checking method steps of any one of claims 1 to 5.
CN202110524398.9A 2021-05-13 2021-05-13 Construction data self-checking method and system Active CN113239075B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110524398.9A CN113239075B (en) 2021-05-13 2021-05-13 Construction data self-checking method and system

Publications (2)

Publication Number Publication Date
CN113239075A CN113239075A (en) 2021-08-10
CN113239075B true CN113239075B (en) 2023-05-12

Family

ID=77134218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110524398.9A Active CN113239075B (en) 2021-05-13 2021-05-13 Construction data self-checking method and system

Country Status (1)

Country Link
CN (1) CN113239075B (en)

Also Published As

Publication number Publication date
CN113239075A (en) 2021-08-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230607

Address after: 100089 2nd floor, Beijing municipal building, 17 Changyun palace, Zizhuqiao, West Third Ring Road, Haidian District, Beijing

Patentee after: CHINA HIGHWAY ENGINEERING CONSULTING Corp.

Patentee after: CHECSC HIGHWAY MAINTENANCE AND TEST TECHNOLOGY CO.,LTD.

Patentee after: ZHONGZI DATA CO.,LTD.

Address before: 100089 2nd floor, Beijing municipal building, 17 Changyun palace, Zizhuqiao, West Third Ring Road, Haidian District, Beijing

Patentee before: CHINA HIGHWAY ENGINEERING CONSULTING Corp.

Patentee before: CHECSC HIGHWAY MAINTENANCE AND TEST TECHNOLOGY CO.,LTD.
