CN116720084A - Data identification method, device, electronic equipment and computer readable storage medium - Google Patents

Info

Publication number
CN116720084A
Authority
CN
China
Prior art keywords
data
information system
target information
sample
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310839483.3A
Other languages
Chinese (zh)
Inventor
杜旭辉
郑皆思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Kostal Huayang Automotive Electric Co Ltd
Kostal Shanghai Mechatronic Co Ltd
Original Assignee
Shanghai Kostal Huayang Automotive Electric Co Ltd
Kostal Shanghai Mechatronic Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Kostal Huayang Automotive Electric Co Ltd and Kostal Shanghai Mechatronic Co Ltd
Priority to CN202310839483.3A
Publication of CN116720084A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The application discloses a data identification method comprising: acquiring data from a target information system according to preset data types to obtain sample data corresponding to each preset data type; training an initial neural network model with the sample data to obtain a data identification model; processing the operation data of the target information system with the data identification model to obtain an identification result, wherein the operation data comprises operation data corresponding to each preset data type; and determining the key influence factors of the target information system among all preset data types according to the identification result. With the technical solution provided by the application, data identification can be performed on the target information system more efficiently, so that its key influence factors are found quickly. The application also discloses a data identification device, an electronic device, and a computer readable storage medium, which have the same technical effects.

Description

Data identification method, device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of information data processing technologies, and in particular, to a data identification method, and also to a data identification device, an electronic apparatus, and a computer readable storage medium.
Background
Currently, in production practice in the automotive parts electronics manufacturing industry, DOE (Design of Experiments, a statistical method for optimizing and improving product process parameters, production flows, and the like) is commonly used to analyze the various factors in the production process in order to find the key influencing factors and make improvements. However, with the growth of digital manufacturing, the wave soldering process produces more and more data, such as preheating temperature, wave soldering temperature, and tin wave height, with tens of thousands of data points generated at every moment. Continuing to use DOE would require a large number of experiments and extensive data collection, which wastes time and cost and makes the identification process inefficient.
Therefore, how to perform more efficient data identification on the target information system to quickly find the key influencing factors is a problem to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide a data identification method which can perform more efficient data identification on a target information system so as to quickly find key influence factors of the target information system; another object of the present application is to provide a data identification device, an electronic apparatus, and a computer readable storage medium, which all have the above advantages.
In a first aspect, the present application provides a data identification method, including:
data acquisition is carried out on the target information system according to preset data types, and sample data corresponding to each preset data type is obtained;
training an initial neural network model by utilizing each sample data to obtain a data identification model;
processing the operation data of the target information system by using the data identification model to obtain an identification result, wherein the operation data comprises operation data corresponding to each preset data type;
and determining key influence factors of the target information system in all preset data types according to the identification result.
Optionally, training the initial neural network model by using each sample data to obtain a data identification model, including:
dividing the sample data according to a preset proportion to obtain a training sample, a verification sample and a test sample;
performing iterative training on the initial neural network model by using the training sample and the verification sample to obtain an initial data identification model;
evaluating the initial data identification model by using the test sample to obtain an evaluation result;
when the evaluation result is that the evaluation fails, returning to the step of performing iterative training on the initial neural network model by using the training sample and the verification sample to obtain an initial data identification model;
and when the evaluation result is that the evaluation passes, determining the initial data recognition model as the data recognition model.
Optionally, before training the initial neural network model by using each sample data to obtain the data identification model, the method further includes:
preprocessing the sample data; the preprocessing includes one or more of data cleansing, outlier processing, data normalization, and missing value padding.
Optionally, the determining the key influencing factor of the target information system in all the preset data types according to the identification result includes:
determining the influence weight of each preset data type according to the identification result;
and taking the preset data type corresponding to the influence weight with the maximum value as the key influence factor of the target information system.
Optionally, the data identification method further includes:
screening the operation data corresponding to the key influence factors to obtain abnormal operation data;
calculating the reject ratio of the target information system according to the quantity of the abnormal operation data;
and outputting the key influence factors, the abnormal operation data and the reject ratio.
Optionally, the data identification method further includes:
monitoring the target information system according to the key influence factors to obtain monitoring data;
when the monitoring data does not exceed the first threshold range, returning to the step of monitoring the target information system according to the key influence factors to obtain monitoring data;
when the monitoring data exceeds the first threshold range and does not exceed the second threshold range, acquiring log information of the target information system;
when the log information does not have abnormal information, returning to the step of monitoring the target information system according to the key influence factors to obtain monitoring data;
when the log information contains the abnormal information, outputting an alarm prompt;
and when the monitoring data exceeds the second threshold range, sending the monitoring data to a target terminal so that a target terminal user can remotely debug the target information system through the target terminal.
Optionally, before the step of returning to monitoring the target information system according to the key influence factors to obtain monitoring data when the monitoring data does not exceed the first threshold range, the method further includes:
performing a normality check on the monitoring data to obtain a verification result;
and when the verification result does not meet the preset requirement, returning to the step of monitoring the target information system according to the key influence factors to obtain monitoring data.
In a second aspect, the present application also discloses a data identification device, including:
the acquisition module is used for carrying out data acquisition on the target information system according to preset data types to obtain sample data corresponding to each preset data type;
the training module is used for training the initial neural network model by utilizing the sample data to obtain a data identification model;
the processing module is used for processing the operation data of the target information system by utilizing the data identification model to obtain an identification result, wherein the operation data comprises operation data corresponding to each preset data type;
and the determining module is used for determining key influence factors of the target information system in all the preset data types according to the identification result.
In a third aspect, the present application also discloses an electronic device, including:
a memory for storing a computer program;
a processor for implementing the steps of any one of the data recognition methods described above when executing the computer program.
In a fourth aspect, the present application also discloses a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the data recognition methods described above.
The application provides a data identification method, which comprises the following steps: data acquisition is carried out on the target information system according to preset data types, and sample data corresponding to each preset data type is obtained; training an initial neural network model by utilizing each sample data to obtain a data identification model; processing the operation data of the target information system by using the data identification model to obtain an identification result, wherein the operation data comprises operation data corresponding to each preset data type; and determining key influence factors of the target information system in all preset data types according to the identification result.
With the technical solution provided by the application, data is first acquired from the target information system according to preset data types to obtain the corresponding sample data, where each preset data type is a potential influence factor in the target information system. A neural network model is then trained with the sample data corresponding to each preset data type to obtain a data identification model. For the target information system, the trained data identification model can therefore be used directly to identify the operation data of each preset data type and obtain an identification result, and the key influence factors of the target information system are determined among all preset data types based on that result. Compared with DOE, this technical solution does not require a large number of experiments, which simplifies the operation flow and allows more efficient data identification on the target information system, so that its key influence factors are found quickly. It therefore has good practicability and application prospects.
The data identification device, the electronic device and the computer readable storage medium provided by the application have the same technical effects as described above, and the application is not repeated here.
Drawings
In order to more clearly illustrate the technical solutions in the prior art and the embodiments of the present application, the following will briefly describe the drawings that need to be used in the description of the prior art and the embodiments of the present application. Of course, the following drawings related to embodiments of the present application are only a part of embodiments of the present application, and it will be obvious to those skilled in the art that other drawings can be obtained from the provided drawings without any inventive effort, and the obtained other drawings also fall within the scope of the present application.
FIG. 1 is a schematic flow chart of a data identification method provided by the application;
FIG. 2 is a flow chart of another data identification method according to the present application;
FIG. 3 is a schematic diagram of a data recognition device according to the present application;
FIG. 4 is a schematic structural diagram of an electronic device according to the present application.
Detailed Description
The core of the application is to provide a data identification method which can perform more efficient data identification on a target information system so as to quickly find key influence factors thereof; another core of the present application is to provide a data identification device, an electronic apparatus, and a computer readable storage medium, which all have the above advantages.
In order to more clearly and completely describe the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The embodiment of the application provides a data identification method.
Referring to fig. 1, fig. 1 is a flowchart of a data identification method provided by the present application, where the data identification method may include the following steps S101 to S104.
S101: Acquire data from the target information system according to the preset data types to obtain sample data corresponding to each preset data type.
Firstly, it should be noted that the data identification method provided by the present application aims to implement data identification, specifically, data identification is performed on a target information system to determine a key impact factor of the target information system, where the key impact factor refers to a data type that may have a key impact on an actual operation of the target information system. In one implementation, the target information system may be an MES system (a common management information system in the electronic manufacturing industry, for monitoring and optimizing real-time production data during manufacturing, etc.).
Further, this step performs data acquisition to obtain sample data from the target information system; the sample data is used for the subsequent model training. During sample data acquisition, data can be collected from the target information system according to the preset data types, so as to obtain sample data corresponding to each preset data type. The data identification method provided by the application then finds, among all the preset data types, the data type that has a key influence on the actual operation of the target information system, namely the key influence factor. It will be appreciated that the number and kinds of preset data types may be set by the skilled person according to the actual situation, and the application is not limited in this respect. For example, for the MES system, the preset data types may be set to include the preheating zone temperature, the wave soldering zone temperature, the tin wave height, and the like.
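As an illustration only, the acquisition step can be sketched as grouping raw records by preset data type. The record layout `(data_type, value)`, the type names, and the function `collect_samples` are assumptions made for this sketch; the application does not define a data format or an MES interface.

```python
from collections import defaultdict

# Hypothetical preset data types for a wave-soldering line (names are assumptions).
PRESET_DATA_TYPES = ("preheat_zone_temp", "wave_solder_zone_temp", "tin_wave_height")


def collect_samples(records, preset_types=PRESET_DATA_TYPES):
    """Group raw (data_type, value) records by preset data type.

    Records whose type is not a preset data type are ignored.
    """
    samples = defaultdict(list)
    for data_type, value in records:
        if data_type in preset_types:
            samples[data_type].append(value)
    # Return one entry per preset type, even when no record was seen for it.
    return {t: samples[t] for t in preset_types}


records = [
    ("preheat_zone_temp", 120.5),
    ("tin_wave_height", 8.2),
    ("conveyor_speed", 1.1),  # not a preset data type, so it is dropped
    ("preheat_zone_temp", 121.0),
]
samples = collect_samples(records)
```

Keeping an empty list for a preset type with no records makes missing coverage visible before training starts.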
S102: Train the initial neural network model with the sample data to obtain a data identification model.
This step performs model training to obtain the data identification model used for data identification, where the data identification model is a network model based on a preset neural network. Specifically, an initial neural network model may be built first, which includes setting up the neural network layers (such as an input layer, a hidden layer, and an output layer), the number of neurons and the neuron activation function of each layer, the loss function, and other initial model parameters. The initial neural network model is then trained with the sample data corresponding to each preset data type to obtain a neural network model that meets the requirements (for example, the model loss falls within a preset range), that is, the data identification model.
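A minimal sketch of this step, under stated assumptions: one hidden layer with a tanh activation, a squared-error loss, and plain stochastic gradient descent, written with the Python standard library only. The application does not fix the architecture, the loss, or the training target, so the layer sizes, the learning rate, the synthetic target, and all function names below are hypothetical.

```python
import math
import random


def init_model(n_inputs, n_hidden, seed=0):
    """Build the initial model: input layer -> tanh hidden layer -> linear output."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(model, x):
    """Return the hidden activations and the scalar output for one input vector."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(model["w1"], model["b1"])
    ]
    output = sum(w * h for w, h in zip(model["w2"], hidden)) + model["b2"]
    return hidden, output


def mse(model, xs, ys):
    """Mean squared error of the model over a data set."""
    return sum((forward(model, x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(model, xs, ys, lr=0.05, max_epochs=2000, target_loss=1e-3):
    """Train in place until the loss reaches the preset range or epochs run out."""
    for _ in range(max_epochs):
        for x, y in zip(xs, ys):
            hidden, out = forward(model, x)
            err = out - y  # gradient of 0.5 * (out - y)^2 with respect to out
            for j, h in enumerate(hidden):
                # Back-propagate through tanh using the pre-update output weight.
                grad_h = err * model["w2"][j] * (1.0 - h * h)
                model["w2"][j] -= lr * err * h
                model["b1"][j] -= lr * grad_h
                for i, xi in enumerate(x):
                    model["w1"][j][i] -= lr * grad_h * xi
            model["b2"] -= lr * err
        if mse(model, xs, ys) <= target_loss:
            break
    return model


# Synthetic stand-in data: two normalized inputs and an assumed quality target.
xs = [[a / 4, b / 4] for a in range(5) for b in range(5)]
ys = [0.5 * x0 - 0.3 * x1 for x0, x1 in xs]
model = init_model(n_inputs=2, n_hidden=4)
loss_before = mse(model, xs, ys)
train(model, xs, ys)
loss_after = mse(model, xs, ys)
```

The `target_loss` argument plays the role of the "preset range" for the model loss mentioned above; a production system would more likely rely on an established machine-learning library.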
S103: Process the operation data of the target information system with the data identification model to obtain an identification result, wherein the operation data comprises operation data corresponding to each preset data type.
This step performs data processing based on the data identification model to obtain the corresponding identification result. Note that sample data of multiple preset data types is used in model training; correspondingly, when data identification is performed on the target information system, operation data of the same preset data types also needs to be acquired and then input into the data identification model for processing, and the output of the data identification model is the final identification result. It can be understood that the identification result of the data identification model is the degree of influence of each preset data type on the target information system, and it is used to determine the key influence factors of the target information system among all preset data types.
S104: Determine the key influence factors of the target information system among all preset data types according to the identification result.
This step determines the key influence factor, i.e. the data type, among all preset data types, that has the most critical influence on the actual operation of the target information system. As described above, the identification result of the data identification model is the degree of influence of each preset data type on the target information system. Based on the identification result, the data type with the most critical influence (the greatest degree of influence) on the target information system among all preset data types can therefore be determined, and this data type is the key influence factor of the target information system.
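The application does not state how the model turns operation data into a degree of influence per preset data type. One common stand-in, shown here purely as an assumption, is permutation importance: shuffle one data type's column and measure how much the prediction error grows. The function `permutation_influence`, the stand-in model, and the data are all hypothetical.

```python
import random


def permutation_influence(predict, rows, targets, type_names, seed=0):
    """Estimate each data type's influence as the error increase after shuffling it.

    This is a swapped-in technique (permutation importance), not the method
    disclosed in the application.
    """
    rng = random.Random(seed)

    def mean_sq_error(data):
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(data)

    baseline = mean_sq_error(rows)
    influence = {}
    for col, name in enumerate(type_names):
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild the rows with only this column permuted.
        permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
        influence[name] = mean_sq_error(permuted) - baseline
    return influence


type_names = ["preheat_zone_temp", "tin_wave_height"]
rows = [[float(i), float((i * 7) % 20)] for i in range(20)]


def stand_in_model(row):
    """Hypothetical trained model in which the first data type dominates."""
    return 2.0 * row[0] + 0.1 * row[1]


targets = [stand_in_model(r) for r in rows]
influence = permutation_influence(stand_in_model, rows, targets, type_names)
```

A larger value means that disturbing that data type degrades the prediction more, which is one way to read "degree of influence".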
It can be seen that, in the data identification method provided by the embodiment of the present application, data is first acquired from the target information system according to preset data types to obtain the corresponding sample data, where each preset data type is a potential influence factor in the target information system. A neural network model is then trained with the sample data corresponding to each preset data type to obtain a data identification model. For the target information system, the trained data identification model can therefore be used directly to identify the operation data of each preset data type and obtain an identification result, and the key influence factors of the target information system are determined among all preset data types based on that result. Compared with DOE, this technical solution does not require a large number of experiments, which simplifies the operation flow and allows more efficient data identification on the target information system, so that its key influence factors are found quickly. It therefore has good practicability and application prospects.
Based on the above embodiments:
in an embodiment of the present application, the training the initial neural network model by using each sample data to obtain the data identification model may include the following steps:
Dividing sample data according to a preset proportion to obtain a training sample, a verification sample and a test sample;
performing iterative training on the initial neural network model by using a training sample and a verification sample to obtain an initial data identification model;
evaluating the initial data identification model by using a test sample to obtain an evaluation result;
when the evaluation result is that the evaluation fails, returning to the step of performing iterative training on the initial neural network model by using the training sample and the verification sample to obtain an initial data identification model;
and when the evaluation result is that the evaluation passes, determining the initial data identification model as the data identification model.
The embodiment of the application provides a training method for the data identification model. Specifically, the sample data can be divided according to a preset proportion to obtain a training sample, a verification sample, and a test sample. The training sample is used for model training, the verification sample is used to cross-validate the model produced by each iteration during training, and the test sample is used to evaluate the accuracy of the finally trained model. The initial neural network model can therefore be iteratively trained with the training sample and the verification sample to obtain an initial data identification model, which is then evaluated with the test sample to obtain an evaluation result. The evaluation result indicates whether the initial data identification model obtained by this round of training meets the preset requirement (for example, whether the model accuracy falls within a preset range). If the preset requirement is met, the evaluation passes, and the initial data identification model obtained by the current training can be used directly as the final data identification model, completing the model training. Otherwise, the evaluation fails, and the method returns to the iterative training step to train the model again until a data identification model that passes the evaluation is obtained.
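The split-train-evaluate loop described above can be sketched as follows. The 70/15/15 ratio, the pass threshold `min_score`, the retry cap `max_rounds`, and the callables `train_fn` and `evaluate_fn` are assumptions for illustration; the application only says that the proportion and the requirement are preset.

```python
import random


def split_samples(samples, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle and divide samples into training, verification, and test sets."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


def train_until_pass(train_fn, evaluate_fn, train, val, test,
                     min_score=0.9, max_rounds=5):
    """Repeat training until the test-set evaluation passes.

    train_fn(train, val) -> model and evaluate_fn(model, test) -> score are
    hypothetical callables supplied by the caller.
    """
    for round_no in range(1, max_rounds + 1):
        model = train_fn(train, val)
        score = evaluate_fn(model, test)
        if score >= min_score:
            return model, score, round_no
    raise RuntimeError("model did not pass evaluation within max_rounds")


train_set, val_set, test_set = split_samples(range(100))
```

The retry cap is an addition of this sketch: the text describes returning to training until the evaluation passes, and a cap simply keeps that loop from running forever.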
In an embodiment of the present application, training the initial neural network model by using each sample data may further include, before obtaining the data identification model: preprocessing sample data; the preprocessing includes one or more of data cleansing, outlier processing, data normalization, and missing value padding.
The data identification method provided by the embodiment of the application can also realize a data preprocessing function so as to effectively ensure the integrity and accuracy of sample data, further ensure the high precision of a data identification model and improve the accuracy of an identification result. The preprocessing of the sample data may include one or more of data cleaning, outlier processing, data normalization, and missing value filling, and may be set by a technician according to actual requirements.
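A small sketch of such a preprocessing chain for one preset data type, assuming missing readings are marked as `None`, outliers are clipped at three standard deviations, and normalization is min-max scaling to [0, 1]. None of these specific choices comes from the application; they are placeholders for whatever a technician configures.

```python
import statistics


def preprocess(values, z_limit=3.0):
    """Fill missing values, clip outliers, and min-max normalize one series."""
    present = [v for v in values if v is not None]
    if not present:
        return []
    mean = statistics.fmean(present)
    # Missing value filling: replace None with the mean of the observed values.
    filled = [mean if v is None else v for v in values]
    # Outlier handling: clip to mean +/- z_limit standard deviations (assumed rule).
    std = statistics.pstdev(filled)
    if std > 0:
        low, high = mean - z_limit * std, mean + z_limit * std
        filled = [min(max(v, low), high) for v in filled]
    # Normalization: scale to [0, 1]; a constant series maps to all zeros.
    vmin, vmax = min(filled), max(filled)
    if vmax == vmin:
        return [0.0] * len(filled)
    return [(v - vmin) / (vmax - vmin) for v in filled]


cleaned = preprocess([10.0, None, 20.0])
```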
In an embodiment of the present application, the determining the key impact factor of the target information system in all preset data types according to the identification result may include the following steps:
determining the influence weight of each preset data type according to the identification result;
and taking the preset data type corresponding to the influence weight with the maximum value as a key influence factor of the target information system.
The embodiment of the application provides an implementation method for determining key influence factors among all preset data types. Specifically, the identification result of the data identification model for the operation data can be the influence weight of each preset data type on the target information system. The larger the value of the influence weight, the greater the degree of influence of the corresponding preset data type on the target information system; the smaller the value, the smaller the degree of influence. Therefore, the influence weight of each preset data type can be determined according to the identification result of the data identification model, and the preset data type corresponding to the influence weight with the largest value is then taken as the key influence factor of the target information system.
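Selecting the key influence factor then reduces to taking the preset data type with the largest influence weight. The weights and type names below are made-up values for illustration, and `key_influence_factor` is a hypothetical helper.

```python
def key_influence_factor(influence_weights):
    """Return the preset data type whose influence weight is largest."""
    if not influence_weights:
        raise ValueError("no influence weights were provided")
    return max(influence_weights, key=influence_weights.get)


# Hypothetical identification result: one influence weight per preset data type.
weights = {
    "preheat_zone_temp": 0.21,
    "wave_solder_zone_temp": 0.64,
    "tin_wave_height": 0.15,
}
key_factor = key_influence_factor(weights)
```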
In one embodiment of the present application, the data identification method may further include the steps of:
screening operation data corresponding to the key influence factors to obtain abnormal operation data;
calculating the reject ratio of the target information system according to the quantity of the abnormal operation data;
and outputting the key influence factors, the abnormal operation data, and the reject ratio.
The data identification method provided by the embodiment of the application can further realize the output function of the identification result. Specifically, after determining the key influence factor of the target information system, the operation data corresponding to the key influence factor can be screened to obtain abnormal operation data, wherein the abnormal operation data is the operation data which does not meet the corresponding standard threshold range in all the operation data, the quantity of the abnormal operation data is counted, the reject ratio of the target information system is further calculated, and finally, when data output is carried out, the output content can comprise three types of data information, namely the key influence factor, the abnormal operation data and the reject ratio.
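A sketch of the screening and reject-ratio calculation, assuming the reject ratio is the share of abnormal readings among all readings of the key influence factor. The application does not give the formula, and the standard range used here is invented for the example.

```python
def screen_and_rate(values, lower, upper):
    """Return the readings outside [lower, upper] and their share of all readings.

    Defining the reject ratio as abnormal count / total count is an assumption.
    """
    abnormal = [v for v in values if not (lower <= v <= upper)]
    reject_ratio = len(abnormal) / len(values) if values else 0.0
    return abnormal, reject_ratio


# Hypothetical wave-soldering temperatures with an assumed standard range of 240 to 260.
readings = [250, 255, 270, 245, 230]
abnormal, reject_ratio = screen_and_rate(readings, lower=240, upper=260)
report = {
    "key_influence_factor": "wave_solder_zone_temp",
    "abnormal_operation_data": abnormal,
    "reject_ratio": reject_ratio,
}
```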
In one embodiment of the present application, the data identification method may further include the steps of:
monitoring a target information system according to the key influence factors to obtain monitoring data;
when the monitoring data does not exceed the first threshold range, returning to the step of monitoring the target information system according to the key influence factors to obtain the monitoring data;
when the monitoring data exceeds the first threshold range and does not exceed the second threshold range, acquiring log information of the target information system;
when the log information does not have abnormal information, returning to the step of monitoring the target information system according to the key influence factors to obtain monitoring data;
when abnormal information exists in the log information, outputting an alarm prompt;
and when the monitoring data exceeds the second threshold range, sending the monitoring data to the target terminal so that a target terminal user can remotely debug the target information system through the target terminal.
The data identification method provided by the embodiment of the application can further realize the data monitoring of key influence factors. Specifically, since the key influence factors which can have key influence on the operation of the target information system are determined, the actual operation data of the key influence factors can be monitored to obtain the monitoring data, the monitoring process can be real-time monitoring/timing monitoring, the application is not limited to the monitoring process, and then the monitoring data is subjected to threshold evaluation so as to determine the actual operation condition of the target information system according to the value condition of the monitoring data, and further, the corresponding operation and maintenance strategy is adopted to ensure the normal operation of the target information system.
In implementation, a first threshold range and a second threshold range may be preset, with different threshold ranges corresponding to different running conditions of the target information system; specifically, the first threshold range may be an early-warning range and the second threshold range may be an alarm range. First, it is judged whether the monitoring data exceeds the first threshold range. If not, the target information system is running without abnormality, and the flow returns to monitoring to continue collecting monitoring data; if so, the target information system may be abnormal, and evaluation of the monitoring data continues. Next, when the monitoring data exceeds the first threshold range, it is judged whether the monitoring data also exceeds the second threshold range. If not, the target information system may be abnormal, and a further determination is made using the log information of the target information system: if the log information contains no abnormal log information, the target information system is determined to be running without abnormality, and the flow returns to monitoring to continue collecting monitoring data; if the log information contains abnormal log information, the target information system is determined to be running abnormally, and an alarm prompt is output to remind a technician that the target information system is currently abnormal so that operation and maintenance can be performed in time. Finally, when the monitoring data exceeds the second threshold range, the target information system is abnormal and the situation is relatively serious; the monitoring data can then be sent to the target terminal so that a technician (the target terminal user) can remotely regulate the target information system according to the monitoring data, which keeps the target information system running and avoids the inconvenience to users of a service interruption caused by the system abnormality.
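The two-tier evaluation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `Action`, `outside` and `evaluate` are hypothetical, each threshold range is assumed to be a (lower, upper) pair, and the log lookup is passed in as a callable so that it is only queried when the data lies between the two ranges.

```python
from enum import Enum
from typing import Callable, Tuple

Range = Tuple[float, float]  # assumed form of a threshold range: (lower, upper)


class Action(Enum):
    CONTINUE_MONITORING = "return to monitoring"
    RAISE_ALARM = "output alarm prompt"
    SEND_TO_TERMINAL = "send monitoring data to target terminal"


def outside(value: float, bounds: Range) -> bool:
    """Return True when the value lies outside the (lower, upper) range."""
    lower, upper = bounds
    return value < lower or value > upper


def evaluate(value: float, first_range: Range, second_range: Range,
             log_has_anomaly: Callable[[], bool]) -> Action:
    """Map one monitoring value to the action described in the text."""
    if not outside(value, first_range):
        # Within the early-warning range: no abnormality.
        return Action.CONTINUE_MONITORING
    if outside(value, second_range):
        # Beyond the alarm range: serious abnormality, hand over to a technician.
        return Action.SEND_TO_TERMINAL
    # Between the two ranges: the log information decides.
    return Action.RAISE_ALARM if log_has_anomaly() else Action.CONTINUE_MONITORING
```

For example, with a first range of (1100, 1300) and a second range of (1000, 1400), a value of 1350 leads to an alarm only when the log lookup reports an abnormal entry.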
In an embodiment of the present application, before the step of returning to monitoring the target information system according to the key impact factor to obtain the monitoring data when the monitoring data does not exceed the first threshold range, the method may further include the following steps:
carrying out a normality check on the monitoring data to obtain a check result;
and when the check result does not meet the preset requirement, returning to the step of monitoring the target information system according to the key influence factors to obtain monitoring data.
The data identification method provided by the embodiment of the application can also carry out a normality check on the monitoring data before carrying out the threshold evaluation, the purpose of which is to avoid inaccurate threshold evaluation caused by acquisition errors in the monitoring data. Specifically, after the monitoring data is obtained, it is checked for normality, that is, whether the monitoring data conforms to a normal distribution is judged. If so, the acquisition process is not abnormal and the monitoring data can be regarded as accurate, so the threshold evaluation flow can be entered. Otherwise, if the monitoring data does not conform to a normal distribution, the acquisition process may be abnormal and the acquired monitoring data may be inaccurate, so the flow returns to data monitoring to acquire the monitoring data again. This effectively ensures the accuracy of the acquired monitoring data and of the subsequent abnormality handling and maintenance work.
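The normality check can be illustrated with a short sketch. The text does not name a particular statistical test, so the Jarque-Bera test is used here purely as an assumption; it was chosen because its p-value can be computed with the standard library alone (the statistic is asymptotically chi-square distributed with two degrees of freedom, whose survival function is exp(-x/2)). The function names and the 0.05 significance level as a default are illustrative, and the asymptotic p-value is only approximate for small samples.

```python
import math
from typing import Sequence


def jarque_bera_p_value(samples: Sequence[float]) -> float:
    """Asymptotic p-value of the Jarque-Bera normality test."""
    n = len(samples)
    if n < 3:
        raise ValueError("at least three samples are required")
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n  # second central moment
    if m2 == 0:
        raise ValueError("samples are constant")
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    statistic = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of the chi-square distribution with 2 degrees of freedom.
    return math.exp(-statistic / 2.0)


def passes_normality_check(samples: Sequence[float], alpha: float = 0.05) -> bool:
    """Treat the data as normally distributed when the p-value exceeds alpha."""
    return jarque_bera_p_value(samples) > alpha
```

When the check fails, the flow described above would discard the batch and collect the monitoring data again rather than proceed to threshold evaluation.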
Based on the above embodiments, the embodiment of the present application uses the MES system as an example, and provides another data identification method.
Referring to fig. 2, fig. 2 is a flow chart of another data identification method provided by the present application, and the implementation flow of the data identification method is as follows:
First, for the MES system, the following ten data types to be identified can be set: 6 preheating temperature zones, 2 wave soldering temperature zones and 2 tin wave heights (all conforming to a normal distribution); further, sample data over a historical period (e.g., 10 months) may be exported from the MES system for neural network analysis to find the key impact factors of the MES system.
Assume that the wave soldering off-line reject ratio is f(x) and that a smaller-the-better desirability function is used, i.e., the smaller f(x) is, the better. At the same time, 10 suspicious factors of the wave soldering production line are set: preheating temperature zone 1 = x1, preheating temperature zone 2 = x2, preheating temperature zone 3 = x3, preheating temperature zone 4 = x4, preheating temperature zone 5 = x5, preheating temperature zone 6 = x6, wave soldering temperature zone 1 = x7, wave soldering temperature zone 2 = x8, tin wave height 1 = x9, tin wave height 2 = x10. Accordingly, the implementation steps may include:
(1) A Python (a high-level programming language) script is written that imports the requests library (a third-party Python library) and sets the URL (Uniform Resource Locator, an address for locating and accessing internet resources) of the MES system API (Application Programming Interface) and the required API key.
(2) The MES API is called using the requests.get function and the API key to obtain a request response, and the request response is checked to confirm whether the sample data was successfully obtained.
(3) When the sample data acquisition is determined to be successful, the json library is used for processing the request response of the API, and the obtained sample data is converted into a data structure list of Python.
(4) The list of data structures is saved to the csv file and then to the server.
(5) Sample data in the list of data structures is preprocessed, including, but not limited to, outlier identification, missing value population, data cleaning, data normalization, and the like.
(6) The preprocessed sample data is divided into a training set, a verification set and a test set, and is stored as a corresponding numpy array.
(7) A deep learning library TensorFlow (an open source machine learning framework) and associated modules are imported.
(8) An initial neural network model structure is designed in TensorFlow, and the neuron numbers of an input layer, a hidden layer and an output layer and a neuron activation function TanH (hyperbolic tangent function) are determined.
(9) Setting initial model parameters: gradient descent optimizer, initial learning rate, loss function, etc., wherein the loss function may select mean square error.
(10) Combining the designs in (8) and (9) into a compiled model:
model.compile(loss=loss_fn,optimizer=optimizer,metrics=["mse"])
(11) Define training round (epochs) and batch size (batch size) in each round of training: epochs=100; batch_size=32.
(12) Training of the model on the training set in (10) is achieved:
history=model.fit(X_train,y_train,batch_size=batch_size,epochs=epochs,validation_data=(X_val,y_val),verbose=1);
wherein X_train and y_train represent the features and labels of the training set, X_val and y_val represent the features and labels of the verification set, and the verbose parameter is set to 1 so that the progress bar and log information are output during training.
(13) During each round of training, training loss and validation loss are output:
model_history=history.history
train_loss=model_history["loss"]
val_loss=model_history["val_loss"]
for epoch in range(epochs):
    print(f"Epoch {epoch+1}/{epochs}")
    print(f"Train loss: {train_loss[epoch]:.4f}")
    print(f"Validation loss: {val_loss[epoch]:.4f}")
(14) The training process is visualized, and a loss curve can be drawn by using matplotlib for visual display.
(15) When the loss no longer decreases, hyperparameters such as the learning rate can be adjusted; for example, the learning rate may be reduced automatically using the ReduceLROnPlateau callback:
lr_scheduler=keras.callbacks.ReduceLROnPlateau(factor=0.5,patience=10,verbose=1)
history=model.fit(X_train,y_train,batch_size=batch_size,epochs=epochs,validation_data=(X_val,y_val),callbacks=[lr_scheduler],verbose=1);
wherein the factor parameter is the multiplier by which the learning rate is reduced, the patience parameter is the number of epochs without improvement after which the learning rate is reduced, and the verbose parameter is set to 1 so that log information from the callback is output.
(16) And adjusting the structures such as the layer number, the activation function, the loss function and the like of the neural network model, and evaluating the performance of the model by using a cross-validation method through a validation set.
(17) Training the model until the verification loss tends to be gentle, and recording optimal model parameters; wherein it may be determined to cease training when the validation loss is no longer improving using the EarlyStopping callback function.
(18) Model evaluation is carried out using the test set: the mean square error and the R² value of the model are calculated, and the final neural network model is determined when the mean square error and the R² value meet the actual requirements.
(19) Analyzing which factors in the ten types of data are most important by using the neural network model in (18):
importance=np.abs(model.layers[0].get_weights()[0]).sum(axis=1)
importance=importance/importance.sum()
(20) Extracting the influence weights and bias terms of the influence factors from the model output, and analyzing these values to find regularities in them:
weights,biases=model.layer2.weight.squeeze(),model.layer2.bias
print("weights:",weights)
print("bias term:",biases)
# Find the index of the most important factor
key_factor_index=torch.argmax(weights)
print("key_factor_index:",key_factor_index)
(21) Output result (taking the key influence factor as tin wave height 2 as an example):
# The key factor is tin wave height 2 (x10)
print("key factor: tin wave height 2 (x10)")
# Values of tin wave height 2 (x10) greater than 1500 in the sample data
above_1500=data[:,9][data[:,9]>1500]
print("number of tin wave height 2 (x10) values greater than 1500:",len(above_1500))
# Calculate the failure rate when tin wave height 2 (x10) is greater than 1500,
# feeding the model the full ten-factor rows whose x10 value exceeds 1500
above_1500_tensor=torch.Tensor(data[data[:,9]>1500])
output=model(above_1500_tensor)
average_f=torch.mean(output).item()
print("failure rate when tin wave height 2 (x10) is greater than 1500:",average_f)
(22) And (3) automatically judging according to the output result, and carrying out the following processing aiming at the identified key influence factors:
A. read a predetermined number (e.g., 50) of operation data values for tin wave height 2 at a predetermined time interval (e.g., 50 seconds); this may be implemented over Modbus TCP using the pyModbusTCP library:
client=ModbusTcpClient(host='192.168.1.1',port=502);
client.connect();
B. perform a normality test on the operation data, and determine that the operation data conforms to a normal distribution when the P value is greater than 0.05;
C. when the operation data of tin wave height 2 does not conform to a normal distribution, the data acquisition process is abnormal, and the flow returns to step A to acquire the operation data again;
D. when the operation data of tin wave height 2 conforms to a normal distribution, judge whether the operation data exceeds the first threshold range: upper control limit 1300 and lower control limit 1100;
E. if the first threshold range is not exceeded, the MES system has no abnormality, and the flow returns to step A to continue acquiring operation data;
F. if the first threshold range is exceeded, continue to judge whether the operation data exceeds the second threshold range: upper control limit 1400 and lower control limit 1000;
G. if the second threshold range is not exceeded, read the log information of the MES system and query whether it contains an abnormal log;
H. if no abnormal log exists, the MES system is in normal fluctuation, and the flow returns to step A to continue acquiring operation data;
I. if an abnormal log exists, output an alarm prompt to inform the equipment engineer; the automatic handling process stops once the operation data returns to within the first threshold range;
J. if the second threshold range is exceeded, send the abnormal operation data to the quality engineer's terminal so that the quality engineer can remotely debug the MES system.
Based on the above A to J, the processing procedure of the operation data of the tin wave height 2 is as follows:
At this point, the identification and post-processing of the key influencing factors in the MES system are complete.
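The learning-rate reduction used in step (15) can be illustrated without TensorFlow. The sketch below is a simplified re-implementation of the patience mechanism for illustration only, not the Keras `ReduceLROnPlateau` code: the function name `reduce_on_plateau` and the default `initial_lr` are hypothetical, and Keras options such as a cooldown period or a minimum improvement threshold are omitted.

```python
from typing import List, Sequence


def reduce_on_plateau(val_losses: Sequence[float], initial_lr: float = 0.01,
                      factor: float = 0.5, patience: int = 10) -> List[float]:
    """Return the learning rate in effect after each epoch.

    The rate is multiplied by `factor` whenever the validation loss has not
    improved on its best value for `patience` consecutive epochs.
    """
    lr = initial_lr
    best = float("inf")
    wait = 0  # epochs since the last improvement
    schedule = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr *= factor
                wait = 0
        schedule.append(lr)
    return schedule
```

With factor=0.5 and patience=10 as in step (15), the learning rate is halved each time ten consecutive epochs pass without a new best validation loss.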
It can be seen that, in the data identification method provided by the embodiment of the present application, data acquisition is first performed on the target information system according to preset data types to obtain corresponding sample data, wherein each preset data type is a potential influencing factor in the target information system; neural network model training is then performed using the sample data corresponding to each preset data type to obtain a data identification model. Therefore, for the target information system, the trained data identification model can be used directly to identify the operation data of each preset data type and obtain an identification result, and the key influence factors of the target information system are determined among all preset data types based on the identification result. Compared with DOE technology, this technical solution does not require a large number of experiments, which simplifies the operation flow and allows more efficient data identification on the target information system so that its key influence factors can be found quickly; it therefore has better practicability and application prospects.
The embodiment of the application provides a data identification device.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a data identification device provided by the present application, where the data identification device may include:
the acquisition module 1 is used for carrying out data acquisition on the target information system according to preset data types to obtain sample data corresponding to each preset data type;
the training module 2 is used for training the initial neural network model by utilizing each sample data to obtain a data identification model;
the processing module 3 is used for processing the operation data of the target information system by utilizing the data identification model to obtain an identification result, wherein the operation data comprises operation data corresponding to each preset data type;
and the determining module 4 is used for determining key influence factors of the target information system in all preset data types according to the identification result.
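The relationship between the four modules can be sketched as a small pipeline. This is only one possible arrangement under stated assumptions: the class name `DataIdentificationDevice`, the callable signatures and the dictionary-based data layout are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Samples = Dict[str, List[float]]  # assumed layout: preset data type -> values


@dataclass
class DataIdentificationDevice:
    acquire: Callable[[List[str]], Samples]               # acquisition module 1
    train: Callable[[Samples], Any]                       # training module 2
    process: Callable[[Any, Samples], Dict[str, float]]   # processing module 3
    determine: Callable[[Dict[str, float]], str]          # determining module 4

    def run(self, preset_types: List[str], operation_data: Samples) -> str:
        """Chain the modules and return the key influence factor."""
        samples = self.acquire(preset_types)
        model = self.train(samples)
        result = self.process(model, operation_data)
        return self.determine(result)
```

Keeping each module behind a callable makes it straightforward to substitute, for example, a different training routine without touching the other three modules.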
It can be seen that, in the data identification device provided by the embodiment of the present application, data acquisition is first performed on the target information system according to preset data types to obtain corresponding sample data, wherein each preset data type is a potential influencing factor in the target information system; neural network model training is then performed using the sample data corresponding to each preset data type to obtain a data identification model. Therefore, for the target information system, the trained data identification model can be used directly to identify the operation data of each preset data type and obtain an identification result, and the key influence factors of the target information system are determined among all preset data types based on the identification result. Compared with DOE technology, this technical solution does not require a large number of experiments, which simplifies the operation flow and allows more efficient data identification on the target information system so that its key influence factors can be found quickly; it therefore has better practicability and application prospects.
In one embodiment of the present application, the training module 2 may be specifically configured to divide sample data according to a preset ratio to obtain a training sample, a verification sample, and a test sample; performing iterative training on the initial neural network model by using a training sample and a verification sample to obtain an initial data identification model; evaluating the initial data identification model by using a test sample to obtain an evaluation result; when the evaluation result is that the evaluation fails, returning to the step of performing iterative training on the initial neural network model by using the training sample and the verification sample to obtain an initial data identification model; and when the evaluation result is that the evaluation passes, determining an initial data recognition model as a data recognition model.
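The sample division and the pass/fail evaluation handled by the training module can be sketched as follows. The 70/15/15 ratio, the fixed shuffle seed and the acceptance limits `max_mse` and `min_r2` are assumptions chosen for illustration; the application only requires a preset ratio and an evaluation result, and the method example mentions the mean square error and the R² value as metrics.

```python
import random
from typing import List, Sequence, Tuple


def split_samples(samples: Sequence, ratios: Tuple[float, float, float] = (0.7, 0.15, 0.15),
                  seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle and divide samples into training, verification and test subsets."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined for constant targets")
    return 1.0 - ss_res / ss_tot


def evaluation_passes(y_true: Sequence[float], y_pred: Sequence[float],
                      max_mse: float, min_r2: float) -> bool:
    """The evaluation passes when both metrics meet the assumed limits."""
    return (mean_squared_error(y_true, y_pred) <= max_mse
            and r2_score(y_true, y_pred) >= min_r2)
```

If `evaluation_passes` returns False on the test sample, the flow described above returns to iterative training; otherwise the initial data identification model is accepted as the data identification model.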
In one embodiment of the present application, the data recognition device may further include a preprocessing module, configured to preprocess the sample data before training the initial neural network model with each sample data to obtain the data recognition model; the preprocessing includes one or more of data cleansing, outlier processing, data normalization, and missing value padding.
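One way to combine the listed preprocessing operations for a single column of sample data is sketched below. The specific choices are assumptions made for illustration: missing values are represented by `None` and filled with the column mean, outliers are clipped to mean ± `z_limit` standard deviations, and normalization is a z-score. The application does not prescribe these particular techniques.

```python
import statistics
from typing import List, Optional, Sequence


def preprocess(column: Sequence[Optional[float]], z_limit: float = 3.0) -> List[float]:
    """Fill missing values, clip outliers and z-score normalize one column."""
    present = [v for v in column if v is not None]
    if not present:
        raise ValueError("column contains no values")
    mean = statistics.fmean(present)
    # Missing value padding: replace None with the mean of the observed values.
    filled = [mean if v is None else v for v in column]
    std = statistics.pstdev(filled)
    if std == 0:
        return [0.0] * len(filled)
    # Outlier processing: clip to mean +/- z_limit standard deviations.
    low, high = mean - z_limit * std, mean + z_limit * std
    clipped = [min(max(v, low), high) for v in filled]
    # Data normalization: zero mean and unit standard deviation.
    clipped_mean = statistics.fmean(clipped)
    clipped_std = statistics.pstdev(clipped)
    return [(v - clipped_mean) / clipped_std for v in clipped]
```

Applying the function column by column gives each of the preset data types a comparable scale before the samples are divided for training.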
In one embodiment of the present application, the determining module 4 may be specifically configured to determine the impact weight of each preset data type according to the identification result; and taking the preset data type corresponding to the influence weight with the maximum value as a key influence factor of the target information system.
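Selecting the key influence factor from the influence weights can be sketched in plain Python. The weight calculation mirrors the approach of the method example (absolute first-layer weights summed per input and normalized), written without NumPy so that it is self-contained; the function names and the nested-list layout, with one row per input factor, are assumptions.

```python
from typing import List, Sequence


def influence_weights(first_layer_weights: Sequence[Sequence[float]]) -> List[float]:
    """Normalized importance per input factor.

    first_layer_weights[i][j] is the weight from input factor i to hidden neuron j.
    """
    raw = [sum(abs(w) for w in row) for row in first_layer_weights]
    total = sum(raw)
    if total == 0:
        raise ValueError("all weights are zero")
    return [value / total for value in raw]


def key_factor(names: Sequence[str], weights: Sequence[float]) -> str:
    """Return the name of the factor with the largest influence weight."""
    index = max(range(len(weights)), key=weights.__getitem__)
    return names[index]
```

Because the weights are normalized to sum to 1, they can be read directly as each preset data type's share of the total influence.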
In one embodiment of the present application, the data identification device may further include an output module, configured to screen operation data corresponding to the key impact factors to obtain abnormal operation data; calculating the reject ratio of the target information system according to the quantity of the abnormal operation data; and outputting key influence factors, abnormal operation data and reject ratio.
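The screening and reject-ratio calculation of the output module can be sketched as follows. Two points are assumptions: operation data is treated as abnormal when it exceeds a single upper limit (the method example uses 1500 for tin wave height 2), and the reject ratio is taken as the share of abnormal values among all values, which is one plausible reading of "according to the quantity of the abnormal operation data".

```python
from typing import List, Sequence


def screen_abnormal(values: Sequence[float], limit: float) -> List[float]:
    """Keep only the operation data values above the limit."""
    return [v for v in values if v > limit]


def reject_ratio(values: Sequence[float], limit: float) -> float:
    """Share of abnormal values among all operation data values."""
    if not values:
        raise ValueError("no operation data")
    return len(screen_abnormal(values, limit)) / len(values)
```

For instance, if two of four readings exceed the limit, the function reports a reject ratio of 0.5, which would then be output together with the key influence factor and the abnormal readings.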
In one embodiment of the present application, the data identification device may further include a monitoring module, configured to monitor the target information system according to the key impact factor, to obtain monitoring data;
when the monitoring data does not exceed the first threshold range, returning to the step of monitoring the target information system according to the key influence factors to obtain the monitoring data; when the monitoring data exceeds the first threshold range and does not exceed the second threshold range, acquiring log information of the target information system; when the log information does not have abnormal information, returning to the step of monitoring the target information system according to the key influence factors to obtain monitoring data; when abnormal information exists in the log information, outputting an alarm prompt; and when the monitoring data exceeds the second threshold range, sending the monitoring data to the target terminal so that a target terminal user can remotely debug the target information system through the target terminal.
In an embodiment of the present application, the monitoring module may be further configured to, before the step of returning to monitoring the target information system according to the key impact factor to obtain the monitoring data when the monitoring data does not exceed the first threshold range, carry out a normality check on the monitoring data to obtain a check result; and when the check result does not meet the preset requirement, return to the step of monitoring the target information system according to the key influence factors to obtain monitoring data.
For the description of the apparatus provided by the embodiment of the present application, refer to the above method embodiment; the description is not repeated here.
The embodiment of the application provides electronic equipment.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to the present application, where the electronic device may include:
a memory for storing a computer program;
a processor for implementing the steps of any one of the data recognition methods described above when executing the computer program.
As shown in fig. 4, which is a schematic diagram of a composition structure of an electronic device, the electronic device may include: a processor 10, a memory 11, a communication interface 12 and a communication bus 13. The processor 10, the memory 11 and the communication interface 12 all complete communication with each other through a communication bus 13.
In an embodiment of the present application, the processor 10 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA), or another programmable logic device, etc.
The processor 10 may call a program stored in the memory 11, and in particular, the processor 10 may perform operations in an embodiment of the data identification method.
The memory 11 is used for storing one or more programs, and the programs may include program codes including computer operation instructions, and in the embodiment of the present application, at least the programs for implementing the following functions are stored in the memory 11:
data acquisition is carried out on the target information system according to preset data types, and sample data corresponding to each preset data type is obtained;
training an initial neural network model by utilizing each sample data to obtain a data identification model;
processing the operation data of the target information system by using the data identification model to obtain an identification result, wherein the operation data comprises operation data corresponding to each preset data type;
and determining key influence factors of the target information system in all preset data types according to the identification result.
In one possible implementation, the memory 11 may include a storage program area and a storage data area, where the storage program area may store an operating system, and at least one application program required for functions, etc.; the storage data area may store data created during use.
In addition, the memory 11 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or other volatile solid-state storage device.
The communication interface 12 may be an interface of a communication module for interfacing with other devices or systems.
Of course, it should be noted that the structure shown in fig. 4 does not limit the electronic device in the embodiment of the present application; in practical applications, the electronic device may include more or fewer components than those shown in fig. 4, or may combine some components.
Embodiments of the present application provide a computer-readable storage medium.
The computer readable storage medium provided by the embodiment of the present application stores a computer program, and when the computer program is executed by a processor, the steps of any one of the data identification methods described above can be implemented.
The computer-readable storage medium may include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
For the description of the computer-readable storage medium provided in the embodiment of the present application, refer to the above method embodiment; the description is not repeated here.
In the description, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The technical scheme provided by the application is described in detail. The principles and embodiments of the present application have been described herein with reference to specific examples, the description of which is intended only to facilitate an understanding of the method of the present application and its core ideas. It should be noted that it will be apparent to those skilled in the art that the present application may be modified and practiced without departing from the spirit of the present application.

Claims (10)

1. A method of data identification, comprising:
data acquisition is carried out on the target information system according to preset data types, and sample data corresponding to each preset data type is obtained;
training an initial neural network model by utilizing each sample data to obtain a data identification model;
processing the operation data of the target information system by using the data identification model to obtain an identification result, wherein the operation data comprises operation data corresponding to each preset data type;
and determining key influence factors of the target information system in all preset data types according to the identification result.
2. The method of claim 1, wherein training the initial neural network model using each of the sample data to obtain the data identification model comprises:
dividing the sample data according to a preset proportion to obtain a training sample, a verification sample and a test sample;
performing iterative training on the initial neural network model by using the training sample and the verification sample to obtain an initial data identification model;
evaluating the initial data identification model by using the test sample to obtain an evaluation result;
returning to the step of performing iterative training on the initial neural network model by using the training sample and the verification sample to obtain an initial data identification model when the evaluation result is that the evaluation fails;
and when the evaluation result is that the evaluation passes, determining the initial data recognition model as the data recognition model.
3. The method of claim 1, wherein, before training the initial neural network model using each of the sample data to obtain the data identification model, the method further comprises:
preprocessing the sample data; the preprocessing includes one or more of data cleansing, outlier processing, data normalization, and missing value padding.
4. The data identification method according to claim 1, wherein the determining the key impact factors of the target information system among all the preset data types according to the identification result comprises:
determining the influence weight of each preset data type according to the identification result;
and taking the preset data type corresponding to the influence weight with the maximum value as the key influence factor of the target information system.
5. The data identification method of claim 1, further comprising:
screening the operation data corresponding to the key influence factors to obtain abnormal operation data;
calculating the reject ratio of the target information system according to the quantity of the abnormal operation data;
and outputting the key influence factors, the abnormal operation data and the reject ratio.
6. The data recognition method according to any one of claims 1 to 5, further comprising:
monitoring the target information system according to the key influence factors to obtain monitoring data;
when the monitoring data does not exceed the first threshold range, returning to the step of monitoring the target information system according to the key influence factors to obtain monitoring data;
when the monitoring data exceeds the first threshold range and does not exceed the second threshold range, acquiring log information of the target information system;
when the log information does not have abnormal information, returning to the step of monitoring the target information system according to the key influence factors to obtain monitoring data;
when the log information contains the abnormal information, outputting an alarm prompt;
and when the monitoring data exceeds the second threshold range, sending the monitoring data to a target terminal so that a target terminal user can remotely debug the target information system through the target terminal.
7. The method for identifying data according to claim 6, wherein, before the step of returning to monitoring the target information system according to the key impact factor to obtain the monitoring data when the monitoring data does not exceed the first threshold range, the method further comprises:
carrying out a normality check on the monitoring data to obtain a verification result;
and when the verification result does not meet the preset requirement, returning to the step of monitoring the target information system according to the key influence factors to obtain monitoring data.
8. A data recognition device, comprising:
the acquisition module is used for carrying out data acquisition on the target information system according to preset data types to obtain sample data corresponding to each preset data type;
the training module is used for training the initial neural network model by utilizing the sample data to obtain a data identification model;
the processing module is used for processing the operation data of the target information system by utilizing the data identification model to obtain an identification result, wherein the operation data comprises operation data corresponding to each preset data type;
and the determining module is used for determining key influence factors of the target information system in all the preset data types according to the identification result.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the data identification method according to any of claims 1 to 7 when executing said computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the data identification method according to any of claims 1 to 7.
CN202310839483.3A 2023-07-10 2023-07-10 Data identification method, device, electronic equipment and computer readable storage medium Pending CN116720084A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310839483.3A CN116720084A (en) 2023-07-10 2023-07-10 Data identification method, device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310839483.3A CN116720084A (en) 2023-07-10 2023-07-10 Data identification method, device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116720084A true CN116720084A (en) 2023-09-08

Family

ID=87867926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310839483.3A Pending CN116720084A (en) 2023-07-10 2023-07-10 Data identification method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116720084A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117454166A (en) * 2023-10-11 2024-01-26 国网四川省电力公司电力科学研究院 Method for identifying arc faults of ignition based on EffNet lightweight model
CN117454166B (en) * 2023-10-11 2024-05-10 国网四川省电力公司电力科学研究院 Method for identifying arc faults of induced thermal power based on EffNet lightweight model

Similar Documents

Publication Publication Date Title
Neumann An enhanced neural network technique for software risk analysis
CN101470426B (en) Fault detection method and system
US7925638B2 (en) Quality management in a data-processing environment
CN111309539A (en) Abnormity monitoring method and device and electronic equipment
JP6722309B2 (en) Verification method and apparatus for performing annotation processing work using verification annotation processing work
CN116955092B (en) Multimedia system monitoring method and system based on data analysis
CN116720084A (en) Data identification method, device, electronic equipment and computer readable storage medium
US20230081022A1 (en) Systems and methods for computing database interactions and evaluating interaction parameters
CN109544014B (en) Anti-fraud method and device based on historical data playback
CN110865924A (en) Health degree diagnosis method and health diagnosis framework for internal server of power information system
CN111597550A (en) Log information analysis method and related device
CN111160329A (en) Root cause analysis method and device
CN110335144B (en) Personal electronic bank account security detection method and device
CN116186221A (en) Big data analysis method and system applied to online dialogue platform
CN107871213B (en) Transaction behavior evaluation method, device, server and storage medium
CN116107789A (en) Method for monitoring and analyzing application fault reasons and storage medium
CN116340934A (en) Terminal abnormal behavior detection method, device, equipment and storage medium
CN113269378A (en) Network traffic processing method and device, electronic equipment and readable storage medium
CN117495544A (en) Sandbox-based wind control evaluation method, sandbox-based wind control evaluation system, sandbox-based wind control evaluation terminal and storage medium
CN116956250A (en) Abnormality detection method, device, equipment and medium for user behavior
CN116662186A (en) Log playback assertion method and device based on logistic regression and electronic equipment
CN114971638A (en) Transaction authentication method and device based on risk identification
CN112395280B (en) Data quality detection method and system
CN113011748A (en) Recommendation effect evaluation method and device, electronic equipment and readable storage medium
KR20220084618A (en) Apparatus and method for predicting loan arrears based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination