CN114662698A - Industrial internet multi-modal machine learning data processing method - Google Patents

Industrial internet multi-modal machine learning data processing method

Publication number
CN114662698A
Authority
CN
China
Prior art keywords
data
machine learning
industrial internet
processing method
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210129788.0A
Other languages
Chinese (zh)
Inventor
吴斌
王雪峰
刘青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Inrich Technology Co ltd
Original Assignee
Nanjing Inrich Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Inrich Technology Co ltd
Priority to CN202210129788.0A
Publication of CN114662698A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an industrial internet multi-modal machine learning data processing method and relates to the technical field of the industrial internet. The method comprises the following steps. Step one: calculate the correlation between each pair of multi-modal data sets, first cleaning the data so that they are aligned in time, and judge whether the two data sets are correlated; if not, judge whether all data have been processed; if so, select a suitable data set as modeling data and then judge whether all data have been processed. Step two: if not all data have been processed, restart step one; if all data have been processed, establish a suitable multi-modal machine learning model. The invention helps to select different sets of data sources for different scenarios, effectively reduces system cost, reduces the size of the machine learning model, and facilitates deployment in edge computing.

Description

Industrial internet multi-modal machine learning data processing method
Technical Field
The invention relates to the technical field of the industrial internet, and in particular to an industrial internet multi-modal machine learning data processing method.
Background
In the prior art, once a large number of terminals are introduced in an industrial internet scenario, the collected data can come from different data sources. For example, to build a machine learning model that judges whether a transformer substation in a power grid is operating normally, one can collect temperature and humidity at different times, the content of specific gases separated from the transformer oil, visible-light data (video and images), infrared thermal-image data (captured by a thermal imaging sensor), sound, smell, and so on. When several data sources exist, building a multi-modal machine learning model from them is an existing way of using the related data sets. However, the prior art has paid little attention to how to measure the value of each data source in the model, which makes it hard to select different data sources for different scenarios and leads to high system cost.
Disclosure of Invention
The technical problem to be solved by the invention is that the prior art has paid little attention to how to measure the value of each data source in the model.
To solve this technical problem, the invention adopts the following technical scheme: an industrial internet multi-modal machine learning data processing method comprising the following steps:
Step one: calculate the correlation between each pair of multi-modal data sets. First clean the data so that they are aligned in time, then judge whether the two data sets are correlated:
if not: judge whether all data have been processed;
if so: select a suitable data set as modeling data, then judge whether all data have been processed;
Step two: if not all data have been processed, restart step one; if all data have been processed, establish a suitable multi-modal machine learning model.
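The two steps above can be sketched as a loop over pairs of data sets. This is a minimal illustration, not the patent's implementation: the function name `select_modeling_data` and its parameters `correlation`, `threshold`, and `pick_better` are hypothetical, and skipping any pair in which one data set has already been dropped is an assumption the text does not state.

```python
from itertools import combinations


def select_modeling_data(datasets, correlation, threshold, pick_better):
    """Return the names of the data sets kept as modeling data.

    datasets:    {name: time-aligned data} (hypothetical layout)
    correlation: callable(data_a, data_b) -> coefficient in [-1, 1]
    threshold:   pairs with |coefficient| above this count as correlated
    pick_better: callable(name_a, name_b) -> name of the data set to keep
    """
    kept = set(datasets)
    # Step one, repeated until every pair has been processed.
    for a, b in combinations(sorted(datasets), 2):
        if a not in kept or b not in kept:
            continue  # assumption: a dropped data set is not compared again
        if abs(correlation(datasets[a], datasets[b])) > threshold:
            # Correlated pair: keep only the better of the two.
            winner = pick_better(a, b)
            kept.discard(b if winner == a else a)
        # Uncorrelated pair: both stay, move on to the next pair.
    # Step two: all pairs processed; the model is built on what remains.
    return sorted(kept)
```

The caller would then train the multi-modal model on the returned data sets only, which is where the reduction in model size and system cost would come from.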
Preferably, the method for cleaning the data in step one so that they are aligned in time is as follows: over the same period of time, set a fixed time interval for the time alignment of all data, and take all data at each time point as the cleaning output; if a data source has no data at a time point, obtain the sample by calculation from the preceding and following data.
Preferably, the specific method of obtaining the sample is as follows: let the horizontal axis be the time axis, x be the sampling time point to be calculated, and the preceding and following data points be (x0, y0) and (x1, y1); the y value at the sampling point is then calculated by the following formula:
[Formula image BDA0003502154470000021; not reproduced in this text]
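The formula itself survives only as an image reference here. A minimal sketch of the cleaning step follows, assuming the formula is ordinary linear interpolation between the two neighbouring points, y = y0 + (x - x0)(y1 - y0)/(x1 - x0). The function names `interpolate` and `align`, the `(time, value)` layout, and the choice to return `None` outside the observed range are all assumptions made for illustration.

```python
from bisect import bisect_left


def interpolate(x, x0, y0, x1, y1):
    """Linearly interpolate the value at time x between (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


def align(samples, start, stop, step):
    """Resample one data source onto the grid start, start + step, ... up to stop.

    samples: list of (time, value) pairs sorted by time, with distinct times.
    A grid point with no preceding or no following sample gets None, because
    the text only describes filling gaps from preceding and following data.
    """
    times = [t for t, _ in samples]
    aligned = []
    count = int((stop - start) // step) + 1
    for k in range(count):
        x = start + k * step
        i = bisect_left(times, x)
        if i < len(times) and times[i] == x:
            aligned.append((x, samples[i][1]))  # a real sample exists here
        elif 0 < i < len(times):
            (x0, y0), (x1, y1) = samples[i - 1], samples[i]
            aligned.append((x, interpolate(x, x0, y0, x1, y1)))
        else:
            aligned.append((x, None))  # outside the observed range
    return aligned
```

For example, a temperature source sampled at times 0, 2 and 3 and aligned to a one-unit grid gets an interpolated value at time 1 and no value at time 4. Running `align` on every data source with the same `start`, `stop`, and `step` would give series that share time points and can be compared pairwise.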
Preferably, the correlation is calculated as follows. Correlation can be expressed in two ways, as a covariance or as a correlation coefficient, and the correlation coefficient can be regarded as a normalized covariance. Let $X_t$ be the first cleaned data set, $Y_t$ the second cleaned data set, $\mu_x$ the mean of $X_t$, $\mu_y$ the mean of $Y_t$, $\sigma_x$ the standard deviation of $X_t$, $\sigma_y$ the standard deviation of $Y_t$, and $E[\cdot]$ the expectation. The covariance of $X_t$ and $Y_t$ is
$$\mathrm{Cov}(X_t, Y_t) = E\left[(X_t - \mu_x)(Y_t - \mu_y)^T\right],$$
and the correlation coefficient of $X_t$ and $Y_t$ is
$$\mathrm{Cor}(X_t, Y_t) = \frac{\mathrm{Cov}(X_t, Y_t)}{\sigma_x \sigma_y}.$$
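As an illustration of the covariance and the normalized covariance described above, the following sketch computes both for two scalar series that have already been aligned in time. It assumes scalar-valued samples (so the transpose in the covariance plays no role) and uses the population form of the expectation, that is, a plain average; the function names are hypothetical.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean, used here as the expectation E[.]."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Cov(X_t, Y_t) = E[(X_t - mu_x)(Y_t - mu_y)] for scalar series."""
    mu_x, mu_y = mean(xs), mean(ys)
    return mean([(x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)])


def correlation(xs, ys):
    """Cor(X_t, Y_t) = Cov(X_t, Y_t) / (sigma_x * sigma_y).

    Raises ZeroDivisionError if either series is constant, since its
    standard deviation is then zero.
    """
    sigma_x = sqrt(covariance(xs, xs))
    sigma_y = sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sigma_x * sigma_y)
```

Two series that rise together give a coefficient near 1, and a series compared with its mirror image gives a coefficient near -1, which is why the later threshold test uses the absolute value.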
Preferably, whether two data sets are correlated is judged by a threshold as follows: the correlation coefficient takes values between -1 and 1, and whenever its absolute value is greater than the threshold, the two data sets are regarded as correlated and only one of them is selected to participate in training the multi-modal model.
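The threshold test can be written in one line. In this sketch the function name `is_correlated` and the default value 0.8 are assumptions; the text does not state a numeric threshold.

```python
def is_correlated(coefficient, threshold=0.8):
    """Return True when |coefficient| is strictly greater than the threshold.

    The default 0.8 is only an illustrative value, not one given in the text.
    """
    return abs(coefficient) > threshold
```

A strongly negative coefficient such as -0.93 counts as correlated just like a strongly positive one, because both indicate that one data set largely predicts the other.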
Preferably, the method for selecting a suitable data set as modeling data in step one is as follows: use test data to test, separately, how much the machine models trained with each of the two data sets contribute to the detection result, and select the data set that performs better. The machine model may be trained on each of the two data sets alone, or on each of the two data sets together with the same other data. The machine learning model includes, but is not limited to, a decision tree, a random forest, linear regression, naive Bayes, a neural network (including a deep learning neural network), logistic regression, and a support vector machine.
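The comparison described above can be sketched as follows: train one model per candidate data set, score each on the same test labels, and keep the better one. A nearest-centroid classifier is used purely as a small stand-in; it is not one of the models named in the text, and the function names and the `{name: (train_rows, test_rows)}` layout are assumptions.

```python
def train_centroids(rows, labels):
    """Fit a nearest-centroid classifier: one mean feature vector per label."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest to the row."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def accuracy(centroids, rows, labels):
    """Fraction of test rows classified correctly."""
    hits = sum(predict(centroids, row) == label for row, label in zip(rows, labels))
    return hits / len(labels)


def select_better(candidates, train_labels, test_labels):
    """Train one model per candidate data set and return (best_name, scores).

    candidates: {name: (train_rows, test_rows)}, each row a list of features.
    """
    scores = {}
    for name, (train_rows, test_rows) in candidates.items():
        model = train_centroids(train_rows, train_labels)
        scores[name] = accuracy(model, test_rows, test_labels)
    return max(scores, key=scores.get), scores
```

Accuracy on held-out test data is only one possible measure of "contribution to the detection result"; any metric suited to the detection task could be substituted in `accuracy`.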
The invention has the following beneficial effects:
By establishing a suitable multi-modal machine learning model, the invention makes it possible to select different sets of data sources for different scenarios, effectively reduces system cost, reduces the size of the machine learning model, and facilitates deployment in edge computing.
Drawings
Fig. 1 is a flowchart of an industrial internet multimodal machine learning data processing method according to the present invention.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that those skilled in the art can more readily understand the advantages and features of the invention and so that its scope is clearly defined.
Referring to Fig. 1, the industrial internet multi-modal machine learning data processing method comprises the following steps:
Step one: calculate the correlation between each pair of multi-modal data sets. First clean the data so that they are aligned in time, as follows: over the same period of time, set a fixed time interval for the time alignment of all data, and take all data at each time point as the cleaning output; if a data source has no data at a time point, obtain the sample by calculation from the preceding and following data. The specific method of obtaining the sample is as follows: let the horizontal axis be the time axis, x be the sampling time point to be calculated, and the preceding and following data points be (x0, y0) and (x1, y1); the y value at the sampling point is then calculated by the following formula:
[Formula image BDA0003502154470000031; not reproduced in this text]
With the data aligned in time, judge whether the two data sets are correlated:
if not: judge whether all data have been processed;
if so: select a suitable data set as modeling data, then judge whether all data have been processed;
Step two: if not all data have been processed, restart step one; if all data have been processed, establish a suitable multi-modal machine learning model.
The correlation is calculated as follows. Correlation can be expressed in two ways, as a covariance or as a correlation coefficient, and the correlation coefficient can be regarded as a normalized covariance. Let $X_t$ be the first cleaned data set, $Y_t$ the second cleaned data set, $\mu_x$ the mean of $X_t$, $\mu_y$ the mean of $Y_t$, $\sigma_x$ the standard deviation of $X_t$, $\sigma_y$ the standard deviation of $Y_t$, and $E[\cdot]$ the expectation. The covariance of $X_t$ and $Y_t$ is
$$\mathrm{Cov}(X_t, Y_t) = E\left[(X_t - \mu_x)(Y_t - \mu_y)^T\right],$$
and the correlation coefficient of $X_t$ and $Y_t$ is
$$\mathrm{Cor}(X_t, Y_t) = \frac{\mathrm{Cov}(X_t, Y_t)}{\sigma_x \sigma_y}.$$
The method for selecting a suitable data set as modeling data in step one is as follows: use test data to test, separately, how much the machine models trained with each of the two data sets contribute to the detection result, and select the data set that performs better. The machine model may be trained on each of the two data sets alone, or on each of the two data sets together with the same other data. The machine learning model includes, but is not limited to, a decision tree, a random forest, linear regression, naive Bayes, a neural network (including a deep learning neural network), logistic regression, and a support vector machine.
Whether two data sets are correlated is judged by a threshold as follows: the correlation coefficient takes values between -1 and 1, and whenever its absolute value is greater than the threshold, the two data sets are regarded as correlated and only one of them is selected to participate in training the multi-modal model.
The above description is only an embodiment of the present invention and is not intended to limit its scope. All equivalent structures or equivalent processes derived from this specification and the drawings, whether applied directly or indirectly in other related technical fields, fall within the scope of protection of the present invention.

Claims (7)

1. An industrial internet multi-modal machine learning data processing method, characterized by comprising the following steps:
Step one: calculate the correlation between each pair of multi-modal data sets. First clean the data so that they are aligned in time, then judge whether the two data sets are correlated:
if not: judge whether all data have been processed;
if so: select a suitable data set as modeling data, then judge whether all data have been processed;
Step two: if not all data have been processed, restart step one; if all data have been processed, establish a suitable multi-modal machine learning model.
2. The industrial internet multi-modal machine learning data processing method according to claim 1, wherein in step one the data are cleaned so that they are aligned in time as follows: over the same period of time, set a fixed time interval for the time alignment of all data, and take all data at each time point as the cleaning output; if a data source has no data at a time point, obtain the sample by calculation from the preceding and following data.
3. The industrial internet multi-modal machine learning data processing method according to claim 2, wherein the specific method of obtaining the sample is as follows: let the horizontal axis be the time axis, x be the sampling time point to be calculated, and the preceding and following data points be (x0, y0) and (x1, y1); the y value at the sampling point is then calculated by the following formula:
[Formula image FDA0003502154460000011; not reproduced in this text]
4. The industrial internet multi-modal machine learning data processing method according to claim 1, wherein the correlation is calculated as follows. Correlation can be expressed in two ways, as a covariance or as a correlation coefficient, and the correlation coefficient can be regarded as a normalized covariance. Let $X_t$ be the first cleaned data set, $Y_t$ the second cleaned data set, $\mu_x$ the mean of $X_t$, $\mu_y$ the mean of $Y_t$, $\sigma_x$ the standard deviation of $X_t$, $\sigma_y$ the standard deviation of $Y_t$, and $E[\cdot]$ the expectation. The covariance of $X_t$ and $Y_t$ is
$$\mathrm{Cov}(X_t, Y_t) = E\left[(X_t - \mu_x)(Y_t - \mu_y)^T\right],$$
and the correlation coefficient of $X_t$ and $Y_t$ is
$$\mathrm{Cor}(X_t, Y_t) = \frac{\mathrm{Cov}(X_t, Y_t)}{\sigma_x \sigma_y}.$$
5. The industrial internet multi-modal machine learning data processing method according to claim 1, wherein whether two data sets are correlated is judged by a threshold as follows: the correlation coefficient takes values between -1 and 1, and whenever its absolute value is greater than the threshold, the two data sets are regarded as correlated and only one of them is selected to participate in training the multi-modal model.
6. The industrial internet multi-modal machine learning data processing method according to claim 1, wherein the method for selecting a suitable data set as modeling data in step one is as follows: use test data to test, separately, how much the machine models trained with each of the two data sets contribute to the detection result, and select the data set that performs better. The machine model may be trained on each of the two data sets alone, or on each of the two data sets together with the same other data. The machine learning model includes, but is not limited to, a decision tree, a random forest, linear regression, naive Bayes, a neural network (including a deep learning neural network), logistic regression, and a support vector machine.
7. The industrial internet multi-modal machine learning data processing method according to claim 1, wherein the specific method for establishing the suitable multi-modal machine learning model is as follows:
when data sets are selected pairwise, the data set with the larger amount of data is selected; once all the selected data have been obtained, a model is retrained using the cleaned data. If models were already established during selection, the models corresponding to the selected data may be used further, for example by connecting all the independent models corresponding to the selected data in parallel, or by directly outputting and using all the models trained with the selected data.
CN202210129788.0A 2022-02-11 2022-02-11 Industrial internet multi-modal machine learning data processing method Pending CN114662698A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210129788.0A CN114662698A (en) 2022-02-11 2022-02-11 Industrial internet multi-modal machine learning data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210129788.0A CN114662698A (en) 2022-02-11 2022-02-11 Industrial internet multi-modal machine learning data processing method

Publications (1)

Publication Number Publication Date
CN114662698A true CN114662698A (en) 2022-06-24

Family

ID=82027930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210129788.0A Pending CN114662698A (en) 2022-02-11 2022-02-11 Industrial internet multi-modal machine learning data processing method

Country Status (1)

Country Link
CN (1) CN114662698A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117610696A (en) * 2023-03-08 2024-02-27 西北工业大学 A method for runoff prediction across data sets using different attributes

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110333995A (en) * 2019-07-09 2019-10-15 英赛克科技(北京)有限公司 The method and device that operation of industrial installation is monitored
CN110795463A (en) * 2019-06-27 2020-02-14 浙江大学 Mass time series data visualization method for transient analysis of power system
CN111338302A (en) * 2020-02-28 2020-06-26 合肥力拓云计算科技有限公司 Chemical process modeling processing system based on industrial big data and industrial Internet of things
CN111970351A (en) * 2020-08-11 2020-11-20 震坤行工业超市(上海)有限公司 Data alignment-based multi-dimensional sensing optimization method and system for Internet of things
CN112131182A (en) * 2020-08-14 2020-12-25 陕西千山航空电子有限责任公司 Rapid alignment processing method for packet mining type flight parameter data
CN112149702A (en) * 2019-06-28 2020-12-29 北京百度网讯科技有限公司 Feature processing method and device
CN112198857A (en) * 2020-12-08 2021-01-08 浙江中自庆安新能源技术有限公司 Industrial equipment control optimization method and system based on monitoring data
CN112309571A (en) * 2020-10-30 2021-02-02 电子科技大学 A Screening Method for Prognostic Quantitative Features of Digital Pathology Images
US20210096541A1 (en) * 2019-09-30 2021-04-01 Rockwell Automation Technologies, Inc. Contextualization of industrial data at the device level
CN113113140A (en) * 2021-04-02 2021-07-13 中山大学 Diabetes early warning method, system, equipment and storage medium based on self-supervision DNN
CN113741358A (en) * 2021-08-04 2021-12-03 合肥力拓云计算科技有限公司 Compound fertilizer nutrient control method based on industrial digital intelligent prediction


Similar Documents

Publication Publication Date Title
US12056210B2 (en) AI-based pre-training model determination system, and AI-based vision inspection management system using same for product production lines
CN106295807B (en) A kind of method and device of information processing
CN107945210B (en) Target tracking method based on deep learning and environment self-adaption
CN110826644B (en) Distributed power supply time sequence joint output typical scene generation method based on Copula function
CN109145114B (en) Social network event detection method based on Kleinberg online state machine
CN117911877B (en) A method for identifying building communication optical cable faults based on machine vision
CN112508946A (en) Cable tunnel abnormity detection method based on antagonistic neural network
CN114662698A (en) Industrial internet multi-modal machine learning data processing method
CN117892175A (en) SNN multi-mode target identification method, system, equipment and medium
CN112786028A (en) Acoustic model processing method, device, equipment and readable storage medium
CN112364747A (en) Target detection method under limited sample
CN110070023B (en) Self-supervision learning method and device based on motion sequential regression
CN115797808A (en) Unmanned aerial vehicle inspection defect image identification method, system, device and medium
CN119202212A (en) A debiasing approach for visual question answering based on dual counterfactuals
CN118692142A (en) Method and device for detecting sequential motion
CN114140843B (en) Cross-database expression recognition method based on sample self-repairing
CN110988803A (en) A Radar Radiation Source Individual Identification System with High Confidence Dynamic Adjustment
CN116680414B (en) A knowledge graph prediction method based on attention mechanism
CN113033282A (en) Image recognition method, device and medium based on small object detection
CN112819079A (en) Model sampling algorithm matching method and device and electronic equipment
CN115063591B (en) RGB image semantic segmentation method and device based on edge measurement relation
CN113821642B (en) Method and system for cleaning text based on GAN clustering
Li et al. An Improved YOLO-v4 Algorithm for Recognition and Detection of Underwater Small Targets
CN115471893B (en) Face recognition model training, face recognition method and device
TWI884050B (en) Method and system for collecting negative samples

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination