CN114662698A - Industrial internet multi-modal machine learning data processing method - Google Patents

Publication number
CN114662698A
Authority
CN
China
Prior art keywords
data
machine learning
industrial internet
processing method
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210129788.0A
Other languages
Chinese (zh)
Inventor
吴斌
王雪峰
刘青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Inrich Technology Co ltd
Original Assignee
Nanjing Inrich Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Inrich Technology Co ltd
Priority to CN202210129788.0A
Publication of CN114662698A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an industrial internet multi-modal machine learning data processing method and relates to the technical field of the industrial internet. The method comprises the following steps. Step one: calculate the correlation between every pair of multi-modal data sets; first clean the data so that they are aligned in time, then judge whether the two data sets are correlated. If not, judge whether all data have been processed; if so, select the suitable data set as modeling data and then judge whether all data have been processed. Step two: if not all data have been processed, restart step one; if all data have been processed, establish a suitable multi-modal machine learning model. The invention makes it possible to select different combinations of data sources for different scenarios, effectively saves system cost, reduces the size of the machine learning model, and facilitates deployment in edge computing.

Description

Industrial internet multi-modal machine learning data processing method
Technical Field
The invention relates to the technical field of industrial internet, in particular to an industrial internet multi-mode machine learning data processing method.
Background
In the prior art, once a large number of terminals have been introduced into an industrial internet scenario, the collected data can come from many different data sources. For example, to build a machine learning model that determines whether a substation in a power grid is operating normally, one can collect the temperature and humidity at different times, the content of specific gases separated from the transformer oil, visible light data (video and images), infrared thermal image data (captured by a thermal imaging sensor), sound, smell and so on. When several data sources exist, building a multi-modal machine learning model from them is an existing way of exploiting the related data sets. However, the prior art has paid little attention to how to measure the value of each data source to the model, which makes it difficult to select different data sources for different scenarios and leads to high system cost.
Disclosure of Invention
The technical problem to be solved by the invention is that the prior art has paid little attention to how to measure the value of each data source to the model.
To solve this technical problem, the invention adopts the following technical scheme: an industrial internet multi-modal machine learning data processing method comprising the following steps:
Step one: calculate the correlation between every pair of multi-modal data sets. First clean the data so that they are aligned in time, then judge whether the two data sets are correlated:
If not: judge whether all data have been processed;
If so: select the suitable data set as modeling data, then judge whether all data have been processed.
Step two: if not all data have been processed, restart step one; if all data have been processed, establish a suitable multi-modal machine learning model.
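The two steps above can be read as a loop over pairs of data sets. The following is a minimal sketch of that loop, not the patented implementation: the function name `select_data_sets` and the callables `correlation` and `pick_better` are hypothetical placeholders, and the rule of skipping a pair once one of its members has been dropped is an assumption that the text does not state.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def select_data_sets(
    data_sets: Dict[str, Sequence[float]],
    correlation: Callable[[Sequence[float], Sequence[float]], float],
    pick_better: Callable[[str, str], str],
    threshold: float,
) -> List[str]:
    """Return the names of the data sets kept as modeling data.

    data_sets   -- time-aligned (already cleaned) series, keyed by source name
    correlation -- hypothetical callable returning a coefficient in [-1, 1]
    pick_better -- hypothetical callable returning the name of the better set
    threshold   -- pairs with |coefficient| above this count as correlated
    """
    kept = set(data_sets)
    # Step one: examine every pair of data sets.
    for a, b in combinations(list(data_sets), 2):
        if a not in kept or b not in kept:
            continue  # assumption: a dropped data set is not compared again
        if abs(correlation(data_sets[a], data_sets[b])) > threshold:
            # Correlated pair: keep only the data set that performs better.
            winner = pick_better(a, b)
            kept.discard(b if winner == a else a)
    # Step two happens after this loop: train the multi-modal model on `kept`.
    return [name for name in data_sets if name in kept]
```

The returned list preserves the original order of the data sources, so the caller can feed the surviving sources straight into model training.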
Preferably, in step one the data are cleaned so that they are aligned in time as follows: for all data within the same period, a fixed time interval is set, and the values of all data sources at each time point are taken as the cleaning output; if a data source has no data at a time point, the sample is calculated from the preceding and following data.
Preferably, the sample is obtained as follows: let the horizontal axis be the time axis, let $x$ be the sampling time point to be calculated, and let the preceding and following data points be $(x_0, y_0)$ and $(x_1, y_1)$; the $y$ value at the sampling point is then calculated as:
$$y = y_0 + \frac{(x - x_0)(y_1 - y_0)}{x_1 - x_0}$$
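The cleaning rule can be illustrated with a short sketch. It assumes that the gap-filling formula is ordinary linear interpolation between the two neighbouring samples $(x_0, y_0)$ and $(x_1, y_1)$; the function names `interpolate` and `align` are hypothetical, and returning `None` when a neighbour is missing on one side is an assumption, since the text does not cover that case.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple


def interpolate(x: float, x0: float, y0: float, x1: float, y1: float) -> float:
    """Linear interpolation between (x0, y0) and (x1, y1) at time x."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


def align(
    samples: Sequence[Tuple[float, float]],
    start: float,
    interval: float,
    count: int,
) -> List[Optional[float]]:
    """Resample one data source onto a fixed time grid.

    samples -- (time, value) pairs sorted by time, with unique timestamps
    Grid points are start, start + interval, ... (count points in total).
    """
    times = [t for t, _ in samples]
    aligned: List[Optional[float]] = []
    for k in range(count):
        x = start + k * interval
        i = bisect_left(times, x)
        if i < len(times) and times[i] == x:
            aligned.append(samples[i][1])  # a sample exists at this time point
        elif 0 < i < len(times):
            (x0, y0), (x1, y1) = samples[i - 1], samples[i]
            aligned.append(interpolate(x, x0, y0, x1, y1))
        else:
            aligned.append(None)  # assumption: no neighbour on one side
    return aligned
```

For example, `align([(0.0, 10.0), (2.0, 20.0), (3.0, 30.0)], 0.0, 1.0, 5)` fills the missing sample at time 1.0 with 15.0 and leaves the point at time 4.0 unfilled.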
Preferably, the correlation is calculated as follows. Correlation can be expressed in two ways: as a covariance or as a correlation coefficient, where the correlation coefficient can be regarded as a normalized covariance. Let $X_t$ be the first cleaned data set, $Y_t$ the second cleaned data set, $\mu_x$ the mean of $X_t$, $\mu_y$ the mean of $Y_t$, $\sigma_x$ the standard deviation of $X_t$, $\sigma_y$ the standard deviation of $Y_t$, and $E[\cdot]$ the expectation. The covariance of $X_t$ and $Y_t$ is
$$\mathrm{Cov}(X_t, Y_t) = E\left[(X_t - \mu_x)(Y_t - \mu_y)^T\right]$$
and the correlation coefficient of $X_t$ and $Y_t$ is
$$\mathrm{Cor}(X_t, Y_t) = \frac{\mathrm{Cov}(X_t, Y_t)}{\sigma_x \sigma_y}$$
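As an illustration, the covariance and the correlation coefficient (covariance normalized by the two standard deviations) can be computed for two scalar, time-aligned series as sketched below. The function names are hypothetical, the expectation is approximated by the sample mean, and raising an error for a constant series is a choice made for this sketch only.

```python
from math import sqrt
from typing import Sequence


def covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Population covariance E[(X - mu_x)(Y - mu_y)] of two aligned series."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and of equal length")
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n


def correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Correlation coefficient Cov(X, Y) / (sigma_x * sigma_y), in [-1, 1]."""
    sigma_x = sqrt(covariance(xs, xs))  # standard deviation of X
    sigma_y = sqrt(covariance(ys, ys))  # standard deviation of Y
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance(xs, ys) / (sigma_x * sigma_y)
```

Two series that rise together give a coefficient close to 1, and a series paired with its mirror image gives a coefficient close to -1.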
Preferably, whether the two data sets are correlated is judged with a threshold: the correlation coefficient takes values between -1 and 1, and whenever its absolute value is greater than the threshold, only one of the two data sets is selected to take part in training the multi-modal model.
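The threshold test compares the absolute value of the coefficient, so a strong negative correlation counts as redundancy just as a strong positive one does. A minimal sketch follows; the function name `are_correlated` is hypothetical, and the value 0.8 used in the example is purely illustrative, since the text does not specify a threshold.

```python
def are_correlated(coefficient: float, threshold: float) -> bool:
    """Return True when |coefficient| is strictly greater than the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return abs(coefficient) > threshold


# Illustrative threshold only; the source text gives no concrete value.
EXAMPLE_THRESHOLD = 0.8
redundant = are_correlated(-0.93, EXAMPLE_THRESHOLD)   # strongly negative
independent = are_correlated(0.35, EXAMPLE_THRESHOLD)  # weakly positive
```

When the test returns True, only one of the two data sets goes on to train the multi-modal model; otherwise both remain candidates.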
Preferably, the suitable data set is selected as modeling data in step one as follows: test data are used to evaluate, for each of the two data sets, how much a machine learning model trained with that data set contributes to the detection result, and the data set that performs better is selected. Each model may be trained on one of the two data sets alone, or on one of the two data sets together with the same additional data. The machine learning model includes, but is not limited to, a decision tree, a random forest, linear regression, naive Bayes, a neural network (including a deep learning neural network), logistic regression and a support vector machine.
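A minimal sketch of this selection step is shown below, assuming each candidate data set is a single scalar feature, the model is a one-variable linear regression (one of the model families listed above), and "performs better" means a lower mean squared error on the test data. All function names and the choice of metric are assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(
    model: Tuple[float, float], xs: Sequence[float], ys: Sequence[float]
) -> float:
    """Average squared prediction error of a fitted line on (xs, ys)."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def pick_better(
    candidates: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    train_y: Sequence[float],
    test_y: Sequence[float],
) -> str:
    """Train one model per candidate data set and return the best performer.

    candidates -- name -> (training feature values, test feature values)
    """
    errors = {}
    for name, (train_x, test_x) in candidates.items():
        model = fit_line(train_x, train_y)  # model trained on this set alone
        errors[name] = mean_squared_error(model, test_x, test_y)
    return min(errors, key=errors.get)  # lower test error is better
```

In practice the comparison metric would follow the detection task, for example accuracy for a classifier, and the two models could share additional common input data as the text allows.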
The invention has the following beneficial effects:
By establishing a suitable multi-modal machine learning model, the invention allows different combinations of data sources to be selected for different scenarios and effectively saves system cost; at the same time, the machine learning model becomes smaller, which facilitates deployment in edge computing.
Drawings
Fig. 1 is a flowchart of an industrial internet multimodal machine learning data processing method according to the present invention.
Detailed Description
The following detailed description of the preferred embodiments of the present invention, taken in conjunction with the accompanying drawings, will make the advantages and features of the present invention more comprehensible to those skilled in the art, and will thus provide a clear and concise definition of the scope of the present invention.
Referring to Fig. 1, the industrial internet multi-modal machine learning data processing method comprises the following steps:
Step one: calculate the correlation between every pair of multi-modal data sets. The data are first cleaned so that they are aligned in time, as follows: for all data within the same period, a fixed time interval is set, and the values of all data sources at each time point are taken as the cleaning output; if a data source has no data at a time point, the sample is calculated from the preceding and following data. Specifically, let the horizontal axis be the time axis, let $x$ be the sampling time point to be calculated, and let the preceding and following data points be $(x_0, y_0)$ and $(x_1, y_1)$; the $y$ value at the sampling point is then calculated as:
$$y = y_0 + \frac{(x - x_0)(y_1 - y_0)}{x_1 - x_0}$$
With the data aligned in time, judge whether the two data sets are correlated:
If not: judge whether all data have been processed;
If so: select the suitable data set as modeling data, then judge whether all data have been processed.
Step two: if not all data have been processed, restart step one; if all data have been processed, establish a suitable multi-modal machine learning model.
The correlation is calculated as follows. Correlation can be expressed in two ways: as a covariance or as a correlation coefficient, where the correlation coefficient can be regarded as a normalized covariance. Let $X_t$ be the first cleaned data set, $Y_t$ the second cleaned data set, $\mu_x$ the mean of $X_t$, $\mu_y$ the mean of $Y_t$, $\sigma_x$ the standard deviation of $X_t$, $\sigma_y$ the standard deviation of $Y_t$, and $E[\cdot]$ the expectation. The covariance of $X_t$ and $Y_t$ is
$$\mathrm{Cov}(X_t, Y_t) = E\left[(X_t - \mu_x)(Y_t - \mu_y)^T\right]$$
and the correlation coefficient of $X_t$ and $Y_t$ is
$$\mathrm{Cor}(X_t, Y_t) = \frac{\mathrm{Cov}(X_t, Y_t)}{\sigma_x \sigma_y}$$
The suitable data set is selected as modeling data in step one as follows: test data are used to evaluate, for each of the two data sets, how much a machine learning model trained with that data set contributes to the detection result, and the data set that performs better is selected. Each model may be trained on one of the two data sets alone, or on one of the two data sets together with the same additional data. The machine learning model includes, but is not limited to, a decision tree, a random forest, linear regression, naive Bayes, a neural network (including a deep learning neural network), logistic regression and a support vector machine.
Whether the two data sets are correlated is judged with a threshold: the correlation coefficient takes values between -1 and 1, and whenever its absolute value is greater than the threshold, only one of the two data sets is selected to take part in training the multi-modal model.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (7)

1. An industrial internet multi-modal machine learning data processing method, characterized by comprising the following steps:
step one: calculating the correlation between every pair of multi-modal data sets, first cleaning the data so that they are aligned in time, and judging whether the two data sets are correlated:
if not: judging whether all data have been processed;
if so: selecting the suitable data set as modeling data, and judging whether all data have been processed;
step two: restarting step one if not all data have been processed, and establishing a suitable multi-modal machine learning model if all data have been processed.
2. The industrial internet multi-modal machine learning data processing method according to claim 1, wherein in step one the data are cleaned so that they are aligned in time as follows: for all data within the same period, a fixed time interval is set, and the values of all data sources at each time point are taken as the cleaning output; if a data source has no data at a time point, the sample is calculated from the preceding and following data.
3. The industrial internet multi-modal machine learning data processing method according to claim 2, wherein the sample is obtained as follows: let the horizontal axis be the time axis, let $x$ be the sampling time point to be calculated, and let the preceding and following data points be $(x_0, y_0)$ and $(x_1, y_1)$; the $y$ value at the sampling point is then calculated as:
$$y = y_0 + \frac{(x - x_0)(y_1 - y_0)}{x_1 - x_0}$$
4. The industrial internet multi-modal machine learning data processing method according to claim 1, wherein the correlation is calculated as follows: correlation can be expressed in two ways, as a covariance or as a correlation coefficient, where the correlation coefficient can be regarded as a normalized covariance; let $X_t$ be the first cleaned data set, $Y_t$ the second cleaned data set, $\mu_x$ the mean of $X_t$, $\mu_y$ the mean of $Y_t$, $\sigma_x$ the standard deviation of $X_t$, $\sigma_y$ the standard deviation of $Y_t$, and $E[\cdot]$ the expectation; the covariance of $X_t$ and $Y_t$ is
$$\mathrm{Cov}(X_t, Y_t) = E\left[(X_t - \mu_x)(Y_t - \mu_y)^T\right]$$
and the correlation coefficient of $X_t$ and $Y_t$ is
$$\mathrm{Cor}(X_t, Y_t) = \frac{\mathrm{Cov}(X_t, Y_t)}{\sigma_x \sigma_y}$$
5. The industrial internet multi-modal machine learning data processing method according to claim 1, wherein whether the two data sets are correlated is judged with a threshold: the correlation coefficient takes values between -1 and 1, and whenever its absolute value is greater than the threshold, only one of the two data sets is selected to take part in training the multi-modal model.
6. The industrial internet multi-modal machine learning data processing method according to claim 1, wherein the suitable data set is selected as modeling data in step one as follows: test data are used to evaluate, for each of the two data sets, how much a machine learning model trained with that data set contributes to the detection result, and the data set that performs better is selected, wherein each model may be trained on one of the two data sets alone, or on one of the two data sets together with the same additional data, and the machine learning model includes, but is not limited to, a decision tree, a random forest, linear regression, naive Bayes, a neural network (including a deep learning neural network), logistic regression and a support vector machine.
7. The industrial internet multi-modal machine learning data processing method according to claim 1, wherein the suitable multi-modal machine learning model is established as follows:
when the data sets are selected pairwise, the data set with the larger amount of data is chosen; once all selected data have been obtained, a model is retrained on the cleaned data; if models were already built during selection, the models corresponding to the selected data may be reused, for example by connecting all the independent models corresponding to the selected data in parallel, or by directly outputting all the models trained on the selected data for use.
CN202210129788.0A 2022-02-11 2022-02-11 Industrial internet multi-modal machine learning data processing method Pending CN114662698A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210129788.0A CN114662698A (en) 2022-02-11 2022-02-11 Industrial internet multi-modal machine learning data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210129788.0A CN114662698A (en) 2022-02-11 2022-02-11 Industrial internet multi-modal machine learning data processing method

Publications (1)

Publication Number Publication Date
CN114662698A (en) 2022-06-24

Family

ID=82027930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210129788.0A Pending CN114662698A (en) 2022-02-11 2022-02-11 Industrial internet multi-modal machine learning data processing method

Country Status (1)

Country Link
CN (1) CN114662698A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117610696A (en) * 2023-03-08 2024-02-27 西北工业大学 Runoff prediction method for crossing data sets by utilizing different attributes

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110333995A (en) * 2019-07-09 2019-10-15 英赛克科技(北京)有限公司 The method and device that operation of industrial installation is monitored
CN110795463A (en) * 2019-06-27 2020-02-14 浙江大学 Mass time series data visualization method for transient analysis of power system
CN111338302A (en) * 2020-02-28 2020-06-26 合肥力拓云计算科技有限公司 Chemical process modeling processing system based on industrial big data and industrial Internet of things
CN111970351A (en) * 2020-08-11 2020-11-20 震坤行工业超市(上海)有限公司 Data alignment-based multi-dimensional sensing optimization method and system for Internet of things
CN112131182A (en) * 2020-08-14 2020-12-25 陕西千山航空电子有限责任公司 Rapid alignment processing method for packet mining type flight parameter data
CN112149702A (en) * 2019-06-28 2020-12-29 北京百度网讯科技有限公司 Feature processing method and device
CN112198857A (en) * 2020-12-08 2021-01-08 浙江中自庆安新能源技术有限公司 Industrial equipment control optimization method and system based on monitoring data
CN112309571A (en) * 2020-10-30 2021-02-02 电子科技大学 Screening method of prognosis quantitative characteristics of digital pathological image
US20210096541A1 (en) * 2019-09-30 2021-04-01 Rockwell Automation Technologies, Inc. Contextualization of industrial data at the device level
CN113113140A (en) * 2021-04-02 2021-07-13 中山大学 Diabetes early warning method, system, equipment and storage medium based on self-supervision DNN
CN113741358A (en) * 2021-08-04 2021-12-03 合肥力拓云计算科技有限公司 Compound fertilizer nutrient control method based on industrial digital intelligent prediction

Similar Documents

Publication Publication Date Title
CN110880019B (en) Method for adaptively training target domain classification model through unsupervised domain
CN106295807B (en) A kind of method and device of information processing
CN109712105B (en) Image salient object detection method combining color and depth information
US12056210B2 (en) AI-based pre-training model determination system, and AI-based vision inspection management system using same for product production lines
CN107291845A (en) A kind of film based on trailer recommends method and system
CN110210294A (en) Evaluation method, device, storage medium and the computer equipment of Optimized model
CN111401149B (en) Lightweight video behavior identification method based on long-short-term time domain modeling algorithm
CN112364747A (en) Target detection method under limited sample
CN114169968B (en) Multi-granularity session recommendation method integrating user interest states
CN110070023B (en) Self-supervision learning method and device based on motion sequential regression
CN106372083B (en) A kind of method and system that controversial news clue is found automatically
CN105760896A (en) Corrosion source joint de-noising method for multi-source heterogeneous big data
CN113792751A (en) Cross-domain behavior identification method, device, equipment and readable storage medium
CN116664867B (en) Feature extraction method and device for selecting training samples based on multi-evidence fusion
CN114662698A (en) Industrial internet multi-modal machine learning data processing method
CN116935438A (en) Pedestrian image re-recognition method based on autonomous evolution of model structure
CN114627496B (en) Robust pedestrian re-identification method based on Gaussian process depolarization batch normalization
CN115331081A (en) Image target detection method and device
CN112381056B (en) Cross-domain pedestrian re-identification method and system fusing multiple source domains
CN113033282A (en) Image recognition method, device and medium based on small object detection
CN112819079A (en) Model sampling algorithm matching method and device and electronic equipment
CN111860631A (en) Method for optimizing loss function by adopting error-cause strengthening mode
CN113821642B (en) Method and system for cleaning text based on GAN clustering
CN116680414A (en) Knowledge graph prediction method based on attention mechanism
CN113627609B (en) Network measurement method based on affine transformation and storage device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination