CN114662698A - Industrial internet multi-modal machine learning data processing method - Google Patents
- Publication number
- CN114662698A
- Authority
- CN
- China
- Prior art keywords
- data
- machine learning
- industrial internet
- processing method
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Quality & Reliability (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses an industrial internet multi-modal machine learning data processing method and relates to the technical field of the industrial internet. The method comprises the following steps. Step one: calculate the correlation between every two multi-modal data sets; the data are first cleaned so that they are aligned in time, and it is then judged whether the two data sets are correlated. If not, judge whether all data have been processed. If so, select a suitable data set as modeling data, and then judge whether all data have been processed. Step two: if not all data have been processed, restart step one; if all data have been processed, establish a suitable multi-modal machine learning model. The invention helps select different combinations of data sources for different scenarios, effectively reduces system cost, shrinks the machine learning model, and facilitates deployment in edge computing.
Description
Technical Field
The invention relates to the technical field of the industrial internet, and in particular to an industrial internet multi-modal machine learning data processing method.
Background
In the prior art, once a large number of terminals are deployed in an industrial internet scenario, the collected data can come from many different data sources. For example, to build a machine learning model that judges whether a transformer substation in a power grid is operating normally, one can collect temperature and humidity at different times, the content of specific gases separated from the transformer oil, visible light data (video and images), infrared thermal image data (captured by a thermal imaging sensor), sound, smell, and so on. When several data sources exist, building a multi-modal machine learning model from them is an existing way of exploiting the related data sets. However, the prior art has paid little attention to how to measure the value of each data source in the model. This makes it hard to select different data sources for different scenarios and leads to high system cost.
Disclosure of Invention
The technical problem to be solved by the invention is that the prior art has paid little attention to how to measure the value of each data source in the model.
To solve the above technical problem, the invention adopts the following technical scheme: an industrial internet multi-modal machine learning data processing method, comprising the following steps:
step one: calculate the correlation between every two multi-modal data sets; the data are first cleaned so that they are aligned in time, and it is then judged whether the two data sets are correlated:
if not: judge whether all data have been processed;
if so: select a suitable data set as modeling data, and then judge whether all data have been processed;
step two: if not all data have been processed, restart step one; if all data have been processed, establish a suitable multi-modal machine learning model.
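The two steps above can be sketched as a loop over pairs of data sets. This is only an illustrative sketch, not the patented implementation: the function name `select_modalities` and the two callables `is_correlated` and `pick_better` are hypothetical placeholders for the correlation test and the data set selection described in the text.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def select_modalities(
    datasets: Dict[str, Sequence[float]],
    is_correlated: Callable[[Sequence[float], Sequence[float]], bool],
    pick_better: Callable[[str, str], str],
) -> List[str]:
    """Return the names of the data sets kept as modeling data.

    `datasets` maps a (hypothetical) source name to its cleaned,
    time-aligned values.
    """
    kept = list(datasets)  # every data set starts as a candidate
    for a, b in combinations(list(datasets), 2):  # step one, pair by pair
        if a not in kept or b not in kept:
            continue  # one member of the pair was already dropped
        if is_correlated(datasets[a], datasets[b]):
            # "yes" branch: keep only the better of the two data sets
            winner = pick_better(a, b)
            loser = b if winner == a else a
            kept.remove(loser)
        # "no" branch: keep both and continue with the next pair
    return kept  # step two: all pairs processed, model on what is left
```

The loop ends when every pair has been examined, which corresponds to the "all data have been processed" check; the returned names are the sources that would feed the multi-modal model.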
Preferably, in step one the data are cleaned and aligned in time as follows: for all data within the same time period, a fixed time interval is set, and the data of all sources at each time point are taken as the cleaning output; if a data source has no data at a time point, a sample is calculated from the data before and after that point.
Preferably, the sample is obtained as follows: let the horizontal axis be the time axis, let $x$ be the sampling time point to be calculated, and let the preceding and following data points be $(x_0, y_0)$ and $(x_1, y_1)$; the $y$ value at the sampling point is then calculated by linear interpolation:
$$y = y_0 + (x - x_0)\,\frac{y_1 - y_0}{x_1 - x_0}$$
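As a minimal sketch of this cleaning step, the following code resamples one data source onto a fixed time grid. It assumes that the sample formula is ordinary linear interpolation between the neighbouring points $(x_0, y_0)$ and $(x_1, y_1)$; the names `interpolate` and `align`, and the choice to return `None` when a grid point has no neighbour on both sides, are illustrative assumptions rather than part of the source.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple


def interpolate(x: float, x0: float, y0: float, x1: float, y1: float) -> float:
    """Linear interpolation between (x0, y0) and (x1, y1) at time x."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


def align(
    samples: Sequence[Tuple[float, float]],
    start: float,
    step: float,
    count: int,
) -> List[Optional[float]]:
    """Resample (time, value) pairs, sorted by time, onto a fixed grid."""
    times = [t for t, _ in samples]
    out: List[Optional[float]] = []
    for k in range(count):
        x = start + k * step  # grid time point
        i = bisect_left(times, x)
        if i < len(times) and times[i] == x:
            out.append(samples[i][1])  # a reading exists at this time point
        elif 0 < i < len(times):
            (x0, y0), (x1, y1) = samples[i - 1], samples[i]
            out.append(interpolate(x, x0, y0, x1, y1))
        else:
            out.append(None)  # no data on both sides; left unfilled here
    return out
```

Running `align` once per data source with the same `start`, `step` and `count` yields series of equal length that can be compared point by point.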
Preferably, the correlation is calculated as follows. Correlation can be expressed in two ways, as a covariance or as a correlation coefficient; the correlation coefficient can be regarded as a normalized covariance. Let $X_t$ be the first cleaned data set, $Y_t$ the second cleaned data set, $\mu_x$ the mean of $X_t$, $\mu_y$ the mean of $Y_t$, $\sigma_x$ the standard deviation of $X_t$, $\sigma_y$ the standard deviation of $Y_t$, and $E[\cdot]$ the expectation. The covariance of $X_t$ and $Y_t$ is
$$\mathrm{Cov}(X_t, Y_t) = E\left[(X_t - \mu_x)(Y_t - \mu_y)^T\right],$$
and the correlation coefficient of $X_t$ and $Y_t$ is
$$\mathrm{Cor}(X_t, Y_t) = \frac{\mathrm{Cov}(X_t, Y_t)}{\sigma_x \sigma_y}.$$
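For two scalar, time-aligned series the covariance and correlation coefficient can be computed directly from their definitions. The sketch below uses only the Python standard library and the population form (dividing by the number of samples); the function names are illustrative, and treating each data set as a one-dimensional series is an assumption made for brevity.

```python
from math import sqrt
from typing import Sequence


def mean(v: Sequence[float]) -> float:
    return sum(v) / len(v)


def covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Population covariance E[(X - mu_x)(Y - mu_y)] of two aligned series."""
    if len(x) != len(y) or not x:
        raise ValueError("series must be non-empty and of equal length")
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Correlation coefficient: covariance normalized by both standard deviations."""
    sx = sqrt(covariance(x, x))  # standard deviation of x
    sy = sqrt(covariance(y, y))  # standard deviation of y
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance(x, y) / (sx * sy)
```

A constant series has zero standard deviation, so the coefficient is undefined there; the sketch raises an error in that case rather than guessing a value.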
Preferably, whether two data sets are correlated is decided with a threshold: the correlation coefficient lies between -1 and 1, and whenever its absolute value is greater than the threshold, the two data sets are regarded as correlated and only one of them is selected to take part in training the multi-modal model.
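The threshold rule is small enough to write out in full. In this sketch the function name `is_redundant` and the default threshold of 0.8 are assumptions chosen for illustration; the source does not state a threshold value.

```python
def is_redundant(cor: float, threshold: float = 0.8) -> bool:
    """Return True when two data sets count as correlated.

    `threshold` defaults to 0.8 purely as an illustrative assumption.
    """
    if not -1.0 <= cor <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    # The absolute value is used, so strong negative correlation also counts.
    return abs(cor) > threshold
```

Because the test is on the absolute value, a pair with a coefficient of -0.93 is treated the same way as a pair with 0.93: only one of the two data sets goes on to training.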
Preferably, in step one a suitable data set is selected as modeling data as follows: test data are used to measure, for each of the two data sets, how much a machine learning model trained with that data set contributes to the detection result, and the data set that performs better is selected. The model may be trained on each of the two data sets alone, or on each of the two data sets together with the same other data. The machine learning model includes, but is not limited to, a decision tree, a random forest, linear regression, naive Bayes, a neural network (including a deep learning neural network), logistic regression and a support vector machine.
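One way to picture this selection is to train the same kind of model on each of the two data sets alone and compare accuracy on held-out test data. The sketch below uses a one-feature decision stump as a stand-in for the model families listed above; the stump, the accuracy metric, the function names and the tie-breaking rule (keep the first data set on a tie) are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

Stump = Tuple[float, int]  # (cut, polarity): predict `polarity` when value > cut


def predict(model: Stump, x: Sequence[float]) -> list:
    cut, polarity = model
    return [polarity if v > cut else 1 - polarity for v in x]


def accuracy(model: Stump, x: Sequence[float], labels: Sequence[int]) -> float:
    preds = predict(model, x)
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)


def fit_stump(x: Sequence[float], labels: Sequence[int]) -> Stump:
    """Pick the cut and polarity with the highest training accuracy."""
    best_model: Stump = (0.0, 1)
    best_acc = -1.0
    for cut in sorted(set(x)):
        for polarity in (0, 1):
            acc = accuracy((cut, polarity), x, labels)
            if acc > best_acc:
                best_model, best_acc = (cut, polarity), acc
    return best_model


def pick_better(train_a, train_b, train_labels, test_a, test_b, test_labels) -> str:
    """Return "a" or "b": the data set whose model scores higher on test data."""
    score_a = accuracy(fit_stump(train_a, train_labels), test_a, test_labels)
    score_b = accuracy(fit_stump(train_b, train_labels), test_b, test_labels)
    return "a" if score_a >= score_b else "b"  # ties keep "a" (an assumption)
```

The variant in which each data set is trained together with the same other data would follow the same pattern, with the shared features added to both training sets.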
The invention has the following beneficial effects:
By establishing a suitable multi-modal machine learning model, the invention makes it possible to select different combinations of data sources for different scenarios, which effectively reduces system cost; it also shrinks the machine learning model, which facilitates deployment in edge computing.
Drawings
Fig. 1 is a flowchart of an industrial internet multimodal machine learning data processing method according to the present invention.
Detailed Description
The following detailed description of the preferred embodiments of the present invention, taken in conjunction with the accompanying drawings, will make the advantages and features of the present invention more comprehensible to those skilled in the art, and will thus provide a clear and concise definition of the scope of the present invention.
Referring to Fig. 1, the industrial internet multi-modal machine learning data processing method comprises the following steps:
step one: calculate the correlation between every two multi-modal data sets. The data are first cleaned and aligned in time as follows: for all data within the same time period, a fixed time interval is set, and the data of all sources at each time point are taken as the cleaning output; if a data source has no data at a time point, a sample is calculated from the data before and after that point. The sample is obtained as follows: let the horizontal axis be the time axis, let $x$ be the sampling time point to be calculated, and let the preceding and following data points be $(x_0, y_0)$ and $(x_1, y_1)$; the $y$ value at the sampling point is then calculated by linear interpolation:
$$y = y_0 + (x - x_0)\,\frac{y_1 - y_0}{x_1 - x_0}$$
Once the data are aligned in time, it is judged whether the two data sets are correlated:
if not: judge whether all data have been processed;
if so: select a suitable data set as modeling data, and then judge whether all data have been processed;
step two: if not all data have been processed, restart step one; if all data have been processed, establish a suitable multi-modal machine learning model.
The correlation is calculated as follows. Correlation can be expressed in two ways, as a covariance or as a correlation coefficient; the correlation coefficient can be regarded as a normalized covariance. Let $X_t$ be the first cleaned data set, $Y_t$ the second cleaned data set, $\mu_x$ the mean of $X_t$, $\mu_y$ the mean of $Y_t$, $\sigma_x$ the standard deviation of $X_t$, $\sigma_y$ the standard deviation of $Y_t$, and $E[\cdot]$ the expectation. The covariance of $X_t$ and $Y_t$ is
$$\mathrm{Cov}(X_t, Y_t) = E\left[(X_t - \mu_x)(Y_t - \mu_y)^T\right],$$
and the correlation coefficient of $X_t$ and $Y_t$ is
$$\mathrm{Cor}(X_t, Y_t) = \frac{\mathrm{Cov}(X_t, Y_t)}{\sigma_x \sigma_y}.$$
In step one, a suitable data set is selected as modeling data as follows: test data are used to measure, for each of the two data sets, how much a machine learning model trained with that data set contributes to the detection result, and the data set that performs better is selected. The model may be trained on each of the two data sets alone, or on each of the two data sets together with the same other data. The machine learning model includes, but is not limited to, a decision tree, a random forest, linear regression, naive Bayes, a neural network (including a deep learning neural network), logistic regression and a support vector machine.
Whether two data sets are correlated is decided with a threshold: the correlation coefficient lies between -1 and 1, and whenever its absolute value is greater than the threshold, the two data sets are regarded as correlated and only one of them is selected to take part in training the multi-modal model.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (7)
1. An industrial internet multi-modal machine learning data processing method, characterized by comprising the following steps:
step one: calculating the correlation between every two multi-modal data sets, wherein the data are first cleaned so that they are aligned in time, and it is then judged whether the two data sets are correlated:
if not: judging whether all data have been processed;
if so: selecting a suitable data set as modeling data, and then judging whether all data have been processed;
step two: restarting step one if not all data have been processed, and establishing a suitable multi-modal machine learning model if all data have been processed.
2. The industrial internet multi-modal machine learning data processing method according to claim 1, wherein in step one the data are cleaned and aligned in time as follows: for all data within the same time period, a fixed time interval is set, and the data of all sources at each time point are taken as the cleaning output; if a data source has no data at a time point, a sample is calculated from the data before and after that point.
3. The industrial internet multi-modal machine learning data processing method according to claim 2, wherein the sample is obtained as follows: let the horizontal axis be the time axis, let $x$ be the sampling time point to be calculated, and let the preceding and following data points be $(x_0, y_0)$ and $(x_1, y_1)$; the $y$ value at the sampling point is then calculated by linear interpolation:
$$y = y_0 + (x - x_0)\,\frac{y_1 - y_0}{x_1 - x_0}$$
4. The industrial internet multi-modal machine learning data processing method according to claim 1, wherein the correlation is calculated as follows: correlation can be expressed in two ways, as a covariance or as a correlation coefficient, and the correlation coefficient can be regarded as a normalized covariance; let $X_t$ be the first cleaned data set, $Y_t$ the second cleaned data set, $\mu_x$ the mean of $X_t$, $\mu_y$ the mean of $Y_t$, $\sigma_x$ the standard deviation of $X_t$, $\sigma_y$ the standard deviation of $Y_t$, and $E[\cdot]$ the expectation; the covariance of $X_t$ and $Y_t$ is
$$\mathrm{Cov}(X_t, Y_t) = E\left[(X_t - \mu_x)(Y_t - \mu_y)^T\right],$$
and the correlation coefficient of $X_t$ and $Y_t$ is
$$\mathrm{Cor}(X_t, Y_t) = \frac{\mathrm{Cov}(X_t, Y_t)}{\sigma_x \sigma_y}.$$
5. The industrial internet multi-modal machine learning data processing method according to claim 1, wherein whether the two data sets are correlated is decided with a threshold: the correlation coefficient lies between -1 and 1, and whenever its absolute value is greater than the threshold, only one of the two data sets is selected to take part in training the multi-modal model.
6. The industrial internet multi-modal machine learning data processing method according to claim 1, wherein in step one a suitable data set is selected as modeling data as follows: test data are used to measure, for each of the two data sets, how much a machine learning model trained with that data set contributes to the detection result, and the data set that performs better is selected; the model may be trained on each of the two data sets alone, or on each of the two data sets together with the same other data; and the machine learning model includes, but is not limited to, a decision tree, a random forest, linear regression, naive Bayes, a neural network (including a deep learning neural network), logistic regression and a support vector machine.
7. The industrial internet multi-modal machine learning data processing method according to claim 1, wherein the suitable multi-modal machine learning model is established as follows:
when data sets are selected pairwise, the data set with the larger amount of data is selected; once all selected data have been obtained, a model is retrained on the cleaned data. If models were already established during selection, the models corresponding to the selected data are reused, for example by connecting all the independent models corresponding to the selected data in parallel, or by directly outputting all the models trained on the selected data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210129788.0A CN114662698A (en) | 2022-02-11 | 2022-02-11 | Industrial internet multi-modal machine learning data processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210129788.0A CN114662698A (en) | 2022-02-11 | 2022-02-11 | Industrial internet multi-modal machine learning data processing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114662698A (en) | 2022-06-24 |
Family
ID=82027930
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210129788.0A Pending CN114662698A (en) | 2022-02-11 | 2022-02-11 | Industrial internet multi-modal machine learning data processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114662698A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117610696A (en) * | 2023-03-08 | 2024-02-27 | 西北工业大学 | Runoff prediction method for crossing data sets by utilizing different attributes |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110333995A (en) * | 2019-07-09 | 2019-10-15 | 英赛克科技(北京)有限公司 | The method and device that operation of industrial installation is monitored |
CN110795463A (en) * | 2019-06-27 | 2020-02-14 | 浙江大学 | Mass time series data visualization method for transient analysis of power system |
CN111338302A (en) * | 2020-02-28 | 2020-06-26 | 合肥力拓云计算科技有限公司 | Chemical process modeling processing system based on industrial big data and industrial Internet of things |
CN111970351A (en) * | 2020-08-11 | 2020-11-20 | 震坤行工业超市(上海)有限公司 | Data alignment-based multi-dimensional sensing optimization method and system for Internet of things |
CN112131182A (en) * | 2020-08-14 | 2020-12-25 | 陕西千山航空电子有限责任公司 | Rapid alignment processing method for packet mining type flight parameter data |
CN112149702A (en) * | 2019-06-28 | 2020-12-29 | 北京百度网讯科技有限公司 | Feature processing method and device |
CN112198857A (en) * | 2020-12-08 | 2021-01-08 | 浙江中自庆安新能源技术有限公司 | Industrial equipment control optimization method and system based on monitoring data |
CN112309571A (en) * | 2020-10-30 | 2021-02-02 | 电子科技大学 | Screening method of prognosis quantitative characteristics of digital pathological image |
US20210096541A1 (en) * | 2019-09-30 | 2021-04-01 | Rockwell Automation Technologies, Inc. | Contextualization of industrial data at the device level |
CN113113140A (en) * | 2021-04-02 | 2021-07-13 | 中山大学 | Diabetes early warning method, system, equipment and storage medium based on self-supervision DNN |
CN113741358A (en) * | 2021-08-04 | 2021-12-03 | 合肥力拓云计算科技有限公司 | Compound fertilizer nutrient control method based on industrial digital intelligent prediction |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110880019B (en) | Method for adaptively training target domain classification model through unsupervised domain | |
CN106295807B (en) | A kind of method and device of information processing | |
CN109712105B (en) | Image salient object detection method combining color and depth information | |
US12056210B2 (en) | AI-based pre-training model determination system, and AI-based vision inspection management system using same for product production lines | |
CN107291845A (en) | A kind of film based on trailer recommends method and system | |
CN110210294A (en) | Evaluation method, device, storage medium and the computer equipment of Optimized model | |
CN111401149B (en) | Lightweight video behavior identification method based on long-short-term time domain modeling algorithm | |
CN112364747A (en) | Target detection method under limited sample | |
CN114169968B (en) | Multi-granularity session recommendation method integrating user interest states | |
CN110070023B (en) | Self-supervision learning method and device based on motion sequential regression | |
CN106372083B (en) | A kind of method and system that controversial news clue is found automatically | |
CN105760896A (en) | Corrosion source joint de-noising method for multi-source heterogeneous big data | |
CN113792751A (en) | Cross-domain behavior identification method, device, equipment and readable storage medium | |
CN116664867B (en) | Feature extraction method and device for selecting training samples based on multi-evidence fusion | |
CN114662698A (en) | Industrial internet multi-modal machine learning data processing method | |
CN116935438A (en) | Pedestrian image re-recognition method based on autonomous evolution of model structure | |
CN114627496B (en) | Robust pedestrian re-identification method based on Gaussian process depolarization batch normalization | |
CN115331081A (en) | Image target detection method and device | |
CN112381056B (en) | Cross-domain pedestrian re-identification method and system fusing multiple source domains | |
CN113033282A (en) | Image recognition method, device and medium based on small object detection | |
CN112819079A (en) | Model sampling algorithm matching method and device and electronic equipment | |
CN111860631A (en) | Method for optimizing loss function by adopting error-cause strengthening mode | |
CN113821642B (en) | Method and system for cleaning text based on GAN clustering | |
CN116680414A (en) | Knowledge graph prediction method based on attention mechanism | |
CN113627609B (en) | Network measurement method based on affine transformation and storage device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||