CN111126661B - Self-help modeling method and system based on data analysis platform - Google Patents
- Publication number
- CN111126661B (application number CN201911150485.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- processing
- modeling
- processed
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/04—Manufacturing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention provides a self-service modeling method and a self-service modeling system based on a data analysis platform, which are used for carrying out corresponding processing on manufacturing data from the aspects of data development, data analysis and data service application, so that the manufacturing data are converted into different analysis models, and automatic analysis, prediction and monitoring of the manufacturing data are realized, thereby being convenient for improving the mining value of the manufacturing data.
Description
Technical Field
The invention relates to the technical field of manufacturing data analysis, and in particular to a self-service modeling method and system based on a data analysis platform.
Background
For manufacturing data to be fed back as high-value information that improves production and manufacturing capacity, the data must first be analyzed and processed. The prior art has therefore developed information processing systems that analyze and model different types of manufacturing data. These systems, however, are not linked to one another at the data level, so resources cannot be integrated and the manufacturing data cannot all be processed in a uniform way. In addition, the analysis and value mining of manufacturing data still depend on data analysis and modeling personnel carrying out data access, data summarization, modeling, and analysis themselves. Because manufacturing data is huge in volume and complex in dimensional structure, these personnel must spend considerable manpower and material resources to develop a data analysis model suited to one manufacturing scenario, and that model is not suited to other manufacturing scenarios, which severely limits the generality of the data analysis model.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a self-service modeling method and system based on a data analysis platform. The method and system process manufacturing data from three aspects, namely data development, data analysis, and data service application, so as to convert the manufacturing data into models suited to different analyses and to realize automatic analysis, prediction, and monitoring of the manufacturing data, thereby increasing the value that can be mined from it. The method and system also have the following advantages. First, they make it convenient for data processing and development personnel to access, store, and manage manufacturing data, so that different types of manufacturing data are integrated into data meeting the corresponding requirements and are given corresponding business meanings that facilitate subsequent data analysis. Second, they let manufacturing line analysts perform adaptive self-service modeling, tuning, prediction, and evaluation, so that the analysis modeling of manufacturing data is optimized and the best model can be selected and published to the platform to execute real-time prediction. Third, they provide managers with real-time statistics of data analysis results and self-service generation of statistical reports, enabling visual inspection and real-time monitoring of manufacturing data.
The invention provides a self-service modeling method based on a data analysis platform, characterized in that it comprises the following steps:
step S1, pre-integrating source data to convert the source data into a data set with a preset data structure;
step S2, self-service modeling processing and/or modeling optimization processing are carried out on the data to be processed in the data set, so that a modeling prediction result of the data to be processed is obtained;
step S3, generating a visualized data analysis statistical result and/or a statistical analysis report according to the modeling prediction result;
further, in the step S1, pre-integration processing is performed on the source data to convert the source data into a data set having a predetermined data structure, specifically including,
step S101, performing access processing on the source data through a predetermined data access engine, and converting the source data subjected to the access processing into a data queue with a predetermined structure;
step S102, scheduling the data queue to generate a plurality of metadata sets about the source data;
step S103, constructing a data warehouse related to the metadata sets, and performing data pushing processing on the metadata in the data warehouse;
further, in the step S101, the access processing with respect to the predetermined data access engine is performed on the source data, and the conversion of the source data subjected to the access processing into a data queue having a predetermined structure specifically includes,
after access processing by a Spark engine or a Flink engine is performed on the source data, the source data is converted into a Kafka data queue;
or,
in the step S102, the data queue is subjected to a scheduling process to generate a plurality of metadata sets about the source data, specifically including,
performing YARN scheduling processing on the data queue to generate a number of metadata sets about the source data;
or,
in the step S103, a data warehouse concerning the plurality of metadata sets is constructed, and the data pushing process for the plurality of metadata in the data warehouse specifically includes,
constructing a data warehouse related to the metadata set, performing label processing and data business meaning conversion processing on metadata types on the data warehouse, and performing data pushing processing on the metadata so as to push the metadata to a corresponding data analysis interface;
further, in the step S2, self-service modeling processing and/or modeling optimization processing is performed on the data to be processed in the data set, so as to obtain a modeling prediction result about the data to be processed specifically includes,
step S201A, obtaining characteristic information about a neural network model through a deep learning mode, and constructing the neural network model about the data to be processed according to the characteristic information;
step S202A, performing distributed deployment processing and/or automatic tuning processing on the neural network model so that the neural network model meets preset model convergence conditions;
step S203A, performing parameter configuration processing and model prediction processing on the neural network model to obtain the modeling prediction result of the data to be processed;
or,
in the step S2, performing self-service modeling processing and/or modeling optimization processing on the data to be processed in the data set to obtain a modeling prediction result about the data to be processed further includes performing adaptive transmission processing on the data to be processed according to the data set and the network state, which specifically includes,
step S201B, acquiring the data to be processed, and determining a transmission coefficient of the data to be processed according to the following formula (1),
in the above formula (1), Pw is the transmission coefficient, Pp is a preset data transmission stability coefficient for the self-service modeling and/or modeling optimization process of the data to be processed, N is the number of data packets contained in the data to be processed, Pa_k is the packet loss probability of the kth data packet in the data to be processed, Pk is the bandwidth of the data transmission channel, exp is the exponential function with the natural constant e as its base, and k = 1, 2, 3, …, N;
step S202B, determining the data noise coefficient of the data to be processed according to the following formula (2),
in the above formula (2), Md is the data noise coefficient, ∞ denotes infinity, size is the size of the data to be processed, and the double integral is taken over a function of the independent variables fx and yx that contains the expressions exp(Pw log2(1 + size yx)) and 1 + fx; the first integration is over fx, with the lower limit given in formula (2) and an upper limit of ∞, and the second integration is over yx, with a lower limit of 0 and an upper limit of ∞;
step S203B, determining a transmission speed of the data to be processed according to the following formula (3),
in the above formula (3), Rt is the transmission speed, Ks is a preset minimum transmission speed, Pi is the maximum transmission speed of the path, and the rounding operator in the formula rounds its argument and takes a zero value when the argument is negative;
step S204B, transmitting the data to be processed according to the transmission speed, and performing self-service modeling processing and/or modeling optimization processing;
further, in said step S3, generating visualized data analysis statistics and/or statistics analysis reports specifically comprises,
step S301, performing abnormal state mining processing and/or data evolution comparison processing on the source data according to the modeling prediction result;
step S302, determining analysis statistical results about the source data according to the results of the abnormal state mining processing and/or the data evolution comparison processing;
and step S303, carrying out visualization processing and packaging processing on the analysis statistical result so as to correspondingly generate a transferable visualized data analysis statistical result and/or a statistical analysis report.
The invention also provides a self-service modeling system based on the data analysis platform, which is characterized in that:
the self-service modeling system based on the data analysis platform comprises a pre-integration processing module, an automatic modeling/modeling optimization module and a visual result generation module; wherein,
the pre-integration processing module is used for carrying out pre-integration processing on source data so as to convert the source data into a data set with a preset data structure;
the automatic modeling/modeling optimization module is used for performing self-service modeling processing and/or modeling optimization processing on the data to be processed in the data set so as to obtain a modeling prediction result about the data to be processed;
the visualized result generation module is used for generating visualized data analysis statistical results and/or statistical analysis reports according to the modeling prediction results;
further, the pre-integration processing module comprises an access sub-module, a queue generating sub-module, a scheduling sub-module and a pushing sub-module; wherein,
the access sub-module is used for performing access processing on the source data through a predetermined data access engine;
the queue generating sub-module is used for converting the source data subjected to the access processing into a data queue with a preset structure;
the scheduling sub-module is used for performing scheduling processing on the data queue to generate a plurality of metadata sets about the source data;
the pushing sub-module is used for performing data pushing processing on a data warehouse constructed from the plurality of metadata sets;
further, the access sub-module is used for performing access processing on the source data through a Spark engine or a Flink engine;
or,
the queue generating sub-module is used for converting the source data subjected to the access processing into a Kafka data queue;
or,
the scheduling sub-module is used for performing YARN scheduling processing on the data queue so as to generate a plurality of metadata sets about the source data;
or,
the pushing sub-module is used for carrying out label processing and data service meaning conversion processing on metadata types on the data warehouse before carrying out the data pushing processing, and then pushing the metadata to corresponding data analysis interfaces;
further, the automatic modeling/modeling optimization module comprises a model feature acquisition sub-module, a model construction sub-module, a deployment/tuning sub-module and a model configuration/prediction sub-module; wherein,
the model feature acquisition sub-module is used for acquiring feature information about the neural network model through a deep learning mode;
the model construction submodule is used for constructing a neural network model related to the data to be processed according to the characteristic information;
the deployment/tuning submodule is used for carrying out distributed deployment processing and/or automatic tuning processing on the neural network model so that the neural network model meets preset model convergence conditions;
the model configuration/prediction submodule is used for carrying out parameter configuration processing and model prediction processing on the neural network model so as to obtain the modeling prediction result of the data to be processed;
further, the visualized result generation module comprises a predicted result processing sub-module, an analysis and statistics sub-module and a visualized/packaged sub-module; wherein,
the prediction result processing sub-module is used for carrying out abnormal state mining processing and/or data evolution comparison processing on the source data according to the modeling prediction result;
the analysis and statistics sub-module is used for determining analysis and statistics results related to the source data according to the results of the abnormal state mining processing and/or the data evolution comparison processing;
the visualization/packaging submodule is used for carrying out visualization processing and packaging processing on the analysis statistical result so as to correspondingly generate a transferable visual data analysis statistical result and/or a statistical analysis report.
Compared with the prior art, the self-service modeling method and system based on a data analysis platform process manufacturing data from the aspects of data development, data analysis, and data service application, so that the manufacturing data is converted into data suited to different analysis models and automatic analysis, prediction, and monitoring of the manufacturing data are realized, thereby increasing the value that can be mined from the manufacturing data. The method and system also have the following advantages. First, they make it convenient for data processing and development personnel to access, store, and manage manufacturing data, so that different types of manufacturing data are integrated into data meeting the corresponding requirements and are given corresponding business meanings that facilitate subsequent data analysis. Second, they let manufacturing line analysts perform adaptive self-service modeling, tuning, prediction, and evaluation, so that the analysis modeling of manufacturing data is optimized and the best model can be selected and published to the platform to execute real-time prediction. Third, they provide managers with real-time statistics of data analysis results and self-service generation of statistical reports, enabling visual inspection and real-time monitoring of manufacturing data.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description obviously show only some embodiments of the invention, and a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of a self-service modeling method based on a data analysis platform.
Fig. 2 is a schematic structural diagram of a self-service modeling system based on a data analysis platform.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
Referring to fig. 1, a flow chart of a self-service modeling method based on a data analysis platform according to an embodiment of the present invention is shown. The self-help modeling method based on the data analysis platform comprises the following steps:
step S1, pre-integration processing is performed on the source data to convert the source data into a data set with a predetermined data structure.
Preferably, in this step S1, a pre-integration process is performed on the source data to convert the source data into a data set having a predetermined data structure, specifically including,
step S101, performing access processing on the source data through a predetermined data access engine, and converting the source data subjected to the access processing into a data queue with a predetermined structure;
step S102, scheduling the data queue to generate a plurality of metadata sets about the source data;
step S103, a data warehouse related to the metadata sets is constructed, and data pushing processing is performed on the metadata in the data warehouse.
Preferably, in the step S101, the access processing with respect to the predetermined data access engine is performed on the source data, and converting the source data subjected to the access processing into a data queue having a predetermined structure specifically includes,
after access processing by a Spark engine or a Flink engine is performed on the source data, the source data is converted into a Kafka data queue.
Preferably, in the step S102, the data queue is subjected to a scheduling process to generate a plurality of metadata sets about the source data, specifically including,
YARN scheduling processing is performed on the data queue to generate a number of metadata sets about the source data.
Preferably, in this step S103, a data warehouse is constructed for the several metadata sets, and the data pushing process for the several metadata in the data warehouse specifically includes,
constructing a data warehouse related to the metadata set, performing label processing and data business meaning conversion processing on metadata types on the data warehouse, and performing data pushing processing on the metadata so as to push the metadata to a corresponding data analysis interface.
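The flow of steps S101 to S103 can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation: a `deque` stands in for the Kafka data queue, plain functions stand in for the Spark/Flink access engine and the YARN scheduler, and all function names, record fields, and the sample data are hypothetical.

```python
from collections import deque, defaultdict


def access(source_records):
    """Stand-in for step S101: ingest raw records into a FIFO queue of uniform dicts."""
    queue = deque()
    for rec in source_records:
        queue.append({"source": rec["source"], "field": rec["field"], "value": rec["value"]})
    return queue


def schedule(queue):
    """Stand-in for step S102: drain the queue and group records into metadata sets by source."""
    metadata_sets = defaultdict(list)
    while queue:
        rec = queue.popleft()
        metadata_sets[rec["source"]].append(rec)
    return dict(metadata_sets)


def build_warehouse(metadata_sets, business_meanings):
    """Stand-in for step S103: label each record with a type and a business meaning."""
    warehouse = []
    for records in metadata_sets.values():
        for rec in records:
            warehouse.append({
                **rec,
                "type_label": type(rec["value"]).__name__,
                "meaning": business_meanings.get(rec["field"], "unknown"),
            })
    return warehouse


def push(warehouse, interfaces):
    """Push each labeled record to the analysis interface keyed by its source."""
    for rec in warehouse:
        interfaces.setdefault(rec["source"], []).append(rec)
    return interfaces


# Hypothetical sample data from two production lines.
raw = [
    {"source": "line_a", "field": "temp_c", "value": 71.5},
    {"source": "line_b", "field": "cycle_s", "value": 12},
    {"source": "line_a", "field": "temp_c", "value": 73.0},
]
meanings = {"temp_c": "spindle temperature", "cycle_s": "cycle time"}
pushed = push(build_warehouse(schedule(access(raw)), meanings), {})
```

In a real deployment each stand-in would be replaced by the corresponding engine; the sketch only shows the order of the stages and the kind of labeling applied before the push.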
Step S2, performing self-service modeling processing and/or modeling optimization processing on the data to be processed in the data set to obtain a modeling prediction result about the data to be processed.
Preferably, in the step S2, self-service modeling processing and/or modeling optimization processing is performed on the data to be processed in the data set, so as to obtain a modeling prediction result about the data to be processed specifically includes,
step S201A, obtaining characteristic information about a neural network model through a deep learning mode, and constructing the neural network model about the data to be processed according to the characteristic information;
step S202A, performing distributed deployment processing and/or automatic tuning processing on the neural network model so that the neural network model meets preset model convergence conditions;
step S203A, performing parameter configuration processing and model prediction processing on the neural network model to obtain the modeling prediction result of the data to be processed.
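Steps S201A to S203A can be illustrated with a deliberately small sketch. It assumes a single linear neuron trained by batch gradient descent, and it assumes that the "preset model convergence condition" is a loss improvement smaller than a tolerance; neither choice comes from the text, and the function names, learning rate, and sample data are hypothetical.

```python
def build_model(n_inputs):
    """Step S201A stand-in: a single linear neuron with zero-initialized parameters."""
    return {"w": [0.0] * n_inputs, "b": 0.0}


def predict(model, x):
    """Step S203A stand-in: weighted sum of the inputs plus a bias."""
    return sum(w * xi for w, xi in zip(model["w"], x)) + model["b"]


def mse(model, xs, ys):
    """Mean squared error of the model on the given samples."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def tune(model, xs, ys, lr=0.05, tol=1e-10, max_epochs=20000):
    """Step S202A stand-in: gradient descent until the loss improvement drops below tol."""
    prev = mse(model, xs, ys)
    n = len(xs)
    for epoch in range(max_epochs):
        grad_w = [0.0] * len(model["w"])
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = predict(model, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += 2 * err * xj / n
            grad_b += 2 * err / n
        model["w"] = [w - lr * g for w, g in zip(model["w"], grad_w)]
        model["b"] -= lr * grad_b
        cur = mse(model, xs, ys)
        if abs(prev - cur) < tol:  # assumed convergence condition
            return epoch + 1
        prev = cur
    return max_epochs


# Hypothetical training data following y = 2x + 1.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [1.0, 3.0, 5.0, 7.0]
model = build_model(n_inputs=1)
epochs_used = tune(model, xs, ys)
forecast = predict(model, [4.0])
```

A production model would be a deeper network trained on a distributed framework; the sketch keeps only the build, tune-until-converged, then predict sequence that the three steps describe.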
Preferably, in the step S2, performing self-service modeling processing and/or modeling optimization processing on the data to be processed in the data set to obtain a modeling prediction result about the data to be processed further includes performing adaptive transmission processing on the data to be processed according to the data set and the network state, which specifically includes,
step S201B, acquiring the data to be processed, and determining a transmission coefficient of the data to be processed according to the following formula (1),
in the above formula (1), Pw is the transmission coefficient, Pp is a preset data transmission stability coefficient for the self-service modeling and/or modeling optimization process of the data to be processed, where Pp generally takes a value of 0.3, N is the number of data packets contained in the data to be processed, Pa_k is the packet loss probability of the kth data packet in the data to be processed, Pk is the bandwidth of the data transmission channel, exp is the exponential function with the natural constant e as its base, and k = 1, 2, 3, …, N;
step S202B, determining the data noise coefficient of the data to be processed according to the following formula (2),
in the above formula (2), Md is the data noise coefficient, ∞ denotes infinity, size is the size of the data to be processed, and the double integral is taken over a function of the independent variables fx and yx that contains the expressions exp(Pw log2(1 + size yx)) and 1 + fx; the first integration is over fx, with the lower limit given in formula (2) and an upper limit of ∞, and the second integration is over yx, with a lower limit of 0 and an upper limit of ∞;
step S203B, determining the transmission speed of the data to be processed according to the following formula (3),
in the above formula (3), Rt is the transmission speed, Ks is a preset minimum transmission speed, where Ks generally takes a value of 50K/s, Pi is the maximum transmission speed of the path, and the rounding operator in the formula rounds its argument and takes a zero value when the argument is negative;
step S204B, the data to be processed is transmitted according to the transmission speed, and self-service modeling processing and/or modeling optimization processing are performed.
Through the above steps, self-service modeling processing and/or modeling optimization processing can be performed on the data to be processed in the data set, and the transmission of the data to be processed can be controlled intelligently while the modeling prediction result is being obtained. The data is thus processed according to its own characteristics and the hardware characteristics of the data analysis platform, which improves the stability of the pre-integration processing and the data transmission efficiency during the self-service modeling processing and/or modeling optimization processing.
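Formulas (1) to (3) are not reproduced in this text, so the following sketch does not compute Pw, Md, or Rt. It only illustrates the parts that the description states in words: rounding with a zero floor, a preset minimum speed Ks (typically 50 K/s), and a path maximum speed Pi. Clamping a candidate speed into the range [Ks, Pi] is an assumption of this sketch, as are the function names and the chunked transfer plan.

```python
def round_nonnegative(value):
    """Round to the nearest integer; negative inputs yield zero, as the description states.

    Python's round() uses round-half-to-even; the exact rounding rule of the
    original formula is not given, so this choice is an assumption.
    """
    return 0 if value < 0 else round(value)


def bounded_speed(candidate, ks=50, pi=1000):
    """Keep a candidate speed (K/s) within [ks, pi]; the clamping itself is assumed."""
    return min(pi, max(ks, round_nonnegative(candidate)))


def plan_transfer(total_size, speed):
    """Split total_size (K) into per-second chunks no larger than speed (K/s)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    chunks = []
    remaining = total_size
    while remaining > 0:
        chunk = min(speed, remaining)
        chunks.append(chunk)
        remaining -= chunk
    return chunks


# Hypothetical candidate speeds, e.g. produced by an implementation of formula (3).
slow = bounded_speed(-12.4)   # negative candidate falls back to the minimum
normal = bounded_speed(320.6)
fast = bounded_speed(5000)    # capped at the path maximum
plan = plan_transfer(250, 100)
```

Step S204B then corresponds to sending the chunks in `plan` at the chosen speed before modeling continues.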
Step S3, generating a visualized data analysis statistical result and/or a statistical analysis report according to the modeling prediction result.
Preferably, in this step S3, generating visualized data analysis statistics and/or statistics analysis reports specifically comprises,
step S301, carrying out abnormal state mining processing and/or data evolution comparison processing on the source data according to the modeling prediction result;
step S302, determining analysis statistical results about the source data according to the results of the abnormal state mining processing and/or the data evolution comparison processing;
step S303, performing visualization processing and packaging processing on the analysis statistical result to correspondingly generate a transferable visualized data analysis statistical result and/or a statistical analysis report.
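Steps S301 to S303 can be sketched as below. The text does not say how abnormal states are mined or how data evolution is compared, so the sketch assumes a z-score test on the residuals between observed and predicted values and a simple period-over-period difference; packaging as a JSON string stands in for the transferable report. All names, the threshold, and the sample values are hypothetical.

```python
import json
from statistics import mean, pstdev


def mine_anomalies(observed, predicted, z_threshold=2.0):
    """Step S301 stand-in: flag indices whose residual deviates strongly from the mean residual."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    mu, sigma = mean(residuals), pstdev(residuals)
    if sigma == 0:
        return []
    return [i for i, r in enumerate(residuals) if abs(r - mu) / sigma > z_threshold]


def evolution(observed):
    """Step S301 stand-in: change between consecutive observations."""
    return [b - a for a, b in zip(observed, observed[1:])]


def package_report(observed, predicted):
    """Steps S302 and S303 stand-in: collect statistics and serialize them for transfer."""
    stats = {
        "count": len(observed),
        "mean_observed": mean(observed),
        "anomaly_indices": mine_anomalies(observed, predicted),
        "evolution": evolution(observed),
    }
    return json.dumps(stats, sort_keys=True)


# Hypothetical sensor readings with one obvious outlier at index 4.
observed = [10.0, 10.5, 9.8, 10.2, 25.0, 10.1, 9.9, 10.3]
predicted = [10.0] * len(observed)
report = package_report(observed, predicted)
```

The JSON string can be handed to any charting or reporting front end, which would cover the visualization part of step S303.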
Referring to fig. 2, a schematic structural diagram of a self-service modeling system based on a data analysis platform according to an embodiment of the present invention is provided. The self-service modeling system based on the data analysis platform comprises a pre-integration processing module, an automatic modeling/modeling optimization module and a visual result generation module; wherein,
the pre-integration processing module is used for carrying out pre-integration processing on the source data so as to convert the source data into a data set with a preset data structure;
the automatic modeling/modeling optimization module is used for performing self-service modeling processing and/or modeling optimization processing on the data to be processed in the data set so as to obtain a modeling prediction result about the data to be processed;
the visualized result generation module is used for generating visualized data analysis statistical results and/or statistical analysis reports according to the modeling prediction results.
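The three-module structure can be sketched as a simple composition. The class name, the callable-based interface, and the toy stand-ins for each module are hypothetical and only show how the output of one module feeds the next.

```python
class SelfServiceModelingSystem:
    """Chains the three modules: pre-integration, modeling/optimization, result generation."""

    def __init__(self, pre_integrate, model, visualize):
        self.pre_integrate = pre_integrate  # source data -> data set
        self.model = model                  # data set -> modeling prediction result
        self.visualize = visualize          # prediction result -> report

    def run(self, source_data):
        data_set = self.pre_integrate(source_data)
        prediction = self.model(data_set)
        return self.visualize(prediction)


# Toy stand-ins: parse strings, predict the mean, format a one-line report.
system = SelfServiceModelingSystem(
    pre_integrate=lambda rows: [float(r) for r in rows],
    model=lambda data: sum(data) / len(data),
    visualize=lambda pred: f"predicted value: {pred:.2f}",
)
result = system.run(["1", "2", "3"])
```

Passing the modules in as callables keeps each one replaceable, which matches the way the sub-modules are described as separate units.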
Preferably, the pre-integration processing module comprises an access sub-module, a queue generating sub-module, a scheduling sub-module and a pushing sub-module; wherein,
the access sub-module is used for performing access processing on the source data through a predetermined data access engine;
the queue generating sub-module is used for converting the source data subjected to the access processing into a data queue with a preset structure;
the scheduling submodule is used for scheduling the data queue to generate a plurality of metadata sets about the source data;
the pushing sub-module is used for performing data pushing processing on a data warehouse constructed from the metadata sets.
Preferably, the access sub-module is used for performing access processing on the source data through a Spark engine or a Flink engine.
Preferably, the queue generating submodule is configured to convert the source data subjected to the access processing into a Kafka data queue.
Preferably, the scheduling sub-module is configured to perform a Yarn scheduling process on the data queue to generate a number of metadata sets for the source data.
Preferably, the pushing sub-module is configured to perform label processing and data service meaning conversion processing on metadata types on the data warehouse before performing the data pushing processing, and then push the metadata to the corresponding data analysis interface.
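The order of the four sub-modules (access, queue generation, scheduling, pushing) can be illustrated without any cluster software. The sketch below uses only the Python standard library: a `deque` stands in for the Kafka data queue, a batching loop stands in for Yarn scheduling, and a dictionary stands in for the data warehouse. All function names, the `"sensor,value"` line format, and the label mapping are hypothetical and are not taken from the embodiment.

```python
from collections import deque, defaultdict

def access(source_rows):
    """Access step (stand-in for a Spark or Flink ingestion job): parse 'sensor,value' lines."""
    for line in source_rows:
        sensor, value = line.split(",")
        yield {"sensor": sensor.strip(), "value": float(value)}

def to_queue(records):
    """Queue step (stand-in for a Kafka data queue): buffer records in FIFO order."""
    return deque(records)

def schedule(queue, batch_size=2):
    """Scheduling step (stand-in for Yarn scheduling): drain the queue into batches."""
    batches = []
    while queue:
        size = min(batch_size, len(queue))
        batches.append([queue.popleft() for _ in range(size)])
    return batches

def build_warehouse(batches, labels):
    """Warehouse step: group records by sensor and attach a business-meaning label."""
    warehouse = defaultdict(lambda: {"label": None, "values": []})
    for batch in batches:
        for rec in batch:
            entry = warehouse[rec["sensor"]]
            entry["label"] = labels.get(rec["sensor"], "unlabeled")
            entry["values"].append(rec["value"])
    return dict(warehouse)

def push(warehouse, interface):
    """Push step: hand each labeled group to an analysis interface (any callable)."""
    for sensor, entry in warehouse.items():
        interface(sensor, entry)
```

In a real deployment each stand-in would be replaced by the corresponding engine; the sketch only shows how the data is handed from one sub-module to the next.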
Preferably, the automatic modeling/modeling optimization module comprises a model feature acquisition sub-module, a model construction sub-module, a deployment/tuning sub-module and a model configuration/prediction sub-module; wherein,
the model feature acquisition submodule is used for acquiring feature information about a neural network model through a deep learning mode;
the model construction submodule is used for constructing a neural network model related to the data to be processed according to the characteristic information;
the deployment/tuning sub-module is used for carrying out distributed deployment processing and/or automatic tuning processing on the neural network model so that the neural network model meets preset model convergence conditions;
the model configuration/prediction submodule is used for carrying out parameter configuration processing and model prediction processing on the neural network model so as to obtain the modeling prediction result of the data to be processed.
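The construct, tune until convergence, then predict sequence of these sub-modules can be sketched as follows. This is an illustration only: a single linear neuron trained by plain gradient descent stands in for the neural network, a mean-squared-error threshold stands in for the preset model convergence condition, and the names `build_model`, `tune`, and `predict` as well as the learning rate and tolerance are assumptions. Distributed deployment is not shown.

```python
def build_model(n_features):
    """Model construction: a single linear neuron with zero-initialized weights."""
    return {"w": [0.0] * n_features, "b": 0.0}

def predict(model, x):
    """Model prediction: weighted sum of the inputs plus the bias."""
    return sum(w * xi for w, xi in zip(model["w"], x)) + model["b"]

def mse(model, xs, ys):
    """Mean squared error of the model on the given samples."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def tune(model, xs, ys, lr=0.05, tol=1e-6, max_epochs=10_000):
    """Automatic tuning: gradient descent until the loss drops below tol.

    Returns the number of epochs run; max_epochs means no convergence.
    """
    n = len(xs)
    for epoch in range(max_epochs):
        if mse(model, xs, ys) < tol:
            return epoch
        grad_w = [0.0] * len(model["w"])
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = predict(model, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += 2 * err * xj / n
            grad_b += 2 * err / n
        model["w"] = [w - lr * g for w, g in zip(model["w"], grad_w)]
        model["b"] -= lr * grad_b
    return max_epochs
```

A caller would build the model from the number of features, call `tune` on historical samples, and then call `predict` on new data to obtain the modeling prediction result.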
Preferably, the visualized result generating module comprises a predicted result processing sub-module, an analysis and statistics sub-module and a visualization/encapsulation sub-module; wherein,
the prediction result processing submodule is used for carrying out abnormal state mining processing and/or data evolution comparison processing on the source data according to the modeling prediction result;
the analysis and statistics submodule is used for determining analysis and statistics results related to the source data according to results of the abnormal state mining processing and/or the data evolution comparison processing;
the visualization/packaging submodule is used for carrying out visualization processing and packaging processing on the analysis statistical result so as to correspondingly generate a transferable visualized data analysis statistical result and/or a statistical analysis report.
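As an illustration of the result-generation sub-modules, the sketch below flags abnormal records from prediction residuals, computes summary statistics, and packages them as JSON so that the result can be transferred. The residual rule (more than `k` standard deviations from the mean residual), the function names, and the JSON layout are assumptions chosen for the example; the embodiment does not specify them, and the data evolution comparison is not shown.

```python
import json
from statistics import mean, pstdev

def mine_anomalies(actual, predicted, k=2.0):
    """Return indices whose residual lies more than k standard deviations from the mean residual."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    mu, sigma = mean(residuals), pstdev(residuals)
    if sigma == 0:
        return []  # all residuals identical: nothing stands out
    return [i for i, r in enumerate(residuals) if abs(r - mu) > k * sigma]

def summarize(actual, anomalies):
    """Analysis and statistics: basic figures about the source data."""
    return {"count": len(actual), "mean": mean(actual), "anomaly_indices": anomalies}

def package(stats):
    """Packaging: serialize the statistics to JSON for transfer."""
    return json.dumps(stats, sort_keys=True)
```

The JSON string could then be handed to any charting or reporting front end.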
As can be seen from the foregoing embodiments, the self-service modeling method and system based on the data analysis platform process manufacturing data from the aspects of data development, data analysis and data service application, so as to convert the manufacturing data into models suitable for different analyses and to implement automatic analysis, prediction and monitoring of the manufacturing data, thereby improving the mining value of the manufacturing data. In addition, the method and the system have the following advantages. First, they make it convenient for data processing and development personnel to access, store and manage the manufacturing data, so that different types of manufacturing data are integrated into data meeting the corresponding requirements and are given corresponding data business meanings, which facilitates subsequent data analysis. Second, they allow manufacturing line analysts to perform adaptive self-service modeling, tuning, prediction and evaluation, so as to realize optimized analysis modeling of the manufacturing data and to select an optimal model for release into the platform for real-time prediction. Third, they allow managers to obtain real-time statistics of data analysis results and to generate statistical reports on a self-service basis, thereby realizing visual inspection and real-time monitoring of the manufacturing data.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (5)
1. The self-help modeling method based on the data analysis platform is characterized by comprising the following steps of:
step S1, pre-integrating source data to convert the source data into a data set with a preset data structure;
step S2, self-service modeling processing and/or modeling optimization processing are carried out on the data to be processed in the data set, so that a modeling prediction result of the data to be processed is obtained;
step S3, generating a visualized data analysis statistical result and/or a statistical analysis report according to the modeling prediction result;
in said step S1, pre-integration processing is performed on the source data to convert said source data into a data set having a predetermined data structure comprising in particular,
step S101, carrying out access processing on the source data about a preset data access engine, and converting the source data subjected to the access processing into a data queue with a preset structure;
step S102, scheduling the data queue to generate a plurality of metadata sets about the source data;
step S103, constructing a data warehouse related to the metadata sets, and performing data pushing processing on the metadata in the data warehouse;
in the step S101, an access process with respect to a predetermined data access engine is performed on the source data, and converting the source data subjected to the access process into a data queue having a predetermined structure specifically includes,
after access processing with respect to a Spark engine or a Flink engine is carried out on the source data, the source data is converted into a Kafka data queue;
in the step S102, the data queue is subjected to a scheduling process to generate a plurality of metadata sets about the source data, specifically including,
performing a Yarn scheduling process on the data queue to generate a number of metadata sets about the source data;
in the step S103, a data warehouse concerning the plurality of metadata sets is constructed, and the data pushing process for the plurality of metadata in the data warehouse specifically includes,
constructing a data warehouse related to the metadata set, performing label processing and data business meaning conversion processing on metadata types on the data warehouse, and performing data pushing processing on the metadata so as to push the metadata to a corresponding data analysis interface;
in the step S2, self-service modeling processing and/or modeling optimization processing is performed on the data to be processed in the data set, so as to obtain a modeling prediction result about the data to be processed specifically includes,
step S201A, obtaining characteristic information about a neural network model through a deep learning mode, and constructing the neural network model about the data to be processed according to the characteristic information;
step S202A, performing distributed deployment processing and/or automatic tuning processing on the neural network model so that the neural network model meets preset model convergence conditions;
step S203A, performing parameter configuration processing and model prediction processing on the neural network model to obtain the modeling prediction result of the data to be processed;
in the step S2, self-service modeling processing and/or modeling optimization processing is performed on the data to be processed in the data set to obtain a modeling prediction result about the data to be processed, and the self-adaptive transmission processing is performed on the data to be processed according to the data set and the network state, which specifically includes,
step S201B, acquiring the data to be processed, determining a transmission coefficient of the data to be processed according to the following formula (1),
[formula (1): image not preserved in this text] (1);
in the above-mentioned formula (1), the quantities are: the transmission coefficient; the preset data transmission stability coefficient of the data to be processed during the self-service modeling processing and/or modeling optimization processing; N, the number of data packets contained in the data to be processed; the packet loss probability of the k-th data packet in the data to be processed, where k = 1, 2, 3, …, N; the data transmission channel bandwidth; and an exponential function with the natural constant e as its base;
step S202B, according to the following formula (2), determining the data noise coefficient of the data to be processed,
[formula (2): image not preserved in this text] (2);
in the above-mentioned formula (2), the quantities are: the data noise coefficient; ∞, the mathematical symbol for infinity; the size of the data to be processed; and a function of two arguments that is integrated twice, where the integration variable, lower limit and upper limit of the first integration are not preserved in this text, and the second integration has a lower limit of 0, its integration variable and upper limit likewise not being preserved;
step S203B, determining the transmission speed of the data to be processed according to the following formula (3),
[formula (3): image not preserved in this text] (3);
in the above-mentioned formula (3), the quantities are: the transmission speed; the preset minimum transmission speed; the maximum transmission speed of the path; and a rounding operation that takes the value zero when its argument is negative;
step S204B, transmitting the data to be processed according to the transmission speed, and performing self-service modeling processing and/or modeling optimization processing.
2. The self-help modeling method based on a data analysis platform as claimed in claim 1, wherein:
in said step S3, generating a visualized data analysis statistical result and/or statistical analysis report according to said modeling prediction result specifically includes,
step S301, performing abnormal state mining processing and/or data evolution comparison processing on the source data according to the modeling prediction result;
step S302, determining analysis statistical results about the source data according to the results of the abnormal state mining processing and/or the data evolution comparison processing;
and step S303, carrying out visualization processing and packaging processing on the analysis statistical result so as to correspondingly generate a transferable visualized data analysis statistical result and/or a statistical analysis report.
3. The self-help modeling system based on the data analysis platform is characterized in that:
the self-service modeling system based on the data analysis platform comprises a pre-integration processing module, an automatic modeling/modeling optimization module and a visual result generation module; wherein,
the pre-integration processing module is used for carrying out pre-integration processing on source data so as to convert the source data into a data set with a preset data structure;
the automatic modeling/modeling optimization module is used for performing self-service modeling processing and/or modeling optimization processing on the data to be processed in the data set so as to obtain a modeling prediction result about the data to be processed;
the visualized result generation module is used for generating visualized data analysis statistical results and/or statistical analysis reports according to the modeling prediction results;
the pre-integration processing module comprises an access sub-module, a queue generating sub-module, a scheduling sub-module and a pushing sub-module; wherein,
the access sub-module is used for carrying out access processing on the source data about a predetermined data access engine;
the queue generating sub-module is used for converting the source data subjected to the access processing into a data queue with a preset structure;
the scheduling sub-module is used for performing scheduling processing on the data queue to generate a plurality of metadata sets about the source data;
the pushing sub-module is used for carrying out data pushing processing on a data warehouse constructed from the plurality of metadata sets;
the access sub-module is used for carrying out access processing on the source data with respect to a Spark engine or a Flink engine;
the queue generating sub-module is used for converting the source data subjected to the access processing into a Kafka data queue;
the scheduling sub-module is used for carrying out Yarn scheduling processing on the data queue so as to generate a plurality of metadata sets about the source data;
the pushing sub-module is used for carrying out label processing and data service meaning conversion processing on metadata types on the data warehouse before carrying out the data pushing processing, and then pushing the metadata to corresponding data analysis interfaces;
the automatic modeling/modeling optimization module performs self-help modeling processing and/or modeling optimization processing on the data to be processed in the data set to obtain modeling prediction results about the data to be processed, wherein the modeling prediction results specifically comprise,
acquiring characteristic information about a neural network model through a deep learning mode, and constructing the neural network model about the data to be processed according to the characteristic information;
performing distributed deployment processing and/or automatic tuning processing on the neural network model so that the neural network model meets preset model convergence conditions;
performing parameter configuration processing and model prediction processing on the neural network model to obtain the modeling prediction result of the data to be processed;
the automatic modeling/modeling optimization module performs self-help modeling processing and/or modeling optimization processing on the data to be processed in the data set to obtain a modeling prediction result about the data to be processed, and is further used for performing adaptive transmission processing on the data to be processed according to the data set and a network state, and the method specifically comprises the steps of,
acquiring the data to be processed, determining the transmission coefficient of the data to be processed according to the following formula (1),
[formula (1): image not preserved in this text] (1);
in the above-mentioned formula (1), the quantities are: the transmission coefficient; the preset data transmission stability coefficient of the data to be processed during the self-service modeling processing and/or modeling optimization processing; N, the number of data packets contained in the data to be processed; the packet loss probability of the k-th data packet in the data to be processed, where k = 1, 2, 3, …, N; the data transmission channel bandwidth; and an exponential function with the natural constant e as its base;
determining a data noise figure of the data to be processed according to the following formula (2),
[formula (2): image not preserved in this text] (2);
in the above-mentioned formula (2), the quantities are: the data noise coefficient; ∞, the mathematical symbol for infinity; the size of the data to be processed; and a function of two arguments that is integrated twice, where the integration variable, lower limit and upper limit of the first integration are not preserved in this text, and the second integration has a lower limit of 0, its integration variable and upper limit likewise not being preserved;
determining the transmission speed of the data to be processed according to the following formula (3),
[formula (3): image not preserved in this text] (3);
in the above-mentioned formula (3), the quantities are: the transmission speed; the preset minimum transmission speed; the maximum transmission speed of the path; and a rounding operation that takes the value zero when its argument is negative;
and transmitting the data to be processed according to the transmission speed, and performing self-service modeling processing and/or modeling optimization processing.
4. A self-service modeling system based on a data analysis platform as claimed in claim 3, wherein:
the automatic modeling/modeling optimization module comprises a model feature acquisition sub-module, a model construction sub-module, a deployment/tuning sub-module and a model configuration/prediction sub-module; wherein,
the model feature acquisition sub-module is used for acquiring feature information about the neural network model through a deep learning mode;
the model construction submodule is used for constructing a neural network model related to the data to be processed according to the characteristic information;
the deployment/tuning submodule is used for carrying out distributed deployment processing and/or automatic tuning processing on the neural network model so that the neural network model meets preset model convergence conditions;
the model configuration/prediction submodule is used for carrying out parameter configuration processing and model prediction processing on the neural network model so as to obtain the modeling prediction result of the data to be processed.
5. A self-service modeling system based on a data analysis platform as claimed in claim 3, wherein:
the visual result generation module comprises a prediction result processing sub-module, an analysis and statistics sub-module and a visualization/encapsulation sub-module; wherein,
the prediction result processing sub-module is used for carrying out abnormal state mining processing and/or data evolution comparison processing on the source data according to the modeling prediction result;
the analysis and statistics sub-module is used for determining analysis and statistics results related to the source data according to the results of the abnormal state mining processing and/or the data evolution comparison processing;
the visualization/packaging submodule is used for carrying out visualization processing and packaging processing on the analysis statistical result so as to correspondingly generate a transferable visual data analysis statistical result and/or a statistical analysis report.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911150485.1A CN111126661B (en) | 2019-11-21 | 2019-11-21 | Self-help modeling method and system based on data analysis platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111126661A CN111126661A (en) | 2020-05-08 |
CN111126661B CN111126661B (en) | 2023-11-24 |
Family
ID=70496210
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911150485.1A Active CN111126661B (en) | 2019-11-21 | 2019-11-21 | Self-help modeling method and system based on data analysis platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111126661B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105956015A (en) * | 2016-04-22 | 2016-09-21 | 四川中软科技有限公司 | Service platform integration method based on big data |
CN109255523A (en) * | 2018-08-16 | 2019-01-22 | 北京奥技异科技发展有限公司 | Analysis indexes computing platform based on KKS coding rule and big data framework |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8271430B2 (en) * | 2008-06-02 | 2012-09-18 | The Boeing Company | Methods and systems for metadata driven data capture for a temporal data warehouse |
Non-Patent Citations (4)
Title |
---|
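Design and implementation of a Spark-based distributed big data analysis and modeling system; Xu Shifang; Luo Xiaobin; Chen Yanghua; Modern Electronics Technique (No. 20); pp. 172-174, 178 * |
Research and practice of building a data warehouse on a big data platform; Zhao Yi; Financial Computer of China (No. 05); pp. 37-42 * |
Research on construction of the data analysis domain of a full-service unified data center based on big data; Zhu Biqin; Wu Fei; Luo Fucai; Electric Power Information and Communication Technology (No. 02); pp. 91-96 * |
Research on data service applications based on big data; Chen Guang; Computer Technology and Development (No. 08); pp. 129-134 * |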
Also Published As
Publication number | Publication date |
---|---|
CN111126661A (en) | 2020-05-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||