CN111126661B - Self-help modeling method and system based on data analysis platform - Google Patents

Publication number
CN111126661B
Authority
CN
China
Prior art keywords
data
processing
modeling
processed
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911150485.1A
Other languages
Chinese (zh)
Other versions
CN111126661A (en
Inventor
吴占伟
翟伟辰
何军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gechuang Dongzhi Shenzhen Technology Co ltd
Original Assignee
Gechuang Dongzhi Shenzhen Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gechuang Dongzhi Shenzhen Technology Co ltd
Priority to CN201911150485.1A
Publication of CN111126661A
Application granted
Publication of CN111126661B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/04 Manufacturing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention provides a self-service modeling method and system based on a data analysis platform. Manufacturing data are processed from three aspects (data development, data analysis and data service application) so that the data are converted into data suitable for different analysis models, enabling automatic analysis, prediction and monitoring of the manufacturing data and thereby increasing the value that can be mined from them.

Description

Self-help modeling method and system based on data analysis platform
Technical Field
The invention relates to the technical field of manufacturing industry data analysis, in particular to a self-help modeling method and system based on a data analysis platform.
Background
For data to be fed back as high-value information that improves production and manufacturing capacity, it must first be analyzed and processed. The prior art has therefore developed information processing systems that analyze and model different types of manufacturing data. These systems, however, do not relate their data to one another, so they cannot integrate resources across all manufacturing data or process the data in a uniform way. In addition, analysis and value mining of manufacturing data still rely on data analysis modeling personnel to carry out data access, data summarization, modeling analysis and similar work. Because manufacturing data are very large in volume and complex in dimensional structure, these personnel must spend considerable manpower and material resources to develop a data analysis model for a single manufacturing scenario, and that model does not transfer to other manufacturing scenarios, which severely limits its generality.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a self-service modeling method and system based on a data analysis platform. The method and system process manufacturing data from three aspects, namely data development, data analysis and data service application, so as to convert the manufacturing data into data suitable for different analysis models and to realize automatic analysis, prediction and monitoring of the manufacturing data, thereby increasing the value that can be mined from the data. The method and system also have the following advantages. First, they make it easier for data processing and development personnel to access, store and manage manufacturing data, so that different types of manufacturing data are integrated into data meeting the corresponding requirements and are given corresponding business meanings that facilitate subsequent analysis. Second, they allow manufacturing line analysts to perform adaptive self-service modeling, tuning, prediction and evaluation, so that the analysis modeling of manufacturing data is optimized and the best model can be selected and published to the platform for real-time prediction. Third, they allow managers to obtain real-time statistics of data analysis results and to generate statistical reports on a self-service basis, realizing visual inspection and real-time monitoring of manufacturing data.
The invention provides a self-help modeling method based on a data analysis platform, which is characterized by comprising the following steps of:
step S1, pre-integrating source data to convert the source data into a data set with a preset data structure;
step S2, self-service modeling processing and/or modeling optimization processing are carried out on the data to be processed in the data set, so that a modeling prediction result of the data to be processed is obtained;
step S3, generating a visualized data analysis statistical result and/or a statistical analysis report according to the modeling prediction result;
further, in the step S1, performing pre-integration processing on the source data to convert the source data into a data set having a predetermined data structure specifically includes,
step S101, performing access processing on the source data through a predetermined data access engine, and converting the accessed source data into a data queue having a predetermined structure;
step S102, performing scheduling processing on the data queue to generate a plurality of metadata sets about the source data;
step S103, constructing a data warehouse from the plurality of metadata sets, and performing data pushing processing on the metadata in the data warehouse;
further, in the step S101, performing access processing on the source data through the predetermined data access engine and converting the accessed source data into a data queue having a predetermined structure specifically includes,
after the source data is accessed through a Spark engine or a Flink engine, converting the source data into a Kafka data queue;
or,
in the step S102, performing scheduling processing on the data queue to generate a plurality of metadata sets about the source data specifically includes,
performing YARN scheduling processing on the data queue to generate the plurality of metadata sets about the source data;
or,
in the step S103, constructing a data warehouse from the plurality of metadata sets and performing data pushing processing on the metadata in the data warehouse specifically includes,
constructing the data warehouse from the metadata sets, labeling the metadata types in the data warehouse and converting them into data business meanings, and then performing data pushing processing on the metadata so as to push the metadata to a corresponding data analysis interface;
further, in the step S2, performing self-service modeling processing and/or modeling optimization processing on the data to be processed in the data set to obtain a modeling prediction result about the data to be processed specifically includes,
step S201A, obtaining feature information about a neural network model by means of deep learning, and constructing the neural network model for the data to be processed according to the feature information;
step S202A, performing distributed deployment processing and/or automatic tuning processing on the neural network model so that the neural network model meets a preset model convergence condition;
step S203A, performing parameter configuration processing and model prediction processing on the neural network model to obtain the modeling prediction result of the data to be processed;
or,
in the step S2, performing self-service modeling processing and/or modeling optimization processing on the data to be processed in the data set to obtain a modeling prediction result about the data to be processed further includes performing adaptive transmission processing on the data to be processed according to the data set and the network state, which specifically includes,
step S201B, acquiring the data to be processed, and determining a transmission coefficient of the data to be processed according to formula (1),
in formula (1), Pw is the transmission coefficient, Pp is a preset data transmission stability coefficient for the self-service modeling and/or modeling optimization processing of the data to be processed, N is the number of data packets contained in the data to be processed, Pa_k is the packet loss probability of the k-th data packet in the data to be processed, Pk is the bandwidth of the data transmission channel, exp is the exponential function with the natural constant e as its base, and k = 1, 2, 3, …, N;
step S202B, determining the data noise coefficient of the data to be processed according to formula (2),
in formula (2), Md is the data noise coefficient, ∞ denotes infinity, size is the size of the data to be processed, and the double integral in formula (2) is taken over a function of the independent variables fx and yx that involves exp(Pw · log2(1 + size · yx)) and (1 + fx); the first integration is over fx, from the lower limit given in formula (2) to an upper limit of ∞, and the second integration is over yx, from a lower limit of 0 to an upper limit of ∞;
step S203B, determining the transmission speed of the data to be processed according to formula (3),
in formula (3), Rt is the transmission speed, Ks is a preset minimum transmission speed, Pi is the maximum transmission speed of the path, and the rounding operator in formula (3) rounds its argument and takes the value zero when the argument is negative;
step S204B, transmitting the data to be processed at the transmission speed, and performing the self-service modeling processing and/or modeling optimization processing;
further, in said step S3, generating visualized data analysis statistics and/or statistics analysis reports specifically comprises,
step S301, performing abnormal state mining processing and/or data evolution comparison processing on the source data according to the modeling prediction result;
step S302, determining analysis statistical results about the source data according to the results of the abnormal state mining processing and/or the data evolution comparison processing;
and step S303, carrying out visualization processing and packaging processing on the analysis statistical result so as to correspondingly generate a transferable visualized data analysis statistical result and/or a statistical analysis report.
The invention also provides a self-service modeling system based on the data analysis platform, which is characterized in that:
the self-service modeling system based on the data analysis platform comprises a pre-integration processing module, an automatic modeling/modeling optimization module and a visual result generation module; wherein,
the pre-integration processing module is used for carrying out pre-integration processing on source data so as to convert the source data into a data set with a preset data structure;
the automatic modeling/modeling optimization module is used for performing self-service modeling processing and/or modeling optimization processing on the data to be processed in the data set so as to obtain a modeling prediction result about the data to be processed;
the visualized result generation module is used for generating visualized data analysis statistical results and/or statistical analysis reports according to the modeling prediction results;
further, the pre-integration processing module comprises an access sub-module, a queue generating sub-module, a scheduling sub-module and a pushing sub-module; wherein,
the access sub-module is used for performing access processing on the source data through a predetermined data access engine;
the queue generating sub-module is used for converting the accessed source data into a data queue having a predetermined structure;
the scheduling sub-module is used for performing scheduling processing on the data queue to generate a plurality of metadata sets about the source data;
the pushing sub-module is used for performing data pushing processing on the data warehouse constructed from the plurality of metadata sets;
further, the access sub-module is used for performing access processing on the source data through a Spark engine or a Flink engine;
or,
the queue generating sub-module is used for converting the source data subjected to the access processing into a Kafka data queue;
or,
the scheduling sub-module is used for performing YARN scheduling processing on the data queue so as to generate the plurality of metadata sets about the source data;
or,
the pushing sub-module is used for labeling the metadata types in the data warehouse and converting them into data business meanings before performing the data pushing processing, and then pushing the metadata to corresponding data analysis interfaces;
further, the automatic modeling/modeling optimization module comprises a model feature acquisition sub-module, a model construction sub-module, a deployment/tuning sub-module and a model configuration/prediction sub-module; wherein,
the model feature acquisition sub-module is used for acquiring feature information about the neural network model by means of deep learning;
the model construction submodule is used for constructing a neural network model related to the data to be processed according to the characteristic information;
the deployment/tuning submodule is used for carrying out distributed deployment processing and/or automatic tuning processing on the neural network model so that the neural network model meets preset model convergence conditions;
the model configuration/prediction submodule is used for carrying out parameter configuration processing and model prediction processing on the neural network model so as to obtain the modeling prediction result of the data to be processed;
further, the visualized result generation module comprises a predicted result processing sub-module, an analysis and statistics sub-module and a visualized/packaged sub-module; wherein,
the prediction result processing sub-module is used for carrying out abnormal state mining processing and/or data evolution comparison processing on the source data according to the modeling prediction result;
the analysis and statistics sub-module is used for determining analysis and statistics results related to the source data according to the results of the abnormal state mining processing and/or the data evolution comparison processing;
the visualization/packaging submodule is used for carrying out visualization processing and packaging processing on the analysis statistical result so as to correspondingly generate a transferable visual data analysis statistical result and/or a statistical analysis report.
Compared with the prior art, the self-service modeling method and system based on the data analysis platform process manufacturing data from three aspects, namely data development, data analysis and data service application, so that the manufacturing data are converted into data suitable for different analysis models and automatic analysis, prediction and monitoring of the manufacturing data are realized, thereby increasing the value that can be mined from the data. The method and system also have the following advantages. First, they make it easier for data processing and development personnel to access, store and manage manufacturing data, so that different types of manufacturing data are integrated into data meeting the corresponding requirements and are given corresponding business meanings that facilitate subsequent analysis. Second, they allow manufacturing line analysts to perform adaptive self-service modeling, tuning, prediction and evaluation, so that the analysis modeling of manufacturing data is optimized and the best model can be selected and published to the platform for real-time prediction. Third, they allow managers to obtain real-time statistics of data analysis results and to generate statistical reports on a self-service basis, realizing visual inspection and real-time monitoring of manufacturing data.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a self-service modeling method based on a data analysis platform.
Fig. 2 is a schematic structural diagram of a self-service modeling system based on a data analysis platform.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a flow chart of a self-service modeling method based on a data analysis platform according to an embodiment of the present invention is shown. The self-help modeling method based on the data analysis platform comprises the following steps:
step S1, pre-integration processing is performed on the source data to convert the source data into a data set with a predetermined data structure.
Preferably, in the step S1, performing pre-integration processing on the source data to convert the source data into a data set having a predetermined data structure specifically includes,
step S101, performing access processing on the source data through a predetermined data access engine, and converting the accessed source data into a data queue having a predetermined structure;
step S102, performing scheduling processing on the data queue to generate a plurality of metadata sets about the source data;
step S103, constructing a data warehouse from the plurality of metadata sets, and performing data pushing processing on the metadata in the data warehouse.
Preferably, in the step S101, performing access processing on the source data through the predetermined data access engine and converting the accessed source data into a data queue having a predetermined structure specifically includes,
after the source data is accessed through a Spark engine or a Flink engine, converting the source data into a Kafka data queue.
Preferably, in the step S102, performing scheduling processing on the data queue to generate a plurality of metadata sets about the source data specifically includes,
performing YARN scheduling processing on the data queue to generate the plurality of metadata sets about the source data.
Preferably, in the step S103, constructing a data warehouse from the plurality of metadata sets and performing data pushing processing on the metadata in the data warehouse specifically includes,
constructing the data warehouse from the metadata sets, labeling the metadata types in the data warehouse and converting them into data business meanings, and then performing data pushing processing on the metadata so as to push the metadata to a corresponding data analysis interface.
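The pre-integration steps S101 to S103 can be illustrated with the following minimal sketch. It is not the patented implementation: an in-memory `queue.Queue` stands in for the Kafka data queue, an ordinary function stands in for YARN-scheduled jobs, and a dictionary stands in for the data warehouse. All function names, field names and the label table are hypothetical and are introduced only for illustration.

```python
import json
import queue
from collections import defaultdict

# Hypothetical mapping from raw field names to business meanings (used in S103).
BUSINESS_LABELS = {"temp": "furnace temperature (C)", "rpm": "spindle speed (rpm)"}


def access_source(raw_lines):
    """S101: parse raw records and enqueue them in one fixed message structure."""
    q = queue.Queue()
    for line in raw_lines:
        rec = json.loads(line)
        # Every queued message carries the same three keys.
        q.put({"device": rec["device"], "field": rec["field"], "value": float(rec["value"])})
    return q


def schedule_metadata_sets(q):
    """S102: drain the queue and group the messages into per-field metadata sets."""
    sets = defaultdict(list)
    while not q.empty():
        msg = q.get()
        sets[msg["field"]].append(msg)
    return dict(sets)


def build_warehouse(metadata_sets):
    """S103, first half: store each set with a type label and a business meaning."""
    return {
        field: {
            "label": field,
            "meaning": BUSINESS_LABELS.get(field, "unlabelled"),
            "rows": rows,
        }
        for field, rows in metadata_sets.items()
    }


def push(warehouse, interfaces):
    """S103, second half: hand each table to the analysis interface registered for it."""
    for field, table in warehouse.items():
        handler = interfaces.get(field)
        if handler is not None:
            handler(table)


raw = [
    '{"device": "m1", "field": "temp", "value": "812.5"}',
    '{"device": "m2", "field": "rpm", "value": "1500"}',
    '{"device": "m1", "field": "temp", "value": "815.0"}',
]
received = []  # stands in for a data analysis interface
warehouse = build_warehouse(schedule_metadata_sets(access_source(raw)))
push(warehouse, {"temp": received.append})
```

In this sketch only the `temp` table is pushed, because it is the only field with a registered interface; a real deployment would replace each stand-in with the corresponding engine or service.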
Step S2, performing self-service modeling processing and/or modeling optimization processing on the data to be processed in the data set to obtain a modeling prediction result about the data to be processed.
Preferably, in the step S2, performing self-service modeling processing and/or modeling optimization processing on the data to be processed in the data set to obtain a modeling prediction result about the data to be processed specifically includes,
step S201A, obtaining feature information about a neural network model by means of deep learning, and constructing the neural network model for the data to be processed according to the feature information;
step S202A, performing distributed deployment processing and/or automatic tuning processing on the neural network model so that the neural network model meets a preset model convergence condition;
step S203A, performing parameter configuration processing and model prediction processing on the neural network model to obtain the modeling prediction result of the data to be processed.
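The tuning and prediction flow of steps S202A and S203A can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the invention: a single linear neuron trained by batch gradient descent stands in for the neural network model, a small learning-rate search stands in for automatic tuning, the convergence condition is assumed to be a mean squared error below a tolerance, and distributed deployment is not shown. The names `train`, `auto_tune` and `predict` are hypothetical.

```python
def train(xs, ys, lr, max_epochs=5000, tol=1e-6):
    """Fit y = w*x + b by batch gradient descent.

    Returns (w, b, loss, converged). converged is True when the mean squared
    error falls below tol within max_epochs (the assumed convergence condition).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    loss = float("inf")
    for _ in range(max_epochs):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errs) / n
        if loss < tol:
            return w, b, loss, True
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
        grad_b = 2.0 * sum(errs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss, False


def auto_tune(xs, ys, learning_rates):
    """Try each learning rate and keep the converged model with the lowest loss."""
    best = None
    for lr in learning_rates:
        w, b, loss, ok = train(xs, ys, lr)
        if ok and (best is None or loss < best["loss"]):
            best = {"lr": lr, "w": w, "b": b, "loss": loss}
    return best


def predict(model, x):
    """Model prediction with the configured parameters w and b."""
    return model["w"] * x + model["b"]


# Synthetic data following y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
best = auto_tune(xs, ys, [0.001, 0.01, 0.1])
forecast = predict(best, 10.0)
```

Candidates that fail to meet the convergence condition within the epoch budget are discarded, which mirrors the requirement that the tuned model satisfy a preset convergence condition before it is used for prediction.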
Preferably, in the step S2, performing self-service modeling processing and/or modeling optimization processing on the data to be processed in the data set to obtain a modeling prediction result about the data to be processed further includes performing adaptive transmission processing on the data to be processed according to the data set and the network state, which specifically includes,
step S201B, acquiring the data to be processed, and determining a transmission coefficient of the data to be processed according to formula (1),
in formula (1), Pw is the transmission coefficient, Pp is a preset data transmission stability coefficient for the self-service modeling and/or modeling optimization processing of the data to be processed, which generally takes a value of 0.3, N is the number of data packets contained in the data to be processed, Pa_k is the packet loss probability of the k-th data packet in the data to be processed, Pk is the bandwidth of the data transmission channel, exp is the exponential function with the natural constant e as its base, and k = 1, 2, 3, …, N;
step S202B, determining the data noise coefficient of the data to be processed according to formula (2),
in formula (2), Md is the data noise coefficient, ∞ denotes infinity, size is the size of the data to be processed, and the double integral in formula (2) is taken over a function of the independent variables fx and yx that involves exp(Pw · log2(1 + size · yx)) and (1 + fx); the first integration is over fx, from the lower limit given in formula (2) to an upper limit of ∞, and the second integration is over yx, from a lower limit of 0 to an upper limit of ∞;
step S203B, determining the transmission speed of the data to be processed according to formula (3),
in formula (3), Rt is the transmission speed, Ks is a preset minimum transmission speed, which generally takes a value of 50 K/s, Pi is the maximum transmission speed of the path, and the rounding operator in formula (3) rounds its argument and takes the value zero when the argument is negative;
step S204B, transmitting the data to be processed at the transmission speed, and performing the self-service modeling processing and/or modeling optimization processing.
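The following sketch does not implement formula (3), whose expression is not available in this text. It only illustrates the rounding operator described for step S203B (round the value, and take zero when the value is negative) and, purely as an assumption, shows one way a candidate speed could be kept between the preset minimum Ks and the path maximum Pi. Rounding to the nearest integer, the bounding step, and the names `round_nonneg` and `bounded_speed` are all hypothetical.

```python
import math

KS_DEFAULT = 50.0  # preset minimum transmission speed; the description cites 50 K/s


def round_nonneg(x):
    """Round x to the nearest integer (halves round up); negative input yields 0.

    ASSUMPTION: the description only says the operator rounds the value, so
    nearest-integer rounding is a guess.
    """
    if x < 0:
        return 0
    return math.floor(x + 0.5)


def bounded_speed(candidate, pi_max, ks=KS_DEFAULT):
    """ASSUMPTION, not formula (3): keep a rounded candidate speed within [ks, pi_max]."""
    return min(max(round_nonneg(candidate), ks), pi_max)
```

For example, a candidate of 10 with a path maximum of 200 would be raised to the minimum of 50, and a candidate of 500 would be capped at 200.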
Through the above steps, the transmission of the data to be processed is controlled adaptively while the self-service modeling processing and/or modeling optimization processing is performed on the data to be processed in the data set to obtain the modeling prediction result. The data can thus be handled according to both the characteristics of the data and the hardware characteristics of the data analysis platform, which improves the stability of the pre-integration processing and the efficiency of data transmission during the self-service modeling processing and/or modeling optimization processing.
Step S3, generating a visualized data analysis statistical result and/or a statistical analysis report according to the modeling prediction result.
Preferably, in this step S3, generating visualized data analysis statistics and/or statistics analysis reports specifically comprises,
step S301, carrying out abnormal state mining processing and/or data evolution comparison processing on the source data according to the modeling prediction result;
step S302, determining analysis statistical results about the source data according to the results of the abnormal state mining processing and/or the data evolution comparison processing;
step S303, performing visualization processing and packaging processing on the analysis statistical result to correspondingly generate a transferable visualized data analysis statistical result and/or a statistical analysis report.
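Steps S301 to S303 can be sketched as follows. This is an illustrative sketch only: abnormal state mining is assumed to mean flagging source values whose residual against the prediction lies more than k standard deviations from the mean residual, and packaging is assumed to mean serializing the statistics to a JSON string that can be passed on to a chart or report renderer. The function names, the threshold k and the sample values are hypothetical.

```python
import json
import statistics


def mine_anomalies(source, predicted, k=2.0):
    """S301: return indices whose residual deviates from the mean residual by more than k sigma."""
    residuals = [s - p for s, p in zip(source, predicted)]
    mean = statistics.fmean(residuals)
    spread = statistics.pstdev(residuals)
    if spread == 0:
        return []
    return [i for i, r in enumerate(residuals) if abs(r - mean) > k * spread]


def summarize(source, anomalies):
    """S302: derive analysis statistics for the source data."""
    return {
        "count": len(source),
        "mean": statistics.fmean(source),
        "min": min(source),
        "max": max(source),
        "anomaly_indices": anomalies,
    }


def package_report(stats, title):
    """S303: package the statistics as a transferable JSON document."""
    return json.dumps({"title": title, "statistics": stats}, sort_keys=True)


# Made-up readings with one obvious outlier at index 4.
source = [10.0, 10.2, 9.9, 10.1, 25.0, 10.0]
predicted = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
anomalies = mine_anomalies(source, predicted)
report = package_report(summarize(source, anomalies), "line check")
```

The JSON string is only one possible transferable form; the description leaves the visualization and packaging formats open.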
Referring to fig. 2, a schematic structural diagram of a self-service modeling system based on a data analysis platform according to an embodiment of the present invention is provided. The self-service modeling system based on the data analysis platform comprises a pre-integration processing module, an automatic modeling/modeling optimization module and a visual result generation module; wherein,
the pre-integration processing module is used for carrying out pre-integration processing on the source data so as to convert the source data into a data set with a preset data structure;
the automatic modeling/modeling optimization module is used for performing self-service modeling processing and/or modeling optimization processing on the data to be processed in the data set so as to obtain a modeling prediction result about the data to be processed;
the visualized result generation module is used for generating visualized data analysis statistical results and/or statistical analysis reports according to the modeling prediction results.
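The three-module structure described above can be sketched as a simple composition. This is a minimal sketch under stated assumptions, not the system of fig. 2: each module is modeled as a plain callable, and the class name, field names and the toy stand-in modules are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SelfServiceModelingSystem:
    # Pre-integration processing module: source records -> structured data set.
    pre_integrate: Callable[[List[dict]], List[float]]
    # Automatic modeling/modeling optimization module: data set -> prediction.
    model: Callable[[List[float]], List[float]]
    # Visualized result generation module: prediction -> report content.
    visualize: Callable[[List[float]], Dict[str, float]]

    def run(self, source: List[dict]) -> Dict[str, float]:
        """Chain the three modules in the order S1, S2, S3."""
        dataset = self.pre_integrate(source)
        prediction = self.model(dataset)
        return self.visualize(prediction)


# Toy stand-ins: parse values, predict their mean, wrap the result for display.
system = SelfServiceModelingSystem(
    pre_integrate=lambda rows: [float(r["value"]) for r in rows],
    model=lambda xs: [sum(xs) / len(xs)],
    visualize=lambda preds: {"forecast": preds[0]},
)
result = system.run([{"value": "1"}, {"value": "3"}])
```

Keeping the modules as interchangeable callables reflects the description's separation of data development, data analysis and data service application, so that any one module can be replaced without changing the other two.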
Preferably, the pre-integration processing module comprises an access sub-module, a queue generating sub-module, a scheduling sub-module and a pushing sub-module; wherein,
the access sub-module is used for performing access processing on the source data through a predetermined data access engine;
the queue generating sub-module is used for converting the accessed source data into a data queue having a predetermined structure;
the scheduling sub-module is used for performing scheduling processing on the data queue to generate a plurality of metadata sets about the source data;
the pushing sub-module is used for performing data pushing processing on the data warehouse constructed from the plurality of metadata sets.
Preferably, the access sub-module is used for performing access processing on the source data through a Spark engine or a Flink engine.
Preferably, the queue generating sub-module is used for converting the accessed source data into a Kafka data queue.
Preferably, the scheduling sub-module is used for performing YARN scheduling processing on the data queue to generate the plurality of metadata sets about the source data.
Preferably, the pushing sub-module is used for labeling the metadata types in the data warehouse and converting them into data business meanings before performing the data pushing processing, and then pushing the metadata to the corresponding data analysis interface.
Preferably, the automatic modeling/modeling optimization module comprises a model feature acquisition sub-module, a model construction sub-module, a deployment/tuning sub-module and a model configuration/prediction sub-module; wherein,
the model feature acquisition sub-module is used for acquiring feature information about a neural network model by means of deep learning;
the model construction submodule is used for constructing a neural network model related to the data to be processed according to the characteristic information;
the deployment/tuning sub-module is used for carrying out distributed deployment processing and/or automatic tuning processing on the neural network model so that the neural network model meets preset model convergence conditions;
the model configuration/prediction submodule is used for carrying out parameter configuration processing and model prediction processing on the neural network model so as to obtain the modeling prediction result of the data to be processed.
Preferably, the visualized result generating module comprises a predicted result processing sub-module, an analysis and statistics sub-module and a visualization/encapsulation sub-module; wherein,
the prediction result processing submodule is used for carrying out abnormal state mining processing and/or data evolution comparison processing on the source data according to the modeling prediction result;
the analysis and statistics submodule is used for determining analysis and statistics results related to the source data according to results of the abnormal state mining processing and/or the data evolution comparison processing;
the visualization/packaging submodule is used for carrying out visualization processing and packaging processing on the analysis statistical result so as to correspondingly generate a transferable visualized data analysis statistical result and/or a statistical analysis report.
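As an illustration of the three sub-modules above, the sketch below flags abnormal states by comparing observed source data with the modeling prediction result, derives simple statistics, and packages them as a transferable JSON report. The residual threshold rule, the field names and the sample values are assumptions made for this example, not details taken from the method itself.

```python
import json
from statistics import mean


def mine_anomalies(observed, predicted, threshold):
    """Abnormal state mining: indices where |observed - predicted| exceeds the threshold."""
    return [
        i
        for i, (o, p) in enumerate(zip(observed, predicted))
        if abs(o - p) > threshold
    ]


def summarise(observed, anomalies):
    """Analysis statistics derived from the source data and the mined anomalies."""
    return {
        "count": len(observed),
        "mean": mean(observed),
        "anomaly_indices": anomalies,
        "anomaly_rate": len(anomalies) / len(observed),
    }


def package(summary):
    """Packaging: serialise the statistics into a transferable JSON report."""
    return json.dumps(summary, sort_keys=True)


# Hypothetical observed values and model predictions.
observed = [10.0, 10.5, 30.0, 9.5]
predicted = [10.0, 10.0, 10.0, 10.0]
report = package(summarise(observed, mine_anomalies(observed, predicted, threshold=5.0)))
```

A charting layer could consume `report` directly; the sketch stops at the packaged statistics because the visual form is not specified here.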
As can be seen from the foregoing embodiments, the self-service modeling method and system based on the data analysis platform process manufacturing data from the perspectives of data development, data analysis and data service application, converting the manufacturing data into models suited to different analyses and enabling automatic analysis, prediction and monitoring of the manufacturing data, thereby increasing the value that can be mined from it. In addition, the method and system have the following advantages. First, they make it convenient for data processing and development personnel to access, store and manage manufacturing data, so that different types of manufacturing data are integrated into data meeting the corresponding requirements and are given corresponding data business meanings to facilitate subsequent data analysis. Second, they allow manufacturing line analysts to carry out adaptive self-service modeling, tuning, prediction and evaluation, so as to optimize the analysis modeling of manufacturing data and select an optimal model to be published to the platform for real-time prediction. Third, they provide managers with real-time statistics of data analysis results and self-service generation of statistical reports, enabling visual inspection and real-time monitoring of the manufacturing data.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (5)

1. The self-help modeling method based on the data analysis platform is characterized by comprising the following steps of:
step S1, pre-integrating source data to convert the source data into a data set with a preset data structure;
step S2, self-service modeling processing and/or modeling optimization processing are carried out on the data to be processed in the data set, so that a modeling prediction result of the data to be processed is obtained;
step S3, generating a visualized data analysis statistical result and/or a statistical analysis report according to the modeling prediction result;
in said step S1, pre-integration processing is performed on the source data to convert said source data into a data set having a predetermined data structure comprising in particular,
step S101, carrying out access processing on the source data about a preset data access engine, and converting the source data subjected to the access processing into a data queue with a preset structure;
step S102, scheduling the data queue to generate a plurality of metadata sets about the source data;
step S103, constructing a data warehouse related to the metadata sets, and performing data pushing processing on the metadata in the data warehouse;
in the step S101, an access process with respect to a predetermined data access engine is performed on the source data, and converting the source data subjected to the access process into a data queue having a predetermined structure specifically includes,
after access processing by a Spark engine or a Flink engine is performed on the source data, the source data is converted into a Kafka data queue;
in the step S102, the data queue is subjected to a scheduling process to generate a plurality of metadata sets about the source data, specifically including,
performing YARN scheduling processing on the data queue to generate a plurality of metadata sets about the source data;
in the step S103, a data warehouse concerning the plurality of metadata sets is constructed, and the data pushing process for the plurality of metadata in the data warehouse specifically includes,
constructing a data warehouse related to the metadata set, performing label processing and data business meaning conversion processing on metadata types on the data warehouse, and performing data pushing processing on the metadata so as to push the metadata to a corresponding data analysis interface;
in the step S2, self-service modeling processing and/or modeling optimization processing is performed on the data to be processed in the data set, so as to obtain a modeling prediction result about the data to be processed specifically includes,
step S201A, obtaining characteristic information about a neural network model through a deep learning mode, and constructing the neural network model about the data to be processed according to the characteristic information;
step S202A, performing distributed deployment processing and/or automatic tuning processing on the neural network model so that the neural network model meets preset model convergence conditions;
step S203A, performing parameter configuration processing and model prediction processing on the neural network model to obtain the modeling prediction result of the data to be processed;
in the step S2, performing self-service modeling processing and/or modeling optimization processing on the data to be processed in the data set to obtain a modeling prediction result about the data to be processed further includes performing adaptive transmission processing on the data to be processed according to the data set and the network state, which specifically includes,
step S201B, acquiring the data to be processed, determining a transmission coefficient of the data to be processed according to the following formula (1),
[formula (1) is not reproduced in this text];
in the above formula (1), the following quantities are defined (their symbols are likewise not reproduced): the transmission coefficient; the preset data transmission stability coefficient for the data to be processed during the self-service modeling processing and/or modeling optimization processing; N, the number of data packets contained in the data to be processed; the packet loss probability of the k-th data packet in the data to be processed, where k = 1, 2, 3, …, N; the data transmission channel bandwidth; and an exponential function with the natural constant e as its base;
step S202B, according to the following formula (2), determining the data noise coefficient of the data to be processed,
[formula (2) is not reproduced in this text];
in the above formula (2), the following quantities are defined (their symbols are likewise not reproduced): the data noise coefficient; the infinity symbol; the size of the data to be processed; and a function of two arguments that is integrated twice, where the first integration has its own integration variable, lower limit and upper limit, and the second integration has its own integration variable, a lower limit of 0 and its own upper limit;
step S203B, determining a transmission speed of the data to be processed according to the following formula (3),
[formula (3) is not reproduced in this text];
in the above formula (3), the following quantities are defined (their symbols are likewise not reproduced): the transmission speed; the preset minimum transmission speed; the maximum transmission speed of the path; and a rounding operation on the data that takes the value zero when the data is negative;
step S204B, transmitting the data to be processed according to the transmission speed, and performing self-service modeling processing and/or modeling optimization processing.
2. The self-help modeling method based on a data analysis platform as claimed in claim 1, wherein:
in said step S3, generating a visualized data analysis statistical result and/or statistical analysis report according to said modeling prediction result specifically includes,
step S301, performing abnormal state mining processing and/or data evolution comparison processing on the source data according to the modeling prediction result;
step S302, determining analysis statistical results about the source data according to the results of the abnormal state mining processing and/or the data evolution comparison processing;
and step S303, carrying out visualization processing and packaging processing on the analysis statistical result so as to correspondingly generate a transferable visualized data analysis statistical result and/or a statistical analysis report.
3. The self-help modeling system based on the data analysis platform is characterized in that:
the self-service modeling system based on the data analysis platform comprises a pre-integration processing module, an automatic modeling/modeling optimization module and a visual result generation module; wherein,
the pre-integration processing module is used for carrying out pre-integration processing on source data so as to convert the source data into a data set with a preset data structure;
the automatic modeling/modeling optimization module is used for performing self-service modeling processing and/or modeling optimization processing on the data to be processed in the data set so as to obtain a modeling prediction result about the data to be processed;
the visualized result generation module is used for generating visualized data analysis statistical results and/or statistical analysis reports according to the modeling prediction results;
the pre-integration processing module comprises an access sub-module, a queue generating sub-module, a scheduling sub-module and a pushing sub-module; wherein,
the access sub-module is used for carrying out access processing on the source data about a predetermined data access engine;
the queue generating sub-module is used for converting the source data subjected to the access processing into a data queue with a preset structure;
the scheduling sub-module is used for performing scheduling processing on the data queue to generate a plurality of metadata sets about the source data;
the pushing sub-module is used for carrying out data pushing processing on a data warehouse constructed from the plurality of metadata sets;
the access sub-module is used for carrying out access processing by a Spark engine or a Flink engine on the source data;
the queue generating sub-module is used for converting the source data subjected to the access processing into a Kafka data queue;
the scheduling sub-module is used for carrying out YARN scheduling processing on the data queue so as to generate a plurality of metadata sets about the source data;
the pushing sub-module is used for carrying out metadata-type label processing and data business meaning conversion processing on the data warehouse before carrying out the data pushing processing, and then pushing the metadata to the corresponding data analysis interface;
the automatic modeling/modeling optimization module performs self-help modeling processing and/or modeling optimization processing on the data to be processed in the data set to obtain a modeling prediction result about the data to be processed, which specifically comprises,
acquiring characteristic information about a neural network model through a deep learning mode, and constructing the neural network model about the data to be processed according to the characteristic information;
performing distributed deployment processing and/or automatic tuning processing on the neural network model so that the neural network model meets preset model convergence conditions;
performing parameter configuration processing and model prediction processing on the neural network model to obtain the modeling prediction result of the data to be processed;
the automatic modeling/modeling optimization module, in performing self-help modeling processing and/or modeling optimization processing on the data to be processed in the data set to obtain a modeling prediction result about the data to be processed, is further used for performing adaptive transmission processing on the data to be processed according to the data set and a network state, which specifically comprises,
acquiring the data to be processed, determining the transmission coefficient of the data to be processed according to the following formula (1),
[formula (1) is not reproduced in this text];
in the above formula (1), the following quantities are defined (their symbols are likewise not reproduced): the transmission coefficient; the preset data transmission stability coefficient for the data to be processed during the self-service modeling processing and/or modeling optimization processing; N, the number of data packets contained in the data to be processed; the packet loss probability of the k-th data packet in the data to be processed, where k = 1, 2, 3, …, N; the data transmission channel bandwidth; and an exponential function with the natural constant e as its base;
determining a data noise figure of the data to be processed according to the following formula (2),
[formula (2) is not reproduced in this text];
in the above formula (2), the following quantities are defined (their symbols are likewise not reproduced): the data noise coefficient; the infinity symbol; the size of the data to be processed; and a function of two arguments that is integrated twice, where the first integration has its own integration variable, lower limit and upper limit, and the second integration has its own integration variable, a lower limit of 0 and its own upper limit;
determining the transmission speed of the data to be processed according to the following formula (3),
[formula (3) is not reproduced in this text];
in the above formula (3), the following quantities are defined (their symbols are likewise not reproduced): the transmission speed; the preset minimum transmission speed; the maximum transmission speed of the path; and a rounding operation on the data that takes the value zero when the data is negative;
and transmitting the data to be processed according to the transmission speed, and performing self-service modeling processing and/or modeling optimization processing.
4. A self-service modeling system based on a data analysis platform as claimed in claim 3, wherein:
the automatic modeling/modeling optimization module comprises a model feature acquisition sub-module, a model construction sub-module, a deployment/tuning sub-module and a model configuration/prediction sub-module; wherein,
the model feature acquisition sub-module is used for acquiring feature information about the neural network model through a deep learning mode;
the model construction submodule is used for constructing a neural network model related to the data to be processed according to the characteristic information;
the deployment/tuning submodule is used for carrying out distributed deployment processing and/or automatic tuning processing on the neural network model so that the neural network model meets preset model convergence conditions;
the model configuration/prediction submodule is used for carrying out parameter configuration processing and model prediction processing on the neural network model so as to obtain the modeling prediction result of the data to be processed.
5. A self-service modeling system based on a data analysis platform as claimed in claim 3, wherein:
the visualized result generation module comprises a prediction result processing sub-module, an analysis and statistics sub-module and a visualization/packaging sub-module; wherein,
the prediction result processing sub-module is used for carrying out abnormal state mining processing and/or data evolution comparison processing on the source data according to the modeling prediction result;
the analysis and statistics sub-module is used for determining analysis and statistics results related to the source data according to the results of the abnormal state mining processing and/or the data evolution comparison processing;
the visualization/packaging submodule is used for carrying out visualization processing and packaging processing on the analysis statistical result so as to correspondingly generate a transferable visual data analysis statistical result and/or a statistical analysis report.
CN201911150485.1A 2019-11-21 2019-11-21 Self-help modeling method and system based on data analysis platform Active CN111126661B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911150485.1A CN111126661B (en) 2019-11-21 2019-11-21 Self-help modeling method and system based on data analysis platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911150485.1A CN111126661B (en) 2019-11-21 2019-11-21 Self-help modeling method and system based on data analysis platform

Publications (2)

Publication Number Publication Date
CN111126661A CN111126661A (en) 2020-05-08
CN111126661B true CN111126661B (en) 2023-11-24

Family

ID=70496210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911150485.1A Active CN111126661B (en) 2019-11-21 2019-11-21 Self-help modeling method and system based on data analysis platform

Country Status (1)

Country Link
CN (1) CN111126661B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956015A (en) * 2016-04-22 2016-09-21 四川中软科技有限公司 Service platform integration method based on big data
CN109255523A (en) * 2018-08-16 2019-01-22 北京奥技异科技发展有限公司 Analysis indexes computing platform based on KKS coding rule and big data framework

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8271430B2 (en) * 2008-06-02 2012-09-18 The Boeing Company Methods and systems for metadata driven data capture for a temporal data warehouse

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
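Design and implementation of a Spark-based distributed big data analysis and modeling system; 徐时芳; 罗晓宾; 陈阳华; 现代电子技术 (No. 20); pp. 172-174/178 *
Research and practice of building a data warehouse on a big data platform; 赵毅; 中国金融电脑 (No. 05); pp. 37-42 *
Research on building the data analysis domain of a full-service unified data center based on big data; 朱碧钦; 吴飞; 罗富财; 电力信息与通信技术 (No. 02); pp. 91-96 *
Research on data service applications based on big data; 陈光; 计算机技术与发展 (No. 08); pp. 129-134 *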

Also Published As

Publication number Publication date
CN111126661A (en) 2020-05-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant