CN110969263A - Advanced analysis infrastructure for machine learning - Google Patents

Publication number
CN110969263A
Authority
CN
China
Prior art keywords
machine learning
data
screening
unit
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911233132.8A
Other languages
Chinese (zh)
Inventor
彭喆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-12-05
Filing date
2019-12-05
Publication date
2020-04-07
Application filed by Individual
Priority to CN201911233132.8A
Publication of CN110969263A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • General Factory Administration (AREA)

Abstract

The invention discloses an advanced analysis infrastructure for machine learning, comprising a machine learning system that includes a data input module, a machine learning algorithm library, a screening unit, a test module and a statistical module. By adding the screening unit to the machine learning system, the invention performs classification screening, data deviation screening and defect screening on the input data set. This prevents poor-quality input data from disrupting the normal operation of the machine learning system. The defects found in the data set are displayed promptly through a presentation module, so that the relevant personnel can repair them in time and small data defects do not cause heavy economic losses for an enterprise.

Description

Advanced analysis infrastructure for machine learning
Technical Field
The invention relates to the technical field of analysis infrastructure, in particular to an advanced analysis infrastructure for machine learning.
Background
Machine learning is a process that analyzes a data set to determine a model (also called a rule or function) that maps input data (also called explanatory variables or predictors) to output data (also called dependent variables or response variables). One type of machine learning is supervised learning, in which a model is trained with a data set that includes known output data for a sufficient amount of input data. Once trained, the model can be deployed, i.e., applied to new input data to predict the desired output.
Poor data quality is one of the greatest risks in machine learning. If the data used for machine learning is of poor quality, the entire big data analysis effort is endangered and the machine learning process is thrown into disorder. Poor-quality data also contains many defects; a small data defect is easily overlooked yet can cause expensive errors. If data quality is not controlled, the whole machine learning plan may develop in the wrong direction.
To this end, we propose an advanced analysis infrastructure for machine learning.
Disclosure of Invention
It is an object of the present invention to provide an advanced analysis infrastructure for machine learning that addresses the problems set forth in the background above.
In order to achieve the purpose, the invention adopts the following technical scheme:
an advanced analysis infrastructure for machine learning, comprising a machine learning system, the machine learning system comprising:
a data input module for inputting or receiving a data set;
a machine learning algorithm library containing a plurality of machine learning algorithms that are tested through a common interface;
a screening unit for performing classification screening, data deviation screening and defect screening on the input data set;
a test module for training each machine learning model and evaluating its performance results;
and a statistical module for compiling and comparing the performance results of all the machine learning models.
Preferably, the machine learning system further comprises a processing unit for processing the input data set and the output data set.
Preferably, the machine learning system further comprises a presentation module for presenting the performance results compiled by the statistical module and the results of their comparison.
Preferably, the machine learning system further comprises a storage unit for storing the input data set.
Preferably, the screening unit includes a classification screening unit, a data screening unit and a defect screening unit, the classification screening unit performs classification screening on the data set processed by the processing unit, the data screening unit performs data deviation screening on the data set processed by the processing unit, and the defect screening unit screens the defects existing in the continuous data of the data set processed by the processing unit.
Preferably, the data input module is divided into a receiving unit and an output unit, the receiving unit is used for receiving the data set, and the output unit is used for outputting the processed data set.
Preferably, the machine learning system further comprises a data preprocessor, which prepares the data processed by the screening unit into a data set and passes the prepared data set to the test module for processing.
Compared with the prior art, the invention has the beneficial effects that:
By adding the screening unit to the machine learning system, the invention performs classification screening, data deviation screening and defect screening on the input data set. This prevents poor-quality input data from disrupting the normal operation of the machine learning system. The defects found in the data set are displayed promptly through the presentation module, so that the relevant personnel can repair them in time and small data defects do not cause heavy economic losses for an enterprise.
Drawings
FIG. 1 is a representation of an advanced analysis infrastructure for machine learning according to the present invention;
FIG. 2 is a representation of the modules of an advanced analysis infrastructure for machine learning according to the present invention.
In the figures: 1. machine learning system; 11. data input module; 111. receiving unit; 112. output unit; 12. processing unit; 13. screening unit; 131. classification screening unit; 132. data screening unit; 133. defect screening unit; 14. storage unit; 2. data preprocessor; 3. machine learning algorithm library; 4. test module; 41. machine learning model; 5. statistical module; 6. presentation module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The invention will be further illustrated with reference to specific examples, which are provided solely for the purpose of better understanding the present invention.
Referring to FIGS. 1-2, the present invention proposes an advanced analysis infrastructure for machine learning, comprising a machine learning system 1, the machine learning system 1 comprising:
a data input module 11 for inputting or receiving a data set;
a machine learning algorithm library 3 containing a plurality of machine learning algorithms to be tested through a common interface. All of these algorithms are configured to conform to the common interface, also referred to as a reciprocal interface, to facilitate their use in the machine learning system 1 (e.g., for testing, training, evaluation and/or deployment). The common interface may define common inputs and/or outputs, common methods for inputting and/or outputting data, and/or common procedure calls for each machine learning algorithm. The algorithms may be one or a combination of a naïve Bayes classifier, a tree-augmented naïve Bayes classifier, a dynamic Bayesian network, a support vector machine, a learned decision tree, an ensemble of learned decision trees (e.g., a random forest) and an artificial neural network;
a screening unit 13 for performing classification screening, data deviation screening and defect screening on the input data set. The screening unit 13 can classify the input data and screen it for errors. Classification screening sorts the data by category: numeric input data is classified according to the field to which it belongs, and picture input data is assigned to the corresponding field according to the information in the picture. Error screening catches, for example, input data that represents a length but carries a unit of electric power, so that poor-quality input data is detected in time;
a test module 4 for training each machine learning model 41 and evaluating its performance results. The test module 4 comprises a plurality of machine learning models 41 and trains and evaluates them on a plurality of different data sets, each of which comprises one or more items of observable data;
and a statistical module 5 for compiling the training result and the evaluation result of each machine learning model 41 and comparing these results across all the machine learning models 41.
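The roles of the algorithm library, the test module and the statistical module can be illustrated with a minimal Python sketch. The patent does not specify the interface, so the `Algorithm` base class, its `fit`/`predict` methods, the two toy algorithms and the accuracy metric are all assumptions chosen for illustration only.

```python
from abc import ABC, abstractmethod
from statistics import mean


class Algorithm(ABC):
    """Hypothetical common interface: every algorithm exposes fit and predict."""

    @abstractmethod
    def fit(self, X, y):
        ...

    @abstractmethod
    def predict(self, X):
        ...


class MajorityClass(Algorithm):
    """Baseline that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestNeighbor(Algorithm):
    """1-nearest-neighbour on numeric tuples (squared Euclidean distance)."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        def closest(row):
            dists = [sum((a - b) ** 2 for a, b in zip(row, ref)) for ref in self.X_]
            return self.y_[dists.index(min(dists))]
        return [closest(row) for row in X]


def evaluate(algorithm, X_train, y_train, X_test, y_test):
    """Test-module role: train one model, return its accuracy on held-out data."""
    predictions = algorithm.fit(X_train, y_train).predict(X_test)
    return mean(1.0 if p == t else 0.0 for p, t in zip(predictions, y_test))


def compare(library, X_train, y_train, X_test, y_test):
    """Statistical-module role: collect every model's score and rank them."""
    scores = {name: evaluate(algo, X_train, y_train, X_test, y_test)
              for name, algo in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy library and data, invented for this example.
LIBRARY = {"majority": MajorityClass(), "nearest": NearestNeighbor()}
X_train = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (0.2, 0.1)]
y_train = ["a", "a", "b", "b", "a"]
X_test = [(0.05, 0.05), (1.05, 0.95)]
y_test = ["a", "b"]
ranking = compare(LIBRARY, X_train, y_train, X_test, y_test)
```

Because both toy algorithms implement the same two methods, `evaluate` and `compare` never need to know which algorithm they are handling; that is the practical benefit of a common interface.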
Wherein the machine learning system 1 further comprises a processing unit 12 for processing the input data set and the output data set.
The machine learning system 1 further includes a presentation module 6, which presents the performance results compiled by the statistical module 5 and the results of their comparison, and also presents the abnormal results screened out by the screening unit 13.
The machine learning system 1 further includes a storage unit 14, which stores the input data set and the output data set of the machine learning system 1 so that they can be queried when needed.
The screening unit 13 includes a classification screening unit 131, a data screening unit 132, and a defect screening unit 133, where the classification screening unit 131 performs classification screening on the data set processed by the processing unit 12, the data screening unit 132 performs data deviation screening on the data set processed by the processing unit 12, and the defect screening unit 133 screens the defects existing in the continuous data of the data set processed by the processing unit 12.
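As a rough illustration of the three screening sub-units, the following Python sketch groups records by field, flags deviating values and reports defects. The `Record` type, the `ALLOWED_UNITS` table, the median-based outlier rule and its threshold are assumptions made for this example; the patent does not state how each screening step is computed.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Record:
    field: str     # domain the value belongs to, e.g. "length"
    value: object  # a float, or None when the value is missing
    unit: str


# Hypothetical table of units accepted for each field.
ALLOWED_UNITS = {"length": {"m", "cm", "mm"}, "power": {"W", "kW"}}


def classify(records):
    """Classification screening: group records by the field they belong to."""
    groups = {}
    for r in records:
        groups.setdefault(r.field, []).append(r)
    return groups


def deviation_screen(records, threshold=3.5):
    """Deviation screening: flag values far from the median (modified z-score)."""
    values = [r.value for r in records if r.value is not None]
    if len(values) < 3:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [r for r in records
            if r.value is not None
            and 0.6745 * abs(r.value - med) / mad > threshold]


def defect_screen(records):
    """Defect screening: flag missing values and units that do not fit the field."""
    defects = []
    for r in records:
        if r.value is None:
            defects.append((r, "missing value"))
        elif r.unit not in ALLOWED_UNITS.get(r.field, set()):
            defects.append((r, "unit does not match field"))
    return defects


# Toy records, invented for this example.
records = [
    Record("length", 1.0, "m"), Record("length", 1.1, "m"),
    Record("length", 0.9, "m"), Record("length", 1.2, "m"),
    Record("length", 50.0, "m"),   # deviates strongly from the rest
    Record("length", 1.0, "kW"),   # a length carrying a unit of electric power
    Record("length", None, "m"),   # missing value
]
groups = classify(records)
outliers = deviation_screen(groups["length"])
defects = defect_screen(records)
```

The record with unit "kW" mirrors the description's example of a length value carrying a unit of electric power. The flagged records could then be handed to the presentation module 6 for review.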
The data input module 11 is divided into a receiving unit 111 and an output unit 112, where the receiving unit 111 is used to receive a data set, and the output unit 112 is used to output the processed data set.
The machine learning system 1 further includes a data preprocessor 2, which prepares the data processed by the screening unit 13 into a data set and passes the prepared data set to the test module 4 for processing.
In operation, data is input into the data input module 11 and received by its receiving unit 111. The data is then passed to the screening unit 13 for classification screening, data deviation screening, defect screening and error screening, after which it is processed by the data preprocessor 2. A matching machine learning algorithm is selected from the machine learning algorithm library 3, and the test module 4 trains and evaluates it on the data. The statistical module 5 then compiles the training result and the evaluation result of each machine learning model 41 and compares them with one another. Finally, the presentation module 6 presents the statistical results and the comparison results.
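The data flow described above can be sketched as a chain of stages in Python. Every function name and the toy data here are hypothetical. The models are fixed functions rather than trained ones, so the sketch shows only the order of the stages, not the training step, and the score used (mean absolute error) is an assumption.

```python
def screen(rows):
    """Screening unit 13 (simplified): reject any row that contains None."""
    accepted = [r for r in rows if None not in r]
    rejected = [r for r in rows if None in r]
    return accepted, rejected


def preprocess(rows):
    """Data preprocessor 2: turn (x, y) rows into feature and target lists."""
    return [x for x, _ in rows], [y for _, y in rows]


def evaluate(model, dataset):
    """Test module 4 (evaluation only): mean absolute error, lower is better."""
    xs, ys = dataset
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)


def present(ranking, rejected):
    """Presentation module 6: render the ranked scores and the screening outcome."""
    lines = [f"{name}: {score:.3f}" for name, score in ranking]
    lines.append(f"rejected rows: {len(rejected)}")
    return "\n".join(lines)


def run_pipeline(raw_rows, library):
    """Chain the stages in the order the description gives."""
    accepted, rejected = screen(raw_rows)
    dataset = preprocess(accepted)
    # Statistical module 5: one score per model, then rank (smallest error first).
    results = {name: evaluate(model, dataset) for name, model in library.items()}
    ranking = sorted(results.items(), key=lambda item: item[1])
    return present(ranking, rejected)


# Toy stand-in for the algorithm library 3: two fixed candidate models.
LIBRARY = {"identity": lambda x: x, "double": lambda x: 2 * x}
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, None), (4.0, 8.0)]
report = run_pipeline(rows, LIBRARY)
```

Keeping each stage as a separate function mirrors the modular layout of FIG. 2: any stage can be replaced without touching the others, as long as it accepts and returns the same shapes of data.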
The above description covers only the preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims (7)

1. An advanced analysis infrastructure for machine learning, comprising a machine learning system (1), characterized in that the machine learning system (1) comprises:
a data input module (11) for inputting or receiving a data set;
a machine learning algorithm library (3) containing a plurality of machine learning algorithms that are tested through a common interface;
a screening unit (13) for performing classification screening, data deviation screening and defect screening on the input data set;
a test module (4) for training each machine learning model (41) and evaluating its performance results;
and a statistical module (5) for compiling and comparing the performance results of all the machine learning models (41).
2. The advanced analysis infrastructure for machine learning according to claim 1, characterized in that the machine learning system (1) further comprises a processing unit (12) for processing the input data set and the output data set.
3. The advanced analysis infrastructure for machine learning according to claim 1, characterized in that the machine learning system (1) further comprises a presentation module (6), the presentation module (6) being configured to present the performance results compiled by the statistical module (5) and the results of their comparison.
4. The advanced analysis infrastructure for machine learning according to claim 1, characterized by the machine learning system (1) further comprising a storage unit (14) for storing the input data set.
5. The advanced analysis infrastructure for machine learning according to claim 1, wherein the screening unit (13) comprises a classification screening unit (131), a data screening unit (132) and a defect screening unit (133), the classification screening unit (131) performs classification screening on the processed data sets of the processing unit (12), the data screening unit (132) performs data deviation screening on the processed data sets of the processing unit (12), and the defect screening unit (133) screens the processed data sets of the processing unit (12) for defects in the continuous data.
6. The advanced analysis infrastructure for machine learning according to claim 1, characterized in that the data input module (11) is divided into a receiving unit (111) and an output unit (112), the receiving unit (111) being adapted to receive a data set and the output unit (112) being adapted to output a processed data set.
7. The advanced analysis infrastructure for machine learning according to claim 1, characterized in that the machine learning system (1) further comprises a data preprocessor (2), the data preprocessor (2) being configured to prepare the data processed by the screening unit (13) into a data set and to pass the prepared data set to the test module (4) for processing.
CN201911233132.8A 2019-12-05 2019-12-05 Advanced analysis infrastructure for machine learning Pending CN110969263A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911233132.8A CN110969263A (en) 2019-12-05 2019-12-05 Advanced analysis infrastructure for machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911233132.8A CN110969263A (en) 2019-12-05 2019-12-05 Advanced analysis infrastructure for machine learning

Publications (1)

Publication Number Publication Date
CN110969263A (en) 2020-04-07

Family

ID=70033124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911233132.8A Pending CN110969263A (en) 2019-12-05 2019-12-05 Advanced analysis infrastructure for machine learning

Country Status (1)

Country Link
CN (1) CN110969263A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779011A (en) * 2021-09-16 2021-12-10 平安科技(深圳)有限公司 Data restoration method and device based on machine learning and computer equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779011A (en) * 2021-09-16 2021-12-10 平安科技(深圳)有限公司 Data restoration method and device based on machine learning and computer equipment
CN113779011B (en) * 2021-09-16 2023-06-02 平安科技(深圳)有限公司 Data restoration method and device based on machine learning and computer equipment

Similar Documents

Publication Publication Date Title
CN111724211A (en) Offline store commodity sales prediction method, device and equipment
CN110009171B (en) User behavior simulation method, device, equipment and computer readable storage medium
TWI706318B (en) Solder paste printing quality detecting method, data processing device and computer storage medium
CN111368089A (en) Service processing method and device based on knowledge graph
CN108764047A (en) Group's emotion-directed behavior analysis method and device, electronic equipment, medium, product
Paksoy et al. Information fusion with dempster-shafer evidence theory for software defect prediction
US20200019855A1 (en) Data analysis device, data analysis method and data analysis program
CN113010389A (en) Training method, fault prediction method, related device and equipment
CN105405221A (en) Method and device for automated test
CN112184667A (en) Defect detection and repair method, device and storage medium
CN112363911A (en) Software test defect analysis method and device
WO2022005586A1 (en) Managing defects in a model training pipeline using synthetic data sets associated with defect types
CN110244185A (en) A kind of multi-source harmonic contributions division methods, terminal device and storage medium
CN112559316A (en) Software testing method and device, computer storage medium and server
CN110969263A (en) Advanced analysis infrastructure for machine learning
CN114881996A (en) Defect detection method and device
CN111367782A (en) Method and device for automatically generating regression test data
CN113704389A (en) Data evaluation method and device, computer equipment and storage medium
CN109470954B (en) Power grid running state monitoring system based on big data and monitoring method thereof
CN116775741A (en) Auditing method and related device for completion resolution of engineering
CN110928942A (en) Index data monitoring and management method and device
CN111277427A (en) Data center network equipment inspection method and system
CN115576834A (en) Software test multiplexing method, system, terminal and medium for supporting fault recovery
CN116155541A (en) Automatic machine learning platform and method for network security application
CN113850773A (en) Detection method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200407
