CN112783478A

CN112783478A - Software design method based on automatic machine learning

Info

Publication number: CN112783478A
Application number: CN202110191036.2A
Authority: CN
Inventors: 柏战华; 胡静远
Original assignee: Hefei Haisai Information Technology Co ltd
Current assignee: Hefei Haisai Information Technology Co ltd
Priority date: 2021-02-19
Filing date: 2021-02-19
Publication date: 2021-05-11

Abstract

The invention discloses a software design method based on automatic machine learning, comprising the following steps: a data acquisition module obtains process configuration data that is set by a user and represents at least part of a machine learning modeling process, and a data interaction module compares the obtained process configuration data with the original data; a data screening module screens the acquired data, process configuration data superior to the original data is stored in a data storage module, and a data analysis module further analyzes the screened process configuration data; an instruction module operates the execution terminal to perform operation training, the data interaction module then compares the resulting training data, and an output module outputs the optimal values of the operating parameters, thereby obtaining the optimal operation data parameters of the automatic machine, so that the machine completes learning automatically. Using the process configuration data of the machine learning modeling process, the method assists the machine in training and learning autonomously and improves the efficiency with which the machine is refined.

Description

Software design method based on automatic machine learning

Technical Field

The invention relates to the field of software design, in particular to a software design method based on automatic machine learning.

Background

Automated machine learning is the process of automating, end to end, the application of machine learning to real-world problems. In a typical machine learning application, an engineer trains a model on a data set consisting of input data points. The raw data may not be in a form that every algorithm can use out of the box. A machine learning expert may have to apply appropriate data preprocessing, feature engineering, feature extraction, and feature selection methods to make the data set suitable for machine learning. Because many of these steps are beyond the abilities of non-experts, automated machine learning has been proposed as an artificial-intelligence-based answer to the growing challenge of applying machine learning. Automating the end-to-end process offers several advantages: it produces simpler solutions, creates them faster, and yields models that are generally better than hand-designed ones. However, automated machine learning is not a panacea. It may introduce additional parameters of its own, called hyper-parameters, which may themselves require some expertise to set.

When machine learning projects are carried out in an organization, data scientists, project managers, and business executives need to work together to deploy the model that best meets specific business goals. The central goal of this step is to identify the key business variables that the analysis needs to predict. These variables are treated as the targets of the model, and the metrics associated with them are used to judge the success of the project. With the popularity of deep learning, engineers must select neural network architectures, training procedures, regularization methods, hyper-parameters, and the like, all of which strongly influence the performance of the algorithm. The deep learning engineer is therefore also called a key engineer. The goal of automatic machine learning is to make these decisions with an automated, data-driven approach. As long as the user provides data, the automatic machine learning system determines the optimal scheme automatically, and domain experts no longer need to learn the various machine learning algorithms.

In the prior art, many automatic learning machines cannot be applied well, mainly because training is costly: a large amount of training and data input is needed to reach a satisfactory level, and in many cases enough data cannot be found, or a large amount of money and time is needed to capture the original data. In addition, because knowledge must be extracted from data, machines cannot learn knowledge directly and are less effective at solving some specific problems.

Disclosure of Invention

The invention overcomes the defects of the prior art and provides a software design method based on automatic machine learning.

In order to achieve the purpose, the invention adopts the technical scheme that: a software design method based on automatic machine learning comprises the following steps:

Step S1: a data acquisition module obtains process configuration data that is set by a user and represents at least part of a machine learning modeling process, and a data interaction module compares the obtained process configuration data with the original data.

Step S2: a data screening module screens the acquired data, process configuration data superior to the original data is stored in a data storage module, and a data analysis module further analyzes the screened process configuration data.

Step S3: an instruction module operates the execution terminal to perform operation training, the data interaction module then compares the resulting training data, and an output module outputs the optimal values of the operating parameters, thereby obtaining the optimal operation data parameters of the automatic machine, so that the automatic machine completes learning.
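Steps S1 to S3 can be read as a compare, screen, train, and select sequence. The following is a minimal sketch under stated assumptions, not the patented implementation: the patent does not define how a configuration is judged "superior" or how training is scored, so `score` and `train` are hypothetical callables, and every name here is illustrative.

```python
from typing import Callable, Dict, List

Config = Dict[str, float]


def run_pipeline(
    candidates: List[Config],
    raw: Config,
    score: Callable[[Config], float],
    train: Callable[[Config], float],
) -> Config:
    """Hypothetical sketch of steps S1-S3 (all names are assumptions)."""
    # S1: compare each user-supplied configuration with the original (raw) data.
    baseline = score(raw)
    # S2: screen, storing only configurations that beat the original data.
    stored = [c for c in candidates if score(c) > baseline]
    if not stored:
        return raw  # nothing is better than the current baseline
    # S3: run training for each stored configuration and output the best one.
    return max(stored, key=train)


# Toy criterion (assumption): a configuration is better the closer "lr" is to 0.1.
def toy_score(c: Config) -> float:
    return -abs(c["lr"] - 0.1)


raw_config = {"lr": 0.5}
user_configs = [{"lr": 0.9}, {"lr": 0.2}, {"lr": 0.12}]
best = run_pipeline(user_configs, raw_config, toy_score, toy_score)
print(best)
```

In this toy run the configuration with `lr = 0.9` is screened out in S2 because it scores worse than the original data, and S3 selects among the remaining two.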

In a preferred embodiment of the present invention, one or more sets of value combinations of the operating parameters are determined based on the process configuration data and the raw data.

In a preferred embodiment of the present invention, the value combination is a combination of three kinds of data: action data, method data, and final target data.
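One way to picture "one or more sets of value combinations" is as the Cartesian product of candidate values for the three kinds of data. This is only an illustrative sketch: the patent does not enumerate any candidate values, so the lists below are hypothetical placeholders.

```python
from itertools import product

# Hypothetical candidate values; the patent does not specify any.
action_values = ["action_a", "action_b"]
method_values = ["method_x", "method_y"]
target_values = ["target_1"]

# Each combination pairs one action value, one method value, and one target value.
combinations = [
    {"action": a, "method": m, "target": t}
    for a, m, t in product(action_values, method_values, target_values)
]

for combo in combinations:
    print(combo)
```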

In a preferred embodiment of the present invention, before step S1, standard data values are set for the action data, the method data, and the final target data, and the standard data values are used to assist the judgment module in data comparison.

In a preferred embodiment of the present invention, in step S1, the obtained data is classified by the judgment module: the action similarity is calculated from the action data attributes, the method similarity is calculated from the method data attributes, and the final target similarity is calculated from the final target data attributes.
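The patent does not state which similarity measure is used. As a hedged illustration only, the sketch below assumes each kind of data is represented as a numeric attribute vector and compares it with a standard vector using cosine similarity. The measure, the vector encoding, and all names are assumptions.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def category_similarities(
    sample: Dict[str, Sequence[float]],
    standard: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Compute one similarity per kind of data (action, method, target)."""
    return {
        name: cosine_similarity(sample[name], standard[name])
        for name in ("action", "method", "target")
    }


# Hypothetical standard values and one acquired sample.
standard = {"action": [1.0, 0.0], "method": [1.0, 1.0], "target": [0.0, 2.0]}
sample = {"action": [2.0, 0.0], "method": [1.0, 0.0], "target": [3.0, 0.0]}
sims = category_similarities(sample, standard)
print(sims)
```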

In a preferred embodiment of the present invention, the optimal operation data parameters obtained after the comparison by the data interaction module in step S3 are used as the raw data.

In a preferred embodiment of the present invention, the machine apparatus comprises a software unit and an execution terminal. The software unit includes a data acquisition module, a data screening module, a data storage module, an analysis module, an instruction module, a judgment module, a data conversion module, a data interaction module, and an output module. The execution terminal executes the instructions issued by the instruction module and carries out the actual operation training; the execution terminal also comprises a data upload interface, which receives the process configuration data provided by the user.

In a preferred embodiment of the present invention, the software unit further includes a data encryption sub-module, which encrypts the data information that is sent.

In a preferred embodiment of the present invention, the data conversion module is configured to convert the collected data information and input the converted data information into the data storage module in a unified data format.

In a preferred embodiment of the present invention, a data compression sub-module compresses the data in the data storage module, making the data easier for the data storage module to store.

The invention solves the defects in the background technology, and has the following beneficial effects:

(1) According to the invention, the process configuration data of the machine learning modeling process is used to assist the machine in training and learning autonomously, so a large number of engineers are not needed to tune parameters manually, which saves substantial labor cost. At the same time, the machine continuously refines and adjusts the data, which further accelerates its automatic learning and improves the efficiency with which the machine is refined. In addition, the output module compares multiple execution results and selects the optimal operation data parameters, making the machine learning more complete.

(2) According to the invention, several kinds of data are combined and the acquired data is evaluated along multiple dimensions, so the machine learning is more comprehensive and the execution results are more reliable. The value combination joins the three kinds of data (action data, method data, and final target data), so more specific machine operating parameters can be obtained, and the machine learning target is clearer and more systematic.

(3) According to the invention, the data interaction module compares the original data with the acquired process configuration data, and the optimal operation data parameters are selected through multiple rounds of screening, so engineers and experts do not need to upgrade and maintain the system manually, and the improvement can be realized in a semi-automatic mode. At the same time, the optimal operation data parameters obtained after comparison by the data interaction module are used as the original data, so the machine continuously strengthens its own behavior parameters during learning, gradually approaches intelligent operation, and replaces operators more completely.

(4) According to the invention, the data compression sub-module compresses the information acquired by the image acquisition module, so the data storage module can store data more quickly and its effective storage capacity is increased.

(5) According to the invention, the data conversion module converts data in various formats into a unified form, which makes the data easier for the data analysis module to analyze and ensures that the training data obtained by the machine can be stored completely in the data storage module, making the machine learning more comprehensive.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a distribution block diagram of each unit module in the present invention.

Detailed Description

In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.

As shown in fig. 1, a software design method based on automatic machine learning includes the following steps:

Step S1: a data acquisition module obtains process configuration data that is set by a user and represents at least part of a machine learning modeling process, and a data interaction module compares the obtained process configuration data with the original data;

Step S2: a data screening module screens the acquired data, process configuration data superior to the original data is stored in a data storage module, and a data analysis module further analyzes the screened process configuration data;

In a preferred embodiment of the present invention, the machine apparatus mainly comprises a software unit and an execution terminal. The software unit includes a data acquisition module, a data screening module, a data storage module, an analysis module, an instruction module, a judgment module, a data conversion module, a data interaction module, and an output module. The execution terminal executes the instructions issued by the instruction module and carries out the actual operation training; the execution terminal also comprises a data upload interface, which receives the process configuration data provided by the user.

In a preferred embodiment of the present invention, the data conversion module converts the collected data information and inputs it into the data storage module in a unified data format. The data conversion module can convert data in various formats into a unified form, which makes the data easier for the data analysis module to analyze and ensures that the training data obtained by the machine can be stored completely in the data storage module, making the machine learning more comprehensive.
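The conversion into a unified data format can be sketched as follows. The patent does not name the input formats or the unified form, so this example assumes CSV and JSON inputs and a list of string-valued records as the unified form. The function name `to_unified` is hypothetical.

```python
import csv
import io
import json
from typing import Dict, List

Record = Dict[str, str]


def to_unified(text: str, fmt: str) -> List[Record]:
    """Convert CSV or JSON text into a list of string-valued records."""
    if fmt == "csv":
        # The first CSV line is treated as the header row.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    if fmt == "json":
        # Expects a JSON array of flat objects; values are normalized to strings.
        return [{k: str(v) for k, v in item.items()} for item in json.loads(text)]
    raise ValueError(f"unsupported format: {fmt}")


csv_text = "name,value\nlr,0.1\n"
json_text = '[{"name": "lr", "value": 0.1}]'

# Both inputs describe the same record and end up in the same unified form.
print(to_unified(csv_text, "csv"))
print(to_unified(json_text, "json"))
```

Normalizing every value to a string is a deliberate simplification for the sketch; a real storage module would likely keep typed values.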

In a preferred embodiment of the present invention, the value combination combines three kinds of data: action data, method data, and final target data. The action similarity is calculated from the action data attributes, the method similarity from the method data attributes, and the final target similarity from the final target data attributes. Combining several kinds of data and evaluating the acquired data along multiple dimensions makes the machine learning more comprehensive and the execution results more reliable. Because the value combination joins the three kinds of data, more specific machine operating parameters can be obtained, and the machine learning target is clearer and more systematic.

In a preferred embodiment of the invention, the process configuration data of the machine learning modeling process is used to assist the machine in training and learning autonomously, so a large number of engineers are not needed to tune parameters manually, which saves substantial labor cost. At the same time, the machine continuously refines and adjusts the data, which further accelerates its automatic learning and improves the efficiency with which the machine is refined. In addition, the output module compares multiple execution results and selects the optimal operation data parameters, making the machine learning more complete.

In a preferred embodiment of the present invention, the optimal operation data parameters obtained after the comparison by the data interaction module in step S3 are used as the original data. The data interaction module compares the original data with the acquired process configuration data, and the optimal operation data parameters are selected through multiple rounds of screening, so engineers and experts do not need to upgrade and maintain the system manually, and the process configuration data can be refined in a semi-automatic mode. Because the optimal operation data parameters are fed back as the original data, the machine continuously strengthens its own behavior parameters during learning, gradually approaches intelligent operation, and replaces operators more completely.
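The feedback described here, in which the best parameters of one round become the original data for the next, can be sketched as a simple loop. This is an illustration under assumptions: `evaluate` is a hypothetical scoring function (higher is better), and the parameter dictionaries are toy values.

```python
from typing import Callable, Dict, List

Params = Dict[str, float]


def refine(
    raw: Params,
    rounds: List[List[Params]],
    evaluate: Callable[[Params], float],
) -> Params:
    """After each round, the best parameters seen so far become the new baseline."""
    for candidates in rounds:
        # Include the current baseline so a round can never make things worse.
        best = max(candidates + [raw], key=evaluate)
        raw = best  # feedback: the optimum replaces the original data
    return raw


# Hypothetical criterion: the optimum is at x = 3.
def toy_evaluate(p: Params) -> float:
    return -((p["x"] - 3.0) ** 2)


result = refine(
    {"x": 0.0},
    [[{"x": 1.0}, {"x": 6.0}], [{"x": 2.5}, {"x": 10.0}]],
    toy_evaluate,
)
print(result)
```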

In a preferred embodiment of the present invention, standard data values for the action data, the method data, and the final target data are set before step S1. The standard data values assist the judgment module in data comparison, which helps refine the machine.

In a preferred embodiment of the present invention, a data compression sub-module compresses the data in the data storage module, making the data easier for the data storage module to store. The data compression sub-module compresses the information collected by the image acquisition module, so the data storage module can store data more quickly and its effective storage capacity is increased.
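The patent does not name a compression algorithm. As a hedged illustration, the sketch below uses lossless DEFLATE compression from Python's standard `zlib` module on hypothetical stored records, and checks that the data survives a round trip.

```python
import json
import zlib

# Hypothetical stored records; repetitive data compresses well.
records = [{"name": "lr", "value": "0.1"}] * 100

raw_bytes = json.dumps(records).encode("utf-8")
compressed = zlib.compress(raw_bytes, 9)  # 9 = highest compression level

# Decompression restores the original records exactly (lossless).
restored = json.loads(zlib.decompress(compressed).decode("utf-8"))

print(len(raw_bytes), len(compressed), restored == records)
```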

In a preferred embodiment of the present invention, the software unit further comprises a data encryption sub-module, which encrypts the data information that is sent. Encrypting the data information through the data encryption sub-module prevents errors in the transmission of the various data, ensures that data is transmitted accurately, and makes automatic machine learning more efficient.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution. The description is written this way for clarity only; those skilled in the art should treat the description as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.

Claims

1. A software design method based on automatic machine learning is characterized by comprising the following steps:

2. The software design method based on automatic machine learning according to claim 1, characterized in that: one or more sets of value combinations of the operating parameters are determined according to the process configuration data and the original data.

3. The software design method based on automatic machine learning according to claim 1, characterized in that: the value combination is a combination of three kinds of data: action data, method data, and final target data.

4. The software design method based on automatic machine learning according to claim 3, characterized in that: before step S1, standard data values of the action data, the method data, and the final target data are set, and the standard data values are used to assist a judgment module in data comparison.

5. The software design method based on automatic machine learning according to claim 3, characterized in that: in step S1, the acquired data is classified by the judgment module, wherein:

the action similarity is calculated according to the action data attributes;

the method similarity is calculated according to the method data attributes;

and the final target similarity is calculated according to the final target data attributes.

6. The software design method based on automatic machine learning according to claim 1, characterized in that: the optimal operation data parameters obtained after the comparison by the data interaction module in step S3 are used as the original data.

7. A machine apparatus for the software design method based on automatic machine learning according to claim 1, comprising a software unit and an execution terminal, characterized in that:

the software unit includes a data acquisition module, a data screening module, a data storage module, an analysis module, an instruction module, a judgment module, a data conversion module, a data interaction module, and an output module;

the execution terminal executes the instructions issued by the instruction module and carries out the actual operation training; the execution terminal also comprises a data upload interface, which receives the process configuration data provided by the user.

8. The machine apparatus for the software design method based on automatic machine learning according to claim 7, characterized in that: the software unit further comprises a data encryption sub-module, which encrypts the data information that is sent.

9. The machine apparatus for the software design method based on automatic machine learning according to claim 7, characterized in that: the data conversion module converts the acquired data information and inputs it into the data storage module in a unified data format.

10. The machine apparatus for the software design method based on automatic machine learning according to claim 7, characterized in that: a data compression sub-module compresses the data in the data storage module, making the data easier for the data storage module to store.