CN114610272A - AI model generation method, electronic device, and storage medium - Google Patents

AI model generation method, electronic device, and storage medium

Info

Publication number
CN114610272A
CN114610272A
Authority
CN
China
Prior art keywords: model, neural network, candidate, training, network structure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011341131.8A
Other languages
Chinese (zh)
Inventor
蒋阳
豆泽阳
庞磊
赵丛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gongdadi Innovation Technology Shenzhen Co ltd
Original Assignee
Gongdadi Innovation Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gongdadi Innovation Technology Shenzhen Co ltd filed Critical Gongdadi Innovation Technology Shenzhen Co ltd
Priority to CN202011341131.8A
Publication of CN114610272A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/10 Requirements analysis; Specification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/30 Creation or generation of source code
    • G06F 8/35 Creation or generation of source code model driven
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/451 Execution arrangements for user interfaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Stored Programmes (AREA)

Abstract

The application provides an AI model generation method, an electronic device, and a storage medium. The method includes: acquiring a service requirement corresponding to a target AI model; determining, according to the service requirement, an initial model and a model generation mode corresponding to the target AI model, wherein the model generation mode specifies how the initial model is trained; and training the initial model with training data based on the model generation mode to generate the target AI model. With this method, the model and hyper-parameter configuration corresponding to a business requirement can be determined through different modes, and a mode-based AI model generation method can systematically reduce costs and improve efficiency with a low entry threshold, promoting large-scale AI deployment.

Description

AI model generation method, electronic device, and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an AI model generation method, an electronic device, and a storage medium.
Background
Artificial Intelligence (AI) is a technical science that researches and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. In recent years, demand for AI has increased, and AI techniques have been widely applied to various scenarios, such as classification, detection, and speech recognition.
Different scenarios require different AI models, and the design and training of an AI model require different data. In the final design process of an AI model, the data, network structure, hyper-parameters, and so on must therefore be repeatedly debugged for different data, scenarios, and other requirements. This is time-consuming and labor-intensive, can only be accomplished with the participation of corresponding technical experts, and hinders low-threshold adoption of AI.
Disclosure of Invention
The application provides an AI model generation method, an electronic device, and a storage medium, aiming to enable low-threshold deployment of AI models.
In a first aspect, an embodiment of the present application provides an AI model generation method, including:
acquiring a service requirement corresponding to a target AI model;
determining an initial model and a model generation mode corresponding to the target AI model according to the service requirement, wherein the model generation mode specifies how the initial model is trained;
training the initial model with training data based on the model generation mode to generate the target AI model.
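Purely as an illustration, the three steps of the first aspect can be sketched in Python. Every name here (the requirement fields, `determine_model_and_mode`, the mode strings) is a hypothetical stand-in, not terminology from the claims:

```python
# Hypothetical sketch of the three-step generation method described above.
# Names and mapping rules are illustrative assumptions, not the claimed method.

def acquire_business_requirement():
    """Step S11: obtain the service requirement for the target AI model."""
    return {"task": "classification", "terminal": "CPU", "scenario": "small-sample"}

def determine_model_and_mode(requirement):
    """Step S12: map the requirement to an initial model and a generation mode."""
    model = "VGG" if requirement["task"] == "classification" else "ResNet"
    # Assumed rule: low-compute terminals get the fast base mode.
    mode = "base" if requirement["terminal"] == "CPU" else "custom"
    return model, mode

def train(model, mode, data):
    """Step S13: train the initial model under the chosen generation mode."""
    return f"{model}-trained-via-{mode}-on-{len(data)}-samples"

req = acquire_business_requirement()
model, mode = determine_model_and_mode(req)
target_model = train(model, mode, data=[1, 2, 3])
```

The point of the sketch is only the data flow: the requirement drives both the model choice and the training path, so the same pipeline yields different target models for different users.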
In a second aspect, an embodiment of the present application provides an electronic device, including a memory and a processor;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, implement the AI model generation method described above.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program causes the processor to implement the AI model generation method described above.
An embodiment of the application provides an AI model generation method, an electronic device, and a storage medium. The method includes: acquiring a service requirement corresponding to a target AI model; determining, according to the service requirement, an initial model and a model generation mode corresponding to the target AI model, wherein the model generation mode specifies how the initial model is trained; and training the initial model with training data based on the model generation mode to generate the target AI model. The AI model generation method provided by the application can determine the model and hyper-parameter configuration corresponding to a service requirement through different modes. The different modes have different characteristics and are coupled with one another, so a mode-based AI model generation method can systematically reduce costs and improve efficiency with a low entry threshold, promoting large-scale AI deployment.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application; those skilled in the art can obtain other drawings based on them without creative effort.
Fig. 1 is a schematic flowchart of an AI model generation method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an application scenario of an AI model generation method;
fig. 3 is a schematic flowchart of a model generation mode step of an AI model generation method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a directed acyclic graph of an AutoML system;
fig. 5 is a schematic flowchart of a basic database construction flow of an AI model generation method according to an embodiment of the present application;
Figs. 6A-6C are schematic diagrams of neural network structure searches provided by embodiments of the present application;
FIG. 7 is a schematic diagram of a directed acyclic graph of a prefabricated layer of an AutoML system;
FIG. 8 is a schematic diagram of a directed acyclic graph of a custom layer of an AutoML system in one embodiment;
FIG. 9 is a schematic diagram of a directed acyclic graph of a custom layer of an AutoML system in another embodiment;
FIG. 10 is a schematic diagram of the underlying framework of an AutoML system in one embodiment;
fig. 11 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
It should be noted that the terms "first", "second", and the like in this application are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, provided that a person skilled in the art can realize the combination; when technical solutions are contradictory or a combination cannot be realized, the combination should be considered not to exist and falls outside the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Artificial Intelligence (AI) is a technical science that researches and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. In recent years, demand for AI has increased, and AI techniques have been widely applied to various scenarios, such as classification, detection, and speech recognition.
Different scenarios require different AI models, and the design and training of an AI model require different data. In the final design process of an AI model, the data, network structure, hyper-parameters, and so on must therefore be repeatedly debugged for different data, scenarios, and other requirements. This is time-consuming and labor-intensive, can only be accomplished with the participation of corresponding technical experts, and hinders low-threshold adoption of AI.
Therefore, how to realize the automatic generation of the AI model is a popular topic that is being researched by those skilled in the art.
Referring to fig. 1, fig. 1 is a schematic flow chart illustrating an AI model generation method according to an embodiment of the present disclosure.
The AI model generation method can be applied to an electronic device, such as a terminal device, a server, or a cloud server, for processes such as generating AI models. The terminal device may be a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, a wearable device, or similar electronic equipment; the server may be an independent server or a server cluster.
Illustratively, as shown in fig. 2, a scene diagram of the AI model generation method when applied to the server is shown. The server can acquire the service requirement from the terminal equipment, execute the AI model generation method to generate the target AI model, and send the generated target AI model to the terminal equipment so that the terminal equipment can perform operations such as model test or deployment.
As shown in fig. 1, the AI model generation method of the embodiment of the present application includes steps S11 through S13.
Step S11: acquiring the service requirement corresponding to the target AI model.
Illustratively, a requirement acquisition interface is displayed on a display device of the terminal device, and a user can input a service requirement on the corresponding interface through an input device in communication connection with the terminal device. It is understood that the target AI model represents the AI model desired by the user, and the input devices include, but are not limited to, a keyboard, a mouse, and a touch screen.
In some embodiments, the business requirements include at least one of: task type, terminal type, application scenario, computational demand.
The task type represents the intended use of the target AI model, i.e., the tasks the target AI model needs to handle, such as classification, detection, video processing, and natural language processing. The terminal type represents the deployment environment of the target AI model, such as the model of the terminal on which the target AI model is deployed and its processor type. For example, the processor type of the terminal may include a CPU (Central Processing Unit) and/or a GPU (Graphics Processing Unit). The application scenario may include at least one of: small-sample detection, small-object detection, unbalanced-sample detection, and the like.
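As a hedged illustration, the business-requirement fields listed above (task type, terminal type, application scenario, computational demand) could be modeled as a small record type. The field names are assumptions made for this sketch, not terms defined in the application:

```python
# Illustrative model of a business requirement; field names are assumptions.
from dataclasses import dataclass, field
from typing import List

@dataclass
class BusinessRequirement:
    task_type: str                  # e.g. "classification", "detection", "nlp"
    terminal_type: str              # deployment target, e.g. "CPU" or "GPU"
    application_scenarios: List[str] = field(default_factory=list)
    compute_demand: str = "low"     # rough computational-power requirement

req = BusinessRequirement(
    task_type="detection",
    terminal_type="GPU",
    application_scenarios=["small object detection"],
)
```

A structured requirement like this is what step S12 would consume when mapping user needs to an initial model and a generation mode.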
Step S12, determining an initial model and a model generation mode corresponding to the target AI model according to the service requirement, wherein the model generation mode represents a path for training the initial model.
Illustratively, different service requirements imply differences in at least one of the initial model, the task type, the terminal type, and the application scenario to which the AI model must adapt; the time required to train the AI model and the required model recognition accuracy may also differ. The initial model, its training time, and the required recognition accuracy correspond to the computational power of the terminal, so the initial model and model generation mode determined for the target AI model according to the service requirement can meet the customized requirements of different users.
Different model generation modes represent different ways of training the initial model, that is, different AI model generation processes: the generation speed, the computational power consumed in generating the target AI model, and/or the scale and accuracy of the target AI model may differ.
An initial model and a model generation mode corresponding to the target AI model are determined according to the corresponding service requirement, and the corresponding initial model is trained in the corresponding model generation mode to obtain the corresponding target AI model, thereby providing a customized service for the corresponding client.
Step S13, training the initial model with training data based on the model generation mode to generate the target AI model.
For each generation mode, corresponding training data is acquired, and the corresponding initial model is trained with the acquired data to obtain a target AI model matching the customer's service requirement.
Referring to fig. 3, in some embodiments, the model generation mode includes at least one of a first generation mode, a second generation mode, and a third generation mode, and step S13 specifically includes steps S131 to S134.
Step S131, if it is determined that the model generation mode is the first generation mode, determining a corresponding initial model in a basic database according to the service requirement, and determining target hyper-parameter configuration information corresponding to the initial model.
In some embodiments, the AI model generation method may be used in an AutoML system, which may be deployed in a server, a cloud server, or a terminal device.
Illustratively, the AutoML system may include a base layer, a pre-fabricated layer, and a custom layer. The base layer is used for generating a model in a first generation mode, the prefabricated layer is used for generating the model in a second generation mode, and the customized layer is used for generating the model in a third generation mode.
The model generation speed of the first generation mode is faster than that of the second generation mode, and the model generation speed of the second generation mode is faster than that of the third generation mode; alternatively, the model generated in the first generation mode is less accurate than that of the second generation mode, and the model generated in the second generation mode is less accurate than that of the third generation mode. That is, the first generation mode may be referred to as a basic generation mode, the second generation mode as a prefabricated generation mode, and the third generation mode as a customized generation mode.
The model generation mode may be determined by displaying the available generation modes on a display device of the terminal device and selecting one according to a mode selection operation of the user, or it may be determined according to the acquired service requirement.
In some embodiments, the basic database of the AutoML system covers different business requirements and includes, for example, standard data modules for different tasks (classification, detection, video processing, natural language processing, etc.), model selection modules, hyper-parameter configuration modules, and the like, which together can be understood as a model warehouse.
Taking the classification task as an example, the standard data module may include multiple data expansion units for executing data expansion algorithms such as flipping, cropping, and rotating. The model selection module may include mainstream neural network model architectures, including but not limited to convolutional networks and residual networks, such as VGG (Visual Geometry Group), ResNet (Residual Network), DenseNet (Densely Connected Convolutional Networks), and Inception. The hyper-parameter configuration may include a variety of hyper-parameters, including but not limited to weight decay, learning rate, and dropout.
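A minimal sketch of such a basic database, assuming a plain dictionary layout keyed by task type. The module names, architecture list, and hyper-parameter values are illustrative only, not the actual contents of the system's database:

```python
# Illustrative basic database: augmentation units, architecture warehouse,
# and hyper-parameter configurations per task. All values are assumptions.
BASE_DATABASE = {
    "classification": {
        "augmentations": ["flip", "crop", "rotate"],
        "architectures": ["VGG", "ResNet", "DenseNet", "Inception"],
        "hyperparameters": {"weight_decay": 1e-4, "learning_rate": 0.01, "dropout": 0.5},
    },
}

def lookup(task_type, architecture):
    """Return the stored augmentations and hyper-parameters for one
    architecture of a given task, failing if the architecture is unknown."""
    entry = BASE_DATABASE[task_type]
    if architecture not in entry["architectures"]:
        raise KeyError(architecture)
    return entry["augmentations"], entry["hyperparameters"]

augs, hps = lookup("classification", "VGG")
```

Step S131 then amounts to one lookup: the service requirement selects the task entry, an architecture from the warehouse, and the associated hyper-parameter configuration.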
For example, the initial model may be determined to be a VGG architecture according to the service requirement, and the configuration information of at least one hyper-parameter, that is, the target hyper-parameter configuration information, may be determined.
In some embodiments, the base layer of the AutoML system may further include a training module for training the model according to the determined training strategy. The training module may contain various training strategy settings including, but not limited to, gradient descent algorithms, reinforcement learning algorithms, such as gradient descent algorithms including SGD (stochastic gradient descent), Adam (adaptive moment estimation), and the like.
In some embodiments, the base layer of the AutoML system may further include a deployment module, which converts the trained model into a form usable by the cloud or a chip for the scenario task, packages the model, and deploys it to the corresponding device.
In some embodiments, the base layer of the AutoML system includes a Directed Acyclic Graph (DAG) that describes an automation workflow of the base layer of the AutoML system, implementing architectural encapsulation of the automation workflow.
Illustratively, as shown in fig. 4, a schematic diagram of a directed acyclic graph of a base layer of the AutoML system is shown. Wherein the solid lines represent workflows that have been activated and the dashed lines represent workflows that have not been activated, the workflows represented by the solid lines are executed while the model is being generated in the first generation mode.
In some embodiments, the directed acyclic graph includes an identification of a plurality of AI models and an identification of a plurality of hyper-parameter configuration information in the base database.
For example, VGG, ResNet, DenseNet, and Inception in fig. 4 are currently mainstream AI models and may serve as name identifiers of neural network model architectures. Dropout, weight decay, and learning rate in fig. 4 are optional hyper-parameters, i.e., identifiers of hyper-parameter configuration information.
In some embodiments, in step S131, determining a corresponding initial model in the basic database according to the service requirement and determining target hyper-parameter configuration information corresponding to the initial model specifically includes: determining, in the directed acyclic graph according to the service requirement, the identifier of at least one AI model as the identifier of the initial model, and the identifier of at least one piece of hyper-parameter configuration information as the identifier of the target hyper-parameter configuration information.
For example, as shown in fig. 4, the VGG architecture is determined as the initial model, and dropout, weight decay, and learning rate are the hyper-parameters.
In some embodiments, a training policy may also be determined in a directed acyclic graph of an AutoML system, such as determining the training policy to be SGD.
As shown in fig. 4, when the model generation mode is the first generation mode, the workflow is as follows: the training data is expanded by flipping, cropping, and rotating; the determined initial model is trained on the expanded data with the determined target hyper-parameter configuration information, using SGD as the training strategy; and the target AI model obtained when training finishes can then be deployed. In this way, one-click automated model training can be realized based on the AutoML system.
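The first-generation-mode workflow can be sketched end to end, with a toy linear model standing in for the real neural network. The augmentation here only tags samples, and the SGD loop fits a one-parameter model; both are illustrative assumptions, not the system's actual training code:

```python
# Sketch of the base-layer workflow: expand data, then train with SGD.
import random

def augment(samples):
    """Expand training data by (simulated) flipping, cropping, and rotating."""
    out = []
    for s in samples:
        out.extend([s, ("flip", s), ("crop", s), ("rotate", s)])
    return out

def sgd_train(data, learning_rate=0.1, epochs=50):
    """Fit w in y = w * x by stochastic gradient descent on (x, y) pairs."""
    random.seed(0)  # deterministic sample order for the sketch
    w = 0.0
    for _ in range(epochs):
        x, y = random.choice(data)
        grad = 2 * (w * x - y) * x  # d/dw of the squared error (w*x - y)^2
        w -= learning_rate * grad
    return w

expanded = augment([1, 2, 3])            # 3 samples -> 12 after expansion
pairs = [(x, 2.0 * x) for x in (1, 2, 3)]  # toy ground truth: y = 2x
w = sgd_train(pairs)                     # converges toward w = 2
```

The shape mirrors the DAG in fig. 4: the augmentation units feed the training module, which applies the chosen strategy (here SGD) with the configured hyper-parameters.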
Specifically, the directed acyclic graph may be presented as a configuration-table interface; the configuration table may interface with a software front end, a web front end, or the like, and may also be operated from a command line.
As shown in fig. 3, in some embodiments, before determining the corresponding initial model in the base database according to the business requirements, the method further comprises:
step S130: and constructing a basic database, wherein the basic database stores the reference model.
Illustratively, one or more reference models are stored in the basic database. The reference models are obtained through preliminary pre-training, and their performance is better than that of an initially constructed AI model. The functions of the reference models include, but are not limited to, classification, detection, video processing, and natural language processing.
Step S130 may be performed before either of steps S11 and S12, or before step S131. In this embodiment, step S130 is performed after step S12 and before step S131, but step S130 is not limited to that position.
As shown in fig. 5, in some embodiments, step S130 includes steps S1301-S1305.
Step S1301: acquiring a corresponding candidate network structure.
When constructing the basic database, if the performance of a reference model in the database is poor, the time cost of adjusting and updating model parameters starting from that reference model increases. Therefore, when the basic database is constructed, the performance of each reference model is required to be better than that of the corresponding preset neural network model.
Accordingly, a corresponding preset neural network model is determined. The preset neural network model may be a model retrieved from a network or a corresponding model obtained through pre-training, which is not limited herein.
An initial architecture of the corresponding network model, such as the node information of the preset neural network model, is determined from the preset neural network model. The initial network architecture is then configured through network structure search to obtain a corresponding candidate network structure; when the candidate network structure is superior to the network structure corresponding to the preset neural network model, the model corresponding to the candidate network structure is used as a reference model for constructing the basic database.
In some embodiments, obtaining the corresponding candidate network structure comprises:
determining a preset neural network model to be subjected to network structure search;
determining, according to the preset neural network model, a search space for the network structure search; and
acquiring a corresponding candidate network structure from the search space.
Illustratively, a preset neural network model for network structure search is determined according to the application requirements of the AI model, and a corresponding initial network architecture and a search space for the network structure search are determined from the preset neural network model. The search space, which may be pre-constructed and contain various network structure units, defines the scope of the network structure search. A network structure unit is a basic unit for constructing a neural network model; it may be a single network layer, such as a single convolutional layer or fully-connected layer, or a structural unit formed by combining multiple network layers, such as a block formed by combining a convolutional layer, a batch normalization layer, and a nonlinear layer (e.g., ReLU), but is not limited thereto.
Each network structure unit is assigned a code, and the corresponding network structure unit is retrieved from the search space through its code. A corresponding candidate network structure can thus be searched out of the search space by inputting a corresponding code sequence.
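A minimal sketch of such a coded search space, assuming integer codes and illustrative unit names (none of which come from the application itself):

```python
# Illustrative coded search space: each code maps to a network structure
# unit, and a code sequence decodes to a candidate network structure.
SEARCH_SPACE = {
    0: "conv3x3",
    1: "conv5x5",
    2: "max_pool",
    3: "conv-bn-relu_block",  # a block combining conv, batch norm, ReLU
    4: "fully_connected",
}

def decode(code_sequence):
    """Map a code sequence to the candidate network structure it encodes."""
    return [SEARCH_SPACE[c] for c in code_sequence]

candidate = decode([0, 3, 2, 4])
```

Under this encoding, searching for a candidate structure reduces to searching over code sequences, which is what the optimization algorithms in the following paragraphs operate on.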
As shown in fig. 6A, it is determined from the preset neural network model that the corresponding initial network architecture includes 4 nodes, namely nodes 0, 1, 2, and 3, but the operations between the nodes are unknown, as indicated by the question marks "?".
The nodes in the neural network model may be understood as feature layers of the neural network model. For example, in fig. 6A the preset neural network model includes an input feature layer, two intermediate feature layers, and an output feature layer: node 0 represents the input feature layer, nodes 1 and 2 represent the intermediate feature layers, and node 3 represents the output feature layer. It should be understood that node 0 holds the feature data (a feature vector or feature matrix, likewise below) of the input feature layer, node 1 the feature data of the first intermediate feature layer, node 2 the feature data of the second intermediate feature layer, and node 3 the feature data of the output feature layer. An operation between two nodes refers to the operation required to transform the feature data at one node into the feature data at the other node. The operations mentioned in this embodiment may be convolution operations, pooling operations, or other neural network operations such as fully-connected operations. The operations between two nodes can be considered to constitute an operation layer between those nodes. Typically, multiple operations are available for searching at the operation layer between two nodes, i.e., there are multiple candidate operations. The purpose of the network structure search is to determine one operation for each operation layer.
The network structure search may determine operations between nodes 0, 1, 2, and 3 from the search space, with different combinations of operations between nodes 0, 1, 2, and 3 corresponding to different network structures. Therefore, the corresponding candidate network structure can be obtained through the network structure search.
In some embodiments, said obtaining the corresponding candidate network structure from the search space comprises:
and searching the network structure of the preset neural network model in the search space by utilizing an optimization algorithm based on gradient information to obtain a corresponding candidate network structure.
As shown in fig. 6B, the search space illustratively defines a variety of operations for the operation layer between every two nodes in the neural network model. Here the search space defines 3 operations for each operation layer, with different dashed lines representing operation 1, operation 2, and operation 3; for example, operation 1 is a convolution operation, operation 2 is a pooling operation, and operation 3 is a fully-connected operation. For each operation layer of the neural network, the purpose of the network structure search is to select one of the 3 operations as the operation of that layer.
A network structure search is performed on the neural network model in the search space by using an optimization algorithm based on gradient information, so as to configure structure parameters for the various operations on each operational layer of the neural network model and obtain optimized structure parameters; a corresponding candidate network structure is then determined according to the optimized structure parameters. As shown in fig. 6C, the model architecture corresponding to the final neural network model is obtained as the candidate network structure through the network structure search.
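One common way to realize gradient-based structure parameters (in the spirit of differentiable architecture search) is to hold one learnable weight per candidate operation on each layer, relax the choice with a softmax, and finally keep the highest-weighted operation. The sketch below is a minimal, hedged illustration of that last derivation step; the alpha values are placeholders standing in for learned parameters, not values from the patent.

```python
import math

def softmax(xs):
    """Turn raw structure parameters into mixing coefficients that sum to 1."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def derive_structure(alphas, ops):
    """Keep the operation with the largest optimized structure parameter per layer."""
    structure = []
    for layer_alphas in alphas:
        weights = softmax(layer_alphas)
        best = max(range(len(ops)), key=lambda i: weights[i])
        structure.append(ops[best])
    return structure

ops = ["operation 1", "operation 2", "operation 3"]
# Placeholder "optimized" structure parameters for three operational layers.
alphas = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [-0.5, -0.4, 2.2]]
print(derive_structure(alphas, ops))
# ['operation 2', 'operation 1', 'operation 3']
```

During the actual search the alphas would be updated by gradient descent jointly with the network weights; only the final argmax step is shown here.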
In some embodiments, said obtaining the corresponding candidate network structure from the search space comprises:
and searching the network structure of the preset neural network model in the search space by using an optimization algorithm based on reinforcement learning so as to obtain a corresponding candidate network structure.
Illustratively, the search space defines a plurality of operations on an operation layer between every two nodes in the neural network model, and the neural network model is subjected to network structure search in the search space by using an optimization algorithm based on reinforcement learning to configure structure parameters for the plurality of operations on each operation layer of the neural network model and obtain optimized structure parameters, so that a corresponding candidate network structure is determined according to the optimized structure parameters.
In some embodiments, said obtaining corresponding candidate network structures from said search space comprises:
and searching the network structure of the preset neural network model in the search space by using an optimization algorithm based on reinforcement learning and an optimization algorithm based on gradient information to obtain a corresponding candidate network structure.
Illustratively, the search space defines a plurality of operations on the operational layer between every two nodes in the neural network model. A network structure search is performed on the neural network model in the search space by using both the optimization algorithm based on reinforcement learning and the optimization algorithm based on gradient information, so as to configure structure parameters for the plurality of operations on each operational layer and obtain optimized structure parameters; a corresponding candidate network structure is then determined according to the optimized structure parameters. By performing the network structure search with at least two optimization algorithms, namely the optimization algorithm based on reinforcement learning and the optimization algorithm based on gradient information, the corresponding candidate network structure can be obtained more efficiently, and the performance of the candidate network model corresponding to that structure is more likely to be superior to that of the preset neural network model.
Step S1302: and training the candidate network structure, and acquiring the performance information of the candidate neural network model corresponding to the trained candidate network structure.
A training data set and a test data set of the candidate neural network model are determined according to the task to be processed by the target AI model, such as classification, detection, video, or natural language processing. The candidate neural network model corresponding to the candidate network structure is trained by using the corresponding training data set, and the candidate neural network model is tested by using the corresponding test data set.
In some embodiments, training the candidate network structure and obtaining performance information of a candidate neural network model corresponding to the trained candidate network structure includes:
determining a training dataset and a testing dataset for the candidate neural network model;
training the candidate network structure according to the training data set to obtain a candidate neural network model corresponding to the trained candidate network structure;
and testing the candidate neural network model according to the test data set to obtain the performance information of the candidate neural network model.
Illustratively, when the task to be processed by the target AI model is cat recognition, the training data set of the candidate neural network model is determined to be cat recognition training images, and the test data set to be cat recognition test images. The candidate neural network model corresponding to the candidate network structure is trained by using the corresponding training data set, and the candidate neural network model is tested by using the corresponding test data set, thereby obtaining the performance information of the candidate neural network model, where the performance information includes but is not limited to the recognition accuracy.
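The test step above boils down to measuring recognition accuracy on a held-out set. A minimal sketch, with the candidate model replaced by a toy stand-in callable (the threshold rule and data below are invented for illustration):

```python
# Hedged sketch of the test step: performance information is reduced here to
# recognition accuracy. A real candidate model would be a trained network;
# the stand-in below labels an image "cat" when its mean intensity > 0.5.

def accuracy(model, test_images, test_labels):
    """Fraction of test samples the model labels correctly."""
    correct = sum(1 for x, y in zip(test_images, test_labels) if model(x) == y)
    return correct / len(test_labels)

toy_model = lambda img: "cat" if sum(img) / len(img) > 0.5 else "not_cat"
test_images = [[0.9, 0.8], [0.1, 0.2], [0.7, 0.6], [0.3, 0.1]]
test_labels = ["cat", "not_cat", "cat", "cat"]
print(accuracy(toy_model, test_images, test_labels))  # 0.75
```

The resulting scalar is the kind of performance information compared against the preset neural network model in the next step.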
Step S1303: and judging whether the candidate neural network model is superior to the preset neural network model or not according to the performance information.
Illustratively, when the target AI model is an object classification model, if the classification recognition accuracy of the candidate neural network model is higher than that of the preset neural network model, the model performance of the candidate neural network model is superior to that of the preset neural network model.
Step S1304: and when the candidate neural network model is superior to the preset neural network model, constructing the basic database based on the candidate neural network model.
And when the model performance of the candidate neural network model is superior to the model performance of the preset neural network model, replacing the corresponding preset neural network model by using the candidate neural network model to construct a basic database.
Step S1305: and when the candidate neural network model is inferior to the preset neural network model, re-executing the step of acquiring the corresponding candidate network structure.
A globally optimal candidate network structure is acquired through the network architecture search; when the model performance of the candidate neural network model corresponding to the candidate network structure is superior to the model performance of the preset neural network model, the candidate neural network model replaces the corresponding preset neural network model to construct the basic database. This ensures that the models stored in the basic database have superior model performance and reduces the time cost required by subsequent model training.
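The keep-or-retry logic of steps S1303 to S1305 can be sketched as a simple database update. This is an illustrative simplification with invented names; the real basic database would store full models, not strings.

```python
# Hypothetical sketch of steps S1303-S1305: adopt the candidate when it is
# superior to the preset model for the task; otherwise signal that the
# candidate-structure search should be re-executed.

def update_base_database(base_db, task, candidate_acc, candidate_model):
    """Replace the preset model for `task` when the candidate is superior."""
    preset_acc, _ = base_db[task]
    if candidate_acc > preset_acc:
        base_db[task] = (candidate_acc, candidate_model)
        return True   # candidate adopted into the basic database
    return False      # inferior: re-run the candidate-structure search

base_db = {"classification": (0.90, "preset_model")}
adopted = update_base_database(base_db, "classification", 0.93, "candidate_model")
print(adopted, base_db["classification"])  # True (0.93, 'candidate_model')
```

In the method described above, a `False` result corresponds to looping back to step S1301 rather than simply giving up.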
In some embodiments, the building the base database comprises:
acquiring a pre-trained hyper-network model, wherein the hyper-network model comprises a preset number of sub-network models;
determining a plurality of target subnetwork models from a preset number of subnetwork models of the super network model;
acquiring a plurality of mainstream network models trained based on open source data;
splicing each target sub-network model serving as a first trunk network with a first branch network to obtain a plurality of first spliced networks, and splicing each main flow network model serving as a second trunk network with a second branch network to obtain a plurality of second spliced networks;
fine-tuning and testing a plurality of the first and second spliced networks to determine a target network model;
carrying out transfer learning on the target network model to obtain a required candidate neural network model;
judging whether the candidate neural network model is superior to a preset neural network model;
when the candidate neural network model is superior to the preset neural network model, constructing the basic database based on the candidate neural network model;
and when the candidate neural network model is inferior to the preset neural network model, re-executing the step of acquiring the pre-trained hyper-network model.
Illustratively, the trained super network model includes a preset number of sub-network models, such as 100 or more, but not limited thereto.
The super network model is sampled according to a sampling algorithm, and sub-network models meeting a preset model constraint condition are screened out as target sub-network models. Sub-network models are collected continuously until a preset number M is reached, where M can be set by the user according to the actual situation; for example, M can be set equal to 20. The preset sampling algorithm includes at least one of a random sampling algorithm, a sampling algorithm based on an evolutionary algorithm, and a gradient-based sampling algorithm (gradient-based method).
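Using random sampling (the first of the listed options), the constrained sampling loop can be sketched as follows. The sub-network descriptions, parameter counts, and constraint are hypothetical placeholders, not the patent's actual super network.

```python
import random

# Hedged sketch: draw sub-network models from the super network at random and
# keep only those meeting a preset model constraint (here, a parameter-count
# ceiling) until M target sub-networks are collected.

def sample_target_subnetworks(supernet, constraint, m, seed=0):
    rng = random.Random(seed)  # fixed seed for a reproducible illustration
    targets = []
    while len(targets) < m:
        candidate = rng.choice(supernet)
        if constraint(candidate):          # preset model constraint condition
            targets.append(candidate)
    return targets

# Toy super network with 100 sub-network models of varying size (params in M).
supernet = [{"id": i, "params_m": 1 + (i % 10)} for i in range(100)]
constraint = lambda sub: sub["params_m"] <= 3   # e.g. at most 3M parameters
targets = sample_target_subnetworks(supernet, constraint, m=20)
print(len(targets), all(s["params_m"] <= 3 for s in targets))  # 20 True
```

An evolutionary or gradient-based sampler would replace the `rng.choice` step with a guided proposal, but the screen-until-M structure stays the same.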
The mainstream network models are obtained by training on open source data; trained network models frequently used by users can be obtained from open source websites to serve as the mainstream network models. In this embodiment, the acquired mainstream network models have different model complexities, where the model complexity includes at least one of the model computation amount and the model parameter quantity. By selecting mainstream network models with different model complexities, the accuracy of the network structure search can be improved, and the accuracy of the model finally required by the user can be further improved.
And respectively taking a plurality of target sub-network models determined by the hyper-network model as a first trunk network and splicing with a first branch network to obtain a plurality of first spliced networks, and respectively taking a plurality of main flow network models trained based on open source data as a second trunk network and splicing with a second branch network to obtain a plurality of second spliced networks.
After the first spliced networks and the second spliced networks are obtained, fine-tuning (fine-tune) is performed on each of them, a target network model is determined according to the test evaluation results, and transfer learning is performed on the target network model to obtain the corresponding candidate neural network model.
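At its core, splicing a trunk network with a branch network is function composition: the trunk extracts features and the branch maps them to task outputs. A minimal, hedged sketch with toy stand-ins (a real implementation would compose neural network modules, not lambdas):

```python
# Illustrative sketch of the splicing step: a trunk (backbone) network is
# joined with a branch (head) network into one spliced network. Fine-tuning
# would then update mainly the branch on scene data.

def splice(trunk, branch):
    """Compose trunk and branch into a single spliced network."""
    return lambda x: branch(trunk(x))

# Stand-ins: the trunk reduces an image to a "feature" (here, a sum);
# the branch maps the feature to a label. Both are invented for illustration.
trunk = lambda img: sum(img)
branch = lambda feat: "cat" if feat > 1.0 else "not_cat"

spliced = splice(trunk, branch)
print(spliced([0.7, 0.6]))  # cat
print(spliced([0.1, 0.2]))  # not_cat
```

The same `splice` applies whether the trunk is a target sub-network from the super network (first spliced network) or a mainstream open-source model (second spliced network).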
After the corresponding candidate neural network model is obtained, the candidate neural network model is tested by using the test data set to obtain performance information of the candidate neural network model, wherein the performance information includes but is not limited to identification accuracy.
Whether the current candidate neural network model is superior to the preset neural network model is judged according to the performance information; when the candidate neural network model is superior to the preset neural network model, the basic database is constructed based on the candidate neural network model, and when the candidate neural network model is inferior to the preset neural network model, the step of acquiring the pre-trained hyper-network model is re-executed.
And when the model performance of the candidate neural network model corresponding to the candidate network structure is superior to that of the preset neural network model, the candidate neural network model is used for replacing the corresponding preset neural network model to construct a basic database, so that the model performance corresponding to the model stored in the basic database is ensured to be superior, and the time cost required by subsequent model training is reduced.
Step S132, if the model generation mode is determined to be the second generation mode, determining a corresponding pre-training model as an initial model according to the business requirement and a preset expert experience logic, and determining target hyper-parameter configuration information.
Specifically, the expert experience logic may include experience accumulated by experts and professional engineers on scene task data, for example, pre-training (pretrain) models corresponding to different terminals and scenarios, hyper-parameter configurations, and the like.
Illustratively, the pre-trained model corresponding to the business requirement may be determined as the initial model according to the preset expert experience logic, and the target hyper-parameter configuration information may be determined; for example, a pre-trained mainstream model deployable in the scenario together with suitable hyper-parameters, a pre-trained self-developed manually designed model with suitable hyper-parameters, or a pre-trained model searched by AutoML with suitable hyper-parameters may be obtained.
Illustratively, the requirements of a scenario task may be decomposed into 3 requirements: terminal model, application scenario, and computing power requirement.
Taking classification scenarios running on a CPU or a GPU as an example: if the user selects a CPU for the classification scenario and there is no computing power constraint, then, owing to the friendliness of depthwise-separable operators to the CPU, the pre-trained MobileNet model can be started and fine-tuned (fine-tune) on the scene data. The MobileNet model refers to a lightweight deep neural network model. If the user selects a GPU for the classification scenario and there is no computing power constraint, then, owing to the friendliness of the 3×3 convolution (conv3×3) operator to the GPU, the prefabricated resnet-50 model can be started and fine-tuned on the scene data. If the user selects a CPU and the business requirement restricts the model to below 3 million (3M) parameters, the MobileNet model, with its 4.2 million parameters, does not meet the requirement; the expert-prefabricated compressed pre-trained MobileNet model (for example, with 2M parameters) can then be started and fine-tuned for the scene.
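The expert rules just described amount to a small decision table. The sketch below encodes them for illustration; the rule set is a hypothetical simplification of the expert experience logic, with model names and parameter counts taken from the example above.

```python
# Hedged encoding of the expert-experience rules: given a chip type and an
# optional parameter budget (in millions), pick a pre-trained model.
# The rule table is an invented simplification, not the patent's full logic.

def select_pretrained_model(chip, max_params_m=None):
    if chip == "GPU":
        return "resnet-50"                 # conv3x3 operators are GPU-friendly
    if chip == "CPU":
        if max_params_m is not None and max_params_m < 4.2:
            return "compressed-mobilenet"  # expert-compressed, ~2M parameters
        return "mobilenet"                 # depthwise-separable ops suit the CPU
    raise ValueError(f"no expert rule for chip: {chip}")

print(select_pretrained_model("CPU"))                  # mobilenet
print(select_pretrained_model("GPU"))                  # resnet-50
print(select_pretrained_model("CPU", max_params_m=3))  # compressed-mobilenet
```

Packaging such rules in the prefabricated layer is what lets a business requirement map directly to an initial model and hyper-parameter configuration without a search.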
For example, the expert experience logic encapsulated in the prefabricated layer may include various chip models (which may be refined to specific models), computing power requirements, application scenarios (such as small-sample detection, small-object detection, unbalanced-sample detection, and task type), and the corresponding pre-training models and target hyper-parameter configuration information.
Exemplarily, as shown in fig. 7, the model 1, the model 2, the model 3, and the model 4 are identifiers of pre-training models, the hyper-parameter configuration table 1, the hyper-parameter configuration table 2, and the hyper-parameter configuration table 3 are identifiers of optional hyper-parameters, that is, hyper-parameter configuration information, and the training strategy 1 and the training strategy 2 are identifiers of optional training strategies.
In some embodiments, the prefabricated layer encapsulates the automation workflow based on the directed acyclic graph of the base layer of the AutoML system; the form shown in fig. 7 is also a directed-acyclic-graph-based encapsulation. It can embody the business logic corresponding to the business requirement and the corresponding automated training, deployment, and encapsulation. One part of the logic encapsulation of the prefabricated layer is a secondary encapsulation based on the encapsulation of the base layer of the AutoML system, and the other part is an encapsulation of customized research and development results adapted to the scenario.
Illustratively, as shown in fig. 7, model 1 is determined as an initial model, and hyper-parameter configuration table 1 is determined as a hyper-parameter.
In some embodiments, a training strategy may also be determined in the directed acyclic graph of the prefabricated layer, such as determining the training strategy to be training strategy 1.
As shown in fig. 7, if it is determined that the model generation mode is the second generation mode, determining the corresponding pre-training model as the initial model according to the business requirement and the preset expert experience logic, and determining the target hyper-parameter configuration information, includes: determining model 1 as the initial model according to the terminal model, the application scenario, and the computing power requirement; training the determined initial model according to the determined target hyper-parameter configuration information, using training strategy 1 during training; obtaining the target AI model after the training is finished; and then deploying the target AI model. Thus, one-click automated model training can be implemented based on the prefabricated layer.
Step S133, if the model generation mode is determined to be the third generation mode, determining an initial model in a network structure vector space according to the service requirement and the network structure search logic, and determining target hyper-parameter configuration information according to the service requirement and the hyper-parameter search logic.
Illustratively, the network structure vector space may be a union of various network models, and the network model may be described by a vector that may describe at least a width, a depth, an activation function, a loss function, etc. of the network model.
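A point in such a vector space can be represented directly as a structured record. The sketch below is illustrative only; the field set follows the attributes named above (width, depth, activation function, loss function), while the concrete values are invented.

```python
from dataclasses import dataclass

# Hedged sketch of one point in the network structure vector space: a network
# model described by a vector covering at least width, depth, activation
# function, and loss function. Field names and values are assumptions.

@dataclass(frozen=True)
class NetworkVector:
    width: int        # e.g. channels per layer
    depth: int        # number of layers
    activation: str   # e.g. "relu"
    loss: str         # e.g. "cross_entropy"

    def as_tuple(self):
        """Flatten the description into a plain vector-like tuple."""
        return (self.width, self.depth, self.activation, self.loss)

v = NetworkVector(width=64, depth=18, activation="relu", loss="cross_entropy")
print(v.as_tuple())  # (64, 18, 'relu', 'cross_entropy')
```

The network structure search logic then amounts to moving through this space of vectors, scoring each described model against the business requirement.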
Specifically, when the model is generated in the third generation mode, the degree of freedom is high, and automatic search processes for the network structure, hyper-parameter settings, data processing, and the like can be performed from scratch. This is a customized AI landing service driven by data, scenarios, and computing power, based on the AutoML algorithm; the search time is longer, but the full customization means, for example, that the generated model has higher accuracy and is better suited to the specific scenario.
Illustratively, the initial model and the target hyper-parameter configuration information can be searched and determined based on the AutoML algorithm, so that the requirements of low time consumption and customization are met.
Illustratively, as shown in fig. 8, the model can be customized with a high degree of freedom; for example, with the 1M parameter quantity and the network performance as optimization targets, the model with the highest performance under the constraint of 1M parameters is sought.
In some embodiments, the AI model generation method further comprises: determining a data enhancement strategy for the training data according to the business requirement; and enhancing the training data based on the data enhancement processing unit corresponding to the data enhancement strategy in the basic database, so as to increase the information content of the training data and allow the trained model to be more accurate.
Illustratively, the data enhancement strategy includes at least one of: data cleansing, data preprocessing, and data expansion, where the data expansion can include flipping, cropping, rotating, and the like.
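The data-expansion transforms named above can be sketched on an image represented as a list of rows. This is a hedged plain-Python illustration; a real pipeline would use an image library and operate on pixel arrays.

```python
# Minimal sketches of the data-expansion strategies: flipping, cropping,
# and rotating. Images are toy 2-D lists; functions are illustrative only.

def flip_horizontal(img):
    """Mirror each row left-to-right."""
    return [list(reversed(row)) for row in img]

def crop(img, top, left, h, w):
    """Take an h-by-w window starting at (top, left)."""
    return [row[left:left + w] for row in img[top:top + h]]

def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

img = [[1, 2],
       [3, 4]]
print(flip_horizontal(img))   # [[2, 1], [4, 3]]
print(rotate_90(img))         # [[3, 1], [4, 2]]
print(crop(img, 0, 0, 1, 2))  # [[1, 2]]
```

Applying such transforms to each training image multiplies the effective size of the training data set without collecting new samples.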
For example, if it is determined that the model generation mode is the third generation mode, the following steps may be performed: performing a network structure search, an automatic search of the data enhancement strategy, and an automatic hyper-parameter search according to the terminal model, the application scenario, and the computing power requirement, to determine the initial model, the data enhancement strategy, and the target hyper-parameter configuration information; training the determined initial model, using training strategy 1 during training; obtaining the target AI model after the training is finished; and then deploying the target AI model. Thus, one-click automated model training can be implemented based on the custom layer.
In some embodiments, the determining an initial model in a network structure vector space according to the traffic demand and network structure search logic comprises: determining a search initialization state according to the initial model corresponding to the first generation mode or the pre-training model corresponding to the second generation mode; and according to the service requirement and the network structure search logic, carrying out network structure search in a network structure vector space in the search initialization state to obtain an initial model.
In order to complete the search process efficiently, the base layer and the prefabricated layer of the AutoML system can be used to initialize the network model structure and hyper-parameter configuration for the automatic search of the network structure, hyper-parameters, data, and the like. This is a customized AI landing service driven by data, scenarios, and computing power constraints, based on the AutoML algorithm. The AutoML search is started either with a public model of the base layer of the AutoML system as the search initialization state, or with a self-developed model of the prefabricated layer as the search initialization state.
Illustratively, if the user selects a CPU and the business requirement restricts the model to 1M parameters, neither the MobileNet model with its 4.2M parameters nor the expert-prefabricated compressed pre-trained MobileNet model with its 2M parameters meets the requirement. An automatic compression algorithm may then be started at the custom layer: the prefabricated 2M-parameter pre-trained MobileNet model is used as the initialization state, automatic network compression from 2M down to 1M parameters is started, and the compressed prefabricated model is then fine-tuned on the scene data.
For example, as shown in fig. 9, a pre-training model corresponding to the business requirements, such as the terminal model, the application scenario, and the computing power requirement, may be determined according to the preset expert experience logic; for example, pre-training model 1 is used to determine the search initialization state, a network structure search is performed in the network structure vector space from that initialization state to obtain the initial model, and an automatic search of the data enhancement strategy and an automatic hyper-parameter search are performed to determine the data enhancement strategy and the target hyper-parameter configuration information. The determined initial model is then trained, using training strategy 1 during training; the target AI model is obtained after the training is finished and is then deployed. Thus, one-click automated model training can be implemented based on the custom layer.
As can be appreciated, the functions of the custom layer include: internally, providing an automated AutoML algorithm based on the base layer and the prefabricated layer of the AutoML system, realizing automatic invocation of the base layer and the prefabricated layer; and externally, providing high-order AutoML development and tooling capabilities.
Step S134, configuring the initial model according to the target hyper-parameter configuration information, and training the initial model according to the training data to generate the target AI model.
In some embodiments, if it is determined that the model generation mode is the first generation mode, the initial model may be configured according to a base layer automation workflow of the AutoML system and trained according to training data; if the model generation mode is determined to be the second generation mode, the initial model can be configured according to the automation workflow of the prefabricated layer and trained according to training data; if the model generation mode is determined to be the third generation mode, the initial model can be configured according to the automation workflow of the customization layer and trained according to training data.
In some embodiments, the base layer of the AutoML system may include a training module configured to train the model according to the determined training strategy. The training module may contain a variety of training strategy settings, such as SGD and Adam. For example, the automated workflows of the training phases of the prefabricated layer and the custom layer may also be executed by the training module of the base layer of the AutoML system, so as to simplify the AutoML system.
Illustratively, when the step S132 determines the corresponding pre-training model as the initial model according to the service requirement and the preset expert experience logic, and determines the target hyper-parameter configuration information, the method further includes: storing a pre-training model determined as an initial model in the basic database, and adding the identifier of the pre-training model in the directed acyclic graph; and storing the determined target hyper-parameter configuration information in the basic database, and adding the identifier of the determined target hyper-parameter configuration information in the directed acyclic graph.
It is understood that the initial model, the data enhancement strategy and the target hyper-parameter configuration information determined according to the expert experience logic in step S132 may be automatically pulled into the base layer workflow of the AutoML system to be trained at the base layer of the AutoML system.
In some embodiments, the configuring the initial model according to the target hyper-parameter configuration information and training the initial model according to training data to generate a target AI model in step S134 includes: and determining a workflow path in a directed acyclic graph of a base layer of the AutoML system, and training the initial model according to the training data based on the workflow path to generate a target AI model.
The directed acyclic graph is used to describe an automated workflow of a base layer of the AutoML system, where the workflow path includes an identifier of the initial model, an identifier of the target hyper-parameter configuration information, an identifier of a training policy, and a path between the identifier of the initial model and the identifier of the target hyper-parameter configuration information, and a path between the identifier of the target hyper-parameter configuration information and the identifier of the training policy, and reference may be specifically made to fig. 4 to fig. 10.
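A workflow path through such a directed acyclic graph is simply a chain of identifiers in which each consecutive pair is an edge of the graph. The sketch below is a hedged illustration; the identifiers and edges are invented stand-ins for the figs. 4-10 graphs.

```python
# Hypothetical base-layer DAG: nodes are identifiers of the initial model,
# hyper-parameter configuration information, and training strategies; edges
# are the permitted transitions of the automated workflow.

dag = {
    "model 1": ["hyper-parameter table 1", "hyper-parameter table 2"],
    "hyper-parameter table 1": ["training strategy 1"],
    "hyper-parameter table 2": ["training strategy 1", "training strategy 2"],
    "training strategy 1": [],
    "training strategy 2": [],
}

def is_workflow_path(dag, path):
    """Check that each consecutive step of the path follows a DAG edge."""
    return all(b in dag.get(a, []) for a, b in zip(path, path[1:]))

path = ["model 1", "hyper-parameter table 1", "training strategy 1"]
print(is_workflow_path(dag, path))  # True
```

Determining the workflow path then amounts to selecting one valid chain from the model identifier through a hyper-parameter identifier to a training strategy identifier, and executing training along it.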
Illustratively, when determining an initial model in a network structure vector space according to the service requirement and the network structure search logic and determining target hyper-parameter configuration information according to the service requirement and the hyper-parameter search logic in step S133, the method further includes: storing the determined initial model in the base database, and adding the identification of the initial model in the directed acyclic graph; and storing the determined target hyper-parameter configuration information in the basic database, and adding the identifier of the determined target hyper-parameter configuration information in the directed acyclic graph.
It is understood that the initial model, the data enhancement strategy and the target hyper-parameter configuration information determined by searching in step S133 can be automatically pulled into the workflow of the basic layer of the AutoML system, and the training is performed at the basic layer of the AutoML system.
In some embodiments, the base layer of the AutoML system may include a deployment module configured to convert the trained model into a form usable by the cloud or a chip for the scene task, package the model, and deploy it to the corresponding device.
In some embodiments, the AI model generation method further comprises: deploying the target AI model to a target device and/or publishing the target AI model to a model trading platform.
In some embodiments, as shown in fig. 10, the AutoML system may use, without limitation, open source frameworks such as TensorFlow, PyTorch, Caffe, and MXNet as the underlying framework.
Illustratively, the output of the underlying framework enters the NNCF framework, the ONNX framework, and the TNN framework in sequence, and after iteration is used for ARM, GPU, or CPU chips.
Specifically, with the above-mentioned mainstream open source frameworks, TensorFlow, for example, can seamlessly transition to TF-Lite and then reach numerous chips such as the MTK-APU and CPU. The outputs of the TensorFlow, PyTorch, Caffe, and MXNet frameworks can enter the NNCF framework, the ONNX framework, and the TNN framework in sequence and seamlessly reach numerous chips such as ARM, GPU, and CPU chips.
Further, the AutoML system may be linked to an AI model trading platform to provide a SaaS service mode for the AI system: an external model release button is set on the web interface of the product and linked to the trading platform, so that models produced automatically can be released externally.
Specifically, this embodiment provides a SaaS form of the commercialized AutoML system, where SaaS is software as a service; it is used to publish a model externally after automated production is completed, and a trading-platform button set on the web interface is linked to the AI trading platform.
According to the AI model generation method provided by the embodiments of the application, the initial model is configured according to the target hyper-parameter configuration information, and the initial model is trained according to the training data to generate the target AI model. The AI model generation method provided by the application can determine the model and the hyper-parameter configuration corresponding to the business requirement through different modes; the different modes have different characteristics and are coupled with one another. Based on these modes, the AI model generation method can systematically reduce costs and improve efficiency with a low threshold, promoting the large-scale deployment of AI.
Referring to fig. 11, fig. 11 is a schematic block diagram of an electronic device according to an embodiment of the present application. The electronic device includes but is not limited to a server and a terminal device.
As shown in fig. 11, the electronic device 30 includes a processor 301 and a memory 302, and the processor 301 and the memory 302 are connected by a bus, such as an I2C (Inter-integrated Circuit) bus.
Specifically, the Processor 301 may be a Micro-controller Unit (MCU), a Central Processing Unit (CPU), a Digital Signal Processor (DSP), or the like.
Specifically, the memory 302 may be a Flash chip, a Read-Only Memory (ROM), a magnetic disk, an optical disk, a USB disk, or a removable hard disk.
The processor is configured to run a computer program stored in the memory, and when executing the computer program, implement any one of the AI model generation methods provided in the embodiments of the present application.
Illustratively, the processor 301 is configured to run a computer program stored in the memory and to implement the following steps when executing the computer program:
acquiring a service requirement corresponding to a target AI model;
determining an initial model and a model generation mode corresponding to the target AI model according to the service requirement, wherein the model generation mode represents a path for training the initial model;
training the initial model with training data based on the model generation pattern to generate the target AI model.
In some embodiments, the model generation mode includes at least one of a first generation mode, a second generation mode, and a third generation mode, and the processor 301 trains the initial model with training data based on the model generation mode to generate the target AI model, including:
if the model generation mode is determined to be the first generation mode, determining a corresponding initial model in a basic database according to the service requirement, and determining target hyper-parameter configuration information corresponding to the initial model;
if the model generation mode is determined to be the second generation mode, determining a corresponding pre-training model as the initial model according to the service requirement and preset expert experience logic, and determining target hyper-parameter configuration information;
if the model generation mode is determined to be the third generation mode, determining an initial model in a network structure vector space according to the service requirement and network structure search logic, and determining target hyper-parameter configuration information according to the service requirement and hyper-parameter search logic;
and configuring the initial model according to the target hyper-parameter configuration information, and training the initial model according to the training data to generate the target AI model.
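The mode-dependent selection of an initial model and its target hyper-parameter configuration can be sketched as a lookup plus a shared configure-and-train step. The database contents, rule tables, and hyper-parameter names here are illustrative assumptions:

```python
# Illustrative mode-dependent selection of initial model + hyper-parameters.
BASE_DATABASE = {"classification": {"model": "ref_cnn", "hparams": {"lr": 0.01}}}
EXPERT_RULES = {"detection": {"model": "pretrained_det", "hparams": {"lr": 0.001}}}

def select_initial_model(mode, requirement):
    task = requirement["task"]
    if mode == "first":
        entry = BASE_DATABASE[task]        # look up the basic database
    elif mode == "second":
        entry = EXPERT_RULES[task]         # expert-experience mapping
    else:                                  # "third": search structure and hparams
        entry = {"model": "searched_net", "hparams": {"lr": 0.005}}
    return entry["model"], entry["hparams"]

def configure_and_train(model, hparams, data):
    # Configure the initial model with the target hyper-parameters, then train.
    return {"model": model, "lr": hparams["lr"], "epochs_seen": len(data)}

m, hp = select_initial_model("first", {"task": "classification"})
target = configure_and_train(m, hp, [0, 1, 2])
```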
In some embodiments, before determining the corresponding initial model in the basic database according to the service requirement, the processor 301 is further configured to perform:
and constructing a basic database, wherein the basic database stores the reference model.
In some embodiments, the processor 301, when constructing the basic database, is configured to perform:
acquiring a corresponding candidate network structure;
training the candidate network structure, and acquiring performance information of a candidate neural network model corresponding to the trained candidate network structure;
judging whether the candidate neural network model is superior to the preset neural network model or not according to the performance information;
when the candidate neural network model is superior to the preset neural network model, constructing the basic database based on the candidate neural network model;
and when the candidate neural network model is not superior to the preset neural network model, re-executing the step of acquiring the corresponding candidate network structure.
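The acquire-train-compare loop above amounts to sampling candidates until one beats a preset baseline. The sketch below substitutes a random score for actual training; the names and threshold are illustrative:

```python
import random

def get_candidate():
    """Stand-in for acquiring and training a candidate network structure;
    returns the performance information of the resulting candidate model."""
    return {"accuracy": random.random()}

def build_basic_database(baseline_accuracy, max_tries=100):
    """Keep sampling candidates until one is superior to the preset baseline."""
    for _ in range(max_tries):
        candidate = get_candidate()
        if candidate["accuracy"] > baseline_accuracy:
            return [candidate]   # superior: construct the database from it
        # not superior: re-execute the acquisition step
    return []

random.seed(0)
db = build_basic_database(baseline_accuracy=0.5)
```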
In some embodiments, the processor 301, when constructing the basic database, is configured to perform:
acquiring a pre-trained hyper-network model, wherein the hyper-network model comprises a preset number of sub-network models;
determining a plurality of target subnetwork models from a preset number of subnetwork models of the super network model;
acquiring a plurality of mainstream network models trained based on open source data;
splicing each target sub-network model serving as a first trunk network with a first branch network to obtain a plurality of first spliced networks, and splicing each mainstream network model serving as a second trunk network with a second branch network to obtain a plurality of second spliced networks;
fine-tuning and testing a plurality of the first and second spliced networks to determine a target network model;
carrying out transfer learning on the target network model to obtain a required candidate neural network model;
judging whether the candidate neural network model is superior to a preset neural network model;
when the candidate neural network model is superior to the preset neural network model, constructing the basic database based on the candidate neural network model;
and when the candidate neural network model is not superior to the preset neural network model, re-executing the step of acquiring the pre-trained hyper-network model.
In some embodiments, the processor 301, when acquiring the corresponding candidate network structure, is configured to perform:
determining a preset neural network model to be subjected to network structure search;
determining a search space to be subjected to network structure search according to the preset neural network model,
and acquiring a corresponding candidate network structure from the search space.
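A search space derived from a preset model can be enumerated, for example, as per-layer operation choices, from which candidate structures are then drawn. The operation names and space construction below are illustrative assumptions, not the patent's definition:

```python
from itertools import product

def build_search_space(preset_model):
    """Derive a search space from the preset model: here, one operation
    choice per layer, giving len(ops) ** num_layers candidate structures."""
    ops = ["conv3x3", "conv5x5", "skip"]
    return list(product(ops, repeat=preset_model["num_layers"]))

def acquire_candidate(search_space, index=0):
    """Acquire one candidate network structure from the search space."""
    return search_space[index]

space = build_search_space({"num_layers": 2})
candidate = acquire_candidate(space)
```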
In some embodiments, the processor 301 obtains the corresponding candidate network structure from the search space, including:
and searching the network structure of the preset neural network model in the search space by utilizing an optimization algorithm based on gradient information to obtain a corresponding candidate network structure.
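Gradient-based structure search typically relaxes the discrete choice of operations into continuous architecture weights updated by gradient descent (as in DARTS-style methods). The sketch below shows one such update and the final discretization; the gradient values are assumed, not computed from a real loss:

```python
import math

def softmax(xs):
    """Normalise architecture parameters into operation weights."""
    exps = [math.exp(x) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def gradient_step(alphas, grads, lr=0.1):
    """One gradient update on the continuous architecture parameters."""
    return [a - lr * g for a, g in zip(alphas, grads)]

def derive_structure(ops, alphas):
    """Discretise: keep the operation with the largest weight."""
    weights = softmax(alphas)
    return ops[weights.index(max(weights))]

ops = ["conv3x3", "conv5x5", "skip"]
alphas = [0.0, 0.0, 0.0]
# Assumed validation-loss gradients that favour conv5x5 (negative gradient).
alphas = gradient_step(alphas, grads=[0.2, -0.5, 0.1])
chosen = derive_structure(ops, alphas)
```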
In some embodiments, the processor 301 obtains the corresponding candidate network structure from the search space, including:
and searching the network structure of the preset neural network model in the search space by using an optimization algorithm based on reinforcement learning so as to obtain a corresponding candidate network structure.
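In a reinforcement-learning search, a controller samples structures and is rewarded by their measured performance, gradually concentrating on better operations. The reward table and update rule below are assumed stand-ins for actual candidate training:

```python
import random

def sample_structure(policy, ops):
    """Controller samples an operation in proportion to its current weight."""
    total = sum(policy)
    r = random.random() * total
    acc = 0.0
    for op, w in zip(ops, policy):
        acc += w
        if r <= acc:
            return op
    return ops[-1]

def update_policy(policy, ops, action, reward, lr=0.5):
    """REINFORCE-flavoured update: raise the weight of rewarded actions."""
    policy = list(policy)
    policy[ops.index(action)] += lr * reward
    return policy

random.seed(1)
ops = ["conv3x3", "conv5x5", "skip"]
policy = [1.0, 1.0, 1.0]
rewards = {"conv3x3": 0.2, "conv5x5": 0.9, "skip": 0.1}  # assumed accuracies
for _ in range(50):
    a = sample_structure(policy, ops)
    policy = update_policy(policy, ops, a, rewards[a])
best = ops[policy.index(max(policy))]
```

Because the update is stochastic, the controller converges only in distribution; in practice the highest-reward operation accumulates the most weight.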
In some embodiments, the training of the candidate network structure and the obtaining of the performance information of the candidate neural network model corresponding to the trained candidate network structure by the processor 301 include:
determining a training dataset and a testing dataset for the candidate neural network model;
training the candidate network structure according to the training data set to obtain a candidate neural network model corresponding to the trained candidate network structure;
and testing the candidate neural network model according to the test data set to obtain the performance information of the candidate neural network model.
Embodiments of the present application further provide a computer-readable storage medium storing a computer program, where the computer program includes AI model generation program instructions, and a processor executes the AI model generation program instructions to implement the steps of the AI model generation method provided in any one of the above embodiments.
The computer-readable storage medium may be an internal storage unit of the electronic device according to any of the foregoing embodiments, for example, the memory of the electronic device. The computer-readable storage medium may also be an external storage device connected to the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash memory card (Flash Card).
The above description covers only preferred embodiments of the present application and is not intended to limit its scope. All equivalent structural and process modifications made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the present application.

Claims (10)

1. An AI model generation method, comprising:
acquiring a service requirement corresponding to a target AI model;
determining an initial model and a model generation mode corresponding to the target AI model according to the service requirement, wherein the model generation mode represents a path for training the initial model;
training the initial model with training data based on the model generation pattern to generate the target AI model.
2. The AI model generation method of claim 1, wherein the model generation mode includes at least one of a first generation mode, a second generation mode, and a third generation mode, and wherein training the initial model with training data based on the model generation mode to generate the target AI model comprises:
if the model generation mode is determined to be the first generation mode, determining a corresponding initial model in a basic database according to the service requirement, and determining target hyper-parameter configuration information corresponding to the initial model;
if the model generation mode is determined to be the second generation mode, determining a corresponding pre-training model as the initial model according to the service requirement and preset expert experience logic, and determining target hyper-parameter configuration information;
if the model generation mode is determined to be the third generation mode, determining an initial model in a network structure vector space according to the service requirement and network structure search logic, and determining target hyper-parameter configuration information according to the service requirement and hyper-parameter search logic;
and configuring the initial model according to the target hyper-parameter configuration information, and training the initial model according to the training data to generate the target AI model.
3. The AI model generation method of claim 2, wherein prior to determining the corresponding initial model in the basic database according to the service requirement, the method further comprises:
and constructing a basic database, wherein the basic database stores the reference model.
4. The AI model generation method of claim 3, wherein the constructing a basic database comprises:
acquiring a corresponding candidate network structure;
training the candidate network structure, and acquiring performance information of a candidate neural network model corresponding to the trained candidate network structure;
judging whether the candidate neural network model is superior to a preset neural network model or not according to the performance information;
when the candidate neural network model is superior to the preset neural network model, constructing the basic database based on the candidate neural network model;
and when the candidate neural network model is not superior to the preset neural network model, re-executing the step of acquiring the corresponding candidate network structure.
5. The AI model generation method of claim 4, wherein the obtaining the corresponding candidate network structure comprises:
determining a preset neural network model to be subjected to network structure search;
determining a search space to be subjected to network structure search according to the preset neural network model,
and acquiring a corresponding candidate network structure from the search space.
6. The AI model generation method of claim 5, wherein the obtaining the corresponding candidate network structure from the search space comprises:
and searching the network structure of the preset neural network model in the search space by utilizing an optimization algorithm based on gradient information to obtain a corresponding candidate network structure.
7. The AI model generation method of claim 5, wherein the obtaining the corresponding candidate network structure from the search space comprises:
and searching the network structure of the preset neural network model in the search space by using an optimization algorithm based on reinforcement learning so as to obtain a corresponding candidate network structure.
8. The AI model generation method of claim 4, wherein the training the candidate network structure and obtaining performance information of the candidate neural network model corresponding to the trained candidate network structure comprises:
determining a training dataset and a testing dataset for the candidate neural network model;
training the candidate network structure according to the training data set to obtain a candidate neural network model corresponding to the trained candidate network structure;
and testing the candidate neural network model according to the test data set to obtain the performance information of the candidate neural network model.
9. An electronic device comprising a memory and a processor;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, implement the AI model generation method according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, causes the processor to implement the AI model generation method according to any one of claims 1 to 8.
CN202011341131.8A 2020-11-25 2020-11-25 AI model generation method, electronic device, and storage medium Pending CN114610272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011341131.8A CN114610272A (en) 2020-11-25 2020-11-25 AI model generation method, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011341131.8A CN114610272A (en) 2020-11-25 2020-11-25 AI model generation method, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
CN114610272A true CN114610272A (en) 2022-06-10

Family

ID=81857208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011341131.8A Pending CN114610272A (en) 2020-11-25 2020-11-25 AI model generation method, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN114610272A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049057A (en) * 2022-08-11 2022-09-13 浙江大华技术股份有限公司 Model deployment method and device, electronic equipment and storage medium
WO2024060556A1 (en) * 2022-09-20 2024-03-28 山东浪潮科学研究院有限公司 Low-power consumption intelligent monitoring and management system
WO2024120409A1 (en) * 2022-12-07 2024-06-13 维沃移动通信有限公司 Ai network model determination method and apparatus, information transmission method and apparatus, and communication device


Similar Documents

Publication Publication Date Title
CN113657465B (en) Pre-training model generation method and device, electronic equipment and storage medium
CN114610272A (en) AI model generation method, electronic device, and storage medium
CN108122032A (en) A kind of neural network model training method, device, chip and system
CN111522962A (en) Sequence recommendation method and device and computer-readable storage medium
CN110366734A (en) Optimization neural network framework
CN110309706A (en) Face critical point detection method, apparatus, computer equipment and storage medium
CN110956202B (en) Image training method, system, medium and intelligent device based on distributed learning
WO2022048557A1 (en) Ai model training method and apparatus, and computing device and storage medium
CN110428046B (en) Method and device for acquiring neural network structure and storage medium
CN108122027A (en) A kind of training method of neural network model, device and chip
CN112396106B (en) Content recognition method, content recognition model training method, and storage medium
CN112036577B (en) Method and device for applying machine learning based on data form and electronic equipment
EP4350572A1 (en) Method, apparatus and system for generating neural network model, devices, medium and program product
US11423307B2 (en) Taxonomy construction via graph-based cross-domain knowledge transfer
CN114330699A (en) Neural network structure searching method and device
CN110515732A (en) A kind of method for allocating tasks based on resource-constrained robot deep learning reasoning
EP4050570A2 (en) Method for generating image classification model, roadside device and cloud control platform
CN108074278A (en) Video presentation method, device and equipment
CN113515672A (en) Data processing method and device, computer readable medium and electronic equipment
CN109144481B (en) Component software configuration method of domain-oriented software intensive system
CN112398674B (en) Method and device for generating VNFD configuration template for describing virtual network functions
JP2022530868A (en) Target object attribute prediction method based on machine learning, related equipment and computer programs
CN113408570A (en) Image category identification method and device based on model distillation, storage medium and terminal
CN114610271A (en) AI model customization method, electronic device and computer-readable storage medium
CN114612774A (en) Target detection and model construction method thereof, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination