CN112906554B - Model training optimization method and device based on visual image and related equipment - Google Patents


Info

Publication number
CN112906554B
CN112906554B (application CN202110182525.1A)
Authority
CN
China
Prior art keywords
model, training, visual, target, image data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110182525.1A
Other languages
Chinese (zh)
Other versions
CN112906554A (en)
Inventor
刘伟华
王栋
张国权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Athena Eyes Co Ltd
Original Assignee
Athena Eyes Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Athena Eyes Co Ltd filed Critical Athena Eyes Co Ltd
Priority to CN202110182525.1A
Publication of CN112906554A
Application granted
Publication of CN112906554B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The invention relates to the field of computer vision processing and discloses a model training optimization method and device based on visual images, computer equipment, and a storage medium. The method comprises the following steps: acquiring visual image data and user configuration parameters; determining a training target for the visual image data according to the user configuration parameters; selecting, from a preset set of visual pieces, the visual piece corresponding to the training target as the target visual piece; determining a preset model according to the target visual piece; and training the preset model with the visual image data to obtain a training optimization model. The method improves the generation efficiency of the training task model and reduces the complexity.

Description

Model training optimization method and device based on visual images and related equipment
Technical Field
The invention relates to the field of computer vision processing, in particular to a model training optimization method and device based on visual images, computer equipment and a storage medium.
Background
In the prior art, the common deep learning frameworks are TensorFlow (a symbolic mathematics system based on dataflow programming) and PyTorch (an open-source Python machine learning library).
However, when TensorFlow or PyTorch is applied to computer vision problems, a model must be trained separately for each user's visual task requirement. As task complexity keeps increasing, the mainstream framework systems become complicated for reasons such as structural design and continuous updating, and structural optimization and porting become more difficult, so the generation efficiency of the training task model becomes low and its complexity becomes high.
Therefore, the existing methods suffer from low generation efficiency and high complexity of the training task model, caused by diversified visual task requirements.
Disclosure of Invention
The embodiments of the invention provide a model training optimization method and device based on visual images, computer equipment, and a storage medium, so as to improve the generation efficiency of visual-image-based model training optimization and reduce its complexity.
In order to solve the above technical problem, an embodiment of the present application provides a model training optimization method based on visual images, including:
acquiring visual image data and user configuration parameters;
determining a training target for the visual image data according to the user configuration parameters;
selecting a visual piece corresponding to the training target from a preset visual piece set as a target visual piece;
determining a preset model according to the target visual piece;
and training the preset model by adopting the visual image data to obtain a training optimization model.
In order to solve the above technical problem, an embodiment of the present application further provides a model training optimization apparatus based on visual images, including:
the data acquisition module is used for acquiring visual image data and user configuration parameters;
a training target determining module, configured to determine a training target for the visual image data according to the user configuration parameter;
the target visual piece determining module is used for selecting a visual piece corresponding to the training target from a preset visual piece set to serve as a target visual piece;
the preset model determining module is used for determining a preset model according to the target visual piece;
and the training optimization module is used for training the preset model by adopting the visual image data to obtain a training optimization model.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the above method for training optimization based on visual images.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above model training optimization method based on visual images.
The model training optimization method and device based on visual images, the computer equipment, and the storage medium provided by the embodiments of the invention acquire visual image data and user configuration parameters; determine a training target for the visual image data according to the user configuration parameters; select the visual piece corresponding to the training target from a preset set of visual pieces as the target visual piece; determine a preset model according to the target visual piece; and train the preset model with the visual image data to obtain a training optimization model, thereby improving the generation efficiency of the training task model and reducing the complexity.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a schematic diagram of an application environment of a method for optimizing model training based on visual images according to an embodiment of the present application;
FIG. 2 is a flow diagram of one embodiment of a visual image-based model training optimization method of the present application;
FIG. 3 is a schematic diagram of an embodiment of a visual image-based model training optimization apparatus according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof in the description and claims of this application and the description of the figures above, are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The model training optimization method based on the visual images can be applied to an application environment as shown in fig. 1, wherein a client communicates with a server through a network. The client may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers.
The method for optimizing model training based on visual images provided by the embodiment of the present application is executed by a server, and accordingly, a device for optimizing model training based on visual images is disposed in the server.
Referring to fig. 2, fig. 2 shows a model training optimization method based on visual images according to an embodiment of the present invention, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps S101 to S105:
s101, acquiring visual image data and user configuration parameters.
In step S101, it is specifically:
visual image data and user configuration parameters are acquired through a training platform.
The visual image data is training data with label information for training a visual model, and the visual model is a model established for different visual tasks in the visual field.
The training platform trains visual models in a component-based manner and can automatically select a specific visual model. Training platforms include, but are not limited to, deep learning training platforms and machine-learning-based training platforms. Here, the component-based manner means that model training is decomposed into visual pieces, built-in base models, and meta-operators. The user can configure parameters for the visual pieces and the built-in base models through the training platform.
The user configuration parameters include, but are not limited to, a visual model network structure, an objective function, an optimization method, and the like.
The visual model network structure refers to the model used for processing a visual image task, including, but not limited to, ResNet (a residual network structure), VGG (Visual Geometry Group network, a VGG-based convolutional neural network structure), YOLO (a YOLO-based deep neural network structure), SSD (Single Shot MultiBox Detector, an SSD-based target detection network structure), and the like.
The objective function refers to the function optimized by stochastic gradient descent during model training, including, but not limited to, softmax classification, binary cross-entropy classification, categorical cross-entropy, mean-square-error regression, and connectionist temporal classification.
The optimization methods include, but are not limited to, SGD (stochastic gradient descent) and Adam (adaptive moment estimation).
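As a concrete illustration of one of the configurable objective functions above, a minimal binary cross-entropy computation can be sketched as follows; this is a pure-Python sketch, and the function name and call pattern are illustrative, not taken from the patent:

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over a batch of labels and predicted
    probabilities; eps clamps predictions away from 0 and 1 so log() is safe."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

loss = binary_cross_entropy([1, 0], [0.9, 0.1])  # confident, correct predictions
```

An optimizer such as SGD or Adam would then minimize this value over the training set.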
It should be noted that, in this embodiment, multiple task models (picture classification, picture detection, and scene segmentation) are preset for tasks in different visual fields. When model training is required, the client only needs to provide a small amount of visual image data with labelling information as sample data, and the fine-tuning technique of transfer learning is adopted to effectively expand this limited sample, so that better model performance is obtained from a small amount of visual image data, the amount of data transmitted over the network between the client and the server is reduced, and the efficiency of network transmission of the visual image data is improved.
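The fine-tuning scheme described above can be sketched as a freezing plan: reuse a preset backbone and update only the task head on the client's small labelled sample. This is a hedged, pure-Python illustration; the layer names and the helper function are assumptions, not part of the patent:

```python
def build_finetune_plan(layer_names, num_trainable_tail=1):
    """Mark the last num_trainable_tail layers trainable; freeze the rest.
    Frozen layers keep the weights of the preset (pre-trained) model."""
    n = len(layer_names)
    return {name: (i >= n - num_trainable_tail)
            for i, name in enumerate(layer_names)}

# Hypothetical preset classification model: a small backbone plus a head.
layers = ["conv1", "conv2", "conv3", "classifier_head"]
plan = build_finetune_plan(layers)  # only the head is updated on new data
```

Because only the head's weights change, a small labelled sample is enough to adapt the preset model to the client's task.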
By acquiring the user configuration parameters, various parameter information can be flexibly acquired according to specific requirements of users, for example, a visual model network structure, an objective function and an optimization method, and the visual model is trained through various parameter information, so that the complexity of the training model is reduced.
And S102, determining a training target for the visual image data according to the user configuration parameters.
In step S102, the training targets include, but are not limited to, classification, regression, detection, segmentation, and the like. Specifically, step S102 is:
and the training platform reads the training targets in the user configuration parameters and determines the training tasks of the visual image data according to the training targets.
It should be noted here that the training target may be one or more.
For example, when the training target in the user configuration parameters is [classification, segmentation], two training targets are determined for the visual image data: the visual image data is first classified, and the classified visual image data is then segmented.
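Step S102's reading of the training targets out of the user configuration parameters might look like the following sketch; key names such as `training_target` are illustrative assumptions:

```python
SUPPORTED_TARGETS = {"classification", "regression", "detection", "segmentation"}

def determine_training_targets(user_config):
    """Return the ordered list of training targets named in the user config;
    a single string is treated as a one-element list."""
    targets = user_config.get("training_target", [])
    if isinstance(targets, str):
        targets = [targets]
    unknown = [t for t in targets if t not in SUPPORTED_TARGETS]
    if unknown:
        raise ValueError(f"unsupported training target(s): {unknown}")
    return targets

targets = determine_training_targets(
    {"training_target": ["classification", "segmentation"]})
# The targets are applied in order: classify first, then segment.
```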
S103, selecting a visual piece corresponding to the training target from a preset visual piece set to serve as a target visual piece.
In step S103, the preset visual element set includes, but is not limited to, a classification visual element, a regression visual element, a detection visual element, and a segmentation visual element.
Classification visual pieces include, but are not limited to, ResNet-based and VGG-based classification visual pieces; detection visual pieces include, but are not limited to, SSD-based detection visual pieces and target-detection visual pieces (e.g., a Fast-RCNN detection visual piece); segmentation visual pieces include, but are not limited to, semantic-segmentation visual pieces (e.g., a DeepLab segmentation visual piece) and target-segmentation visual pieces (e.g., a Mask-RCNN segmentation visual piece).
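The preset visual-piece set and the selection in step S103 can be modelled as a simple registry keyed by training target; all identifiers below are illustrative stand-ins, not names used by the patent:

```python
VISUAL_PIECE_SET = {
    "classification": ["resnet_classifier", "vgg_classifier"],
    "regression": ["mse_regressor"],
    "detection": ["ssd_detector", "fast_rcnn_detector"],
    "segmentation": ["deeplab_segmenter", "mask_rcnn_segmenter"],
}

def select_visual_piece(training_target, preferred=None):
    """Pick the visual piece matching the training target; honour a user
    preference from the configuration parameters when it is available."""
    candidates = VISUAL_PIECE_SET[training_target]
    if preferred in candidates:
        return preferred
    return candidates[0]

piece = select_visual_piece("detection")
```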
And S104, determining a preset model according to the target visual piece.
Regarding step S104, it should be noted that a preset model is a dedicated task model preset by the training platform for the different training tasks in the visual field, for example, a classification preset model, a regression preset model, a detection preset model, or a segmentation preset model.
In step S104, it specifically includes the following steps a to B:
A. selecting a built-in base model corresponding to the target visual piece from a preset set of built-in base models as the target built-in base model.
B. determining the preset model according to the target visual piece and the target built-in base model.
For step A, specifically, a built-in base model corresponding to the target visual piece is selected from the preset set of built-in base models according to the user configuration parameters to serve as the target built-in base model.
It should be noted here that the relationship between a target visual piece and built-in base models is one-to-many. For example, the classification visual piece corresponds to both a ResNet-based built-in base model and a DenseNet-based built-in base model.
In step A, the preset set of built-in base models includes, but is not limited to, feature-end built-in base models and target-end built-in base models.
The feature-end built-in base models include, but are not limited to, built-in base models based on color features, shape features, texture features, and depth features. The target-end built-in base models include, but are not limited to, built-in base models based on binary cross-entropy classification, categorical cross-entropy, mean-square-error regression, connectionist temporal classification, and ResNet, as well as a DenseNet built-in base model.
For step B, it should be noted here that the relationship between a preset model and target built-in base models is one-to-many. For example, the classification preset model corresponds to both a ResNet-based built-in base model and a DenseNet-based built-in base model.
In a specific example, the target visual piece is the classification visual piece, and the preset selectable set of built-in base models corresponding to it comprises a ResNet-based built-in base model and a DenseNet-based built-in base model. In the user configuration parameters, the ResNet-based built-in base model is selected.
In step A, according to the user configuration parameters, the target built-in base model corresponding to the target visual piece is determined to be the ResNet-based built-in base model.
In step B, since the target visual piece is the classification visual piece and the target built-in base model is the ResNet-based built-in base model, the preset model is determined to be a ResNet-based classification preset model.
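Steps A and B (choosing a built-in base model for the target visual piece, then pairing the two to fix the preset model) can be sketched like this; the mapping and the names are assumed purely for illustration:

```python
# One-to-many mapping from visual piece to candidate built-in base models.
BUILT_IN_BASE_MODELS = {
    "classification_piece": ["resnet_base", "densenet_base"],
    "segmentation_piece": ["deeplab_base", "mask_rcnn_base"],
}

def determine_preset_model(target_visual_piece, user_config):
    """Step A: pick the base model named in the user configuration (or the
    first candidate). Step B: the (piece, base) pair identifies the preset model."""
    candidates = BUILT_IN_BASE_MODELS[target_visual_piece]
    base = user_config.get("base_model", candidates[0])
    if base not in candidates:
        raise ValueError(f"{base!r} is not available for {target_visual_piece!r}")
    return (target_visual_piece, base)

preset = determine_preset_model("classification_piece",
                                {"base_model": "resnet_base"})
```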
By designing the visual pieces and the built-in base models in layers, users can configure models flexibly and in diverse ways according to their own requirements, thereby satisfying specific needs. During parameter configuration, a user can select a suitable tuning method; even as the complexity of user requirements keeps increasing, the layered design of visual pieces and built-in base models lets users make their selections level by level, which reduces the complexity of training a model.
And S105, training the preset model by adopting visual image data to obtain a training optimization model.
As for the above step S105, it specifically includes the following steps C to D:
C. and determining a meta-operator according to the target built-in base model, wherein the meta-operator carries out calculation processing on the visual image data.
D. training the visual image data according to all the meta-operators to obtain a training optimization model.
For step C, the relationship between a target built-in base model and meta-operators is one-to-many: from one target built-in base model, each meta-operator needed to realize it can be determined. The computation order of the meta-operators corresponds to the method used by the target built-in base model.
In step C, the meta-operators include, but are not limited to, compute meta-operators and index meta-operators.
The compute meta-operators include, but are not limited to, dense compute meta-operators and sparse compute meta-operators. Dense compute meta-operators include, but are not limited to, vector-optimized dense compute meta-operators and neural-network dense compute meta-operators. Vector-optimized dense compute meta-operators include, but are not limited to, matrix-matrix multiplication, matrix-vector multiplication, and vector-vector multiplication. Neural-network dense compute meta-operators include, but are not limited to, convolution, pooling, and fully-connected operations.
The index meta-operators include, but are not limited to, a segmentation index meta-operator and an accumulation index meta-operator.
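As a minimal pure-Python illustration of one of the vector-optimized dense compute meta-operators named above, matrix-vector multiplication:

```python
def matvec(matrix, vector):
    """Matrix-vector multiplication: each output element is the dot product
    of one matrix row with the vector."""
    assert all(len(row) == len(vector) for row in matrix)
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

result = matvec([[1, 2], [3, 4]], [10, 1])  # [1*10 + 2*1, 3*10 + 4*1]
```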
For the step D, specifically, the visual image data is computationally trained according to the computation sequence of the meta-operator, so as to obtain a training optimization model.
The network structures (such as a neural network structure or a residual network structure) formed by combining compute meta-operators of different methods make the training of visual image data simple and uniform, and make it convenient to fuse various computation graphs. Meanwhile, according to the optimization method in the user configuration parameters, the precision of the training optimization model can be verified automatically, which improves the accuracy of the training optimization model.
In a specific example, if the target built-in base model is a built-in base model that extracts texture features based on LBP (local binary patterns), the meta-operators are determined to be a segmentation meta-operator, a comparison meta-operator, a frequency-computation meta-operator, and a fully-connected meta-operator. From the LBP algorithm used by the target built-in base model, the computation order of the meta-operators is determined to be: segmentation, comparison, frequency computation, and fully-connected operation.
According to the segmentation, comparison, frequency-computation, and fully-connected meta-operators, the visual image data is segmented, the segmented images are compared and their code frequencies computed, and finally a fully-connected operation is performed, thereby obtaining a training optimization model.
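The LBP pipeline in the example above can be sketched with two of its meta-operators: the comparison step, which produces an LBP code per interior pixel, and the frequency computation, which histograms those codes. This is an illustrative pure-Python reading of the standard 8-neighbour LBP, not code from the patent:

```python
def lbp_codes(img):
    """Comparison meta-operator: for each interior pixel, compare its 8
    neighbours against the centre value and pack the results into a byte."""
    h, w = len(img), len(img[0])
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            centre = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if img[y + dy][x + dx] >= centre:
                    code |= 1 << bit
            codes.append(code)
    return codes

def code_histogram(codes, bins=256):
    """Frequency-computation meta-operator: count how often each code occurs."""
    hist = [0] * bins
    for c in codes:
        hist[c] += 1
    return hist

codes = lbp_codes([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # one interior pixel
hist = code_histogram(codes)
```

The histogram would then feed the fully-connected meta-operator at the end of the pipeline.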
In this embodiment, step S105 is further followed by steps E to G:
E. and acquiring parameter information of the network equipment.
F. And determining a deployment model according to the training optimization model and the parameter information of the network equipment.
G. And applying the deployment model to the network equipment so that the network equipment performs operation corresponding to the training target on the image data according to the deployment model.
For step E, the network devices include, but are not limited to, network devices supporting mainstream hardware and network devices supporting domestic (Chinese) chips. The network devices supporting mainstream hardware include, but are not limited to, network devices based on graphics processors, central processing units, and instruction-set processors; the network devices supporting domestic chips include, but are not limited to, network devices supporting Huawei HiSilicon chips and network devices supporting Bitmain and Cambricon custom chips.
For the step F, it is specifically:
and analyzing the training optimization model to obtain an analysis model.
And according to the parameter information of the network equipment, optimizing and compiling a calculation graph of the analysis model to generate a deployment model.
The analysis model is a network model with explicit inputs and outputs. The network devices include, but are not limited to, the cloud, servers, and various terminal devices.
The computation-graph optimization of the analysis model includes, but is not limited to, fusing the features of different network layers in the analysis model and superimposing the features of different network layers in the analysis model. According to the parameter information of the different network devices, the analysis model whose computation graph has been optimized is compiled with a pre-compilation technique to obtain the deployment model corresponding to each network device.
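A toy end-to-end sketch of steps E to G: parse the trained model down to its inputs, outputs, and graph; run a simple computation-graph optimization (here, fusing adjacent conv and relu layers as a stand-in for the layer-feature fusion described above); and stamp the result with the device's parameters. All data structures and names are illustrative assumptions:

```python
def parse_model(trained_model):
    """Parsing step: keep only the network's inputs, outputs, and graph."""
    return {"inputs": trained_model["inputs"],
            "outputs": trained_model["outputs"],
            "graph": list(trained_model["graph"])}

def optimize_graph(graph):
    """Toy graph optimization: fuse each adjacent conv+relu pair into one op."""
    fused, i = [], 0
    while i < len(graph):
        if i + 1 < len(graph) and graph[i] == "conv" and graph[i + 1] == "relu":
            fused.append("conv_relu")
            i += 2
        else:
            fused.append(graph[i])
            i += 1
    return fused

def build_deployment_model(trained_model, device_params):
    """Compile-for-device step: optimized graph plus the target (system, chip)."""
    model = parse_model(trained_model)
    model["graph"] = optimize_graph(model["graph"])
    model["target"] = (device_params["system"], device_params["chip"])
    return model

deployed = build_deployment_model(
    {"inputs": ["image"], "outputs": ["label"], "graph": ["conv", "relu", "pool"]},
    {"system": "Android", "chip": "ARM"})
```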
For the step G, it is specifically:
and transmitting the deployment model to the corresponding network equipment by calling the uniform interface.
And applying the deployment model to the network equipment so that the network equipment performs operation corresponding to the training target on the image data according to the deployment model.
In a specific example, the network device is a HUAWEI P40 mobile phone, whose parameter information includes the system, Android, and the hardware chip, ARM. The training target is to classify visual image data, and a training optimization model for classifying visual image data is obtained. This training optimization model is parsed to obtain a classification analysis model. The classification analysis model is transmitted to the HUAWEI P40 phone by calling the unified interface. According to the phone's parameter information (system: Android; hardware chip: ARM), the computation graph of the classification analysis model is optimized and compiled to generate a classification deployment model. The classification deployment model is applied on the HUAWEI P40 phone, which can then classify visual image data according to the deployment model.
A training optimization model corresponding to the training target is obtained through the user configuration parameters; the training optimization model is a unified standard model and supports conversion by third-party platforms. By calling the unified interface, the training optimization model can be applied on various platforms, realizing cross-platform model deployment: when deploying the same type of model on different platforms, algorithm engineers no longer need to call each platform's artificial intelligence libraries separately, so the efficiency of applying the training optimization model across platforms is improved.
In the embodiment, visual image data and user configuration parameters are acquired; determining a training target for the visual image data according to the user configuration parameters; selecting a visual piece corresponding to a training target from a preset visual piece set as a target visual piece; determining a preset model according to the target visual piece; the preset model is trained by adopting visual image data to obtain a training optimization model, so that the generation efficiency of a training task model is improved, and the complexity is reduced.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by functions and internal logic of the process, and should not limit the implementation process of the embodiments of the present invention in any way.
Fig. 3 is a schematic block diagram of a visual image-based model training optimization device in one-to-one correspondence with the visual image-based model training optimization method according to the foregoing embodiment. As shown in fig. 3, the visual image-based model training optimization device includes a data acquisition module 11, a training target determination module 12, a target visual element determination module 13, a preset model determination module 14, and a training optimization module 15. The functional modules are explained in detail as follows:
a data acquisition module 11, configured to acquire visual image data and user configuration parameters;
a training target determining module 12, configured to determine a training target for the visual image data according to the user configuration parameter;
the target visual piece determining module 13 is configured to select a visual piece corresponding to the training target from a preset visual piece set as a target visual piece;
a preset model determining module 14, configured to determine a preset model according to the target visual element;
and the training optimization module 15 is configured to train the preset model by using the visual image data to obtain a training optimization model.
In one embodiment, the preset model determining module 14 further includes:
the built-in base model determining unit is used for selecting a built-in base model corresponding to the target visual part from a preset built-in base model set to serve as a target built-in base model;
and the preset model determining unit is used for determining a preset model according to the target visual piece and the target built-in base model.
In one embodiment, the training optimization module 15 further comprises:
the meta-operator determining unit is used for determining a meta-operator according to the target built-in base model, wherein the meta-operator performs calculation processing on the visual image data;
and the training unit, which is used for training the visual image data according to all the meta-operators to obtain a training optimization model.
In this embodiment, the device further comprises, after the training optimization module 15:
the parameter information acquisition module is used for acquiring the parameter information of the network equipment;
the deployment model determining module is used for determining a deployment model according to the training optimization model and the parameter information of the network equipment;
and the application module is used for applying the deployment model to the network equipment so that the network equipment performs operation corresponding to the training target on the image data according to the deployment model.
In one embodiment, the deployment model determining module further comprises:
the analysis model determining unit is used for analyzing the training optimization model to obtain an analysis model;
and the deployment model generation unit is used for optimizing and compiling the calculation graph of the analysis model according to the parameter information of the network equipment to generate a deployment model.
For specific limitations of the apparatus for optimizing model training based on visual images, reference may be made to the above limitations of the method for optimizing model training based on visual images, and details thereof are not repeated here. The modules in the model training and optimizing device based on visual images can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure thereof may be as shown in fig. 4. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operating system and the computer program to run on the non-volatile storage medium. The database of the computer device is used for storing data involved in the model training optimization method based on the visual images. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of model training optimization based on visual images.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the visual image-based model training optimization method in the above embodiments, such as steps S101 to S105 shown in fig. 2 and extensions and related steps of the method. Alternatively, the processor, when executing the computer program, implements the functions of the modules/units of the visual image-based model training optimization apparatus in the above embodiments, such as the functions of modules 11 to 15 shown in fig. 3. To avoid repetition, further description is omitted here.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the computer device and connects the various parts of the entire computer device using various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor implements the various functions of the computer device by running or executing the computer programs and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, video data, etc.) created according to the use of the computer device, and the like.
The memory may be integrated in the processor or may be provided separately from the processor.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the steps of the visual image-based model training optimization method in the above embodiments, such as steps S101 to S105 shown in fig. 2 and extensions and related steps of the method. Alternatively, the computer program, when executed by a processor, implements the functions of the modules/units of the visual image-based model training optimization apparatus in the above embodiments, such as the functions of modules 11 to 15 shown in fig. 3. To avoid repetition, further description is omitted here.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It should be understood that the above-described embodiments are merely some, and not all, of the embodiments of the present application, and that the drawings illustrate preferred embodiments of the present application without limiting the scope of the appended claims. The present application may be embodied in many different forms, and these embodiments are provided so that this disclosure will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may still be made to the technical solutions described in the foregoing embodiments, or that equivalents may be substituted for some of the features thereof. All equivalent structures made by using the contents of the specification and the drawings of the present application, applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (8)

1. A model training optimization method based on visual images is characterized by comprising the following steps:
acquiring visual image data and user configuration parameters;
determining a training target for the visual image data according to the user configuration parameters;
selecting a visual piece corresponding to the training target from a preset visual piece set as a target visual piece;
determining a preset model according to the target visual piece;
training the preset model by adopting the visual image data to obtain a training optimization model;
wherein the determining a preset model according to the target visual element comprises:
selecting a built-in base model corresponding to the target visual part from a preset built-in base model set as a target built-in base model;
and determining a preset model according to the target visual piece and the target built-in base model.
2. The method of claim 1, wherein the training the preset model by adopting the visual image data to obtain a training optimization model comprises:
determining a meta-operator according to the target built-in basis model, wherein the meta-operator performs calculation processing on the visual image data;
and training the visual image data according to all the meta-operators to obtain a training optimization model.
3. The method of claim 1 or 2, wherein after the training the preset model by adopting the visual image data to obtain a training optimization model, the method further comprises:
acquiring parameter information of network equipment;
determining a deployment model according to the training optimization model and the parameter information of the network equipment;
and applying the deployment model to the network equipment so that the network equipment performs operation corresponding to the training target on the image data according to the deployment model.
4. The method of claim 3, wherein the determining a deployment model according to the training optimization model and the parameter information of the network equipment comprises:
analyzing the training optimization model to obtain an analysis model;
and according to the parameter information of the network equipment, optimizing and compiling the computation graph of the analysis model to generate a deployment model.
5. A model training optimization device based on visual images is characterized by comprising:
the data acquisition module is used for acquiring visual image data and user configuration parameters;
the training target determining module is used for determining a training target for the visual image data according to the user configuration parameters;
the target visual piece determining module is used for selecting a visual piece corresponding to the training target from a preset visual piece set to serve as a target visual piece;
the preset model determining module is used for determining a preset model according to the target visual piece;
the training optimization module is used for training the preset model by adopting the visual image data to obtain a training optimization model;
wherein the preset model determining module comprises:
the built-in base model determining unit is used for selecting a built-in base model corresponding to the target visual part from a preset built-in base model set to serve as a target built-in base model;
and the preset model determining unit is used for determining a preset model according to the target visual piece and the target built-in base model.
6. The visual image-based model training optimization apparatus of claim 5, further comprising, after the training optimization module:
the parameter information acquisition module is used for acquiring the parameter information of the network equipment;
the deployment model determining module is used for determining a deployment model according to the training optimization model and the parameter information of the network equipment;
and the application module is used for applying the deployment model to the network equipment so that the network equipment performs operation corresponding to the training target on the image data according to the deployment model.
7. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method for visual image-based model training optimization as claimed in any one of claims 1 to 4 when executing the computer program.
8. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a method for visual-image-based model training optimization according to any one of claims 1 to 4.
CN202110182525.1A 2021-02-08 2021-02-08 Model training optimization method and device based on visual image and related equipment Active CN112906554B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110182525.1A CN112906554B (en) 2021-02-08 2021-02-08 Model training optimization method and device based on visual image and related equipment


Publications (2)

Publication Number Publication Date
CN112906554A CN112906554A (en) 2021-06-04
CN112906554B true CN112906554B (en) 2022-12-23

Family

ID=76123383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110182525.1A Active CN112906554B (en) 2021-02-08 2021-02-08 Model training optimization method and device based on visual image and related equipment

Country Status (1)

Country Link
CN (1) CN112906554B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688992B (en) * 2021-10-25 2021-12-28 中电云数智科技有限公司 Model optimization system and method
CN114550241B (en) * 2022-01-28 2023-01-31 智慧眼科技股份有限公司 Face recognition method and device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978179A (en) * 2019-04-04 2019-07-05 拉扎斯网络科技(上海)有限公司 Model training method, device, electronic equipment and readable storage medium storing program for executing
CN110047044A (en) * 2019-03-21 2019-07-23 深圳先进技术研究院 A kind of construction method of image processing model, device and terminal device
CN110517071A (en) * 2019-08-15 2019-11-29 中国平安财产保险股份有限公司 Information forecasting method, device, equipment and storage medium based on machine mould
CN112085078A (en) * 2020-08-31 2020-12-15 深圳思谋信息科技有限公司 Image classification model generation system, method and device and computer equipment
EP3754549A1 (en) * 2019-06-17 2020-12-23 Sap Se A computer vision method for recognizing an object category in a digital image


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on software reliability evaluation based on a neural network combination model; Wang Gaozu et al.; Computer Simulation (计算机仿真); 20100815; vol. 27, no. 08; full text *


Similar Documents

Publication Publication Date Title
CN109740534B (en) Image processing method, device and processing equipment
JP7096888B2 (en) Network modules, allocation methods and devices, electronic devices and storage media
CN112906554B (en) Model training optimization method and device based on visual image and related equipment
CN108924420A (en) Image capturing method, device, medium, electronic equipment and model training method
JP2022531639A (en) How to embed information in video, computer equipment and computer programs
CN112052792B (en) Cross-model face recognition method, device, equipment and medium
CN111695596A (en) Neural network for image processing and related equipment
CN111797288A (en) Data screening method and device, storage medium and electronic equipment
CN112241789A (en) Structured pruning method, device, medium and equipment for lightweight neural network
CN109583367A (en) Image text row detection method and device, storage medium and electronic equipment
CN106484614A (en) A kind of method of verification picture processing effect, device and mobile terminal
CN114925651A (en) Circuit routing determination method and related equipment
US20180359378A1 (en) System, method, and non-transitory computer readable storage medium for image recognition based on convolutional neural networks
WO2021068247A1 (en) Neural network scheduling method and apparatus, computer device, and readable storage medium
CN111598128A (en) Control state identification and control method, device, equipment and medium of user interface
CN111652181A (en) Target tracking method and device and electronic equipment
CN116883715A (en) Data processing method and device
CN111353577A (en) Optimization method and device of multi-task-based cascade combination model and terminal equipment
CN109685015A (en) Processing method, device, electronic equipment and the computer storage medium of image
CN115376203A (en) Data processing method and device
CN116579380A (en) Data processing method and related equipment
CN114510173A (en) Construction operation method and device based on augmented reality
CN111626398B (en) Operation method, device and related product
CN113626022A (en) Object model creating method and device, storage medium and electronic equipment
CN113177463A (en) Target positioning method and device in mobile scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: No. 205, Building B1, Huigu Science and Technology Industrial Park, No. 336 Bachelor Road, Bachelor Street, Yuelu District, Changsha City, Hunan Province, 410000

Patentee after: Wisdom Eye Technology Co.,Ltd.

Country or region after: China

Address before: 410000 building 14, phase I, Changsha Zhongdian Software Park, No. 39, Jianshan Road, high tech Development Zone, Changsha City, Hunan Province

Patentee before: Wisdom Eye Technology Co.,Ltd.

Country or region before: China