CN113361578B - Training method and device for image processing model, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113361578B
CN113361578B (granted from application CN202110602898.XA)
Authority
CN
China
Prior art keywords
network model
image processing
training
model
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110602898.XA
Other languages
Chinese (zh)
Other versions
CN113361578A (en)
Inventor
谢群义
陈毅
钦夏孟
章成全
姚锟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110602898.XA priority Critical patent/CN113361578B/en
Publication of CN113361578A publication Critical patent/CN113361578A/en
Application granted granted Critical
Publication of CN113361578B publication Critical patent/CN113361578B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/285: Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The disclosure provides a training method and apparatus for an image processing model, an electronic device, and a storage medium. It relates to the technical field of artificial intelligence, in particular to computer vision and deep learning, and can be applied to image recognition scenarios. The specific implementation scheme is as follows: a plurality of training data are acquired, and a network model set is searched in a target search space to obtain a candidate network model that includes a plurality of computation layers; a plurality of sensitivity values, one corresponding to each computation layer, are determined; the candidate network model is processed according to the sensitivity values to obtain a network model to be trained; and the network model to be trained is trained with the training data to obtain the image processing model. This effectively simplifies the structure of the image processing model, effectively improves its training efficiency, and effectively helps improve the image processing effect.

Description

Training method and device for image processing model, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to computer vision and deep learning, can be applied to image recognition scenarios, and specifically concerns a training method and apparatus for an image processing model, an electronic device, and a storage medium.
Background
Artificial intelligence is the discipline that studies how to make computers mimic certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, and planning); it involves both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning, deep learning, big data processing, and knowledge graph technologies.
In the related art, when a network model is trained to obtain an image processing model, designing and training the structure of that network model requires high labor cost. The training cost is therefore high, which affects both the training effect and the image processing performance of the resulting model.
Disclosure of Invention
A training method, apparatus, electronic device, storage medium, and computer program product for an image processing model are provided.
According to a first aspect, there is provided a training method of an image processing model, comprising: acquiring a plurality of training data; searching the network model set in the target search space to obtain a candidate network model, wherein the candidate network model comprises: a plurality of computing layers; determining a plurality of sensitivity values respectively corresponding to the plurality of calculation layers; processing the candidate network model according to the plurality of sensitivity values to obtain a network model to be trained; and training the network model to be trained by adopting a plurality of training data to obtain an image processing model.
According to a second aspect, there is provided an image processing method comprising: acquiring an image to be processed; and inputting the image to be processed into the image processing model obtained by training the training method of the image processing model so as to obtain target information output by the image processing model.
According to a third aspect, there is provided a training apparatus of an image processing model, comprising: the first acquisition module is used for acquiring a plurality of training data; the search module is used for searching the network model set in the target search space to obtain candidate network models, and the candidate network models comprise: a plurality of computing layers; the determining module is used for determining a plurality of sensitivity values corresponding to the plurality of computing layers respectively; the processing module is used for processing the candidate network model according to the plurality of sensitivity values to obtain a network model to be trained; and the training module is used for training the network model to be trained by adopting a plurality of training data so as to obtain an image processing model.
According to a fourth aspect, there is provided an image processing apparatus comprising: the second acquisition module is used for acquiring the image to be processed; the recognition module is used for inputting the image to be processed into the image processing model obtained by training by the training device of the image processing model so as to obtain target information output by the image processing model.
According to a fifth aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, so that the at least one processor can execute the training method of the image processing model proposed by the embodiment of the present disclosure or execute the image processing method proposed by the embodiment of the present disclosure.
According to a sixth aspect, a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform a training method of an image processing model as set forth in the embodiments of the present disclosure or to perform an image processing method as set forth in the embodiments of the present disclosure is provided.
According to a seventh aspect, a computer program product is proposed, comprising a computer program which, when executed by a processor, implements the training method of the image processing model proposed by the embodiments of the present disclosure or performs the image processing method proposed by the embodiments of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an image processing model training process according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a sixth embodiment of the present disclosure;
FIG. 8 is a schematic diagram according to a seventh embodiment of the present disclosure;
FIG. 9 is a block diagram of an electronic device for implementing a training method of an image processing model according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure.
It should be noted that the execution subject of the training method of the image processing model in this embodiment is a training apparatus of the image processing model. The apparatus may be implemented in software and/or hardware and may be configured in an electronic device, which may include, but is not limited to, a terminal, a server, and the like.
The embodiment of the disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, deep learning and the like, and can be applied to an image recognition scene, so that the structure of an image processing model can be effectively simplified, the training efficiency of the image processing model can be effectively improved, and the image recognition effect can be effectively assisted and improved.
Artificial intelligence (abbreviated AI) is a new technical science that researches and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence.
Deep learning learns the inherent regularities and representation hierarchies of sample data; the information obtained during such learning is helpful in interpreting data such as text, images, and sounds. The ultimate goal of deep learning is to enable a machine to analyze and learn like a person, and to recognize text, image, and sound data.
Computer vision means using cameras and computers instead of human eyes to perform machine vision tasks such as recognition, tracking, and measurement of targets, and to further process the results into images that are more suitable for human observation or for transmission to instruments for detection.
Image processing refers to techniques in which a computer analyzes an image to achieve a desired result. Image processing is an important basic module in the fields of computer vision and deep learning, and can be applied to scenarios such as image recognition, image segmentation, image transformation, and image classification.
As shown in fig. 1, the training method of the image processing model includes:
s101: a plurality of training data is acquired.
In an embodiment of the disclosure, a plurality of training data is first acquired for subsequent training of an image processing model.
The plurality of training data may be, for example, a plurality of image data, and the image data may be image data acquired by using various image acquisition devices, or may be image data acquired from the internet, which is not limited thereto.
In some embodiments, the training data may be image data in any scene. For example, the training data may be financial bill image data; that is, the image processing model provided in the embodiments of the present disclosure may perform a financial bill processing task, with the input sample data being a plurality of financial bill images (which may be images captured by a camera device of a physical financial bill). This enables recognition of financial bill images, specifically, for example, recognition of the characters they contain, i.e., optical character recognition (OCR).
S102: searching the network model set in the target search space to obtain a candidate network model, wherein the candidate network model comprises: a plurality of computation layers.
After the plurality of training data are obtained, further, the embodiment of the disclosure searches the network model set in the target search space to obtain candidate network models.
The network model set may be related to the actual application scenario of the image processing model. For example, if the image processing model is used to identify characters in financial bill images, then a plurality of network models for character recognition are determined as the network model set; that is, the embodiments of the present disclosure support determining the network model set according to the actual application scenario of the image processing model.
For example, the network model set of the present embodiment may include: a residual network structure (ResNet Block, Res-Block), a Squeeze-and-Excitation network structure (SE-Block), a recurrent neural network structure (Recurrent Neural Network Block, RNN-Block), and any other possible network structure, without limitation herein.
A search space made up of the collection of network models may be referred to as a target search space; that is, the target search space includes a plurality of network model structures and supports a network model search function.
Embodiments of the present disclosure may search a network structure for image processing from a target search space, and the searched network model may be referred to as a candidate network model, for example: the candidate network model may be one or more of Res-Block, SE-Block, RNN-Block, without limitation.
An automatic model search technique, i.e., Neural Architecture Search (NAS), may be used to search candidate network models from the target search space, or any other possible model search technique may be used, without limitation herein.
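As a rough, hypothetical sketch only (the disclosure does not give a concrete search routine; the structure names, the `evaluate` function, and its scores below are invented stand-ins), searching candidate network models from a target search space can be thought of as scoring and ranking the structures in the space:

```python
# Hypothetical sketch: pick candidate network structures from a target
# search space by scoring each one. The structures and the scoring
# function are illustrative stand-ins, not the patented NAS procedure.

def evaluate(structure):
    # Stand-in for training/validating a structure; a fixed score per
    # name keeps the example self-contained and runnable.
    scores = {"Res-Block": 0.91, "SE-Block": 0.89, "RNN-Block": 0.84}
    return scores[structure]

def search_candidates(target_search_space, top_k=2):
    """Return the top_k structures ranked by the stand-in score."""
    ranked = sorted(target_search_space, key=evaluate, reverse=True)
    return ranked[:top_k]

candidates = search_candidates(["Res-Block", "SE-Block", "RNN-Block"])
print(candidates)  # ['Res-Block', 'SE-Block']
```

A real NAS system would train or estimate each structure's quality rather than look it up from a table; the table merely keeps the sketch self-contained.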
Also, multiple computational layers may be included in the candidate network model, such as: convolution layer, pooling layer, activation function layer, full connection layer, and any other possible computation layer, without limitation.
S103: a plurality of sensitivity level values corresponding to the plurality of calculation layers respectively are determined.
After the candidate network model is obtained, further, the embodiment of the disclosure may determine a plurality of sensitivity level values corresponding to a plurality of calculation layers respectively.
The sensitivity level may represent, for example, the influence of the computing layer on the whole candidate network model, and the sensitivity level value may intuitively represent the strength of the influence of each computing layer on the candidate network model.
In some embodiments, for example, the multiple sensitivity level values corresponding to the multiple computing layers may be determined by calculating a loss value (loss) of each computing layer to the candidate network model, or the multiple sensitivity level values may be determined by any other possible manner, which is not limited herein.
S104: and processing the candidate network model according to the plurality of sensitivity values to obtain a network model to be trained.
After determining the plurality of sensitivity values corresponding to the plurality of computation layers, embodiments of the disclosure further process the candidate network model according to the sensitivity values to obtain the network model to be trained.
In some embodiments, calculation layers of the candidate network model may be deleted according to their sensitivity values, for example, layers with small sensitivity values are deleted, so as to obtain the network model to be trained.
In other embodiments, the parameters of the calculation layer of the candidate network model may be optimized and adjusted according to the sensitivity level value to obtain the network model to be trained, or the candidate network model may be processed in any other possible manner, which is not limited herein.
A candidate network model processed according to the sensitivity values can have a better network structure or a reduced model size, which benefits the subsequent training of the image processing model.
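The deletion-based processing described above can be illustrated with the following minimal sketch; the layer names, sensitivity values, and threshold are hypothetical, not values from the disclosure:

```python
def prune_by_sensitivity(layers, sensitivities, threshold=0.0):
    """Keep only the layers whose sensitivity value exceeds the threshold.

    A sensitivity at or below the threshold means removing the layer does
    not hurt (or even helps) the model, so the layer is treated as
    deletable. (Hypothetical rule for illustration.)
    """
    return [name for name in layers if sensitivities[name] > threshold]

# Illustrative layer names and sensitivity (loss-change) values.
layers = ["conv1", "pool1", "conv2", "fc"]
sensitivities = {"conv1": 0.12, "pool1": -0.01, "conv2": 0.05, "fc": 0.30}
print(prune_by_sensitivity(layers, sensitivities))
# ['conv1', 'conv2', 'fc']  (pool1 is deletable: its sensitivity is negative)
```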
S105: and training the network model to be trained by adopting a plurality of training data to obtain an image processing model.
After the network model to be trained is obtained, the embodiment of the disclosure may train the network model to be trained by using a plurality of training data (e.g., a plurality of financial bill image data) until the model converges, so as to obtain an image processing model.
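Purely as an illustration of "training until the model converges" (the disclosure does not specify a convergence criterion; the geometric loss decay and tolerance below are invented stand-ins):

```python
def train_until_converged(initial_loss, decay=0.5, tol=1e-3, max_epochs=100):
    """Toy stand-in for training the network model to be trained.

    The loss shrinks geometrically each epoch; training stops when the
    change between epochs falls below the convergence tolerance.
    """
    loss = initial_loss
    for epoch in range(max_epochs):
        new_loss = loss * decay  # stand-in for one epoch of parameter updates
        if abs(loss - new_loss) < tol:
            return new_loss, epoch + 1
        loss = new_loss
    return loss, max_epochs

final_loss, epochs = train_until_converged(1.0)
print(final_loss, epochs)
```

A real implementation would run optimizer steps over the plurality of training data (e.g., the financial bill images) each epoch; here the decaying scalar merely stands in for the loss curve.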
In this embodiment, a plurality of training data are acquired and a network model set is searched in a target search space to obtain a candidate network model that includes a plurality of computation layers; a plurality of sensitivity values corresponding to the computation layers are determined; the candidate network model is processed according to the sensitivity values to obtain a network model to be trained; and the network model to be trained is trained with the training data to obtain an image processing model. This effectively simplifies the structure of the image processing model, effectively improves its training efficiency, and effectively helps improve the image recognition effect.
Fig. 2 is a schematic diagram according to a second embodiment of the present disclosure.
As shown in fig. 2, the training method of the image processing model includes:
s201: a plurality of image processing tasks corresponding to the image processing scene is determined.
Image processing scenarios include, for example, image recognition, character recognition (OCR), image segmentation, and any other possible processing scenario; a specific example is automatically identifying the characters in a financial bill image.
The process of automatically identifying text in the financial billing image may be implemented by a plurality of image processing tasks, such as an input task, a feature extraction task, an identification task, an output task, and any other possible tasks, without limitation.
In the embodiment of the disclosure, a plurality of image processing tasks corresponding to an image processing scene are first determined, for example, a plurality of image processing tasks for performing character recognition on financial bills.
S202: a plurality of network model sets corresponding to the plurality of image processing tasks, respectively, are determined.
Further, a plurality of network model sets corresponding to the plurality of image processing tasks are determined, that is, the embodiments of the present disclosure may determine the plurality of network model sets according to the actual image processing tasks, so that the determined plurality of network model sets may be matched with the plurality of image processing tasks, and thus may complete the corresponding image processing tasks.
S203: a target search space is generated from the plurality of network model sets.
Further, a target search space is generated from the plurality of network model sets; that is, the plurality of network model sets are assembled to form the target search space, which can support the network model search function. The target search space may be generated in any manner available in the related art, without limitation herein.
Therefore, the embodiment of the disclosure can generate different image search spaces for different image processing tasks, so that the requirements of different image processing scenes can be met, and the network model set is determined according to the image processing tasks, so that the network model set can be matched with the image processing tasks, and further the image processing effect can be improved.
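A minimal sketch of assembling the target search space from per-task network model sets might look as follows; the task names and model sets are hypothetical stand-ins:

```python
def build_target_search_space(task_model_sets):
    """Union the per-task network model sets into one search space,
    preserving first-seen order and dropping duplicates."""
    space = []
    for models in task_model_sets.values():
        for m in models:
            if m not in space:
                space.append(m)
    return space

# Hypothetical mapping from image processing tasks to model sets.
task_model_sets = {
    "feature_extraction": ["Res-Block", "SE-Block"],
    "recognition": ["RNN-Block", "SE-Block"],
}
print(build_target_search_space(task_model_sets))
# ['Res-Block', 'SE-Block', 'RNN-Block']
```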
S204: a plurality of training data is acquired.
The description of S204 may be specifically referred to the above embodiments, and will not be repeated here.
S205: searching the network model set in the target search space to obtain a plurality of initial network models.
The network model initially searched from the target search space may be referred to as an initial network model; there may be one such model or a plurality of them, without limitation.
In some embodiments, the plurality of initial network models may be understood as initially screened network models corresponding to image processing tasks, such as: the plurality of initial network models obtained by searching in the target search space are Res-Block and SE-Block, which means that the Res-Block and SE-Block are more matched with the image processing task, and the determination range of the candidate network models can be narrowed by preliminary screening.
S206: and training the plurality of initial network models by adopting the plurality of training data, and acquiring a plurality of image processing effects respectively corresponding to the plurality of trained initial network models.
After obtaining the plurality of initial network models, the plurality of initial network models can be further trained by adopting the plurality of training data, and a plurality of image processing effects can be obtained.
The image processing effect may be represented by the speed of image processing, the accuracy of image processing, and any other possible effect, and is not limited herein.
In some embodiments, the plurality of image processing effects represent how well each initial network model completes the image processing task, so the image processing effect can serve as the criterion for selecting the candidate network model from the plurality of initial network models.
S207: and selecting a candidate network model from a plurality of initial network models according to the plurality of image processing effects.
Further, embodiments of the present disclosure may select a candidate network model from among a plurality of initial network models based on a plurality of image processing effects.
In some embodiments, the plurality of initial network models may be ranked according to image processing effect, for example, by image processing speed or by image processing accuracy, and the candidate network model may be selected according to the ranking result.
In other embodiments, the plurality of initial network models may be ranked by combining image processing speed, image processing accuracy, and corresponding weight values, and the candidate network model selected according to the ranking result; alternatively, the candidate network model may be selected from the plurality of initial network models in any other possible manner, without limitation herein.
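The weighted ranking described above could be sketched as below, with hypothetical speed and accuracy values (both assumed normalized to [0, 1]) and hypothetical weights:

```python
def rank_initial_models(effects, speed_weight=0.4, accuracy_weight=0.6):
    """Rank models by a weighted combination of processing speed and
    accuracy; the best-scoring model becomes the candidate network model.
    The weights here are illustrative, not from the disclosure."""
    def score(item):
        name, (speed, accuracy) = item
        return speed_weight * speed + accuracy_weight * accuracy
    ranked = sorted(effects.items(), key=score, reverse=True)
    return [name for name, _ in ranked]

# Hypothetical (speed, accuracy) image processing effects per model.
effects = {
    "Res-Block": (0.70, 0.92),
    "SE-Block": (0.85, 0.88),
}
order = rank_initial_models(effects)
print(order[0])  # the selected candidate network model
```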
It should be appreciated that the above examples are merely illustrative of how the candidate network model may be determined, and that in practice, the candidate network model may be determined in any other possible manner, without limitation.
In this embodiment, a plurality of initial network models can be determined, so that the selection range of candidate network models can be primarily narrowed, and the calculation amount can be reduced. In addition, the candidate network model is determined according to the image processing effect, so that the accuracy of the determination of the candidate network model is effectively improved, the requirement of an image processing task can be met, and the image processing effect is further improved.
S208: a plurality of sensitivity level values corresponding to the plurality of calculation layers respectively are determined.
S209: and processing the candidate network model according to the plurality of sensitivity values to obtain a network model to be trained.
S210: and training the network model to be trained by adopting a plurality of training data to obtain an image processing model.
The descriptions of S208-S210 may be specifically referred to the above embodiments, and are not repeated here.
In this embodiment, a plurality of training data are acquired and a network model set is searched in a target search space to obtain a candidate network model that includes a plurality of computation layers; a plurality of sensitivity values corresponding to the computation layers are determined; the candidate network model is processed according to the sensitivity values to obtain a network model to be trained; and the network model to be trained is trained with the training data to obtain an image processing model. This effectively simplifies the structure of the image processing model, effectively improves its training efficiency, and effectively helps improve the image recognition effect. In addition, this embodiment can generate different search spaces for different image processing tasks, so the requirements of different image processing scenarios can be met; because the network model set is determined according to the image processing tasks, it can be matched with them. Determining a plurality of initial network models preliminarily narrows the selection range of candidate network models and reduces the amount of calculation in the model determination process. Finally, determining the candidate network model according to the image processing effect effectively improves the accuracy of that determination, so the requirements of the image processing task can be met and the image processing effect further improved.
Fig. 3 is a schematic diagram according to a third embodiment of the present disclosure.
As shown in fig. 3, the training method of the image processing model includes:
s301: a plurality of training data is acquired.
S302: searching the network model set in the target search space to obtain candidate network models.
The descriptions of S301 to S302 may be specifically referred to the above embodiments, and are not repeated herein.
S303: the candidate network model is trained using the plurality of training data to determine a first loss value corresponding to the candidate network model.
In the present embodiment, in the operation of determining a plurality of sensitivity level values corresponding to a plurality of calculation layers, a candidate network model is first trained using a plurality of training data, and the obtained loss value is referred to as a first loss value.
S304: and deleting a plurality of calculation layers from the candidate network models respectively, and acquiring a plurality of second intermediate network models obtained after deleting the corresponding calculation layers for a plurality of times.
Further, the plurality of calculation layers are deleted from the candidate network model one at a time, and the second intermediate network models obtained after each deletion are acquired.
For example, a layer-by-layer traversal method may be adopted: each of the plurality of calculation layers is traversed in turn, the traversed calculation layer is deleted, and the candidate network model with that layer deleted is used as a second intermediate network model. Thus, each deleted calculation layer yields a corresponding second intermediate network model.
S305: a plurality of second loss values corresponding to the plurality of second intermediate network models, respectively, are determined.
Further, a plurality of second loss values corresponding to the plurality of second intermediate network models, respectively, are determined, for example: the training data may be used to train the plurality of second intermediate network models, respectively, to obtain a corresponding plurality of second loss values.
S306: a plurality of loss change values between the first loss value and the plurality of second loss values are determined and used as a plurality of sensitivity values.
After the first loss value and the plurality of second loss values are determined, a plurality of loss change values between the first loss value and the plurality of second loss values are determined. For example, a difference calculation may be performed between the first loss value and each of the second loss values to obtain the plurality of loss change values, which are used as the plurality of sensitivity values. In practical application, when removing a certain computing layer causes the overall loss on the evaluation set to rise, that layer is judged to be an effective layer; when the overall loss on the evaluation set falls or is unchanged after the layer is removed, it is judged to be a deletable layer. Determining the sensitivity values by calculating loss values can thus clearly characterize the influence of each computing layer on the candidate network model, and the calculation is simple and easy to implement, so that the training process of the model can be optimized to a certain extent.
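The loss-change computation above can be sketched as follows (layer names and the dictionary representation are hypothetical, for illustration only):

```python
def sensitivity_values(first_loss, second_losses):
    """Sensitivity of each computing layer = loss change when that layer is
    removed: second loss (layer deleted) minus first loss (full model).
    Positive -> effective layer; zero or negative -> deletable layer."""
    return {layer: loss - first_loss for layer, loss in second_losses.items()}
```

For example, with a first loss of 0.50, a layer whose removal raises the loss to 0.65 is effective, while one whose removal leaves the loss at 0.49 or 0.50 is deletable.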
It will be appreciated that the above examples are merely illustrative of determining the sensitivity values; in practice, the sensitivity values may be determined in any other possible manner, which is not limited here.
S307: and processing the candidate network model according to the plurality of sensitivity values to obtain a first intermediate network model.
In some embodiments, the candidate network model further comprises: a plurality of encoding nodes (Encoder nodes), and a plurality of decoding nodes (Decoder nodes). In the operation of processing the candidate network model according to the plurality of sensitivity values to obtain the network model to be trained, the candidate network model is first processed according to the plurality of sensitivity values to obtain a first intermediate network model.
The candidate network model after being processed according to the plurality of sensitivity values may be referred to as the first intermediate network model.
In the operation of determining the first intermediate network model, the embodiment of the disclosure may first determine a target computing layer from among a plurality of computing layers according to a plurality of sensitivity values.
In some embodiments, a corresponding sensitivity threshold may be set, the plurality of sensitivity values respectively corresponding to the plurality of computing layers are compared with the sensitivity threshold, and the target computing layer is determined according to the comparison result. For example, a computing layer whose sensitivity value is smaller than the sensitivity threshold is taken as a target computing layer; the target computing layer may be one computing layer or a plurality of computing layers, which is not limited here. Comparison against the sensitivity threshold can speed up determination of the target computing layer, and thus speed up model training. In addition, the sensitivity threshold can be flexibly adjusted, so that requirements on the sensitivity of the target computing layer in different application scenarios can be met.
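The threshold comparison can be sketched as below (the sensitivity dictionary and threshold value are illustrative assumptions, not values from the disclosure):

```python
def target_computing_layers(sensitivities, sensitivity_threshold):
    """A computing layer whose sensitivity value is smaller than the
    threshold is selected as a target (deletable) computing layer."""
    return [layer for layer, s in sensitivities.items()
            if s < sensitivity_threshold]
```

Raising the threshold deletes more layers (smaller model); lowering it is more conservative, which is how the threshold adapts to different application scenarios.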
In other embodiments, the computing layer may be further designated as a target computing layer according to an actual application scenario, for example: the convolution layer is designated as the target calculation layer.
Further, the target computing layer is deleted from among the plurality of computing layers to obtain the remaining computing layers. For example, convolution layers in the operation blocks (Blocks) of the candidate network model are deleted to obtain the remaining computing layers. Deleting the target computing layer may be understood as pruning (channel pruning) the network structure of the candidate network model.
A first intermediate network model is further generated from the remaining computing layers, the plurality of encoding nodes (Encoders), and the plurality of decoding nodes (Decoders). That is, pruning the candidate network model yields a pruned model (the first intermediate network model), and a small adjustment to the pruned model yields a candidate network model with a smaller volume. In this embodiment, pruning the target computing layer in the candidate network model can significantly reduce the volume of the candidate network model, improve the subsequent model training speed, and realize a miniaturized design of the image processing model. Experiments show that the training speed of the model can be increased by a factor of 6.16, and the volume of the model can be reduced by 95.5%.
In practical application, since the coordinate regression effect strongly depends on the head layer, a corresponding computing layer may be taken as a target computing layer only when it has no association with the head layer; that is, target computing layers in the head layer may not be deleted in the embodiment of the present disclosure, so that the coordinate regression effect can be ensured and the accuracy of the model can be guaranteed.
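A minimal sketch of this pruning step with head-layer protection (layer and node names are hypothetical, and the dictionary structure is only one possible representation of the first intermediate network model):

```python
def first_intermediate_model(computing_layers, targets, head_layers,
                             encoder_nodes, decoder_nodes):
    """Delete target computing layers, except those in the head layer (on
    which coordinate regression depends), then assemble the first
    intermediate network model from the remaining computing layers and the
    candidate model's encoding/decoding nodes."""
    deletable = [t for t in targets if t not in head_layers]  # protect the head
    remaining = [l for l in computing_layers if l not in deletable]
    return {"layers": remaining,
            "encoders": encoder_nodes,
            "decoders": decoder_nodes}
```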
S308: and processing the first intermediate network model according to a plurality of distillation loss values respectively corresponding to the plurality of coding nodes and the plurality of decoding nodes to obtain a network model to be trained.
In practical application, miniaturization of the model may be accompanied by a loss of precision; distillation technology can effectively migrate the 'knowledge' of a large model to a small model, thereby improving the precision of the small model.
The present implementation may process the first intermediate network model according to a plurality of distillation loss values corresponding to a plurality of encoding nodes and a plurality of decoding nodes, respectively, that is: the embodiment supports the adoption of distillation technology to process the first intermediate network model, and realizes the miniaturization of the model.
Specifically, the present disclosure may set distillation loss supervision after the outputs of a plurality of computation blocks (Blocks) in the plurality of encoding nodes (Encoders), and simultaneously set distillation loss supervision at the outputs of the plurality of decoding nodes (Decoders), obtain a plurality of distillation loss values according to the distillation loss supervision, and process the first intermediate network model according to the plurality of distillation loss values to obtain the network model to be trained. In this way, the precision of the small model obtained through NAS and pruning can be aligned to that of the large model through distillation, achieving the technical effect of preserving model precision during model miniaturization.
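A hedged sketch of such per-node distillation supervision (the MSE form and the feature shapes are illustrative choices; the disclosure does not specify the distillation loss function):

```python
import numpy as np

def distillation_losses(student_outputs, teacher_outputs):
    """One distillation loss value per supervised node: mean squared error
    between the small (student) model's output and the large (teacher)
    model's output at each encoder-block and decoder output."""
    return [float(np.mean((s - t) ** 2))
            for s, t in zip(student_outputs, teacher_outputs)]
```

The per-node values would typically be summed and added to the task loss so that each supervised node pulls the pruned model's features toward the large model's.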
In some embodiments, the scheme can also support an offline quantization method to accelerate the prediction speed of the model. Specifically, a plurality of first output data respectively corresponding to the remaining computing layers, the plurality of encoding nodes, and the plurality of decoding nodes may be determined, where each first output data corresponds to a data type, for example, the 32-bit floating point type (float32).
Further, a data mapping relationship is determined, where the data mapping relationship is used to map the plurality of first output data to a corresponding plurality of second output data, the data types of the first output data and the second output data being different. For example, the data type of the first output data is float32, and the data type of the second output data is the 8-bit integer type (int8) or another data type. In actual operation, a float32 value distribution for each layer can be computed from a small scene evaluation set (such as financial bill image data); by determining the maximum and minimum values of the float32 distribution, the mapping from each float32 value to the int8 interval can be determined, thereby realizing int8 quantization. In this way, the offline quantization method can improve the prediction speed of the image processing model, and further improve the image recognition effect.
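The min/max-based mapping can be sketched as follows; this is an asymmetric-range scheme chosen for illustration (real quantization toolkits also offer symmetric and per-channel variants, and the disclosure does not fix the exact formula):

```python
import numpy as np

def int8_mapping(float32_values):
    """Derive the float32 -> int8 mapping from the min/max of the values
    observed on a small evaluation set (offline quantization)."""
    lo, hi = float(float32_values.min()), float(float32_values.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0  # 256 int8 levels
    return scale, lo

def quantize_int8(x, scale, lo):
    """Map float32 values into the int8 interval [-128, 127]."""
    return (np.round((x - lo) / scale) - 128).astype(np.int8)

def dequantize(q, scale, lo):
    """Recover approximate float32 values from the int8 representation."""
    return (q.astype(np.float32) + 128) * scale + lo
```

The round trip loses at most about one quantization step per value, which is the precision cost traded for faster int8 inference.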
In addition, the network model provided by the embodiment of the present disclosure may be a network model with a hierarchical structure (for example, an op-level quantization model); during quantization, the layers to be quantized may be selected, namely, the layers involved in quantization can be decided selectively, thereby minimizing the loss of accuracy. Verified in a financial bill scenario, a single model can be accelerated by a factor of 10 with almost no loss in the evaluation index.
S309: and inputting a plurality of training data into the network model to be trained to obtain prediction information output by the network model to be trained, wherein the prediction information is quantized according to the data mapping relation.
After the data mapping relationship is determined, this embodiment may input the plurality of training data (for example, financial bill image data) into the network model to be trained to obtain the prediction information output by the network model to be trained. The prediction information is the actual output data in the training process of the network model; for example, it may be the name, type, or specific information of a financial bill, and it is determined according to actual requirements, which is not limited here.
And, the prediction information is quantized according to the data mapping relationship, that is, the data type of the prediction information may be int8 type.
S310: and if the loss values between the prediction information and the plurality of marking information respectively corresponding to the plurality of training data meet the set conditions, taking the network model obtained by training as an image processing model.
Further, it is determined whether the loss values between the prediction information and the plurality of pieces of labeling information respectively corresponding to the plurality of training data satisfy a set condition, for example, a model convergence condition.
The labeling information corresponds to training data, for example: the training data is a financial bill, and the labeling information may be name, type, specific information, etc. of the financial bill, which is not limited herein.
That is, this embodiment supports training the model in a supervised manner until the loss value between the prediction information and the labeling information satisfies the set condition, at which point training of the model is complete. The supervised training manner can improve the accuracy of model training, and processing the prediction information through the mapping relationship can improve the training speed of the model.
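The "set condition" check can be sketched as below; the mean-absolute-loss form and the tolerance value are illustrative assumptions (the disclosure only requires that the loss between predictions and labels satisfy a set condition):

```python
def meets_set_condition(predictions, labels, tolerance=0.1):
    """Supervised stopping check: the mean loss between the prediction
    information and the labeling information must satisfy the set
    condition (here, fall below a tolerance)."""
    loss = sum(abs(p - y) for p, y in zip(predictions, labels)) / len(labels)
    return loss <= tolerance, loss
```

A training loop would call this after each epoch and take the current network as the image processing model once the check passes.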
In a specific example, fig. 4 is a schematic diagram of an image processing model training process according to an embodiment of the present disclosure. As shown in fig. 4, a model search is first performed, and the searched model is then pruned, namely, the model structure is cut. Further, the 'knowledge' of the large model can be migrated to the small model (the pruned model) by distillation, so as to supervise the precision of the pruned small model. Further, the distilled model is quantized to obtain the final image processing model; alternatively, the pruned model may be quantized directly to obtain the image processing model, which is not limited here.
In this embodiment, a plurality of training data are acquired, a network model set is searched in a target search space to obtain a candidate network model including a plurality of computing layers, a plurality of sensitivity values respectively corresponding to the plurality of computing layers are determined, the candidate network model is processed according to the plurality of sensitivity values to obtain a network model to be trained, and the network model to be trained is trained using the plurality of training data to obtain an image processing model. In this way, the structure of the image processing model can be effectively simplified, the training efficiency of the image processing model can be effectively improved, and the image recognition effect can be improved. Furthermore, determining the sensitivity values by calculating loss values clearly characterizes the influence of each computing layer on the candidate network model, and the calculation is simple and easy to implement, so that the training process of the model can be optimized to a certain extent. The sensitivity threshold can be flexibly adjusted, so that requirements on the sensitivity of the target computing layer in different application scenarios can be met, while the coordinate regression effect and the accuracy of the model are ensured. In addition, the precision of the small model obtained through NAS and pruning can be aligned to that of the large model through distillation, achieving the technical effect of preserving model precision during model miniaturization. Finally, the supervised training manner can improve the accuracy of model training, and processing the prediction information through the mapping relationship can improve the training speed of the model.
Fig. 5 is a schematic diagram according to a fourth embodiment of the present disclosure.
As shown in fig. 5, the image processing method includes:
S501: an image to be processed is acquired.
The embodiment of the disclosure first acquires an image to be processed, where the image to be processed may be an image in any scene. For example, the image processing method can be applied to a financial bill identification scene, and the image to be processed may be a financial bill image.
S502: and inputting the image to be processed into the image processing model obtained by training the training method of the image processing model so as to obtain target information output by the image processing model.
After the image to be processed is obtained, the image to be processed (such as a financial bill image) is further input into the image processing model obtained by the above training method, and corresponding target information is output, where the target information is, for example, the type of the financial bill, text information of the financial bill, or any other possible information, which is not limited here.
In this embodiment, the image to be processed is acquired and input into the image processing model obtained by the above training method to obtain the target information output by the image processing model. Processing the image to be processed with the image processing model obtained by the above training method can effectively improve both the image processing efficiency and the image processing effect.
Fig. 6 is a schematic diagram according to a fourth embodiment of the present disclosure.
As shown in fig. 6, the training device 60 for an image processing model includes:
a first obtaining module 601, configured to obtain a plurality of training data;
the searching module 602 is configured to search the network model set in the target search space to obtain candidate network models, where the candidate network models include: a plurality of computing layers;
a first determining module 603, configured to determine a plurality of sensitivity level values corresponding to a plurality of computing layers respectively;
a processing module 604, configured to process the candidate network model according to the plurality of sensitivity values, so as to obtain a network model to be trained; and
the training module 605 is configured to train the network model to be trained using a plurality of training data to obtain an image processing model.
Optionally, in some embodiments of the present disclosure, as shown in fig. 7, fig. 7 is a schematic diagram of a training apparatus 70 of the image processing model according to a fifth embodiment of the present disclosure, including: the first acquisition module 701, the search module 702, the first determination module 703, the processing module 704, and the training module 705, where the training device 70 for an image processing model further includes:
a second determining module 706, configured to determine a plurality of image processing tasks corresponding to the image processing scene;
A third determining module 707 configured to determine a plurality of network model sets corresponding to a plurality of image processing tasks, respectively; and
a generation module 708 is configured to generate a target search space according to the plurality of network model sets.
Optionally, in some embodiments of the present disclosure, as shown in fig. 7, the search module 702 includes:
a searching sub-module 7021, configured to search the network model set in the target search space to obtain a plurality of initial network models;
the training submodule 7022 is used for training a plurality of initial network models respectively by adopting a plurality of training data and acquiring a plurality of image processing effects respectively corresponding to the trained initial network models; and
a selection submodule 7023 is configured to select a candidate network model from among a plurality of initial network models according to a plurality of image processing effects.
Optionally, in some embodiments of the present disclosure, as shown in fig. 7, the processing module 704 includes:
a first processing sub-module 7041, configured to process the candidate network model according to the plurality of sensitivity values, so as to obtain a first intermediate network model;
the second processing sub-module 7042 is configured to process the first intermediate network model according to a plurality of distillation loss values corresponding to a plurality of encoding nodes and a plurality of decoding nodes, respectively, to obtain a network model to be trained.
Optionally, in some embodiments of the present disclosure, the first processing sub-module 7041 is specifically configured to: determining a target calculation layer from a plurality of calculation layers according to the plurality of sensitivity values; deleting a target calculation layer from among the plurality of calculation layers to obtain a remaining calculation layer; a first intermediate network model is generated from the remaining computational layers, the plurality of encoding nodes, and the plurality of decoding nodes.
Optionally, in some embodiments of the present disclosure, as shown in fig. 7, the first determining module 703 includes:
a first determining submodule 7031 for training the candidate network model with a plurality of training data to determine a first loss value corresponding to the candidate network model;
a deletion sub-module 7032, configured to delete multiple computing layers from among the candidate network models, and obtain multiple second intermediate network models obtained after deleting the corresponding computing layers multiple times;
a second determining submodule 7033, configured to determine a plurality of second loss values corresponding to a plurality of second intermediate network models, respectively; and
a third determining submodule 7034 is configured to determine a plurality of loss change values between the first loss value and the plurality of second loss values as a plurality of sensitivity values.
Optionally, in some embodiments of the present disclosure, the first processing sub-module 7041 is specifically configured to:
and determining a corresponding calculation layer of the sensitivity level value smaller than the sensitivity threshold, and taking the corresponding calculation layer as a target calculation layer.
Optionally, in some embodiments of the present disclosure, the first processing sub-module 7041 is specifically configured to:
and when the corresponding calculation layer and the target calculation layer have no association relation, taking the corresponding calculation layer as the target calculation layer.
Optionally, in some embodiments of the present disclosure, the first processing sub-module 7041 is specifically configured to:
determining a residual calculation layer, a plurality of coding nodes and a plurality of first output data corresponding to a plurality of decoding nodes respectively; and determining a data mapping relation, wherein the data mapping relation is used for mapping a plurality of first output data to a plurality of corresponding second output data respectively, the data types of the first output data and the second output data are different, and the data mapping relation is used for training the network model to be trained.
Optionally, in some embodiments of the present disclosure, the training module 705 is specifically configured to: inputting a plurality of training data into a network model to be trained to obtain prediction information output by the network model to be trained, wherein the prediction information is quantized according to a data mapping relation; and if the loss values between the prediction information and the plurality of marking information respectively corresponding to the plurality of training data meet the set conditions, taking the network model obtained by training as an image processing model.
It can be understood that the training device 70 for an image processing model in fig. 7 of this embodiment may have the same functions and structure as the training device 60 for an image processing model in the foregoing embodiment; likewise the first acquisition module 701 and the first acquisition module 601, the search module 702 and the search module 602, the first determination module 703 and the first determination module 603, the processing module 704 and the processing module 604, and the training module 705 and the training module 605.
It should be noted that the foregoing explanation of the training method of the image processing model is also applicable to the training device of the image processing model in this embodiment, and will not be repeated here.
In this embodiment, by acquiring a plurality of training data and searching a network model set in a target search space, a candidate network model is obtained, where the candidate network model includes: the system comprises a plurality of computing layers, a plurality of sensitivity values corresponding to the computing layers respectively, and a candidate network model is processed according to the sensitivity values to obtain a network model to be trained, and the network model to be trained is trained by adopting a plurality of training data to obtain an image processing model, so that the structure of the image processing model can be effectively simplified, the training efficiency of the image processing model can be effectively improved, and the image recognition effect can be effectively assisted to be improved.
Fig. 8 is a schematic diagram according to a sixth embodiment of the present disclosure.
As shown in fig. 8, the image processing apparatus 80 includes:
a second acquiring module 801, configured to acquire an image to be processed;
the recognition module 802 is configured to input an image to be processed into the image processing model obtained by training by the training device for image processing model, so as to obtain target information output by the image processing model.
It should be noted that the foregoing explanation of the image processing method is also applicable to the image processing apparatus of the present embodiment, and is not repeated here.
In this embodiment, the image to be processed is acquired and input into the image processing model obtained by the above training method to obtain the target information output by the image processing model. Processing the image to be processed with the image processing model obtained by the above training method can effectively improve both the image processing efficiency and the image processing effect.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 9 is a block diagram of an electronic device for implementing a training method of an image processing model in accordance with an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the apparatus 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the respective methods and processes described above, for example, a training method of an image processing model, or an image processing method.
For example, in some embodiments, the training method of the image processing model, or the image processing method, may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the training method of the image processing model described above, or of the image processing method, may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the training method of the image processing model, or the image processing method, by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
The training method for implementing the image processing model of the present disclosure, or the program code of the image processing method, may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable image processing model training apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), the Internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed embodiments are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (20)

1. A method of training an image processing model, comprising:
acquiring a plurality of training data, wherein the training data are image data in any scene;
searching a network model set in a target search space to obtain candidate network models, wherein the candidate network models comprise: a plurality of computing layers;
determining a plurality of sensitivity values respectively corresponding to the plurality of calculation layers;
processing the candidate network model according to the plurality of sensitivity values to obtain a network model to be trained; and
training the network model to be trained by adopting the plurality of training data to obtain an image processing model;
wherein the candidate network model further includes: a plurality of encoding nodes and a plurality of decoding nodes, and the processing the candidate network model according to the plurality of sensitivity values to obtain the network model to be trained comprises:
determining a target computing layer from among the plurality of computing layers according to the plurality of sensitivity values;
deleting the target computing layer from among the plurality of computing layers to obtain remaining computing layers;
generating a first intermediate network model according to the remaining computational layers, the plurality of encoding nodes, and the plurality of decoding nodes;
and processing the first intermediate network model according to a plurality of distillation loss values respectively corresponding to the plurality of coding nodes and the plurality of decoding nodes to obtain the network model to be trained.
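The pruning flow of claim 1 can be illustrated with a minimal, hypothetical Python sketch: layers whose sensitivity value falls below a threshold become target layers, are deleted, and the remaining layers are recombined with the encoding and decoding nodes into the first intermediate network model. All names and values here (layer labels, the threshold, the dictionary-based model description) are illustrative assumptions, not the patent's actual implementation.

```python
# Hypothetical sketch of the claim-1 pipeline: prune low-sensitivity
# calculation layers from a candidate model, keeping its encoding and
# decoding nodes. Layer names and sensitivity values are illustrative.

def determine_target_layers(layers, sensitivities, threshold):
    """Return the layers whose sensitivity value is below the threshold."""
    return [l for l, s in zip(layers, sensitivities) if s < threshold]

def prune_candidate(layers, sensitivities, threshold, encoders, decoders):
    """Delete the target layers, then rebuild an intermediate model
    description from the remaining layers plus the encoding/decoding nodes."""
    targets = set(determine_target_layers(layers, sensitivities, threshold))
    remaining = [l for l in layers if l not in targets]
    return {"layers": remaining, "encoders": encoders, "decoders": decoders}

model = prune_candidate(
    layers=["conv1", "conv2", "conv3"],
    sensitivities=[0.9, 0.05, 0.4],   # conv2 barely affects the loss
    threshold=0.1,
    encoders=["enc1"], decoders=["dec1"],
)
# model["layers"] == ["conv1", "conv3"]
```

In the full method, the resulting first intermediate network model would then be processed with the per-node distillation loss values to obtain the network model to be trained.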
2. The method of claim 1, further comprising, prior to said searching the set of network models within the target search space for candidate network models:
determining a plurality of image processing tasks corresponding to the image processing scene;
determining a plurality of network model sets respectively corresponding to the plurality of image processing tasks; and
generating the target search space according to the plurality of network model sets.
3. The method of claim 2, wherein the searching the set of network models within the target search space to obtain candidate network models comprises:
searching the network model set in the target search space to obtain a plurality of initial network models;
training the plurality of initial network models by adopting the plurality of training data, and acquiring a plurality of image processing effects respectively corresponding to the plurality of trained initial network models; and
selecting the candidate network model from the plurality of initial network models according to the plurality of image processing effects.
4. The method of claim 1, wherein the determining a plurality of sensitivity level values corresponding to the plurality of computing layers, respectively, comprises:
training the candidate network model with the plurality of training data to determine a first loss value corresponding to the candidate network model;
deleting a plurality of calculation layers from the candidate network models respectively, and acquiring a plurality of second intermediate network models obtained after deleting the corresponding calculation layers for a plurality of times;
determining a plurality of second loss values respectively corresponding to the plurality of second intermediate network models; and
determining a plurality of loss change values between the first loss value and the plurality of second loss values, respectively, and taking the loss change values as the plurality of sensitivity values.
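Claim 4's sensitivity measure — the change in loss caused by deleting each calculation layer — can be sketched as follows. Here `evaluate_loss` is a toy stand-in for actually training and evaluating the candidate model; the per-layer loss contributions are invented numbers for illustration only.

```python
# Hypothetical sketch of claim 4: a layer's sensitivity is the change in
# the training loss when that layer is deleted from the candidate model.

def evaluate_loss(layers):
    # Toy proxy for "train the model and measure its loss": pretend each
    # layer reduces the loss by a known, fixed amount.
    contribution = {"conv1": 0.5, "conv2": 0.02, "conv3": 0.2}
    return 1.0 - sum(contribution[l] for l in layers)

def layer_sensitivities(layers):
    base_loss = evaluate_loss(layers)               # the first loss value
    sens = {}
    for layer in layers:
        pruned = [l for l in layers if l != layer]  # a second intermediate model
        # loss change value between first and second loss values
        sens[layer] = abs(evaluate_loss(pruned) - base_loss)
    return sens

sens = layer_sensitivities(["conv1", "conv2", "conv3"])
# conv2 has the smallest sensitivity, so claim 5 would make it the
# target calculation layer for deletion
```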
5. The method of claim 4, wherein the determining a target computing layer from among the plurality of computing layers based on the plurality of sensitivity values comprises:
determining a calculation layer whose corresponding sensitivity value is smaller than a sensitivity threshold, and taking the corresponding calculation layer as the target calculation layer.
6. The method of claim 5, wherein the regarding the corresponding computing layer as the target computing layer comprises:
if the corresponding calculation layer has no association relationship with the target calculation layer, taking the corresponding calculation layer as the target calculation layer.
7. The method of claim 1, after the generating the first intermediate network model from the remaining computational layers, the plurality of encoding nodes, and the plurality of decoding nodes, further comprising:
determining a plurality of first output data respectively corresponding to the remaining calculation layers, the plurality of encoding nodes, and the plurality of decoding nodes; and
determining a data mapping relation, wherein the data mapping relation is used for mapping the plurality of first output data to a plurality of corresponding second output data respectively, the data types of the first output data and the second output data are different, and the data mapping relation is used for training the network model to be trained.
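Claim 7's "data mapping relation" maps each node's first output data to second output data of a different data type — for example, floating-point activations to a quantized integer representation. The sketch below assumes a uniform symmetric int8-style mapping; the scale rule and the sample values are illustrative, not taken from the patent.

```python
# Hypothetical sketch of the claim-7 data mapping relation: map float
# outputs (first output data) onto a signed integer grid (second output
# data, of a different data type). The scale choice is an assumption.

def build_mapping(first_outputs, num_bits=8):
    """Return a scale that maps the float range onto the integer grid."""
    max_abs = max(abs(x) for x in first_outputs)
    return max_abs / (2 ** (num_bits - 1) - 1)  # e.g. divide by 127 for int8

def apply_mapping(first_outputs, scale):
    """Quantize each float output to its integer counterpart."""
    return [round(x / scale) for x in first_outputs]

first = [0.5, -1.27, 0.0]
scale = build_mapping(first)          # ~0.01 for this sample range
second = apply_mapping(first, scale)  # [50, -127, 0]
```

Per claim 8, such a mapping would then be used during training so that the prediction information output by the network model to be trained is quantized.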
8. The method of claim 7, wherein the training the network model to be trained using the plurality of training data to obtain an image processing model comprises:
inputting the training data into the network model to be trained to obtain prediction information output by the network model to be trained, wherein the prediction information is quantized according to the data mapping relation;
and if loss values between the prediction information and a plurality of pieces of labeling information respectively corresponding to the plurality of training data meet a set condition, taking the trained network model as the image processing model.
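Claim 8's stopping rule — train until the loss between predictions and labels meets a set condition — can be sketched with a toy loop. The geometric loss decay stands in for real optimization steps, and the threshold condition is one illustrative form of the "set condition"; none of this is the patent's actual training procedure.

```python
# Hypothetical sketch of the claim-8 stopping rule: keep training the
# network model to be trained until the loss meets a set condition
# (here, falling below a fixed threshold).

def train_until_converged(initial_loss, threshold, max_steps=100):
    """Toy training loop: each 'step' halves the loss."""
    loss, steps = initial_loss, 0
    while loss >= threshold and steps < max_steps:
        loss *= 0.5          # stand-in for one optimization step
        steps += 1
    return loss, steps

loss, steps = train_until_converged(initial_loss=1.0, threshold=0.01)
# halving from 1.0 reaches 0.0078125 < 0.01 after 7 steps
```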
9. An image processing method, comprising:
acquiring an image to be processed;
inputting the image to be processed into an image processing model obtained by training the training method of the image processing model according to any one of claims 1-8, so as to obtain target information output by the image processing model.
10. A training apparatus for an image processing model, comprising:
the first acquisition module is used for acquiring a plurality of training data, wherein the training data are image data in any scene;
the search module is used for searching the network model set in the target search space to obtain candidate network models, and the candidate network models comprise: a plurality of computing layers;
the first determining module is used for determining a plurality of sensitivity values corresponding to the plurality of computing layers respectively;
the processing module is used for processing the candidate network model according to the plurality of sensitivity values to obtain a network model to be trained; and
the training module is used for training the network model to be trained by adopting the plurality of training data so as to obtain an image processing model;
the candidate network model further includes: a plurality of encoding nodes, and a plurality of decoding nodes, wherein the processing module comprises:
the first processing submodule is used for determining a target calculation layer from the plurality of calculation layers according to the plurality of sensitivity values; deleting the target computing layer from among the plurality of computing layers to obtain remaining computing layers; generating a first intermediate network model according to the remaining computational layers, the plurality of encoding nodes, and the plurality of decoding nodes;
the second processing sub-module is used for processing the first intermediate network model according to a plurality of distillation loss values respectively corresponding to the plurality of encoding nodes and the plurality of decoding nodes, so as to obtain the network model to be trained.
11. The apparatus of claim 10, further comprising:
a second determining module, configured to determine a plurality of image processing tasks corresponding to the image processing scene;
a third determining module, configured to determine a plurality of network model sets corresponding to the plurality of image processing tasks respectively; and
the generating module is used for generating the target search space according to the plurality of network model sets.
12. The apparatus of claim 11, wherein the search module comprises:
the searching sub-module is used for searching the network model set in the target search space to obtain a plurality of initial network models;
the training sub-module is used for training the plurality of initial network models by adopting the plurality of training data respectively and acquiring a plurality of image processing effects corresponding to the plurality of initial network models after training; and
the selecting sub-module is used for selecting the candidate network model from the plurality of initial network models according to the plurality of image processing effects.
13. The apparatus of claim 10, wherein the first determination module comprises:
a first determination sub-module for training the candidate network model with the plurality of training data to determine a first loss value corresponding to the candidate network model;
the deleting submodule is used for respectively deleting a plurality of calculation layers from the candidate network models and acquiring a plurality of second intermediate network models obtained after deleting the corresponding calculation layers for a plurality of times;
a second determining submodule, configured to determine a plurality of second loss values corresponding to the plurality of second intermediate network models, respectively; and
a third determining submodule, configured to determine a plurality of loss change values between the first loss value and the plurality of second loss values, respectively, and to take the loss change values as the plurality of sensitivity values.
14. The apparatus of claim 13, wherein the first processing sub-module is specifically configured to:
determine a calculation layer whose corresponding sensitivity value is smaller than a sensitivity threshold, and take the corresponding calculation layer as the target calculation layer.
15. The apparatus of claim 14, wherein the first processing sub-module is specifically configured to:
when the corresponding calculation layer has no association relationship with the target calculation layer, take the corresponding calculation layer as the target calculation layer.
16. The apparatus of claim 10, wherein the first processing sub-module is specifically configured to:
determine a plurality of first output data respectively corresponding to the remaining calculation layers, the plurality of encoding nodes, and the plurality of decoding nodes; and
determine a data mapping relation, wherein the data mapping relation is used for mapping the plurality of first output data to a plurality of corresponding second output data respectively, the data types of the first output data and the second output data are different, and the data mapping relation is used for training the network model to be trained.
17. The apparatus of claim 16, wherein the training module is specifically configured to:
inputting the training data into the network model to be trained to obtain prediction information output by the network model to be trained, wherein the prediction information is quantized according to the data mapping relation;
and if loss values between the prediction information and a plurality of pieces of labeling information respectively corresponding to the plurality of training data meet a set condition, take the trained network model as the image processing model.
18. An image processing apparatus comprising:
the second acquisition module is used for acquiring the image to be processed;
the recognition module is configured to input the image to be processed into an image processing model obtained by training the training device for an image processing model according to any one of claims 10 to 17, so as to obtain target information output by the image processing model.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8 or to perform the method of claim 9.
20. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-8 or to perform the method of claim 9.
CN202110602898.XA 2021-05-31 2021-05-31 Training method and device for image processing model, electronic equipment and storage medium Active CN113361578B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110602898.XA CN113361578B (en) 2021-05-31 2021-05-31 Training method and device for image processing model, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN113361578A CN113361578A (en) 2021-09-07
CN113361578B true CN113361578B (en) 2023-08-04

Family

ID=77530657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110602898.XA Active CN113361578B (en) 2021-05-31 2021-05-31 Training method and device for image processing model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113361578B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037058A (en) * 2021-11-05 2022-02-11 北京百度网讯科技有限公司 Pre-training model generation method and device, electronic equipment and storage medium
CN114528975A (en) * 2022-01-20 2022-05-24 珠高智能科技(深圳)有限公司 Deep learning model training method, system and medium
CN114494818B (en) * 2022-01-26 2023-07-25 北京百度网讯科技有限公司 Image processing method, model training method, related device and electronic equipment
CN114881227B (en) * 2022-05-13 2023-07-04 北京百度网讯科技有限公司 Model compression method, image processing device and electronic equipment
CN115795125B (en) * 2023-01-18 2023-04-18 北京东方瑞丰航空技术有限公司 Searching method, device, equipment and medium applied to project management software
CN116051964B (en) * 2023-03-30 2023-06-27 阿里巴巴(中国)有限公司 Deep learning network determining method, image classifying method and device

Citations (6)

Publication number Priority date Publication date Assignee Title
CN110059804A (en) * 2019-04-15 2019-07-26 北京迈格威科技有限公司 Network training method, data processing method and device to be searched
CN111008640A (en) * 2019-10-17 2020-04-14 平安科技(深圳)有限公司 Image recognition model training and image recognition method, device, terminal and medium
CN111340219A (en) * 2020-02-25 2020-06-26 北京百度网讯科技有限公司 Neural network model searching method and device, image processing method and processor
CN111445008A (en) * 2020-03-24 2020-07-24 暗物智能科技(广州)有限公司 Knowledge distillation-based neural network searching method and system
CN111967569A (en) * 2020-06-29 2020-11-20 北京百度网讯科技有限公司 Neural network structure generation method and device, storage medium and electronic equipment
CN112801215A (en) * 2021-03-17 2021-05-14 腾讯科技(深圳)有限公司 Image processing model search, image processing method, image processing apparatus, and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
KR20200129458A (en) * 2019-05-08 2020-11-18 삼성전자주식회사 A computing device for training an artificial neural network model, a method for training an artificial neural network model, and a memory system for storing the same


Non-Patent Citations (1)

Title
Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, Changshui Zhang. "Learning Efficient Convolutional Networks through Network Slimming." arXiv (full text) *


Similar Documents

Publication Publication Date Title
CN113361578B (en) Training method and device for image processing model, electronic equipment and storage medium
CN106557563B (en) Query statement recommendation method and device based on artificial intelligence
CN112560496A (en) Training method and device of semantic analysis model, electronic equipment and storage medium
CN113722493B (en) Text classification data processing method, apparatus and storage medium
CN115063875B (en) Model training method, image processing method and device and electronic equipment
CN114942984B (en) Pre-training and image-text retrieval method and device for visual scene text fusion model
CN114282670A (en) Neural network model compression method, device and storage medium
CN113705628B (en) Determination method and device of pre-training model, electronic equipment and storage medium
CN114648676A (en) Point cloud processing model training and point cloud instance segmentation method and device
CN116152833B (en) Training method of form restoration model based on image and form restoration method
CN112559885A (en) Method and device for determining training model of map interest point and electronic equipment
CN114715145B (en) Trajectory prediction method, device and equipment and automatic driving vehicle
CN113947188A (en) Training method of target detection network and vehicle detection method
CN112949433B (en) Method, device and equipment for generating video classification model and storage medium
CN113947189A (en) Training method and device for image generation model, electronic equipment and storage medium
CN114972910B (en) Training method and device for image-text recognition model, electronic equipment and storage medium
CN114863450B (en) Image processing method, device, electronic equipment and storage medium
CN115186738B (en) Model training method, device and storage medium
CN114881227B (en) Model compression method, image processing device and electronic equipment
CN113361522B (en) Method and device for determining character sequence and electronic equipment
CN114445668A (en) Image recognition method and device, electronic equipment and storage medium
CN114998649A (en) Training method of image classification model, and image classification method and device
CN113963011A (en) Image recognition method and device, electronic equipment and storage medium
CN114111813A (en) High-precision map element updating method and device, electronic equipment and storage medium
CN113051926A (en) Text extraction method, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant