CN113361578A - Training method and device of image processing model, electronic equipment and storage medium

Info

Publication number: CN113361578A
Application number: CN202110602898.XA
Authority: CN
Inventors: 谢群义; 陈毅; 钦夏孟; 章成全; 姚锟
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-05-31
Filing date: 2021-05-31
Publication date: 2021-09-07
Anticipated expiration: 2041-05-31
Also published as: CN113361578B

Abstract

The disclosure provides a training method and apparatus for an image processing model, an electronic device, and a storage medium. It relates to the technical field of artificial intelligence, in particular to computer vision and deep learning, and can be applied to image recognition scenarios. The specific implementation is as follows: a plurality of training data are acquired; a network model set in a target search space is searched to obtain a candidate network model, where the candidate network model comprises a plurality of computation layers; a plurality of sensitivity values respectively corresponding to the plurality of computation layers are determined; the candidate network model is processed according to the sensitivity values to obtain a network model to be trained; and the network model to be trained is trained with the plurality of training data to obtain an image processing model. The structure of the image processing model can thereby be effectively simplified, which improves the training efficiency of the image processing model and helps improve the image processing effect.

Description

Training method and device of image processing model, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of artificial intelligence technologies, in particular to computer vision and deep learning technologies applicable to image recognition scenarios, and more specifically to a training method and apparatus for an image processing model, an electronic device, and a storage medium.

Background

Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves technologies at both the hardware level and the software level. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning, deep learning, big data processing, knowledge graph technology, and the like.

In the related art, when a network model to be trained is trained to obtain an image processing model, designing the structure of the network model to be trained requires considerable manual effort, so the training cost is high, and the training effect and the image processing performance of the resulting image processing model are affected.

Disclosure of Invention

A training method, an apparatus, an electronic device, a storage medium, and a computer program product for an image processing model are provided.

According to a first aspect, there is provided a training method of an image processing model, comprising: acquiring a plurality of training data; searching the network model set in the target search space to obtain a candidate network model, wherein the candidate network model comprises: a plurality of computing layers; determining a plurality of sensitivity values respectively corresponding to the plurality of calculation layers; processing the candidate network model according to the sensitivity values to obtain a network model to be trained; and training the network model to be trained by adopting a plurality of training data to obtain an image processing model.

According to a second aspect, there is provided an image processing method comprising: acquiring an image to be processed; and inputting the image to be processed into the image processing model obtained by training according to the training method of the image processing model to obtain target information output by the image processing model.

According to a third aspect, there is provided an apparatus for training an image processing model, comprising: the first acquisition module is used for acquiring a plurality of training data; the search module is used for searching the network model set in the target search space to obtain a candidate network model, and the candidate network model comprises: a plurality of computing layers; the determining module is used for determining a plurality of sensitivity values respectively corresponding to the plurality of computing layers; the processing module is used for processing the candidate network model according to the sensitivity values to obtain a network model to be trained; and the training module is used for training the network model to be trained by adopting a plurality of training data to obtain the image processing model.

According to a fourth aspect, there is provided an image processing apparatus comprising: the second acquisition module is used for acquiring an image to be processed; and the identification module is used for inputting the image to be processed into the image processing model obtained by training of the training device of the image processing model so as to obtain the target information output by the image processing model.

According to a fifth aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform a training method of an image processing model proposed by an embodiment of the present disclosure, or to perform an image processing method proposed by an embodiment of the present disclosure.

According to a sixth aspect, a non-transitory computer-readable storage medium is proposed, in which computer instructions are stored, the computer instructions being configured to cause the computer to perform the training method of the image processing model proposed by the embodiment of the present disclosure, or to perform the image processing method proposed by the embodiment of the present disclosure.

According to a seventh aspect, a computer program product is proposed, which comprises a computer program, which when executed by a processor implements the training method of the image processing model proposed by the embodiments of the present disclosure, or performs the image processing method proposed by the embodiments of the present disclosure.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;

FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;

FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;

FIG. 4 is a schematic diagram of an image processing model training process proposed by an embodiment of the present disclosure;

FIG. 5 is a schematic diagram according to a fourth embodiment of the present disclosure;

FIG. 6 is a schematic diagram according to a fifth embodiment of the present disclosure;

FIG. 7 is a schematic diagram according to a sixth embodiment of the present disclosure;

FIG. 8 is a schematic diagram according to a seventh embodiment of the present disclosure;

FIG. 9 is a block diagram of an electronic device for implementing a method of training an image processing model according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure.

It should be noted that the execution subject of the training method for an image processing model according to this embodiment is a training apparatus for an image processing model. The apparatus may be implemented in software and/or hardware and may be configured in an electronic device, and the electronic device may include, but is not limited to, a terminal, a server, and the like.

The embodiment of the disclosure relates to the technical field of artificial intelligence, in particular to the technical field of computer vision, deep learning and the like, and can be applied to image recognition scenes to effectively simplify the structure of an image processing model, thereby effectively improving the training efficiency of the image processing model and effectively assisting in improving the image recognition effect.

Artificial Intelligence is abbreviated in English as AI. It is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence.

Deep learning learns the intrinsic laws and representation levels of sample data, and the information obtained in the learning process is very helpful for interpreting data such as text, images, and sounds. The ultimate goal of deep learning is to enable machines to analyze and learn like humans, and to recognize data such as text, images, and sounds.

Computer vision means using cameras and computers in place of human eyes to perform machine vision tasks such as identifying, tracking, and measuring a target, and to further process the resulting images so that they are more suitable for human observation or for transmission to an instrument for detection.

Image processing refers to a technique of analyzing an image with a computer to achieve a desired result. Image processing is an important basic module in the fields of computer vision and deep learning, and can be applied to scenarios such as image recognition, image segmentation, image transformation, and image classification.

As shown in fig. 1, the training method of the image processing model includes:

S101: a plurality of training data is acquired.

In the embodiment of the present disclosure, a plurality of training data are first obtained for subsequently training the image processing model.

The plurality of training data may be, for example, a plurality of image data, and the image data may be image data acquired by using various image acquisition devices, or may also be image data acquired from the internet, which is not limited to this.

In some embodiments, the training data may be image data in any scenario. For example, the training data may be financial bill image data; that is, the image processing model provided by the embodiment of the disclosure can perform a financial bill processing task. In that case the input sample data is a plurality of financial bill images (a financial bill image may be obtained by photographing a physical financial bill with a camera device), and the model recognizes the financial bill images, for example by recognizing the characters in them, that is, by Optical Character Recognition (OCR).

S102: searching the network model set in the target search space to obtain a candidate network model, wherein the candidate network model comprises: a plurality of computing layers.

After the plurality of training data are obtained, further, the network model set in the target search space is searched to obtain the candidate network model.

The network model set may be related to the actual application scenario of the image processing model. For example, if the image processing model is used for recognizing characters in financial bill images, a plurality of network models for character recognition are determined as the network model set; that is, the embodiment of the disclosure supports determining the network model set according to the actual application scenario of the image processing model.

For example, the network model set of the present embodiment may include: a residual neural network block (ResNet Block, Res-Block), a Squeeze-and-Excitation block (SE-Block), a recurrent neural network block (RNN-Block), and any other possible network structures, which are not limited herein.

The search space composed of the network model set may be referred to as the target search space; that is, the target search space includes a plurality of network model structures and supports a network model search function.

The embodiment of the present disclosure may search out a network structure for image processing from a target search space, and the searched network model may be referred to as a candidate network model, for example: the candidate network model may be one or more of Res-Block, SE-Block, RNN-Block, without limitation.

The candidate network model may be searched from the target search space by using an automatic model search technique (Neural Architecture Search, NAS), or by using any other possible model search technique, which is not limited herein.

Also, multiple computational layers may be included in the candidate network model, such as: convolutional layers, pooling layers, activation function layers, fully-connected layers, and any other possible computation layers, without limitation.
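The search in S102 can be pictured with a minimal Python sketch. It assumes, purely for illustration, that each entry in the target search space is a named list of computation-layer types and that a caller-supplied score function stands in for whatever evaluation the search technique uses. The names CandidateModel, search_candidate, and TARGET_SEARCH_SPACE, as well as the layer lists, are hypothetical and do not come from the disclosure, and the exhaustive scan is only a simple stand-in for a NAS strategy.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class CandidateModel:
    """Hypothetical search-space entry: a block name plus its computation layers."""
    name: str
    layers: Sequence[str]


# Illustrative target search space; the layer lists are assumptions,
# not the real definitions of these blocks.
TARGET_SEARCH_SPACE: List[CandidateModel] = [
    CandidateModel("Res-Block", ("conv", "activation", "conv", "activation")),
    CandidateModel("SE-Block", ("pooling", "fully_connected", "activation", "fully_connected")),
    CandidateModel("RNN-Block", ("fully_connected", "activation")),
]


def search_candidate(
    space: Sequence[CandidateModel],
    score_fn: Callable[[CandidateModel], float],
) -> CandidateModel:
    """Return the highest-scoring entry (exhaustive scan, a stand-in for NAS)."""
    if not space:
        raise ValueError("search space is empty")
    return max(space, key=score_fn)


# Example proxy score (an assumption): prefer entries with more convolutional layers.
candidate = search_candidate(TARGET_SEARCH_SPACE, lambda m: m.layers.count("conv"))
print(candidate.name, list(candidate.layers))
```

In a real system the score would come from training or evaluating each entry, as the later embodiments describe, rather than from a structural proxy.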

S103: a plurality of sensitivity values corresponding to the plurality of computation layers, respectively, are determined.

After the candidate network model is obtained, further, the embodiment of the present disclosure may determine a plurality of sensitivity values respectively corresponding to the plurality of computation layers.

The sensitivity may represent the influence of a computation layer on the candidate network model as a whole, and the sensitivity value indicates how strongly each computation layer influences the candidate network model.

In some embodiments, for example, the plurality of sensitivity values corresponding to the plurality of computation layers may be determined by calculating the effect of each computation layer on the loss value (loss) of the candidate network model, or may be determined in any other possible manner, which is not limited herein.

S104: processing the candidate network model according to the sensitivity values to obtain the network model to be trained.

After determining the sensitivity values corresponding to the computation layers, the embodiment of the present disclosure further processes the candidate network model according to the sensitivity values to obtain the to-be-trained network model.

In some embodiments, computation layers of the candidate network model may be deleted according to the sensitivity values; for example, the computation layers with smaller sensitivity values are deleted to obtain the network model to be trained.

In other embodiments, the parameters of the computation layer of the candidate network model may be optimized and adjusted according to the sensitivity value to obtain the network model to be trained, or the candidate network model may be processed in any other possible manner, which is not limited herein.

The candidate network model processed by the sensitivity value can have a better network structure, or the volume of the model can be reduced, so that the subsequent training of the image processing model is facilitated.

S105: training the network model to be trained with the plurality of training data to obtain an image processing model.

After the network model to be trained is obtained, the embodiment of the disclosure may train the network model to be trained by using a plurality of training data (e.g., a plurality of financial bill image data) until the model converges, so as to obtain an image processing model.

In this embodiment, a plurality of training data are acquired, and a network model set in a target search space is searched to obtain a candidate network model that comprises a plurality of computation layers. A plurality of sensitivity values respectively corresponding to the plurality of computation layers are determined, the candidate network model is processed according to the sensitivity values to obtain a network model to be trained, and the network model to be trained is trained with the plurality of training data to obtain an image processing model. The structure of the image processing model can thereby be effectively simplified, which improves the training efficiency of the image processing model and helps improve the image recognition effect.

Fig. 2 is a schematic diagram according to a second embodiment of the present disclosure.

As shown in fig. 2, the training method of the image processing model includes:

S201: a plurality of image processing tasks corresponding to the image processing scene is determined.

The image processing scenario includes, for example, image recognition, character recognition (OCR), image segmentation, and any other possible processing scenario. Specifically, the image processing scenario may be, for example, automatically recognizing characters in a financial bill image.

The process of automatically identifying the text in the financial document image may be implemented by a plurality of image processing tasks, such as an input task, a feature extraction task, an identification task, an output task, and any other possible tasks, which is not limited herein.

In the embodiment of the present disclosure, first, a plurality of image processing tasks corresponding to an image processing scene are determined, for example: a plurality of image processing tasks for performing financial instrument text recognition is determined.

S202: a plurality of network model sets respectively corresponding to the plurality of image processing tasks are determined.

Further, a plurality of network model sets respectively corresponding to the plurality of image processing tasks are determined, that is, the embodiment of the present disclosure may determine the plurality of network model sets according to the actual image processing tasks, and thus the determined plurality of network model sets may be matched with the plurality of image processing tasks, so that the corresponding image processing tasks may be completed.

S203: generating a target search space according to the plurality of network model sets.

Further, a target search space is generated according to the plurality of network model sets; that is, the target search space is formed by the plurality of network model sets and supports the network model search function. The target search space may be generated in a manner known in the related art, which is not limited herein.

Therefore, the embodiment of the disclosure can generate different target search spaces for different image processing tasks, so that the requirements of different image processing scenarios can be met. Because the network model sets are determined according to the image processing tasks, they match those tasks, which in turn can improve the image processing effect.
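Steps S201 to S203 can be sketched as follows. The disclosure leaves the way the target search space is generated to the related art, so this sketch makes a loud assumption: the search space is taken to be every combination of one network block per image processing task. The function name build_search_space, the task names, and the per-task block lists are hypothetical.

```python
from itertools import product
from typing import Dict, List


def build_search_space(model_sets: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Combine per-task network model sets into one search space.

    Assumption: each point in the search space picks exactly one block
    for each image processing task (a Cartesian product of the sets).
    """
    tasks = list(model_sets)
    return [
        dict(zip(tasks, choice))
        for choice in product(*(model_sets[task] for task in tasks))
    ]


# Hypothetical tasks and model sets for a character recognition scenario.
model_sets = {
    "feature_extraction": ["Res-Block", "SE-Block"],
    "recognition": ["RNN-Block"],
}
search_space = build_search_space(model_sets)
for point in search_space:
    print(point)
```

Other constructions, such as a flat union of all model sets, would fit the text equally well; the product form is chosen here only because it keeps each candidate tied to every task.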

S204: a plurality of training data is acquired.

For the description of S204, reference may be made to the foregoing embodiments specifically, which are not described herein again.

S205: searching the network model set in the target search space to obtain a plurality of initial network models.

The network models initially searched from the target search space may be referred to as initial network models; there may be one such model or a plurality of them, which is not limited herein.

In some embodiments, the plurality of initial network models may be understood as network models that have been preliminarily screened as corresponding to the image processing task. For example, if the initial network models obtained by searching the target search space are Res-Block and SE-Block, then Res-Block and SE-Block match the image processing task better, and this preliminary screening narrows the range from which the candidate network model is determined.

S206: training the plurality of initial network models respectively with the plurality of training data, and acquiring a plurality of image processing effects respectively corresponding to the plurality of trained initial network models.

After the plurality of initial network models are obtained, they may be trained respectively with the plurality of training data, and a plurality of image processing effects may be acquired.

The image processing effect may be represented by the speed of image processing, the accuracy of image processing, or any other possible measure, which is not limited herein.

In some embodiments, the plurality of image processing effects may represent how well each initial network model completes the image processing task, and the image processing effect may serve as the criterion for selecting a candidate network model from among the plurality of initial network models.

S207: selecting a candidate network model from among the plurality of initial network models according to the plurality of image processing effects.

Further, the embodiment of the disclosure may select a candidate network model from among the plurality of initial network models according to the plurality of image processing effects.

In some embodiments, the plurality of initial network models may be sorted according to their image processing effects, for example by the speed of image processing or by the accuracy of image processing, and a candidate network model is selected according to the sorting result.

In other embodiments, the initial network models may be ranked according to the speed of image processing, the accuracy of image processing, and corresponding weight values, and a candidate network model may be selected according to the ranking result; alternatively, a candidate network model may be selected from the initial network models in any other possible manner, which is not limited herein.
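The weighted ranking described above can be illustrated with a short sketch. It assumes that speed and accuracy have each been normalized to the range 0 to 1 with higher values being better, and that the ranking score is a weighted sum; the disclosure does not fix either choice. The function name rank_initial_models, the weights, and the effect values are hypothetical and are not measured results.

```python
from typing import Dict, List, Tuple


def rank_initial_models(
    effects: Dict[str, Tuple[float, float]],
    speed_weight: float,
    accuracy_weight: float,
) -> List[str]:
    """Rank models by a weighted sum of (speed, accuracy), best first.

    Assumption: both measures are normalized to [0, 1], higher is better.
    """
    def score(name: str) -> float:
        speed, accuracy = effects[name]
        return speed_weight * speed + accuracy_weight * accuracy

    return sorted(effects, key=score, reverse=True)


# Illustrative (made-up) image processing effects: name -> (speed, accuracy).
effects = {
    "Res-Block": (0.6, 0.9),
    "SE-Block": (0.8, 0.7),
    "RNN-Block": (0.3, 0.5),
}
ranking = rank_initial_models(effects, speed_weight=0.3, accuracy_weight=0.7)
candidate_name = ranking[0]  # the top-ranked model becomes the candidate
print(ranking)
```

Setting one weight to zero reduces this to the single-criterion sorting of the preceding embodiment.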

It should be understood that the above examples are only illustrative of how to determine the candidate network model, and in practical applications, the candidate network model may be determined in any other possible manner, which is not limited herein.

In this embodiment, a plurality of initial network models may be determined, so that the selection range of candidate network models may be initially narrowed down, and the amount of calculation may be reduced. In addition, the candidate network model is determined according to the image processing effect, the accuracy of determining the candidate network model is effectively improved, the requirement of an image processing task can be met, and the image processing effect is further improved.

S208: a plurality of sensitivity values corresponding to the plurality of computation layers, respectively, are determined.

S209: processing the candidate network model according to the sensitivity values to obtain the network model to be trained.

S210: training the network model to be trained with the plurality of training data to obtain an image processing model.

For the description of S208 to S210, reference may be made to the above embodiments, which are not described herein again.

In this embodiment, a plurality of training data are acquired, and a network model set in a target search space is searched to obtain a candidate network model that comprises a plurality of computation layers. A plurality of sensitivity values respectively corresponding to the plurality of computation layers are determined, the candidate network model is processed according to the sensitivity values to obtain a network model to be trained, and the network model to be trained is trained with the plurality of training data to obtain an image processing model. The structure of the image processing model can thereby be effectively simplified, which improves the training efficiency of the image processing model and helps improve the image recognition effect. In addition, this embodiment can generate different target search spaces for different image processing tasks, so that the requirements of different image processing scenarios can be met, and because the network model sets are determined according to the image processing tasks, they match those tasks. By determining a plurality of initial network models, the selection range of candidate network models can be preliminarily narrowed and the amount of calculation in the model determination process can be reduced. Furthermore, because the candidate network model is determined according to the image processing effect, the accuracy of determining the candidate network model is effectively improved, the requirements of the image processing task can be met, and the image processing effect is further improved.

Fig. 3 is a schematic diagram according to a third embodiment of the present disclosure.

As shown in fig. 3, the training method of the image processing model includes:

S301: a plurality of training data is acquired.

S302: searching the network model set in the target search space to obtain a candidate network model.

For the description of S301 to S302, reference may be made to the above embodiments, which are not described herein again.

S303: the candidate network model is trained using a plurality of training data to determine a first loss value corresponding to the candidate network model.

In the operation of determining a plurality of sensitivity values corresponding to a plurality of computation layers, a candidate network model is first trained using a plurality of training data, and the obtained loss value is referred to as a first loss value.

S304: deleting each of the plurality of computation layers from the candidate network model in turn, and acquiring the plurality of second intermediate network models obtained after the respective deletions.

Further, each of the plurality of computation layers is deleted from the candidate network model in turn, and the plurality of second intermediate network models obtained after the respective deletions are acquired.

For example, a layer-by-layer traversal may be adopted: each computation layer among the plurality of computation layers is visited, the visited computation layer is deleted, and the candidate network model without that computation layer is used as a second intermediate network model. Thus, the deletion of each computation layer yields a corresponding second intermediate network model.

S305: a plurality of second loss values corresponding to the plurality of second intermediate network models, respectively, are determined.

Further, a plurality of second loss values corresponding to the plurality of second intermediate network models, respectively, are determined, for example: the plurality of second intermediate network models may be trained respectively using training data to obtain a plurality of corresponding second loss values.

S306: a plurality of loss variation values between the first loss value and the plurality of second loss values, respectively, are determined and taken as a plurality of sensitivity values.

After the first loss value and the plurality of second loss values are determined, a plurality of loss change values between the first loss value and each of the second loss values are determined. For example, the difference between the first loss value and each second loss value is calculated to obtain the plurality of loss change values, which are taken as the plurality of sensitivity values. In practical applications, when the overall loss on the evaluation set increases after a certain computation layer is removed, that layer is judged to be an effective layer; when the overall loss on the evaluation set decreases or does not change after a certain computation layer is removed, that layer is judged to be a deletable layer. Determining the sensitivity values from loss values therefore clearly expresses the influence of each computation layer on the candidate network model, and the calculation is simple and easy to implement, so the training process of the model can be optimized to a certain extent.
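The layer-by-layer procedure of S303 to S306 can be sketched on a toy model. The sketch makes several assumptions that are not in the disclosure: the model is a chain of scalar functions, the loss is a mean squared error on a small evaluation set, the second intermediate models are evaluated without being retrained, and the sensitivity is defined as the second loss minus the first loss so that a positive value marks an effective layer. All names (forward, mean_squared_loss, layer_sensitivities) and the data are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Layer = Tuple[str, Callable[[float], float]]


def forward(layers: Sequence[Layer], x: float) -> float:
    """Apply the computation layers in order."""
    for _, fn in layers:
        x = fn(x)
    return x


def mean_squared_loss(layers: Sequence[Layer], data: Sequence[Tuple[float, float]]) -> float:
    """Mean squared error of the model on (input, target) pairs."""
    return sum((forward(layers, x) - y) ** 2 for x, y in data) / len(data)


def layer_sensitivities(layers: List[Layer], data: Sequence[Tuple[float, float]]) -> Dict[str, float]:
    """Sensitivity of each layer = loss without the layer - loss with all layers."""
    first_loss = mean_squared_loss(layers, data)
    sensitivities = {}
    for index, (name, _) in enumerate(layers):
        # Second intermediate model: the candidate with one layer deleted.
        second_model = layers[:index] + layers[index + 1:]
        second_loss = mean_squared_loss(second_model, data)
        sensitivities[name] = second_loss - first_loss
    return sensitivities


# Toy candidate model that computes y = 2x + 1, with one redundant layer.
candidate_layers: List[Layer] = [
    ("scale", lambda x: 2 * x),
    ("identity", lambda x: x),
    ("shift", lambda x: x + 1),
]
evaluation_set = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
sensitivities = layer_sensitivities(candidate_layers, evaluation_set)
print(sensitivities)
```

Here the redundant identity layer gets a sensitivity of zero, so it would be judged deletable, while removing either of the other two layers increases the loss.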

It is to be understood that the above examples are only illustrative for determining the sensitivity value, and in practical applications, the sensitivity value may be determined in any other possible manner, which is not limited herein.

S307: the candidate network model is processed according to the sensitivity values to obtain a first intermediate network model.

In some embodiments, the candidate network model further comprises: a plurality of encoding nodes (Encoder nodes), and a plurality of decoding nodes (Decoder nodes). In the operation of processing the candidate network model according to the sensitivity values to obtain the network model to be trained, the candidate network model is processed according to the sensitivity values to obtain a first intermediate network model.

The candidate network model processed by the sensitivity values may be referred to as a first intermediate network model.

In the operation of determining the first intermediate network model, the embodiment of the disclosure may first determine a target computing layer from among a plurality of computing layers according to a plurality of sensitivity values.

In some embodiments, a sensitivity threshold may be set, the plurality of sensitivity values corresponding to the plurality of computation layers are compared with the sensitivity threshold, and the target computation layer is then determined according to the comparison result. For example, a computation layer whose sensitivity value is smaller than the sensitivity threshold is taken as a target computation layer; there may be one target computation layer or a plurality of them, which is not limited herein. Comparing against a sensitivity threshold speeds up the determination of the target computation layer and can in turn speed up model training. In addition, the sensitivity threshold can be flexibly adjusted to meet the sensitivity requirements for target computation layers in different application scenarios.

In other embodiments, a computation layer may also be designated as the target computation layer according to the actual application scenario, for example: the convolutional layer is designated as the target computation layer.
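The threshold comparison can be written as a one-line filter. In this sketch the function name select_target_layers, the layer names, the sensitivity values, and the threshold are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def select_target_layers(sensitivities: Dict[str, float], threshold: float) -> List[str]:
    """Return the layers whose sensitivity value is below the threshold."""
    return [name for name, value in sensitivities.items() if value < threshold]


# Made-up sensitivity values for three computation layers.
example_sensitivities = {"conv1": 0.80, "conv2": 0.01, "fc": 0.30}
target_layers = select_target_layers(example_sensitivities, threshold=0.05)
print(target_layers)
```

Raising the threshold marks more layers for deletion, which is how the adjustable threshold trades model size against accuracy.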

Further, the target computing layer is deleted from among the plurality of computing layers to obtain the remaining computing layers. For example, a convolutional layer in an operation block (block) of the candidate network model is deleted to obtain the remaining computing layers. Deleting the target computing layer can be understood as clipping (channel pruning) the network structure of the candidate network model.

Further, a first intermediate network model is generated from the remaining computing layers, the plurality of encoding nodes (Encoders), and the plurality of decoding nodes (Decoders). That is, the candidate network model is clipped to obtain a clipped model (the first intermediate network model), and the clipped model is then fine-tuned slightly to obtain a candidate network model with a smaller volume. In this embodiment, clipping the target computing layer out of the candidate network model significantly reduces the volume of the candidate network model, which helps speed up subsequent model training and achieves a miniaturized design of the image processing model. Experiments show that the clipped model increases the training speed by 6.16 times and reduces the model volume by 95.5%.

In practical applications, the coordinate regression effect depends strongly on the head layer. Therefore, the corresponding computing layer is taken as the target computing layer only if it has no association with a specified computing layer (for example, the head layer). That is, in the embodiment of the present disclosure, computing layers in the head layer may be left undeleted, which preserves the coordinate regression effect and the accuracy of the model.
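As a minimal sketch of the deletion step with head layers left untouched, the model can be represented as an ordered list of layer names. The representation, the function name `prune_layers`, and the layer names are assumptions made for illustration; a real model would also need its layer connections repaired after deletion.

```python
from typing import Iterable, List, Set


def prune_layers(layers: List[str], targets: Iterable[str], protected: Set[str]) -> List[str]:
    """Delete target layers from the model, but never delete a protected layer."""
    to_delete = set(targets) - protected
    return [layer for layer in layers if layer not in to_delete]


# Hypothetical layer names; "head.conv" stands in for a layer of the head.
layers = ["block1.conv", "block2.conv", "block3.conv", "head.conv"]
remaining = prune_layers(
    layers,
    targets=["block2.conv", "head.conv"],
    protected={"head.conv"},
)
print(remaining)
```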

S308: the first intermediate network model is processed according to a plurality of distillation loss values respectively corresponding to the plurality of encoding nodes and the plurality of decoding nodes, to obtain a network model to be trained.

In practical applications, miniaturization of the model may be accompanied by a loss of accuracy, and distillation techniques can effectively migrate the "knowledge" of the large model to the small model, thereby improving the accuracy of the small model.

This embodiment may process the first intermediate network model according to a plurality of distillation loss values respectively corresponding to the plurality of encoding nodes and the plurality of decoding nodes. That is, this embodiment supports processing the first intermediate network model with a distillation technique, thereby achieving miniaturization of the model.

Specifically, distillation loss supervision may be set after the outputs of a plurality of computing blocks (blocks) in the plurality of encoding nodes (Encoders) and at the outputs of the plurality of decoding nodes (Decoders); a plurality of distillation loss values are obtained from this supervision, and the first intermediate network model is processed according to the distillation loss values to obtain the network model to be trained. In this way, the small model obtained through NAS and clipping can be accurately aligned with the large model through distillation, so that model accuracy is preserved during model miniaturization.
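One way to picture the distillation loss values is as one loss per supervised output, each comparing the small (student) model with the large (teacher) model. The sketch below assumes a mean squared error between flattened features; the disclosure does not specify the loss form, and the supervision point names and feature values are hypothetical.

```python
from typing import Dict, Sequence


def mse(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared error between two equally sized feature vectors."""
    if len(student) != len(teacher):
        raise ValueError("feature lengths differ")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def distillation_losses(
    student_feats: Dict[str, Sequence[float]],
    teacher_feats: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Return one distillation loss value per supervised output (assumed MSE)."""
    return {name: mse(student_feats[name], teacher_feats[name]) for name in teacher_feats}


# Hypothetical outputs at one encoder block and one decoder node.
teacher = {"encoder.block1": [1.0, 2.0], "decoder.out": [0.0, 4.0]}
student = {"encoder.block1": [1.0, 2.0], "decoder.out": [1.0, 3.0]}
losses = distillation_losses(student, teacher)
total_loss = sum(losses.values())
print(losses, total_loss)
```

Summing the per-output values into `total_loss` is one simple choice; weighting encoder and decoder terms differently would be equally plausible.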

In some embodiments, the scheme may also support an offline quantization method to accelerate model prediction. Specifically, a plurality of first output data respectively corresponding to the remaining computing layers, the plurality of encoding nodes, and the plurality of decoding nodes may be determined, where the first output data has a corresponding data type, for example, the floating-point float32 type.

Further, a data mapping relationship is determined, where the data mapping relationship is used to map the plurality of first output data to a plurality of corresponding second output data whose data type differs from that of the first output data. For example, the data type of the first output data is float32, and the data type of the second output data is the integer int8 type or another data type. In actual operation, the float32 value distribution of each layer can be computed on a small scene evaluation set (for example, financial bill image data), and the mapping from each float32 value to the int8 interval can be determined from the maximum and minimum values of that distribution, thereby realizing int8 quantization. The offline quantization method can therefore improve the prediction speed of the image processing model, and further improve the image recognition effect.
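The minimum/maximum mapping from float32 values to the int8 interval can be sketched as a simple affine quantizer. This is an assumed formulation (the disclosure only states that the mapping is derived from the maximum and minimum values); the function names and calibration values are hypothetical, and plain Python floats stand in for float32.

```python
from typing import List, Sequence, Tuple


def calibrate(values: Sequence[float]) -> Tuple[float, float]:
    """Derive (scale, minimum) from the observed value range of one layer."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    return scale, lo


def quantize(values: Sequence[float], scale: float, lo: float) -> List[int]:
    """Map floats to int8 codes in [-128, 127], clamping out-of-range values."""
    codes = []
    for v in values:
        q = round((v - lo) / scale) - 128
        codes.append(max(-128, min(127, q)))
    return codes


def dequantize(codes: Sequence[int], scale: float, lo: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [(q + 128) * scale + lo for q in codes]


# Hypothetical layer outputs collected on a small evaluation set.
calibration_values = [-1.0, 0.0, 1.0, 2.0]
scale, lo = calibrate(calibration_values)
quantized = quantize(calibration_values, scale, lo)
restored = dequantize(quantized, scale, lo)
print(quantized, restored)
```

The minimum maps to -128 and the maximum to 127, so the whole observed range is covered; values outside that range at inference time are clamped.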

In addition, the network model provided by the embodiment of the disclosure may be a network model with a hierarchical structure (for example, an op-level quantization model), and in the process of performing quantization, a hierarchy may be selected to perform quantization, that is: the layers participating in quantization may be selectively determined, thereby minimizing loss of precision. The method is verified in a financial bill scene, the speed of a single model can be increased by 10 times, and indexes are almost lossless.

S309: a plurality of training data are input into the network model to be trained to obtain the prediction information output by the network model to be trained, where the prediction information is quantized according to the data mapping relationship.

After the data mapping relationship is determined, this embodiment may input a plurality of training data (for example, financial bill image data) into the network model to be trained to obtain the prediction information output by the network model to be trained. The prediction information is the actual output data in the network model training process; for example, the prediction information may be the name, type, and specific information of a financial bill. The prediction information is determined according to actual requirements, which is not limited herein.

Also, the prediction information has been subjected to quantization processing according to a data mapping relationship, that is, the data type of the prediction information may be int8 type.

S310: if the loss values between the prediction information and the plurality of labeling information respectively corresponding to the plurality of training data satisfy a set condition, the trained network model is taken as the image processing model.

Further, it is determined whether the loss value between the prediction information and the plurality of labeling information corresponding to the plurality of training data satisfies a set condition, for example, a model convergence condition.

Wherein the labeling information corresponds to training data, for example: the training data is financial bills, and the labeling information may be names, types, specific information and the like of the financial bills, which is not limited here.

That is, this embodiment supports training the model in a supervised manner until the loss value between the prediction information and the labeling information satisfies the set condition, at which point training is complete. Supervised training can improve the accuracy of model training, and because the prediction information is processed through the mapping relationship, the training speed of the model can also be improved.
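The stopping rule in S310 can be illustrated with a toy supervised loop. The sketch below assumes that the set condition is "mean squared loss at or below a tolerance" and fits a one-parameter model by gradient descent; the model, data, and hyperparameters are hypothetical stand-ins and not the network of the disclosure.

```python
from typing import List, Tuple


def train_until_converged(
    data: List[Tuple[float, float]],
    lr: float = 0.1,
    tol: float = 1e-6,
    max_steps: int = 10_000,
) -> Tuple[float, float, int]:
    """Fit y = w * x; stop as soon as the loss satisfies the set condition."""
    w = 0.0
    loss = float("inf")
    for step in range(1, max_steps + 1):
        preds = [w * x for x, _ in data]
        # Loss between prediction information and labeling information.
        loss = sum((p - y) ** 2 for p, (_, y) in zip(preds, data)) / len(data)
        if loss <= tol:
            return w, loss, step
        grad = sum(2 * (p - y) * x for p, (x, y) in zip(preds, data)) / len(data)
        w -= lr * grad
    return w, loss, max_steps


# Hypothetical (input, label) pairs generated from y = 2x.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
weight, final_loss, steps = train_until_converged(samples)
print(weight, final_loss, steps)
```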

In a specific example, fig. 4 is a schematic diagram of an image processing model training process proposed in the embodiment of the present disclosure. As shown in fig. 4, a model search is performed first, and the searched model is then pruned, that is, its structure is clipped. Next, the "knowledge" of the large model can be migrated to the small (clipped) model by distillation, which supervises the accuracy of the clipped small model. Finally, the distilled model may be quantized to obtain the final image processing model, or the clipped model may be quantized directly to obtain the image processing model, which is not limited herein.
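The two orderings described for fig. 4 (with or without the distillation stage) can be expressed as a list of stages applied in sequence. The stages below are placeholders that only record their names; `run_pipeline` and `make_stage` are hypothetical helpers for illustration, not part of the disclosure.

```python
from typing import Callable, Dict, List

Model = Dict[str, List[str]]
Stage = Callable[[Model], Model]


def make_stage(name: str) -> Stage:
    """Return a placeholder stage that only appends its name to the model history."""
    def stage(model: Model) -> Model:
        return {**model, "history": model.get("history", []) + [name]}
    return stage


def run_pipeline(model: Model, stages: List[Stage]) -> Model:
    """Apply each stage in order and return the resulting model."""
    for stage in stages:
        model = stage(model)
    return model


# Search, prune, distill, then quantize.
full = run_pipeline({"history": []}, [make_stage(n) for n in ("search", "prune", "distill", "quantize")])
# Search, prune, then quantize the clipped model directly.
short = run_pipeline({"history": []}, [make_stage(n) for n in ("search", "prune", "quantize")])
print(full["history"], short["history"])
```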

In this embodiment, a plurality of training data are obtained, and a network model set in a target search space is searched to obtain a candidate network model, where the candidate network model includes a plurality of computing layers; a plurality of sensitivity values respectively corresponding to the plurality of computing layers are determined; the candidate network model is processed according to the sensitivity values to obtain a network model to be trained; and the network model to be trained is trained using the plurality of training data to obtain an image processing model. The structure of the image processing model can thus be effectively simplified, which improves the training efficiency of the image processing model and helps improve the image recognition effect. In addition, because the sensitivity value is determined by calculating loss values, the influence of each computing layer on the candidate network model can be clearly represented, and the calculation is simple and easy to implement, so the training process of the model can be optimized to a certain extent. The sensitivity threshold can be flexibly adjusted to meet the sensitivity requirements on the target computing layer in different application scenarios. The coordinate regression effect, and hence the accuracy of the model, can be ensured. Furthermore, the small model obtained through NAS and clipping can be accurately aligned with the large model through distillation, so that model accuracy is preserved during model miniaturization. Supervised training can improve the accuracy of model training, and because the prediction information is processed through the mapping relationship, the training speed of the model can be improved.

Fig. 5 is a schematic diagram according to a fourth embodiment of the present disclosure.

As shown in fig. 5, the image processing method includes:

S501: an image to be processed is acquired.

In this embodiment, an image to be processed is obtained first. The image to be processed may be an image in any scene; for example, when the image processing method is applied to a financial bill recognition scene, the image to be processed may be a financial bill image.

S502: the image to be processed is input into the image processing model obtained by the above training method of the image processing model, to obtain target information output by the image processing model.

After the image to be processed is obtained, the image to be processed (for example, a financial bill image) is input into the image processing model obtained by the training method of the image processing model, and corresponding target information is output. The target information is, for example, the financial bill type, the financial bill text information, or any other possible information, which is not limited herein.

In this embodiment, the target information output by the image processing model is obtained by obtaining the image to be processed and inputting the image to be processed into the image processing model obtained by the training of the image processing model according to the above-mentioned training method.

Fig. 6 is a schematic diagram according to a fourth embodiment of the present disclosure.

As shown in fig. 6, the training apparatus 60 for an image processing model includes:

a first obtaining module 601, configured to obtain a plurality of training data;

a searching module 602, configured to search the network model set in the target search space to obtain a candidate network model, where the candidate network model includes: a plurality of computing layers;

a first determining module 603, configured to determine a plurality of sensitivity values corresponding to the plurality of computing layers, respectively;

the processing module 604 is configured to process the candidate network model according to the multiple sensitivity values to obtain a to-be-trained network model; and

the training module 605 is configured to train the network model to be trained by using a plurality of training data to obtain an image processing model.

Optionally, in some embodiments of the present disclosure, fig. 7 is a schematic diagram of a training apparatus 70 for an image processing model according to a fifth embodiment of the present disclosure. As shown in fig. 7, the training apparatus 70 includes a first obtaining module 701, a searching module 702, a first determining module 703, a processing module 704, and a training module 705, and further includes:

a second determining module 706 for determining a plurality of image processing tasks corresponding to the image processing scene;

a third determining module 707 configured to determine a plurality of network model sets respectively corresponding to the plurality of image processing tasks; and

a generating module 708 configured to generate a target search space according to the plurality of network model sets.

Optionally, in some embodiments of the disclosure, as shown in fig. 7, the search module 702 includes:

the searching submodule 7021 is configured to search the network model set in the target search space to obtain a plurality of initial network models;

a training submodule 7022, configured to train the multiple initial network models with the multiple training data, and obtain multiple image processing effects corresponding to the trained multiple initial network models, respectively; and

the selecting sub-module 7023 is configured to select a candidate network model from the plurality of initial network models according to the plurality of image processing effects.
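The search, train, and select sequence carried out by these submodules can be sketched as follows. The sketch assumes that each image processing effect is a single score where higher is better; `select_candidate`, the model names, and the scores are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def select_candidate(
    initial_models: List[str],
    train_and_score: Callable[[str], float],
) -> Tuple[str, Dict[str, float]]:
    """Train and score every initial model, then return the best one and all scores."""
    if not initial_models:
        raise ValueError("at least one initial model is required")
    scores = {name: train_and_score(name) for name in initial_models}
    best = max(initial_models, key=lambda name: scores[name])
    return best, scores


# Hypothetical scores standing in for "train, then measure the image processing effect".
toy_scores = {"model_a": 0.81, "model_b": 0.93, "model_c": 0.88}
candidate, all_scores = select_candidate(list(toy_scores), toy_scores.__getitem__)
print(candidate)
```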

Optionally, in some embodiments of the present disclosure, as shown in fig. 7, the processing module 704 includes:

a first processing sub-module 7041, configured to process the candidate network model according to the multiple sensitivity values to obtain a first intermediate network model;

the second processing submodule 7042 is configured to process the first intermediate network model according to the plurality of distillation loss values respectively corresponding to the plurality of encoding nodes and the plurality of decoding nodes, so as to obtain a network model to be trained.

Optionally, in some embodiments of the present disclosure, the first processing sub-module 7041 is specifically configured to: determining a target computing layer from a plurality of computing layers according to the sensitivity values; deleting a target computing layer from the plurality of computing layers to obtain a remaining computing layer; a first intermediate network model is generated from the remaining computational layers, the plurality of encoding nodes, and the plurality of decoding nodes.

Optionally, in some embodiments of the present disclosure, as shown in fig. 7, the first determining module 703 includes:

a first determining sub-module 7031, configured to train the candidate network model using a plurality of training data to determine a first loss value corresponding to the candidate network model;

a deletion submodule 7032, configured to delete the plurality of computing layers from the candidate network model respectively, and obtain a plurality of second intermediate network models, each obtained after deleting the corresponding computing layer;

a second determining submodule 7033 configured to determine a plurality of second loss values respectively corresponding to the plurality of second intermediate network models; and

a third determining sub-module 7034 is configured to determine a plurality of loss variation values between the first loss value and the plurality of second loss values, respectively, as a plurality of sensitivity values.
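The sensitivity computation performed by these submodules amounts to a leave-one-layer-out comparison of loss values. The sketch below assumes the loss change is the second loss minus the first loss, and uses a toy loss function in place of actually training and evaluating a network; all names and values are hypothetical.

```python
from typing import Callable, Dict, List


def layer_sensitivities(
    layers: List[str],
    evaluate_loss: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Sensitivity of each layer = loss with that layer deleted minus the full-model loss."""
    first_loss = evaluate_loss(layers)
    result = {}
    for i, name in enumerate(layers):
        second_model = layers[:i] + layers[i + 1:]  # second intermediate model
        second_loss = evaluate_loss(second_model)
        result[name] = second_loss - first_loss
    return result


# Toy stand-in: the loss grows by a fixed amount for every missing layer.
IMPORTANCE = {"conv1": 0.5, "conv2": 0.25, "conv3": 0.0}


def toy_loss(present_layers: List[str]) -> float:
    missing = [name for name in IMPORTANCE if name not in present_layers]
    return 1.0 + sum(IMPORTANCE[name] for name in missing)


sensitivity = layer_sensitivities(list(IMPORTANCE), toy_loss)
print(sensitivity)
```

A layer such as `conv3`, whose removal leaves the loss unchanged, gets a sensitivity of zero and is a natural candidate for deletion.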

Optionally, in some embodiments of the present disclosure, the first processing sub-module 7041 is specifically configured to:

determining a computing layer whose sensitivity value is smaller than the sensitivity threshold, and taking the corresponding computing layer as a target computing layer.

and when the corresponding computing layer is not associated with the target computing layer, taking the corresponding computing layer as the target computing layer.

determining a plurality of first output data respectively corresponding to the remaining computation layers, a plurality of coding nodes and a plurality of decoding nodes; and determining a data mapping relation, wherein the data mapping relation is used for mapping the first output data to the corresponding second output data respectively, the data types of the first output data and the second output data are different, and the data mapping relation is used for training the network model to be trained.

Optionally, in some embodiments of the present disclosure, the training module 705 is specifically configured to: input a plurality of training data into the network model to be trained to obtain prediction information output by the network model to be trained, where the prediction information is quantized according to the data mapping relationship; and if the loss values between the prediction information and the plurality of labeling information respectively corresponding to the plurality of training data satisfy a set condition, take the trained network model as the image processing model.

It is understood that the training apparatus 70 of the image processing model in fig. 7 of the present embodiment and the training apparatus 60 of the image processing model in the foregoing embodiment, the first obtaining module 701 and the first obtaining module 601 in the foregoing embodiment, the searching module 702 and the searching module 602 in the foregoing embodiment, the first determining module 703 and the first determining module 603 in the foregoing embodiment, the processing module 704 and the processing module 604 in the foregoing embodiment, and the training module 705 and the training module 605 in the foregoing embodiment may have the same functions and structures.

It should be noted that the above explanation of the training method of the image processing model is also applicable to the training apparatus of the image processing model of the present embodiment, and is not repeated herein.

Fig. 8 is a schematic diagram according to a sixth embodiment of the present disclosure.

As shown in fig. 8, the image processing apparatus 80 includes:

a second obtaining module 801, configured to obtain an image to be processed;

the recognition module 802 is configured to input the image to be processed into the image processing model obtained by training of the training apparatus of the image processing model as described above, so as to obtain target information output by the image processing model.

It should be noted that the foregoing explanation of the image processing method is also applicable to the image processing apparatus of the present embodiment, and is not repeated here.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 9 is a block diagram of an electronic device for implementing a method of training an image processing model according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the respective methods and processes described above, for example, a training method of an image processing model, or an image processing method.

For example, in some embodiments, the training method of the image processing model, or the image processing method, may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the training method of the image processing model, or of the image processing method, described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured by any other suitable means (e.g., by means of firmware) to perform the training method of the image processing model, or the image processing method.

Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the training method of the image processing model, or the image processing method, of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable image processing model training apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server can be a cloud Server, also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service ("Virtual Private Server", or simply "VPS"). The server may also be a server of a distributed system, or a server incorporating a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A method of training an image processing model, comprising:

acquiring a plurality of training data;

searching a network model set in a target search space to obtain a candidate network model, wherein the candidate network model comprises: a plurality of computing layers;

determining a plurality of sensitivity values respectively corresponding to the plurality of calculation layers;

processing the candidate network model according to the sensitivity values to obtain a network model to be trained; and

training the network model to be trained using the plurality of training data to obtain an image processing model.

2. The method of claim 1, further comprising, before the searching the set of network models in the target search space for the candidate network model:

determining a plurality of image processing tasks corresponding to an image processing scene;

determining a plurality of network model sets respectively corresponding to the plurality of image processing tasks; and

generating the target search space according to the plurality of network model sets.

3. The method of claim 2, wherein the searching the set of network models within the target search space to obtain the candidate network model comprises:

searching the network model set in the target search space to obtain a plurality of initial network models;

training the plurality of initial network models by adopting the plurality of training data respectively, and acquiring a plurality of image processing effects respectively corresponding to the trained plurality of initial network models; and

selecting the candidate network model from the plurality of initial network models according to the plurality of image processing effects.

4. The method of claim 1, wherein the candidate network model further comprises: a plurality of encoding nodes and a plurality of decoding nodes, and wherein the processing the candidate network model according to the sensitivity values to obtain a network model to be trained comprises:

processing the candidate network model according to the sensitivity values to obtain a first intermediate network model;

and processing the first intermediate network model according to a plurality of distillation loss values respectively corresponding to the plurality of coding nodes and the plurality of decoding nodes to obtain the network model to be trained.

5. The method of claim 4, wherein said processing the candidate network model according to the plurality of sensitivity values to derive a first intermediate network model comprises:

determining a target computing layer from the plurality of computing layers according to the sensitivity values;

deleting the target computing layer from among the plurality of computing layers to obtain remaining computing layers; and

generating the first intermediate network model from the remaining computing layers, the plurality of encoding nodes, and the plurality of decoding nodes.

6. The method of claim 5, wherein the determining a plurality of sensitivity values corresponding to the plurality of computing layers, respectively, comprises:

training the candidate network model using the plurality of training data to determine a first loss value corresponding to the candidate network model;

deleting each of the plurality of computing layers from the candidate network model respectively, and acquiring a plurality of second intermediate network models, each obtained after the corresponding computing layer is deleted;

determining a plurality of second loss values respectively corresponding to the plurality of second intermediate network models; and

determining a plurality of loss change values between the first loss value and each of the plurality of second loss values, and taking the plurality of loss change values as the plurality of sensitivity values.
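The steps of claim 6 can be sketched on a toy model. The sketch assumes the model is an ordered chain of scalar functions, that the loss is a mean squared error, and that the "loss change value" is the absolute difference between the two losses; the claim fixes none of these. Training of the candidate network model is skipped, so the first loss value is simply evaluated. All names are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Layer = Callable[[float], float]


def model_loss(layers: Dict[str, Layer], data: Sequence[Tuple[float, float]]) -> float:
    """Mean squared error of the layer chain on (input, label) pairs."""
    total = 0.0
    for x, y in data:
        out = x
        for layer in layers.values():
            out = layer(out)
        total += (out - y) ** 2
    return total / len(data)


def layer_sensitivities(
    layers: Dict[str, Layer], data: Sequence[Tuple[float, float]]
) -> Dict[str, float]:
    """Delete one layer at a time and record how much the loss changes."""
    first_loss = model_loss(layers, data)
    sensitivities: Dict[str, float] = {}
    for name in layers:
        # Second intermediate model: the candidate without this one layer.
        ablated = {k: v for k, v in layers.items() if k != name}
        second_loss = model_loss(ablated, data)
        sensitivities[name] = abs(second_loss - first_loss)
    return sensitivities


# Toy candidate model: y = 2x + 0.001 + 1, split into three "computing layers".
toy_layers: Dict[str, Layer] = {
    "scale": lambda v: 2.0 * v,
    "shift": lambda v: v + 0.001,
    "bias": lambda v: v + 1.0,
}
toy_data = [(x, 2.0 * x + 1.001) for x in (0.0, 1.0, 2.0)]
sens = layer_sensitivities(toy_layers, toy_data)
```

Here the `shift` layer barely affects the loss when removed, so it receives the smallest sensitivity value, while removing `scale` changes the loss the most.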

7. The method of claim 6, wherein said determining a target computing layer from among the plurality of computing layers based on the plurality of sensitivity values comprises:

determining a computing layer whose corresponding sensitivity value is smaller than a sensitivity threshold, and taking the corresponding computing layer as the target computing layer.

8. The method of claim 7, wherein the taking the corresponding computing layer as the target computing layer comprises:

if the corresponding computing layer does not have an association relation with a target computing layer, taking the corresponding computing layer as the target computing layer.
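One possible reading of claims 7 and 8 is sketched below. It assumes that layers are visited in a fixed order, that a layer is a deletion candidate when its sensitivity value is below the threshold, and that a candidate is skipped when it is associated with a layer already chosen as a target. The claims do not define the association relation, so the `associated` mapping, the threshold, and the layer names are hypothetical.

```python
from typing import Dict, List, Set


def select_target_layers(
    sensitivities: Dict[str, float],
    threshold: float,
    associated: Dict[str, Set[str]],
) -> List[str]:
    """Pick low-sensitivity layers that are not tied to an already chosen target."""
    targets: List[str] = []
    for name, value in sensitivities.items():
        if value >= threshold:
            continue  # too sensitive: deleting it would hurt the loss
        linked = associated.get(name, set())
        if any(target in linked for target in targets):
            continue  # assumed rule: skip layers associated with a chosen target
        targets.append(name)
    return targets


# Hypothetical sensitivity values and association relation.
layer_sens = {"conv_a": 0.01, "conv_b": 0.02, "conv_c": 0.5, "conv_d": 0.03}
links = {"conv_b": {"conv_a"}}
target_layers = select_target_layers(layer_sens, threshold=0.1, associated=links)
remaining_layers = [name for name in layer_sens if name not in target_layers]
```

With these toy values, `conv_c` is kept because it is above the threshold and `conv_b` is kept because it is associated with the already selected `conv_a`.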

9. The method of claim 5, further comprising, after the generating the first intermediate network model from the remaining computing layers, the plurality of encoding nodes, and the plurality of decoding nodes:

determining a plurality of first output data respectively corresponding to the remaining computing layers, the plurality of encoding nodes, and the plurality of decoding nodes; and

determining a data mapping relation, wherein the data mapping relation is used for mapping each of the plurality of first output data to corresponding second output data, the data types of the first output data and the second output data are different, and the data mapping relation is used for training the network model to be trained.
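A data mapping relation between two data types can be illustrated with affine quantization. The claim only states that the first and second output data have different data types; mapping floating-point outputs to signed 8-bit integers with a scale and a zero point is an assumption made for this sketch, and the helper names are hypothetical.

```python
from typing import List, Tuple


def build_mapping(values: List[float], qmin: int = -128, qmax: int = 127) -> Tuple[float, int]:
    """Derive (scale, zero_point) so that the observed float range fits [qmin, qmax]."""
    lo = min(min(values), 0.0)  # keep 0.0 exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for a zero range
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int, qmin: int = -128, qmax: int = 127) -> int:
    """Map a float (first output data) to an integer (second output data)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Approximate inverse of quantize."""
    return (q - zero_point) * scale


# Hypothetical first output data observed at one layer or node.
first_outputs = [-1.0, 0.0, 0.5, 2.0]
scale, zero_point = build_mapping(first_outputs)
second_outputs = [quantize(v, scale, zero_point) for v in first_outputs]
restored = [dequantize(q, scale, zero_point) for q in second_outputs]
```

Each restored value differs from its original by at most about half a quantization step, which is the rounding error such a mapping introduces during quantization-aware training.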

10. The method of claim 9, wherein the training the network model to be trained using the plurality of training data to obtain an image processing model comprises:

inputting the training data into the network model to be trained to obtain prediction information output by the network model to be trained, wherein the prediction information is quantized according to the data mapping relation;

and if the loss values between the prediction information and the plurality of marking information respectively corresponding to the plurality of training data meet set conditions, taking the trained network model as the image processing model.

11. An image processing method comprising:

acquiring an image to be processed;

inputting the image to be processed into the image processing model obtained by the training method of the image processing model according to any one of claims 1-10, so as to obtain the target information output by the image processing model.

12. An apparatus for training an image processing model, comprising:

a first acquisition module, configured to acquire a plurality of training data;

a search module, configured to search a network model set in a target search space to obtain a candidate network model, where the candidate network model includes: a plurality of computing layers;

a first determining module, configured to determine a plurality of sensitivity values respectively corresponding to the plurality of computing layers;

a processing module, configured to process the candidate network model according to the plurality of sensitivity values to obtain a network model to be trained; and

a training module, configured to train the network model to be trained by using the plurality of training data to obtain an image processing model.

13. The apparatus of claim 12, further comprising:

a second determining module for determining a plurality of image processing tasks corresponding to the image processing scene;

a third determining module, configured to determine a plurality of network model sets respectively corresponding to the plurality of image processing tasks; and

a generating module, configured to generate the target search space according to the plurality of network model sets.

14. The apparatus of claim 13, wherein the search module comprises:

a searching submodule, configured to search the network model set in the target search space to obtain a plurality of initial network models;

a training submodule, configured to train the plurality of initial network models respectively by using the plurality of training data, and to acquire a plurality of image processing effects respectively corresponding to the trained plurality of initial network models; and

a selection submodule, configured to select the candidate network model from the plurality of initial network models according to the plurality of image processing effects.

15. The apparatus of claim 12, wherein the candidate network model further comprises a plurality of encoding nodes and a plurality of decoding nodes, and wherein the processing module comprises:

a first processing submodule, configured to process the candidate network model according to the plurality of sensitivity values to obtain a first intermediate network model; and

a second processing submodule, configured to process the first intermediate network model according to a plurality of distillation loss values respectively corresponding to the plurality of encoding nodes and the plurality of decoding nodes, to obtain the network model to be trained.

16. The apparatus according to claim 15, wherein the first processing submodule is specifically configured to:

17. The apparatus of claim 16, wherein the first determining module comprises:

a first determining sub-module, configured to train the candidate network model using the plurality of training data to determine a first loss value corresponding to the candidate network model;

a deleting submodule, configured to delete each of the plurality of computing layers from the candidate network model respectively, and to acquire a plurality of second intermediate network models, each obtained after the corresponding computing layer is deleted;

a second determining submodule, configured to determine a plurality of second loss values respectively corresponding to the plurality of second intermediate network models; and

a third determining submodule, configured to determine a plurality of loss change values between the first loss value and each of the plurality of second loss values, and to take the plurality of loss change values as the plurality of sensitivity values.

18. The apparatus of claim 17, wherein the first processing submodule is specifically configured to:

19. The apparatus of claim 18, wherein the first processing submodule is specifically configured to:

and when the corresponding computing layer does not have an association relation with a target computing layer, taking the corresponding computing layer as the target computing layer.

20. The apparatus according to claim 16, wherein the first processing submodule is specifically configured to:

21. The apparatus of claim 20, wherein the training module is specifically configured to:

22. An image processing apparatus comprising:

a second acquisition module, configured to acquire an image to be processed; and

a recognition module, configured to input the image to be processed into the image processing model obtained by training with the training apparatus of the image processing model according to any one of claims 12-21, so as to obtain the target information output by the image processing model.

23. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10 or to perform the method of claim 11.

24. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-10 or to perform the method of claim 11.

25. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-10 or performs the method of claim 11.