CN116703777A - Image processing method, system, storage medium and electronic equipment - Google Patents

Image processing method, system, storage medium and electronic equipment

Info

Publication number
CN116703777A
CN116703777A
Authority
CN
China
Prior art keywords
model
image
image enhancement
enhancement
enhancement model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310764276.6A
Other languages
Chinese (zh)
Inventor
石剑锋
刘罡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Net Power Technology Co ltd
Original Assignee
Wuhan Net Power Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Net Power Technology Co ltd filed Critical Wuhan Net Power Technology Co ltd
Priority to CN202310764276.6A priority Critical patent/CN116703777A/en
Publication of CN116703777A publication Critical patent/CN116703777A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image processing method, system, storage medium and electronic device. A document image to be processed is acquired, and a sharpness enhancement operation is performed on it through an image enhancement model pre-deployed in an embedded device, yielding an enhanced document image. The image enhancement model is obtained through a pre-training operation that reduces the model's inference time and power consumption while improving its accuracy. The model is deployed in the embedded device through a preset deployment mode that improves the model's computational efficiency and reduces its memory occupancy in the embedded device.

Description

Image processing method, system, storage medium and electronic equipment
Technical Field
The present application relates to the field of document image enhancement, and more particularly, to an image processing method, system, storage medium, and electronic device.
Background
In daily work and life, we often need to photograph documents such as contracts, invoices and certificates. Owing to limitations of the shooting conditions, these document images may suffer to some degree from shadows, blur, noise, distortion and the like, which degrade image quality and readability and hinder subsequent processing and use. Images therefore need to be enhanced to improve their sharpness, quality and readability.
Existing image enhancement approaches either require the support of a high-performance computer or special hardware, or suffer from poor enhancement results and weak algorithm robustness, so they cannot cope with the shadow and blur problems of particular scenes, leaving the image unclear.
Therefore, how to improve the sharpness, quality and readability of images is a problem to be solved by the present application.
Disclosure of Invention
In view of the above, the present application discloses an image processing method, system, storage medium and electronic device, which aim to perform a sharpness enhancement operation on a document image to be processed through an image enhancement model deployed in an embedded device, obtaining an enhanced document image and thereby improving the sharpness, quality and readability of the image.
In order to achieve the above purpose, the technical scheme disclosed by the application is as follows:
the first aspect of the application discloses an image processing method, which comprises the following steps:
acquiring a document image to be processed;
performing a sharpness enhancement operation on the document image to be processed through an image enhancement model pre-deployed in an embedded device, to obtain an enhanced document image; the image enhancement model is obtained through a pre-training operation; the pre-training operation is an operation that reduces the inference time and power consumption of the image enhancement model and improves its accuracy; the image enhancement model is deployed in the embedded device in a preset deployment mode; the preset deployment mode is a deployment mode that improves the computational efficiency of the image enhancement model and reduces its memory occupancy in the embedded device.
Preferably, the method further comprises:
acquiring a training data set;
the acquiring training data set includes:
acquiring a training picture set meeting preset clear conditions;
randomly adding shadow, blurring and/or noise degradation effects to the training picture set to obtain a comparison picture set;
performing data enhancement processing on the comparison picture set to obtain a picture set with enhanced data;
and unifying the sizes of the picture sets after the data enhancement by a unified size processing mode to obtain a training data set.
Preferably, the process of the pre-training operation of the image enhancement model includes:
constructing a lightweight guided residual network model and a heavyweight guided residual network model;
performing a model pruning operation on the lightweight guided residual network model and the heavyweight guided residual network model; the model pruning operation is used for reducing model inference time and reducing model power consumption;
performing model distillation on the lightweight guided residual network model after the model pruning operation through the heavyweight guided residual network model after the model pruning operation; the model distillation is used for improving the accuracy of the lightweight guided residual network model after the model pruning operation;
and performing model training on the heavyweight guided residual network model after the model pruning operation and the lightweight guided residual network model after the model distillation through a mean absolute error loss function, a multi-scale structural similarity loss function and the training data set, to obtain the image enhancement model.
Preferably, the process of deploying the image enhancement model in the embedded device through a preset deployment mode includes:
converting the model format of the image enhancement model through a deep learning prediction framework, to obtain an image enhancement model in the converted model format;
carrying out model quantization on the image enhancement model; the model quantization is used for improving the calculation efficiency of the image enhancement model and reducing the memory occupancy rate of the image enhancement model in the embedded equipment;
creating an independent process in the embedded device; the independent process is an independent process isolated from a program main process of the embedded equipment;
and taking the independent process as a model inference process, and deploying the quantized image enhancement model in the embedded device through the model inference process.
Preferably, after the acquiring of the document image to be processed, and before the performing of the sharpness enhancement operation on the document image to be processed through the image enhancement model pre-deployed in the embedded device to obtain the enhanced document image, the method further includes:
Performing image preprocessing on the document image to be processed;
the process for preprocessing the document image to be processed comprises the following steps:
and performing automatic document framing and document orientation adjustment on the document image to be processed.
Preferably, the method further comprises:
and displaying the enhanced document image through the embedded equipment.
Preferably, the method further comprises:
and generating a corresponding application program, and running the application program in the embedded device to execute image processing.
In a second aspect, the application discloses an image processing system, comprising:
a first acquisition unit configured to acquire a document image to be processed;
an operation unit, configured to perform a sharpness enhancement operation on the document image to be processed through an image enhancement model pre-deployed in an embedded device, to obtain an enhanced document image; the image enhancement model is obtained through a pre-training operation; the pre-training operation is an operation that reduces the inference time and power consumption of the image enhancement model and improves its accuracy; the image enhancement model is deployed in the embedded device in a preset deployment mode; the preset deployment mode is a deployment mode that improves the computational efficiency of the image enhancement model and reduces its memory occupancy in the embedded device.
A third aspect of the present application discloses a storage medium comprising stored instructions, wherein the instructions, when executed, control a device in which the storage medium is located to perform the image processing method according to any one of the first aspects.
A fourth aspect of the application discloses an electronic device, comprising a memory and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to perform the image processing method according to any one of the first aspects.
According to the technical scheme, the image processing method, system, storage medium and electronic device acquire a document image to be processed and perform a sharpness enhancement operation on it through an image enhancement model pre-deployed in an embedded device, obtaining an enhanced document image. The image enhancement model is obtained through a pre-training operation that reduces its inference time and power consumption and improves its accuracy, and is deployed in the embedded device in a preset deployment mode that improves its computational efficiency and reduces its memory occupancy in the embedded device.
According to this scheme, no additional high-performance computer or special hardware is needed to enhance images and improve their sharpness; only the pre-training operation on the image enhancement model is required, so that the pre-trained model can handle document images suffering from shadows, blur, noise and the like. The pre-training operation reduces model inference time while keeping the image enhancement model robust. By improving the model's computational efficiency and reducing its memory occupancy in the embedded device, the pre-trained image enhancement model can be deployed in an embedded device such as a mobile phone, where it performs the sharpness enhancement operation on the document image to be processed to obtain the enhanced document image, thereby improving the sharpness, quality and readability of the image.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an image processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a low contrast document image according to an embodiment of the present application;
fig. 3 is a schematic diagram of a GuideResUNetLite model according to an embodiment of the present application;
FIG. 4 is a schematic illustration of an enhanced document image disclosed in an embodiment of the present application;
FIG. 5 is a schematic diagram of an image processing system according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In the present disclosure, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
As known from the background art, existing image enhancement approaches either require the support of a high-performance computer or special hardware, or suffer from poor enhancement results and weak algorithm robustness, so they cannot cope with the shadow and blur problems of particular scenes, leaving the image unclear. Therefore, how to improve the sharpness, quality and readability of images is a problem to be solved by the present application.
In order to solve these problems, the application discloses an image processing method, system, storage medium and electronic device. No additional high-performance computer or special hardware is needed to enhance images and improve their sharpness; only the pre-training operation on the image enhancement model is required, so that the pre-trained model can directly remove shadows and noise from, and enhance the sharpness of, document images suffering from shadows, blur, noise and the like. The pre-training operation reduces model inference time while keeping the image enhancement model robust. Through a preset deployment mode that improves the model's computational efficiency and reduces its memory occupancy in the embedded device, the pre-trained image enhancement model is deployed in an embedded device such as a mobile phone, where it performs the sharpness enhancement operation on the document image to be processed to obtain the enhanced document image, thereby improving the sharpness, quality and readability of the image. Specific implementations are illustrated by the following examples.
Referring to fig. 1, a flowchart of an image processing method according to an embodiment of the present application is shown, where the image processing method mainly includes the following steps:
s101: and acquiring a document image to be processed.
The document image to be processed may be captured with a device having a shooting function, such as a mobile phone or a tablet computer, or a pre-stored document image to be processed may be imported from a device such as a tablet computer.
The document image to be processed is a document image which is not subjected to image preprocessing. The document image to be processed includes a contract image, an invoice image, a certificate image, and the like.
Image preprocessing is then performed on the document image to be processed; it includes automatic document framing and document orientation adjustment. Preprocessing facilitates subsequent viewing of the content of the enhanced document image.
S102: performing definition enhancement operation on a document image to be processed through an image enhancement model which is pre-deployed in embedded equipment, so as to obtain an enhanced document image; the image enhancement model is obtained through a pre-training operation; the pre-training operation is an operation of reducing the reasoning time of the image enhancement model, reducing the model power consumption of the image enhancement model and improving the precision of the image enhancement model; the image enhancement model is deployed in the embedded equipment in a preset deployment mode; the preset deployment mode is a deployment mode for improving the calculation efficiency of the image enhancement model and reducing the memory occupancy rate of the image enhancement model in the embedded equipment.
The embedded device can be a mobile phone, a tablet computer and the like.
In the process of pre-training the image enhancement model, a training data set is required to be acquired first.
The specific process of acquiring the training data set is shown in steps A1 to A4.
A1: and acquiring a training picture set meeting preset clear conditions.
A training picture set meeting the preset sharpness condition is one whose resolution is greater than or equal to a preset resolution. The preset resolution may be 1024×768, 1280×1024, etc. The specific preset resolution is set according to the actual situation and is not specially limited by the present application.
For example, a number of publicly available PDF files without copyright restrictions and with 1024×768 resolution are obtained, each page of the files is rendered as a picture, and these pictures are taken as the training picture set meeting the preset sharpness condition.
A2: and randomly adding shadow, blurring and/or noise degradation effects to the training picture set to obtain a comparison picture set.
The contrast picture set consists of blurred, shadowed and low-contrast document images. For a specific example, see fig. 2, which shows a schematic diagram of a low-contrast document image; fig. 2 is merely an example.
The shadow-adding process is as follows: for example, transparent images containing only shadows are obtained, and the shadow images are fused with high-definition document images to obtain document images containing shadows (the fusion can be performed with the image fusion function addWeighted of the cross-platform computer vision and machine learning software library OpenCV).
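As a hedged illustration of this fusion step, the sketch below reimplements in pure Python the weighted blend that OpenCV's addWeighted performs, dst = clip(src1·alpha + src2·beta + gamma, 0, 255); the function name and the tiny 2×2 "images" are illustrative, not from the patent.

```python
def add_weighted(src1, src2, alpha=0.7, beta=0.3, gamma=0.0):
    """Blend two equally sized single-channel images pixel by pixel,
    mirroring cv2.addWeighted: dst = clip(src1*alpha + src2*beta + gamma)."""
    return [
        [min(255, max(0, round(p1 * alpha + p2 * beta + gamma)))
         for p1, p2 in zip(row1, row2)]
        for row1, row2 in zip(src1, src2)
    ]

# A bright "document" patch fused with a dark "shadow" patch darkens the result,
# producing a shadowed training sample from a clean page.
document = [[250, 250], [250, 250]]
shadow = [[40, 40], [40, 40]]
shadowed = add_weighted(document, shadow, alpha=0.7, beta=0.3)
```

A real pipeline would operate on NumPy arrays loaded with cv2.imread; the arithmetic is the same.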
The blur-adding process: for example, a blur effect is added to the training picture set meeting the preset sharpness condition using image processing software (such as Photoshop); specifically, the blur degradation effect can be produced with the software's filter operations.
The noise-adding process: for example, OpenCV is used to add Gaussian noise to the training picture set meeting the preset sharpness condition; specifically, the degree of noise is controlled by adjusting the standard deviation (sigma) of the Gaussian distribution. The larger sigma is, the more noise is added and the more severely the picture is degraded.
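A minimal sketch of this degradation, assuming additive zero-mean Gaussian noise on a single-channel image; the function and parameter names are illustrative, and sigma controls the noise strength exactly as described above.

```python
import random

def add_gaussian_noise(image, sigma, seed=0):
    """Return a copy of a single-channel image with N(0, sigma^2) noise
    added per pixel, clipped to the valid 8-bit range [0, 255]."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        for row in image
    ]

clean = [[128] * 4 for _ in range(4)]
noisy = add_gaussian_noise(clean, sigma=25.0)  # larger sigma -> heavier damage
```

With sigma = 0 the picture is returned unchanged; increasing sigma spreads pixel values further from their originals while clipping keeps them valid.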
A3: and carrying out data enhancement processing on the comparison picture set to obtain the picture set with enhanced data.
In A3, data enhancement processing is performed on the contrast picture set to enrich sample diversity; specific data enhancement methods include size compression, saturation transformation, brightness transformation, random cropping, rotation, content flipping and the like.
Data enhancement processing improves the generalization ability and robustness of the model. Size compression strengthens the model's sharpness enhancement of images at different scales; saturation and brightness transformations enable the model to effectively enhance document images under abnormal illumination or shadow; and random cropping, rotation and content flipping enable the enhancement model to handle document images in different orientations.
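Two of the listed augmentations can be sketched directly; the helper names below are hypothetical, and a real pipeline would apply such transforms randomly per sample.

```python
def adjust_brightness(image, factor):
    """Scale pixel intensities by `factor` (a brightness transformation),
    clipping to the valid 8-bit range."""
    return [[min(255, round(p * factor)) for p in row] for row in image]

def flip_horizontal(image):
    """Mirror each row, emulating the content-flipping augmentation."""
    return [list(reversed(row)) for row in image]

patch = [[10, 20], [30, 40]]
brighter = adjust_brightness(patch, 1.5)  # simulates an over-lit capture
flipped = flip_horizontal(patch)          # simulates a mirrored orientation
```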
A4: and carrying out size unification on the data-enhanced picture set in a unified size processing mode to obtain a training data set.
In A4, to ensure a consistent input size during training of the image enhancement model, the pictures in the data-enhanced picture set must be unified in size. In practical training, several 512×512 regions are cropped from each picture in order, left to right and top to bottom, and any shortfall is padded with white. This preserves the integrity of each picture's content features and makes it easier for the model to learn the key features (i.e., the shadowed, blurred and noisy parts of the picture) during training.
To facilitate understanding of the white-padding process, an example is given here:
when an original picture is 1000×1000, it must be cut into 512×512 image blocks to satisfy the size-consistency condition during training. A 1000×1000 picture can be cut into four blocks of 512×512, 512×488, 488×512 and 488×488. The three blocks of 512×488, 488×512 and 488×488 do not meet the 512×512 size, so their missing regions are filled with white pixels. Compared with directly enlarging them to 512×512, this padding approach fully preserves the detail texture of the original image, whereas resizing would interpolate the image information and lose detail texture.
The process of pre-training the image enhancement model is shown in steps B1 to B4.
B1: and constructing a lightweight guided residual error network model and a heavyweight guided residual error network model.
The pre-training of the image enhancement model involves a sharpness enhancement algorithm. The algorithm adopts deep learning, and the overall idea is as follows: on one hand, a document image sharpness enhancement model is trained on a large dataset of high-definition document images paired with corresponding blurred or low-contrast images, so that the trained model can directly remove shadows and noise from, and enhance the sharpness of, document images suffering from shadows, blur, noise and the like; on the other hand, to ensure that the model can run in embedded devices such as mobile phones and deliver maximum performance under limited resources, techniques such as model pruning, distillation and quantization are introduced to reduce model inference time while maintaining good robustness.
The specific sharpness enhancement algorithm is as follows:
the network is built on UNet (U-Net) and is called GuideResUNet (guided residual U-Net); it consists of two modules, a residual U-Net (ResUNet) module and a guided filtering (GuideConv) module. To facilitate training and accuracy improvement of the image enhancement model, two GuideResUNet variants are constructed: a lightweight guided residual network model (GuideResUNetLite) and a heavyweight guided residual network model (GuideResUNetPlus). The GuideResUNetPlus model is mainly used for isomorphic distillation of the GuideResUNetLite model, and the GuideResUNetLite model is used for the actual image enhancement model deployment.
The GuideResUNetPlus model has more parameters and higher accuracy, but its inference is more time-consuming. Isomorphic distillation is therefore performed on the GuideResUNetLite model so that, with only a small number of parameters, it achieves low inference time and accuracy comparable to the GuideResUNetPlus model. Without isomorphic distillation, the GuideResUNetLite model would keep its low parameter count and low latency, but its accuracy would be much lower than that of the GuideResUNetPlus model.
A schematic structural diagram of the GuideResUNetLite model is shown in fig. 3.
Unless otherwise noted, the rectangles in fig. 3 represent convolution layers; "+" denotes feature map addition, and "C" denotes the concat operation, which splices features along the channel dimension.
The overall execution flow of fig. 3 is as shown in (1) - (4).
(1) The document image is input with shape (1, 3, 512, 512), and a 0.25x image is obtained through the downsampling module. The shape denotes (number of images, number of channels, rows (height) of a single image, columns (width) of a single image): 1 is the number of images, 3 is the number of image channels (the three RGB color channels), the first 512 is the height of a single image, and the second 512 is its width.
(2) The 0.25x image is input into the ResUNet module and passes through a two-layer downsampling process, where each downsampling layer comprises three convolution layers: the first extracts shallow features, and the last two form a residual block. At the bottom layer, the features pass through a convolution layer and then a residual block; finally they pass through a two-layer upsampling process whose per-layer composition mirrors the downsampling, and downsampling and upsampling layers with the same channel count perform a feature-map concat operation. A tensor is output after the two upsampling layers.
(3) The output of the ResUNet module is sent to the GuideConv module, which outputs two tensors, a weight parameter c and a bias parameter d; these parameters correspond to the 0.25x image.
(4) The parameters c and d are sent into an upsampling module, computed upon, and finally output, yielding a sharpness-enhanced image corresponding to the original document image.
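In the spirit of guided filtering, the weight c and bias d can be read as a per-pixel linear mapping q = c·I + d applied to the image. The sketch below shows that mapping on tiny lists; the function name is hypothetical, and a real implementation would upsample c and d to full resolution and apply them to tensors.

```python
def apply_guided_coefficients(image, c, d):
    """Per-pixel linear mapping q = c * I + d, as in guided filtering.
    `image`, `c` and `d` are equally sized 2-D lists of floats."""
    return [
        [ci * pi + di for pi, ci, di in zip(row, crow, drow)]
        for row, crow, drow in zip(image, c, d)
    ]

# Identity coefficients (c = 1, d = 0) leave the image unchanged; a bias
# with c above 1 would brighten shadowed regions instead.
img = [[0.2, 0.4], [0.6, 0.8]]
identity = apply_guided_coefficients(
    img, [[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]
)
```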
The sharpness enhancement algorithm can be optimized and adjusted according to different characteristics of the document image, such as shadow, noise and blur. More specifically, a real-world document may exhibit a single degradation (only shadow, only noise, or only blur) or a combination of two or more (e.g., both shadow and blur); tailoring the algorithm accordingly improves the image processing effect and reliability.
B2: performing model pruning operation on the lightweight guide residual error network model and the heavyweight guide residual error network model; model pruning operations are used to reduce model inference time and reduce model power consumption.
In order for the image enhancement model to be deployed on an embedded device, two conditions must be met: first, inference with the image enhancement model must take little time; second, the accuracy of the image enhancement model must remain at a good level (above 90%). Pruning technology can therefore be used to reduce the parameter count of the image enhancement model while maintaining good model accuracy; a smaller parameter count means less computation during model inference and correspondingly lower time consumption.
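As a back-of-the-envelope illustration of the pruning argument, the sketch below counts convolution parameters for a five-layer encoder versus a pruned three-layer one. The first-layer channel counts (64 and 16) are the ones quoted later in the text; the deeper channel widths are illustrative assumptions (doubling per layer), not the patent's actual architecture.

```python
def conv_params(c_in, c_out, k=3):
    """Parameter count of one k x k convolution layer: weights + biases."""
    return c_in * c_out * k * k + c_out

# Assumed channel progressions: heavyweight model, five layers from 64;
# pruned lightweight model, three layers from 16.
heavy = [3, 64, 128, 256, 512, 1024]
light = [3, 16, 32, 64]

heavy_params = sum(conv_params(a, b) for a, b in zip(heavy, heavy[1:]))
light_params = sum(conv_params(a, b) for a, b in zip(light, light[1:]))
print(heavy_params, light_params)  # the pruned model is orders of magnitude smaller
```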
B3: performing model distillation on the lightweight guided residual error network model after model pruning operation through the lightweight guided residual error network model after model pruning operation; model distillation is used for improving the precision of the lightweight guided residual network model after model pruning operation.
It should be noted that, to obtain better accuracy, a model distillation technique is adopted: the aforementioned heavyweight guided residual network model is used to distill the lightweight guided residual network model. The two models are identical in algorithm structure and differ only in depth and first-layer channel count: the lightweight model has a depth of only three layers with a first-layer channel count of 16, while the heavyweight model has a depth of five layers with a first-layer channel count of 64. The heavyweight model has more parameters and can learn more complex features, so it is used as the teacher model and the lightweight model as the student model; the student learns continuously from the teacher so that it finally reaches, or approaches, the teacher's accuracy.
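The teacher-student idea can be sketched as an output-level distillation loss. The text does not spell out its distillation objective, so the alpha-weighted mix of a ground-truth term and a teacher-matching term below is an assumption, chosen because it is a common choice for image-to-image distillation:

```python
import numpy as np

def distillation_loss(student_out, teacher_out, target, alpha=0.5):
    """Weighted sum of a ground-truth L1 term and a teacher-matching L1 term.
    alpha is a hypothetical balancing weight, not a value from this document."""
    loss_gt = np.abs(student_out - target).mean()
    loss_kd = np.abs(student_out - teacher_out).mean()
    return alpha * loss_gt + (1.0 - alpha) * loss_kd

rng = np.random.default_rng(0)
target = rng.random((1, 3, 64, 64))
teacher_out = target + 0.01 * rng.standard_normal(target.shape)  # strong teacher
student_out = target + 0.10 * rng.standard_normal(target.shape)  # weaker student
loss = distillation_loss(student_out, teacher_out, target)
```

Minimizing this loss pulls the student toward both the label and the teacher's (usually more accurate) prediction.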
B4: and model training is carried out on the heaviness guide residual error network model after the model pruning operation and the lightweight guide residual error network model after the model distillation through an absolute value mean error loss function, a multi-scale structure similarity loss function and a training data set, so as to obtain an image enhancement model.
Since the document image contains text, illustrations, and the like, the absolute-value mean error (L1) loss and the multi-scale structural similarity (MS-SSIM) loss are used as the loss functions during training. L1 ensures that the predicted document image and the label document image correspond pixel by pixel, and MS-SSIM ensures color and structural consistency between the predicted document image and the label document image.
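The combined loss can be sketched as follows. For brevity it uses a single-scale SSIM computed from global statistics in place of true windowed, multi-scale MS-SSIM, and the mixing weight alpha = 0.84 is a common choice from the image-restoration literature, not a value stated here.

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-scale SSIM from global image statistics (a simplification of
    MS-SSIM); assumes pixel values in [0, 1]."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den

def combined_loss(pred, label, alpha=0.84):
    """alpha-weighted mix of (1 - SSIM) and L1: the SSIM term enforces color
    and structural consistency, the L1 term pixel-wise correspondence."""
    l1 = np.abs(pred - label).mean()
    return alpha * (1.0 - ssim_global(pred, label)) + (1.0 - alpha) * l1

pred = np.random.default_rng(1).random((3, 64, 64))
```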
The general factors of the training process include the optimizer, learning rate and loss function.
That is, the training process determines in what way the model learns (the optimizer), at what step size (the learning rate), and towards what target (the loss function).
Considering that the image enhancement model needs to be deployed on embedded devices such as mobile phones, there are hard requirements on real-time performance and power consumption, so the method is optimized in two respects. On the one hand, to make the image enhancement model compute faster at inference, the network depth is reduced relative to the original U network, i.e. the parameter count of the model is reduced: a moderate pruning operation is performed on the image enhancement model deployed to the embedded device, retaining only three layers. Meanwhile, to ensure that not too much accuracy is lost, a residual block is introduced into each layer; using the identity-mapping mechanism of the residual block, the stacked layers learn new features on top of the input features, giving the image enhancement model better performance. On the other hand, the document image sharpness enhancement problem is treated as solving a linear equation of the form y = ax + b, where y is the sharpness-enhanced document image, x is the blurred or low-contrast image, a is the weight parameter, and b is the bias parameter. To solve for y quickly, x is downsampled to 0.25 times and written x1 = 0.25x; then y1 = c*x1 + d, with a = 4c and b = 4d. In actual training, the network therefore only needs to learn two parameters, c and d, namely the weight and bias corresponding to the 0.25x-downsampled x1; finally these two parameters are upsampled by the same ratio back to the original image size, so good real-time performance can be ensured even when processing high-resolution document images. This procedure can be regarded as a guiding process, namely the aforementioned GuideConv.
The original U network is a Unet network, and the Unet network is a commonly used segmentation network in the field of image segmentation.
Finally, the two parameters are upsampled by the same ratio back to the original image size, so that high-resolution document image processing can also maintain good real-time performance, for the following specific reasons:
for example, the parameters of the first convolution layer in the model network are: input channel count 3, output channel count 16, convolution kernel size 3x3.
The formula for the computation of a convolution layer is: (2 x Ci x K x K - 1) x H x W x Co.
Where Ci is the number of input channels, K is the convolution kernel size, H and W are the spatial dimensions of the output feature map, and Co is the number of output channels.
1. The computation of a 2000x2000 image passing through the first convolution layer in the network without downsampling is: (2 x 3 x 3 x 3 - 1) x 2000 x 2000 x 16.
2. The computation of a 2000x2000 image passing through the first convolution layer in the network when downsampled by a factor of 0.25 (each side shrinks to 500) is: (2 x 3 x 3 x 3 - 1) x 500 x 500 x 16.
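These two figures can be checked directly (this is a multiply-accumulate-counting convention; other FLOP conventions differ by small constant factors):

```python
def conv_flops(ci, k, h, w, co):
    """Computation of one convolution layer: (2*Ci*K*K - 1) * H * W * Co."""
    return (2 * ci * k * k - 1) * h * w * co

full = conv_flops(3, 3, 2000, 2000, 16)  # no downsampling
down = conv_flops(3, 3, 500, 500, 16)    # after 0.25x downsampling
print(full, down, full // down)          # the ratio is 16 = (1/0.25)^2
```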
The above shows that, in terms of computation, the downsampled image requires less work: the difference is a factor of 16, which means the downsampled image offers better real-time performance. The following describes how the parameters of the downsampled image are upsampled:
the task is solved as a linear equation problem: y = ax + b, where y is the sharpness-enhanced document image, x is the blurred or low-contrast image, a is the weight parameter, and b is the bias parameter. Downsample x to 0.25 times and write x1 = 0.25x; then y1 = c*x1 + d, with a = 4c and b = 4d. The values of c and d are obtained through model training; a bilinear interpolation operation then upsamples c and d by a factor of 4, yielding the values of a and b, which are substituted into y = ax + b to obtain the final y. This procedure adds only one bilinear interpolation and one ax + b operation, both of which take very little time. Therefore, good real-time performance can be ensured when processing high-resolution document images.
The process of deploying the image enhancement model on the embedded device in a preset deployment mode is shown as C1-C4.
C1: and converting the model format of the graphic enhancement model through a deep learning prediction framework (such as a paldlelite framework) to obtain the graphic enhancement model with the converted model format.
In C1, the PaddleLite framework is used to convert the trained image enhancement model into a model format usable by the CPU of the embedded device, so that the embedded device uses the image enhancement model after format conversion.
C2: carrying out model quantization on the image enhancement model; model quantization is used to increase the computational efficiency of the image enhancement model and reduce the memory occupancy of the image enhancement model in the embedded device.
To reduce the inference time of the image enhancement model on the central processing unit (CPU) of the embedded device, the image enhancement model needs to be quantized.
Specifically, int8 quantization is applied to the trained model, converting the data types in the model from 32-bit floating point to 8-bit integer; the 8-bit representation occupies fewer bytes, which improves computational efficiency and reduces the memory occupancy of the image enhancement model in the embedded device.
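A sketch of symmetric per-tensor int8 quantization, one common scheme for this conversion; PaddleLite's actual quantization passes (per-channel scales, calibration data) are more involved and are not specified here.

```python
import numpy as np

def quantize_int8(w):
    """Map float32 weights to int8 with a single symmetric scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal((16, 3, 3, 3)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# int8 storage is 4x smaller, at the cost of a small reconstruction error
```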
And C3: creating an independent process in the embedded device; the independent process is an independent process isolated from the program main process of the embedded device.
To ensure the running stability of the embedded device's program, an independent process is opened up in the CPU of the embedded device for model inference, isolated from the main program process so that the main process is not affected by model inference. The independent process ensures the stability of the device's program operation.
The specific process of opening up the independent process is as follows:
when the embedded device receives a document image and sharpness enhancement is executed, the application program starts a Service process that runs independently outside the main process, passes the document image data to it, and then begins the model inference process; when inference ends, the result is passed to the main process and the Service process is destroyed.
The model reasoning process is as follows:
when the application program is opened, the system preloads the model. When document sharpness enhancement is run, the system normalizes the document image from 0-255 to 0-1 and converts it into a tensor. The input shape of the model is defined as (1, H, W, C), where 1 is the number of images, H and W are the image height and width, and C is the number of image channels; all inputs in the program are RGB images, so C is 3. Because the model involves convolution operations (feature sizes change by powers of 2), the system also resizes the document image, generally to a multiple of 128. The system then feeds the image tensor into the model for inference, in which the value of each pixel of the image is computed with the parameters of the model network; this process is completed on the CPU. After a wait of 1-2 s, the model outputs a corresponding result. This result is not final: the application program still needs to denormalize it from 0-1 back to 0-255 and convert it into the final image data format.
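The normalization and shape handling described above can be sketched as follows; zero-padding up to a multiple of 128 is one assumed way to realize the stated size adjustment (the application might rescale instead):

```python
import numpy as np

def preprocess(img_u8):
    """Normalize a uint8 HxWx3 image to [0, 1] and pad H and W up to
    multiples of 128, returning a (1, H, W, C) input tensor."""
    img = img_u8.astype(np.float32) / 255.0
    h, w = img.shape[:2]
    nh = -(-h // 128) * 128          # ceil to multiple of 128
    nw = -(-w // 128) * 128
    out = np.zeros((nh, nw, 3), dtype=np.float32)
    out[:h, :w] = img
    return out[np.newaxis]           # shape (1, H, W, C)

x = preprocess(np.full((500, 700, 3), 255, dtype=np.uint8))
```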
And C4: and taking the independent process as a model reasoning process, and deploying the image enhancement model quantized by the model in the embedded device through the model reasoning process.
And displaying the enhanced document image through the embedded equipment. A schematic diagram of a specific enhanced document image is shown in fig. 4. Fig. 4 is merely an example.
And generating a corresponding application program, and running the application program in the embedded device to execute image processing.
For ease of understanding the process of the image processing method, the description is given here by way of example in connection with the scene embodiment:
for example, a mobile phone application program is opened, and a user can download and install the application program of the scheme in a mobile phone application market;
selecting a document image, where the user can shoot the document image with the mobile phone camera or import an existing document image;
image preprocessing, where the application program preprocesses the document image, automatically framing the document, adjusting its orientation, and so on;
sharpness enhancement, where the application program uses the image enhancement model and sharpness enhancement algorithm pre-deployed in the embedded device to enhance the sharpness of the document image, removing shadows and improving its contrast and sharpness;
and outputting the result, where the sharpness-enhanced document image is output to the mobile phone screen or a storage device for the user to view or further process, such as saving.
The scheme comprises the following components of mobile phone equipment, a software program, a definition enhancing algorithm and a CPU processor.
The mobile phone device may be an embedded device such as a smart phone or a tablet computer.
The software program running on the mobile phone equipment comprises modules of document image acquisition, preprocessing, enhancement, output and the like.
The sharpness enhancement algorithm implements the enhancement of document image sharpness and is based on a deep learning algorithm.
And the CPU is used for executing image processing and definition enhancing algorithms.
This scheme uses the CPU of the embedded device to enhance the document image: a deep learning technique is used to generate a model algorithm capable of enhancing document images, and since executing the model inference (computation) process depends on a CPU, GPU, or other compute card, the scheme executes model inference through the CPU of the embedded device. No additional special hardware or high-performance computer is required, so rapid document image enhancement can conveniently be achieved on embedded devices such as mobile phones, maximizing performance under limited resources with good algorithm robustness. Users can process document images immediately on site, which improves the readability and usability of the document image while protecting the security of the user's document information to the greatest extent. The document image enhancement process of this scheme is fast and can be completed on the spot, making it convenient for users; the sharpness enhancement algorithm can be optimized and adjusted for different document image characteristics, improving the image processing effect and reliability.
The scheme can conveniently realize the rapid enhancement of the document image on the embedded equipment such as the mobile phone and the like, and improves the readability and usability of the document image. The technology of the application has wide application prospect, and can provide high-efficiency, convenient and high-quality image enhancement service for enterprises, individuals, public institutions and the like.
In the embodiment of the application, no additional high-performance computer or special hardware is needed to enhance the image and improve its sharpness; only a pre-training operation on the image enhancement model is required. The pre-trained image enhancement model can directly perform shadow removal, noise removal, and sharpness enhancement on document images suffering from shadow, blur, noise, and similar problems. The pre-training operation reduces model inference time while keeping the image enhancement model robust. By improving the computational efficiency of the image enhancement model and reducing its memory occupancy in the embedded device, the pre-trained image enhancement model is deployed in embedded devices such as mobile phones, and the deployed model performs the sharpness enhancement operation on the document image to be processed to obtain the enhanced document image, thereby improving image sharpness, quality, and readability.
Based on the image processing method disclosed in fig. 1 of the foregoing embodiment, the embodiment of the present application correspondingly discloses an image processing system, as shown in fig. 5, where the image processing system includes a first obtaining unit 501 and an operation unit 502.
A first acquisition unit 501 for acquiring a document image to be processed.
An operation unit 502, configured to perform sharpness enhancement operation on a document image to be processed through an image enhancement model that is pre-deployed in an embedded device, to obtain an enhanced document image; the image enhancement model is obtained through a pre-training operation; the pre-training operation is an operation of reducing the reasoning time of the image enhancement model, reducing the model power consumption of the image enhancement model and improving the precision of the image enhancement model; the image enhancement model is deployed in the embedded equipment in a preset deployment mode; the preset deployment mode is a deployment mode for improving the calculation efficiency of the image enhancement model and reducing the memory occupancy rate of the image enhancement model in the embedded equipment.
Further, the image processing system further includes a second acquisition unit.
And the second acquisition unit is used for acquiring the training data set.
The second acquisition unit comprises an acquisition module, an addition module, a first processing module and a second processing module.
The acquisition module is used for acquiring the training picture set meeting the preset clear condition.
And the adding module is used for randomly adding shadow, blurring and/or noise degradation effects to the training picture set to obtain a comparison picture set.
The first processing module is used for carrying out data enhancement processing on the comparison picture set to obtain the picture set with enhanced data.
And the second processing module is used for unifying the sizes of the data-enhanced picture sets in a unified size processing mode to obtain a training data set.
Further, for the pre-training operation of the image enhancement model, the operation unit 502 includes a construction module, a first operation module, a second operation module, and a training module.
The construction module is used for constructing a lightweight guided residual network model and a heavyweight guided residual network model.
The first operation module is used for performing the model pruning operation on the lightweight guided residual network model and the heavyweight guided residual network model; the model pruning operation is used to reduce model inference time and model power consumption.
The second operation module is used for performing model distillation on the lightweight guided residual network model after the model pruning operation through the heavyweight guided residual network model after the model pruning operation; model distillation is used to improve the accuracy of the pruned lightweight guided residual network model.
The training module is used for performing model training on the pruned heavyweight guided residual network model and the distilled lightweight guided residual network model through the absolute-value mean error loss function, the multi-scale structural similarity loss function, and the training data set, to obtain the image enhancement model.
Further, for deploying the image enhancement model in the embedded device in the preset deployment manner, the operation unit 502 includes a conversion module, a model quantization module, a creation module, and a deployment module.
The conversion module is used for converting the model format of the image enhancement model through the deep learning inference framework to obtain the image enhancement model with the converted model format.
The model quantization module is used for carrying out model quantization on the image enhancement model; model quantization is used to increase the computational efficiency of the image enhancement model and reduce the memory occupancy of the image enhancement model in the embedded device.
The creation module is used for creating an independent process in the embedded equipment; the independent process is an independent process isolated from the program main process of the embedded device.
The deployment module is used for taking the independent process as a model reasoning process and deploying the image enhancement model quantized by the model in the embedded equipment through the model reasoning process.
Further, the image processing system further comprises a preprocessing unit.
And the preprocessing unit is used for preprocessing the image of the document image to be processed.
The preprocessing unit is specifically used for automatically selecting the document and adjusting the direction of the document for the document image to be processed.
Further, the image processing system further comprises a display unit.
And the display unit is used for displaying the enhanced document image through the embedded equipment.
Further, the image processing system further comprises a generating unit.
And the generating unit is used for generating a corresponding application program and running the application program in the embedded device to execute image processing.
In the embodiment of the application, no additional high-performance computer or special hardware is needed to enhance the image and improve its sharpness; only a pre-training operation on the image enhancement model is required. The pre-trained image enhancement model can directly perform shadow removal, noise removal, and sharpness enhancement on document images suffering from shadow, blur, noise, and similar problems. The pre-training operation reduces model inference time while keeping the image enhancement model robust. By improving the computational efficiency of the image enhancement model and reducing its memory occupancy in the embedded device, the pre-trained image enhancement model is deployed in embedded devices such as mobile phones, and the deployed model performs the sharpness enhancement operation on the document image to be processed to obtain the enhanced document image, thereby improving image sharpness, quality, and readability.
The embodiment of the application also provides a storage medium, which comprises stored instructions, wherein the equipment where the storage medium is controlled to execute the image processing method when the instructions run.
The embodiment of the application also provides an electronic device, the structure of which is shown in fig. 6, specifically including a memory 601, and one or more instructions 602, where the one or more instructions 602 are stored in the memory 601, and configured to be executed by the one or more processors 603 to perform the image processing method described above.
The specific implementation process and derivative manner of the above embodiments are all within the protection scope of the present application.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for a system or system embodiment, since it is substantially similar to a method embodiment, the description is relatively simple, with reference to the description of the method embodiment being made in part. The system and system embodiments described above are merely illustrative, in which the elements described as clustered elements may or may not be physically separate, and elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present application without undue burden.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing is merely a preferred embodiment of the present application and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present application, which are intended to be comprehended within the scope of the present application.

Claims (10)

1. An image processing method, the method comprising:
acquiring a document image to be processed;
performing definition enhancement operation on the document image to be processed through an image enhancement model which is pre-deployed in the embedded equipment to obtain an enhanced document image; the image enhancement model is obtained through a pre-training operation; the pre-training operation is an operation of reducing the reasoning time of the image enhancement model, reducing the model power consumption of the image enhancement model and improving the precision of the image enhancement model; the image enhancement model is deployed in the embedded equipment in a preset deployment mode; the preset deployment mode is a deployment mode for improving the calculation efficiency of the image enhancement model and reducing the memory occupancy rate of the image enhancement model in the embedded equipment.
2. The method as recited in claim 1, further comprising:
acquiring a training data set;
The acquiring training data set includes:
acquiring a training picture set meeting preset clear conditions;
randomly adding shadow, blurring and/or noise degradation effects to the training picture set to obtain a comparison picture set;
performing data enhancement processing on the comparison picture set to obtain a picture set with enhanced data;
and unifying the sizes of the picture sets after the data enhancement by a unified size processing mode to obtain a training data set.
3. The method of claim 2, wherein the pre-training operation of the image enhancement model comprises:
constructing a lightweight guided residual network model and a heavyweight guided residual network model;
performing model pruning operation on the lightweight guided residual network model and the heavyweight guided residual network model; the model pruning operation is used for reducing model inference time and reducing model power consumption;
performing model distillation on the lightweight guided residual network model after the model pruning operation through the heavyweight guided residual network model after the model pruning operation; the model distillation is used for improving the precision of the lightweight guided residual network model after the model pruning operation;
and performing model training on the heavyweight guided residual network model after the model pruning operation and the lightweight guided residual network model after the model distillation through an absolute-value mean error loss function, a multi-scale structural similarity loss function and the training data set, to obtain an image enhancement model.
4. The method according to claim 1, wherein the deploying of the image enhancement model in the embedded device by the preset deployment method comprises:
converting the model format of the image enhancement model through a deep learning inference framework to obtain an image enhancement model with the converted model format;
carrying out model quantization on the image enhancement model; the model quantization is used for improving the calculation efficiency of the image enhancement model and reducing the memory occupancy rate of the image enhancement model in the embedded equipment;
creating an independent process in the embedded device; the independent process is an independent process isolated from a program main process of the embedded equipment;
and taking the independent process as a model reasoning process, and deploying the image enhancement model after model quantification in the embedded device through the model reasoning process.
5. The method according to claim 1, wherein after the obtaining the document image to be processed, the sharpness enhancement operation is performed on the document image to be processed by an image enhancement model pre-deployed in an embedded device, and before obtaining the enhanced document image, the method further comprises:
Performing image preprocessing on the document image to be processed;
the process for preprocessing the document image to be processed comprises the following steps:
and carrying out automatic document selection and document direction adjustment on the document image to be processed.
6. The method as recited in claim 1, further comprising:
and displaying the enhanced document image through the embedded equipment.
7. The method as recited in claim 1, further comprising:
and generating a corresponding application program, and running the application program in the embedded device to execute image processing.
8. An image processing system, the system comprising:
a first acquisition unit configured to acquire a document image to be processed;
an operation unit, configured to perform a sharpness enhancement operation on the document image to be processed through an image enhancement model pre-deployed in an embedded device, to obtain an enhanced document image; wherein the image enhancement model is obtained through a pre-training operation; the pre-training operation is an operation for reducing the inference time of the image enhancement model, reducing the power consumption of the image enhancement model and improving the precision of the image enhancement model; the image enhancement model is deployed in the embedded device in a preset deployment mode; and the preset deployment mode is a deployment mode for improving the computational efficiency of the image enhancement model and reducing the memory occupancy of the image enhancement model in the embedded device.
9. A storage medium comprising stored instructions, wherein the instructions, when executed, control a device in which the storage medium is located to perform the image processing method of any one of claims 1 to 7.
10. An electronic device, comprising a memory and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to perform the image processing method of any one of claims 1 to 7.
CN202310764276.6A 2023-06-27 2023-06-27 Image processing method, system, storage medium and electronic equipment Pending CN116703777A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310764276.6A CN116703777A (en) 2023-06-27 2023-06-27 Image processing method, system, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310764276.6A CN116703777A (en) 2023-06-27 2023-06-27 Image processing method, system, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116703777A true CN116703777A (en) 2023-09-05

Family

ID=87827518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310764276.6A Pending CN116703777A (en) 2023-06-27 2023-06-27 Image processing method, system, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116703777A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117671271A (en) * 2024-01-31 2024-03-08 苏州元脑智能科技有限公司 Model training method, image segmentation method, device, equipment and medium

Similar Documents

Publication Publication Date Title
Wang et al. Gladnet: Low-light enhancement network with global awareness
Sun et al. Learned image downscaling for upscaling using content adaptive resampler
US11537873B2 (en) Processing method and system for convolutional neural network, and storage medium
Hui et al. Fast and accurate single image super-resolution via information distillation network
CN112330574B (en) Portrait restoration method and device, electronic equipment and computer storage medium
EP4030379A1 (en) Image processing method, smart device, and computer-readable storage medium
US8538200B2 (en) Systems and methods for resolution-invariant image representation
KR100924689B1 (en) Apparatus and method for transforming an image in a mobile device
CN110533594B (en) Model training method, image reconstruction method, storage medium and related device
US11348203B2 (en) Image generation using subscaling and depth up-scaling
EP3809335A1 (en) Image processing device and operation method therefor
WO2023284401A1 (en) Image beautification processing method and apparatus, storage medium, and electronic device
US11887218B2 (en) Image optimization method, apparatus, device and storage medium
WO2022156621A1 (en) Artificial intelligence-based image coloring method and apparatus, electronic device, computer readable storage medium, and computer program product
KR20200132682A (en) Image optimization method, apparatus, device and storage medium
CN113095470A (en) Neural network training method, image processing method and device, and storage medium
CN112602088A (en) Method, system and computer readable medium for improving quality of low light image
US20230262189A1 (en) Generating stylized images on mobile devices
CN116703777A (en) Image processing method, system, storage medium and electronic equipment
CN114298900A (en) Image super-resolution method and electronic equipment
CN112991171A (en) Image processing method, image processing device, electronic equipment and storage medium
CN115205160A (en) No-reference low-illumination image enhancement method based on local scene perception
CN115294055A (en) Image processing method, image processing device, electronic equipment and readable storage medium
US11887277B2 (en) Removing compression artifacts from digital images and videos utilizing generative machine-learning models
CN113902611A (en) Image beautifying processing method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination