CN117036179A - Image processing method, device, storage medium and computer equipment

Info

Publication number: CN117036179A
Application number: CN202211112047.8A
Authority: CN (China)
Prior art keywords: image, mapping, correction, model, network
Legal status: Pending (assumed; not a legal conclusion)
Inventor: 余鹏飞
Current and original assignee: Tencent Technology (Shenzhen) Co., Ltd.
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority: CN202211112047.8A
Publication: CN117036179A
Other languages: Chinese (zh)

Classifications

    • G06T 7/10 Image analysis: segmentation; edge detection
    • G06N 3/08 Neural networks: learning methods
    • G06T 3/40 Geometric image transformations: scaling of whole images or parts thereof
    • G06T 3/4007 Scaling based on interpolation, e.g. bilinear interpolation
    • G06V 10/82 Image or video recognition using neural networks
    • G06V 30/19173 Character recognition: classification techniques
    • G06T 2207/10004 Still image; photographic image
    • G06T 2207/10024 Color image
    • G06T 2207/20081 Training; learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30176 Document


Abstract

The application discloses an image processing method comprising the following steps: acquiring a document image; inputting the document image into a classification model and obtaining classification information of the document image through the classification model; based on the classification information, processing the document image by at least one of image enhancement processing based on an enhancement model and image correction processing based on a correction model to obtain a preprocessed image, wherein the preprocessed image carries the classification information; and performing color model processing on the preprocessed image based on the classification information to obtain a target image corresponding to the document image, the color model processing adjusting at least one of the saturation, contrast, and brightness of the preprocessed image. By applying artificial intelligence technology, the application performs multidimensional image processing on document images simultaneously, so that interference information and picture distortion on document images of different types can be effectively optimized, thereby improving both the display effect of document images and the efficiency of image processing.

Description

Image processing method, device, storage medium and computer equipment
Technical Field
The present application relates to the field of computer vision, and more particularly, to an image processing method, apparatus, storage medium, and computer device.
Background
Document images are images obtained by photographing or scanning a document with a photographing/scanning device, or by taking a screenshot of the document. In office automation scenarios, document images often need to be converted into various document formats for storage and display. Users therefore have high requirements for how document image content is displayed; for example, the text in a document image must be clearly highlighted.
Because a document image is easily affected by ambient light and by the photographer's skill during acquisition, the acquired document image is optimized before use so as to obtain a standard picture that meets usage requirements. In the related art, the optimization applied to document images is relatively simple, and the image optimization effect is poor.
Disclosure of Invention
The embodiments of the application provide an image processing method, an image processing apparatus, a storage medium, and a computer device, aiming to improve the optimization effect on document images.
In one aspect, an embodiment of the present application provides an image processing method, including: acquiring a document image; inputting the document image into a classification model and obtaining classification information of the document image through the classification model; based on the classification information, processing the document image by at least one of image enhancement processing based on an enhancement model and image correction processing based on a correction model to obtain a preprocessed image, wherein the preprocessed image carries the classification information; and performing color model processing on the preprocessed image based on the classification information to obtain a target image corresponding to the document image, wherein the color model processing adjusts at least one of the saturation, contrast, and brightness of the preprocessed image. The enhancement model is trained with combined loss calculation, using a synthesized image generated from an interference image and a standard image as sample data and the interference image and the standard image as label data, and is used to remove interference information from document images; the correction model is obtained by secondary mapping correction training, with a first mapping training matrix and a second mapping training matrix generated from a distorted sample image as sample data and a real first mapping matrix and a real second mapping matrix as label data, and performs mapping correction on the distorted sample image based on the first mapping training matrix.
In another aspect, an embodiment of the present application further provides an image processing apparatus, including: an image acquisition module for acquiring a document image; an image classification module for inputting the document image into the classification model and obtaining classification information of the document image through the classification model; an image processing module for processing the document image, based on the classification information, by at least one of image enhancement processing based on an enhancement model and image correction processing based on a correction model, to obtain a preprocessed image that carries the classification information; and an image adjustment module for performing color model processing on the preprocessed image based on the classification information to obtain a target image corresponding to the document image, the color model processing adjusting at least one of the saturation, contrast, and brightness of the preprocessed image. The enhancement model is trained with combined loss calculation, using a synthesized image generated from an interference image and a standard image as sample data and the interference image and the standard image as label data, and is used to remove interference information from document images; the correction model is obtained by secondary mapping correction training, with a first mapping training matrix and a second mapping training matrix generated from a distorted sample image as sample data and a real first mapping matrix and a real second mapping matrix as label data, and performs mapping correction on the distorted sample image based on the first mapping training matrix.
In another aspect, an embodiment of the present application further provides a computer readable storage medium storing program code, where the program code, when executed by a processor, performs the above-described image processing method.
On the other hand, an embodiment of the application further provides a computer device comprising a processor and a memory, the memory storing computer program instructions that, when invoked by the processor, cause the above-described image processing method to be performed.
In another aspect, embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a storage medium; the processor of the computer device reads the computer instructions from the storage medium, and the processor executes the computer instructions, so that the computer device performs the steps in the image processing method described above.
The image processing method provided by the application can acquire a document image, input it into the classification model to obtain the classification information of the document image, then, based on the classification information, process the document image by at least one of image enhancement processing based on the enhancement model and image correction processing based on the correction model to obtain a preprocessed image carrying the classification information, and perform color model processing on the preprocessed image based on the classification information to obtain a target image corresponding to the document image, the color model processing adjusting at least one of the saturation, contrast, and brightness of the preprocessed image. The enhancement model is trained with combined loss calculation, using a synthesized image generated from an interference image and a standard image as sample data and the interference image and the standard image as label data, and is used to remove interference information from document images; the correction model is obtained by secondary mapping correction training, with the first and second mapping training matrices generated from a distorted sample image as sample data and the real first and second mapping matrices as label data, and performs mapping correction on the distorted sample image based on the first mapping training matrix. Thus, by classifying document images, multi-dimensional image preprocessing can be applied to document images of different types, including at least removing interference information from the document image or correcting its picture distortion, providing an integrated solution for the multi-faceted optimization of document images and greatly improving both the optimization effect and the efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a schematic diagram of a system architecture according to an embodiment of the present application.
Fig. 2 shows a flowchart of an image processing method according to an embodiment of the present application.
Fig. 3 shows a flowchart of an image processing method according to an embodiment of the present application.
Fig. 4 is a schematic flow chart of another image processing method according to an embodiment of the present application.
Fig. 5 shows an application scenario diagram of an image processing method according to an embodiment of the present application.
Fig. 6 shows a network architecture diagram of a preset enhanced network according to an embodiment of the present application.
Fig. 7 shows a network architecture diagram of a preset correction network according to an embodiment of the present application.
Fig. 8 shows a network structure diagram of a classification model according to an embodiment of the present application.
Fig. 9 shows a flow chart of an adaptive gray scale process according to an embodiment of the present application.
Fig. 10 shows another adaptive gray scale process flow chart provided by an embodiment of the present application.
Fig. 11 is a block diagram of an image processing apparatus according to an embodiment of the present application.
Fig. 12 is a block diagram of a computer device according to an embodiment of the present application.
Fig. 13 is a block diagram of a computer readable storage medium according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present application and are not to be construed as limiting the present application.
In some of the processes described in the specification, claims and drawings above, a number of steps occurring in a particular order are included, but it should be understood that the steps may be performed out of order or performed in parallel, the sequence numbers of the steps merely being used to distinguish between the various steps, the sequence numbers themselves not representing any order of execution. Furthermore, the descriptions of "first" and "second" and the like herein are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
In order to enable those skilled in the art to better understand the solution of the present application, the following description will make clear and complete descriptions of the technical solution of the present application in the embodiments of the present application with reference to the accompanying drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that in the specific embodiments of the present application, related data such as document images, sample data, and label data, when applied to specific products or technologies of the embodiments of the present application, require the user's permission or consent, and the collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
In application scenarios involving document processing, factors such as illumination, the imaging algorithm, paper quality, and shooting angle usually leave document images with problems such as shadow occlusion, uneven illumination, yellowed paper, and color distortion; a document image shot from an electronic screen also exhibits moiré, which affects the results of downstream tasks.
Optimization methods in the related art address these problems with only a single function; for example, a method may only process a document's grayscale map and cannot simultaneously handle light interference in color documents. In addition, different optimization methods cannot coexist: saturation becomes too high during sharpening, residue remains after shadow removal, image details are easily lost during black-and-white processing, and so on, so the optimization effect on document images is poor.
To solve these problems, the inventor studied and proposed the image processing method of the embodiment of the application, which can acquire a document image, input it into a classification model to obtain its classification information, then, based on the classification information, process the document image by at least one of image enhancement processing based on an enhancement model and image correction processing based on a correction model to obtain a preprocessed image carrying the classification information, and perform color model processing on the preprocessed image based on the classification information to obtain a target image corresponding to the document image, the color model processing adjusting at least one of the saturation, contrast, and brightness of the preprocessed image. The enhancement model removes interference information from the document image; the correction model corrects the document image's picture distortion. Because the classified document image undergoes multi-dimensional image preprocessing and optimization, the various display problems on a document image can be optimized at the same time, for example by sharpening, shadow removal, and black-and-white processing, which strengthens the optimization effect on the document image.
The system architecture of the image processing method according to the present application is described first.
As shown in fig. 1, the image processing method provided by the embodiment of the present application may be applied to a system 300, in which a data acquisition device 320 acquires training data. For the image processing method of the embodiment, the training data may include interference images for training, standard images, distorted sample images, document sample images, classification labels, and real first and second mapping matrices. The real first mapping matrix and the real second mapping matrix serve as labels and are calculated manually in advance. After the training data is collected, the data acquisition device 320 may store it in the database 340, and the training device 360 trains the target model 301 based on the training data maintained in the database 340.
Alternatively, the training device 360 may train the preset enhancement network based on the interference image and the standard image until the preset enhancement network meets the preset condition, so as to obtain the trained target model 301, that is, the enhancement model. Optionally, the training device 360 may also train the preset correction network based on the distorted sample image, the real first mapping matrix and the real second mapping matrix until the preset correction network meets the preset condition, so as to obtain the trained target model 301, that is, the correction model. Optionally, the training device 360 may also train the preset classification network based on the document sample image and the classification label until the preset classification network meets the preset condition, so as to obtain the trained target model 301, that is, the classification model.
The target model 301 described above can be used to implement the image processing method provided in the embodiment of the present application. The target model 301 in the embodiment of the present application may specifically be a deep neural network model, for example a convolutional neural network (Convolutional Neural Networks, CNN). In practical applications, the training data maintained in the database 340 is not necessarily acquired by the data acquisition device 320 and may instead be received from other devices. It should be noted that the training device 360 does not necessarily train the target model 301 entirely on the training data maintained in the database 340; it may also obtain training data from the cloud or elsewhere for model training, which should not be taken as a limitation of the embodiments of the present application.
The target model 301 trained by the training device 360 may be applied to different systems or devices, such as the execution device 310 shown in fig. 1. The execution device 310 may be a terminal, such as a mobile phone, a tablet computer, a notebook computer, or an augmented reality (Augmented Reality, AR)/virtual reality (Virtual Reality, VR) device, or it may be a server, a cloud, or the like.
In fig. 1, the execution device 310 may interact with external devices; for example, a user may input data to the execution device 310 over a network using the client device 330. In an embodiment of the present application, the input data may include document images input by the client device. When the calculation module 311 of the execution device 310 performs image preprocessing, image adjustment, or other calculation on the document image, the execution device 310 may call data, code, and the like in the data storage system 350 for the corresponding calculation processing, and may also store the data, instructions, and the like obtained by that processing in the data storage system 350.
Finally, the execution device 310 returns the processing result, for example, the target image generated by the target model 301 to the client device 330 via the network, thereby providing the user with the processing result. It should be noted that the training device 360 may generate the corresponding target model 301 based on different training data for different targets or different tasks, and the corresponding target model 301 may be used to achieve the targets or to perform the tasks, thereby providing the user with the desired result.
Alternatively, the system shown in fig. 1 may be a Client-Server (C/S) system architecture, the execution device 310 may be a Server (e.g., a cloud Server), and the Client device 330 may be a Client (e.g., a notebook computer). The user can upload the document image to the cloud server through the network by using the image processing software in the notebook computer, and when the cloud server receives the document image, the cloud server uses the target model 301 to perform image classification, image enhancement processing, image correction processing and color model processing to generate a target image, and returns the target image to the notebook computer, so that the user can acquire the target image on the image processing software.
It should be noted that fig. 1 is only a schematic diagram of a system architecture provided by the embodiment of the present application, and the architecture and application scenario of the system described in the embodiment of the present application are for more clearly describing the technical solution of the embodiment of the present application, and do not constitute a limitation on the technical solution provided by the embodiment of the present application. As can be known to those skilled in the art, with the evolution of the system architecture and the appearance of new application scenarios, the technical solution provided by the embodiment of the present application is also applicable to similar technical problems.
Referring to fig. 2, fig. 2 is a flow chart illustrating an image processing method according to an embodiment of the application. In a specific embodiment, the image processing method is applied to the image processing apparatus 500 shown in fig. 11 and the computer device 600 (fig. 12) configured with the image processing apparatus 500.
In the following, the specific flow of this embodiment is described taking a computer device as the example. It should be understood that the computer device in this embodiment may be a server or a terminal. The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, blockchain, big data, and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, smart speaker, or smart watch. The image processing method specifically comprises the following steps:
Step S110: a document image is acquired.
The document image may be a screenshot image, or it may be an image captured by an imaging device or scanned by a scanning device, that is, a non-screenshot image. The document image may also be an image of photographed paper or an image of a photographed screen.
Because the shooting process is easily affected by light, paper color, distortion of the paper's shape, and moiré produced by screens, and because even an ordinary screenshot image's colors sometimes fail to meet conversion requirements, the image processing method provided by the embodiment of the application can optimize a document image before converting it to a document format.
As an embodiment, an image captured by an image capturing apparatus may be used as the document image, or a screen capture acquired by screenshot software may be used as the document image. For example, when a user needs to convert a paper contract into a PDF document, an image of the paper contract scanned by a scanning apparatus may be acquired as the document image.
Step S120: and inputting the document image into a classification model, and obtaining classification information of the document image through the classification model.
The schemes for optimizing document images in the related art are functionally limited and mutually exclusive, yet a document image often exhibits several display problems at once, such as picture distortion, uneven illumination, shadow occlusion, uneven material, moiré interference, and insufficient contrast in black-and-white documents. The embodiment of the application therefore provides an integrated solution for optimizing document images: by classifying the document image, the display problems in the image can be handled according to the document image's type.
In some embodiments, the step of inputting the document image into a classification model, and obtaining classification information of the document image by the classification model may include:
(1) Performing a conversion operation on the document image to obtain an image vector corresponding to the document image.
The conversion operation scales the document image to a specified size. Acquired document images often differ in size and cannot meet the size requirements of the subsequent enhancement and correction calculations; scaling the document image to the specified size before calculation saves computing resources and improves calculation efficiency.
As one embodiment, the document image may be converted by the conversion operation into an image vector of a uniform specification, for example an image vector of dimension 224×224×3. Optionally, the conversion operation may crop the image into a square centered on the original image's center point, with the shorter side of the original image as the side length, or may scale it by bilinear or bicubic interpolation (resizing), which is not limited here. A minimal sketch of such a conversion operation is given below.
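As a minimal sketch of such a conversion operation (the crop-and-resize choices here are assumptions for illustration, not the patent's fixed implementation), the following Python function center-crops a document image to a square and scales it to 224×224×3 with bilinear interpolation:

```python
import cv2
import numpy as np

def to_image_vector(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Center-crop to a square whose side is the image's shorter side,
    then scale to (size, size, 3) by bilinear interpolation."""
    h, w = image.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    square = image[top:top + side, left:left + side]  # crop around the center point
    return cv2.resize(square, (size, size), interpolation=cv2.INTER_LINEAR)
```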
(2) The image vector is input to a classification model, and classification information of the document image is determined by the classification model.
In order to improve the efficiency of document image optimization, the classification model can be used to classify document images, after which document images with different classification information undergo the corresponding subsequent processing. The classification information indicates the image type of the document image, which includes at least one of a screenshot image and a non-screenshot image; a non-screenshot image is an image acquired by an image capturing apparatus or a scanning apparatus. Optionally, the classification information may also include a color image, a black-and-white image, and a screen image (an image obtained by photographing a screen).
As an embodiment, the classification model may be trained on document sample images and classification labels. The document sample images may be various types of document images acquired from the real world, such as color document images and photographed document images (i.e., non-screenshot images), with the image types marked manually as classification labels, thereby forming a training dataset. When the image vector corresponding to a document image is input into the classification model, the model determines the document image's classification information, which may comprise one or more types; for example, for an image captured from a video on a computer display, the classification information may include both color image and screenshot image.
A screenshot image is an image captured from the display screen by a screenshot algorithm; a non-screenshot image is taken directly from the real world by photographing or scanning. Because a screenshot image contains little interference information, a target image meeting the requirements of downstream tasks can be obtained through simple image adjustment. A non-screenshot image, however, is easily affected by natural light and by natural or human factors such as shooting angle, so it often carries considerable interference information, for example distortion of the document's shape in the image content; image enhancement and image correction are therefore performed on the non-screenshot image to obtain a target image meeting the requirements of downstream tasks. A sketch of the classification step appears below.
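As a hedged sketch of how this classification step might be invoked (the label set, threshold, and model interface are assumptions; the patent does not fix an API), a multi-label prediction could look like this:

```python
import numpy as np

# Hypothetical label set based on the classes named above.
LABELS = ["screenshot", "non-screenshot", "color", "black-and-white", "screen-photo"]

def classify(image_vector: np.ndarray, model) -> list:
    """Run the classification model on a 224x224x3 image vector and return
    every label whose score clears a threshold; one document image may
    carry several kinds of classification information at once.
    `model` is any callable returning per-label scores in [0, 1]."""
    scores = model(image_vector[np.newaxis, ...])[0]  # add a batch dimension
    return [label for label, score in zip(LABELS, scores) if score > 0.5]
```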
Step S130: and processing the document image according to at least one of image enhancement processing based on an enhancement model and image correction processing based on a correction model based on the classification information to obtain a preprocessed image.
In the embodiment of the application, after the classification model classifies the document images, the document images of different types can be pertinently optimized, the enhancement model can carry out image enhancement on the document images, interference information (shadow, illumination, texture, mole pattern and the like) in the document images is removed, the correction model can carry out image correction on the document images, and the images with distorted shapes are converted into images with normal shapes. The document image may be subjected to multi-dimensional image preprocessing by a combination of the classification model, the enhancement model, and the correction model.
Optionally, the combination mode of the enhancement model and the correction model is self-defined, and different combination modes can be adopted among the models. For example, the classification model may be used to classify the image, the enhancement model may be used to enhance the image, and the correction model may be used to correct the image. The document image may be subjected to image correction using only the correction model. The document image may also be image enhanced using only the enhancement model.
It should be noted that, the use sequence of the enhancement model and the correction model may be parallel or serial, and there is no mutual exclusion of functions, so that the enhancement model and the correction model may be mutually called, thereby realizing multi-dimensional optimization of the document image, that is, the optimization of multiple functions may be completed through one input.
In some embodiments, the step of processing the document image based on the classification information in at least one of an enhancement model-based image enhancement process and a correction model-based image correction process to obtain the preprocessed image may include:
(1) When the classification information of the document image is a non-screenshot image, the non-screenshot image is input into the enhancement model, which outputs an enhanced image.
Because non-screenshot images are prone to display problems, they can be enhanced and corrected. In the embodiment of the application, the enhancement model is used to remove interference information, such as shadows, illumination, and moiré, from the image. The enhancement model is trained with combined loss calculation, using a synthesized image generated from an interference image and a standard image as sample data and the interference image and the standard image as label data. It can perform feature extraction, learn a finer and more complex mapping from low-quality images to high-quality images, and perform an enhancement calculation, so that the interference information is removed from the output image.
The interference image is an image carrying interference information such as shadows, illumination, texture, and moiré. The standard image is a document image without interference information. Optionally, interference images may be obtained directly from a third-party image database, and standard images may be obtained by preprocessing acquired document images to remove interference information; the preprocessing may use an existing image enhancement algorithm and is not limited here.
In one embodiment, when the document image is a non-screenshot image, the step of inputting the non-screenshot image into the enhancement model and outputting the enhanced image may include:
(1.1) Inputting the non-screenshot image into the enhancement model for feature extraction to generate a feature image.
(1.2) Performing an enhancement calculation based on the feature image and the non-screenshot image to generate the enhanced image.
For example, when the document image is determined to be a non-screenshot image, the non-screenshot image (the document image) may be input into the enhancement model for feature extraction to generate a feature image. Feature extraction can improve image quality, including retaining high-frequency information such as color and brightness, increasing image fidelity, and suppressing high-frequency noise. An enhancement calculation is then performed based on the feature image and the non-screenshot image to generate the enhanced image; the enhancement calculation may be a division operation between the feature image and the non-screenshot image, which removes interference information such as moiré and shadows from the image.
Extracting features from the non-screenshot image to generate the feature image and then performing the enhancement calculation on the feature image and the non-screenshot image amounts to a second round of enhancement, so the enhancement model greatly improves the image enhancement effect and effectively removes interference information from the document image. A sketch of this calculation follows.
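The following is a minimal sketch of that two-stage enhancement, under the assumption that the feature image encodes the interference (e.g., shading or moiré intensity) so that an element-wise division removes it; `feature_net` is a placeholder for the trained backbone:

```python
import numpy as np

def enhance(non_screenshot: np.ndarray, feature_net) -> np.ndarray:
    """Predict a feature image with the enhancement model's backbone,
    then divide it out of the input (the enhancement calculation)."""
    x = non_screenshot.astype(np.float32) / 255.0
    feature = feature_net(x)                      # predicted interference pattern
    enhanced = x / np.clip(feature, 1e-3, None)   # division removes shading/moire
    return (np.clip(enhanced, 0.0, 1.0) * 255).astype(np.uint8)
```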
(2) The enhanced image is input to the correction model, and the corrected image is output as a preprocessed image.
In general, images shot by users are irregular, with problems such as paper distortion and improper positioning; therefore, after image enhancement of the non-screenshot image, the resulting enhanced image can be corrected with the correction model. The correction model is obtained by secondary mapping correction training, with the first and second mapping training matrices generated from a distorted sample image as sample data and the real first and second mapping matrices as label data. The correction model may include a first mapping network and a second mapping network, each producing a mapping matrix for determining two-dimensional coordinates of the document image. The first mapping network makes a preliminary prediction of the mapping matrix, and the second mapping network fine-tunes that preliminary prediction, improving the correction and recovery of distorted images.
As an embodiment, the step of inputting the enhanced image to the correction model and outputting the corrected image may include:
(2.1) inputting the enhanced image into a first mapping network, generating a first mapping matrix through the first mapping network.
(2.2) inputting the first mapping matrix into a second mapping network, generating the second mapping matrix through the second mapping network.
(2.3) performing image coordinate mapping on the second mapping matrix to obtain a corrected image.
After the enhanced image of the document image is obtained, it may be input into the first mapping network to generate the first mapping matrix; the first mapping matrix is then input into the second mapping network to generate the second mapping matrix; finally, image coordinate mapping is performed on the second mapping matrix to obtain the corrected image. For example, the image coordinate mapping (Remap) can be performed with OpenCV, the open-source cross-platform computer vision and machine learning software library. The corrected image may then be used as the preprocessed image, as sketched below.
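The text names OpenCV's Remap for the final coordinate mapping; the sketch below assumes the second mapping matrix stores, for every output pixel, the source x and y coordinates that cv2.remap expects, with both mapping networks as placeholders:

```python
import cv2
import numpy as np

def correct(enhanced: np.ndarray, first_net, second_net) -> np.ndarray:
    """Two-stage correction: the first mapping network makes a coarse
    prediction, the second refines it, and cv2.remap resamples the image.
    Both networks are assumed to return float32 maps of shape (H, W, 2)."""
    coarse = first_net(enhanced)       # first mapping matrix (preliminary)
    refined = second_net(coarse)       # second mapping matrix (fine-tuned)
    map_x, map_y = refined[..., 0], refined[..., 1]
    return cv2.remap(enhanced, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```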
Step S140: and performing color model processing on the preprocessed image based on the classification information to obtain a target image corresponding to the document image.
The color model processing refers to fine adjustment of an image, and is used for adjusting at least one of saturation, contrast and brightness of a preprocessed image, for example, improving the contrast, saturation and brightness of a document image, so that a final target image result meets the requirement of a downstream task, and it can be understood that the color model processing is equivalent to image enhancement of the document image again, so that the display effect of a document shifted by the document image is better, for example, characters are displayed more clearly on the background.
As an embodiment, after obtaining the preprocessed image of the document image, different types of color model processing may be performed according to the classification information corresponding to the preprocessed image. Specifically, when the pre-processed image is a color image, it can be adjusted for contrast, saturation, and brightness. When the preprocessed image is a black-and-white image, it can be subjected to adaptive gray-scale processing.
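A hedged sketch of the color-image branch of this color model processing, using HSV for saturation/brightness and a linear stretch about the mean for contrast (the patent does not specify the exact operators, and the gain values are illustrative):

```python
import cv2
import numpy as np

def color_model_process(img: np.ndarray, sat: float = 1.2,
                        bright: float = 1.1, contrast: float = 1.15) -> np.ndarray:
    """Adjust the saturation, brightness, and contrast of a color
    preprocessed image (BGR uint8 in, BGR uint8 out)."""
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] = np.clip(hsv[..., 1] * sat, 0, 255)     # saturation channel
    hsv[..., 2] = np.clip(hsv[..., 2] * bright, 0, 255)  # brightness (value) channel
    out = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR).astype(np.float32)
    mean = out.mean()
    out = np.clip((out - mean) * contrast + mean, 0, 255)  # contrast about the mean
    return out.astype(np.uint8)
```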
Referring to fig. 3 for an example, fig. 3 shows a flowchart of an image processing method. After the document image is acquired, the document image is input into a classification model, and the classification model determines classification information of the document image.
When the document image is a non-screenshot image, the non-screenshot image, i.e., the document image, is input into the enhancement model, which performs image enhancement to obtain an enhanced image. The enhanced image is then input into the correction model for image correction to obtain a corrected image, and color model processing is performed on the corrected image to obtain the target image corresponding to the document image.
When the document image is a screenshot image, color model processing can be performed directly on the screenshot image, i.e., the document image. Specifically, during the color model processing, if the document image is a black-and-white image, adaptive grayscale processing is applied to obtain the target image; if it is a color image, its contrast, saturation, and brightness are adjusted to obtain the target image.
In the embodiment of the application, a document image can be acquired and input into the classification model to obtain its classification information; the document image is then processed, based on the classification information, by at least one of image enhancement processing based on the enhancement model and image correction processing based on the correction model to obtain a preprocessed image carrying the classification information; and color model processing is performed on the preprocessed image based on the classification information to obtain the target image corresponding to the document image, the color model processing adjusting at least one of the saturation, contrast, and brightness of the preprocessed image. Thus, by classifying document images, document images of different types can at least have their interference information removed or have display problems such as picture distortion corrected, providing an integrated solution for the multi-faceted optimization of document images and greatly improving both the optimization effect and the efficiency.
The methods described in connection with the above embodiments are described in further detail below by way of example.
The image processing method of the present application involves artificial intelligence (Artificial Intelligence, AI) technology: the theory, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in ways similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, enabling machines to perceive, reason, and make decisions.
Artificial intelligence is a comprehensive discipline spanning a wide range of fields, at both the hardware and software levels. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer vision (Computer Vision, CV) technology, a branch of artificial intelligence, is the science of studying how to make machines "see"; more specifically, it replaces human eyes with cameras and computers to recognize, detect, and measure targets, and further performs graphics processing so that computers produce images more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data.
Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric techniques such as face recognition and fingerprint recognition.
The image processing method provided in this embodiment involves artificial intelligence technologies such as computer vision. It is described below with the image processing apparatus integrated in a computer device as an example, with the details following the flowchart shown in fig. 4 together with the application scenario shown in fig. 5; the computer device may be a server or a terminal device. Referring to fig. 4, fig. 4 illustrates another image processing method according to an embodiment of the present application. In a specific embodiment, the image processing method may be applied to the document conversion scenario shown in fig. 5.
The document conversion scenario may provide users with a text recognition (Optical Character Recognition, OCR) function; in particular, it optimizes a document picture taken by the user and converts it to a document format. The conversion service provider operates a service side comprising a cloud training server 410 and a cloud execution server 430. The cloud training server 410 trains the enhancement model, the correction model, and the classification model; the cloud execution server 430 deploys the models trained by the cloud training server 410, optimizes document images sent by clients, including image preprocessing and classification, and converts the optimized target images. On the client side, when a user uses the conversion service, the conversion software 420 is opened on the smartphone.
It should be noted that the image processing method provided by the embodiment of the present application includes training the preset enhancement network, the preset correction network, and the preset classification network. The preset networks can be trained in advance on an acquired set of training samples; afterwards, whenever a document image requires image preprocessing and classification, the trained enhancement, correction, and classification models can be used directly for calculation, so network training does not have to be repeated each time. The image processing method specifically comprises the following steps:
Step S210: the computer device is trained to obtain an enhanced model.
In the embodiment of the application, the enhancement model is used for enhancing the image to remove interference information in the image, such as shadows, illumination, moire and the like. The enhancement model is obtained by taking a synthetic image generated based on an interference image and a standard image as sample data and performing iterative training based on a first enhancement loss value between the interference image and a predicted interference image and a second enhancement loss value between the standard image and a de-interference image, wherein the predicted interference image is obtained based on the prediction of the synthetic image, and the de-interference image is obtained according to the synthetic image and the predicted interference image.
In some embodiments, the step of training the computer device to obtain the enhancement model may include:
(1) The computer device obtains a first training data set.
The first training data set comprises interference images and standard images. An interference image is an image with interference information such as shadows, illumination, texture, and moiré; a standard image is a document image without interference information. For example, the cloud training server 410 may obtain the first training data set from the data acquisition device; the set may contain multiple interference images and standard images.
(2) And the computer equipment performs image synthesis on the interference image and the standard image to obtain a synthesized image.
Unlike the related art, which directly inputs a document image with interference information into a neural network for image enhancement to train the network, the present application proposes using a composite image based on an interference image and a standard document image as the input for network training, so as to train the network with combined loss calculation: loss is calculated both between the interference image and the predicted interference image and between the standard document image and the de-interference image, so the enhancement model obtained after training has a stronger image enhancement capability.
Illustratively, the cloud training server 410 may synthesize a document image with interference information from the interference image I_t and the standard image I_o, as a training sample to be input into the preset enhancement network for prediction. For example, the cloud training server 410 may input the interference image I_t and the standard image I_o into a moiré generative adversarial network (Moire-Generative Adversarial Network, MRGAN) to synthesize them, thereby generating the composite image. A toy stand-in for this synthesis is sketched below.
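MRGAN itself is beyond a short sketch, but the idea of composing a training sample I_s from I_t and I_o can be illustrated with a simple multiplicative overlay; this is a toy stand-in for the learned generator, not the patent's method:

```python
import numpy as np

def synthesize(standard: np.ndarray, interference: np.ndarray,
               alpha: float = 0.8) -> np.ndarray:
    """Toy composite: modulate the clean standard image I_o by the
    interference pattern I_t (e.g., a shadow/moire mask) to approximate
    a composite training sample I_s. Assumes both images share a shape."""
    i_o = standard.astype(np.float32) / 255.0
    i_t = interference.astype(np.float32) / 255.0
    i_s = i_o * (alpha * i_t + (1.0 - alpha))  # darker where interference is strong
    return (np.clip(i_s, 0.0, 1.0) * 255).astype(np.uint8)
```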
(3) The computer device predicts a predicted interference image in the composite image through a preset enhancement network.
Referring to fig. 6, fig. 6 shows a network architecture diagram of the preset enhancement network. The preset enhancement network can perform feature extraction to obtain the predicted interference image, and an enhancement calculation to obtain the de-interference image. Feature extraction may involve normalizing the image and passing it through a deep neural network (Deep Neural Networks, DNN) that extracts features from the image; optionally, the deep neural network may be a convolutional neural network, for example the Range Scaling Global U-Net (RSGUnet), which is not limited here. The predicted interference image is generated from the interference information extracted from the composite image, for example a mask image; the de-interference image is the image obtained by removing the interference information from the composite image.
The preset enhancement network normalizes the composite image I_s and inputs the normalized data into a deep neural network with the RSGUnet network as the backbone network (Backbone). The normalization process scales the pictures into matrices ranging from 0 to 1 and then reshapes them into vectors of dimension n×512×512×3, where n is the size of each input batch. After feature extraction by the deep neural network, the predicted interference image F_w(I_s), of dimension n×512×512×3, can be predicted. A sketch of this normalization follows.
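A sketch of this normalization, assuming each composite image is resized to 512×512 RGB before batching:

```python
import cv2
import numpy as np

def normalize_batch(images: list) -> np.ndarray:
    """Scale each composite image I_s into the [0, 1] range and stack
    the batch into an n x 512 x 512 x 3 tensor (n = batch size)."""
    batch = [cv2.resize(im, (512, 512)).astype(np.float32) / 255.0
             for im in images]
    return np.stack(batch, axis=0)
```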
(4) The computer device obtains a de-interference image from the composite image and the predicted interference image.
Illustratively, the cloud training server 410 may synthesize the composite image I_s from the interference image I_t and the standard image I_o and input I_s into the preset enhancement network. After normalization, the cloud training server 410 inputs the normalized composite image into the RSGUnet network to obtain the predicted interference image F_w(I_s); it then performs the enhancement calculation based on F_w(I_s) and the composite image I_s to obtain the de-interference image I_r.
(5) The computer device obtains a first enhancement loss value based on the interference image and the predicted interference image.
Illustratively, the cloud training server 410 may determine a first enhancement loss value L_α(I_t, F_w(I_s)) of the preset enhancement network based on the interference image I_t and the predicted interference image F_w(I_s). The specific calculation formula is as follows:

L_α(I_t, F_w(I_s)) = ρ_1×L_1 + ρ_2×L_MS-SSIM + ρ_3×L_VGG + ρ_4×L_GAN + ρ_5×L_TV
Specifically, L_α(I_t, F_w(I_s)) comprises the following terms:
L_1 = ‖I_t − F_w(I_s)‖_1

L_MS-SSIM = 1 − MS-SSIM(I_t, F_w(I_s))
L_1 is used to measure the color information and brightness information of the image. L_MS-SSIM is used to retain more high-frequency information.
L_VGG is used to ensure similarity in feature representation between the predicted interference image F_w(I_s) and the interference image I_t; it is obtained by computing the mean Euclidean distance between the feature maps (Feature Map) of several layers of a pre-trained VGG (Visual Geometry Group) network.
L_GAN = −∑ log D(I_t, F_w(I_s))
L_GAN is used to approximate the perceptual distance between the predicted interference image F_w(I_s) and the interference image I_t; minimizing L_GAN therefore makes the generated enhanced image more realistic.
L_TV is used to suppress high-frequency noise. ρ_1 through ρ_5 are weight coefficients, which can be determined according to the specific training process and experimental experience.
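For concreteness, a hedged PyTorch sketch of the combined loss L_α is given below. The pytorch_msssim helper, the single-layer VGG stand-in for the multi-layer feature-map mean, the caller-supplied discriminator, and all weight values ρ are assumptions rather than details taken from this application:

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ms_ssim  # assumed third-party MS-SSIM helper

def total_variation(x: torch.Tensor) -> torch.Tensor:
    """Mean absolute difference of neighboring pixels; suppresses high-frequency noise."""
    return (x[..., :, 1:] - x[..., :, :-1]).abs().mean() + \
           (x[..., 1:, :] - x[..., :-1, :]).abs().mean()

def enhancement_loss(i_t, f_w, vgg_features, discriminator,
                     rho=(1.0, 1.0, 1.0, 0.01, 0.1)):
    """L_alpha = rho1*L1 + rho2*L_MS-SSIM + rho3*L_VGG + rho4*L_GAN + rho5*L_TV.

    i_t: interference image; f_w: predicted interference image (both NCHW in [0, 1]).
    vgg_features and discriminator are caller-supplied networks; the rho weights
    are placeholders, to be set from training experience as noted above.
    """
    l1 = F.l1_loss(f_w, i_t)
    l_msssim = 1.0 - ms_ssim(f_w, i_t, data_range=1.0)
    l_vgg = F.mse_loss(vgg_features(f_w), vgg_features(i_t))   # single-layer stand-in
    l_gan = -torch.log(discriminator(i_t, f_w).clamp_min(1e-8)).sum()
    l_tv = total_variation(f_w)
    r = rho
    return r[0]*l1 + r[1]*l_msssim + r[2]*l_vgg + r[3]*l_gan + r[4]*l_tv
```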
(6) The computer device obtains a second enhancement loss value based on the standard image and the de-interference image.
Illustratively, the cloud training server 410 may determine a second enhancement loss value L_β(I_o, I_r) of the preset enhancement network based on the standard image I_o and the de-interference image I_r. The specific calculation formula is as follows:
L_β(I_o, I_r) = ‖I_o − I_r‖_1
(7) The computer device performs iterative training on the preset enhancement network according to the total enhancement loss value formed from the first enhancement loss value and the second enhancement loss value, until the trained preset enhancement network meets the preset conditions, thereby obtaining the enhancement model.
Illustratively, the total enhancement loss value L_σ(I_t, I_o) of the preset enhancement network can be obtained by adding the first enhancement loss value L_α(I_t, F_w(I_s)) and the second enhancement loss value L_β(I_o, I_r); the target loss function of the preset enhancement network is as follows:

L_σ(I_t, I_o) = L_α(I_t, F_w(I_s)) + L_β(I_o, I_r)
By optimizing the total enhancement loss value L_σ(I_t, I_o), iterative training is performed on the preset enhancement network until the preset enhancement network meets the preset conditions, thereby obtaining the trained enhancement model. The interference information in the document image can thus be removed, achieving the effect of enhancing the document image.
The preset conditions may be that the total enhancement loss value is smaller than a preset value, that the total enhancement loss value no longer changes, or that the number of training iterations reaches a preset number, etc. Alternatively, an optimizer may be employed to optimize the target loss function, setting the learning rate (Learning Rate), the batch size during training (Batch Size), and the number of training epochs (Epoch) based on experimental experience.
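By way of illustration, a minimal training loop implementing these stopping conditions could be sketched as follows; the Adam optimizer and every hyperparameter value are illustrative assumptions, to be tuned from experimental experience as noted above:

```python
import torch

def train_enhancement(model, loader, loss_fn, max_epochs=100, target=1e-2, tol=1e-5):
    """Iterate until the total loss meets one of the preset conditions above."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # optimizer choice is illustrative
    prev = float("inf")
    for _ in range(max_epochs):                          # preset number of iterations
        running = 0.0
        for batch in loader:
            opt.zero_grad()
            loss = loss_fn(model, batch)                 # L_sigma = L_alpha + L_beta
            loss.backward()
            opt.step()
            running += loss.item()
        running /= len(loader)
        if running < target or abs(prev - running) < tol:  # loss small enough / unchanged
            break
        prev = running
    return model
```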
Compared with the prior art, in which the neural network for image enhancement is trained with the interference-laden picture as input and the clean picture as output, the present application trains the enhancement model with the synthesized picture as input and the standard clean picture as output, and performs the combined loss calculation, so that the trained enhancement model has a better image enhancement effect.
Step S220: the computer device is trained to obtain a correction model.
In an embodiment of the application, the correction model is used to perform image correction on an image so that a distorted image is converted into an image of normal shape. The correction model is obtained by performing secondary mapping correction training based on the distorted sample image, the real first mapping matrix, and the real second mapping matrix.
In some embodiments, the step of training the computer device to obtain a correction model may include:
(1) The computer device obtains a second training data set.
The second training data set comprises a distorted sample image, a real first mapping matrix, and a real second mapping matrix. A mapping matrix refers to a matrix carrying 2-dimensional coordinate mapping information for the pixels of an image. Optionally, when constructing the second training data set, 3D modeling software may be used to synthesize a coordinate mapping relationship; a mapping matrix is built from the coordinate mapping relationship to serve as the real mapping matrix of the tag data, and coordinate mapping (Remap) is computed based on that real mapping matrix to obtain the distorted sample image used as sample data. In the actual network training process, the real first mapping matrix and the real second mapping matrix are the same tag data: the real first mapping matrix guides the first mapping network to predict the preliminary coordinate mapping matrix more accurately, and the real second mapping matrix guides the second mapping network to fine-tune the preliminary coordinate mapping matrix more accurately.
For example, the cloud training server 410 may obtain the distorted sample image, the real first mapping matrix, and the real second mapping matrix from the data acquisition device.
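One plausible reading of this Remap-based sample construction is sketched below, with OpenCV's remap assumed as the coordinate-mapping primitive and the mapping matrix holding per-pixel (x, y) source coordinates:

```python
import cv2
import numpy as np

def warp_from_mapping(flat_doc: np.ndarray, mapping: np.ndarray) -> np.ndarray:
    """Produce the distorted sample image from a real mapping matrix via Remap.

    `mapping` is H x W x 2, holding for each output pixel the (x, y) source
    coordinate synthesized by the 3D modeling step.
    """
    map_x = mapping[..., 0].astype(np.float32)
    map_y = mapping[..., 1].astype(np.float32)
    return cv2.remap(flat_doc, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```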
(2) The computer device obtains a preset correction network.
Referring to fig. 7, fig. 7 shows a network architecture diagram of the preset correction network. The preset correction network comprises a first mapping network and a second mapping network. Each mapping network may be a convolutional neural network with a U-Net-style structure, for example, a MobileNetV2 network. The mapping network is used to extract image features and output the predicted 2-dimensional coordinate information of the pixels, that is, the (x, y) mapping information of each pixel.
(3) The computer device performs secondary mapping correction training on the first mapping network and the second mapping network in the preset correction network through the second training data set, until the preset correction network meets the preset conditions, obtaining the trained correction model.
The preset correction network comprises a first mapping network and a second mapping network. The first mapping network predicts the coordinate mapping information of the distorted sample image to obtain a first mapping training matrix. After an intermediate correction image is generated from the first mapping training matrix and the distorted sample image, coordinate conversion can be performed on the intermediate correction image to obtain an intermediate mapping matrix, and the intermediate mapping matrix is spliced with the first mapping training matrix to obtain a spliced mapping matrix. The second mapping network then predicts the coordinate mapping information of the spliced mapping matrix to obtain a second mapping training matrix.
As an implementation manner, the step in which the computer device performs secondary mapping correction training on the first mapping network and the second mapping network in the preset correction network through the second training data set, until the preset correction network meets a preset condition, to obtain the trained correction model, may include:
(3.1) the computer device inputting the warped sample image to a first mapping network, generating a first mapping training matrix through the first mapping network.
Illustratively, the cloud training server 410 may input the distorted sample image I_p into a MobileNetV2 network; the distorted sample image I_p is a vector of dimensions n×192×192×3, where n is the size of each input batch. The MobileNetV2 network can extract the features of the distorted sample image I_p to predict the 2D coordinate mapping information, and then output the first mapping training matrix Ĉ corresponding to the distorted sample image I_p, a vector of dimensions n×384×384×2 whose last dimension holds the 2D coordinate mapping information of each pixel.
(3.2) the computer device deriving an intermediate correction image based on the first mapping training matrix and the warped sample image.
Illustratively, the cloud training server 410 may obtain an intermediate correction image I_m based on the first mapping training matrix Ĉ and the distorted sample image I_p. Specifically, the training matrix Ĉ is used to perform image coordinate mapping (Remap) on the distorted sample image I_p, obtaining the intermediate correction image I_m.
(3.3) The computer device performs coordinate conversion on the intermediate correction image to obtain an intermediate mapping matrix, and obtains a spliced mapping matrix based on the intermediate mapping matrix and the first mapping training matrix.
It is considered that the convolution layers used by the mapping network cannot learn the positional structure information in the image, which affects the prediction of the coordinate mapping information. Therefore, a coordinate conversion (CoordConv) operation can assist the convolution by adding two coordinate channels before it, letting the filters (convolution kernels) know their own positions, so that the second mapping network can predict the coordinate mapping information more accurately and the correction effect of the preset correction network is improved.
Illustratively, the cloud training server 410 may perform coordinate conversion (CoordConv) on the intermediate correction image I_m to obtain an intermediate mapping matrix K of dimensions 384×384×2, and splice the intermediate mapping matrix K with the first mapping training matrix Ĉ to obtain the spliced mapping matrix U.
(3.4) the computer device inputting the splice mapping matrix to a second mapping network to obtain a second mapping training matrix.
Illustratively, the cloud training server 410 may input the spliced mapping matrix U into another MobileNetV2 network to obtain the second mapping training matrix B̂, a vector of dimensions n×384×384×2.
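A small sketch of the CoordConv-and-splice step follows; the normalization of the coordinate grids to [-1, 1] is an assumption, since the value range is not fixed above:

```python
import numpy as np

def coordconv_channels(h: int = 384, w: int = 384) -> np.ndarray:
    """Build the two coordinate channels of the intermediate mapping matrix K (H x W x 2)."""
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, h),
                         np.linspace(-1.0, 1.0, w), indexing="ij")
    return np.stack([xs, ys], axis=-1).astype(np.float32)

def splice_mapping(first_mapping: np.ndarray) -> np.ndarray:
    """Concatenate K with the first mapping training matrix to form U (H x W x 4)."""
    k = coordconv_channels(*first_mapping.shape[:2])
    return np.concatenate([k, first_mapping], axis=-1)
```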
(3.5) The computer device acquires a total correction loss value of the preset correction network based on the first mapping training matrix, the real first mapping matrix, the second mapping training matrix, and the real second mapping matrix, and performs iterative training on the preset correction network according to the total correction loss value until the preset correction network meets the preset conditions, obtaining the trained correction model.
Illustratively, the cloud training server 410 may determine a first correction loss value L_C1(Ĉ, C) of the preset correction network based on the first mapping training matrix Ĉ and the real first mapping matrix C, where the loss weighs the deviation between Ĉ and C by a weight coefficient γ, which can be determined according to the specific training process and experimental experience.
Further, the cloud training server 410 may determine a second correction loss value L_C2(B̂, B) of the preset correction network based on the second mapping training matrix B̂ and the real second mapping matrix B, where the loss likewise weighs the deviation between B̂ and B by a weight coefficient that can be determined according to the specific training process and experimental experience.
Further, the cloud training server 410 may obtain the total correction loss value L_C(I_p, C, B) of the preset correction network from the first correction loss value L_C1 and the second correction loss value L_C2; the target loss function of the preset correction network is as follows:

L_C(I_p, C, B) = ε×L_C1(Ĉ, C) + τ×L_C2(B̂, B)

where ε and τ are weight coefficients, which can be determined according to the specific training process and experimental experience.
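A hedged sketch of this two-stage loss is shown below; since the per-stage formulas are not preserved above, weighted L1 distances between the predicted and real mapping matrices are assumed:

```python
import torch
import torch.nn.functional as F

def correction_total_loss(pred_c, real_c, pred_b, real_b,
                          gamma=1.0, delta=1.0, eps=1.0, tau=1.0):
    """Total loss L_C = eps * L_C1 + tau * L_C2 over the two mapping stages.

    Weighted L1 distances are assumed for L_C1 and L_C2; gamma and delta
    stand in for the per-stage weight coefficients mentioned in the text.
    """
    l_c1 = gamma * F.l1_loss(pred_c, real_c)  # first mapping network vs. real first matrix
    l_c2 = delta * F.l1_loss(pred_b, real_b)  # second mapping network vs. real second matrix
    return eps * l_c1 + tau * l_c2
```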
By optimizing the total correction loss value L_C(I_p, C, B), iterative training is performed on the preset correction network until the preset correction network meets the preset conditions, thereby obtaining the trained correction model. The document image can thus be corrected, converting a distorted image into an image of normal shape.
The preset conditions may be that the total correction loss value is smaller than a preset value, that the total correction loss value no longer changes, or that the number of training iterations reaches a preset number, etc. Alternatively, an optimizer may be employed to optimize the target loss function, setting the learning rate, the batch size during training, and the number of training epochs based on experimental experience.
The correction model is obtained by generating the first mapping training matrix and the second mapping training matrix from the distorted sample image used as sample data, and performing secondary mapping correction training with the real first mapping matrix and the real second mapping matrix as tag data. The correction model obtained through secondary mapping correction training employs two mapping networks (a first mapping network and a second mapping network): after the first mapping network outputs the 2D coordinates, the coordinate information generated by CoordConv is added and spliced to obtain the spliced matrix, which is then input into the second mapping network for fine-tuning. This is equivalent to adjusting the distorted picture twice, which greatly improves the effect of recovering the distorted picture.
Step S230: the computer device is trained to obtain a classification model.
Referring to fig. 8, fig. 8 shows a network structure diagram of the classification model. The classification model includes an input layer, two hidden layers, and an output layer. The training data set used when training the classification model may include a document sample image I_d and a classification label associated with the document sample image.
Illustratively, the cloud training server 410 performs iterative training on a preset classification network based on the training data set; the preset classification network may use a MobileNetV2 network as its backbone network, and the target loss function of the preset classification network is specifically as follows:
L_class = −∑_i ∑_{c=1}^{M} y_ic log(p_ic)

where M is the number of categories of the document sample image; in the embodiment of the present application, M = 5. That is, the classification information of the document image covers 5 kinds: screen-capturing images, non-screen-capturing images, color images, black-and-white images, and screen images. A screen-capturing image refers to an image acquired by a screen-capture program on a computer device; any other image is a non-screen-capturing image. Whether an image is a screen-capturing image or a non-screen-capturing image, it may be a color image or a black-and-white image. A screen image refers to a non-screen-capturing image obtained by photographing a display screen with the image pickup apparatus; it is understood that a screen image generally exhibits moire patterns. y_ic is a sign function (0 or 1): it takes 1 if the true class of document sample image i equals c, and 0 otherwise. p_ic is the predicted probability that sample i belongs to category c. The classification model is obtained by optimizing the target loss function L_class.
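The target loss function above is standard multi-class cross-entropy; a minimal PyTorch rendering, with the 5 class names taken from the text, might look as follows:

```python
import torch
import torch.nn.functional as F

# Class order follows the text; M = 5.
CLASSES = ["screen-capturing", "non-screen-capturing", "color", "black-and-white", "screen"]

def classification_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Multi-class cross-entropy: -sum_c y_ic * log(p_ic), averaged over samples i.

    logits: N x 5 raw scores from the MobileNetV2 backbone; labels: N class indices.
    """
    log_p = F.log_softmax(logits, dim=1)  # log p_ic
    return F.nll_loss(log_p, labels)      # picks -log p_ic at the true class (y_ic = 1)
```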
For example, the training of the preset enhancement network, the preset correction network, and the preset classification network in the above steps S210 to S230 may be performed on the cloud training server 410. When the cloud training server 410 has obtained the trained enhancement model, correction model, and classification model, these models may be deployed on the cloud execution server 430 to perform classification calculation, image enhancement, and image correction on the document images transmitted from the client, that is, the image shift software 420.
Step S240: the computer device acquires a document image.
As one embodiment, a camera of the computer device may capture a paper document to obtain a document image. Illustratively, in an image shift scenario, a user may open the image shift software 420 installed on a smartphone to scan a paper document 440 on a desktop. Further, the image shift software 420 may obtain the document image of the paper document 440 through the rear camera of the smartphone and transmit the document image to the cloud execution server 430.
Step S250: the computer device inputs the document image into a classification model, and classification information of the document image is obtained through the classification model.
In the embodiment of the application, an integral solution is provided for optimizing document images: by classifying the document image, the display problems in the image can be handled in a manner targeted to the image type of the document image.
In some embodiments, the computer device inputs the document image into a classification model, and the step of obtaining classification information of the document image by the classification model may include:
(1) The computer device performs a conversion operation on the document image to obtain an image vector corresponding to the document image.
As one embodiment, the computer device may convert document images into image vectors of the same specification through the conversion operation. Illustratively, after receiving the document image sent by the image shift software 420, the cloud execution server 430 may perform the conversion operation on the document image using the resize function of OpenCV, obtaining an image vector v_I with dimensions 224×224×3.
(2) The computer device inputs the image vector into a classification model, from which classification information of the document image is determined.
As one implementation, the computer device may input the image vector into the classification model, and the classification information of the document image is determined by the classification model. For example, the cloud execution server 430 may input the image vector v_I into a classification model that uses a MobileNetV2 network as its backbone network for feature extraction; the classification model then outputs a 5-dimensional vector whose dimensions respectively represent the probability that the document image belongs to each of the 5 kinds of classification information. Each dimension takes a value between 0 and 1, and the closer the value is to 1, the higher the probability that the document image belongs to that classification information. When the value of a dimension is greater than 0.8, it can be determined that the document image belongs to that classification information.
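Putting the conversion and threshold steps together, a sketch of the inference path might read as follows; `model` stands in for the trained classifier returning 5 class probabilities, and the division by 255 is an assumed normalization:

```python
import cv2
import numpy as np

CLASSES = ["screen-capturing", "non-screen-capturing", "color", "black-and-white", "screen"]

def classify_document(model, image_bgr: np.ndarray, threshold: float = 0.8):
    """Convert a document image to the 224 x 224 x 3 vector v_I and read off classes."""
    v_i = cv2.resize(image_bgr, (224, 224)).astype(np.float32) / 255.0  # image vector v_I
    probs = np.asarray(model(v_i[None, ...]))[0]                        # shape (5,)
    return [name for name, p in zip(CLASSES, probs) if p > threshold]   # > 0.8 rule
```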
Step S260: the computer device processes the document image based on the classification information in at least one of an image enhancement process based on the enhancement model and an image correction process based on the correction model to obtain a preprocessed image.
In the embodiment of the application, after the classification model classifies the document images, document images of different types can be optimized in a targeted manner. The enhancement model can perform image enhancement on a document image, removing interference information (shadows, illumination, textures, moire patterns, and the like); the correction model can perform image correction on a document image, converting an image with a distorted shape into an image with a normal shape. Through the combination of the classification model, the enhancement model, and the correction model, multi-dimensional image preprocessing and classification can be applied to the document image.
In some embodiments, the step of processing the document image based on the classification information in at least one of an enhancement model-based image enhancement process and a correction model-based image correction process to obtain a preprocessed image may include:
(1) When the classification information of the document image is a non-screen-capturing image, the computer device inputs the non-screen-capturing image to the enhancement model, and outputs the enhancement image.
As one embodiment, the computer device inputs the non-screenshot image to the enhancement model, and the step of outputting the enhanced image may include:
(1.1) the computer device inputting the non-screenshot image into the enhancement model for feature extraction, generating a feature image.
(1.2) the computer device performing an enhancement calculation based on the feature image and the non-screenshot image, generating an enhanced image.
As one implementation, the enhancement model may include a prediction module and an enhancement module. Illustratively, when the cloud execution server 430 determines that the document image is a non-screen-capturing image I'_s, it may input the non-screen-capturing image I'_s to the prediction module, which generates a feature image F_w(I'_s). Further, the cloud execution server 430 may input the feature image F_w(I'_s) and the non-screen-capturing image I'_s to the enhancement module, which generates the enhanced image.
(2) The computer device inputs the enhanced image to the correction model, and outputs the corrected image as a preprocessed image.
As an embodiment, the computer device inputs the enhanced image to a correction model, and the step of outputting the corrected image as the preprocessed image may include:
(2.1) the computer device inputting the enhanced image to a first mapping network, generating a first mapping matrix through the first mapping network.
(2.2) the computer device inputting the first mapping matrix to a second mapping network, generating the second mapping matrix through the second mapping network.
(2.3) the computer device performs image coordinate mapping on the second mapping matrix to obtain a corrected image.
As an embodiment, the correction model may comprise a first mapping network and a second mapping network. Illustratively, the cloud execution server 430 may input the enhanced image to the first mapping network, which generates a first mapping matrix C'. Further, the cloud execution server 430 may input the first mapping matrix C' to the second mapping network, which generates a second mapping matrix B', and perform image coordinate mapping on the second mapping matrix B' using the Remap function of OpenCV, obtaining the corrected image I'_e, that is, the preprocessed image.
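A sketch of this two-stage correction at inference time is given below; `first_net` and `second_net` stand in for the trained mapping networks, each assumed to return an H×W×2 coordinate matrix consumed by OpenCV's remap:

```python
import cv2
import numpy as np

def correct_image(enhanced: np.ndarray, first_net, second_net) -> np.ndarray:
    """Two-stage correction: C' from the first network, B' from the second,
    then Remap of the enhanced image through B'."""
    c_prime = first_net(enhanced)        # first mapping matrix C', H x W x 2
    b_prime = second_net(c_prime)        # second mapping matrix B', H x W x 2
    map_x = b_prime[..., 0].astype(np.float32)
    map_y = b_prime[..., 1].astype(np.float32)
    return cv2.remap(enhanced, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```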
Step S270: and the computer equipment performs color model processing on the preprocessed image based on the classification information to obtain a target image corresponding to the document image.
Illustratively, if the cloud execution server 430 determines that the preprocessed image I'_e is a color image, adjustment of hue (Hue), saturation (Saturation), and lightness (Lightness) can be performed on the preprocessed image I'_e. If the cloud execution server 430 determines that the preprocessed image I'_e is a black-and-white image, adaptive gray-scale processing can be performed on the preprocessed image I'_e.
Specifically, referring to fig. 9, fig. 9 shows a flow chart of the adaptive gray-scale processing. The cloud execution server 430 may use the cvtColor function of OpenCV to convert the black-and-white preprocessed image I'_e into a gray-scale map, and calculate the gray-level histogram, obtaining a histogram represented by the array hist. Further, a cumulative distribution Accumulator is calculated based on the array hist. For example, the value of the first element of Accumulator is the first element of hist, and the value of each subsequent element of Accumulator is the sum of the current hist value and the value of the previous element of Accumulator.
Further, the cloud execution server 430 may calculate the upper and lower gray-scale boundaries according to the boundary threshold. For example, the initial value min of the lower boundary is set to 0; the loop checks whether Accumulator[min] is smaller than the boundary threshold and, if so, increments min by 1, until Accumulator[min] is greater than the boundary threshold, at which point the loop exits with the value of the lower boundary min. The initial value max of the upper boundary is set to the length of hist minus 1; the loop checks whether Accumulator[max] is larger than the boundary threshold and, if so, decrements max by 1, until Accumulator[max] is smaller than the threshold, at which point the loop exits with the value of max.
The boundary threshold is a manually set value, adjusted by a professional during the calculation process to control the strength of the algorithm's effect; different boundary thresholds can be used, according to experimental experience, to meet the scene requirements of different situations. Further, the cloud execution server 430 may calculate the target image S according to the upper and lower gray boundaries; the core calculation formula is as follows:
θ = 255/(max − min)

S = θ × (G − min)

where G denotes the gray-scale map.
Specifically, any element of S smaller than 0 is set to 0, and any element larger than 255 is set to 255. The adaptive gray-scale processing can be performed in combination with any process that uses the enhancement model or the correction model.
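Assembling the steps above, a sketch of the adaptive gray-scale processing might look as follows; the boundary value and the interpretation of the upper-boundary comparison are assumptions, as noted in the comments:

```python
import cv2
import numpy as np

def adaptive_gray(pre_img_bgr: np.ndarray, boundary: float = 50.0) -> np.ndarray:
    """Adaptive gray-scale processing following the flow described above.

    `boundary` is the manually set boundary threshold (illustrative value);
    comparing the upper boundary against the total count minus `boundary`
    is an assumed reading of the loop described in the text.
    """
    gray = cv2.cvtColor(pre_img_bgr, cv2.COLOR_BGR2GRAY)
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    accumulator = np.cumsum(hist)                       # cumulative distribution

    lo = 0
    while lo < 255 and accumulator[lo] < boundary:      # lower gray boundary min
        lo += 1
    hi = 255
    while hi > lo and accumulator[hi] > accumulator[-1] - boundary:  # upper boundary max
        hi -= 1

    theta = 255.0 / max(hi - lo, 1)                     # theta = 255 / (max - min)
    s = (gray.astype(np.float32) - lo) * theta          # S = theta * (G - min)
    return np.clip(s, 0, 255).astype(np.uint8)          # clamp to [0, 255]
```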
Referring to fig. 10 for an example, fig. 10 shows another flow chart of the adaptive gray-scale processing. The cloud execution server 430 performs image enhancement on the input image using the enhancement model, or distortion recovery on the input image using the correction model, obtaining the preprocessed image. Further, the preprocessed image is converted into a gray-scale map, and the gray-level histogram represented by the array is calculated. Further, the cumulative distribution is calculated based on the gray-level histogram, the upper and lower gray-scale boundaries are calculated from the boundary threshold of the cumulative distribution, and the gray-scale map (target image) is then calculated from the upper and lower boundaries.
In the embodiment of the application, a document image can be acquired and input into the classification model to obtain the classification information of the document image. The document image is then processed according to at least one of the image enhancement processing based on the enhancement model and the image correction processing based on the correction model, based on the classification information, to obtain a preprocessed image that carries the classification information. Color model processing is performed on the preprocessed image based on the classification information to obtain the target image corresponding to the document image, where the color model processing adjusts at least one of the saturation, contrast, and brightness of the preprocessed image. Thus, by classifying document images, document images of different image types can at least be processed to remove interference information or to correct image display problems such as picture distortion, providing an integral solution for multi-way optimization of document images and greatly improving the optimization effect and efficiency for document images.
Referring to fig. 11, a block diagram of an image processing apparatus 500 according to an embodiment of the application is shown. The image processing apparatus 500 includes: an image acquisition module 510 for acquiring a document image; the image classification module 520 is configured to input the document image into a classification model, and obtain classification information of the document image through the classification model; an image processing module 530, configured to process the document image according to at least one of an image enhancement process based on the enhancement model and an image correction process based on the correction model based on the classification information, to obtain a preprocessed image, where the preprocessed image carries the classification information; the image adjustment module 540 is configured to perform color model processing on the preprocessed image based on the classification information, so as to obtain a target image corresponding to the document image, where the color model processing is configured to adjust at least one of saturation, contrast, and brightness of the preprocessed image; the enhancement model is obtained by performing combination loss calculation training on a synthetic image generated based on an interference image and a standard image serving as sample data and the interference image and the standard image serving as tag data and is used for removing interference information of a document image; the correction model is obtained by generating a first mapping training matrix and a second mapping training matrix based on the distorted sample image as sample data, performing secondary mapping correction training by taking a real first mapping matrix and a real second mapping matrix as tag data, and performing mapping correction on the distorted sample image based on the first mapping training matrix.
In some embodiments, the image processing module 530 may include: the enhancement unit is used for inputting the non-screen-capturing image into the enhancement model and outputting the enhancement image when the classification information of the document image is the non-screen-capturing image; and a correction unit for inputting the enhanced image to the correction model and outputting the corrected image as a preprocessed image.
In some embodiments, the image processing apparatus 500 may include: the first data acquisition module is used for acquiring a first training data set, and the first training data set comprises an interference image and a standard image; the image synthesis module is used for carrying out image synthesis on the interference image and the standard image to obtain a synthesized image; the interference prediction module is used for predicting a predicted interference image in the synthesized image through a preset enhancement network; the interference removing module is used for obtaining a de-interference image according to the synthesized image and the predicted interference image; the first loss calculation module is used for obtaining a first enhancement loss value based on the interference image and the predicted interference image; the second loss calculation module is used for obtaining a second enhancement loss value based on the standard image and the de-interference image; the first training module is used for carrying out iterative training on the preset enhancement network according to the total enhancement loss value of the first enhancement loss value and the second enhancement loss value, until the trained preset enhancement network meets preset conditions, so as to obtain the enhancement model.
In some embodiments, the image processing apparatus 500 may include: the second data acquisition module is used for acquiring a second training data set, and the second training data set comprises a distorted sample image, a real first mapping matrix and a real second mapping matrix; the second training module is used for carrying out secondary mapping correction training on the first mapping network and the second mapping network in the preset correction network through the second training data set until the preset correction network meets preset conditions, and a trained correction model is obtained.
In some embodiments, the second training module may include: the first generation unit is used for inputting the distorted sample image into a first mapping network, generating a first mapping training matrix through the first mapping network, and predicting coordinate mapping information of the distorted sample image by the first mapping network; the second generation unit is used for obtaining an intermediate correction image based on the first mapping training matrix and the distorted sample image; the third generating unit is used for carrying out coordinate transformation on the intermediate correction image to obtain an intermediate mapping matrix, and obtaining a splicing mapping matrix based on the intermediate mapping matrix and the first mapping training matrix; a fourth generating unit, configured to input the splice mapping matrix to a second mapping network, to obtain a second mapping training matrix, where the second mapping network is configured to predict coordinate mapping information of the splice mapping matrix; the model training unit is used for acquiring a correction total loss value of a preset correction network based on the first mapping training matrix, the real first mapping matrix, the second mapping training matrix and the real second mapping matrix, and carrying out iterative training on the preset correction network according to the correction total loss value until the preset correction network meets preset conditions, so as to obtain a trained correction model.
In some embodiments, the model training unit may be specifically configured to: determining a first correction loss value of a preset correction network based on the first mapping training matrix and the real first mapping matrix; determining a second correction loss value of a preset correction network based on the second mapping training matrix and the real second mapping matrix; and acquiring a corrected total loss value of the preset correction network according to the first corrected loss value and the second corrected loss value.
In some embodiments, the enhancement unit may be specifically configured to: inputting the non-screen capturing image into an enhancement model for feature extraction to generate a feature image; and performing enhancement calculation based on the characteristic image and the non-screen-capturing image to generate an enhanced image.
In some embodiments, the correction model comprises a first mapping network and a second mapping network, the correction unit may be specifically configured to: inputting the enhanced image into a first mapping network, and generating a first mapping matrix through the first mapping network; inputting the first mapping matrix into a second mapping network, and generating a second mapping matrix through the second mapping network; and performing image coordinate mapping on the second mapping matrix to obtain a corrected image.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus and modules described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
In several embodiments provided by the present application, the coupling of the modules to each other may be electrical, mechanical, or take other forms.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
According to the scheme provided by the application, a document image can be acquired and input into the classification model to obtain the classification information of the document image. The document image is then processed according to at least one of the image enhancement processing based on the enhancement model and the image correction processing based on the correction model, based on the classification information, to obtain a preprocessed image carrying the classification information; color model processing is performed on the preprocessed image based on the classification information to obtain the target image corresponding to the document image, where the color model processing adjusts at least one of the saturation, contrast, and brightness of the preprocessed image. Thus, by classifying document images, document images of different image types can at least be processed to remove interference information or to correct image display problems such as picture distortion, providing an integral solution for multi-way optimization of document images and greatly improving the optimization effect and efficiency for document images.
As shown in fig. 12, an embodiment of the present application further provides a computer apparatus 600. The computer apparatus 600 includes a processor 610, a memory 620, a power source 630, and an input unit 640; the memory 620 stores computer program instructions that, when invoked by the processor 610, can implement the various method steps provided in the above embodiments. It will be appreciated by those skilled in the art that the structure shown in the drawings does not constitute a limitation of the computer device, which may include more or fewer components than those illustrated, combine certain components, or adopt a different arrangement of components. Wherein:
Processor 610 may include one or more processing cores. The processor 610 connects various parts within the overall computer device using various interfaces and lines, and controls the computer device overall by running or executing instructions, programs, code sets, or instruction sets stored in the memory 620 and invoking data stored in the memory 620, thereby performing the various functions of the computer device and processing data. Alternatively, the processor 610 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 610 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; the modem handles wireless communications. It will be appreciated that the modem may also not be integrated into the processor 610 and may instead be implemented by a separate communication chip.
The memory 620 may include random access memory (Random Access Memory, RAM) or read-only memory (Read-Only Memory, ROM). Memory 620 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 620 may include a stored-program area and a stored-data area, where the stored-program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like. The stored-data area may also store data created by the computer device in use, such as phonebook and audio-video data. Accordingly, the memory 620 may also include a memory controller to provide the processor 610 with access to the memory 620.
The power supply 630 may be logically connected to the processor 610 through a power management system, so that functions of managing charging, discharging, and power consumption management are implemented through the power management system. The power supply 630 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
An input unit 640, the input unit 640 being operable to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device 600 may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 610 in the computer device loads executable files corresponding to the processes of one or more application programs into the memory 620 according to the following instructions, and the processor 610 executes the application programs stored in the memory 620, so as to implement the various method steps provided in the foregoing embodiment.
As shown in fig. 13, an embodiment of the present application further provides a computer-readable storage medium 700. The computer-readable storage medium 700 stores computer program instructions 710, and the computer program instructions 710 can be called by a processor to perform the methods described in the above embodiments.
The computer readable storage medium may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer readable storage medium comprises a Non-volatile computer readable storage medium (Non-Transitory Computer-Readable Storage Medium). The computer readable storage medium 700 has storage space for program code to perform any of the method steps described above. The program code can be read from or written to one or more computer program products. The program code may be compressed, for example, in a suitable form.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the methods provided in the various alternative implementations provided in the above embodiments.
Although the present application has been described in terms of the preferred embodiments, it should be understood that the present application is not limited to the specific embodiments, but is capable of numerous modifications and equivalents, and alternative embodiments and modifications of the embodiments described above, without departing from the spirit and scope of the present application.

Claims (11)

1. An image processing method, the method comprising:
Acquiring a document image;
inputting the document image into a classification model, and obtaining classification information of the document image through the classification model;
processing the document image according to at least one processing mode of image enhancement processing based on an enhancement model and image correction processing based on a correction model based on the classification information to obtain a preprocessed image, wherein the preprocessed image carries the classification information;
performing color model processing on the preprocessed image based on the classification information to obtain a target image corresponding to the document image, wherein the color model processing is used for at least adjusting one of saturation, contrast and brightness of the preprocessed image;
the enhancement model is obtained by performing combination loss calculation training on a synthetic image generated based on an interference image and a standard image serving as sample data and the interference image and the standard image serving as tag data and is used for removing interference information of a document image; the correction model is obtained by generating a first mapping training matrix and a second mapping training matrix based on a distorted sample image as sample data, performing secondary mapping correction training by taking a real first mapping matrix and a real second mapping matrix as tag data, and performing mapping correction on the distorted sample image based on the first mapping training matrix.
2. The method according to claim 1, wherein the processing the document image based on the classification information in at least one of an enhancement model-based image enhancement process and a correction model-based image correction process to obtain a preprocessed image includes:
when the classification information of the document image is a non-screen-capturing image, inputting the non-screen-capturing image into an enhancement model, and outputting an enhancement image;
and inputting the enhanced image into a correction model, and outputting a correction image as a preprocessing image.
3. The method according to claim 1, wherein the enhancement model is trained by:
acquiring a first training data set, wherein the first training data set comprises an interference image and a standard image;
image synthesis is carried out on the interference image and the standard image, so that a synthesized image is obtained;
predicting a predicted interference image in the synthesized image through a preset enhancement network;
obtaining a de-interference image according to the synthesized image and the predicted interference image;
obtaining a first enhancement loss value based on the interference image and the predicted interference image;
obtaining a second enhancement loss value based on the standard image and the interference-free image;
And performing iterative training on the preset enhancement network according to the enhancement total loss value of the first enhancement loss value and the second enhancement loss value until the trained preset enhancement network meets preset conditions, so as to obtain the enhancement model.
4. The method according to claim 1, wherein the correction model is trained by:
acquiring a second training data set, wherein the second training data set comprises a distorted sample image, a real first mapping matrix and a real second mapping matrix;
and performing secondary mapping correction training on a first mapping network and a second mapping network in a preset correction network through the second training data set until the preset correction network meets preset conditions, so as to obtain a trained correction model.
5. The method of claim 4, wherein performing secondary mapping correction training on the first mapping network and the second mapping network in the preset correction network through the second training data set until the preset correction network meets a preset condition, and obtaining a trained correction model includes:
inputting the distorted sample image into a first mapping network, and generating a first mapping training matrix through the first mapping network, wherein the first mapping network is used for predicting coordinate mapping information of the distorted sample image;
Obtaining an intermediate correction image based on the first mapping training matrix and the warped sample image;
performing coordinate conversion on the intermediate correction image to obtain an intermediate mapping matrix, and obtaining a spliced mapping matrix based on the intermediate mapping matrix and the first mapping training matrix;
inputting the spliced mapping matrix into a second mapping network to obtain a second mapping training matrix, wherein the second mapping network is used for predicting coordinate mapping information of the spliced mapping matrix;
based on the first mapping training matrix, the real first mapping matrix, the second mapping training matrix and the real second mapping matrix, obtaining a correction total loss value of the preset correction network, and performing iterative training on the preset correction network according to the correction total loss value until the preset correction network meets preset conditions, so as to obtain a trained correction model.
6. The method of claim 5, wherein the obtaining the corrected total loss value for the preset correction network based on the first mapping training matrix, the real first mapping matrix, the second mapping training matrix, and the real second mapping matrix comprises:
Determining a first correction loss value of the preset correction network based on the first mapping training matrix and the real first mapping matrix;
determining a second correction loss value of the preset correction network based on the second mapping training matrix and the real second mapping matrix;
and acquiring a corrected total loss value of the preset correction network according to the first corrected loss value and the second corrected loss value.
7. The method of claim 2, wherein the inputting the non-screenshot image to an enhancement model, outputting an enhanced image, comprises:
inputting the non-screen-capturing image into an enhancement model for feature extraction, and generating a feature image;
and performing enhancement calculation based on the characteristic image and the non-screen-capturing image, and generating an enhanced image.
8. The method of claim 2, wherein the correction model includes a first mapping network and a second mapping network, the inputting the enhanced image to the correction model, outputting a corrected image as a preprocessed image, comprising:
inputting the enhanced image into a first mapping network, and generating a first mapping matrix through the first mapping network;
Inputting the first mapping matrix into a second mapping network, and generating a second mapping matrix through the second mapping network;
and performing image coordinate mapping on the second mapping matrix to obtain a corrected image.
9. An image processing apparatus, characterized in that the apparatus comprises:
the image acquisition module is used for acquiring a document image;
the image classification module is used for inputting the document image into a classification model, and obtaining classification information of the document image through the classification model;
the image processing module is used for processing the document image according to at least one processing mode of image enhancement processing based on an enhancement model and image correction processing based on a correction model based on the classification information to obtain a preprocessed image, wherein the preprocessed image carries the classification information;
the image adjustment module is used for carrying out color model processing on the preprocessed image based on the classification information to obtain a target image corresponding to the document image, wherein the color model processing is used for adjusting at least one of the saturation, contrast and brightness of the preprocessed image;
the enhancement model is obtained by performing combination loss calculation training on a synthetic image generated based on an interference image and a standard image serving as sample data and the interference image and the standard image serving as tag data and is used for removing interference information of a document image; the correction model is obtained by generating a first mapping training matrix and a second mapping training matrix based on a distorted sample image as sample data, performing secondary mapping correction training by taking a real first mapping matrix and a real second mapping matrix as tag data, and performing mapping correction on the distorted sample image based on the first mapping training matrix.
10. A computer readable storage medium having stored therein program code which is callable by a processor to perform the method according to any one of claims 1 to 8.
11. A computer device, comprising:
a memory;
one or more processors coupled with the memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-8.
CN202211112047.8A 2022-09-13 2022-09-13 Image processing method, device, storage medium and computer equipment Pending CN117036179A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211112047.8A CN117036179A (en) 2022-09-13 2022-09-13 Image processing method, device, storage medium and computer equipment

Publications (1)

Publication Number Publication Date
CN117036179A true CN117036179A (en) 2023-11-10

Family

ID=88637794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211112047.8A Pending CN117036179A (en) 2022-09-13 2022-09-13 Image processing method, device, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN117036179A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination