WO2023287091A1 - Method and apparatus for processing image - Google Patents

Method and apparatus for processing image

Info

Publication number
WO2023287091A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
shadow
area
text
original image
Application number
PCT/KR2022/009607
Other languages
French (fr)
Inventor
Jun Li
Jianxing LIANG
Yiwen Yang
Meng Wang
Original Assignee
Samsung Electronics Co., Ltd.
Application filed by Samsung Electronics Co., Ltd. filed Critical Samsung Electronics Co., Ltd.
Publication of WO2023287091A1 publication Critical patent/WO2023287091A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/90 Dynamic range modification of images or parts thereof
    • G06T 5/94 Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G06T 7/136 Segmentation; Edge detection involving thresholding
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30176 Document
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements using classification, e.g. of video objects
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/18 Extraction of features or characteristics of the image
    • G06V 30/1801 Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • G06V 30/18019 Detecting partial patterns by matching or filtering
    • G06V 30/18038 Biologically-inspired filters, e.g. difference of Gaussians [DoG], Gabor filters
    • G06V 30/18048 Biologically-inspired filters with interaction between the responses of different filters, e.g. cortical complex cells
    • G06V 30/18057 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 30/26 Techniques for post-processing, e.g. correcting the recognition result
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N 1/387 Composing, repositioning or otherwise geometrically modifying originals

Definitions

  • the present disclosure relates to the field of image processing, and specifically to a method and apparatus for processing an image.
  • the present disclosure provides a method and apparatus for processing an image to solve at least one of the above problems in the related art.
  • a method for processing an image comprises obtaining an original image including one or more objects and one or more shadows, wherein the one or more objects comprise at least one of a text, a drawing or a table, generating, based on the original image, a first mask image indicating an area of the one or more objects, generating, based on the original image, a second mask image indicating a shadow area and a non-shadow area on the original image, obtaining, based on the original image and the first mask image, a first image, and obtaining, based on the first image and the second mask image, a second image, by reducing a difference between a feature of the one or more objects in the shadow area on the first image and a feature of the one or more objects in the non-shadow area on the first image.
  • the one or more objects in the shadow area on the first image include the one or more shadows.
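  • As an illustration only (not part of the patent text), a minimal Python sketch of this two-stage pipeline is given below. The text_model and shadow_model callables are hypothetical placeholders for the text detection and shadow detection models, and the white fill color and brightness offset are assumed values.

```python
import numpy as np

def remove_shadow(original, text_model, shadow_model):
    """original: HxWx3 uint8 document photo; each model is a callable
    returning an HxW probability map in [0, 1]."""
    text_mask = text_model(original) > 0.5      # first mask image: object area
    shadow_mask = shadow_model(original) > 0.5  # second mask image: shadow area

    # First image: keep the object pixels, fill the non-object area with a
    # predetermined color (white here) so the background shadow disappears.
    first = np.full_like(original, 255)
    first[text_mask] = original[text_mask]

    # Second image: reduce the feature difference between objects inside and
    # outside the shadow area, e.g. by brightening the shadowed text.
    second = first.copy()
    in_shadow = text_mask & shadow_mask
    second[in_shadow] = np.clip(
        second[in_shadow].astype(np.int16) + 40, 0, 255).astype(np.uint8)
    return second
```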
  • the generating, based on the original image, a first mask image indicating area of the one or more objects may comprise generating, based on the original image, a text mask image as the first mask image, using a text detection model referring to an artificial intelligence model.
  • the obtaining, based on the original image and the first mask image, a first image may comprise obtaining, based on the original image and the first mask image, a first image, wherein the one or more texts in the shadow area on the first image include the one or more shadows.
  • One or more objects in the shadow area on the first image may include the one or more shadows.
  • the generating, based on the original image, a second mask image indicating shadow area and non-shadow area on the original image may comprise generating, based on the original image, a second mask image indicating shadow area and non-shadow area on the original image, using a shadow detection model referring to an artificial intelligence model.
  • the obtaining, based on the original image and the first mask image, a first image may comprise identifying, based on the original image and the first mask image, an object area and a non-object area in the original image, and filling the non-object area in the original image with a predetermined color or pattern, to obtain the first image.
  • the predetermined color or pattern may be a pre-specified color or pattern, or an identical or similar color or pattern obtained according to a background color or pattern of the original image.
  • the obtaining, based on the original image and the first mask image, a first image may comprise identifying, based on the original image and the first mask image, an object area and a non-object area in the original image, eliminating the one or more shadows from the object area, using a shadow elimination model referring to an artificial intelligence model, eliminating the one or more shadows from the non-object area, using the shadow elimination model, and combining the object area and the non-object area to obtain the first image.
  • the obtaining, based on the first image and the second mask image, a second image may comprise identifying, based on the first image and the second mask image, the shadow area on the first image and the non-shadow area on the first image, and performing first processing on the shadow area, and/or performing second processing on the non-shadow area, wherein the first processing and the second processing are opposite operations.
  • the first processing may comprise at least one of expanding an edge of a non-text area in the shadow area towards the text area in the shadow area, increasing a brightness of a text in the text area in the shadow area or reducing a contrast of the text in the text area in the shadow area.
  • the second processing may comprise at least one of contracting an edge of a non-text area in the non-shadow area relative to the text area in the shadow area, reducing a brightness of a text in the text area in the non-shadow area or increasing a contrast of the text in the text area in the non-shadow area.
  • the feature may comprise at least one of brightness, contrast, or thickness.
  • a method for processing an image comprises acquiring an original image of a document image having a shadow, acquiring text area information and shadow area information from the original image, obtaining an original text area from the original image based on the text area information, obtaining a first image based on the original text area and the original image, wherein a non-text area on the first image does not have the shadow, and processing the first image by adjusting a text in the first image based on the shadow area information to obtain a second image.
  • the processing the first image may be to reduce a difference between a text in a shadow area on the first image and a text in a non-shadow area on the first image.
  • the acquiring text area information and shadow area information from the original image may comprise using a text detection model to obtain a text area mask image as the text area information, based on the original image, and using a shadow detection model to obtain a shadow area mask image as the shadow area information, based on the original image.
  • the text detection model and the shadow detection model are artificial intelligence models.
  • the obtaining an original text area from the original image based on the text area information may comprise determining the original text area from the original image based on the text area information, and obtaining the original text area.
  • the obtaining the first image based on the original text area and the original image may comprise filling a non-text area in the original image with a predetermined color or pattern, to obtain the first image.
  • the predetermined color or pattern may be a pre-specified color or pattern, or an identical or similar color or pattern obtained according to a background color or pattern of the original image.
  • the obtaining the first image based on the original text area and the original image may comprise eliminating the shadow from the original image based on the original image and the shadow area information to obtain a third image, using a shadow elimination model referring to an artificial intelligence model, determining the original text area from the original image based on the text area information, and superimposing the original text area onto the third image to obtain the first image.
  • the method for processing an image may further comprise eliminating the shadow from the original image based on the original image and the shadow area information to obtain a third image, using a shadow elimination model referring to an artificial intelligence model, and superimposing a text area in the second image onto the third image to obtain a fourth image.
  • the processing the first image by adjusting a text in the first image based on the shadow area information to obtain a second image may comprise determining a text area in a shadow area and a text area in a non-shadow area in the first image based on the shadow area information, and performing first processing on the text area in the shadow area, and/or performing second processing on the text area in the non-shadow area.
  • the first processing and the second processing are opposite operations.
  • the first processing may comprise at least one of expanding an edge of a non-text area in the shadow area towards the text area, increasing a brightness of a text in the text area in the shadow area, or reducing a contrast of the text in the text area in the shadow area.
  • the second processing may comprise at least one of contracting an edge of a non-text area in the non-shadow area relative to the text area, reducing a brightness of a text in the text area in the non-shadow area, or increasing a contrast of the text in the text area in the non-shadow area.
  • the method for processing an image may further comprise displaying a mode option for selecting a first mode or a second mode, receiving an input instruction to select the mode option, performing, in response to an input instruction to select the first mode being received, a method for processing an image in the first mode, and performing, in response to an input instruction to select the second mode being received, a method for processing an image in the second mode, comprising acquiring the original image of the document image having the shadow, acquiring the shadow area information from the original image, and using the shadow elimination model to eliminate the shadow from the original image based on the original image and the shadow area information to obtain the third image from which the shadow is eliminated, the shadow elimination model referring to the artificial intelligence model.
  • the shadow detection model is trained by acquiring a training sample, wherein the training sample comprises a document image sample having a shadow and a corresponding annotated shadow area mask image, and the document image sample having the shadow is obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image, inputting the document image sample having the shadow into the shadow detection model to obtain a predicted shadow area mask image, calculating a loss function based on the predicted shadow area mask image and the annotated shadow area mask image, and adjusting parameters of the shadow detection model based on the calculated loss function.
  • the text detection model may be trained by acquiring a training sample, wherein the training sample comprises a document image sample having a shadow and a corresponding annotated text area mask image, and the document image sample having the shadow is obtained by superimposing the shadow on a clean document image, inputting the document image sample having the shadow into the text detection model to obtain a predicted text area mask image, calculating a loss function based on the predicted text area mask image and the annotated text area mask image, and adjusting parameters of the text detection model based on the calculated loss function.
  • the shadow elimination model may be trained by acquiring a training sample, wherein the training sample comprises a target document image sample, a corresponding document image sample having a shadow, and a corresponding annotated shadow area mask image, and the document image sample having the shadow is obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image, inputting the document image sample having the shadow and the annotated shadow area mask image into the shadow elimination model to obtain a predicted document image, calculating a loss function based on the target document image sample and the predicted document image, and adjusting parameters of the shadow elimination model based on the calculated loss function.
  • an apparatus for processing an image comprises an image acquiring unit, configured to acquire an original image of a document image having a shadow, an information acquiring unit, configured to acquire text area information and shadow area information from the original image, a first image obtaining unit, configured to reserve an original text area from the original image based on the text area information, to obtain a first image, a non-text area on the first image not having the shadow, and a second image obtaining unit, configured to adjust a text in the first image based on the shadow area information, to obtain a second image.
  • the adjusting may be to reduce a difference between a text in a shadow area and a text in a non-shadow area.
  • the information acquiring unit may be configured to use a text detection model to obtain a text area mask image as the text area information, based on the original image, and use a shadow detection model to obtain a shadow area mask image as the shadow area information, based on the original image.
  • the text detection model and the shadow detection model are artificial intelligence models.
  • the first image obtaining unit may be configured to determine and reserve the original text area from the original image based on the text area information, and fill a non-text area in the original image with a predetermined color or pattern, to obtain the first image.
  • the predetermined color or pattern may be a pre-specified color or pattern, or an identical or similar color or pattern obtained according to a background color or pattern of the original image.
  • the first image obtaining unit may be configured to use a shadow elimination model to eliminate the shadow from the original image based on the original image and the shadow area information, to obtain a third image from which the shadow is eliminated, the shadow elimination model referring to an artificial intelligence model, determine the original text area from the original image based on the text area information, and superimpose the original text area onto the third image to obtain the first image.
  • the apparatus for processing an image may further comprise a fourth image obtaining unit, configured to use the shadow elimination model to eliminate the shadow from the original image based on the original image and the shadow area information, to obtain the third image from which the shadow is eliminated, the shadow elimination model referring to the artificial intelligence model, and superimpose a text area in the second image onto the third image to obtain a fourth image.
  • the second image obtaining unit may be configured to determine a text area in a shadow area and a text area in a non-shadow area in the original image based on the shadow area information, and perform first processing on the text area in the shadow area, and/or perform second processing on the text area in the non-shadow area.
  • the first processing and the second processing are opposite operations.
  • the first processing may comprise at least one of expanding an edge of a non-text area in the shadow area towards the text area, increasing a brightness of a text in the text area in the shadow area, or reducing a contrast of the text in the text area in the shadow area.
  • the second processing may comprise at least one of: contracting an edge of a non-text area in the non-shadow area relative to the text area, reducing a brightness of a text in the text area in the non-shadow area, or increasing a contrast of the text in the text area in the non-shadow area.
  • the apparatus for processing an image may further comprise a displaying unit, configured to display a mode option for selecting a first mode or a second mode, a receiving unit, configured to receive an input instruction to select the mode option, a controlling unit, and a third image obtaining unit.
  • the controlling unit is configured to control, in response to an input instruction to select the first mode being received, the image acquiring unit, the information acquiring unit, the first image obtaining unit and the second image obtaining unit to perform a method for processing an image in the first mode, and control, in response to an input instruction to select the second mode being received, the image acquiring unit to acquire the original image of the document image having the shadow, the information acquiring unit to acquire the shadow area information from the original image, and the third image obtaining unit to use the shadow elimination model to eliminate the shadow from the original image based on the original image and the shadow area information to obtain the third image from which the shadow is eliminated, the shadow elimination model referring to the artificial intelligence model.
  • the shadow detection model may be trained by acquiring a training sample, wherein the training sample comprises a document image sample having a shadow and a corresponding annotated shadow area mask image, and the document image sample having the shadow is obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image, inputting the document image sample having the shadow into the shadow detection model to obtain a predicted shadow area mask image, calculating a loss function based on the predicted shadow area mask image and the annotated shadow area mask image, and adjusting parameters of the shadow detection model based on the calculated loss function.
  • the text detection model may be trained by acquiring a training sample, wherein the training sample comprises a document image sample having a shadow and a corresponding annotated text area mask image, and the document image sample having the shadow is obtained by superimposing the shadow on a clean document image, inputting the document image sample having the shadow into the text detection model to obtain a predicted text area mask image, calculating a loss function based on the predicted text area mask image and the annotated text area mask image, and adjusting parameters of the text detection model based on the calculated loss function.
  • the shadow elimination model may be trained by acquiring a training sample, wherein the training sample comprises a target document image sample, a corresponding document image sample having a shadow, and a corresponding annotated shadow area mask image, and the document image sample having the shadow is obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image, inputting the document image sample having the shadow and the annotated shadow area mask image into the shadow elimination model to obtain a predicted document image, calculating a loss function based on the target document image sample and the predicted document image, and adjusting parameters of the shadow elimination model based on the calculated loss function.
  • an electronic device may comprise at least one processor configured to acquire an original image of a document image having a shadow, acquire text area information and shadow area information from the original image, obtain an original text area from the original image based on the text area information, obtain a first image based on the original text area and the original image, wherein a non-text area on the first image does not have the shadow, and process the first image based on the shadow area information to obtain a second image.
  • a computer readable storage medium containing instructions, the instructions configured to cause at least one processor of a computer to acquire an original image of a document image having a shadow, acquire text area information and shadow area information from the original image, obtain an original text area from the original image based on the text area information, obtain a first image based on the original text area and the original image, wherein a non-text area on the first image does not have the shadow, and process the first image based on the shadow area information to obtain a second image.
  • the text area information in a document image having a shadow area may be used to obtain a preliminary image in which the shadow in the background area is removed. Then, the shadow area information in the document image may be used to obtain a shadow-removed image in which the text area in the preliminary image is adjusted. Therefore, an image having an improved shadow removal effect is obtained.
  • an AI model may further be used to perform the method for processing an image according to the present disclosure, thereby improving the efficiency and effect of the shadow removal in the method for processing an image according to the present disclosure.
  • two shadow elimination modes may further be provided for a user to select. The user may select any one of the two modes as needed, to eliminate the shadow in the document image by one click, to obtain a clear and personalized document image.
  • Fig. 1 is a schematic diagram illustrating document images before and after a method for processing an image according to the present disclosure is used.
  • Fig. 2 is a flowchart illustrating a method for processing an image according to an exemplary embodiment of the present disclosure.
  • Fig. 3 is a schematic diagram illustrating an input and an output of a shadow detection model according to an exemplary embodiment of the present disclosure.
  • Fig. 4 is a schematic structural diagram illustrating a shadow detection model according to an exemplary embodiment of the present disclosure.
  • Fig. 5 is a schematic diagram illustrating training of a shadow detection model according to an exemplary embodiment of the present disclosure.
  • Fig. 6 is a schematic diagram illustrating a training sample for training a shadow detection model according to an exemplary embodiment of the present disclosure.
  • Fig. 7 is a flowchart illustrating making of a training sample for training a shadow detection model according to an exemplary embodiment of the present disclosure.
  • Fig. 8 is a schematic diagram illustrating a process of generating a document image sample having a shadow according to an exemplary embodiment of the present disclosure.
  • Fig. 9 is a schematic diagram illustrating an input and an output of a text detection model according to an exemplary embodiment of the present disclosure.
  • Fig. 10 is a schematic diagram illustrating training of a text detection model according to an exemplary embodiment of the present disclosure.
  • Fig. 11 is a schematic diagram illustrating a training sample for training a text detection model according to an exemplary embodiment of the present disclosure.
  • Fig. 12 is a flowchart illustrating making of a training sample for training a text detection model according to an exemplary embodiment of the present disclosure.
  • Fig. 13 is a schematic diagram illustrating an input and an output of a shadow elimination model according to an exemplary embodiment of the present disclosure.
  • Fig. 14 is a schematic diagram illustrating training of a shadow elimination model according to an exemplary embodiment of the present disclosure.
  • Fig. 15 is a schematic diagram illustrating a training sample for training a shadow elimination model according to an exemplary embodiment of the present disclosure.
  • Fig. 16 is a flowchart illustrating making of a training sample for training a shadow elimination model according to an exemplary embodiment of the present disclosure.
  • Fig. 17 is a flowchart illustrating a method for processing an image according to a first exemplary embodiment of the present disclosure.
  • Fig. 18 is a schematic diagram illustrating an example in which a shadow is removed in a first mode (i.e., a documentation mode) according to an exemplary embodiment of the present disclosure.
  • Fig. 19 is a flowchart illustrating a method for processing an image according to a second exemplary embodiment of the present disclosure.
  • Fig. 20 is a schematic diagram illustrating an example in which a shadow is removed in a second mode (i.e., a shadow removal mode) according to an exemplary embodiment of the present disclosure.
  • Fig. 21 is a flowchart illustrating applying of a method for processing an image according to an exemplary embodiment of the present disclosure.
  • Fig. 22 is a schematic diagram illustrating a shadow removal UI in a scenario in which a user takes a photograph of a document according to an exemplary embodiment of the present disclosure.
  • Fig. 23 is a schematic diagram illustrating a shadow removal UI in a scenario in which a user edits a photograph of a document according to an exemplary embodiment of the present disclosure.
  • Fig. 24 is a block diagram illustrating an apparatus for processing an image according to an exemplary embodiment of the present disclosure.
  • Fig. 25 is a block diagram of an electronic device 2500 according to an exemplary embodiment of the present disclosure.
  • In the present disclosure, "at least one of" several items covers the three parallel situations "any one of the several items," "any combination of two or more of the several items" and "all of the several items."
  • “including at least one of A and B” includes the following three parallel situations: 1) including A; 2) including B; and 3) including A and B.
  • “performing at least one of step 1 and step 2” represents the following three parallel situations: 1) performing step 1; 2) performing step 2; and 3) performing step 1 and step 2.
  • the present disclosure proposes a method and apparatus for processing an image. Specifically, an AI (artificial intelligence) model is used to perform processing on a document image having a shadow, to obtain a document image from which the shadow is eliminated, thereby improving the quality of the document image.
  • Fig. 1 is a schematic diagram illustrating document images before and after a method for processing an image according to the present disclosure is used.
  • a user may use a mobile phone to take a photograph of a document at night and wants to print the photograph.
  • the AI model may be used to perform processing on a document photograph having a shadow as shown in 110 in Fig. 1, to obtain a document photograph from which the shadow is eliminated, as shown in 120 in Fig. 1.
  • shadow removal approaches of two modes may be further provided for the user.
  • the two modes may be referred to as a "documentation” mode and a "shadow removal” mode.
  • the AI model may be used to perform a shadow elimination on a document image having a shadow and convert the background color of the document image into a predetermined color (e.g., a white color).
  • the AI model may be used to perform a shadow elimination on the document image having the shadow and reserve the original background color (e.g., a background color of paper).
  • the user may select any one of the two modes as needed, to eliminate the shadow in the document image by one click, to obtain a clear and personalized document image.
  • Fig. 2 is a flowchart illustrating a method for processing an image according to an exemplary embodiment of the present disclosure.
  • an original image including one or more objects and one or more shadows may be obtained.
  • the object may comprise at least one of a text, a drawing or a table.
  • an original image including one or more objects and one or more shadows may be understood as an original image of a document image having a shadow.
  • the original image of the document image having the shadow may be obtained when a user takes a photograph of a document, may be obtained when the user edits a photograph of a document, may be acquired from a local memory or a local database as required, or may be received from an external data source (e.g., the Internet, a server, and a database) through an input apparatus or a transmission medium.
  • When the user takes or edits a photograph, whether the photograph is a document photograph may be detected. If it is detected that the photograph is a document photograph, a document edge detection may be performed on the document photograph, and the document area may be cut out to be used as the original image of the document image.
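  • The patent does not prescribe how the edge detection and cropping are done; the sketch below shows one classical way with OpenCV (Canny edges, largest contour, perspective warp), with the thresholds being illustrative values.

```python
import cv2
import numpy as np

def crop_document(photo_bgr):
    """Find the document outline in a photo and warp it to a flat crop."""
    gray = cv2.cvtColor(photo_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return photo_bgr                       # no document edge found
    page = max(contours, key=cv2.contourArea)  # assume the page is largest
    quad = cv2.approxPolyDP(page, 0.02 * cv2.arcLength(page, True), True)
    if len(quad) != 4:                         # not a quadrilateral: fall back
        x, y, w, h = cv2.boundingRect(page)
        return photo_bgr[y:y + h, x:x + w]
    # Order the corners (tl, tr, br, bl) and warp to a fronto-parallel view.
    pts = quad.reshape(4, 2).astype(np.float32)
    s, d = pts.sum(1), np.diff(pts, axis=1).ravel()
    src = np.float32([pts[s.argmin()], pts[d.argmin()],
                      pts[s.argmax()], pts[d.argmax()]])
    w = int(max(np.linalg.norm(src[0] - src[1]), np.linalg.norm(src[3] - src[2])))
    h = int(max(np.linalg.norm(src[0] - src[3]), np.linalg.norm(src[1] - src[2])))
    dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    return cv2.warpPerspective(photo_bgr,
                               cv2.getPerspectiveTransform(src, dst), (w, h))
```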
  • a first mask image indicating area of the one or more objects may be generated based on the original image.
  • a text mask image as the first mask image may be generated based on the original image.
  • a text detection model may be used to generate a text mask image.
  • a text mask image may be understood as text area information or a text area mask image.
  • the text detection model may be any available AI model, of which the input may be the document image having the shadow or a feature image obtained by pre-processing the document image having the shadow, and of which the output may be the text area mask image.
  • the text area mask image may be used to mark which pixels in a document image are effective information pixels, for example, a text, an icon, and a table box.
  • Fig. 9 is a schematic diagram illustrating an input and an output of a text detection model according to an exemplary embodiment of the present disclosure.
  • a second mask image indicating shadow area and non-shadow area on the original image may be generated based on the original image.
  • a shadow detection model 350 may be used to generate the second mask image indicating shadow area and non-shadow area on the original image.
  • the second mask image may be understood as shadow area information or a shadow area mask image.
  • the shadow detection model 350 may be any available artificial intelligence (AI) model, of which the input 301 may be the original image of the document image having the shadow or a feature image obtained by pre-processing the original image, and of which the output 303 may be the shadow area mask image.
  • Fig. 3 is a schematic diagram illustrating an input and an output of a shadow detection model according to an exemplary embodiment of the present disclosure.
  • the shadow detection model may be a bidirectional feature pyramid network (BDRAR) with a recurrent attention residual module.
  • Fig. 4 is a schematic structural diagram illustrating a shadow detection model according to an exemplary embodiment of the present disclosure.
  • the shadow detection model may be a fully convolutional network, which is a deep neural network algorithm for image semantic segmentation. A residual network (e.g., ResNet) may be used to extract its feature maps.
  • the resolutions of the feature maps of the network are gradually reduced, and thus, the network is referred to as a "feature pyramid."
  • a plurality of recurrent attention residual modules (abbreviated as RAR) are embedded into a feature pyramid structure, and an attention map is extracted from adjacent feature maps, to be used to fuse the spatial context information of adjacent layers in the feature pyramid, such that the network can better pay attention to a target recognition task on a target pixel.
  • a residual structure is introduced, which is conducive to suppressing an interference of details of non-shadow areas in a high-resolution feature map to a prediction result.
  • the recurrent attention residual module adopts two sets of paths to integrate context information, in which one set is from a deep feature map to a shallow feature map and the other set is from the shallow feature map to the deep feature map, and thus is referred to as "bidirectional."
  • a fully connected conditional random field layer (CRFasRNN), which may be trained end-to-end and is in the form of an RNN, may be added to the output layer of the BDRAR, to further improve the recognition accuracy.
  • the parameters and arithmetic logic may be solidified into a model file, such that hardware-level optimizations such as GPU acceleration can be utilized, which avoids additional complex post-processing after deployment to a mobile device.
  • the CRFasRNN may be selectively added according to a precision requirement, i.e., the CRFasRNN may or may not be added.
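  • As a hedged illustration of the recurrent attention residual idea described above (not the patent's or the BDRAR paper's exact layer layout), one such block might look as follows in PyTorch; the channel counts and kernel sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RARBlock(nn.Module):
    """Fuses an adjacent (shallow, deep) pair of feature-pyramid maps with an
    attention map and a residual connection."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.attn = nn.Sequential(               # per-pixel attention weights
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.refine = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, shallow, deep):
        # Upsample the deeper (coarser) map to the shallow map's resolution.
        deep = F.interpolate(deep, size=shallow.shape[-2:], mode="bilinear",
                             align_corners=False)
        fused = F.relu(self.fuse(torch.cat([shallow, deep], dim=1)))
        a = self.attn(fused)                      # attention map in [0, 1]
        # Residual connection: helps suppress non-shadow detail from the
        # high-resolution features in the final prediction.
        return fused + self.refine(a * fused)
```

  • Running blocks like this once from the deep layers to the shallow layers and once in the opposite direction yields the two "bidirectional" context-integration paths mentioned above.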
  • the shadow detection model according to the present disclosure may not be limited to the above structure, and may also be implemented by adopting any possible AI model, for example, a GAN (generative adversarial network) model.
  • the shadow detection model may be trained based on a document image sample having a shadow and a corresponding annotated shadow area mask image.
  • the shadow detection model may be trained by: acquiring a training sample, the training sample comprising the document image sample having the shadow and the corresponding annotated shadow area mask image, and the document image sample having the shadow being obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image; inputting the document image sample having the shadow into the shadow detection model to obtain a predicted shadow area mask image; calculating a loss function (e.g., the loss function may be implemented by adopting a binary cross-entropy loss function) based on the predicted shadow area mask image and the annotated shadow area mask image; and adjusting parameters of the shadow detection model based on the calculated loss function.
  • Fig. 5 is a schematic diagram illustrating training of a shadow detection model according to an exemplary embodiment of the present disclosure. As shown in Fig. 5, a document image sample having a shadow and a corresponding annotated shadow area mask image may be respectively obtained, to be used to train the shadow detection model.
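  • A minimal PyTorch sketch of one training step following the recipe above (the text detection model may be trained the same way, with annotated text area masks instead of shadow masks); the tensor shapes are assumptions.

```python
import torch.nn.functional as F

def train_step(model, optimizer, shadowed_doc, annotated_mask):
    """shadowed_doc: Bx3xHxW float tensor; annotated_mask: Bx1xHxW in {0, 1}."""
    optimizer.zero_grad()
    predicted_logits = model(shadowed_doc)       # predicted shadow area mask
    # Binary cross-entropy between the predicted and annotated mask images.
    loss = F.binary_cross_entropy_with_logits(predicted_logits, annotated_mask)
    loss.backward()
    optimizer.step()                             # adjust the model parameters
    return loss.item()
```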
  • Fig. 6 is a schematic diagram illustrating a training sample for training a shadow detection model according to an exemplary embodiment of the present disclosure. As shown in Fig. 6, 610 in Fig. 6 shows a document image sample having a shadow, and 620 in Fig. 6 shows a corresponding annotated shadow area mask image.
  • Fig. 7 is a flowchart illustrating making of a training sample for training a shadow detection model according to an exemplary embodiment of the present disclosure.
  • a color is randomly selected from a background color list (the colors in the background color list are relatively light colors, for example, a light yellow color, a light gray color and a pure white color, which simulate a color temperature of a scene when a user takes a photograph), to be used to simulate the paper of a document in a different color temperature environment, and is thus used as a background color picture (720).
  • a white shadow pattern is superimposed on a black background.
  • the shadow pattern simulates a scenario in which light is blocked when the user takes a photograph, and a shadow picture is generally a large irregular pattern, and thus, an "annotated shadow area mask image" is obtained (740).
  • 6) The picture processed through step 5) is superimposed on the clean document image to obtain a "document image sample having a shadow" (760).
  • the term "foreground" may be understood as a content (text, table, drawing) area in an image, and the term "background" may be understood as a non-content area in an image.
  • Fig. 8 is a schematic diagram illustrating a process of generating a document image sample having a shadow (i.e., a specific implementation process of the above steps 5) and 6)) according to an exemplary embodiment of the present disclosure.
  • color inversion processing, background transparent processing, foreground Gaussian blur processing, and foreground partially transparent processing may be performed on an annotated shadow area mask image, and then, the annotated shadow area mask image after the above processing is superimposed with a clean document image, thereby obtaining a document image sample having a shadow.
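  • A hedged NumPy/OpenCV sketch of that superimposition process; the blur size and shadow opacity are illustrative values, not taken from the patent.

```python
import cv2
import numpy as np

def synthesize_shadowed_sample(clean_doc, shadow_mask, opacity=0.5, blur=51):
    """clean_doc: HxWx3 uint8; shadow_mask: HxW uint8, white = shadow area."""
    inverted = 255 - shadow_mask                 # color inversion: shadow dark
    soft = cv2.GaussianBlur(inverted, (blur, blur), 0)  # soften shadow edges
    # Per-pixel alpha: fully transparent background, partially transparent
    # (opacity) foreground, i.e. inside the shadow pattern.
    alpha = ((1.0 - soft / 255.0) * opacity)[..., None]
    shadow_layer = np.zeros_like(clean_doc)      # the dark shadow "layer"
    sample = (1.0 - alpha) * clean_doc + alpha * shadow_layer
    return sample.astype(np.uint8)               # document image sample
```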
  • a text detection model 950 may be used to obtain a text area mask image as the text area information.
  • the text detection model 950 may be any available AI model, of which the input 901 may be the document image having the shadow or a feature image obtained by pre-processing the document image having the shadow, and of which the output 903 may be the text area mask image.
  • the text area mask image may be used to mark which pixels in a document image are effective information pixels, for example, a text, an icon, and a table box.
  • Fig. 9 is a schematic diagram illustrating an input and an output of a text detection model according to an exemplary embodiment of the present disclosure.
  • the text detection model may be a bidirectional feature pyramid network (BDRAR) with a recurrent attention residual module, for example, the structure shown in Fig. 4. That is, the structures of the text detection model and the shadow detection model according to the present disclosure may be similar, and the main difference between them is that the classification tasks are different.
  • the text detection model is used to detect a text area
  • the shadow detection model is used to detect a shadow area. Therefore, the corresponding training data are different, and the model parameters obtained through training are different.
  • the text detection model as shown in Fig. 4 may output a text area mask image, rather than a shadow area mask image.
  • the text detection model according to the present disclosure may not be limited to the above structure, and may also be implemented by adopting any possible AI model, for example, a GAN model.
  • the text detection model may be trained based on a document image sample having a shadow and a corresponding annotated text area mask image.
  • the text detection model may be trained by: acquiring a training sample, the training sample comprising the document image sample having the shadow and the corresponding annotated text area mask image, and the document image sample having the shadow being obtained by superimposing the shadow on a clean document image; inputting the document image sample having the shadow into the text detection model to obtain a predicted text area mask image; calculating a loss function (e.g., the loss function may be implemented by adopting a binary cross-entropy loss function) based on the predicted text area mask image and the annotated text area mask image; and adjusting parameters of the text detection model based on the calculated loss function.
  • Fig. 10 is a schematic diagram illustrating training of a text detection model according to an exemplary embodiment of the present disclosure. As shown in Fig. 10, a document image sample having a shadow and a corresponding annotated text area mask image may be respectively obtained, to be used to train the text detection model.
  • Fig. 11 is a schematic diagram illustrating a training sample for training a text detection model according to an exemplary embodiment of the present disclosure. As shown in Fig. 11, 1110 in Fig. 11 shows a document image sample having a shadow, and 1120 in Fig. 11 shows a corresponding annotated text area mask image.
  • Fig. 12 is a flowchart illustrating making of a training sample for training a text detection model according to an exemplary embodiment of the present disclosure.
  • the picture content material is set to white, and superimposed on a picture having a black background to obtain an "annotated text area mask image" (1220).
  • a color is randomly selected from a background color list (the colors in the background color list are relatively light colors, for example, a light yellow color, a light gray color and a pure white color, which simulate a color temperature of a scene when a user takes a photograph), to be used to simulate the paper of a document in a different color temperature environment, and is thus used as a background color picture (1230).
  • a picture is randomly selected from several shadow pattern pictures.
  • a shadow pattern simulates a scenario in which light is blocked when the user takes a photograph.
  • the shadow pattern is generally a large irregular pattern (1250).
  • 6) The picture processed in step 5) is superimposed on the clean document image to obtain a "document image sample having a shadow" (1270).
  • a first image may be obtained.
  • the one or more objects in the shadow area on the first image may include the one or more shadows.
  • an object area and a non-object area may be identified from the original image based on the original image and the first mask image.
  • the non-object area in the original image may be filled with a predetermined color or pattern, to obtain the first image.
  • the object area may be a text area, and the non-object area may be a non-text area.
  • the predetermined color or pattern is a pre-specified color or pattern, or an identical or similar color or pattern obtained according to a background color or pattern of the original image.
  • a non-text area in the original image of the document image having the shadow may be filled with a predetermined color or pattern, to obtain the first image.
  • the predetermined color or pattern is a pre-specified color or pattern, or an identical or similar color or pattern obtained according to a background color or pattern of the original image.
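  • A small sketch of this filling step, assuming a boolean text mask: keep the text pixels, estimate the page color from the non-text pixels, and paint it over everything else. The median-color heuristic is an assumption, standing in for an "identical or similar" background color.

```python
import numpy as np

def fill_non_text_area(original, text_mask, fill_color=None):
    """original: HxWx3 uint8; text_mask: HxW bool, True on object pixels."""
    if fill_color is None:
        # Approximate the original background color of the page.
        fill_color = np.median(original[~text_mask], axis=0)
    first_image = np.empty_like(original)
    first_image[:] = np.asarray(fill_color, dtype=original.dtype)
    first_image[text_mask] = original[text_mask]  # reserve the text pixels
    return first_image
```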
  • an object area and a non-object area may be identified from the original image based on the original image and the first mask image.
  • the one or more shadows may be eliminated from the object area.
  • a shadow elimination model may be used to eliminate the one or more shadows from the object area, and may be used to eliminate the one or more shadows from the non-object area.
  • the object area and the non-object area may be combined to obtain the first image.
  • the shadow elimination model 1350 is any available AI model, of which the input 1301 may be the document image having the shadow or a feature image obtained by pre-processing the document image having the shadow, and input 1303 may be the shadow area mask image outputted by the shadow detection model, and of which the output 1305 may be a document image from which the shadow is eliminated.
  • Fig. 13 is a schematic diagram illustrating an input and an output of a shadow elimination model according to an exemplary embodiment of the present disclosure.
  • a shadow elimination model may be used to eliminate the shadow from the original image of the document image having the shadow, to obtain a third image from which the shadow is eliminated.
  • the original text area may be determined from the original image of the document image having the shadow.
  • the original text area may be superimposed onto the third image to obtain the first image.
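  • The sketch below (assuming a boolean text mask) illustrates this superimposition: the original text pixels are pasted onto the shadow-eliminated third image to form the first image.

```python
import numpy as np

def superimpose_text_area(third_image, original, text_mask):
    """third_image, original: HxWx3 uint8; text_mask: HxW bool."""
    first_image = third_image.copy()
    first_image[text_mask] = original[text_mask]  # paste original text area
    return first_image
```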
  • the shadow elimination model 1350 is any available AI model, of which the input (1301, 1303) may be the document image having the shadow or a feature image obtained by pre-processing it, together with the shadow area mask image outputted by the shadow detection model, and of which the output may be a document image from which the shadow is eliminated.
  • Fig. 13 is a schematic diagram illustrating an input and an output of a shadow elimination model according to an exemplary embodiment of the present disclosure.
  • the shadow elimination model may be implemented by adopting a GAN model.
  • the GAN model may include a generator and a discriminator.
  • the input of the generator may be the document image having the shadow or a feature image obtained by pre-processing it (which may be referred to as real data), together with the shadow area mask image outputted by the shadow detection model.
  • the output of the generator may be a predicted document image that the shadow is eliminated (which may be referred to as generated data).
  • the loss function of the generator may be at least one of a cross entropy loss function (of the generated data and 1) or an absolute error loss function (of the generated data and label data (a clean document image sample)).
  • the input of the discriminator may be the generated data, and the output of the discriminator may be a probability that the generated data is true.
  • the loss function of the discriminator may be at least one of a cross entropy loss function (of the real data and 1) or a cross entropy loss function (of the generated data and 0).
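  • A hedged PyTorch sketch of these loss terms for one training step; the generator and discriminator networks are placeholders, and the L1 weighting is an assumed value.

```python
import torch
import torch.nn.functional as F

def gan_losses(generator, discriminator, shadowed, shadow_mask, clean,
               l1_weight=100.0):
    """shadowed: Bx3xHxW; shadow_mask: Bx1xHxW; clean: Bx3xHxW label images."""
    fake = generator(torch.cat([shadowed, shadow_mask], dim=1))  # generated data

    # Generator: cross entropy of the generated data against 1 (fool the
    # discriminator) plus an absolute error term against the label data.
    pred_fake = discriminator(fake)
    g_loss = F.binary_cross_entropy_with_logits(
        pred_fake, torch.ones_like(pred_fake))
    g_loss = g_loss + l1_weight * F.l1_loss(fake, clean)

    # Discriminator: real data scored as 1, generated data scored as 0.
    pred_real = discriminator(clean)
    pred_fake = discriminator(fake.detach())
    d_loss = (F.binary_cross_entropy_with_logits(
                  pred_real, torch.ones_like(pred_real))
              + F.binary_cross_entropy_with_logits(
                  pred_fake, torch.zeros_like(pred_fake)))
    return g_loss, d_loss
```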
  • the shadow elimination model may be trained based on the clean document image sample, a corresponding document image sample having a shadow, and a corresponding annotated shadow area mask image.
  • a training sample is acquired.
  • the training sample comprises a target document image sample, a corresponding document image sample having a shadow, and a corresponding annotated shadow area mask image
  • the document image sample having the shadow is obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image.
  • the document image sample having the shadow and the annotated shadow area mask image are inputted into the shadow elimination model to obtain a predicted document image.
  • a loss function is calculated based on the target document image sample and the predicted document image, and parameters of the shadow elimination model are adjusted based on the calculated loss function.
  • the clean document image refers to a document image not having a shadow but having a background color (the background color is a simulation for the original background color of a photograph when the user takes the photograph).
  • the target document image sample refers to a target shadow-removed document image used to train the shadow elimination model.
  • the target document image sample may be a clean document image, or a document image which does not have a shadow and of which the background is a predetermined color (e.g., a white color).
  • Fig. 14 is a schematic diagram illustrating training of a shadow elimination model according to an exemplary embodiment of the present disclosure.
  • As shown in Fig. 14, a target document image sample (i.e., a labelled image sample without a shadow), a corresponding document image sample having a shadow, and a corresponding annotated shadow area mask image may be respectively obtained, to be used to train the shadow elimination model.
  • Fig. 15 is a schematic diagram illustrating a training sample for training a shadow elimination model according to an exemplary embodiment of the present disclosure.
  • 1510 in Fig. 15 shows a document image sample having a shadow
  • 1520 in Fig. 15 shows a corresponding annotated shadow area mask image
  • 1530 in Fig. 15 shows a corresponding clean document image sample.
  • Fig. 16 is a flowchart illustrating making of a training sample for training a shadow elimination model according to an exemplary embodiment of the present disclosure.
  • a color is randomly selected from a background color list (the colors in the background color list are relatively light colors, for example, a light yellow color, a light gray color and a pure white color, which simulate a color temperature of a scene when a user takes a photograph), to be used to simulate the paper of a document in a different color temperature environment, and is thus used as a background color picture (1630).
  • step 2) may be omitted (1640).
  • a white shadow pattern is superimposed on a black background.
  • the shadow pattern simulates a scenario in which light is blocked when the user takes a photograph.
  • a shadow picture is generally a large irregular pattern, thus obtaining an "annotated shadow area mask image" (1650).
  • 6) The picture processed in step 5) is superimposed on the clean document image to obtain a "document image sample having a shadow" (1670).
  • the text area information and the shadow area information may be any possible information reflecting a text area feature and any possible information reflecting a shadow area feature.
  • a second image may be obtained.
  • a difference between a feature of the one or more objects in the shadow area on the first image and a feature of the one or more objects in the non-shadow area on the first image may be reduced to obtain the second image. Since the text portion (effective pixels) of the document image from which the shadow is preliminarily eliminated (i.e., the first image) may still be superimposed with shadow noise, the pixel color of the text in the shadow area would be darker than the color of the text in the non-shadow area, and the strokes would be slightly wider.
  • the shadow area information may be used to adjust the document image from which the shadow is preliminarily eliminated, thus obtaining the second image.
  • this adjusting may reduce the difference between the text in the shadow area and the text in the non-shadow area.
  • the feature may comprise at least one of brightness, contrast, or thickness of the objects.
  • the shadow area on the first image and the non-shadow area on the first image may be identified.
  • First processing may be performed on the shadow area.
  • a second processing may be performed on the non-shadow area.
  • the first processing and the second processing may be opposite operations.
  • the first processing may comprise, but not limited to, at least one of: expanding an edge of a non-text area in the shadow area towards the text area in the shadow area, increasing a brightness of a text in the text area in the shadow area, or reducing a contrast of the text in the text area in the shadow area.
  • the second processing may comprise, but not limited to, at least one of: contracting an edge of a non-text area in the non-shadow area relative to the text area in the shadow area, reducing a brightness of a text in the text area in the non-shadow area, or increasing a contrast of the text in the text area in the non-shadow area.
  • the method for processing an image according to the present disclosure may be a first mode according to the present disclosure. That is, in the first mode, it is possible to perform a semantic segmentation on the document image having the shadow, and obtain, based on the semantic segmentation result and the shadow area mask image, the document image from which the shadow is eliminated, which may be referred to as a "semantic analysis shadow removal method."
  • the semantic segmentation may refer to a segmentation on the effective pixel of a text portion and the pixel of a background portion.
  • a shadow removal may be performed on the document image having the shadow by using the artificial intelligence model and the shadow area mask image, to obtain the document image from which the shadow is eliminated, which may be referred to as a "direct shadow removal method."
  • the user may be provided with shadow removal methods of the above two modes, namely, the first mode (which may also be referred to as a documentation mode) and the second mode (which may also be referred to as a shadow removal mode).
  • in the first mode, a shadow removal operation may be performed according to the first exemplary embodiment of the present disclosure.
  • in the second mode, a shadow removal operation may be performed according to the second exemplary embodiment of the present disclosure.
  • it is possible to display a mode option for selecting the first mode or the second mode; receive an input instruction to select the mode option; perform, in response to an input instruction to select the first mode being received, the "semantic analysis shadow removal method"; and perform, in response to an input instruction to select the second mode being received, the "direct shadow removal method."
  • First mode (i.e., documentation mode)
  • Fig. 17 is a flowchart illustrating a method for processing an image according to the first exemplary embodiment of the present disclosure.
  • a text detection model may be used to obtain a text area mask image.
  • a first image may be obtained from the document image having the shadow.
  • a text pixel area is determined and reserved from the document image having the shadow.
  • An area other than the text pixel area in the document image having the shadow is filled with a predetermined color (e.g., a white color) to obtain the first image. That is, an effective pixel portion (foreground) in the document image having the shadow may be reserved according to the text area mask image, and the background (containing a shadow noise) may be filled with the predetermined color to obtain the first image.
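  • As an illustration only (not the disclosure's implementation), the reserve-and-fill step above may be sketched as follows in Python/NumPy; the function and parameter names are assumptions:

```python
import numpy as np

def obtain_first_image(original, text_mask, fill_value=255):
    """original: HxWx3 uint8 document image having the shadow.
    text_mask: HxW text area mask, nonzero where a pixel belongs to text."""
    first_image = np.empty_like(original)
    first_image[:] = fill_value            # fill everything with the predetermined color (white)
    text = text_mask.astype(bool)
    first_image[text] = original[text]     # reserve the effective (text/foreground) pixels
    return first_image
```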
  • a shadow detection model may be used to obtain a shadow area mask image.
  • the shadow area mask image may be used to adjust a text in the first image, to obtain a second image.
  • the parameters for performing the above adjustments may be set or adjusted according to empirical values. As an example, a "dilation" operation may be performed on the shadow area in the first image based on the shadow area mask image.
  • the non-text pixel area in the shadow area in the first image, already filled with the predetermined color, is expanded, such that the text pixel area in the shadow area is narrowed, which has the effect of making the strokes thinner.
  • the pixel color of the text in the text pixel area in the shadow area may be brightened or faded.
  • Processing such as a certain degree of contrast enhancement (e.g., color deepening) may be performed on the pixels of the text in the non-shadow area based on a non-shadow area mask image (i.e., obtained through a reverse operation on the shadow area mask image), thereby reducing the difference between the pixel colors of the texts in the shadow and non-shadow areas.
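  • A minimal sketch of these opposite adjustments, assuming a grayscale first image with dark text on a light background; the kernel size and the brightness/deepening offsets are illustrative empirical values, not values given by the disclosure:

```python
import cv2
import numpy as np

def adjust_text(first_gray, shadow_mask, k=2, brighten=12, deepen=12):
    """first_gray: HxW uint8 first image; shadow_mask: HxW, nonzero in the shadow area."""
    kernel = np.ones((k, k), np.uint8)
    shadow = shadow_mask.astype(bool)

    # First processing (shadow area): grayscale dilation expands the light
    # background into the dark strokes (thinning them), and adding a positive
    # offset brightens the text, reducing its contrast.
    in_shadow = cv2.add(cv2.dilate(first_gray, kernel), brighten)

    # Second processing (non-shadow area), the opposite operation: erosion lets
    # the dark strokes grow slightly, and subtraction deepens their color.
    out_shadow = cv2.subtract(cv2.erode(first_gray, kernel), deepen)

    return np.where(shadow, in_shadow, out_shadow)
```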
  • The order of the above steps is not limited. For example, it is possible that step 1703 is first performed, and steps 1701 and 1702 are then performed; that steps 1701 and 1702 are first performed, and step 1703 is then performed; that step 1701 is first performed, step 1703 is then performed, and next, step 1702 is performed; that steps 1701 and 1702 and step 1703 are performed in parallel; or the like.
  • Fig. 18 is a schematic diagram illustrating an example in which a shadow is removed in a first mode (i.e., a documentation mode) according to an exemplary embodiment of the present disclosure.
  • an original image of a document image having a shadow may be obtained (as shown in 1810 in Fig. 18).
  • a text area mask image (as shown in 1820 in Fig. 18) and a shadow area mask image (as shown in 1830 in Fig. 18) may be respectively obtained through a text detection model and a shadow detection model.
  • a document image (as shown in 1840 in Fig. 18) that the shadow is preliminarily eliminated may be obtained from the document image having the shadow.
  • The text pixel area in the shadow area may be darker than the text pixel area in the non-shadow area, or the strokes may be slightly wider than those in the non-shadow area (this may not be clearly shown in Fig. 18).
  • the shadow area mask image may be used to adjust the document image that the shadow is preliminarily eliminated, to obtain a document image (as shown in 1850 in Fig. 18) that the shadow is finally eliminated.
  • Second mode (i.e., shadow removal mode)
  • Fig. 19 is a flowchart illustrating a method for processing an image according to the second exemplary embodiment of the present disclosure.
  • a shadow detection model may be used to obtain a shadow area mask image.
  • a shadow elimination model may be used to obtain a document image that the shadow is eliminated.
  • the shadow elimination model is any available AI model.
  • Fig. 20 is a schematic diagram illustrating an example in which a shadow is removed in a second mode (i.e., a shadow removal mode) according to an exemplary embodiment of the present disclosure.
  • a document image having a shadow (as shown in 2010 in Fig. 20) may be acquired.
  • a shadow area mask image (as shown in 2020 in Fig. 20) may be obtained through a shadow detection model.
  • a document image (as shown in 2030 in Fig. 20) that the shadow is eliminated may be obtained through a shadow elimination model.
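  • For illustration, this second-mode pipeline can be sketched as two chained model inferences; the Keras models, file names and tensor layout here are assumptions standing in for the networks described in this disclosure:

```python
import tensorflow as tf

# Stand-ins for the trained networks (assumed file names).
shadow_detector = tf.keras.models.load_model("shadow_detection.h5")
shadow_eliminator = tf.keras.models.load_model("shadow_elimination.h5")

def direct_shadow_removal(original):
    """original: 1xHxWx3 float32 batch in [0, 1] of the document image having the shadow."""
    shadow_mask = shadow_detector.predict(original)               # shadow area mask image
    cleaned = shadow_eliminator.predict([original, shadow_mask])  # shadow-eliminated document image
    return cleaned
```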
  • the original image of the document image having the shadow may be acquired.
  • Text area information and shadow area information may be acquired from the original image of the document image having the shadow.
  • an original text area may be reserved from the original image of the document image having the shadow to obtain a first image, a non-text area on the first image not having the shadow.
  • a text in the first image may be adjusted to obtain a second image.
  • the shadow elimination model may be used to eliminate the shadow from the original image of the document image having the shadow, to obtain a third image that the shadow is eliminated.
  • a text area in the second image is superimposed on the third image to obtain a fourth image.
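  • A sketch of this superimposition, assuming NumPy arrays of equal size; the text area mask is the one obtained earlier from the text detection model:

```python
import numpy as np

def obtain_fourth_image(second_image, third_image, text_mask):
    """Take text pixels from the adjusted second image and everything else from
    the shadow-eliminated third image, so the original background is kept."""
    text = text_mask.astype(bool)
    fourth_image = third_image.copy()
    fourth_image[text] = second_image[text]
    return fourth_image
```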
  • Fig. 21 is a flowchart illustrating applying of a method for processing an image according to an exemplary embodiment of the present disclosure.
  • In step 2101, when a user takes a photograph or edits a photograph, whether the photograph is a document photograph may be detected.
  • In step 2102, in response to a detection result being that the photograph is the document photograph, a document edge detection is performed on the photograph, and a document area is cut out to generate a document image.
  • In step 2103, in response to the detection result being that the photograph is not the document photograph, the photograph is saved as an ordinary picture or is edited as an ordinary picture.
  • In step 2104, after the document image is generated, an image editing interface is entered, and a shadow removal option is provided in the image editing interface.
  • In step 2105, a shadow removal mode option (e.g., a "documentation" mode option and a "shadow removal" mode option) may be provided for the user to select, and two threads are simultaneously started to respectively perform a picture shadow noise elimination in the "documentation" mode and a picture shadow noise elimination in the "shadow removal" mode on the document image.
  • a semantic segmentation may be performed on an original document image, and different post-processing may be performed on a pixel according to a segmentation result, to generate a document image after a shadow elimination, of which the background is replaced with a predetermined color.
  • the original document image may be inputted into an AI network to directly obtain a document image after a shadow elimination, which reserves the original background color.
  • In step 2106, according to the selection of the user for the shadow removal mode option, a corresponding document image after a shadow elimination is outputted and displayed. That is, if the user selects the "documentation" mode option, a document image after a shadow elimination that is obtained through the "semantic analysis shadow removal method" is displayed. If the user selects the "shadow removal" mode option, a document image after a shadow elimination that is obtained through the "direct shadow removal method" is displayed.
  • the present disclosure is not limited to the above steps 2105 and 2106.
  • the shadow removal mode option is first provided, and after the selection of the user for the shadow removal mode option is received, a thread is started to perform a picture shadow noise elimination in a mode selected by the user.
  • In step 2107, the corresponding document image after the shadow elimination is saved.
  • the user may also be prompted to choose to overwrite the original image or save the document image as a new picture. If the user chooses to overwrite the original image, the corresponding document image after the shadow elimination is saved. If the user chooses to save the document image as the new picture, the original document image and the corresponding document image after the shadow elimination are saved.
  • Fig. 22 is a schematic diagram illustrating a shadow removal UI in a scenario in which a user takes a photograph of a document according to an exemplary embodiment of the present disclosure.
  • a document edge detection (as shown in 2210 in Fig. 22) is performed.
  • a document area may be cut out, to generate a document image (as shown in 2220 in Fig. 22).
  • When the user selects a shadow removal option in an editing interface, a "documentation" mode option and a "shadow removal" mode option (as shown in 2230 in Fig. 22) may be provided for the user, and a corresponding document image after a shadow removal is displayed according to the selection of the user.
  • Fig. 23 is a schematic diagram illustrating a shadow removal UI in a scenario in which a user edits a photograph of a document according to an exemplary embodiment of the present disclosure.
  • an editing option (as shown in 2310 in Fig. 23) may be provided for a user in an editing interface.
  • When the user selects the editing option, whether the photograph is a document photograph may be determined. If the photograph is the document photograph, a document edge detection (as shown in 2320 in Fig. 23) is performed. When the user selects the option "click to intercept," a document area may be cut out, to generate a document image (as shown in 2330 in Fig. 23).
  • a "documentation" mode option and a "shadow removal” mode option (as shown in 2340 in Fig. 23) may be provided for the user, and a corresponding document image after a shadow removal is displayed according to the selection of the user.
  • Fig. 24 is a block diagram illustrating an apparatus for processing an image according to an exemplary embodiment of the present disclosure.
  • an apparatus 2400 for processing an image may comprise: an image acquiring unit 2401, an information acquiring unit 2402, a first image obtaining unit 2403 and a second image obtaining unit 2404.
  • the image acquiring unit 2401 may acquire an original image of a document image having a shadow.
  • the original image of the document image having the shadow may be obtained when a user takes a photograph of a document, may be obtained when the user edits a photograph of a document, may be acquired from a local memory or a local database as required, or may be received from an external data source (e.g., the Internet, a server, and a database) through an input apparatus or a transmission medium.
  • whether the photograph is a document photograph may be detected. If it is detected that the photograph is the document photograph, a document edge detection is performed on the document photograph, and a document area is cut out to be used as an original image of a document image.
  • the information acquiring unit 2402 may acquire text area information and shadow area information from the original image of the document image having the shadow.
  • the information acquiring unit 2402 may use a shadow detection model to obtain a shadow area mask image as the shadow area information.
  • the shadow detection model may be any available artificial intelligence (AI) model, of which the input may be the document image having the shadow or a feature image after the document image having the shadow is pre-processed, and of which the output may be the shadow area mask image.
  • the shadow area mask image is used to mark which pixels in a document image are in a shadow area.
  • the shadow detection model may be a bidirectional feature pyramid network with a recurrent attention residual module (BDRAR) (as shown in Fig. 4).
  • the shadow detection model according to the present disclosure may not be limited to the above structure, and may also be implemented by adopting any possible AI model, for example, a GAN (generative adversarial network) model.
  • the shadow detection model may be trained based on a document image sample having a shadow and a corresponding annotated shadow area mask image.
  • the shadow detection model may be trained by: acquiring a training sample, the training sample comprising the document image sample having the shadow and the corresponding annotated shadow area mask image, and the document image sample having the shadow being obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image; inputting the document image sample having the shadow into the shadow detection model to obtain a predicted shadow area mask image; calculating a loss function (e.g., the loss function may be implemented by adopting a binary cross-entropy loss function) based on the predicted shadow area mask image and the annotated shadow area mask image; and adjusting parameters of the shadow detection model based on the calculated loss function.
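  • By way of illustration, one such training step might look as follows in TensorFlow; the same loop applies to the text detection model with annotated text area masks. The model is any segmentation network emitting a per-pixel probability map, and the optimizer choice is an assumption:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam(1e-4)

@tf.function
def detection_train_step(model, shadowed_images, annotated_masks):
    with tf.GradientTape() as tape:
        predicted_masks = model(shadowed_images, training=True)
        loss = bce(annotated_masks, predicted_masks)   # binary cross-entropy loss
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
```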
  • the information acquiring unit 2402 may use a text detection model to obtain a text area mask image as the text area information.
  • the apparatus 2400 for processing an image may further comprise a displaying unit (not shown), a receiving unit (not shown), a controlling unit (not shown) and a third image obtaining unit (not shown).
  • the displaying unit may display a mode option for selecting a first mode or a second mode.
  • the receiving unit may receive an input instruction to select the mode option.
  • the controlling unit may control, in response to an input instruction to select the first mode being received, the image acquiring unit 2401, the information acquiring unit 2402, the first image obtaining unit 2403 and the second image obtaining unit 2404 to perform a "semantic analysis shadow removal method," and control, in response to an input instruction to select the second mode being received, the image acquiring unit 2401, the information acquiring unit 2402 and the third image obtaining unit to perform a "direct shadow removal method.”
  • the text detection model may be any available AI model, of which the input may be the document image having the shadow or the feature image after the document image having the shadow is pre-processed, and of which the output may be the text area mask image.
  • the text area mask image is used to mark which pixels in a document image are effective information pixels, for example, a text, an icon, and a table box.
  • the text detection model may be a bidirectional feature pyramid network with a recurrent attention residual module (BDRAR) (as shown in Fig. 4).
  • the text detection model according to the present disclosure may not be limited to the above structure, and may also be implemented by adopting any possible AI model, for example, a GAN model.
  • the text detection model may be trained based on a document image sample having a shadow and a corresponding annotated text area mask image.
  • the text detection model may be trained by: acquiring a training sample, the training sample comprising the document image sample having the shadow and the corresponding annotated text area mask image, and the document image sample having the shadow being obtained by superimposing the shadow on a clean document image; inputting the document image sample having the shadow into the text detection model to obtain a predicted text area mask image; calculating a loss function (e.g., the loss function may be implemented by adopting a binary cross-entropy loss function) based on the predicted text area mask image and the annotated text area mask image; and adjusting parameters of the text detection model based on the calculated loss function.
  • the first image obtaining unit 2403 may reserve an original text area from the original image of the document image having the shadow to obtain a first image, a non-text area on the first image not having the shadow.
  • the original text area may be determined and reserved from the original image of the document image having the shadow, and a non-text area in the original image of the document image having the shadow may be filled with a predetermined color or pattern, to obtain the first image.
  • the predetermined color or pattern is a pre-specified color or pattern, or an identical or similar color or pattern obtained according to a background color or pattern of the original image.
  • a shadow elimination model may be used to eliminate the shadow from the original image of the document image having the shadow, to obtain a third image that the shadow is eliminated.
  • the original text area may be determined from the original image of the document image having the shadow.
  • the original text area may be superimposed onto the third image to obtain the first image.
  • the shadow elimination model is any available AI model, of which the input may be the document image having the shadow or the feature image after the document image having the shadow is pre-processed, and the shadow area mask image outputted by the shadow detection model, and of which the output may be a document image that the shadow is eliminated.
  • the shadow elimination model may be implemented by adopting a GAN model.
  • the GAN model may include a generator and a discriminator.
  • the input of the generator may be the document image having the shadow or the feature image (which may be referred to as real data) after the document image having the shadow is pre-processed, and the shadow area mask image outputted by the shadow detection model.
  • the output of the generator may be a predicted document image that the shadow is eliminated (which may be referred to as generated data).
  • the loss function of the generator may be: a cross entropy loss function (of the generated data and 1) + an absolute error loss function (of the generated data and label data (a clean document image sample)).
  • the input of the discriminator may be the generated data, and the output of the discriminator may be a probability that the generated data is true.
  • the loss function of the discriminator may be: a cross entropy loss function (of the real data and 1) + a cross entropy loss function (of the generated data and 0).
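  • These composite losses can be sketched directly from the formulas above; the relative weighting of the absolute-error term is an assumption (the disclosure gives no coefficient):

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
mae = tf.keras.losses.MeanAbsoluteError()

def generator_loss(disc_on_generated, generated_image, clean_label, l1_weight=1.0):
    # cross entropy (of the generated data and 1) + absolute error (of the
    # generated data and the label data, i.e., the clean document image sample)
    adversarial = bce(tf.ones_like(disc_on_generated), disc_on_generated)
    return adversarial + l1_weight * mae(clean_label, generated_image)

def discriminator_loss(disc_on_real, disc_on_generated):
    # cross entropy (of the real data and 1) + cross entropy (of the generated data and 0)
    return (bce(tf.ones_like(disc_on_real), disc_on_real)
            + bce(tf.zeros_like(disc_on_generated), disc_on_generated))
```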
  • the shadow elimination model may be trained based on the clean document image sample, a corresponding document image sample having a shadow, and a corresponding annotated shadow area mask image.
  • a training sample is acquired.
  • the training sample comprises a target document image sample, a corresponding document image sample having a shadow, and a corresponding annotated shadow area mask image.
  • the document image sample having the shadow is obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image.
  • the document image sample having the shadow and the annotated shadow area mask image are inputted into the shadow elimination model to obtain a predicted document image.
  • a loss function is calculated based on the target document image sample and the predicted document image.
  • the clean document image refers to a document image not having a shadow but having a background color (the background color is a simulation for the original background color of a photograph when the user takes the photograph).
  • the target document image sample refers to a target shadow-removed document image used to train the shadow elimination model.
  • the target document image sample may be a clean document image, or a document image which does not have a shadow and of which the background is a predetermined color (e.g., a white color).
  • the second image obtaining unit 2404 may adjust a text in the first image to obtain a second image. Since the text portion (effective pixels) of a document image (i.e., the first image) from which the shadow is preliminarily eliminated may still be superimposed with shadow noise, the pixel color of the text in the shadow area would be darker than the color of the text in the non-shadow area, and the strokes would be slightly wider.
  • the shadow area information may be used to adjust the document image that the shadow is preliminarily eliminated, thus obtaining the second image. In other words, this adjusting may reduce the difference between the text in the shadow area and the text in the non-shadow area.
  • the second image obtaining unit 2404 may determine a text area in the shadow area and a text area in the non-shadow area in the original image of the document image having the shadow based on the shadow area information; and perform first processing on the text area in the shadow area, and/or perform second processing on the text area in the non-shadow area.
  • the first processing and the second processing may be opposite operations.
  • the first processing may comprise, but is not limited to, at least one of: expanding an edge of a non-text area in the shadow area towards the text area, increasing a brightness of a text in the text area in the shadow area, or reducing a contrast of the text in the text area in the shadow area.
  • the second processing may comprise, but is not limited to, at least one of: contracting an edge of a non-text area in the non-shadow area relative to the text area, reducing a brightness of a text in the text area in the non-shadow area, or increasing a contrast of the text in the text area in the non-shadow area.
  • the method for processing an image according to the present disclosure may be the first mode according to the present disclosure. That is, in the first mode, it is possible to perform a semantic segmentation on the document image having the shadow, and obtain, based on the semantic segmentation result and the shadow area mask image, the document image that the shadow is eliminated, which may be referred to as the "semantic analysis shadow removal method.”
  • the semantic segmentation may refer to a segmentation on the effective pixel of a text portion and the pixel of a background portion.
  • In the second mode of the present disclosure, it is possible to acquire the original image of the document image having the shadow; acquire the shadow area information from the original image of the document image having the shadow (e.g., by using the shadow detection model); and use, based on the original image of the document image having the shadow and the shadow area information, the shadow elimination model to eliminate the shadow from the original image of the document image having the shadow, to obtain the third image that the shadow is eliminated.
  • a shadow removal may be performed on the document image having the shadow by using the artificial intelligence model and the shadow area mask image, to obtain the document image that the shadow is eliminated, which may be referred to as the "direct shadow removal method.”
  • the user may be provided with shadow removal methods of the above two modes, namely, the first mode (which may also be referred to as a documentation mode) and the second mode (which may also be referred to as a shadow removal mode).
  • In the first mode, a shadow removal operation may be performed according to the first exemplary embodiment of the present disclosure.
  • In the second mode, a shadow removal operation may be performed according to the second exemplary embodiment of the present disclosure.
  • the apparatus 2400 for processing an image may further comprise the displaying unit (not shown), the receiving unit (not shown), the controlling unit (not shown), and the third image obtaining unit (not shown).
  • the displaying unit may display the mode option for selecting the first mode or the second mode.
  • the receiving unit may receive the input instruction to select the mode option.
  • the controlling unit may control, in response to the input instruction to select the first mode being received, the image acquiring unit 2401, the information acquiring unit 2402, the first image obtaining unit 2403 and the second image obtaining unit 2404 to perform the "semantic analysis shadow removal method," and control, in response to the input instruction to select the second mode being received, the image acquiring unit 2401, the information acquiring unit 2402 and the third image obtaining unit (not shown) to perform the "direct shadow removal method.”
  • the apparatus 2400 for processing an image may further comprise a fourth image obtaining unit (not shown).
  • the image acquiring unit 2401 may acquire the original image of the document image having the shadow.
  • the information acquiring unit 2402 acquires the text area information and the shadow area information from the original image of the document image having the shadow.
  • the first image obtaining unit 2403 reserves the original text area from the original image of the document image having the shadow to obtain the first image, the non-text area on the first image not having the shadow.
  • the second image obtaining unit 2404 adjusts the text in the first image to obtain the second image.
  • the fourth image obtaining unit uses the shadow elimination model to eliminate the shadow from the original image of the document image having the shadow, to obtain the third image that the shadow is eliminated.
  • the fourth image obtaining unit (not shown) superimposes a text area in the second image onto the third image to obtain a fourth image.
  • the image acquiring unit 2401, the information acquiring unit 2402, the first image obtaining unit 2403, and the second image obtaining unit 2404 are described as individual devices, but may be implemented through one processor.
  • the image acquiring unit 2401, the information acquiring unit 2402, the first image obtaining unit 2403, and the second image obtaining unit 2404 may be implemented through a dedicated processor or through a combination of software and a general-purpose processor such as an application processor (AP), a central processing unit (CPU) or a graphics processing unit (GPU).
  • the dedicated processor may be implemented by including a memory for implementing an embodiment of the disclosure or by including a memory processor for using an external memory.
  • the image acquiring unit 2401, the information acquiring unit 2402, the first image obtaining unit 2403, and the second image obtaining unit 2404 may be configured by a plurality of processors.
  • the image acquiring unit 2401, the information acquiring unit 2402, the first image obtaining unit 2403, and the second image obtaining unit 2404 may be implemented through a combination of dedicated processors or through a combination of software and general-purpose processors such as an AP, a CPU or a GPU.
  • At least one of the shadow detection model, the text detection model and the shadow elimination model to which the method and apparatus for processing an image according to the present disclosure relate may be implemented by an AI model.
  • the functions associated with AI may be performed by at least one of a non-volatile storage device, a volatile storage device, or a processor.
  • the processor may include one or more processors.
  • the one or more processors may be a general purpose processor (e.g., a central processing unit (CPU) and an application processor (AP)), a processor for graphics only (e.g., a graphics processing unit (GPU), a visual processing unit (VPU)), and/or an AI application specific processor (e.g., a neural processing unit (NPU)).
  • the one or more processors control the processing on inputted data according to a predefined operating rule or an artificial intelligence (AI) model stored in the non-volatile storage device and the volatile storage device.
  • the predefined operating rule or the artificial intelligence model may be provided through training or learning.
  • the providing through the learning means that a predefined operating rule or an AI model with an expected characteristic is formed by applying a learning algorithm to a plurality of pieces of learning data.
  • the learning may be performed in a device performing the AI according to an embodiment, and/or may be implemented by a separate server/device/system.
  • the artificial intelligence model may consist of a plurality of neural network layers. Each layer has a plurality of weight values, and a layer operation is performed on the output of the previous layer using the plurality of weight values.
  • Examples of a neural network include, but are not limited to, a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), a generative adversarial network (GAN), and a deep Q network.
  • a learning algorithm may refer to a method of using a plurality of pieces of learning data to train a predetermined target device (e.g., a robot) to cause, allow, or control the target device to make a determination or prediction.
  • Examples of the learning algorithm include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
  • the AI model (e.g., at least one of the shadow detection model, the text detection model and the shadow elimination model) to which the method and apparatus for processing an image according to the present disclosure relate may be deployed at a cloud or a mobile end.
  • When deployed at the cloud, the AI model runs in a server framework and needs to process interactions (including authentication, picture uploading and downloading, etc.) with a client.
  • A high concurrency situation is also taken into consideration.
  • In this case, the size of the AI model is not strictly limited, and generally may be about 100-300 MB.
  • a network structure may adopt some modules occupying a large space and having a high precision, for example, a residual module.
  • a server end has a relatively strong computing capability, and thus can quickly obtain an output result of the model.
  • the model at the mobile end may have to satisfy at least one of the following conditions: a small model size, a low computational complexity, a low battery power consumption, a flexible deployment during issuing and updating, etc.
  • the structure of the AI model is optimized (pruned). For example, referring to a model structure such as MobileNet and SqueezeNet, an optimization technique such as a depthwise separable convolution, a linear bottleneck, or an inverted residual structure is used, to reduce the model size from more than 100 MB to less than or equal to 10 MB.
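  • An illustrative depthwise separable convolution block of the MobileNet kind referenced above, in Keras; the specific layer ordering (batch normalization plus ReLU) is a common convention, not a detail from this disclosure:

```python
import tensorflow as tf
from tensorflow.keras import layers

def depthwise_separable_block(x, filters, stride=1):
    # A 3x3 depthwise convolution followed by a 1x1 pointwise convolution
    # replaces a full 3x3 convolution at a fraction of the parameters.
    x = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(filters, 1, use_bias=False)(x)   # pointwise 1x1 convolution
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)
```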
  • the AI model is converted into a tflite file (Google Tensorflow-Lite framework) or a dlc file (Qualcomm SNPE framework), and then deployed in a mobile phone.
  • An APP invokes the tflite/dlc file through a Tensorflow-Lite/SNPE SDK to perform a model inference.
  • the APP checks whether the mobile phone supports a chip such as a GPU/NPU/DSP. If the mobile phone supports such a chip, these chips are preferentially used for calculation, thereby improving the inference speed of the AI model.
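  • A minimal sketch of invoking a converted .tflite file with the TensorFlow Lite interpreter (the SNPE path is analogous); the file name and input layout are assumptions:

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="shadow_removal.tflite")
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]

image = np.zeros(input_detail["shape"], dtype=np.float32)   # placeholder input batch
interpreter.set_tensor(input_detail["index"], image)
interpreter.invoke()                                        # run the model inference
result = interpreter.get_tensor(output_detail["index"])
```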
  • Fig. 25 is a block diagram of an electronic device 2500 according to an exemplary embodiment of the present disclosure.
  • the electronic device 2500 includes at least one storage device 2501 and at least one processor 2502.
  • the at least one storage device 2501 stores a computer executable instruction set.
  • the computer executable instruction set when executed by the at least one processor 2502, performs the method for processing an image according to the exemplary embodiments of the present disclosure.
  • the electronic device 2500 may be a PC, a tablet apparatus, a personal digital assistant, a smartphone, or another apparatus capable of executing the above instruction set.
  • the electronic device 2500 is not necessarily a single electronic device, but may be any collection of apparatuses or circuits capable of separately or jointly executing the above instruction (or instruction set).
  • the electronic device 2500 may also be a portion of an integrated control system or system manager, or may be configured as a portable electronic device interconnected locally or remotely (e.g., via wireless transmission) through an interface.
  • the processor 2502 may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic apparatus, a dedicated processor system, a microcontroller, or a microprocessor.
  • the processor may further include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like.
  • the processor 2502 may run the instructions or codes stored in the storage device 2501.
  • the storage device 2501 may further store data.
  • the instructions and the data may also be transmitted and received over the network via a network interface apparatus.
  • the network interface apparatus may employ any known transmission protocol.
  • the storage device 2501 may be integrated with the processor 2502, for example, a RAM or flash memory is disposed within an integrated circuit microprocessor or the like.
  • the storage device 2501 may include a separate apparatus, such as an external disk drive, a storage array, or other storage apparatuses that any database system may use.
  • the storage device and the processor 2502 may be operatively coupled, or may communicate with each other (e.g., through an I/O port and a network connection), to enable the processor 2502 to read the data stored in the storage device.
  • the electronic device 2500 may further include a video display (e.g., a liquid crystal display) and a user interaction interface (e.g., a keyboard, a mouse and a touch input apparatus). All components of the electronic device 2500 may be connected to each other via a bus and/or a network.
  • a computer readable storage medium storing an instruction may further be provided.
  • the instruction when executed by the at least one processor, causes the at least one processor to perform the method for processing an image according to the present disclosure.
  • Examples of the computer readable storage medium herein include: a read-only memory (ROM), a programmable read-only memory (PROM), an electrically erasable programmable read-only memory (EEPROM), a random access memory (RAM), a dynamic random access memory (DRAM), a static random access memory (SRAM), a flash memory, a non-volatile memory, a CD-ROM, a CD-R, a CD+R, a CD-RW, a CD+RW, a DVD-ROM, a DVD-R, a DVD+R, a DVD-RW, a DVD+RW, a DVD-RAM, a BD-ROM, a BD-R, a BD-R LTH, and any other apparatus configured to store a computer program and any associated data, data files and data structures in a non-transitory way, and to provide the computer program and the associated data, data files and data structures to a processor or a computer, to enable the processor or the computer to execute the computer program.
  • the computer program in the above computer readable storage medium may run in an environment deployed in a computer device such as a client, a host, a proxy apparatus and a server.
  • the computer program and any associated data, data files and data structures may be distributed over a networked computer system, such that the computer program and the associated data, data files and data structures are stored, accessed and executed in a distributed way by one or more processors or computers.
  • a computer program product may further be provided.
  • An instruction in the computer program product may be executed by a processor of a computer device to perform the method for processing an image according to the exemplary embodiments of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Facsimile Image Signal Circuits (AREA)

Abstract

The present disclosure relates to a method and apparatus for processing an image. The method comprises obtaining an original image including one or more objects and one or more shadows; generating, based on the original image, a first mask image indicating area of the one or more objects and a second mask image indicating shadow area and non-shadow area on the original image; obtaining, based on the original image and the first mask image, a first image; and obtaining, based on the first image and the second mask image, a second image, by reducing a difference between a feature of the one or more objects in the shadow area and a feature of the one or more objects in the non-shadow area. At the same time, an artificial intelligence model may be used to perform the above method for processing an image that is performed by an electronic device.

Description

METHOD AND APPARATUS FOR PROCESSING IMAGE
The present disclosure relates to the field of image processing, and specifically to a method and apparatus for processing an image.
When people use a camera or a mobile phone to take a photograph of a document, due to ambient light issues (e.g., a single light source, light being blocked, and insufficient light), a restriction of the shooting posture, and the like, a portion of the area of the photograph taken of the document may be covered by a shadow, which results in poor readability of the content of the photograph and disadvantageously affects later viewing, archiving, printing, sharing and disseminating.
The present disclosure provides a method and apparatus for processing an image to solve at least one of the above problems in the related art.
According to a first aspect of embodiments of the present disclosure, a method for processing an image is provided. The method comprises obtaining an original image including one or more objects and one or more shadows, wherein the one or more objects comprise at least one of a text, a drawing or a table, generating, based on the original image, a first mask image indicating area of the one or more objects, generating, based on the original image, a second mask image indicating shadow area and non-shadow area on the original image, obtaining, based on the original image and the first mask image, a first image, and obtaining, based on the first image and the second mask image, a second image, by reducing a difference between a feature of the one or more objects in the shadow area on the first image and a feature of the one or more objects in the non-shadow area on the first image. The one or more objects in the shadow area on the first image include the one or more shadows.
According to an embodiment, the generating, based on the original image, a first mask image indicating area of the one or more objects may comprise generating, based on the original image, a text mask image as the first mask image, using a text detection model referring to an artificial intelligence model. The obtaining, based on the original image and the first mask image, a first image may comprise obtaining, based on the original image and the first mask image, a first image, wherein the one or more texts in the shadow area on the first image includes the one or more shadows. One or more objects in the shadow area on the first image may include the one or more shadows.
According to an embodiment, the generating, based on the original image, a second mask image indicating shadow area and non-shadow area on the original image may comprise generating, based on the original image, a second mask image indicating shadow area and non-shadow area on the original image, using a shadow detection model referring to an artificial intelligence model.
According to an embodiment, the obtaining, based on the original image and the first mask image, a first image may comprise identifying, based on the original image and the first mask image, an object area and a non-object area from the original image, and filling the non-object area in the original image with a predetermined color or pattern, to obtain the first image.
According to an embodiment, the predetermined color or pattern may be a pre-specified color or pattern, or an identical or similar color or pattern obtained according to a background color or pattern of the original image.
According to an embodiment, the obtaining, based on the original image and the first mask image, a first image may comprise identifying, based on the original image and the first mask image, an object area and a non-object area from the original image, eliminating the one or more shadows from the object area, using a shadow elimination model referring to an artificial intelligence model, eliminating the one or more shadows from the non-object area, using the shadow elimination model, and combining the object area and the non-object area to obtain the first image.
According to an embodiment, the obtaining, based on the first image and the second mask image, a second image may comprise identifying, based on the first image and the second mask image, the shadow area on the first image and the non-shadow area on the first image, and performing first processing on the shadow area, and/or performing second processing on the non-shadow area, wherein the first processing and the second processing are opposite operations.
According to an embodiment, the first processing may comprise at least one of expanding an edge of a non-text area in the shadow area towards the text area in the shadow area, increasing a brightness of a text in the text area in the shadow area, or reducing a contrast of the text in the text area in the shadow area. The second processing may comprise at least one of contracting an edge of a non-text area in the non-shadow area relative to the text area in the non-shadow area, reducing a brightness of a text in the text area in the non-shadow area, or increasing a contrast of the text in the text area in the non-shadow area. The feature may comprise at least one of brightness, contrast, or thickness.
According to another aspect of embodiments of the present disclosure, a method for processing an image is provided. The method comprises acquiring an original image of a document image having a shadow, acquiring text area information and shadow area information from the original image, obtaining an original text area from the original image based on the text area information, obtaining a first image based on the original text area and the original image, wherein a non-text area on the first image does not have the shadow, and processing the first image by adjusting a text in the first image based on the shadow area information to obtain a second image.
According to an embodiment, the processing the first image may be to reduce a difference between a text in a shadow area on the first image and a text in a non-shadow area on the first image.
According to an embodiment, the acquiring text area information and shadow area information from the original image may comprise using a text detection model to obtain a text area mask image as the text area information, based on the original image, and using a shadow detection model to obtain a shadow area mask image as the shadow area information, based on the original image. The text detection model and the shadow detection model are artificial intelligence models.
According to an embodiment, the obtaining an original text area from the original image based on the text area information may comprise determining the original text area from the original image based on the text area information, and obtaining the original text area. The obtaining the first image based on the original text area and the original image may comprise filling a non-text area in the original image with a predetermined color or pattern, to obtain the first image.
According to an embodiment, the predetermined color or pattern may be a pre-specified color or pattern, or an identical or similar color or pattern obtained according to a background color or pattern of the original image.
According to an embodiment, the obtaining the first image based on the original text area and the original image may comprise eliminating the shadow from the original image based on the original image and the shadow area information to obtain a third image, using a shadow elimination model referring to an artificial intelligence model, determining the original text area from the original image based on the text area information, and superimposing the original text area onto the third image to obtain the first image.
According to an embodiment, the method for processing an image may further comprise eliminating the shadow from the original image based on the original image and the shadow area information to obtain a third image, using a shadow elimination model referring to an artificial intelligence model, and superimposing a text area in the second image onto the third image to obtain a fourth image.
According to an embodiment, the processing the first image by adjusting a text in the first image based on the shadow area information to obtain a second image may comprise determining a text area in a shadow area and a text area in a non-shadow area in the first image based on the shadow area information, and performing first processing on the text area in the shadow area, and/or performing second processing on the text area in the non-shadow area. The first processing and the second processing are opposite operations.
According to an embodiment, the first processing may comprise at least one of expanding an edge of a non-text area in the shadow area towards the text area, increasing a brightness of a text in the text area in the shadow area, or reducing a contrast of the text in the text area in the shadow area. The second processing may comprise at least one of contracting an edge of a non-text area in the non-shadow area relative to the text area, reducing a brightness of a text in the text area in the non-shadow area, or increasing a contrast of the text in the text area in the non-shadow area.
According to an embodiment, the method for processing an image may further comprise displaying a mode option for selecting a first mode or a second mode, receiving an input instruction to select the mode option, performing, in response to an input instruction to select the first mode being received, a method for processing an image in the first mode, and performing, in response to an input instruction to select the second mode being received, a method for processing an image in the second mode, comprising acquiring the original image of the document image having the shadow, acquiring the shadow area information from the original image, and using the shadow elimination model to eliminate the shadow from the original image based on the original image and the shadow area information to obtain the third image that the shadow is eliminated, the shadow elimination model referring to the artificial intelligence model.
According to an embodiment, the shadow detection model is trained by acquiring a training sample, wherein the training sample comprises a document image sample having a shadow and a corresponding annotated shadow area mask image, and the document image sample having the shadow is obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image, inputting the document image sample having the shadow into the shadow detection model to obtain a predicted shadow area mask image, calculating a loss function based on the predicted shadow area mask image and the annotated shadow area mask image, and adjusting parameters of the shadow detection model based on the calculated loss function.
According to an embodiment, the text detection model may be trained by acquiring a training sample, wherein the training sample comprises a document image sample having a shadow and a corresponding annotated text area mask image, and the document image sample having the shadow is obtained by superimposing the shadow on a clean document image, inputting the document image sample having the shadow into the text detection model to obtain a predicted text area mask image, calculating a loss function based on the predicted text area mask image and the annotated text area mask image, and adjusting parameters of the text detection model based on the calculated loss function.
According to an embodiment, the shadow elimination model may be trained by acquiring a training sample, wherein the training sample comprises a target document image sample, a corresponding document image sample having a shadow, and a corresponding annotated shadow area mask image, and the document image sample having the shadow is obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image, inputting the document image sample having the shadow and the annotated shadow area mask image into the shadow elimination model to obtain a predicted document image, calculating a loss function based on the target document image sample and the predicted document image, and adjusting parameters of the shadow elimination model based on the calculated loss function.
According to a second aspect of the embodiments of the present disclosure, an apparatus for processing an image is provided. The apparatus comprises an image acquiring unit, configured to acquire an original image of a document image having a shadow, an information acquiring unit, configured to acquire text area information and shadow area information from the original image, a first image obtaining unit, configured to reserve an original text area from the original image based on the text area information, to obtain a first image, a non-text area on the first image not having the shadow, and a second image obtaining unit, configured to adjust a text in the first image based on the shadow area information, to obtain a second image.
According to an embodiment, the adjusting may be to reduce a difference between a text in a shadow area and a text in a non-shadow area.
According to an embodiment, the information acquiring unit may be configured to use a text detection model to obtain a text area mask image as the text area information, based on the original image, and use a shadow detection model to obtain a shadow area mask image as the shadow area information, based on the original image. The text detection model and the shadow detection model are artificial intelligence models.
According to an embodiment, the first image obtaining unit may be configured to determine and reserve the original text area from the original image based on the text area information, and fill a non-text area in the original image with a predetermined color or pattern, to obtain the first image.
According to an embodiment, the predetermined color or pattern may be a pre-specified color or pattern, or an identical or similar color or pattern obtained according to a background color or pattern of the original image.
According to an embodiment, the first image obtaining unit may be configured to use a shadow elimination model to eliminate the shadow from the original image based on the original image and the shadow area information, to obtain a third image that the shadow is eliminated, the shadow elimination model referring to an artificial intelligence model, determine the original text area from the original image based on the text area information, and superimpose the original text area onto the third image to obtain the first image.
According to an embodiment, the apparatus for processing an image may further comprise a fourth image obtaining unit, configured to use the shadow elimination model to eliminate the shadow from the original image based on the original image and the shadow area information, to obtain the third image that the shadow is eliminated, the shadow elimination model referring to the artificial intelligence model, and superimpose a text area in the second image onto the third image to obtain a fourth image.
According to an embodiment, the second image obtaining unit may be configured to determine a text area in a shadow area and a text area in a non-shadow area in the original image based on the shadow area information, and perform first processing on the text area in the shadow area, and/or perform second processing on the text area in the non-shadow area. The first processing and the second processing are opposite operations.
According to an embodiment, the first processing may comprise at least one of expanding an edge of a non-text area in the shadow area towards the text area, increasing a brightness of a text in the text area in the shadow area, or reducing a contrast of the text in the text area in the shadow area. The second processing may comprise at least one of: contracting an edge of a non-text area in the non-shadow area relative to the text area, reducing a brightness of a text in the text area in the non-shadow area, or increasing a contrast of the text in the text area in the non-shadow area.
According to an embodiment, the apparatus for processing an image may further comprise a displaying unit, configured to display a mode option for selecting a first mode or a second mode, a receiving unit, configured to receive an input instruction to select the mode option, a controlling unit, and a third image obtaining unit. The controlling unit is configured to control, in response to an input instruction to select the first mode being received, the image acquiring unit, the information acquiring unit, the first image obtaining unit and the second image obtaining unit to perform a method for processing an image in the first mode, and control, in response to an input instruction to select the second mode being received, the image acquiring unit to acquire the original image of the document image having the shadow, the information acquiring unit to acquire the shadow area information from the original image, and the third image obtaining unit to use the shadow elimination model to eliminate the shadow from the original image based on the original image and the shadow area information to obtain the third image that the shadow is eliminated, the shadow elimination model referring to the artificial intelligence model.
According to an embodiment, the shadow detection model may be trained by acquiring a training sample, wherein the training sample comprises a document image sample having a shadow and a corresponding annotated shadow area mask image, and the document image sample having the shadow is obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image, inputting the document image sample having the shadow into the shadow detection model to obtain a predicted shadow area mask image, calculating a loss function based on the predicted shadow area mask image and the annotated shadow area mask image, and adjusting parameters of the shadow detection model based on the calculated loss function.
According to an embodiment, the text detection model may be trained by acquiring a training sample, wherein the training sample comprises a document image sample having a shadow and a corresponding annotated text area mask image, and the document image sample having the shadow is obtained by superimposing the shadow on a clean document image, inputting the document image sample having the shadow into the text detection model to obtain a predicted text area mask image, calculating a loss function based on the predicted text area mask image and the annotated text area mask image, and adjusting parameters of the text detection model based on the calculated loss function.
According to an embodiment, the shadow elimination model may be trained by acquiring a training sample, wherein the training sample comprises a target document image sample, a corresponding document image sample having a shadow, and a corresponding annotated shadow area mask image, and the document image sample having the shadow is obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image, inputting the document image sample having the shadow and the annotated shadow area mask image into the shadow elimination model to obtain a predicted document image, calculating a loss function based on the target document image sample and the predicted document image, and adjusting parameters of the shadow elimination model based on the calculated loss function.
According to a third aspect of the embodiments of the present disclosure, an electronic device is provided. The electronic device may comprise at least one processor configured to acquire an original image of a document image having a shadow, acquire text area information and shadow area information from the original image, obtain an original text area from the original image based on the text area information, obtain a first image based on the original text area and the original image, wherein a non-text area on the first image does not have the shadow, and process the first image based on the shadow area information to obtain a second image.
According to a fourth aspect of the embodiments of the present disclosure, a computer readable storage medium containing instructions is provided. The instructions are configured to cause at least one processor of a computer to acquire an original image of a document image having a shadow, acquire text area information and shadow area information from the original image, obtain an original text area from the original image based on the text area information, obtain a first image based on the original text area and the original image, wherein a non-text area on the first image does not have the shadow, and process the first image based on the shadow area information to obtain a second image.
The technical solutions provided by the embodiments of the present disclosure have at least the following beneficial effects.
According to the method and apparatus for processing an image of the present disclosure, the text area information in a document image having a shadow may be used to obtain a preliminary image in which the shadow in the background area is removed. Then, the shadow area information in the document image may be used to adjust the text area in the preliminary image, to obtain a shadow-removed image. Therefore, an image having an improved shadow removal effect is obtained. In addition, an AI model may further be used to perform the method for processing an image according to the present disclosure, thereby improving the efficiency and effect of the shadow removal. In addition, two shadow elimination modes may further be provided for a user to select. The user may select either of the two modes as needed, to eliminate the shadow in the document image with one click and obtain a clear and personalized document image.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and cannot limit the present disclosure.
The accompanying drawings herein are incorporated into the specification to constitute a portion of the specification, and illustrate embodiments conforming to the present disclosure. The accompanying drawings are used to explain the principle of the present disclosure together with the specification, and do not constitute an improper limitation to the present disclosure.
Fig. 1 is a schematic diagram illustrating document images before and after a method for processing an image according to the present disclosure is used.
Fig. 2 is a flowchart illustrating a method for processing an image according to an exemplary embodiment of the present disclosure.
Fig. 3 is a schematic diagram illustrating an input and an output of a shadow detection model according to an exemplary embodiment of the present disclosure.
Fig. 4 is a schematic structural diagram illustrating a shadow detection model according to an exemplary embodiment of the present disclosure.
Fig. 5 is a schematic diagram illustrating training of a shadow detection model according to an exemplary embodiment of the present disclosure.
Fig. 6 is a schematic diagram illustrating a training sample for training a shadow detection model according to an exemplary embodiment of the present disclosure.
Fig. 7 is a flowchart illustrating making of a training sample for training a shadow detection model according to an exemplary embodiment of the present disclosure.
Fig. 8 is a schematic diagram illustrating a process of generating a document image sample having a shadow according to an exemplary embodiment of the present disclosure.
Fig. 9 is a schematic diagram illustrating an input and an output of a text detection model according to an exemplary embodiment of the present disclosure.
Fig. 10 is a schematic diagram illustrating training of a text detection model according to an exemplary embodiment of the present disclosure.
Fig. 11 is a schematic diagram illustrating a training sample for training a text detection model according to an exemplary embodiment of the present disclosure.
Fig. 12 is a flowchart illustrating making of a training sample for training a text detection model according to an exemplary embodiment of the present disclosure.
Fig. 13 is a schematic diagram illustrating an input and an output of a shadow elimination model according to an exemplary embodiment of the present disclosure.
Fig. 14 is a schematic diagram illustrating training of a shadow elimination model according to an exemplary embodiment of the present disclosure.
Fig. 15 is a schematic diagram illustrating a training sample for training a shadow elimination model according to an exemplary embodiment of the present disclosure.
Fig. 16 is a flowchart illustrating making of a training sample for training a shadow elimination model according to an exemplary embodiment of the present disclosure.
Fig. 17 is a flowchart illustrating a method for processing an image according to a first exemplary embodiment of the present disclosure.
Fig. 18 is a schematic diagram illustrating an example in which a shadow is removed in a first mode (i.e., a documentation mode) according to an exemplary embodiment of the present disclosure.
Fig. 19 is a flowchart illustrating a method for processing an image according to a second exemplary embodiment of the present disclosure.
Fig. 20 is a schematic diagram illustrating an example in which a shadow is removed in a second mode (i.e., a shadow removal mode) according to an exemplary embodiment of the present disclosure.
Fig. 21 is a flowchart illustrating applying of a method for processing an image according to an exemplary embodiment of the present disclosure.
Fig. 22 is a schematic diagram illustrating a shadow removal UI in a scenario in which a user takes a photograph of a document according to an exemplary embodiment of the present disclosure.
Fig. 23 is a schematic diagram illustrating a shadow removal UI in a scenario in which a user edits a photograph of a document according to an exemplary embodiment of the present disclosure.
Fig. 24 is a block diagram illustrating an apparatus for processing an image according to an exemplary embodiment of the present disclosure.
Fig. 25 is a block diagram of an electronic device 2500 according to an exemplary embodiment of the present disclosure.
In order to make one of ordinary skill in the art better understand the technical solution of the present disclosure, the technical solution in the embodiments of the present disclosure is described below in combination with the accompanying drawings.
It should be noted that the terms "first," "second" and the like in the specification and claims of the present disclosure and the accompanying drawings are used to distinguish similar objects, and not necessarily used to describe a specific order or an order of priority. It should be understood that the data used in this way may be interchanged in an appropriate situation, such that the embodiments of the present disclosure that are described herein can be implemented in an order other than that illustrated or described herein. The implementations described in the following embodiments do not represent all embodiments consistent with the present disclosure. Rather, the implementations are merely examples of the apparatus and method consistent with some aspects of the present disclosure and described in detail in the appended claims.
It should be noted here that "at least one of several items" in the present disclosure represents that the three parallel situations "any one of the several items," "any combination of two or more of the several items" and "all of the several items" are contained. As an example, "including at least one of A and B" includes the following three parallel situations: 1) including A; 2) including B; and 3) including A and B. As another example, "performing at least one of step 1 and step 2" represents the following three parallel situations: 1) performing step 1; 2) performing step 2; and 3) performing step 1 and step 2.
When people use a camera or a mobile phone to take a photograph of a document, a portion of the photograph may be covered by a shadow due to ambient light issues (e.g., a single light source, blocked light, or insufficient light), a restriction of the shooting posture, and the like. This results in poor readability of the content of the photograph, and adversely affects later viewing, archiving, printing, sharing and disseminating. In order to solve the above problems, the present disclosure proposes a method and apparatus for processing an image. Specifically, an AI (artificial intelligence) model is used to process a document image having a shadow, to obtain a document image from which the shadow is eliminated, thereby improving the quality of the document image. For example, Fig. 1 is a schematic diagram illustrating document images before and after a method for processing an image according to the present disclosure is used. As shown in 110 in Fig. 1, a user may use a mobile phone to take a photograph of a document at night and want to print the photograph. However, due to insufficient or non-uniform light in a home environment, there may be a shadow block in the taken photograph, which affects the final printing effect. According to the method for processing an image in the present disclosure, the AI model may be used to process the document photograph having the shadow shown in 110 in Fig. 1, to obtain a document photograph from which the shadow is eliminated, as shown in 120 in Fig. 1.
In addition, according to the method and apparatus for processing an image in the present disclosure, shadow removal approaches in two modes may further be provided for the user. For example, the two modes may be referred to as a "documentation" mode and a "shadow removal" mode. In the "documentation" mode, the AI model may be used to perform a shadow elimination on a document image having a shadow and convert the background color of the document image into a predetermined color (e.g., a white color). In the "shadow removal" mode, the AI model may be used to perform a shadow elimination on the document image having the shadow and preserve the original background color (e.g., a background color of paper). When taking a photograph of a document (or after taking the photograph), the user may select either of the two modes as needed, to eliminate the shadow in the document image with one click and obtain a clear and personalized document image.
Hereinafter, a method and apparatus for processing an image according to exemplary embodiments of the present disclosure will be specifically described with reference to Figs. 2-25.
Fig. 2 is a flowchart illustrating a method for processing an image according to an exemplary embodiment of the present disclosure.
Referring to Fig. 2, in step 201, an original image including one or more objects and one or more shadows may be obtained. The object may comprise at least one of a text, a drawing or a table. Throughout the specification, an original image including one or more objects and one or more shadows may be understood as an original image of a document image having a shadow. Here, the original image of the document image having the shadow may be obtained when a user takes a photograph of a document, may be obtained when the user edits a photograph of a document, may be acquired from a local memory or a local database as required, or may be received from an external data source (e.g., the Internet, a server, and a database) through an input apparatus or a transmission medium. For example, when the user takes a photograph or edits a photograph, whether the photograph is a document photograph may be detected. If it is detected that the photograph is the document photograph, a document edge detection may be performed on the document photograph, and a document area may be cut out to be used as an original image of a document image.
In step 202, a first mask image indicating an area of the one or more objects may be generated based on the original image. According to an exemplary embodiment of the present disclosure, a text mask image as the first mask image may be generated based on the original image. A text detection model may be used to generate the text mask image. Throughout the specification, a text mask image may be understood as text area information or a text area mask image. For example, the text detection model may be any available AI model, of which the input may be the document image having the shadow or a feature image obtained by pre-processing the document image having the shadow, and of which the output may be the text area mask image. Here, the text area mask image may be used to mark which pixels in a document image are effective information pixels, for example, a text, an icon, and a table box. Fig. 9 is a schematic diagram illustrating an input and an output of a text detection model according to an exemplary embodiment of the present disclosure.
In step 203, a second mask image indicating a shadow area and a non-shadow area on the original image may be generated based on the original image. According to an exemplary embodiment of the present disclosure, a shadow detection model 350 may be used to generate the second mask image indicating the shadow area and the non-shadow area on the original image. Throughout the specification, the second mask image may be understood as shadow area information or a shadow area mask image. For example, the shadow detection model 350 may be any available artificial intelligence (AI) model, of which the input 301 may be the original image of the document image having the shadow or a feature image obtained by pre-processing the original image, and of which the output 303 may be the shadow area mask image. Here, the shadow area mask image may be used to mark which pixels in a document image are in a shadow area. Fig. 3 is a schematic diagram illustrating an input and an output of a shadow detection model according to an exemplary embodiment of the present disclosure.
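For illustration, the sketch below shows how such a detection model might be invoked at inference time to turn its per-pixel predictions into a binary shadow area mask image. It is a minimal sketch in Python/PyTorch; the function name, input resolution, and 0.5 threshold are assumptions made for illustration, not values fixed by this disclosure.

```python
import torch
import torchvision.transforms.functional as TF
from PIL import Image

def predict_shadow_mask(model, image_path, size=320, threshold=0.5):
    # Load the photograph and resize it to the model's assumed input size.
    image = Image.open(image_path).convert("RGB")
    x = TF.to_tensor(TF.resize(image, [size, size])).unsqueeze(0)  # 1 x 3 x H x W
    model.eval()
    with torch.no_grad():
        probs = torch.sigmoid(model(x))          # per-pixel shadow probability
    mask = (probs > threshold).float()           # binarize into a shadow mask
    mask = TF.resize(mask.squeeze(0), [image.height, image.width])
    return TF.to_pil_image((mask * 255).byte())  # white pixels mark the shadow area
```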
According to an exemplary embodiment of the present disclosure, the shadow detection model may be a bidirectional feature pyramid network with a recurrent attention residual module (BDRAR). Fig. 4 is a schematic structural diagram illustrating a shadow detection model according to an exemplary embodiment of the present disclosure. As shown in Fig. 4, the shadow detection model is a fully convolutional network (FCN), a deep neural network for image semantic segmentation. A residual network (e.g., ResNet) may be adopted as a backbone convolutional network, to be used for the extraction of feature maps. The resolutions of the feature maps of the network are gradually reduced, and thus the network is referred to as a "feature pyramid." A plurality of recurrent attention residual modules (abbreviated as RAR) are embedded into the feature pyramid structure, and an attention map is extracted from adjacent feature maps, to be used to fuse the spatial context information of adjacent layers in the feature pyramid, such that the network can better pay attention to a target recognition task on a target pixel. In addition, a residual structure is introduced, which is conducive to suppressing the interference of details of non-shadow areas in a high-resolution feature map with the prediction result. Furthermore, the recurrent attention residual module (RAR) adopts two sets of paths to integrate context information, in which one set runs from a deep feature map to a shallow feature map and the other set runs from the shallow feature map to the deep feature map; the network is thus referred to as "bidirectional." In addition, a fully connected conditional random field layer (CRFasRNN), which may be trained end to end and takes the form of an RNN, may be added to the output layer of the BDRAR to further improve the recognition accuracy. In this way, parameters and arithmetic logic may be solidified into a model file, such that hardware-level optimization such as GPU acceleration can be utilized, which avoids additional complex post-processing after a deployment to a mobile end. Clearly, the CRFasRNN may be selectively added according to a precision requirement, i.e., the CRFasRNN may or may not be added.
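The following is a minimal sketch, in Python/PyTorch, of one attention-residual fusion step of the kind described above: an attention map computed from two adjacent feature maps weights their combination, and a residual connection adds the result back to the shallow features. The class name, channel arithmetic, and layer choices are illustrative assumptions, not the actual BDRAR implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionResidualFusion(nn.Module):
    """Sketch of one attention-residual fusion step between adjacent
    feature pyramid levels (an assumed simplification of an RAR module)."""
    def __init__(self, channels: int):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2 * channels, kernel_size=1),
            nn.Sigmoid(),  # attention weights for the stacked features
        )
        self.refine = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        # Upsample the deep (low-resolution) features to the shallow size.
        deep = F.interpolate(deep, size=shallow.shape[2:],
                             mode="bilinear", align_corners=False)
        stacked = torch.cat([shallow, deep], dim=1)
        weighted = stacked * self.attn(stacked)   # attention-weighted fusion
        return shallow + self.refine(weighted)    # residual connection
```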
Clearly, the shadow detection model according to the present disclosure is not limited to the above structure, and may also be implemented by adopting any possible AI model, for example, a GAN (generative adversarial network) model.
According to an exemplary embodiment of the present disclosure, the shadow detection model may be trained based on a document image sample having a shadow and a corresponding annotated shadow area mask image. Specifically, the shadow detection model may be trained by: acquiring a training sample, the training sample comprising the document image sample having the shadow and the corresponding annotated shadow area mask image, and the document image sample having the shadow being obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image; inputting the document image sample having the shadow into the shadow detection model to obtain a predicted shadow area mask image; calculating a loss function (e.g., the loss function may be implemented by adopting a binary cross-entropy loss function) based on the predicted shadow area mask image and the annotated shadow area mask image; and adjusting parameters of the shadow detection model based on the calculated loss function. Hereinafter, a process of making a training sample of a shadow detection model according to an exemplary embodiment of the present disclosure will be specifically described.
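As a concrete illustration of this training procedure, the sketch below scores a predicted mask against its annotation with a binary cross-entropy loss and updates the model parameters. All names are hypothetical; the same loop can also train the text detection model described later, when the loader yields text-area annotations instead of shadow-area annotations.

```python
import torch
import torch.nn as nn

def train_detection_model(model, loader, epochs=10, lr=1e-4, device="cpu"):
    """Sketch of the described loop: predict a mask from a shadowed document
    sample, compare it to the annotated mask with binary cross-entropy, and
    adjust the model parameters."""
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()  # binary cross-entropy on raw logits
    for _ in range(epochs):
        for sample, annotated_mask in loader:
            sample = sample.to(device)
            annotated_mask = annotated_mask.to(device)
            predicted_mask = model(sample)               # predicted mask logits
            loss = criterion(predicted_mask, annotated_mask)
            optimizer.zero_grad()
            loss.backward()                              # compute gradients
            optimizer.step()                             # adjust parameters
```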
Fig. 5 is a schematic diagram illustrating training of a shadow detection model according to an exemplary embodiment of the present disclosure. As shown in Fig. 5, a document image sample having a shadow and a corresponding annotated shadow area mask image may be obtained respectively, to be used to train the shadow detection model.
Fig. 6 is a schematic diagram illustrating a training sample for training a shadow detection model according to an exemplary embodiment of the present disclosure. As shown in Fig. 6, 610 in Fig. 6 shows a document image sample having a shadow, and 620 in Fig. 6 shows a corresponding annotated shadow area mask image.
Fig. 7 is a flowchart illustrating making of a training sample for training a shadow detection model according to an exemplary embodiment of the present disclosure.
Referring to Fig. 7, specific steps of making a training sample set for training a shadow detection model are as follows:
1) A large number of text, icon and table examples are prepared, and several examples are randomly extracted from text, icon and table example libraries, and combined into a picture content material (710).
2) A color is randomly selected from a background color list (the colors in the background color list are relatively light colors, for example, a light yellow color, a light gray color and a pure white color, which simulate a color temperature of a scene when a user takes a photograph), to be used to simulate the paper of a document in a different color temperature environment, and is thus used as a background color picture (720).
3) The picture content material is superimposed on the background color picture to obtain a "clean document image" (730).
4) A white shadow pattern is superimposed on a black background. The shadow pattern simulates a scenario in which light is blocked when the user takes a photograph, and a shadow picture is generally a large irregular pattern, and thus, an "annotated shadow area mask image" is obtained (740).
5) Color inversion processing, background completely transparent processing, foreground Gaussian blur processing, and foreground partially transparent processing are performed on the shadow pattern of the annotated shadow area mask image (750).
6) The picture processed through step 5) is superimposed on the clean document image to obtain a "document image sample having a shadow" (760).
Throughout the specification, the term "foreground" may be understood as a content (text, table, drawing) area in image and the term "background" may be understood as a non-content area in image.
Fig. 8 is a schematic diagram illustrating a process of generating a document image sample having a shadow (i.e., a specific implementation process of the above steps 5) and 6)) according to an exemplary embodiment of the present disclosure. Referring to Fig. 8, color inversion processing, background completely transparent processing, foreground Gaussian blur processing, and foreground partially transparent processing may be performed on an annotated shadow area mask image, and then the annotated shadow area mask image after the above processing is superimposed onto a clean document image, thereby obtaining a document image sample having a shadow.
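A minimal sketch of this compositing, assuming NumPy/OpenCV, might blur the annotated mask and alpha-blend a darkened layer onto the clean document. The opacity and blur kernel are illustrative empirical values, and the inversion/transparency steps above are collapsed into a single blend for brevity.

```python
import numpy as np
import cv2

def synthesize_shadow_sample(clean_doc, shadow_mask, opacity=0.5, blur=51):
    """Sketch of the Fig. 8 compositing: soften the white-on-black shadow
    mask and alpha-blend a dark layer onto the clean document so that
    masked pixels are darkened (a simulated shadow)."""
    mask = shadow_mask.astype(np.float32) / 255.0       # 1.0 inside the shadow
    mask = cv2.GaussianBlur(mask, (blur, blur), 0)      # soften shadow edges
    alpha = opacity * mask[..., None]                   # partial transparency
    dark = np.zeros_like(clean_doc, dtype=np.float32)   # inverted (black) shadow
    out = (1.0 - alpha) * clean_doc.astype(np.float32) + alpha * dark
    return out.astype(np.uint8)
```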
Referring back to Fig. 2, according to an exemplary embodiment of the present disclosure, based on the original image of the document image having the shadow, a text detection model 950 may be used to obtain a text area mask image as the text area information. For example, the text detection model 950 may be any available AI model, of which the input 901 may be the document image having the shadow or a feature image obtained by pre-processing the document image having the shadow, and of which the output 903 may be the text area mask image. Here, the text area mask image may be used to mark which pixels in a document image are effective information pixels, for example, a text, an icon, and a table box. Fig. 9 is a schematic diagram illustrating an input and an output of a text detection model according to an exemplary embodiment of the present disclosure.
According to an exemplary embodiment of the present disclosure, the text detection model may be a bidirectional feature pyramid network with a recurrent attention residual module (BDRAR), for example, the structure shown in Fig. 4. That is, the structures of the text detection model and the shadow detection model according to the present disclosure may be similar, and the main difference between them lies in that the classification tasks are different. The text detection model is used to detect a text area, and the shadow detection model is used to detect a shadow area. Therefore, the corresponding training data are different, and the model parameters obtained through training are different. When the structure shown in Fig. 4 is used to implement the text detection model, the text detection model shown in Fig. 4 may output a text area mask image, rather than a shadow area mask image. Clearly, the text detection model according to the present disclosure is not limited to the above structure, and may also be implemented by adopting any possible AI model, for example, a GAN model.
According to an exemplary embodiment of the present disclosure, the text detection model may be trained based on a document image sample having a shadow and a corresponding annotated text area mask image. Specifically, the text detection model may be trained by: acquiring a training sample, the training sample comprising the document image sample having the shadow and the corresponding annotated text area mask image, and the document image sample having the shadow being obtained by superimposing the shadow on a clean document image; inputting the document image sample having the shadow into the text detection model to obtain a predicted text area mask image; calculating a loss function (e.g., the loss function may be implemented by adopting a binary cross-entropy loss function) based on the predicted text area mask image and the annotated text area mask image; and adjusting parameters of the text detection model based on the calculated loss function. Hereinafter, a process of making a training sample of a text detection model according to an exemplary embodiment of the present disclosure will be specifically described.
Fig. 10 is a schematic diagram illustrating training of a text detection model according to an exemplary embodiment of the present disclosure. As shown in Fig. 10, a document image sample having a shadow and a corresponding annotated text area mask image may be obtained respectively, to be used to train the text detection model.
Fig. 11 is a schematic diagram illustrating a training sample for training a text detection model according to an exemplary embodiment of the present disclosure. As shown in Fig. 11, 1110 in Fig. 11 shows a document image sample having a shadow, and 1120 in Fig. 11 shows a corresponding annotated text area mask image.
Fig. 12 is a flowchart illustrating making of a training sample for training a text detection model according to an exemplary embodiment of the present disclosure.
Referring to Fig. 12, specific steps of making a training sample set for training a text detection model are as follows:
1) A large number of text, icon and table examples are prepared, and several examples are randomly extracted from text, icon and table example libraries, and combined into a picture content material (1210).
2) The picture content material is set to white, and superimposed on a picture having a black background to obtain an "annotated text area mask image" (1220).
3) A color is randomly selected from a background color list (the colors in the background color list are relatively light colors, for example, a light yellow color, a light gray color and a pure white color, which simulate a color temperature of a scene when a user takes a photograph), to be used to simulate the paper of a document in a different color temperature environment, and is thus used as a background color picture (1230).
4) The picture content material is superimposed on the background color picture to obtain a "clean document image" (1240).
5) A picture is randomly selected from several shadow pattern pictures. A shadow pattern simulates a scenario in which light is blocked when the user takes a photograph. The shadow pattern is generally a large irregular pattern (1250).
6) Color inversion processing, background completely transparent processing, foreground Gaussian blur processing, and foreground partially transparent processing are performed on the shadow pattern (1260).
7) The picture processed in step 6) is superimposed on the clean document image to obtain a "document image sample having a shadow" (1270).
The specific implementation process of the above steps 6) and 7) may be implemented with reference to Fig. 8.
Referring back to Fig. 2, in step 204, a first image may be obtained based on the original image and the first mask image. The one or more objects in the shadow area on the first image may still include the one or more shadows. According to an exemplary embodiment of the present disclosure, the first image may be obtained based on the original image and the text mask image. According to an exemplary embodiment of the present disclosure, an object area and a non-object area in the original image may be identified based on the original image and the first mask image. The non-object area in the original image may be filled with a predetermined color or pattern, to obtain the first image. According to an exemplary embodiment of the present disclosure, the object area may be a text area and the non-object area may be a non-text area. For example, the predetermined color or pattern is a pre-specified color or pattern, or an identical or similar color or pattern obtained according to a background color or pattern of the original image.
According to an exemplary embodiment of the present disclosure, a non-text area in the original image of the document image having the shadow may be filled with a predetermined color or pattern, to obtain the first image. For example, the predetermined color or pattern is a pre-specified color or pattern, or an identical or similar color or pattern obtained according to a background color or pattern of the original image.
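As an illustration of this filling step, the sketch below keeps the pixels that the text area mask image flags as text and paints every other pixel with a predetermined color. It assumes NumPy arrays, a pure white fill, and a 127 binarization threshold as illustrative defaults.

```python
import numpy as np

def fill_non_text_area(original, text_mask, fill_color=(255, 255, 255)):
    """Sketch of obtaining the first image: keep the text pixels flagged by
    the text area mask image and fill the non-text area with a predetermined
    color (pure white here, an assumed default)."""
    first_image = np.empty_like(original)
    first_image[...] = fill_color              # fill the background everywhere
    is_text = text_mask > 127                  # binarize the text area mask
    first_image[is_text] = original[is_text]   # keep the original text pixels
    return first_image
```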
According to an exemplary embodiment of the present disclosure, an object area and a non-object area in the original image may be identified based on the original image and the first mask image. The one or more shadows may be eliminated from the object area. A shadow elimination model may be used to eliminate the one or more shadows from the object area, and may be used to eliminate the one or more shadows from the non-object area. The object area and the non-object area may be combined to obtain the first image. The shadow elimination model 1350 may be any available AI model, of which the input 1301 may be the document image having the shadow or a feature image obtained by pre-processing the document image having the shadow, the input 1303 may be the shadow area mask image outputted by the shadow detection model, and the output 1305 may be a document image from which the shadow is eliminated. Fig. 13 is a schematic diagram illustrating an input and an output of a shadow elimination model according to an exemplary embodiment of the present disclosure.
According to another exemplary embodiment of the present disclosure, based on the original image of the document image having the shadow and the shadow area information, a shadow elimination model may be used to eliminate the shadow from the original image, to obtain a third image from which the shadow is eliminated. Based on the text area information, the original text area may be determined from the original image of the document image having the shadow. The original text area may be superimposed onto the third image to obtain the first image. The shadow elimination model 1350 may be any available AI model, of which the inputs (1301, 1303) may be the document image having the shadow or a feature image obtained by pre-processing the document image having the shadow, and the shadow area mask image outputted by the shadow detection model, and of which the output may be a document image from which the shadow is eliminated. Fig. 13 is a schematic diagram illustrating an input and an output of a shadow elimination model according to an exemplary embodiment of the present disclosure.
According to an exemplary embodiment of the present disclosure, the shadow elimination model may be implemented by adopting a GAN model. Specifically, the GAN model may include a generator and a discriminator. The input of the generator may be the document image having the shadow or the feature image obtained by pre-processing the document image having the shadow (which may be referred to as real data), and the shadow area mask image outputted by the shadow detection model. The output of the generator may be a predicted document image from which the shadow is eliminated (which may be referred to as generated data). The loss function of the generator may be at least one of a cross-entropy loss function (of the generated data and 1) or an absolute error loss function (of the generated data and label data (a clean document image sample)). The input of the discriminator may be the generated data, and the output of the discriminator may be a probability that the generated data is true. The loss function of the discriminator may be at least one of a cross-entropy loss function (of the real data and 1) or a cross-entropy loss function (of the generated data and 0).
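A minimal sketch of these losses, assuming PyTorch, a generator that takes the shadowed image concatenated with the mask, and a discriminator that outputs probabilities (so plain BCE applies), is shown below. Treating the clean label image as the discriminator's "real" example is an illustrative assumption, as are all names.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()  # cross-entropy against 0/1 targets
l1 = nn.L1Loss()    # absolute error against the clean label image

def gan_losses(generator, discriminator, shadowed, shadow_mask, clean_label):
    # Generator input: shadowed document concatenated with the shadow mask.
    generated = generator(torch.cat([shadowed, shadow_mask], dim=1))
    d_fake = discriminator(generated)
    # Generator loss: fool the discriminator (target 1) and match the label.
    g_loss = bce(d_fake, torch.ones_like(d_fake)) + l1(generated, clean_label)
    # Discriminator loss: real data toward 1, generated data toward 0
    # (the generated image is detached so only the discriminator updates).
    d_real = discriminator(clean_label)
    d_gen = discriminator(generated.detach())
    d_loss = (bce(d_real, torch.ones_like(d_real)) +
              bce(d_gen, torch.zeros_like(d_gen)))
    return g_loss, d_loss
```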
According to an exemplary embodiment of the present disclosure, the shadow elimination model may be trained based on the clean document image sample, a corresponding document image sample having a shadow, and a corresponding annotated shadow area mask image. Specifically, a training sample is acquired. Here, the training sample comprises a target document image sample, a corresponding document image sample having a shadow, and a corresponding annotated shadow area mask image, and the document image sample having the shadow is obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image. The document image sample having the shadow and the annotated shadow area mask image are inputted into the shadow elimination model to obtain a predicted document image. A loss function is calculated based on the target document image sample and the predicted document image. Parameters of the shadow elimination model are adjusted based on the calculated loss function. Here, the clean document image refers to a document image not having a shadow but having a background color (the background color is a simulation for the original background color of a photograph when the user takes the photograph). The target document image sample refers to a target shadow-removed document image used to train the shadow elimination model. The target document image sample may be a clean document image, or a document image which does not have a shadow and of which the background is a predetermined color (e.g., a white color). Hereinafter, a process of making a training sample of a shadow elimination model according to an exemplary embodiment of the present disclosure will be specifically described.
Fig. 14 is a schematic diagram illustrating training of a shadow elimination model according to an exemplary embodiment of the present disclosure. As shown in Fig. 14, a target document image sample (i.e., a labelled image sample without a shadow), a corresponding document image sample having a shadow, and a corresponding annotated shadow area mask image may be obtained respectively, to be used to train the shadow elimination model.
Fig. 15 is a schematic diagram illustrating a training sample for training a shadow elimination model according to an exemplary embodiment of the present disclosure. As shown in Fig. 15, 1510 in Fig. 15 shows a document image sample having a shadow, 1520 in Fig. 15 shows a corresponding annotated shadow area mask image, and 1530 in Fig. 15 shows a corresponding clean document image sample.
Fig. 16 is a flowchart illustrating making of a training sample for training a shadow elimination model according to an exemplary embodiment of the present disclosure.
Referring to Fig. 16, specific steps of making a training sample set for training a shadow elimination model are as follows:
1) A large number of text, icon and table examples are prepared, and several examples are randomly extracted from text, icon and table example libraries, and combined into a picture content material (1610).
2) The picture content material is superimposed onto a picture having a pure white background, to obtain a "document picture from which a shadow is eliminated and of which the background is pure white" as a "target document image sample" (1620).
3) A color is randomly selected from a background color list (the colors in the background color list are relatively light colors, for example, a light yellow color, a light gray color and a pure white color, which simulate a color temperature of a scene when a user takes a photograph), to be used to simulate the paper of a document in a different color temperature environment, and is thus used as a background color picture (1630).
4) The picture content material is superimposed onto the background color picture, to obtain a "clean document image." In addition, the clean document image may also be used as the target document image sample. Therefore, in the situation where the clean document image is used as the target document image sample, step 2) may be omitted (1640).
5) A white shadow pattern is superimposed on a black background. The shadow pattern simulates a scenario in which light is blocked when the user takes a photograph, and a shadow picture is generally a large irregular pattern; thus, an "annotated shadow area mask image" is obtained (1650).
6) Color inversion processing, background completely transparent processing, foreground Gaussian blur processing, and foreground partially transparent processing are performed on the shadow pattern of the annotated shadow area mask image (1660).
7) The picture processed in step 6) is superimposed on the clean document image to obtain a "document image sample having a shadow" (1670).
Clearly, in addition to being the text area mask image and the shadow area mask image, the text area information and the shadow area information may be any possible information reflecting a text area feature and a shadow area feature, respectively.
Referring back to Fig. 2, in step 205, a second image may be obtained based on the first image and the second mask image. A difference between a feature of the one or more objects in the shadow area on the first image and a feature of the one or more objects in the non-shadow area on the first image may be reduced to obtain the second image. Since the text portion (effective pixels) of the document image from which the shadow is preliminarily eliminated (i.e., the first image) may still be superimposed with shadow noise, the pixel color of the text in the shadow area would be darker than that of the text in the non-shadow area, and the strokes would be slightly wider. In order to solve this problem, the shadow area information may be used to adjust the document image from which the shadow is preliminarily eliminated, thus obtaining the second image. In other words, this adjustment may reduce the difference between the text in the shadow area and the text in the non-shadow area. The feature may comprise at least one of a brightness, a contrast, or a thickness of the objects. According to an exemplary embodiment of the present disclosure, the shadow area and the non-shadow area on the first image may be identified based on the first image and the second mask image. First processing may be performed on the shadow area, and second processing may be performed on the non-shadow area. The first processing and the second processing may be opposite operations.
For example, the first processing may comprise, but is not limited to, at least one of: expanding an edge of a non-text area in the shadow area towards the text area in the shadow area, increasing a brightness of a text in the text area in the shadow area, or reducing a contrast of the text in the text area in the shadow area. The second processing may comprise, but is not limited to, at least one of: contracting an edge of a non-text area in the non-shadow area relative to the text area in the non-shadow area, reducing a brightness of a text in the text area in the non-shadow area, or increasing a contrast of the text in the text area in the non-shadow area.
The method for processing an image according to the present disclosure that is described above with reference to Fig. 2 may be a first mode according to the present disclosure. That is, in the first mode, it is possible to perform a semantic segmentation on the document image having the shadow, and obtain, based on the semantic segmentation result and the shadow area mask image, a document image from which the shadow is eliminated; this may be referred to as a "semantic analysis shadow removal method." Here, the semantic segmentation may refer to a segmentation between the effective pixels of a text portion and the pixels of a background portion. Furthermore, according to a second mode of the present disclosure, it is possible to acquire the original image of the document image having the shadow; acquire the shadow area information from the original image (e.g., by using the shadow detection model); and use, based on the original image and the shadow area information, the shadow elimination model to eliminate the shadow from the original image, to obtain the third image from which the shadow is eliminated. That is, according to the second mode of the present disclosure, a shadow removal may be performed on the document image having the shadow by using the artificial intelligence model and the shadow area mask image, to obtain a document image from which the shadow is eliminated; this may be referred to as a "direct shadow removal method."
Therefore, for example, the user may be provided with shadow removal methods of the above two modes, namely, the first mode (which may also be referred to as a documentation mode) and the second mode (which may also be referred to as a shadow removal mode). In the first mode, a shadow removal operation may be performed according to a first exemplary embodiment of the present disclosure. In the second mode, a shadow removal operation may be performed according to a second exemplary embodiment of the present disclosure.
According to an exemplary embodiment of the present disclosure, it is possible to display a mode option for selecting the first mode or the second mode; receive an input instruction to select the mode option; and perform, in response to an input instruction to select the first mode being received, the "semantic analysis shadow removal method," and perform, in response to an input instruction to select the second mode being received, the "direct shadow removal method."
Hereinafter, a method for processing an image in a first mode and a method for processing an image in a second mode according to exemplary embodiments of the present disclosure are specifically described, respectively.
First mode (i.e., documentation mode)
Fig. 17 is a flowchart illustrating a method for processing an image according to the first exemplary embodiment of the present disclosure.
Referring to Fig. 17, in step 1701, based on a document image having a shadow, a text detection model may be used to obtain a text area mask image.
In step 1702, based on the text area mask image, a first image may be obtained from the document image having the shadow. According to an exemplary embodiment of the present disclosure, based on the text area mask image, a text pixel area is determined and retained from the document image having the shadow. An area other than the text pixel area in the document image having the shadow is filled with a predetermined color (e.g., a white color) to obtain the first image. That is, an effective pixel portion (foreground) in the document image having the shadow may be retained according to the text area mask image, and the background (containing shadow noise) may be filled with the predetermined color to obtain the first image. In the first image, it is possible that a shadow on an effective pixel is not yet eliminated.
In step 1703, based on the document image having the shadow, a shadow detection model may be used to obtain a shadow area mask image.
In step 1704, the shadow area mask image may be used to adjust a text in the first image, to obtain a second image.
According to an exemplary embodiment of the present disclosure, it is possible to: determine, based on the shadow area mask image, a text pixel area in a shadow area and a text pixel area in a non-shadow area in the first image; and perform at least one of: adjusting a stroke of a text in the text pixel area in the shadow area, adjusting a color of the text in the text pixel area in the shadow area, or adjusting a color of a text in the text pixel area in the non-shadow area. The parameters for performing the above adjustments may be set or adjusted according to empirical values. As an example, a "dilation" operation may be performed on the shadow area in the first image based on the shadow area mask image. That is, the non-text pixel area in the shadow area in the first image that is already filled with the predetermined color (e.g., the white color) is expanded, such that the text pixel area in the shadow area is narrowed, which has the effect of making the strokes thinner. As another example, the pixel color of the text in the text pixel area in the shadow area may be brightened or faded. As another example, processing such as a certain degree of contrast enhancement (e.g., color deepening) may be performed on the pixels of the text in the non-shadow area based on a non-shadow area mask image (i.e., obtained through a reverse operation on the shadow area mask image), thereby reducing the difference between the pixel colors of the texts in the shadow and non-shadow areas.
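A minimal sketch of the dilation-and-brighten adjustment, assuming NumPy/OpenCV, is shown below. The 3x3 kernel, the 127 binarization threshold, and the brightening amount are illustrative empirical values of the kind mentioned above.

```python
import numpy as np
import cv2

def adjust_text_in_shadow(first_image, shadow_mask, brighten=30):
    """Sketch of step 1704: dilate the white background inside the shadow
    area so strokes there become thinner, then brighten the remaining
    shadow-area pixels to fade the text color toward the non-shadow text."""
    in_shadow = shadow_mask > 127                       # boolean shadow region
    kernel = np.ones((3, 3), np.uint8)
    dilated = cv2.dilate(first_image, kernel)           # white background expands,
                                                        # so dark strokes thin out
    out = first_image.copy()
    out[in_shadow] = dilated[in_shadow]                 # apply only inside shadow
    lifted = cv2.add(out, np.full_like(out, brighten))  # saturating brighten
    out[in_shadow] = lifted[in_shadow]                  # fade text color in shadow
    return out
```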
Clearly, no limitation is made to the sequence of steps 1701 and 1702 and step 1703 in the present disclosure. For example, it is possible that step 1703 is performed first, and steps 1701 and 1702 are then performed; that steps 1701 and 1702 are performed first, and step 1703 is then performed; that step 1701 is performed first, step 1703 is then performed, and next, step 1702 is performed; that steps 1701 and 1702 and step 1703 are performed in parallel; or the like.
Fig. 18 is a schematic diagram illustrating an example in which a shadow is removed in a first mode (i.e., a documentation mode) according to an exemplary embodiment of the present disclosure.
Referring to Fig. 18, an original image of a document image having a shadow may be obtained (as shown in 1810 in Fig. 18). Based on the document image having the shadow, a text area mask image (as shown in 1820 in Fig. 18) and a shadow area mask image (as shown in 1830 in Fig. 18) may be respectively obtained through a text detection model and a shadow detection model. Based on the text area mask image, a document image from which the shadow is preliminarily eliminated (as shown in 1840 in Fig. 18) may be obtained from the document image having the shadow. Because shadow noise is still superimposed on it, the text pixel area in the shadow area may be darker than the text pixel area in the non-shadow area, or its strokes may be slightly wider than those in the non-shadow area (this may not be clearly shown in Fig. 18). The shadow area mask image may be used to adjust the document image from which the shadow is preliminarily eliminated, to obtain a document image from which the shadow is finally eliminated (as shown in 1850 in Fig. 18).
Second mode (i.e., shadow removal mode)
Fig. 19 is a flowchart illustrating a method for processing an image according to the second exemplary embodiment of the present disclosure.
Referring to Fig. 19, in step 1901, based on a document image having a shadow, a shadow detection model may be used to obtain a shadow area mask image.
In step 1902, based on the document image having the shadow and the shadow area mask image, a shadow elimination model may be used to obtain a document image from which the shadow is eliminated. For example, the shadow elimination model is any available AI model.
Fig. 20 is a schematic diagram illustrating an example in which a shadow is removed in a second mode (i.e., a shadow removal mode) according to an exemplary embodiment of the present disclosure.
Referring to Fig. 20, a document image having a shadow (as shown in 2010 in Fig. 20) may be acquired. Based on the document image having the shadow, a shadow area mask image (as shown in 2020 in Fig. 20) may be obtained through a shadow detection model. Based on the document image having the shadow and the shadow area mask image, a document image from which the shadow is eliminated (as shown in 2030 in Fig. 20) may be obtained through a shadow elimination model.
In addition, according to an exemplary embodiment of the present disclosure, it is also possible to respectively perform a shadow elimination on the text area and the background area in the original image of the document image having the shadow, and then superimpose the text area after the shadow elimination onto the background area after the shadow elimination, to obtain an image from which the shadow is eliminated. For example, the original image of the document image having the shadow may be acquired. Text area information and shadow area information may be acquired from the original image. Based on the text area information, an original text area may be retained from the original image to obtain a first image, wherein a non-text area on the first image does not have the shadow. Based on the shadow area information, a text in the first image may be adjusted to obtain a second image. Based on the original image and the shadow area information, the shadow elimination model may be used to eliminate the shadow from the original image, to obtain a third image from which the shadow is eliminated. A text area in the second image is superimposed onto the third image to obtain a fourth image.
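The final combination might look like the sketch below, assuming NumPy arrays and a 127 binarization threshold: the adjusted text pixels of the second image are superimposed onto the third image, whose background shadow has already been eliminated by the AI model.

```python
import numpy as np

def compose_fourth_image(second, third, text_mask):
    """Sketch of the combination described above: superimpose the adjusted
    text area of the second image onto the shadow-eliminated third image."""
    fourth = third.copy()
    is_text = text_mask > 127              # binarize the text area mask
    fourth[is_text] = second[is_text]      # superimpose the adjusted text pixels
    return fourth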
Hereinafter, an example in which a user uses a method for processing an image according to an exemplary embodiment of the present disclosure in a scenario in which the user takes a photograph or edits a photograph will be specifically described with reference to Figs. 21-23.
Fig. 21 is a flowchart illustrating applying of a method for processing an image according to an exemplary embodiment of the present disclosure.
Referring to Fig. 21, in step 2101, when a user takes a photograph or edits a photograph, whether the photograph is a document photograph may be detected.
In step 2102, in response to a detection result being that the photograph is the document photograph, a document edge detection is performed on the photograph, and a document area is cut out to generate a document image.
In step 2103, in response to the detection result being that the photograph is not the document photograph, the photograph is saved as an ordinary picture or is edited as usual.
In step 2104, after the document image is generated, an image editing interface is entered, and a shadow removal option is provided in the image editing interface.
In step 2105, when a user input of a selection for the shadow removal option is received, a shadow removal mode option (e.g., a "documentation" mode option and a "shadow removal" mode option) may be provided for the user to select, and two threads are simultaneously started to respectively perform a picture shadow noise elimination in the "documentation" mode and a picture shadow noise elimination in the "shadow removal" mode on the document image. In the "documentation" mode, a semantic segmentation may be performed on the original document image, and different post-processing may be performed on each pixel according to the segmentation result, to generate a document image after a shadow elimination, of which the background is replaced with a predetermined color. In the "shadow removal" mode, the original document image may be inputted into an AI network to directly obtain a document image after a shadow elimination, which preserves the original background color.
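A minimal sketch of starting the two worker threads, assuming Python's threading module and that the two mode pipelines are exposed as callables; the function and result-dictionary names are hypothetical.

```python
import threading

def run_both_modes(document_image, documentation_fn, shadow_removal_fn, results):
    """Sketch of step 2105: run both mode pipelines in parallel so either
    shadow-eliminated preview is ready whichever option the user picks."""
    workers = [
        threading.Thread(target=lambda: results.update(
            documentation=documentation_fn(document_image))),
        threading.Thread(target=lambda: results.update(
            shadow_removal=shadow_removal_fn(document_image))),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # in practice the UI would poll rather than block here
```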
In step 2106, according to a selection of the user for the shadow removal mode option, a corresponding document image after a shadow elimination is outputted and displayed. That is, if the user selects the "documentation" mode option, a document image after a shadow elimination that is obtained through a "semantic analysis shadow removal method" is displayed. If the user selects the "shadow removal" mode option, a document image after a shadow elimination that is obtained through a "direct shadow removal method" is displayed.
Clearly, the present disclosure is not limited to the above steps 2105 and 2106. In the present disclosure, it is also possible that the shadow removal mode option is first provided, and after the selection of the user for the shadow removal mode option is received, a thread is started to perform a picture shadow noise elimination in a mode selected by the user.
In step 2107, the corresponding document image after the shadow elimination is saved. Clearly, the user may also be prompted to choose to overwrite the original image or save the document image as a new picture. If the user chooses to overwrite the original image, the corresponding document image after the shadow elimination is saved. If the user chooses to save the document image as the new picture, the original document image and the corresponding document image after the shadow elimination are saved.
Fig. 22 is a schematic diagram illustrating a shadow removal UI in a scenario in which a user takes a photograph of a document according to an exemplary embodiment of the present disclosure.
Referring to Fig. 22, after a user takes a photograph, whether the photograph is a document photograph may be first determined. If the photograph is the document photograph, a document edge detection (as shown in 2210 in Fig. 22) is performed. When the user selects the option "click to scan," a document area may be cut out, to generate a document image (as shown in 2220 in Fig. 22). When the user selects a shadow removal option in an editing interface, a "documentation" mode option and a "shadow removal" mode option (as shown in 2230 in Fig. 22) may be provided for the user, and a corresponding document image after a shadow removal is displayed according to the selection of the user.
Fig. 23 is a schematic diagram illustrating a shadow removal UI in a scenario in which a user edits a photograph of a document according to an exemplary embodiment of the present disclosure.
Referring to Fig. 23, when a user edits a photograph, an editing option (as shown in 2310 in Fig. 23) may be provided for the user in an editing interface. When the user selects the editing option, whether the photograph is a document photograph may be determined. If the photograph is the document photograph, a document edge detection (as shown in 2320 in Fig. 23) is performed. When the user selects the option "click to intercept," a document area may be cut out, to generate a document image (as shown in 2330 in Fig. 23). When the user selects a shadow removal option in the editing interface, a "documentation" mode option and a "shadow removal" mode option (as shown in 2340 in Fig. 23) may be provided for the user, and a corresponding document image after a shadow removal is displayed according to the selection of the user.
Fig. 24 is a block diagram illustrating an apparatus for processing an image according to an exemplary embodiment of the present disclosure.
Referring to Fig. 24, an apparatus 2400 for processing an image according to the exemplary embodiment of the present disclosure may comprise: an image acquiring unit 2401, an information acquiring unit 2402, a first image obtaining unit 2403 and a second image obtaining unit 2404.
The image acquiring unit 2401 may acquire an original image of a document image having a shadow. Here, the original image of the document image having the shadow may be obtained when a user takes a photograph of a document, may be obtained when the user edits a photograph of a document, may be acquired from a local memory or a local database as required, or may be received from an external data source (e.g., the Internet, a server, and a database) through an input apparatus or a transmission medium. For example, when the user takes a photograph or edits a photograph, whether the photograph is a document photograph may be detected. If it is detected that the photograph is the document photograph, a document edge detection is performed on the document photograph, and a document area is cut out to be used as an original image of a document image.
The information acquiring unit 2402 may acquire text area information and shadow area information from the original image of the document image having the shadow.
According to an exemplary embodiment of the present disclosure, based on the original image of the document image having the shadow, the information acquiring unit 2402 may use a shadow detection model to obtain a shadow area mask image as the shadow area information. For example, the shadow detection model may be any available artificial intelligence (AI) model, of which the input may be the document image having the shadow or a feature image after the document image having the shadow is pre-processed, and of which the output may be the shadow area mask image. Here, the shadow area mask image is used to mark which pixels in a document image are in a shadow area.
According to an exemplary embodiment of the present disclosure, the shadow detection model may be a bidirectional feature pyramid network with recurrent attention residual modules (BDRAR, as shown in Fig. 4). However, the shadow detection model according to the present disclosure is not limited to the above structure, and may also be implemented by adopting any other suitable AI model, for example, a GAN (generative adversarial network) model.
According to an exemplary embodiment of the present disclosure, the shadow detection model may be trained based on a document image sample having a shadow and a corresponding annotated shadow area mask image. Specifically, the shadow detection model may be trained by: acquiring a training sample, the training sample comprising the document image sample having the shadow and the corresponding annotated shadow area mask image, and the document image sample having the shadow being obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image; inputting the document image sample having the shadow into the shadow detection model to obtain a predicted shadow area mask image; calculating a loss function (e.g., a binary cross-entropy loss function) based on the predicted shadow area mask image and the annotated shadow area mask image; and adjusting parameters of the shadow detection model based on the calculated loss function. The making of the training sample of the shadow detection model has already been described in detail above and will not be repeated here.
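For illustration, the following is a minimal PyTorch-style sketch of this training loop. The network class, data loader, and hyperparameters are assumptions; any segmentation network with the described input/output (e.g., the BDRAR-style network of Fig. 4) could be substituted.

```python
import torch
import torch.nn as nn

def train_shadow_detector(model: nn.Module, loader, epochs: int = 10,
                          lr: float = 1e-4, device: str = "cpu") -> nn.Module:
    """Train a shadow detection model as described above.

    `loader` is assumed to yield (shadowed_image, annotated_mask) pairs,
    where the mask marks which pixels lie in a shadow area.
    """
    model = model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()  # binary cross-entropy loss, as in the text

    for _ in range(epochs):
        for shadowed_image, annotated_mask in loader:
            shadowed_image = shadowed_image.to(device)
            annotated_mask = annotated_mask.to(device)

            predicted_mask = model(shadowed_image)       # predicted shadow area mask
            loss = bce(predicted_mask, annotated_mask)   # compare with annotation

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                             # adjust model parameters
    return model
```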
According to an exemplary embodiment of the present disclosure, based on the original image of the document image having the shadow, the information acquiring unit 2402 may use a text detection model to obtain a text area mask image as the text area information.
For example, the text detection model may be any available AI model, of which the input may be the document image having the shadow or the feature image after the document image having the shadow is pre-processed, and of which the output may be the text area mask image. Here, the text area mask image is used to mark which pixels in a document image are effective information pixels, for example, a text, an icon, or a table box. For example, the text detection model may be a bidirectional feature pyramid network with recurrent attention residual modules (BDRAR, as shown in Fig. 4). However, the text detection model according to the present disclosure is not limited to the above structure, and may also be implemented by adopting any other suitable AI model, for example, a GAN model.
According to an exemplary embodiment of the present disclosure, the text detection model may be trained based on a document image sample having a shadow and a corresponding annotated text area mask image. Specifically, the text detection model may be trained by: acquiring a training sample, the training sample comprising the document image sample having the shadow and the corresponding annotated text area mask image, and the document image sample having the shadow being obtained by superimposing the shadow on a clean document image; inputting the document image sample having the shadow into the text detection model to obtain a predicted text area mask image; calculating a loss function (e.g., a binary cross-entropy loss function) based on the predicted text area mask image and the annotated text area mask image; and adjusting parameters of the text detection model based on the calculated loss function. The making of the training sample of the text detection model has already been described in detail above and will not be repeated here.
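As a hedged illustration of how such a training pair could be synthesized, the sketch below superimposes a shadow on a clean document image with a simple multiplicative darkening. The attenuation factor and the mask format are assumptions; the actual sample-making procedure is the one described earlier in this disclosure.

```python
import numpy as np

def synthesize_shadowed_sample(clean_doc: np.ndarray,
                               shadow_mask: np.ndarray,
                               attenuation: float = 0.5) -> np.ndarray:
    """Darken the clean document wherever the annotated mask marks shadow.

    clean_doc:   H x W x 3 uint8 clean document image
    shadow_mask: H x W array in [0, 1], where 1 marks a shadow pixel
    attenuation: assumed strength of the darkening inside the shadow
    """
    mask = shadow_mask[..., None].astype(np.float32)   # broadcast over channels
    factor = 1.0 - attenuation * mask                  # 1.0 outside the shadow
    shadowed = clean_doc.astype(np.float32) * factor
    return np.clip(shadowed, 0, 255).astype(np.uint8)
```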
Based on the text area information, the first image obtaining unit 2403 may reserve an original text area from the original image of the document image having the shadow to obtain a first image, in which a non-text area does not have the shadow.
According to an exemplary embodiment of the present disclosure, based on the text area information, the original text area may be determined and reserved from the original image of the document image having the shadow, and a non-text area in the original image of the document image having the shadow may be filled with a predetermined color or pattern, to obtain the first image. For example, the predetermined color or pattern may be a pre-specified color or pattern, or a color or pattern identical or similar to the background color or pattern of the original image.
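A minimal sketch of this step, assuming a binary text mask and an RGB image, is given below. Estimating the fill value as the median color of the non-text pixels is one possible realization of "a similar color obtained according to a background color of the original image," not the only one.

```python
import numpy as np

def reserve_text_fill_background(original: np.ndarray,
                                 text_mask: np.ndarray) -> np.ndarray:
    """Keep the original text pixels and paint every non-text pixel with a
    background-like color estimated from the original image itself."""
    text = text_mask.astype(bool)                    # True = effective (text) pixel
    # The median color of the non-text pixels approximates the page background.
    background = np.median(original[~text], axis=0)  # shape (3,)

    first = np.empty_like(original)
    first[:] = background.astype(original.dtype)     # fill everything with background
    first[text] = original[text]                     # reserve the original text area
    return first
```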
According to another exemplary embodiment of the present disclosure, based on the original image of the document image having the shadow and the shadow area information, a shadow elimination model may be used to eliminate the shadow from the original image of the document image having the shadow, to obtain a third image from which the shadow is eliminated. Based on the text area information, the original text area may be determined from the original image of the document image having the shadow. The original text area may be superimposed onto the third image to obtain the first image. The shadow elimination model may be any available AI model, of which the input may be the document image having the shadow (or the feature image after the document image having the shadow is pre-processed) together with the shadow area mask image outputted by the shadow detection model, and of which the output may be a document image from which the shadow is eliminated. According to an exemplary embodiment of the present disclosure, the shadow elimination model may be implemented by adopting a GAN model. Specifically, the GAN model may include a generator and a discriminator. The input of the generator may be the document image having the shadow or the feature image after the document image having the shadow is pre-processed (which may be referred to as real data), and the shadow area mask image outputted by the shadow detection model. The output of the generator may be a predicted document image from which the shadow is eliminated (which may be referred to as generated data). The loss function of the generator may be: a cross-entropy loss function (of the generated data and 1) + an absolute error loss function (of the generated data and label data, i.e., a clean document image sample). The input of the discriminator may be the generated data, and the output of the discriminator may be a probability that the generated data is real. The loss function of the discriminator may be: a cross-entropy loss function (of the real data and 1) + a cross-entropy loss function (of the generated data and 0).
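The generator and discriminator losses described above may be written as in the following PyTorch sketch. It assumes the discriminator outputs a probability, and the unweighted sum of the two generator terms is an assumption; a weighting factor between the adversarial and absolute-error terms could equally be used.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()  # cross entropy over the discriminator's probability output
l1 = nn.L1Loss()    # absolute error against the clean label image

def generator_loss(d_fake: torch.Tensor,
                   generated: torch.Tensor,
                   clean_label: torch.Tensor) -> torch.Tensor:
    # cross entropy (generated data vs. 1) + absolute error (generated vs. label)
    adversarial = bce(d_fake, torch.ones_like(d_fake))
    reconstruction = l1(generated, clean_label)
    return adversarial + reconstruction

def discriminator_loss(d_real: torch.Tensor,
                       d_fake: torch.Tensor) -> torch.Tensor:
    # cross entropy (real data vs. 1) + cross entropy (generated data vs. 0)
    return bce(d_real, torch.ones_like(d_real)) + \
           bce(d_fake, torch.zeros_like(d_fake))
```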
According to an exemplary embodiment of the present disclosure, the shadow elimination model may be trained based on the clean document image sample, a corresponding document image sample having a shadow, and a corresponding annotated shadow area mask image. Specifically, a training sample is acquired. Here, the training sample comprises a target document image sample, a corresponding document image sample having a shadow, and a corresponding annotated shadow area mask image, and the document image sample having the shadow is obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image. The document image sample having the shadow and the annotated shadow area mask image are inputted into the shadow elimination model to obtain a predicted document image. A loss function is calculated based on the target document image sample and the predicted document image. Parameters of the shadow elimination model are adjusted based on the calculated loss function. Here, the clean document image refers to a document image that has no shadow but has a background color (the background color simulates the original background color of the photograph taken by the user). The target document image sample refers to a target shadow-removed document image used to train the shadow elimination model. The target document image sample may be a clean document image, or a document image which has no shadow and of which the background is a predetermined color (e.g., white). The making of the training sample of the shadow elimination model has already been described in detail above and will not be repeated here.
Based on the shadow area information, the second image obtaining unit 2404 may adjust a text in the first image to obtain a second image. Since the text portion (effective pixels) of the document image from which the shadow is preliminarily eliminated (i.e., the first image) may still have shadow noise superimposed on it, the text in the shadow area would be darker than the text in the non-shadow area, and its strokes would be slightly wider. To solve this problem, the shadow area information may be used to adjust the preliminarily shadow-removed document image, thus obtaining the second image. In other words, this adjustment reduces the difference between the text in the shadow area and the text in the non-shadow area.
According to an exemplary embodiment of the present disclosure, the second image obtaining unit 2404 may determine a text area in the shadow area and a text area in the non-shadow area in the original image of the document image having the shadow based on the shadow area information; and perform first processing on the text area in the shadow area, and/or perform second processing on the text area in the non-shadow area. The first processing and the second processing may be opposite operations.
For example, the first processing may comprise, but is not limited to, at least one of: expanding an edge of a non-text area in the shadow area towards the text area, increasing a brightness of a text in the text area in the shadow area, and reducing a contrast of the text in the text area in the shadow area. The second processing may comprise, but is not limited to, at least one of: contracting an edge of a non-text area in the non-shadow area relative to the text area, reducing a brightness of a text in the text area in the non-shadow area, and increasing a contrast of the text in the text area in the non-shadow area.
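As one hedged example of the "first processing," the OpenCV sketch below thins and brightens text strokes inside the shadow area so that they better match text outside it. The kernel size and brightness offset are assumed values, not prescribed by this disclosure.

```python
import cv2
import numpy as np

def first_processing(first_image: np.ndarray,
                     shadow_mask: np.ndarray,
                     text_mask: np.ndarray) -> np.ndarray:
    """Thin and brighten text strokes that lie inside the shadow area."""
    second = first_image.copy()
    in_shadow_text = (shadow_mask > 0) & (text_mask > 0)

    # For dark text on a light page, a max filter (cv2.dilate) lets the
    # background expand into the strokes, i.e., the non-text edge expands
    # towards the text area and the strokes become thinner.
    kernel = np.ones((2, 2), np.uint8)           # assumed, conservative kernel
    thinned = cv2.dilate(first_image, kernel)

    # Increase the brightness of the thinned strokes; +30 is an assumed offset.
    brightened = cv2.convertScaleAbs(thinned, alpha=1.0, beta=30)

    second[in_shadow_text] = brightened[in_shadow_text]
    return second
```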
The method for processing an image according to the present disclosure described above with reference to Fig. 2 may be the first mode according to the present disclosure. That is, in the first mode, it is possible to perform a semantic segmentation on the document image having the shadow, and to obtain, based on the semantic segmentation result and the shadow area mask image, a document image from which the shadow is eliminated; this may be referred to as the "semantic analysis shadow removal method." Here, the semantic segmentation may refer to segmenting the effective pixels of the text portion from the pixels of the background portion. Furthermore, according to the second mode of the present disclosure, it is possible to acquire the original image of the document image having the shadow; acquire the shadow area information from the original image of the document image having the shadow (e.g., by using the shadow detection model); and use, based on the original image of the document image having the shadow and the shadow area information, the shadow elimination model to eliminate the shadow from the original image of the document image having the shadow, to obtain the third image from which the shadow is eliminated. That is, according to the second mode of the present disclosure, a shadow removal may be performed on the document image having the shadow by using the artificial intelligence model and the shadow area mask image, to obtain the document image from which the shadow is eliminated; this may be referred to as the "direct shadow removal method."
Therefore, for example, the user may be provided with shadow removal methods of the above two modes, namely, the first mode (which may also be referred to as a documentation mode) and the second mode (which may also be referred to as a shadow removal mode). In the first mode, a shadow removal operation may be performed according to a first exemplary embodiment of the present disclosure. In the second mode, a shadow removal operation may be performed according to a second exemplary embodiment of the present disclosure.
According to an exemplary embodiment of the present disclosure, the apparatus 2400 for processing an image may further comprise a displaying unit (not shown), a receiving unit (not shown), a controlling unit (not shown) and a third image obtaining unit (not shown). The displaying unit may display a mode option for selecting the first mode or the second mode. The receiving unit may receive an input instruction to select the mode option. The controlling unit may control, in response to an input instruction to select the first mode being received, the image acquiring unit 2401, the information acquiring unit 2402, the first image obtaining unit 2403 and the second image obtaining unit 2404 to perform the "semantic analysis shadow removal method," and control, in response to an input instruction to select the second mode being received, the image acquiring unit 2401, the information acquiring unit 2402 and the third image obtaining unit to perform the "direct shadow removal method."
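Conceptually, the controlling unit dispatches between the two modes. The following sketch illustrates this; all helper functions are hypothetical stand-ins for the units described above, not APIs defined by this disclosure.

```python
def process_image(original, mode: str):
    """Dispatch between the two shadow removal modes.

    acquire_information, obtain_first_image, obtain_second_image and
    obtain_third_image are hypothetical stand-ins for the information
    acquiring unit and the first/second/third image obtaining units.
    """
    if mode == "documentation":     # first mode: semantic analysis shadow removal
        text_mask, shadow_mask = acquire_information(original)
        first_image = obtain_first_image(original, text_mask)
        return obtain_second_image(first_image, shadow_mask)
    if mode == "shadow_removal":    # second mode: direct shadow removal
        _, shadow_mask = acquire_information(original)
        return obtain_third_image(original, shadow_mask)
    raise ValueError(f"unknown mode: {mode}")
```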
In addition, according to an exemplary embodiment of the present disclosure, it is also possible to perform a shadow elimination on the text area and the background area of the original image of the document image having the shadow respectively, and then superimpose the text area after the shadow elimination onto the background area after the shadow elimination, to obtain an image from which the shadow is eliminated. For example, the apparatus 2400 for processing an image may further comprise a fourth image obtaining unit (not shown). The image acquiring unit 2401 may acquire the original image of the document image having the shadow. The information acquiring unit 2402 acquires the text area information and the shadow area information from the original image of the document image having the shadow. Based on the text area information, the first image obtaining unit 2403 reserves the original text area from the original image of the document image having the shadow to obtain the first image, in which the non-text area does not have the shadow. Based on the shadow area information, the second image obtaining unit 2404 adjusts the text in the first image to obtain the second image. Based on the original image of the document image having the shadow and the shadow area information, the fourth image obtaining unit uses the shadow elimination model to eliminate the shadow from the original image of the document image having the shadow, to obtain the third image from which the shadow is eliminated. The fourth image obtaining unit then superimposes a text area in the second image onto the third image to obtain a fourth image.
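The superimposition performed by the fourth image obtaining unit may be as simple as a mask-guided copy. A minimal sketch, assuming a binary text area mask aligned with both images:

```python
import numpy as np

def obtain_fourth_image(second_image: np.ndarray,
                        third_image: np.ndarray,
                        text_mask: np.ndarray) -> np.ndarray:
    """Superimpose the adjusted text area of the second image onto the
    shadow-free background of the third image."""
    fourth = third_image.copy()
    text = text_mask.astype(bool)
    fourth[text] = second_image[text]   # copy text pixels over the clean background
    return fourth
```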
According to an embodiment, the image acquiring unit 2401, the information acquiring unit 2402, the first image obtaining unit 2403, and the second image obtaining unit 2404 are described as individual devices, but may be implemented through one processor. In this case, they may be implemented through a dedicated processor, or through a combination of software and a general-purpose processor such as an application processor (AP), a central processing unit (CPU) or a graphics processing unit (GPU). The dedicated processor may include a memory for implementing an embodiment of the disclosure, or may include a memory processor for using an external memory.
Also, the image acquiring unit 2401, the information acquiring unit 2402, the first image obtaining unit 2403, and the second image obtaining unit 2404 may be configured by a plurality of processors. In this case, they may be implemented through a combination of dedicated processors, or through a combination of software and general-purpose processors such as an AP, a CPU or a GPU.
At least one of the shadow detection model, the text detection model and the shadow elimination model to which the method and apparatus for processing an image according to the present disclosure relate may be implemented by an AI model. The functions associated with AI may be performed by at least one of a non-volatile storage device, a volatile storage device, or a processor.
The processor may include one or more processors. At this time, the one or more processors may be a general purpose processor (e.g., a central processing unit (CPU) and an application processor (AP)), a processor for graphics only (e.g., a graphics processing unit (GPU), a visual processing unit (VPU)), and/or an AI application specific processor (e.g., a neural processing unit (NPU)).
The one or more processors control the processing of inputted data according to a predefined operating rule or an artificial intelligence (AI) model stored in the non-volatile storage device and the volatile storage device. The predefined operating rule or the artificial intelligence model may be provided through training or learning. Here, being provided through learning means that a predefined operating rule or an AI model with an expected characteristic is formed by applying a learning algorithm to a plurality of pieces of learning data. The learning may be performed in the device performing the AI according to an embodiment, and/or may be implemented by a separate server/device/system.
As an example, the artificial intelligence model may consist of a plurality of neural network layers. Each layer has a plurality of weight values, and performs its layer operation on the output of the previous layer using the plurality of weight values. Examples of a neural network include, but are not limited to, a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), a generative adversarial network (GAN), and a deep Q network.
A learning algorithm may refer to a method of using a plurality of pieces of learning data to train a predetermined target device (e.g., a robot) to cause, allow, or control the target device to make a determination or prediction. Examples of the learning algorithm include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
The AI model (e.g., at least one of the shadow detection model, the text detection model and the shadow elimination model) to which the method and apparatus for processing an image according to the present disclosure relate may be deployed at a cloud or a mobile end.
As an example, in the situation where the model is deployed at the cloud, the AI model runs in a server framework and needs to process interactions with a client (including authentication, picture uploading and downloading, etc.). In addition, high-concurrency situations are also taken into consideration. The size of the AI model is not strictly limited, and may generally be about 100-300 MB. The network structure may adopt modules that occupy a large space but have a high precision, for example, a residual module. The server end has a relatively strong computing capability, and thus can quickly obtain an output result of the model.
As another example, when the AI model is deployed at the mobile end, there are many limitations (including computing capability, storage resources, battery power, etc.). Therefore, the model at the mobile end may have to satisfy at least one of the following conditions: a small model size, a low computational complexity, a low battery power consumption, a flexible deployment during issuing and updating, etc. The structure of the AI model is therefore optimized (pruned). For example, referring to a model structure such as MobileNet or SqueezeNet, an optimization technique such as a depthwise separable convolution, a linear bottleneck or an inverted residual structure is used to reduce the model size from more than 100 MB to 10 MB or less. The AI model is converted into a tflite file (Google TensorFlow-Lite framework) or a dlc file (Qualcomm SNPE framework), and then deployed in a mobile phone. An app invokes the tflite/dlc file through the TensorFlow-Lite/SNPE SDK to perform a model inference. When running, the app checks whether the mobile phone has a chip such as a GPU/NPU/DSP. If so, these chips are preferentially used for calculation, thereby improving the inference speed of the AI model.
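As an illustration of mobile-end inference with the TensorFlow-Lite framework mentioned above, the Python sketch below loads a converted model and runs it once. The file name and input preprocessing are assumptions; an on-device app would typically use the Java/Kotlin or C++ SDK and attach a GPU/NPU delegate instead.

```python
import numpy as np
import tensorflow as tf

# Load the converted tflite model (file name is an assumption).
interpreter = tf.lite.Interpreter(model_path="shadow_removal.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Placeholder input; a real app would resize/normalize the document photo
# to the model's expected input shape instead.
dummy = np.zeros(input_details[0]["shape"], dtype=np.float32)

interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
shadow_free = interpreter.get_tensor(output_details[0]["index"])
```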
Fig. 25 is a block diagram of an electronic device 2500 according to an exemplary embodiment of the present disclosure.
Referring to Fig. 25, the electronic device 2500 includes at least one storage device 2501 and at least one processor 2502. The at least one storage device 2501 stores a computer-executable instruction set. The computer-executable instruction set, when executed by the at least one processor 2502, causes the method for processing an image according to the exemplary embodiments of the present disclosure to be performed.
As an example, the electronic device 2500 may be a PC computer, a tablet apparatus, a personal digital assistant, a smart phone, or other apparatuses capable of executing the above instruction set. Here, the electronic device 2500 is not necessarily a single electronic device, but may be any collection of apparatuses or circuits capable of separately or jointly executing the above instruction (or instruction set). The electronic device 2500 may also be a portion of an integrated control system or system manager, or may be configured as a portable electronic device interconnected locally or remotely (e.g., via wireless transmission) through an interface.
In the electronic device 2500, the processor 2502 may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic apparatus, a dedicated processor system, a microcontroller, or a microprocessor. As an example rather than a limitation, the processor may further include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like.
The processor 2502 may run the instructions or codes stored in the storage device 2501. Here, the storage device 2501 may further store data. The instructions and the data may also be transmitted and received over the network via a network interface apparatus. Here, the network interface apparatus may employ any known transmission protocol.
The storage device 2501 may be integrated with the processor 2502, for example, a RAM or flash memory is disposed within an integrated circuit microprocessor or the like. In addition, the storage device 2501 may include a separate apparatus, such as an external disk drive, a storage array, or other storage apparatuses that any database system may use. The storage device and the processor 2502 may be operatively coupled, or may communicate with each other (e.g., through an I/O port and a network connection), to enable the processor 2502 to read the data stored in the storage device.
In addition, the electronic device 2500 may further include a video display (e.g., a liquid crystal display) and a user interaction interface (e.g., a keyboard, a mouse and a touch input apparatus). All components of the electronic device 2500 may be connected to each other via a bus and/or a network.
According to an exemplary embodiment of the present disclosure, a computer readable storage medium storing an instruction may further be provided. Here, the instruction, when executed by at least one processor, causes the at least one processor to perform the method for processing an image according to the present disclosure. Examples of the computer readable storage medium herein include: a read-only memory (ROM), a random access programmable read-only memory (PROM), an electrically erasable programmable read-only memory (EEPROM), a random access memory (RAM), a dynamic random access memory (DRAM), a static random access memory (SRAM), a flash memory, a non-volatile memory, a CD-ROM, a CD-R, a CD+R, a CD-RW, a CD+RW, a DVD-ROM, a DVD-R, a DVD+R, a DVD-RW, a DVD+RW, a DVD-RAM, a BD-ROM, a BD-R, a BD-R LTH, a BD-RE, a Blu-ray or optical disk memory, a hard disk drive (HDD), a solid state disk (SSD), a card memory (e.g., a multimedia card, a secure digital (SD) card or an extreme digital (XD) card), a magnetic tape, a floppy disk, a magneto-optical data storage apparatus, an optical data storage apparatus, a hard disk, a solid state disk, and any other apparatus configured to store a computer program and any associated data, data files and data structures in a non-transitory way, and to provide the computer program and the associated data, data files and data structures to a processor or a computer so that the processor or the computer can execute the computer program. The computer program in the above computer readable storage medium may run in an environment deployed in a computer device such as a client, a host, a proxy apparatus or a server. In addition, in an example, the computer program and the associated data, data files and data structures may be distributed over a networked computer system, such that they are stored, accessed and executed in a distributed way by one or more processors or computers.
According to an exemplary embodiment of the present disclosure, a computer program product may further be provided. An instruction in the computer program product may be executed by a processor of a computer device to perform the method for processing an image according to the exemplary embodiments of the present disclosure.
Other implementations of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present disclosure is intended to cover any variations, uses or adaptations of the present disclosure that follow its general principles and include common knowledge or conventional technical means in the technical field not disclosed herein. The specification and the embodiments are to be regarded as exemplary only, and the true scope and spirit of the present disclosure are indicated by the appended claims.
It should be appreciated that, the present disclosure is not limited to the precise structure already described above and illustrated in the accompanying drawings, and various modifications and changes may be made without departing from the scope of the present disclosure. The scope of the present disclosure is limited only by the appended claims.

Claims (15)

  1. A method for processing an image, comprising:
    obtaining an original image including one or more objects and one or more shadows, wherein the one or more objects comprise at least one of a text, a drawing or a table;
    generating, based on the original image, a first mask image indicating an area of the one or more objects;
    generating, based on the original image, a second mask image indicating a shadow area and a non-shadow area on the original image;
    obtaining, based on the original image and the first mask image, a first image, wherein the one or more objects in the shadow area on the first image include the one or more shadows; and
    obtaining, based on the first image and the second mask image, a second image, by reducing a difference between a feature of the one or more objects in the shadow area on the first image and a feature of the one or more objects in the non-shadow area on the first image.
  2. The method according to claim 1, wherein the generating, based on the original image, a first mask image indicating an area of the one or more objects comprises:
    generating, based on the original image, a text mask image as the first mask image, using a text detection model referring to an artificial intelligence model,
    wherein the obtaining, based on the original image and the first mask image, a first image, comprises:
    obtaining, based on the original image and the text mask image, the first image, wherein one or more texts in the shadow area on the first image include the one or more shadows.
  3. The method according to claim 1, wherein the generating, based on the original image, a second mask image indicating a shadow area and a non-shadow area on the original image comprises:
    generating, based on the original image, the second mask image indicating the shadow area and the non-shadow area on the original image, using a shadow detection model referring to an artificial intelligence model.
  4. The method according to claim 1, wherein the obtaining, based on the original image and the first mask image, a first image comprises:
    identifying, based on the original image and the first mask image, an object area and a non-object area from the original image; and
    filling the non-object area in the original image with a predetermined color or pattern, to obtain the first image.
  5. The method according to claim 4, wherein the predetermined color or pattern is a pre-specified color or pattern, or an identical or similar color or pattern obtained according to a background color or pattern of the original image.
  6. The method according to claim 1, wherein the obtaining, based on the original image and the first mask image, a first image comprises:
    identifying, based on the original image and the first mask image, an object area and a non-object area from the original image;
    eliminating the one or more shadows from the object area, using a shadow elimination model referring to an artificial intelligence model;
    eliminating the one or more shadows from the non-object area, using the shadow elimination model; and
    combining the object area and the non-object area to obtain the first image.
  7. The method according to claim 1, wherein the obtaining, based on the first image and the second mask image, a second image comprises:
    identifying, based on the first image and the second mask image, the shadow area on the first image and the non-shadow area on the first image; and
    performing first processing on the shadow area, and/or performing second processing on the non-shadow area, wherein the first processing and the second processing are opposite operations.
  8. The method according to claim 7, wherein the first processing comprises at least one of:
    expanding an edge of a non-text area in the shadow area towards the text area in the shadow area;
    increasing a brightness of a text in the text area in the shadow area; or
    reducing a contrast of the text in the text area in the shadow area,
    wherein the second processing comprises at least one of:
    contracting an edge of a non-text area in the non-shadow area relative to the text area in the non-shadow area;
    reducing a brightness of a text in the text area in the non-shadow area; or
    increasing a contrast of the text in the text area in the non-shadow area,
    wherein the feature comprises at least one of brightness, contrast, or thickness.
  9. The method according to claim 1, further comprising:
    displaying a mode option for selecting a first mode or a second mode;
    receiving an input instruction to select the mode option;
    performing, in response to an input instruction to select the first mode being received, the method for processing an image according to claim 1; and
    performing, in response to an input instruction to select the second mode being received, a method for processing an image comprising: obtaining the original image including one or more objects and one or more shadows, wherein the one or more objects comprise at least one of a text, a drawing or a table; generating, based on the original image, the second mask image indicating the shadow area and the non-shadow area on the original image; and eliminating, based on the original image and the second mask image, the one or more shadows from the original image to obtain a third image from which the one or more shadows are eliminated, using a shadow elimination model referring to an artificial intelligence model.
  10. The method according to claim 3, wherein the shadow detection model is trained by:
    acquiring a training sample, wherein the training sample comprises a document image sample having a shadow and a corresponding annotated shadow area mask image, and the document image sample having the shadow is obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image;
    inputting the document image sample having the shadow into the shadow detection model to obtain a predicted shadow area mask image;
    calculating a loss function based on the predicted shadow area mask image and the annotated shadow area mask image; and
    adjusting parameters of the shadow detection model based on the calculated loss function.
  11. The method according to claim 2, wherein the text detection model is trained by:
    acquiring a training sample, wherein the training sample comprises a document image sample having a shadow and a corresponding annotated text area mask image, and the document image sample having the shadow is obtained by superimposing the shadow on a clean document image;
    inputting the document image sample having the shadow into the text detection model to obtain a predicted text area mask image;
    calculating a loss function based on the predicted text area mask image and the annotated text area mask image; and
    adjusting parameters of the text detection model based on the calculated loss function.
  12. The method according to claim 6, wherein the shadow elimination model is trained by:
    acquiring a training sample, wherein the training sample comprises a target document image sample, a corresponding document image sample having a shadow, and a corresponding annotated shadow area mask image, and the document image sample having the shadow is obtained by superimposing the shadow on a clean document image based on the annotated shadow area mask image;
    inputting the document image sample having the shadow and the annotated shadow area mask image into the shadow elimination model to obtain a predicted document image;
    calculating a loss function based on the target document image sample and the predicted document image; and
    adjusting parameters of the shadow elimination model based on the calculated loss function.
  13. An electronic device, comprising:
    at least one processor configured to:
    obtain an original image including one or more objects and one or more shadows, wherein the one or more objects comprise at least one of a text, a drawing or a table;
    generate, based on the original image, a first mask image indicating an area of the one or more objects;
    generate, based on the original image, a second mask image indicating a shadow area and a non-shadow area on the original image;
    obtain, based on the original image and the first mask image, a first image, wherein the one or more objects in the shadow area on the first image include the one or more shadows; and
    obtain, based on the first image and the second mask image, a second image, by reducing a difference between a feature of the one or more objects in the shadow area on the first image and a feature of the one or more objects in the non-shadow area on the first image.
  14. The device according to claim 13,
    wherein the at least one processor is further configured to:
    identify, based on the original image and the first mask image, an object area and a non-object area from the original image; and
    fill the non-object area in the original image with a predetermined color or pattern, to obtain the first image.
  15. A computer readable storage medium containing instructions, the instructions, when executed, causing at least one processor of a computer to:
    obtain an original image including one or more objects and one or more shadows, wherein the one or more objects comprise at least one of a text, a drawing or a table;
    generate, based on the original image, a first mask image indicating an area of the one or more objects;
    generate, based on the original image, a second mask image indicating a shadow area and a non-shadow area on the original image;
    obtain, based on the original image and the first mask image, a first image, wherein the one or more objects in the shadow area on the first image include the one or more shadows; and
    obtain, based on the first image and the second mask image, a second image, by reducing a difference between a feature of the one or more objects in the shadow area on the first image and a feature of the one or more objects in the non-shadow area on the first image.
PCT/KR2022/009607 2021-07-15 2022-07-04 Method and apparatus for processing image WO2023287091A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110799747.8 2021-07-15
CN202110799747.8A CN115700728A (en) 2021-07-15 2021-07-15 Image processing method and image processing apparatus

Publications (1)

Publication Number Publication Date
WO2023287091A1 true WO2023287091A1 (en) 2023-01-19

Family

ID=84920103

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/009607 WO2023287091A1 (en) 2021-07-15 2022-07-04 Method and apparatus for processing image

Country Status (2)

Country Link
CN (1) CN115700728A (en)
WO (1) WO2023287091A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170249517A1 (en) * 2016-02-25 2017-08-31 Gachon University Of Industry-Academic Cooperation Foundation Shadow removal method and system for a mobile robot control using indoor surveillance cameras
US20170278226A1 (en) * 2016-03-28 2017-09-28 Dell Products L.P. Systems and methods for detection and removal of shadows in an image
US20170309003A1 (en) * 2016-04-26 2017-10-26 Adobe Systems Incorporated Removing artifacts from document images
US20190220959A1 (en) * 2015-12-16 2019-07-18 Dropbox, Inc. Enhancing a digital image
US20190266706A1 (en) * 2018-02-28 2019-08-29 Adobe Inc. Removal of shadows from document images while preserving fidelity of image contents

Also Published As

Publication number Publication date
CN115700728A (en) 2023-02-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22842353

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE