WO2021123209A1 - Method for segmenting an input image showing a document containing structured information

Info

Publication number: WO2021123209A1
Application number: PCT/EP2020/087081
Authority: WO
Inventors: Laurent ROSTAING; Alain Rouh; Catalin Codreanu
Original assignee: Carrus Gaming
Priority date: 2019-12-18
Filing date: 2020-12-18
Publication date: 2021-06-24
Also published as: CA3161385A1; EP4078435A1; FR3105529A1; FR3105529B1

Abstract

The present invention relates to a method for segmenting an input image showing a document containing structured information, the method comprising steps of: C2) determining a type of document using a classifying method, said type being chosen from a set of possible types each associated with one reference image of the type of document, the reference image being an image solely containing the document; C3) estimating the parameters of a geometric transformation to be applied to the reference image of the type of document determined in step C2) in order to obtain the document such as shown in the input image; C4) determining a segmentation mask on the basis of the geometric-transformation parameters estimated in step C3) and of the reference image of the type of document determined in step C2).

Description

TITLE: Process for segmenting an input image representing a document containing structured information

GENERAL TECHNICAL FIELD

The present invention relates to the field of the identification of documents comprising structured information, for example in the form of check boxes. More particularly, it relates to a method for identifying (classifying) and segmenting such a document, for example a game slip, in an image taken in an uncontrolled environment.

STATE OF THE ART

The term “document comprising structured information” will be understood to mean an element, generally on paper medium having a predefined shape and patterns, comprising indications and a set of areas to be completed, for example boxes to be checked. Such a document may for example be a game slip (lottery for example), a form, or a multiple choice questionnaire.

The processing of documents containing structured information has undergone many developments thanks to advances made in the field of computer vision.

Methods have thus been proposed, such as that described in patent FR2983607, which allows an object to be tracked in a sequence of at least two images, based on the extraction of characteristic points in the images, for example by SIFT (“Scale-Invariant Feature Transform”) descriptors, and on the matching of characteristic points between images. This method comprises a step of selecting so-called “singular” points, which reduces ambiguities in the case of documents with repetitive patterns such as the boxes of game slips. It makes it possible to track a document present in a video stream, without the document model being known beforehand, using a set of points of interest present in the images. The method thus determines a zone of presence and a segmentation of the object, resulting from the analysis of the movement between the images. However, the performance of the point-of-interest extraction in this approach is limited by the absence of a reference image. Depending on the document, there may be regions with few or no characteristic points, which will then be poorly detected by the method. Moreover, when documents are stacked, the last document added can cause documents located below it to move; some parts of those documents can then be incorrectly detected, or can prevent correct detection of the last added document. Finally, this method has the major drawback that it does not make it possible to identify the document model.

Another method, proposed in patent FR3027136, allows the identification of at least one sign of a deformed document based on the segmentation of candidate sign regions, followed by the joint identification of these candidate sign regions and of a deformation model with respect to previously stored document models. However, this method has certain drawbacks. First of all, it does not allow classification and segmentation of an image: its purpose is simply to decode a given document from a document template. In addition, the method described in this patent requires manual delineation of the regions of the document comprising information in order to identify it. Finally, since this process does not include a segmentation step, it can be disturbed by the presence of other documents in the field of view of the acquisition system used.

PRESENTATION OF THE INVENTION

According to a first aspect, the present invention relates to a method of segmenting an input image representing a document comprising structured information, the method comprising the implementation by processing means of the steps of:

C2) Determination of a document type by a classification method, said type being chosen from a set of possible types each associated with a reference image of the document type;

C3) Estimation of the parameters of a geometric transformation to be applied to the reference image of the type of document determined in step C2) to obtain the document as represented by the input image;

C4) Determination of a segmentation mask from the parameters of the geometric transformation estimated in step C3) and from the reference image of the document type determined in step C2).

According to other advantageous and non-limiting characteristics:

- the method further comprises a step C1) of acquiring an image comprising the document comprising structured information to be segmented, carried out before step C2);

- the documents can be game tickets with check boxes;

- the classification method can be a deep neural network, and more particularly a convolutional deep neural network;

- the classification method can be a point-of-interest method combined with a partitioning of the points of interest;

- step C1) of acquiring an image comprising at least one document comprising structured information to be segmented further comprises a step of sub-sampling the image comprising the at least one document comprising structured information to be segmented;

- the geometric deformations are deformations represented by affine functions;

- the estimation of the parameters of the geometric transformation of step C3) is carried out by a deep neural network;

- the deep neural network is selected from a plurality of neural networks each corresponding to a type of document, according to the type of document determined in step C2);

- the segmentation method comprises a preliminary step of augmenting the training data from a single reference image for each type of document to be segmented, the step of augmenting the training data comprising, for each type of document, the implementation by processing means of one and/or the other of the steps of:

E2) Simulation of random geometric deformations;

E3) Simulation of a background to the image by overlaying the image having undergone the deformations in a texture generated randomly or selected from a list of textures, representing the background of the image;

E4) Simulation of photometric degradations by modification of the contrast and colorimetry of the image generated in the previous step;

E5) Adding digital noise to the image generated in the previous step, the previous steps being repeated as many times as training examples are desired for the type of document;

and a step of training a neural network implemented in step C2) and/or in step C3) on the basis of the augmented training data.

According to a second aspect, the invention relates to a method of increasing the training data for the training of a segmentation method according to the first aspect, from a single reference image for each type of document to be segmented, the method comprising, for each type of document, the implementation by processing means of the steps of:

E2) Simulation of random geometric deformations;

E3) Simulation of a background to the image by overlaying the image having undergone the deformations in a texture generated randomly or selected from a list of textures, representing the background of the image;

E4) Simulation of photometric degradations by modification of the contrast and colorimetry of the image generated in the previous step;

E5) Adding digital noise to the image generated in the previous step, the previous steps being repeated as many times as you want to have training examples for the type of document.

According to other advantageous and non-limiting characteristics:

- the method further comprises a step E1) of acquiring a reference image of the type of document, carried out before step E2);

- the geometric deformations can be deformations represented by affine functions;

- the method further comprises a masking step E3bis), making it possible to mask the areas of the reference image of the document type which may differ from one document of the same type to another, by applying a texture randomly generated or selected from a list of textures, representing a variable pattern.

The invention further provides a computer program product comprising code instructions for executing a segmentation method according to the first aspect or a training data augmentation method according to the second aspect, when said program is executed on a computer; and a storage means readable by computer equipment on which is recorded a computer program product comprising code instructions for the execution of a segmentation method according to the first aspect or of a method of increasing the training data according to the second aspect, when said program is executed on a computer.

PRESENTATION OF FIGURES

Other characteristics and advantages of the present invention will become apparent on reading the following description of a preferred embodiment. This description will be given with reference to the accompanying drawings in which:

[Fig. 1] Figure 1 is a diagram of an architecture for the implementation of the method according to the invention;

[Fig. 2] Figure 2 shows the steps of one embodiment of the segmentation method according to the invention;

[Fig. 3] Figure 3 shows the steps of one embodiment of the data augmentation method according to the invention;

[Fig. 4] Figure 4 illustrates examples of images generated by the data augmentation process.

DETAILED DESCRIPTION

Architecture

According to two complementary aspects of the invention, the following are proposed:

- A process for segmenting a document comprising structured information;

- A training data augmentation method for training a segmentation process according to the first aspect from a single reference image for each type of document to be segmented.

These methods are implemented within an architecture such as represented by FIG. 1, thanks to a segmentation server 1 (implementing the segmentation method), a learning server 3 (implementing the learning method), a database 4 (used to store the reference images of the different types of documents) and a client 2 (having images of documents to be segmented), for example a terminal of a user such as a laptop or smartphone. It is quite possible for some or all of this equipment to be combined: for example, the segmentation server 1 can be grouped together with the client 2, and the learning server can include the database.

The equipment items 1, 2 and 3 are typically remote computer equipment connected to a wide area network 10 such as the Internet for the exchange of data; each comprises data processing means 11, 21, 31 respectively, of processor type, and data storage means 12, 22, 32 such as a computer memory, for example a disk. The client 2 further comprises image acquisition means 23 in order to obtain images of the documents to be segmented. The database 4 can be either an independent server with its own data processing and storage means, or a file server (for example a NAS server), or a database management system integrated into the learning server. This database stores reference images for each type of document to be segmented.

By reference image, we mean a good-quality image (without shooting defects) comprising only the document, to scale (and therefore not distorted). Such an image can be, for example, a source file generated by computer-aided drawing software. In particular, such a reference image does not include any interpreted information relating to the document. The reference image can in particular be an image representative of the document free of any filling of the areas to be filled by a user of the document.

By shape of a document represented in an image, we mean in a broad sense the position, orientation and outline of that document in that image. Thus, for a given outline, the shape includes the position of the center of gravity of this outline in the image of the document, and the orientation in the plane of this outline with respect to an orthonormal coordinate system of the image. Thus, the same document can be imaged at various positions with respect to the sensor, in different orientations (more or less rotated with respect to the orthonormal coordinate system of the sensor, or even by 180°), and/or deformed in the case of a flexible document.

Principle

The idea of the proposed methods is to allow the use and training of segmentation methods requiring large amounts of data, such as neural networks, by being satisfied with one example per class (or type of document) when training and by applying an image augmentation chain sufficiently representative of the images of real documents and thus making it possible to have a sufficient volume of images for learning the segmentation method.

For this, the segmentation process is divided into a classification sub-process followed by an estimation of the parameters of a geometric transformation which, when applied to the reference image of the document type determined by the classification, matches the shape of the reference image to the shape of the document in the image to be segmented. It thus makes it possible to segment documents from reference documents whose shape and contours are known in advance, without having to train a segmentation method on examples where each pixel of the image has been annotated manually. In addition, in some embodiments, separating the classification step from the step of determining a segmentation mask simplifies the addition of a new type of document to the model, because it is then not necessary to re-train the entire model, only the portions that have changed. The term “model” is understood to mean the parts of the neural network(s) which are involved in taking into account a new type of document.

The use of methods based on neural networks also makes it possible to reduce the necessary computation time compared to the state of the art. Point-of-interest methods do not necessarily require training, unlike neural networks, but their computation time depends on the complexity of the input data; neural networks, on the other hand, offer a constant computation time during classification whatever the input image. In addition, surprisingly, decomposing, in some embodiments, the classification and the determination of the segmentation mask into two different tasks also reduces the computation time compared to a method performing both at the same time. Indeed, since the method of determining the segmentation mask is specific to each type of document, and therefore less complex, and since classification is a simpler operation than segmentation, the computational load induced by the use of these two methods is lower than that of a process performing both operations at the same time.
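The decomposition described above (classification, then estimation of the transformation, then derivation of the mask) can be sketched as follows. This is only an illustrative skeleton: the names `segment`, `classify`, `regressors` and `reference_outlines`, the document type "lotto" and the parameter layout `(a, b, tx, c, d, ty)` are assumptions made for the example, and the classifier and regressor are trivial stand-ins for trained networks.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
# Affine parameters (a, b, tx, c, d, ty): x' = a*x + b*y + tx, y' = c*x + d*y + ty
Affine = Tuple[float, float, float, float, float, float]


def apply_affine(params: Affine, pt: Point) -> Point:
    a, b, tx, c, d, ty = params
    x, y = pt
    return (a * x + b * y + tx, c * x + d * y + ty)


def segment(image,
            classify: Callable[[object], str],
            regressors: Dict[str, Callable[[object], Affine]],
            reference_outlines: Dict[str, List[Point]]):
    doc_type = classify(image)                # step C2: document type
    params = regressors[doc_type](image)      # step C3: one regressor per type
    outline = [apply_affine(params, p)        # step C4: warp the known outline
               for p in reference_outlines[doc_type]]
    return doc_type, outline


# Hypothetical stand-ins for trained models and stored reference data.
classify = lambda image: "lotto"
regressors = {"lotto": lambda image: (1.0, 0.0, 5.0, 0.0, 1.0, 2.0)}
reference_outlines = {"lotto": [(0.0, 0.0), (10.0, 0.0), (10.0, 4.0), (0.0, 4.0)]}

doc_type, outline = segment(None, classify, regressors, reference_outlines)
```

Because the regressors are stored per document type, supporting a new type in this sketch only means adding one entry to `regressors` and `reference_outlines` and updating the classifier.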

Classification and segmentation phase

With reference to FIG. 2, the segmentation method comprises a first step, C1, of acquiring an image to be processed comprising the document to be classified and segmented. This image can be acquired either by shooting means 23 integrated into the client 2, or via a data exchange network 10 such as the Internet. This step can also include a sub-sampling of the acquired image in order to simplify its processing. If the document to be classified includes areas to be completed, one or more of these areas may have been completed by a user of the document before the acquisition step. In particular when it is acquired by dedicated shooting means 23 provided for this purpose, the image to be processed comprises the document to be classified and segmented, but also a region of space around the document to be classified. On the one hand, the shape of the document to be classified in the image is unknown. On the other hand, the region of space around the document to be classified can vary greatly from one image to another. It can include a support for the document to be classified. If the document to be imaged is placed on one or more other documents during acquisition, it may include parts of this or these other documents if the stack of documents is not perfectly superimposed. Finally, in certain cases of acquisition of an unsupported document, the region of space around the document may include a remote background depending on the environment of the document during the acquisition. The image may in particular contain repeating patterns, as is the case for example for game slips, which include several boxes of identical shape.

The image to be processed is then transmitted to a classifier in order to determine the type of the document during a step C2. This classification can be carried out by a standard classification algorithm, for example a convolutional neural network (CNN) such as VGG16, DenseNet, ResNet, etc., or a point-of-interest matching method such as SIFT or SURF (“Speeded Up Robust Features”), as will be detailed below.

According to a first example, the classification is carried out by a convolutional neural network. In this case, a neural network has been trained to determine the document type from an input image. Thus, the acquired image is supplied as input to the neural network, and the neural network outputs the class of the document appearing in the image. The class of the document can for example be chosen from the following list: {passport; ID card; driver's license; membership card; banknote; game slip; form; multiple choice test}. The “membership card” can for example be a card issued to certain people in certain cases (for example, mutual insurance card, loyalty card, etc.). The document class can also include a subclass. For example, in the case of a “game slip”, the document class can also include a subclass relating to an identifier of the game concerned.

Once the document type has been identified, a regression method is used to estimate the parameters of a geometric transformation (for example an affine transformation) which, when applied to the reference image of the identified document type, yields the shape of the document present in the image to be processed. This regression can be done using a deep neural network. For example, a respective specific neural network is used for each type of document identified during the classification step. In this example, a neural network has been trained to determine, from an input image, the parameters of a geometric transformation to apply to the reference document associated with the document type to make it correspond to the input image. Thus, the acquired image is supplied as input to the neural network, as well as the type of document determined in the previous step, and the neural network provides geometric transformation parameters as output. The geometric transformation parameters can for example comprise the parameters of a translation of the center of gravity, of a planar rotation, and of a deformation.
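The optional sub-sampling of the acquired image in step C1 can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of rows and a plain decimation (keeping one pixel out of `factor`), which is only one possible choice; the text does not specify the sub-sampling scheme, and a practical system would probably low-pass filter first.

```python
from typing import List

Image = List[List[int]]  # grayscale image, row-major, values 0..255


def subsample(image: Image, factor: int) -> Image:
    """Keep one pixel out of `factor` along each axis (no anti-alias filter)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return [row[::factor] for row in image[::factor]]


# Toy 4x6 image whose pixel value encodes its (row, column) position.
image = [[r * 10 + c for c in range(6)] for r in range(4)]
small = subsample(image, 2)
```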
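To make the parameters mentioned above concrete, the following sketch assembles an affine transformation from a translation, a planar rotation and a simple deformation (anisotropic scale and shear). The decomposition, the function names and the parameter layout `(a, b, tx, c, d, ty)` are assumptions chosen for illustration; the text does not prescribe a particular parameterization.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Affine = Tuple[float, float, float, float, float, float]  # (a, b, tx, c, d, ty)


def make_affine(tx: float, ty: float, angle_deg: float,
                scale_x: float = 1.0, scale_y: float = 1.0,
                shear: float = 0.0) -> Affine:
    """Linear part = R(angle) @ [[scale_x, shear], [0, scale_y]], then translation."""
    cos_t = math.cos(math.radians(angle_deg))
    sin_t = math.sin(math.radians(angle_deg))
    a = cos_t * scale_x
    b = cos_t * shear - sin_t * scale_y
    c = sin_t * scale_x
    d = sin_t * shear + cos_t * scale_y
    return (a, b, tx, c, d, ty)


def apply_affine(params: Affine, pt: Point) -> Point:
    a, b, tx, c, d, ty = params
    x, y = pt
    return (a * x + b * y + tx, c * x + d * y + ty)


# A quarter turn followed by a shift of (10, 20) sends (1, 0) close to (10, 21).
quarter_turn = make_affine(10.0, 20.0, 90.0)
moved = apply_affine(quarter_turn, (1.0, 0.0))
```

A regression network would then output these six numbers (or the underlying translation, angle and deformation values) for a given input image.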

As a variant, one can use, for the regression method, a neural network of a type similar to that used for the classification step (preferably a CNN such as VGG16, DenseNet, ResNet etc.) to which is added a dense connection layer intended to estimate the parameters of the geometric transformation.

It is also possible to combine the classification network and the regression network by reusing the first layers of the classification network for the regression. In this case, a single neural network is used which, from the input image, estimates both the type of document and the geometric parameters of the transformation.

According to a second example, the classification is carried out by a point-of-interest matching method. In this case, there is, for each type of document, a document model comprising a reference image of this type of document. The document model also includes structured data holding interpreted information (metadata) relating to the reference document. This interpreted information includes in particular the nature and position of points of interest in the reference document. The document type classification step comprises, for each document type, image processing applied to the acquired image in order to determine points of interest in it, then a step of matching these points of interest with the points of interest stored in the document model for the reference image. The image is then classified as the document type whose document model gives the best match with the points of interest of the acquired image. According to some examples, the point-of-interest matching method includes a partitioning of the points of interest into different regions, each treated independently.
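The point-of-interest classification of the second example can be sketched as a "best match wins" rule. The code below is a toy illustration under several assumptions: descriptors are short tuples of floats rather than real SIFT or SURF vectors, matches are counted with a nearest-neighbour ratio test (a common heuristic, not something stated in the text), and the partitioning of points into regions is not shown. The names `count_matches`, `classify_by_keypoints` and the types "lotto" and "keno" are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Descriptor = Sequence[float]


def count_matches(query: List[Descriptor], model: List[Descriptor],
                  ratio: float = 0.75) -> int:
    """Count query descriptors whose nearest model descriptor is clearly
    closer than the second nearest one (ratio test)."""
    matches = 0
    for q in query:
        dists = sorted(math.dist(q, m) for m in model)
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            matches += 1
    return matches


def classify_by_keypoints(query: List[Descriptor],
                          models: Dict[str, List[Descriptor]]
                          ) -> Tuple[str, Dict[str, int]]:
    """Return the document type whose model collects the most matches."""
    scores = {doc_type: count_matches(query, descs)
              for doc_type, descs in models.items()}
    return max(scores, key=scores.get), scores


# Toy 2-D "descriptors" for two hypothetical document models.
models = {
    "lotto": [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)],
    "keno": [(5.0, 50.0), (50.0, 5.0), (90.0, 90.0)],
}
query = [(0.5, 0.2), (10.1, 9.8), (19.5, 0.4)]
best_type, scores = classify_by_keypoints(query, models)
```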

In this second example, a neural network is then applied to determine the parameters of the geometric transformation, as described above in the context of the first example.

In a third example, a classification step is implemented as described above in the first example, by means of a neural network.

The regression can then be made from the corners of the document to be segmented if it has a polygonal shape or from marking points added to all documents of the same type. It suffices then to find the parameters of a geometric transformation which makes it possible to correspond to a point or side of the document in the reference image of the document type, the point or equivalent side in the document to be segmented.
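As an illustration of this corner-based variant, three non-collinear point correspondences are enough to determine an affine transformation exactly. The sketch below solves the two small linear systems with Cramer's rule; the function names and the `(a, b, tx, c, d, ty)` layout are assumptions for the example, and with more than three corners or marking points a least-squares fit would be the natural extension (not shown here).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def det3(m: Sequence[Sequence[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve the 3x3 system m @ u = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("the three reference points are collinear")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution


def affine_from_points(src: List[Point], dst: List[Point]):
    """Return (a, b, tx, c, d, ty) mapping each src point onto its dst point."""
    m = [[x, y, 1.0] for x, y in src]
    a, b, tx = solve3(m, [p[0] for p in dst])
    c, d, ty = solve3(m, [p[1] for p in dst])
    return (a, b, tx, c, d, ty)


# Reference corners and where they were found in the input image
# (here: a quarter turn followed by a shift of (5, 2)).
ref_corners = [(0.0, 0.0), (10.0, 0.0), (0.0, 4.0)]
img_corners = [(5.0, 2.0), (5.0, 12.0), (1.0, 2.0)]
params = affine_from_points(ref_corners, img_corners)
```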

Finally, in all the exemplary embodiments above, a segmentation mask is created by applying the geometric transformation calculated in the previous step to the outline of the document in the reference image, in order to obtain the outline of the document present in the image to be processed. This segmentation mask thus makes it possible to determine the shape (position, orientation, outline) of the document in the acquired image. In particular, this makes it possible to disregard the edges of documents placed under the document to be analyzed that are visible in the acquired image.

This shape is then used to process the acquired image in order to extract information from the structured document. The processing in question can use parameters of the geometric transformation determined as described above.

Training phase
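The construction of the mask can be sketched as follows: the reference outline is warped with the estimated parameters, and every pixel whose center falls inside the warped polygon is set to 1. The even-odd ray-casting test, the function names and the `(a, b, tx, c, d, ty)` layout are assumptions made for this illustration; a polygonal outline is assumed, whereas the text also allows curved outlines.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, poly: List[Point]) -> bool:
    """Even-odd rule: count polygon edges crossed by a ray going right from (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def segmentation_mask(width: int, height: int,
                      ref_outline: List[Point], params) -> List[List[int]]:
    a, b, tx, c, d, ty = params
    # Warp the outline known from the reference image into the input image.
    outline = [(a * x + b * y + tx, c * x + d * y + ty) for x, y in ref_outline]
    # Test each pixel center against the warped outline.
    return [[1 if point_in_polygon(col + 0.5, row + 0.5, outline) else 0
             for col in range(width)]
            for row in range(height)]


# A 4x2 reference rectangle shifted by (2, 1) inside an 8x5 input image.
ref_outline = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
mask = segmentation_mask(8, 5, ref_outline, (1.0, 0.0, 2.0, 0.0, 1.0, 1.0))
```

Pixels belonging to other documents that protrude from under the target document fall outside the warped outline and therefore stay at 0 in the mask.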

Referring to Figure 3, a method of increasing training data is proposed. This method is implemented by the processing means 31 of the training server, in order to generate a number of training images sufficient to allow training of an efficient segmentation model. For this, a single good quality reference image can be used for each type of document to be identified, acquired during a step E1 either by shooting means 23, or as a source file that is the original document type reference image.

Random geometric deformations are then simulated in step E2, by applying a geometric transformation (for example an affine transformation), the parameters of which have been determined randomly, to the reference image, thus generating images that have undergone translations, reflections, scalings (homotheties), etc.
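Step E2 can be illustrated by drawing the transformation parameters at random. The sketch below samples a rotation, an isotropic scale and a translation, which is only a subset of the affine family; the sampling ranges, the function name `random_affine` and the `(a, b, tx, c, d, ty)` layout are assumptions for the example, since the text does not give numerical ranges.

```python
import math
import random
from typing import Tuple

Affine = Tuple[float, float, float, float, float, float]  # (a, b, tx, c, d, ty)


def random_affine(rng: random.Random,
                  max_shift: float = 20.0,
                  max_angle_deg: float = 180.0,
                  scale_range: Tuple[float, float] = (0.7, 1.3)) -> Affine:
    """Draw a random rotation, isotropic scale and translation."""
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    scale = rng.uniform(*scale_range)
    tx = rng.uniform(-max_shift, max_shift)
    ty = rng.uniform(-max_shift, max_shift)
    a, b = scale * math.cos(angle), -scale * math.sin(angle)
    c, d = scale * math.sin(angle), scale * math.cos(angle)
    return (a, b, tx, c, d, ty)


# One set of parameters per training example to generate.
rng = random.Random(0)  # fixed seed so that the sketch is reproducible
samples = [random_affine(rng) for _ in range(5)]
```

Since the parameters are drawn by the generator itself, they can be stored alongside each generated image and used directly as regression targets when training the network of step C3.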

Then, in order to simulate the presence of a background in the image, textures representing background images are created by generating an image with random patterns. The images of documents having undergone deformations generated in the previous step are then overlaid on the background images during a step E3. With regard to the background images, provision can in particular be made to generate images comprising parts simulating portions of documents placed partially under the document to be analyzed, and protruding from below it. It is also possible to add a mask on certain parts of the image of the document if it includes one or more regions that may be different from one document of the same type to another and thus improve the generalization capacity of the trained model. For this, a texture can be generated in the same way as the background images and applied to the desired region of the image, the masking texture preferably being different from the background texture. In the case, for example, of a type of document comprising areas to be filled, in particular check boxes, it is thus possible to generate a texture presenting the pattern of a filled area, which can be used as a mask on one or more parts of the document image at the location of the fields to be filled. For example, it is possible to use textures showing a zone to be filled, filled in manually or by computer by a character or a series of characters. In the case of a checkbox, you can use textures showing a checked box.
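The background simulation of step E3 amounts to pasting the deformed document onto a texture. In the sketch below the background is plain random noise and the document is given together with a binary mask of its pixels; both choices, as well as the names `random_texture` and `overlay`, are assumptions for illustration. The same `overlay` function could be reused for the masking of variable areas, by pasting a "filled area" texture at the location of a field.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale image, row-major, values 0..255


def random_texture(width: int, height: int, rng: random.Random) -> Image:
    """Background made of independent random gray levels (one simple choice)."""
    return [[rng.randrange(256) for _ in range(width)] for _ in range(height)]


def overlay(background: Image, document: Image, doc_mask: Image,
            top: int, left: int) -> Image:
    """Paste the document pixels selected by doc_mask onto a copy of the background.
    Pixels falling outside the frame are dropped."""
    out = [row[:] for row in background]
    for r, (doc_row, mask_row) in enumerate(zip(document, doc_mask)):
        for c, (value, keep) in enumerate(zip(doc_row, mask_row)):
            rr, cc = top + r, left + c
            if keep and 0 <= rr < len(out) and 0 <= cc < len(out[0]):
                out[rr][cc] = value
    return out


rng = random.Random(1)
background = random_texture(6, 4, rng)
document = [[255, 255], [255, 255]]   # tiny all-white "document"
doc_mask = [[1, 1], [1, 0]]           # its outline after deformation
composite = overlay(background, document, doc_mask, top=1, left=2)
```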

Then, in order to simulate shots taken in bad conditions (for example bad lighting, or bad focusing), photometric degradations are applied, during a step E4, to the images generated in the previous step. These degradations can be a blurring effect, a change in the brightness of the image or contrast, or alterations in the colors of the image.
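A minimal sketch of one such photometric degradation, a change of contrast and brightness on a grayscale image, is given below. The linear formula around mid-gray, the function name and the parameter values are assumptions for illustration; blur and color alterations are not shown.

```python
from typing import List

Image = List[List[int]]  # grayscale image, row-major, values 0..255


def adjust_contrast_brightness(image: Image, contrast: float,
                               brightness: float) -> Image:
    """Scale pixel values around mid-gray (128), shift them, and clamp to 0..255."""
    def clamp(value: float) -> int:
        return max(0, min(255, int(round(value))))

    return [[clamp(contrast * (p - 128) + 128 + brightness) for p in row]
            for row in image]


image = [[0, 128, 254]]
# Halve the contrast and brighten slightly, as in a washed-out shot.
degraded = adjust_contrast_brightness(image, contrast=0.5, brightness=10)
```

In an augmentation loop, `contrast` and `brightness` would themselves be drawn at random for each generated image.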

Finally, digital noise (for example Gaussian noise) is applied to the images generated in a step E5. Examples of documents produced by the data augmentation process are shown in Figure 4.
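The noise step E5 can be sketched as adding an independent Gaussian sample to each pixel. The standard deviation, the clamping to the 0..255 range and the function name are assumptions for this illustration.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale image, row-major, values 0..255


def add_gaussian_noise(image: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise of standard deviation sigma to every pixel."""
    return [[max(0, min(255, int(round(p + rng.gauss(0.0, sigma))))) for p in row]
            for row in image]


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
image = [[128] * 10 for _ in range(10)]
noisy = add_gaussian_noise(image, sigma=10.0, rng=rng)
```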

This figure illustrates in particular cases where the documents have a substantially planar shape, and have a single face carrying the structured information. The documents have an outline, which can be polygonal, or include curved areas, as in the example shown of a heart shape. The document may include a peripheral border of a certain width and uniform color. As seen in Figure 4, the image may include only part of the document. This is particularly the case if a geometric transformation that moves part of the document out of frame has been applied to the reference document.

All of these steps are repeated for each type of document, and as many times as the number of training examples desired for each type of document. This solves the problem of building a database for training segmentation models.

The fact that the document to be analyzed is a structured document implies the development of neural networks which are specific to the processing of structured documents. Indeed, the documents to be analyzed essentially comprise zones that are generic for the type of document in question, and personalized zones (for example, a character string specific to the holder of a passport, or a plurality of checked and unchecked boxes of a game slip), so that the neural network will be trained in a particular way during the learning process. To give a concrete example, on a game slip with fifty identical check boxes, six of which are to be checked as part of the game, the check boxes in the reference document form easily recognizable regular patterns. However, a document to be analyzed will have some of these boxes checked, with different boxes checked from one slip to another, and possibly with the marks themselves drawn differently, which makes it difficult to rely on the recognition of this regular pattern for classification.

Enrichment phase with a new type of document

The system which has just been described can be easily enriched to process a new type of document. Thus, if a new type of document is to be recognized, in particular when a new game is released, a reference image of the game slip of the new game is used. Then, a set of images is generated from the reference image, as described above. The neural network determining the document type is then re-trained with these new training images. Then, a neural network for estimating the deformation parameters is created and trained with these training images.

Computer program product

According to additional aspects, the invention relates to a computer program product comprising code instructions for the execution (in particular on the data processing means 11, 21 and 31 of the servers 1 and/or 3, and/or of the client 2) of a method of segmenting a document comprising structured information according to at least one reference associated with a type of document, the document being present in an input image, as well as of a method of increasing training data for training the segmentation process described, from a single reference image for each type of document to be classified and segmented. Similarly, the invention also relates to storage means readable by computer equipment (a memory 12, 22 and 32 of the servers 1 and/or 3, and/or of the client 2) on which this computer program product is found.

Claims

1. A method of segmenting an input image representing a document comprising structured information, the method comprising the implementation by processing means (11) of the steps of:

C2) Determination of a document type by a classification method, said type being chosen from a set of possible types each associated with a reference image of the document type, the reference image being an image comprising only the document;

2. The segmentation method according to claim 1, characterized in that it further comprises a step C1) of acquiring an image comprising the document comprising structured information to be segmented, carried out before step C2).

3. A segmentation method according to one of claims 1 and 2, characterized in that the documents are game tickets with check boxes.

4. The segmentation method according to one of claims 1 to 3, characterized in that the classification method is a deep neural network.

5. The segmentation method according to claim 4, characterized in that the deep neural network is a convolutional type deep neural network.

6. The segmentation method according to one of claims 1 to 3, characterized in that the classification method is a point of interest type method combined with a partitioning of points of interest.

7. The segmentation method according to claim 2, characterized in that step C1) of acquiring an image comprising at least one document comprising structured information to be segmented further comprises a step of sub-sampling of the image comprising the at least one document comprising structured information to be segmented.

8. A method of segmentation according to one of claims 1 to 7 characterized in that the geometric deformations are deformations represented by affine functions.

9. A segmentation method according to one of claims 1 to 8, characterized in that the estimation of the parameters of the geometric transformation of step C3) is carried out by a deep neural network.

10. The segmentation method according to claim 9, characterized in that the deep neural network is selected from a plurality of neural networks each corresponding to a type of document, from the type of document determined in step C2).

11. The segmentation method according to one of claims 1 to 10, comprising: a prior step of increasing the training data from a single reference image for each type of document to be segmented, the step of increase in training data comprising, for each type of document, the implementation by processing means (31) of one and / or the other of the steps of:

E2) Simulation of random geometric deformations;

12. A method of increasing training data for training a segmentation method according to one of claims 1 to 11 from a single reference image for each type of document to be segmented, the method comprising, for each type of document, the implementation by processing means (31) of the steps of:

E2) Simulation of random geometric deformations;

E3) Simulation of a background to the image by overlaying the image having undergone the deformations in a texture generated randomly or selected from a list of textures, representing the background of the image;

E4) Simulation of photometric degradations by modification of the contrast and colorimetry of the image generated in the previous step;

E5) Adding digital noise to the image generated in the previous step, the previous steps being repeated as many times as you want to have training examples for the type of document.

13. Method for increasing training data according to claim 12, characterized in that it further comprises a step E1) of acquiring a reference image of the type of document carried out before step E2).

14. Method for increasing training data according to one of claims 12 and 13, characterized in that the geometric deformations are deformations represented by affine functions.

15. Method for increasing the training data according to one of claims 12 to 14, characterized in that it further comprises a masking step E3bis), making it possible to mask the areas of the reference image of the document type that can be different from one document of the same type to another by applying a texture randomly generated or selected from a list of textures, representing a variable pattern.

16. Computer program product comprising code instructions for the execution of a segmentation method according to one of claims 1 to 11 or of a method of increasing training data according to one of claims 12 to 15, when said program is executed on a computer.

17. Storage means readable by computer equipment on which is recorded a computer program product comprising code instructions for the execution of a segmentation method according to one of claims 1 to 11 or of a method of increasing training data according to one of claims 12 to 15, when said program is executed on a computer.