CN114241493B - Training method and training device for augmenting training data of a document analysis model - Google Patents

Training method and training device for augmenting training data of a document analysis model

Info

Publication number
CN114241493B
CN114241493B (application number CN202111567461.3A)
Authority
CN
China
Prior art keywords
image
model
training
sample
pseudo
Prior art date
Legal status
Active
Application number
CN202111567461.3A
Other languages
Chinese (zh)
Other versions
CN114241493A (en)
Inventor
陈昌盛
朱罡
张书政
Current Assignee
Shenzhen University
Original Assignee
Shenzhen University
Priority date
Filing date
Publication date
Application filed by Shenzhen University
Priority to CN202111567461.3A
Publication of CN114241493A
Application granted
Publication of CN114241493B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure describes a training method for augmenting the training data of a document analysis model, comprising: obtaining a first sample and a second sample for verification; inputting an original document image into a data generation model based on a generative adversarial network to obtain corresponding pseudo document image data; and obtaining a target sample based on the first sample and a third sample including the pseudo document image data, training the document analysis model using the target sample, and verifying the trained document analysis model using the second sample. The training of the first model of the data generation model comprises: performing image alignment and image segmentation on the first sample and the corresponding original document image to obtain a plurality of paired image blocks as a first training set; and training the first model with the first training set so that the image blocks of the pseudo-legal image generated by the first model match the legal image blocks. Therefore, the training data can be conveniently augmented, and the document analysis model can attain high generalization performance.

Description

Training method and training device for augmenting training data of a document analysis model
Technical Field
The present disclosure generally relates to the field of document processing, and in particular to a training method and a training apparatus for augmenting the training data of a document analysis model.
Background
In recent years, machine learning (e.g., deep learning) methods have proven highly effective in many fields and show great potential for development. For example, in the field of document image analysis and identification, supervised learning is often used to obtain forensic evidence from a document image, for instance to determine whether a digital document image has been copied or to detect and locate tampered regions, thereby improving the security of document images.
Taking the training of a copy detection network for resisting copy attacks as an example, the training database is generally constructed by manually acquiring training data: first, an original document is acquired and a legal document is obtained through one pass of printing and scanning; next, the legal document of the electronic file (i.e., the tampered original document) is printed, and the copied document is then captured by an acquisition device (such as a printer or a mobile phone). In other studies, automated or semi-automated methods were used to obtain training data. For example, the open-source software DocCreator can generate legal or copied documents with different degradation models (e.g., ink degradation, font ghosting, paper breakage, adaptive blurring, paper deformation, or non-linear lighting models). For another example, noise blocks formed by real degradation can be pasted in the gradient domain of an original document: the authors of that method first convert the original document into the gradient domain and then reconstruct the document from it; for different types of degradation, the noise blocks are manually extracted, processed to reduce boundary effects, and stored separately, so that when the method is applied the degradation closest to a real legal or copied document can be selected for each position.
However, for the method of manually acquiring training data, one must wait for the acquisition device to capture each legal or copied document, and the manpower and time consumed are considerable. In addition, several of the degradation models provided by the open-source software are uncommon in ordinary application scenarios, and the types of legal or copied documents that can be generated are limited. When pasting noise blocks formed by real degradation, the positions must be selected very carefully to make the degradation of the original document realistic, an operation that requires a professional to spend a great deal of time. Therefore, when applied in the field of document image security analysis, the problems of insufficient training data and excessive difficulty in generating training data still need to be solved.
Disclosure of Invention
The present disclosure has been made in view of the above circumstances, and an object thereof is to provide a training method and a training device for augmenting the training data of a document analysis model, which can conveniently augment training data and give the document analysis model high generalization performance.
To this end, a first aspect of the present disclosure provides a training method for augmenting the training data of a document analysis model, the document analysis model being a machine-learning-based model that analyzes a document image obtained by an acquisition device based on an original document image, the training method including: obtaining a first sample and a second sample for verification, wherein the first sample and the second sample each include the document image, and the document image includes a legal image; inputting the original document image into a data generation model based on a generative adversarial network to obtain pseudo document image data corresponding to the original document image, wherein the pseudo document image data includes image blocks of a pseudo-legal image, and the data generation model is trained using the first sample so as to simulate the manner in which the acquisition device generates the document image; and obtaining target samples based on the first sample and a third sample including the pseudo document image data, training the document analysis model using the target samples, and verifying the trained document analysis model using the second sample to analyze the performance of the trained document analysis model, wherein the training of the data generation model includes training of a first model that is based on a generative adversarial network and is used to generate the image blocks of the pseudo-legal image, the training of the first model including: after image alignment is performed on the legal images in the first sample, performing image segmentation on the legal images in the first sample and the original document images corresponding to the first sample so as to obtain a plurality of paired image blocks as a first training set, wherein the paired image blocks include original image blocks from the original document images and legal image blocks corresponding to the original image blocks and taken from the legal images in the first sample; and training the first model using the first training set so that the image blocks of the pseudo-legal image generated by the first model match the legal image blocks.
In the present disclosure, when the document analysis model is trained, a data generation model is first trained on an initial training set so that it can generate a large amount of pseudo document image data suitable for training the document analysis model, and the document analysis model is then trained using the initial training set together with the pseudo document image data. Therefore, the training data can be conveniently augmented, and the document analysis model can attain high generalization performance. In addition, the first model in the data generation model is based on a generative adversarial network and is trained with paired image blocks, so the data generation model can generate document images closer to those generated manually under real conditions; this effectively improves the performance of the first model and, in turn, the quality and quantity of the training data of the document analysis model, thereby improving the generalization performance of the document analysis model.
In addition, in the training method according to the first aspect of the present disclosure, optionally, the first model includes a first generative adversarial network and a second generative adversarial network coupled to each other, and when the first model is trained: the generation network of the first generative adversarial network converts an original image block of a pair of image blocks of the plurality of paired image blocks into a first pseudo-legal image block, and the generation network of the second generative adversarial network converts the first pseudo-legal image block into a first pseudo-original image block; the generation network of the second generative adversarial network converts the legal image block of the pair of image blocks into a second pseudo-original image block, and the generation network of the first generative adversarial network converts the second pseudo-original image block into a second pseudo-legal image block; the discrimination network of the first generative adversarial network obtains a first discrimination result by judging the similarity between the legal image block in the pair of image blocks and the second pseudo-legal image block, and the discrimination network of the second generative adversarial network obtains a second discrimination result by judging the similarity between the original image block in the pair of image blocks and the first pseudo-original image block; and a first loss function of the first model is constructed based on the first and second discrimination results, and the first model is optimized using the first loss function. In this case, by continuously updating the network parameters of the first model through the loss function, the first pseudo-legal image block generated by the generation network of the first generative adversarial network can be matched to the legal image block.
Further, in the training method according to the first aspect of the present disclosure, optionally, the acquisition device is used to print, photograph, or scan a document. Thereby, the document image can be acquired by the acquisition device.
In addition, in the training method according to the first aspect of the present disclosure, optionally, the document images further include copied images, the pseudo document image data further includes pseudo-copied images, and the training of the data generation model further includes training of a second model for generating the pseudo-copied images, where the second model includes a recovery network for returning the copied images to the original document images and a copy network that is based on a generative adversarial network and is used to simulate distortion of the original document images to generate the pseudo-copied images. In this case, after the training of the second model is completed, pseudo-copied images can be quickly generated using the copy network of the second model.
Further, in the training method according to the first aspect of the present disclosure, optionally, the training of the second model includes: after image alignment is performed on the copied images in the first sample, taking the copied images in the first sample and the original document images corresponding to the first sample as a second training set; performing joint training on the recovery network and the copy network using the second training set, wherein in the joint training the generation network of the copy network takes the original document images in the second training set as input and outputs the pseudo-copied images, and the recovery network takes the pseudo-copied images as input and outputs pseudo-restored images; the discrimination network of the copy network obtains a third discrimination result by judging the similarity between the pseudo-copied image and the copied image in the first sample; and a second loss function of the second model is constructed based on the third discrimination result, and the second model is optimized using the second loss function. In this case, by constantly updating the network parameters of the second model through the loss function, the pseudo-copied image generated by the generation network of the copy network can be matched with the copied image.
In addition, in the training method according to the first aspect of the present disclosure, optionally, the recovery network is pre-trained based on supervised learning using an existing training set before the joint training, and the recovery network is based on a U-Net network. This can improve training efficiency.
In addition, in the training method according to the first aspect of the present disclosure, optionally, the original document image is generated by software, the legal image is an image acquired by the acquisition device by acquiring the original document image, and the copied image is an image acquired by printing the legal image onto a physical object carrier to obtain a printed image and then acquiring the printed image by the acquisition device. Thus, the original document image can be easily acquired. In addition, a legal image and a reproduced image can be acquired.
In addition, in the training method according to the first aspect of the present disclosure, optionally, the first sample is obtained by the acquisition device based on the original document image under a known acquisition condition, and the second sample is obtained by the acquisition device based on the original document image under an unknown acquisition condition. In this case, the process of acquiring the second sample is more suitable for the actual scene, and the performance of the document analysis model can be effectively analyzed subsequently when the document analysis model is verified.
In addition, in the training method according to the first aspect of the present disclosure, optionally, the number of the third samples is greater than the number of the first samples, and if the performance of the document analysis model does not satisfy a preset condition, first samples from different sources are added to fine-tune the data generation model. In this case, the document analysis model can be retrained based on the pseudo document image data generated by the fine-tuned data generation model until the preset condition is satisfied. Therefore, the document analysis model can be conveniently trained, and its generalization performance can be improved.
A second aspect of the present disclosure provides a training apparatus for augmenting the training data of a document analysis model, comprising a memory for non-transitory storage of computer-readable instructions and a processor for executing the computer-readable instructions, wherein the computer-readable instructions, when executed by the processor, perform the training method according to the first aspect of the present disclosure.
According to the present disclosure, a training method and a training device for augmenting the training data of a document analysis model are provided, which can conveniently augment training data and give the document analysis model high generalization performance.
Drawings
The disclosure will now be explained in further detail by way of example only with reference to the accompanying drawings, in which:
FIG. 1 is a schematic scenario illustrating a training method of augmenting training data of a document analysis model to which examples of the present disclosure relate.
Fig. 2 is an exemplary block diagram illustrating a data generation model for acquiring pseudo document image data according to an example of the present disclosure.
Fig. 3 is a flow chart illustrating a training method of a first model according to an example of the present disclosure.
Fig. 4 is a schematic diagram illustrating an original document image according to an example of the present disclosure.
Fig. 5 is a flow chart illustrating a method of training a second model in accordance with an example of the present disclosure.
FIG. 6 is a flow chart illustrating the training method for augmenting the training data of the document analysis model in accordance with examples of the present disclosure.
Fig. 7 (a) is a schematic diagram showing an image block of a pseudo-legal image generated by the first model according to an example of the present disclosure.
Fig. 7 (b) is a schematic diagram showing an image block of a pseudo-copied image generated by the second model according to an example of the present disclosure.
Detailed Description
Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In the following description, the same components are denoted by the same reference numerals, and redundant description thereof is omitted. The drawings are schematic, and the proportions and shapes of the components may differ from the actual ones. It is noted that the terms "comprises," "comprising," and "having," and any variations thereof in this disclosure are non-exclusive; for example, a process, method, system, article, or apparatus that comprises or has a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include or have other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. All methods described in this disclosure can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. In addition, for convenience of description, data generated by the data generation model in simulation of the acquisition device is prefixed with "pseudo-", for example, "pseudo-legal image", "pseudo-copied image", "pseudo document image data", "pseudo-legal image block", "pseudo-original image block", and the like.
The training method for augmenting the training data of a document analysis model according to the present disclosure may be simply referred to as a training method or a data augmentation method. The training method provided by the disclosure can conveniently augment training data (also called training samples) and give the document analysis model high generalization performance. The document analysis model to which the present disclosure relates may be any machine-learning-based model that analyzes document images (e.g., copy detection). The training method can be applied to any application scenario for analyzing document images. In some examples, the document analysis model may be selected from any of DenseNet121, DenseNet169, DenseNet201, InceptionV3, InceptionResNetV2, MobileNet, ResNeXt101, ResNeXt50, ResNet101V2, ResNet101, ResNet152V2, ResNet152, ResNet50V2, ResNet50, VGG16, VGG19, or Xception.
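As a purely illustrative sketch (not part of the original disclosure), one such backbone can be adapted to a two-class document analysis task (e.g., legal vs. copied) as follows; the torchvision source and the two-class head are assumptions of this sketch:

```python
# Hypothetical sketch: adapt an off-the-shelf backbone (ResNet50 via torchvision
# is an assumed choice; the disclosure lists many alternatives) as a two-class
# document analysis model.
import torch.nn as nn
from torchvision import models

def build_document_analysis_model(num_classes: int = 2) -> nn.Module:
    backbone = models.resnet50(weights=None)  # train from scratch on document data
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
    return backbone
```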
In some examples, the document image may be a legal image and/or a copied image. The legal image may be an image obtained by capturing an original document image with a capturing device. The copied image may be an image obtained by printing a legal image onto a physical carrier to obtain a printed image and then capturing the printed image with the capturing device. In some examples, the acquisition devices involved in generating a single copied image may differ. In this case, the process of obtaining the copied image can be closer to a real copying scene, which can improve the quality of the training data. Additionally, the capture device may be used to print, photograph, or scan a document (e.g., an original document image or a legal image); thereby, the document image can be acquired by the capture device. In some examples, the capture device may include, but is not limited to, one of a cell phone, a digital camera, a video camera, or a scanner. Thereby, document images can be obtained based on different or identical acquisition devices.
FIG. 1 is a schematic scenario illustrating a training method of augmenting training data of a document analysis model to which examples of the present disclosure relate.
In some examples, the training methods to which the present disclosure relates may be applied in a scenario as shown in fig. 1. In this scenario, the training data of document analysis model 10 may originate from data generation model 20 and capture device 30, where the first model 21 of data generation model 20 may receive an original document image and output a pseudo-legal image, the second model 22 of data generation model 20 may receive an original document image and output a pseudo-copied image, and capture device 30 may process the original document image to obtain a document image that includes a legal image and/or a copied image. After the document analysis model 10 is trained using the pseudo document images (i.e., the pseudo-legal images and/or the pseudo-copied images) generated by data generation model 20 together with the document images obtained by acquisition device 30, the trained document analysis model 10 may receive a document image and analyze it to obtain an analysis result. In some examples, the training data used to train data generation model 20 may be derived from the document images that are used to train document analysis model 10 and are obtained by acquisition device 30. Thus, the pseudo document images generated by data generation model 20 can be made more suitable for the training of document analysis model 10.
In general, models involving document image analysis often need to be trained using document images as training data. However, owing to privacy constraints, document images are difficult to obtain in practical application scenarios, and acquiring them manually, automatically, or semi-automatically is often inconvenient or restricted.
The training data of the document analysis model 10 to which the present disclosure relates may be generated at least in part by the data generation model 20. In other examples, the training data of the document analysis model 10 to which the present disclosure relates may also be generated entirely by the data generation model 20.
In some examples, data generation model 20 may be trained with samples that include an original document image and a document image corresponding to the original document image, so as to simulate the generation of the document image by acquisition device 30. Thus, a pseudo document image (i.e., an image that approximates a document image but is not completely consistent with it) can be generated automatically. That is, the data generation model 20 may simulate the distortion or degradation of the original document image. In some examples, the data generation model 20 may generate pseudo document image data (i.e., a pseudo document image and/or image blocks of a pseudo document image); the specific form depends on the manner in which the data generation model 20 is trained, described below.
In some examples, the data generation model 20 may be based on a generative adversarial network. A generative adversarial network (GAN) is a deep learning model comprising at least a generation network (generator) and a discrimination network (discriminator). In general, by learning the features of the training set, the generation network can, under the guidance of the discrimination network, generate data that shares the features of the training set. The discrimination network can discriminate whether the input data is real data or fake data produced by the generation network and feed the result back to the generation network. The discrimination network and the generation network are trained alternately until the data generated by the generation network can pass for real.
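To make the alternating training concrete, the following minimal sketch shows one generator/discriminator update step; the networks, optimizers, and binary cross-entropy objective are generic placeholders, not the specific networks claimed in this disclosure:

```python
# Minimal GAN training step (illustrative only). G maps an input image to a
# generated image; D outputs a real/fake logit per sample or per patch.
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, gen_input):
    # Discriminator step: push real data toward "real" (1), generated toward "fake" (0).
    fake = G(gen_input).detach()
    logits_real, logits_fake = D(real), D(fake)
    d_loss = (F.binary_cross_entropy_with_logits(logits_real, torch.ones_like(logits_real))
              + F.binary_cross_entropy_with_logits(logits_fake, torch.zeros_like(logits_fake)))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator step: try to make the discriminator label generated data as real.
    logits_gen = D(G(gen_input))
    g_loss = F.binary_cross_entropy_with_logits(logits_gen, torch.ones_like(logits_gen))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```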
In some examples, the target samples used to train the document analysis model 10 may include an initial training set (i.e., the first sample described later). In some examples, the training data used to train the data generation model 20 (referred to as the generative training set for short) may intersect with the initial training set of the document analysis model 10. For example, the generative training set may be derived in whole or in part from the initial training set of the document analysis model 10. That is, the data generation model 20 may be trained using training data derived partly from the initial training set so that it generates a large number of pseudo document images for the training of the document analysis model 10. Thus, the pseudo document images generated by the data generation model 20 can be made more suitable for the training of the document analysis model 10. In other examples, the generative training set may also be similar to, but different in source from, the initial training set of the document analysis model 10.
Fig. 2 is an exemplary block diagram illustrating a data generation model 20 for acquiring pseudo document image data according to an example of the present disclosure.
As shown in fig. 2, in some examples, the data generation model 20 may include a first model 21 and a second model 22. First model 21 may be used to simulate acquisition device 30 generating a legal image to generate a pseudo-legal image. Second model 22 may be used to simulate capture device 30 generating a snapshot image to generate a pseudo-snapshot image. Thus, a pseudo-legal image and a pseudo-copied image can be automatically generated. Examples of the disclosure are not limited thereto, and in other examples, the data generation model 20 may include only the first model 21 or only the second model 22.
In some examples, the data generation model 20 may be provided in a data generation system having a first channel and a second channel, where the first channel corresponds to the print-capture device and the intermediate media through which the original document image passes when a legal image is generated manually, and the second channel corresponds to the print-capture devices (e.g., printers and scanners) and the intermediate media through which the original document image passes in the two print-capture processes performed when a copied image is generated manually. The first model 21 may be disposed on the first channel and the second model 22 on the second channel. When the image signal of the original document image passes through the first channel or the second channel, it is distorted or degraded, so that a pseudo-legal image or a pseudo-copied image is obtained.
In some examples, the first model 21 may be based on a generative adversarial network. In some examples, the first model 21 may be a generative adversarial network based on dual learning, which reduces the dependency on annotated data. In some examples, the first model 21 may be a dual-learning-based generative adversarial network trained with paired image blocks (described later). Thereby, the first model 21 can generate image blocks of a pseudo-legal image.
In some examples, the first model 21 based on a dual-learning generative adversarial network may include a first generative adversarial network and a second generative adversarial network coupled to each other (not shown), each having its own generation network and discrimination network. In some examples, the first generative adversarial network may generate a pseudo-legal image based on the original document image, and the second generative adversarial network may generate a pseudo-original document image based on the pseudo-legal image. For the first model 21 trained with paired image blocks, the first generative adversarial network may generate image blocks of a pseudo-legal image based on image blocks of the original document image, and the second generative adversarial network may generate image blocks of the pseudo-original document image based on the image blocks of the pseudo-legal image.
In some examples, the number of legal images used to train the first model 21 may be smaller than the usual scale of a model training set; for example, the first model 21 may be trained with fewer than 20 legal images. In this case, the training of the first model 21 can be completed with a small amount of data, so it can be realized easily.
The training process of the first model 21 is described below by way of an example in which the generative training set is derived from the initial training set (i.e., the first sample) of the document analysis model 10. It should be noted that this does not limit the present disclosure; the first model 21 may be trained using any data similar to the first sample. Here, the document image in the first sample may include a legal image. Fig. 3 is a flowchart illustrating a training method of the first model 21 according to an example of the present disclosure. Fig. 4 is a schematic diagram illustrating an original document image according to an example of the present disclosure.
In some examples, as shown in fig. 3, the training method of the first model 21 may include constructing a first training set based on the first sample (step S102) and training the first model 21 with the first training set to match image patches of a pseudo-legal image generated by the first model 21 with legal image patches in the first training set (step S104).
In some examples, in step S102, a first training set may be constructed based on the first sample (i.e., the first training set may be obtained based on the first sample). As described above, the first sample may be an initial training set of the document analysis model 10. In some examples, the first sample may include a document image. As described above, the document image may be obtained by capture device 30 based on the original document image. In some examples, the first sample may include an original document image and a document image corresponding to the original document image.
In some examples, the original document image may be generated by software; thus it can be acquired easily. As an example, fig. 4 shows an original document image of a campus card drawn with the CorelDRAW design software. The disclosed examples are not so limited, and in other examples the original document image may also come from an actual application scenario, with the document desensitized.
In some examples, the document images may include legal images for training of the first model 21. As described above, the legal image may be an image obtained by capturing an original document image by the capturing device 30. Specifically, the original document image may be printed on the physical object carrier to obtain a printed image, and the printed image may be captured (e.g., photographed or scanned) by the capture device 30 to obtain a legal image.
In some examples, the first sample may be preprocessed to obtain the first training set. In the preprocessing, the legal images in the first sample may be image-aligned; for example, the legal image can be made to coincide with the size (i.e., resolution) of the original document image by image alignment. In this case, subsequent training based on the aligned legal images can improve the performance of the first model 21. In other examples, local alignment is also possible; that is, the size of a local region of the legal image can be made consistent with the size of the corresponding local region of the original document image.
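The disclosure does not prescribe a particular alignment algorithm; as one plausible realization, the following sketch aligns a captured legal image to its original document image with ORB features and a RANSAC homography in OpenCV:

```python
# Illustrative image alignment (an assumed method, not mandated by the patent).
import cv2
import numpy as np

def align_to_original(legal_bgr: np.ndarray, original_bgr: np.ndarray) -> np.ndarray:
    orb = cv2.ORB_create(4000)
    k1, d1 = orb.detectAndCompute(cv2.cvtColor(legal_bgr, cv2.COLOR_BGR2GRAY), None)
    k2, d2 = orb.detectAndCompute(cv2.cvtColor(original_bgr, cv2.COLOR_BGR2GRAY), None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    matches = sorted(matches, key=lambda m: m.distance)[:500]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = original_bgr.shape[:2]
    # Warp the legal image so its size and geometry coincide with the original.
    return cv2.warpPerspective(legal_bgr, H, (w, h))
```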
In some examples, the first training set may consist of a plurality of paired image blocks obtained via image segmentation of the legal images in the first sample and the original document images corresponding to the first sample. In some examples, the preprocessed legal image and the original document image may be image-segmented to obtain a plurality of paired image blocks. Examples of the disclosure are not limited thereto; in other examples, the first training set may be a plurality of images comprising pairs of the preprocessed legal images and the original document images, i.e., image segmentation may be omitted. It should be noted that the training mode of the first model 21 determines its prediction mode. Specifically, if the first training set consists of paired image blocks obtained after image segmentation, then after training the first model 21 may receive an input image block of an original document image and output the corresponding image block of a pseudo-legal image. If the first training set consists of image pairs of legal images and original document images without image segmentation, then after training the first model 21 may receive an input original document image and output the corresponding pseudo-legal image.
In some examples, in step S102, the image segmentation of the legal image and of the original document image may be kept consistent. For example, the image blocks may be cropped in a preset order or at a randomly chosen set of crop positions, as long as the same positions are used for both images. Thereby, a plurality of paired image blocks can be obtained.
In some examples, the paired image blocks may include an original image block from the original document image and the corresponding legal image block from the legal image in the first sample. In this case, the subsequent training of the first model 21 based on the paired image blocks can effectively improve its performance while placing low demands on the hardware resources of the training device. For example, the size of an image block may be 256 × 256.
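A minimal sketch of the consistent segmentation described above, assuming the two images have already been aligned; blocks are cropped here in grid order, though a random set of crop positions would serve equally well:

```python
# Illustrative extraction of paired 256x256 image blocks.
import numpy as np

def paired_blocks(original: np.ndarray, legal: np.ndarray, size: int = 256):
    assert original.shape[:2] == legal.shape[:2], "images must be aligned first"
    h, w = original.shape[:2]
    pairs = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            pairs.append((original[y:y + size, x:x + size],  # original image block
                          legal[y:y + size, x:x + size]))    # paired legal image block
    return pairs
```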
In some examples, in step S104, the first model 21 may be trained with a first training set to match image patches of a pseudo-legal image generated by the first model 21 with legal image patches in the first training set. That is, the first model 21 may be trained such that the first model 21 learns to migrate the style of legal images to the original document image to convert the original document image into a pseudo-legal image. Thus, the original document image can be quickly converted into a pseudo-legal image.
As described above, in some examples, the first model 21 may include a first generative countermeasure network and a second generative countermeasure network coupled to each other. In some examples, when training the first model 21 with pairs of image patches, each time training may be performed based on one pair of image patches of the plurality of pairs of image patches to obtain a training result (e.g., a first discrimination result and a second discrimination result), and a first loss function may be calculated based on the training result to optimize the first model 21. This can effectively improve the performance of the first model 21.
Specifically, when the first model 21 is trained with a pair of image blocks:
First, the generation network of the first generative adversarial network converts the original image block of the pair into a first pseudo-legal image block, and the generation network of the second generative adversarial network converts the first pseudo-legal image block into a first pseudo-original image block.
Secondly, the generation network of the second generative adversarial network converts the legal image block of the pair into a second pseudo-original image block, and the generation network of the first generative adversarial network converts the second pseudo-original image block into a second pseudo-legal image block.
Then, the discrimination network of the first generative adversarial network may obtain a first discrimination result by judging the similarity (or difference) between the legal image block of the pair and the second pseudo-legal image block, and the discrimination network of the second generative adversarial network may obtain a second discrimination result by judging the similarity between the original image block of the pair and the first pseudo-original image block.
Finally, a first loss function of the first model 21 may be constructed based on the first and second discrimination results, and the first model 21 may be optimized using the first loss function. In this case, by constantly updating the network parameters of the first model 21 through the loss function, the first pseudo-legal image block generated by the generation network of the first generative adversarial network (i.e., the image block of the pseudo-legal image) can be matched to the legal image block. That is, the first pseudo-legal image block can be made as close as possible to the legal image block from the computer's point of view.
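The following sketch renders the generator-side part of this step in code; the function and network names are hypothetical, the discriminator updates are omitted, and the L1 cycle-consistency terms and the weight 10.0 are assumptions borrowed from standard dual/cycle GANs rather than details stated above:

```python
# Illustrative generator-side loss for the first model (a sketch, not the
# claimed implementation). G_f: original -> legal style; G_b: legal -> original.
import torch
import torch.nn.functional as F

def first_model_generator_loss(G_f, G_b, D_legal, D_orig, orig_block, legal_block):
    fake_legal = G_f(orig_block)   # first pseudo-legal image block
    rec_orig = G_b(fake_legal)     # first pseudo-original image block
    fake_orig = G_b(legal_block)   # second pseudo-original image block
    rec_legal = G_f(fake_orig)     # second pseudo-legal image block

    # First discrimination result: judge the second pseudo-legal block.
    logits_legal = D_legal(rec_legal)
    d1 = F.binary_cross_entropy_with_logits(logits_legal, torch.ones_like(logits_legal))
    # Second discrimination result: judge the first pseudo-original block.
    logits_orig = D_orig(rec_orig)
    d2 = F.binary_cross_entropy_with_logits(logits_orig, torch.ones_like(logits_orig))

    # Assumed cycle-consistency terms: reconstructions should return to sources.
    cycle = F.l1_loss(rec_orig, orig_block) + F.l1_loss(rec_legal, legal_block)
    return d1 + d2 + 10.0 * cycle  # first loss function (sketch)
```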
In some examples, the performance of the first model 21 may be evaluated using the complex wavelet structural similarity (CW-SSIM) index. In some examples, the image similarity between the pseudo-legal image generated by the first model 21 and the legal image may be as high as 0.99 (the closer to 1, the more similar the images), so the pseudo-legal image may substitute for a legal image generated by acquisition device 30. Image blocks of pseudo-legal images behave similarly, which is not repeated here.
In some examples, the pseudo-legal images, or the image blocks of pseudo-legal images, generated by the first model 21 may be used for the training of the document analysis model 10.
As described above, in some examples, the data generation model 20 may include the second model 22, which may be based on a generative adversarial network. As described above, second model 22 may be used to simulate capture device 30 generating a copied image, i.e., to generate a pseudo-copied image. Thus, the document analysis model 10 can conveniently be trained to analyze copied images.
In some examples, the second model 22 may include a recovery network and a copy network (not shown). The recovery network may be used to return the copied image to the original document image. In some examples, the recovery network may be based on a U-Net network; in particular, it may be a U-Net network with a normalization-activation layer (EvoNorm-S0), which yields higher performance on the task of converting a copied image back to the original document image. The copy network may be based on a generative adversarial network and used to simulate the distortion of the original document image to generate a pseudo-copied image. In this case, after the training of the second model 22 is completed, pseudo-copied images can be quickly generated using the copy network of the second model 22. In some examples, the copy network may include a generation network and a discrimination network.
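For orientation only, a minimal two-level U-Net sketch of the recovery network is given below; it substitutes BatchNorm + ReLU for the EvoNorm-S0 normalization-activation layer mentioned above purely to keep the sketch short, so it illustrates the architecture shape rather than the claimed network:

```python
# Minimal U-Net-style recovery network (illustrative; EvoNorm-S0 replaced by
# BatchNorm+ReLU for brevity).
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    def __init__(self, ch=3):
        super().__init__()
        self.enc1, self.enc2 = conv_block(ch, 64), conv_block(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = conv_block(128, 64)  # 128 = 64 skip channels + 64 upsampled
        self.out = nn.Conv2d(64, ch, 1)

    def forward(self, x):                # copied image in, restored original out
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        return self.out(d1)
```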
The training process of the second model 22 is described below by way of an example in which the generative training set is derived from the initial training set (i.e., the first sample) of the document analysis model 10. It should be noted that this does not limit the present disclosure; the second model 22 may be trained using any data similar to the first sample. Here, the document image in the first sample may include a copied image. Fig. 5 is a flowchart illustrating a training method of the second model 22 according to an example of the present disclosure.
In some examples, as shown in fig. 5, the training method of the second model 22 may include constructing a second training set based on the first sample (step S202), jointly training the recovery network and the copy network using the second training set (step S204), obtaining a third discrimination result by judging the similarity between the pseudo-copied image and the copied image in the first sample (step S206), and constructing a second loss function of the second model 22 based on the third discrimination result and optimizing the second model 22 using the second loss function (step S208).
In some examples, in step S202, a second training set may be constructed based on the first sample (i.e., the second training set may be obtained based on the first sample). As described above, the first sample may be the initial training set of the document analysis model 10 and may include a document image. For the training of the second model 22, the document images may include a copied image. As described above, the copied image may be obtained by printing a legal image onto a physical carrier and then capturing the printed image with capture device 30.
In some examples, the first sample may be preprocessed to obtain the second training set. In the preprocessing, the copied images in the first sample may be image-aligned; see the description of image alignment for the first model 21 for details. In some examples, the second training set may include the preprocessed (e.g., image-aligned) copied images and the original document images corresponding to the first sample.
In some examples, in step S204, the recovery network and the copy network may be jointly trained using the second training set. In the joint training, the pseudo-copied images generated by the copy network gradually approach the copied images. That is, the second model 22 may be trained such that it learns to migrate the style of the copied image to the original document image; the trained second model 22 is then able to convert an original document image into a pseudo-copied image quickly.
In some examples, in the joint training, the generation network of the copy network may take the original document images in the second training set as input and output the pseudo-copied images, and the recovery network may take the pseudo-copied images as input and output pseudo-restored images. In this case, the recovery network and the copy network constrain each other through a cyclic training process, and the pseudo-copied image generated by the copy network can be matched with the copied image. That is, the copy network is able to simulate the distortion of the original document image to generate a pseudo-copied image.
In some examples, in step S206, the discrimination network of the copy network may obtain a third discrimination result by judging the similarity between the pseudo-copied image and the copied image in the first sample. That is, the discrimination network of the copy network discriminates the pseudo-copied image obtained in step S204.
In some examples, in step S208, a second loss function of the second model 22 may be constructed based on the third discrimination result, and the second model 22 may be optimized using the second loss function. In this case, by constantly updating the network parameters of the second model 22 through the loss function, the pseudo-copied image generated by the generation network of the copy network can be matched with the copied image; that is, the pseudo-copied image can be made as close as possible to the copied image from the computer's point of view.
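A minimal sketch of one joint-training step follows; the L1 restoration-consistency term and its weight are assumptions introduced to illustrate the mutual constraint described above, and opt_G is assumed to cover the parameters of both the copy network's generator and the recovery network:

```python
# Illustrative joint-training step for the second model (a sketch).
# G_copy: copy network generator; R: recovery network; D_copy: copy discriminator.
import torch
import torch.nn.functional as F

def second_model_step(G_copy, R, D_copy, opt_G, opt_D, original, real_copy):
    # Discriminator step: real copied image -> 1, pseudo-copied image -> 0
    # (the third discrimination result).
    fake_copy = G_copy(original).detach()
    lr, lf = D_copy(real_copy), D_copy(fake_copy)
    d_loss = (F.binary_cross_entropy_with_logits(lr, torch.ones_like(lr))
              + F.binary_cross_entropy_with_logits(lf, torch.zeros_like(lf)))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator/recovery step: fool the discriminator, and require the recovery
    # network to return the pseudo-copied image to the original document image
    # (assumed L1 form and weight).
    fake_copy = G_copy(original)
    logits = D_copy(fake_copy)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    restore = F.l1_loss(R(fake_copy), original)
    g_loss = adv + 10.0 * restore   # second loss function (sketch)
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```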
In some examples, prior to joint training, the recovery network may be pre-trained based on supervised learning using an existing training set. This can improve training efficiency. Additionally, the existing training set may be a data set that is public and similar or analogous to the first sample.
In some examples, document analysis model 10 may be trained after data generation model 20 is trained with the first sample. Hereinafter, the training method for augmenting the training data of the document analysis model 10 according to the present disclosure will be described in detail with reference to the drawings. FIG. 6 is a flow chart illustrating the training method for augmenting the training data of document analysis model 10 in accordance with an example of the present disclosure. Fig. 7 (a) is a schematic diagram showing an image block of a pseudo-legal image generated by the first model 21 according to an example of the present disclosure. Fig. 7 (b) is a schematic diagram showing an image block of a pseudo-copied image generated by the second model 22 according to an example of the present disclosure.
In some examples, as shown in fig. 6, the training method according to the present disclosure may include obtaining a first sample and a second sample for verification (step S302), obtaining a third sample using the data generation model 20 (step S304), obtaining a target sample based on the first sample and the third sample (step S306), and training the document analysis model 10 using the target sample and verifying the trained document analysis model 10 using the second sample to analyze its performance (step S308).
In some examples, in step S302, the first and second samples may be obtained based on the acquisition device 30. In some examples, the first and second samples may each include a document image. For a detailed description, reference is made to the description of the first sample in the training of the first model 21 and the second model 22. It should be noted that in other examples, the second sample may not be necessary.
As described above, the document image may include a legal image and/or a copied image. Thus, the first and second samples may include legal images and/or copied images according to the analysis objects (e.g., legal images and/or copied images) of the document analysis model 10. Specifically, if the document analysis model 10 is analyzing a legal image, the first sample and the second sample may respectively include a legal image, and if the document analysis model 10 is analyzing a copied image, the first sample and the second sample may respectively include a copied image. Similarly, the subsequent data generation model 20 may generate pseudo document image data (e.g., a pseudo-legal image, a pseudo-copied image, an image patch corresponding to a pseudo-legal image, and/or an image patch corresponding to a pseudo-copied image) from an analysis object of the document analysis model 10. And will not be described in detail herein.
In some examples, the first sample may be obtained by capture device 30 based on the original document image under known capture conditions, and the second sample may be obtained by capture device 30 based on the original document image under unknown capture conditions. The collection condition may be, for example, information of the collection device 30, a paper medium, a collection method, or the like. In general, in an actual scene, a user does not necessarily know how a document image is obtained. In this case, the process of acquiring the second sample is more suitable for the actual scene, and the performance of the document analysis model 10 can be effectively analyzed subsequently when the document analysis model 10 is verified.
In some examples, in step S304, a third sample may be obtained using the data generating model 20. In some examples, the third sample may be obtained by the data generation model 20 based on the first sample. As an example, fig. 7 (a) and 7 (b) show an image block P11 of a pseudo-legal image generated by the first model 21 of the data generation model 20 and an image block P21 of a pseudo-copied image generated by the second model 22, respectively. In addition, for comparison, fig. 7 (a) also shows an image block P10 of the original document image corresponding to the image block P11, and an image block P12 of the corresponding legal image. Likewise, fig. 7 (b) also shows an image block P20 of the original document image corresponding to the image block P21, and an image block P22 of the corresponding copied image.
In some examples, data generation model 20 may be trained with the first sample so as to simulate the generation of the document image by acquisition device 30. In some examples, the training of the data generation model 20 may include training of the first model 21 and/or training of the second model 22. Specifically, if the document analysis model 10 analyzes legal images, the document images in the first sample may include legal images and the first sample may be used to train the first model 21 in data generation model 20; if the document analysis model 10 analyzes copied images, the document images in the first sample may include copied images and the first sample may be used to train the second model 22 in data generation model 20. In some examples, the document analysis model 10 may also analyze both legal images and copied images; for example, it may be a multitask machine-learning-based model. Correspondingly, both the first model 21 and the second model 22 in data generation model 20 may be trained with the first sample. For details, refer to the training of the first model 21 and the training of the second model 22 described above.
In some examples, the third sample may include pseudo document image data. In some examples, an original document image may be input to the data generation model 20 to obtain pseudo document image data corresponding to the original document image; thereby, a large amount of pseudo document image data can be acquired based on the data generation model 20. Depending on the training mode of the data generation model 20, the pseudo document image data may include a pseudo document image and/or image blocks corresponding to a pseudo document image; depending on the training objective, it may be pseudo-legal images and/or pseudo-copied images. That is, the pseudo document image data may include a pseudo-legal image, a pseudo-copied image, image blocks corresponding to a pseudo-legal image, and/or image blocks corresponding to a pseudo-copied image.
In some examples, if the document analysis model 10 analyzes legal images, a pseudo-legal image or image blocks of a pseudo-legal image may be generated using the generation network of the first generative adversarial network in the first model 21; if the document analysis model 10 analyzes copied images, a pseudo-copied image or image blocks of a pseudo-copied image may be generated using the generation network of the copy network in the second model 22.
In some examples, in step S306, a target sample may be obtained based on the first sample and the third sample. In some examples, the target samples may include the first sample and the third sample. If the first sample and the third sample do not match the training format of the document analysis model 10, the target sample may include data obtained after processing them to match. For example, if the document analysis model 10 is trained on image blocks and neither the first sample nor the third sample consists of image blocks, image segmentation may be performed on both; if the first sample is not image blocks while the third sample is, image segmentation may be performed on the first sample alone. In some examples, the target samples may further include annotation data corresponding to the first sample and to the third sample, according to the training goal of the document analysis model 10.
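A minimal sketch of assembling the target sample as a training loader, assuming the first and third samples are already available as map-style datasets yielding (image block, label) pairs (the dataset objects here are hypothetical):

```python
# Illustrative target-sample assembly.
from torch.utils.data import ConcatDataset, DataLoader, Dataset

def make_target_loader(first_sample_ds: Dataset, third_sample_ds: Dataset,
                       batch_size: int = 64) -> DataLoader:
    # Both datasets are assumed to yield (image_block, label) pairs already
    # segmented to the document analysis model's input size.
    target_ds = ConcatDataset([first_sample_ds, third_sample_ds])
    return DataLoader(target_ds, batch_size=batch_size, shuffle=True)
```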
In some examples, the number of third samples may be greater than the number of first samples. In this case, the third sample is easy to acquire and can improve the generalization performance of the document analysis model 10. Thus, the document analysis model 10 can be conveniently trained.
In some examples, in step S308, the document analysis model 10 may be trained using the target sample, and the trained document analysis model 10 may be validated using the second sample to analyze the performance of the trained document analysis model 10. As described above, in some examples, the second sample may be obtained by acquisition device 30 based on the original document image under unknown acquisition conditions. In this case, the process of acquiring the second sample is more suitable for the actual scene, and the performance of the document analysis model 10 can be effectively analyzed.
In some examples, the performance indicators of the document analysis model 10 may include the receiver operating characteristic curve (ROC curve), the area under the ROC curve (AUC), and the equal error rate (EER).
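These three indicators can be computed, for example, with scikit-learn as sketched below, assuming binary ground-truth labels and higher scores meaning "copied"; the EER is the operating point where the false-accept and false-reject rates coincide:

```python
# Illustrative verification metrics: ROC curve, AUC, and EER.
import numpy as np
from sklearn.metrics import roc_curve, auc

def verification_metrics(labels: np.ndarray, scores: np.ndarray):
    fpr, tpr, _ = roc_curve(labels, scores)      # labels: 1 = copied, 0 = legal (assumed)
    roc_auc = auc(fpr, tpr)
    i = np.nanargmin(np.abs(fpr - (1.0 - tpr)))  # point where FPR ~= FNR
    eer = (fpr[i] + (1.0 - tpr[i])) / 2.0
    return fpr, tpr, roc_auc, eer
```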
In addition, to verify the effectiveness of the training method for the document analysis model 10 according to the present disclosure, it is compared with a scheme in which the document analysis model 10 is trained based only on the first sample (referred to below as the comparison scheme).
The experimental setup is as follows: in the disclosed scheme, the data generation model 20 is trained with the first sample, the trained data generation model 20 is used to generate pseudo-copied images, and the document analysis model 10 is trained with these images together with the first sample; the comparison scheme trains the document analysis model 10 with the first sample only. After training, the performance of the document analysis model 10 is verified using the second sample, where the first sample is a set of copied images obtained by the acquisition device 30 from original document images under known acquisition conditions, the second sample is a set of copied images obtained by the acquisition device 30 under unknown acquisition conditions, and the document analysis model 10 is a common copy detection network. Under this setup, the performance comparison between the present disclosure and the comparison scheme is given in Table 1.
Table 1: Performance comparison of the present disclosure and the comparison scheme (the table body is rendered as images in the original publication; its numeric contents are not reproduced here).
As can be seen from Table 1, when the copy detection network is trained with only the original training data (i.e., the first sample), its cross-library generalization performance is limited. After a batch of pseudo-copied images automatically generated by the data generation model 20 of the present disclosure is added to the original training data, the cross-library performance of the copy detection network improves markedly, for both the CNN-ResNet50-based and the CNN-FS-ResNet50-based detection networks. The data generation model 20 can therefore automatically generate a large amount of effective training data, and training with this data substantially improves the generalization performance of the model.
In some examples, if the performance of the document analysis model 10 does not satisfy a preset condition, first samples from different sources may be added to fine-tune the data generation model 20. For example, further types of acquisition devices 30 (e.g., additional printers and scanners) may be used to collect new first samples. The document analysis model 10 can then be retrained on pseudo document image data generated by the fine-tuned data generation model 20 until the preset condition is satisfied. This makes it possible to train the document analysis model 10 conveniently while improving its generalization performance.
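The retrain-until-satisfied loop described above might be organized as in the following sketch, in which every callable is a placeholder for a component the disclosure leaves implementation-defined (the AUC threshold of 0.95 and the round limit are arbitrary assumptions):

from typing import Callable, List, Tuple

def train_until_satisfactory(
    train_model: Callable[[List], None],          # trains document analysis model 10
    validate: Callable[[], Tuple[float, float]],  # (AUC, EER) on the second sample
    generate: Callable[[List], List],             # inference of data generation model 20
    fine_tune: Callable[[List], None],            # fine-tunes data generation model 20
    collect_more: Callable[[], List],             # first samples from a new device type
    first_samples: List,
    target_auc: float = 0.95,
    max_rounds: int = 5,
) -> None:
    for _ in range(max_rounds):
        pseudo = generate(first_samples)          # third sample
        train_model(first_samples + pseudo)       # target sample = first + third
        auc_score, _ = validate()
        if auc_score >= target_auc:               # preset condition met
            return
        first_samples = first_samples + collect_more()
        fine_tune(first_samples)                  # adapt the generator to new sources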
The present disclosure also provides a training apparatus for augmenting the training data of the document analysis model 10, which may include a memory and a processor. The memory is used for non-transitory storage of computer-readable instructions. The processor is used to execute the computer-readable instructions, which, when executed by the processor, carry out the training method provided by any example of the present disclosure. For a detailed description of how the training apparatus performs model training, reference may be made to the corresponding description in the examples of the training method; the details are not repeated here.
In the training method and training apparatus according to the present disclosure, when the document analysis model 10 is trained, the initial training set is first used to train the data generation model 20 so that it can generate a large amount of pseudo document image data suited to the training of the document analysis model 10, and the initial training set and the pseudo document image data are then used together to train the document analysis model 10. The training data can thus be augmented conveniently, and the document analysis model can attain high generalization performance. Because the first model 21 of the data generation model 20 is based on a generation countermeasure network trained on paired image blocks, the data generation model 20 can generate document images closer to those actually produced by hand; this effectively improves the performance of the first model 21 and raises both the quality and quantity of the training data of the document analysis model 10, thereby improving its generalization performance. In addition, the second model 22, also based on a generation countermeasure network, can generate pseudo-copied images. It is therefore easy to convert an original document image into either a legal image or a copied image. The method is consequently applicable to training most models that analyze document images, which solves or alleviates the problems of insufficient training data, and of training data that is too difficult, time-consuming, and labor-intensive to generate, when developing deep networks or traditional machine-learning models for document image security analysis, and it greatly simplifies the process of acquiring training data.
While the present disclosure has been described in detail with reference to the drawings and examples, it should be understood that the above description is not intended to limit the disclosure in any way. Those skilled in the art may make variations and changes as needed without departing from the true spirit and scope of the disclosure, and such variations and changes fall within its scope.

Claims (7)

1. A training method for augmenting training data of a document analysis model, the document analysis model being a multitask machine-learning model that analyzes both a legal image and a copied image in a document image obtained by an acquisition device based on an original document image, the document image being an image subject to privacy protection, the training method comprising:
acquiring a first sample and a second sample for verification, wherein the first sample and the second sample each comprise the document image, the document image comprises the legal image and the copied image, the legal image is an image obtained by capturing the original document image with the acquisition device, and the copied image is an image obtained by printing the legal image on a physical carrier to obtain a printed image and then capturing the printed image with the acquisition device;
inputting an original document image generated by software into a data generation model based on a generation countermeasure network so as to automatically generate pseudo document image data corresponding to the software-generated original document image, wherein the pseudo document image data comprises image blocks of a pseudo-legal image and a pseudo-copied image, and the data generation model is trained with the first sample so as to simulate the manner in which the acquisition device generates the document image and thereby simulate the distortion or degradation of the software-generated original document image; and
obtaining a target sample based on the first sample obtained by the acquisition device and a third sample comprising the pseudo document image data generated by the data generation model, the target sample comprising the first sample and the third sample and the number of third samples being greater than the number of first samples; training the document analysis model using the target sample; and verifying the trained document analysis model using the second sample to analyze the performance of the trained document analysis model, wherein, if the performance of the trained document analysis model does not satisfy a preset condition, first samples from different sources are added to fine-tune the data generation model,
wherein the training of the data generation model comprises training of a first model that is based on a generation countermeasure network and is used to generate image blocks of the pseudo-legal image, and training of a second model that is used to generate the pseudo-copied image, the training of the first model comprising:
after performing image alignment on the legal images in the first sample, performing image segmentation on the legal images in the first sample and on the original document images corresponding to the first sample to obtain a plurality of paired image blocks as a first training set, wherein each pair of image blocks comprises an original image block from the original document image and a corresponding legal image block from the legal image in the first sample; and
training the first model with the first training set so that image blocks of the pseudo-legal image generated by the first model match the legal image blocks,
wherein the second model comprises a recovery network for rolling the copied image back to the original document image and a copy network, based on a generation countermeasure network, for simulating distortion of the original document image to generate the pseudo-copied image.
2. The training method of claim 1, wherein the first model comprises a first generation countermeasure network and a second generation countermeasure network coupled to each other, and wherein, in training the first model:
the generation network of the first generation countermeasure network converts an original image block of a pair of image blocks among the plurality of paired image blocks into a first pseudo-legal image block, and the generation network of the second generation countermeasure network converts the first pseudo-legal image block into a first pseudo-original image block;
the generation network of the second generation countermeasure network converts a legal image block of the pair of image blocks into a second pseudo-original image block, and the generation network of the first generation countermeasure network converts the second pseudo-original image block into a second pseudo-legal image block;
the discrimination network of the first generation countermeasure network acquires a first discrimination result by judging the similarity between the legal image block of the pair of image blocks and the second pseudo-legal image block, and the discrimination network of the second generation countermeasure network acquires a second discrimination result by judging the similarity between the original image block of the pair of image blocks and the first pseudo-original image block; and
a first loss function of the first model is constructed based on the first discrimination result and the second discrimination result, and the first model is optimized using the first loss function.
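Read as an algorithm, claim 2 describes a cycle-structured pair of generation countermeasure networks. A minimal PyTorch sketch of one training step is given below; the architectures, the folding of the "similarity judgment" into a standard adversarial loss, and the omission of cycle-consistency terms are all simplifying assumptions, not the patent's implementation:

import torch
import torch.nn.functional as F

def first_model_step(G_o2l, G_l2o, D_legal, D_orig, orig_block, legal_block):
    # G_o2l / G_l2o: generation networks of the first / second generation
    # countermeasure networks; D_legal / D_orig: their discrimination networks.
    fake_legal = G_o2l(orig_block)    # first pseudo-legal image block
    rec_orig = G_l2o(fake_legal)      # first pseudo-original image block
    fake_orig = G_l2o(legal_block)    # second pseudo-original image block
    rec_legal = G_o2l(fake_orig)      # second pseudo-legal image block

    d1 = D_legal(rec_legal)  # first discrimination result (vs. the legal block)
    d2 = D_orig(rec_orig)    # second discrimination result (vs. the original block)

    # First loss function built from both discrimination results
    # (generator-side adversarial terms; real/fake targets are a simplification).
    return (F.binary_cross_entropy_with_logits(d1, torch.ones_like(d1))
            + F.binary_cross_entropy_with_logits(d2, torch.ones_like(d2)))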
3. Training method according to claim 1, characterized in that:
the acquisition device is used for printing, shooting or scanning a document.
4. A training method according to claim 1, wherein the training of the second model comprises:
after performing image alignment on the copied images in the first sample, taking the copied images in the first sample and the original document images corresponding to the first sample as a second training set;
performing joint training on the recovery network and the copy network using the second training set, wherein, in the joint training, the generation network of the copy network takes the original document images in the second training set as input and outputs the pseudo-copied images, and the recovery network takes the pseudo-copied images as input and outputs pseudo-recovered images;
the discrimination network of the copy network acquires a third discrimination result by judging the similarity between the pseudo-copied image and the copied image in the first sample; and
a second loss function of the second model is constructed based on the third discrimination result, and the second model is optimized using the second loss function.
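A corresponding sketch for one joint-training step of claim 4 follows; the L1 recovery term, its weight, and the loss form are assumptions for illustration (the discriminator's own update against real copied images is omitted):

import torch
import torch.nn.functional as F

def second_model_step(copy_G, copy_D, recovery_net, orig_image,
                      lambda_rec: float = 10.0):
    # copy_G / copy_D: generation and discrimination networks of the copy
    # network; recovery_net rolls a copied image back toward the original.
    pseudo_copied = copy_G(orig_image)              # simulated print-recapture distortion
    pseudo_recovered = recovery_net(pseudo_copied)  # rolled back toward the original

    # Third discrimination result: pseudo-copied image judged against the
    # copied-image domain (generator-side adversarial term).
    d_fake = copy_D(pseudo_copied)
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))

    # Recovery consistency: the rolled-back image should match the original.
    rec = F.l1_loss(pseudo_recovered, orig_image)
    return adv + lambda_rec * rec                   # second loss function (sketch)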
5. Training method according to claim 4, characterized in that:
and before the joint training, pre-training the recovery network based on supervised learning by utilizing the existing training set, wherein the recovery network is based on a U-Net network.
6. Training method according to claim 1, characterized in that:
the first sample is obtained by the capture device based on the original document image under known capture conditions, and the second sample is obtained by the capture device based on the original document image under unknown capture conditions.
7. A training apparatus for augmenting training data of a document analysis model, comprising: a memory for non-transitory storage of computer-readable instructions; and a processor for executing the computer-readable instructions, wherein the computer-readable instructions, when executed by the processor, perform the training method of any one of claims 1 to 6.

