CN114529909A - Sample data set generation method and device and electronic equipment - Google Patents

Sample data set generation method and device and electronic equipment

Info

Publication number
CN114529909A
CN114529909A (application CN202210148525.4A)
Authority
CN
China
Prior art keywords
image
images
target
sub
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210148525.4A
Other languages
Chinese (zh)
Inventor
黄聚
李煜林
王鹏
谢群义
钦夏孟
姚锟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210148525.4A
Publication of CN114529909A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G06T3/04
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20112 Image segmentation details
    • G06T2207/20132 Image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Processing (AREA)

Abstract

The disclosure provides a sample data set generation method and apparatus, and an electronic device, relating to the field of artificial intelligence, in particular to deep learning, image processing, and computer vision, and applicable to optical character recognition (OCR) scenarios. The specific implementation scheme is as follows: acquire an original image that contains at least one text region; perform color transformation on the original image to obtain at least one first image; crop the at least one first image to obtain a plurality of first sub-images; paste the first sub-images together to obtain a plurality of target images; and generate a sample data set based on the plurality of target images. The sample data set is used to train a preset model, and the preset model is used at least to recognize text regions in an image to be recognized.

Description

Sample data set generation method and device and electronic equipment
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to deep learning, image processing, and computer vision, and is applicable to scenarios such as optical character recognition. It specifically concerns a sample data set generation method and apparatus, and an electronic device.
Background
With the rapid development of optical character recognition (OCR) technology, devices are gradually replacing manual work in reading and checking the text in images of paper documents (such as invoices and articles). However, because paper documents have complex and varied layouts, heavily overlapping text, and text lines of different lengths, detecting text in such images with existing devices still has certain shortcomings.
Existing data augmentation methods make insufficient use of the data, fail to fully mine its potential, and therefore cannot effectively enlarge a sample data set.
Disclosure of Invention
The disclosure provides a method and a device for generating a sample data set and electronic equipment.
According to an aspect of the present disclosure, a sample data set generation method is provided, including: acquiring an original image that contains at least one text region; performing color transformation on the original image to obtain at least one first image; cropping the at least one first image to obtain a plurality of first sub-images; pasting the first sub-images together to obtain a plurality of target images; and generating a sample data set based on the plurality of target images, wherein the sample data set is used to train a preset model, and the preset model is used at least to recognize text regions in an image to be recognized.
Further, the method comprises: randomly determining a target processing mode from a plurality of color transformation processing modes; and performing color transformation on the original image based on the target processing mode to obtain the at least one first image.
Further, the method comprises: when the target processing mode is a color perturbation mode, randomly enhancing a color channel of the original image to obtain the at least one first image, wherein the color channel includes at least a luminance channel and a saturation channel.
Further, the method comprises: when the target processing mode is a noise addition mode, randomly determining a target noise signal from a plurality of noise signals; and superimposing noise on the original image based on the target noise signal to obtain the at least one first image.
Further, the method comprises: when the target processing mode is a grayscale processing mode, converting the original image into a grayscale image to obtain the at least one first image.
Further, the method comprises: determining a cropping number for each of the at least one first image, wherein the cropping number represents the number of images into which each first image is cropped; and cropping each first image into a plurality of images based on the cropping number to obtain the plurality of first sub-images.
Further, the method comprises: detecting a target text region in the at least one first image; and cropping each first image based on the cropping number and the target text region to obtain the plurality of first sub-images, wherein the target text region lies entirely within one of the plurality of first sub-images.
Further, the method comprises: determining, from the plurality of first sub-images, at least one second image that constitutes each target image; randomly determining the image position of the at least one second image on the corresponding target image; and pasting the at least one second image at that position to obtain the plurality of target images.
Further, the method comprises: performing geometric transformation on the plurality of target images to obtain a plurality of third images; and constructing the sample data set based on the plurality of third images.
Further, the method comprises: performing geometric transformation on the plurality of target images in any one or more of the following ways: rotating the plurality of target images; applying an affine transformation to the plurality of target images; and applying a perspective transformation to the plurality of target images.
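All three geometric transformations named above can be expressed as coordinate warps. As an illustrative sketch only (not the disclosure's implementation), the numpy-only routine below applies an affine warp with inverse mapping and nearest-neighbour sampling; rotation is the special case where M encodes a rotation matrix, and a perspective transformation would add a projective divide per pixel.

```python
import numpy as np

def warp_affine(img, M, out_shape=None):
    """Nearest-neighbour affine warp using inverse mapping.

    M is a 2x3 matrix mapping each OUTPUT pixel (x, y) back to its source
    coordinate, so the result has no holes. Pixels that map outside the
    source image are filled with zeros.
    """
    H, W = out_shape if out_shape is not None else img.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    sx = np.round(M[0, 0] * xs + M[0, 1] * ys + M[0, 2]).astype(int)
    sy = np.round(M[1, 0] * xs + M[1, 1] * ys + M[1, 2]).astype(int)
    valid = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros((H, W) + img.shape[2:], dtype=img.dtype)
    out[valid] = img[sy[valid], sx[valid]]
    return out
```

In practice a library routine such as OpenCV's warpAffine/warpPerspective would be used instead; the sketch only shows the underlying mechanics.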
According to another aspect of the present disclosure, a sample data set generation apparatus is provided, including: an acquisition module for acquiring an original image that contains at least one text region; a transformation module for performing color transformation on the original image to obtain at least one first image; a cropping module for cropping the at least one first image to obtain a plurality of first sub-images; a pasting module for pasting the plurality of first sub-images together to obtain a plurality of target images; and a generation module for generating a sample data set based on the plurality of target images, wherein the sample data set is used to train a preset model, and the preset model is used at least to recognize text regions in an image to be recognized.
Further, the transformation module comprises: a first determination module for randomly determining a target processing mode from a plurality of color transformation processing modes; and a first sub-transformation module for performing color transformation on the original image based on the target processing mode to obtain the at least one first image.
Further, the first sub-transformation module comprises: a first processing module configured to, when the target processing mode is a color perturbation mode, randomly enhance a color channel of the original image to obtain the at least one first image, wherein the color channel includes at least a luminance channel and a saturation channel.
Further, the first sub-transformation module comprises: a second determination module configured to randomly determine a target noise signal from a plurality of noise signals when the target processing mode is a noise addition mode; and a second processing module for superimposing noise on the original image based on the target noise signal to obtain the at least one first image.
Further, the first sub-transformation module comprises: a third processing module for converting the original image into a grayscale image to obtain the at least one first image when the target processing mode is a grayscale processing mode.
Further, the cropping module comprises: a third determination module for determining a cropping number for each of the at least one first image, wherein the cropping number represents the number of images into which each first image is cropped; and a first sub-cropping module for cropping each first image into a plurality of images based on the cropping number to obtain the plurality of first sub-images.
Further, the first sub-cropping module comprises: a detection module for detecting a target text region in the at least one first image; and a second sub-cropping module for cropping each first image based on the cropping number and the target text region to obtain the plurality of first sub-images, wherein the target text region lies entirely within one of the plurality of first sub-images.
According to another aspect of the present disclosure, an electronic device is provided, including: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the sample data set generation method described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of generating a sample data set described above.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of generating a sample data set as described above.
In the embodiments of the present disclosure, a sample data set is generated from a plurality of target images derived from an original image: the original image is acquired, color transformation is applied to it to obtain at least one first image, the at least one first image is cropped into a plurality of first sub-images, and the first sub-images are pasted together to obtain a plurality of target images, from which the sample data set is generated. The original image contains at least one text region; the sample data set is used to train a preset model, and the preset model is used at least to recognize text regions in an image to be recognized.
In this process, the original image undergoes color transformation and cropping in sequence, and the cropped first sub-images are pasted together. This effectively yields multiple target images that share part of the original image's text content but display different image content; in other words, multiple target images matching the model's training requirements can be obtained from one original image. The training sample set is thus effectively enlarged, building the sample data set through manual labeling is avoided, and labeling costs are saved.
Therefore, the scheme provided by the present disclosure generates a sample data set from a plurality of target images obtained from original images, achieving the technical effect of enlarging the training sample set and solving the technical problem that existing data augmentation methods cannot effectively enlarge a sample data set.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:
FIG. 1 is a schematic diagram of a method of generating a sample data set according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a method of generating a sample data set according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a sample data set generation apparatus according to an embodiment of the present disclosure;
FIG. 4 is a block diagram of an electronic device used to implement a sample data set generation method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the technical solution of the present disclosure, the collection, storage, and use of users' personal information comply with relevant laws and regulations and do not violate public order and good morals.
Example 1
According to an embodiment of the present disclosure, an embodiment of a sample data set generation method is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps may be performed in an order different from the one shown or described here.
FIG. 1 is a schematic diagram of a sample data set generation method according to an embodiment of the present disclosure. As shown in FIG. 1 and FIG. 2, the method includes the following steps:
step S102, an original image is obtained, wherein the original image at least comprises a text area.
In step S102, the original image may be acquired by an image processing system, an electronic device, a processor, or the like. The original image may be generated by photographing a physical document with a device that has a photographing function, such as a camera or a mobile phone, or by scanning the physical document with a device that has a scanning function, such as a scanner. The generated original image contains at least part of a text region on the physical document, and the physical document may be a paper document or a document made of another material (such as cloth or engraved stone).
Optionally, in this embodiment, the image processing system acquires an original image generated from a paper ticket.
It should be noted that the original image is acquired so that it can be processed subsequently to generate the sample data set.
Step S104, color transformation processing is carried out on the original image to obtain at least one first image.
In step S104, the image processing system may apply color transformation to the entire original image or to only part of it. The partial region may be selected at random from the original image, determined as the region corresponding to a specific color in the original image, or determined in some other way.
Optionally, the color transformation changes the colors in the original image. Specifically, it may change the brightness, contrast, or saturation of the original image, or change its hue (i.e., color).
It should be noted that applying color transformation to the original image, on the one hand, diversifies the resulting first images, which helps enrich the sample data set; on the other hand, it can make some image features in the original image more prominent so that its characters are easier to recognize, which improves working efficiency.
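As a minimal sketch of the kind of color transformation described here (the function name, parameter ranges, and region convention are illustrative assumptions, not taken from the disclosure), a brightness/contrast adjustment over the whole image or a sub-region can be written with numpy as follows:

```python
import numpy as np

def adjust_brightness_contrast(img, brightness=0.0, contrast=1.0, region=None):
    """Apply out = contrast * img + brightness, optionally to a sub-region only.

    img: HxWx3 uint8 array; region: (y0, y1, x0, x1) bounds, or None for the
    whole image. Results are clipped back into the valid uint8 range.
    """
    out = img.astype(np.float32)
    if region is None:
        out = out * contrast + brightness
    else:
        y0, y1, x0, x1 = region
        out[y0:y1, x0:x1] = out[y0:y1, x0:x1] * contrast + brightness
    return np.clip(out, 0, 255).astype(np.uint8)
```

Saturation and hue changes would follow the same pattern after converting the image to an HSV-like representation.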
Step S106, at least one first image is cut to obtain a plurality of first sub-images.
In step S106, the image processing system may crop the first image at random, crop only the text region in the first image, or crop only regions of the first image that do or do not contain a designated color.
It should be noted that obtaining multiple first sub-images by cropping, on the one hand, makes it convenient to obtain multiple target images later; on the other hand, it increases the diversity of scales among the first sub-images, which can further improve the diversity of the target images.
Step S108, pasting the plurality of first sub-images together to obtain a plurality of target images.
In step S108, the image processing system may paste all the first sub-images corresponding to the original image onto a preset canvas, or select some of the first sub-images, randomly or by designation, and paste them onto the preset canvas.
Further, during pasting, the positions of the first sub-images may be randomly distributed, or they may be arranged in a specific order. For example, the image processing system may arrange the first sub-images from a first direction to a second direction according to the number of characters in each first sub-image, or arrange them along the first and second directions respectively according to the character font in each first sub-image (such as handwritten or printed), where the first direction differs from the second direction.
It should be noted that by pasting together multiple first sub-images corresponding to the original image, multiple target images that share part of the same text content but display different image content can be obtained. On the one hand, this effectively achieves data expansion; on the other hand, it effectively increases the number of overlapping samples, which is convenient for training the model.
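A minimal sketch of the random-position pasting step, assuming sub-images are HxWx3 uint8 arrays (the white canvas and its default size are illustrative choices, not values from the disclosure):

```python
import numpy as np

def paste_sub_images(sub_images, canvas_size=(256, 256), rng=None):
    """Paste sub-images at random positions onto a white canvas.

    Later pastes may overlap earlier ones, which is how overlapping-text
    samples arise. Sub-images larger than the canvas are cropped to fit.
    """
    if rng is None:
        rng = np.random.default_rng()
    H, W = canvas_size
    canvas = np.full((H, W, 3), 255, dtype=np.uint8)
    for sub in sub_images:
        h, w = sub.shape[:2]
        y = int(rng.integers(0, max(1, H - h + 1)))
        x = int(rng.integers(0, max(1, W - w + 1)))
        canvas[y:y + h, x:x + w] = sub[:H - y, :W - x]
    return canvas
```

An ordered layout (e.g. by character count per sub-image, as described above) would replace the random draws with a deterministic placement rule.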
Step S110, generating a sample data set based on a plurality of target images, wherein the sample data set is used for training a preset model, and the preset model is at least used for identifying a text region in an image to be identified.
In step S110, the image processing system may first apply further image processing to the plurality of target images, such as geometric transformation, color transformation, or other processing, and then generate the sample data set.
In step S110, generating the sample data set from the plurality of target images effectively enlarges the sample data set while avoiding building it by manually labeling images, thereby saving labeling costs.
At present, because labeling characters is costly, the prior art generally relies on data augmentation to enrich the sample data set used to train the model in related devices and to improve character detection. Classical character detection algorithms include:
(1) EAST: image cropping and scaling.
(2) DBNet: image cropping, scaling, flipping, and rotation.
(3) PSENet: image cropping, scaling, flipping, and rotation.
However, the data augmentation methods used with the above character detection algorithms process the image in simple ways, make insufficient use of the data itself, and do not fully mine its potential, so the sample data set cannot be effectively enlarged. To solve this problem, the present disclosure provides the sample data set generation method described above.
Based on the scheme defined in steps S102 to S110, in the embodiments of the present disclosure a sample data set is generated from a plurality of target images obtained from an original image: the original image is acquired, color transformation is applied to obtain at least one first image, the at least one first image is cropped into a plurality of first sub-images, and the first sub-images are pasted together to obtain a plurality of target images, from which the sample data set is generated. The original image contains at least one text region; the sample data set is used to train a preset model, and the preset model is used at least to recognize text regions in an image to be recognized.
It is easy to note that by applying color transformation and cropping to the original image in sequence, and pasting the cropped first sub-images together, multiple target images that share part of the original image's text content but display different image content can be effectively obtained. In other words, multiple target images matching the model's training requirements can be obtained from one original image, effectively enlarging the training sample set while avoiding building the sample data set through manual labeling, thereby saving labeling costs.
Therefore, the scheme provided by the present disclosure generates a sample data set from a plurality of target images obtained from original images, achieving the technical effect of enlarging the training sample set and solving the technical problem that existing data augmentation methods cannot effectively enlarge a sample data set.
In an alternative embodiment, the image processing system may randomly determine a target processing mode from a plurality of color transformation processing modes, and perform color transformation on the original image based on the target processing mode to obtain at least one first image.
Optionally, a plurality of color transformation processing modes may be preset in the image processing system, a memory, or another device with a storage function. As shown in FIG. 2, the color transformation processing modes include at least a color perturbation mode, a noise addition mode, and a grayscale processing mode, which adjust at least one of the brightness, contrast, saturation, hue, and noise of the original image.
It should be noted that randomly selecting one of the multiple color transformation processing modes to transform the original image enhances different image features in the first images and diversifies them, which helps enrich the sample data set.
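The random selection among the three modes can be sketched as a dispatch over candidate transforms. The concrete transforms and parameter ranges below are illustrative assumptions, not values from the disclosure:

```python
import random

import numpy as np

def perturb_color(img, rng):
    # Color perturbation stand-in: random overall gain on all channels.
    gain = rng.uniform(0.8, 1.2)
    return np.clip(img.astype(np.float32) * gain, 0, 255).astype(np.uint8)

def add_gaussian_noise(img, rng):
    # Noise addition stand-in: additive Gaussian noise with a random sigma.
    sigma = rng.uniform(5.0, 20.0)
    noise = np.random.default_rng(rng.randrange(2**32)).normal(0.0, sigma, img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def to_grayscale3(img, rng):
    # Grayscale stand-in: channel mean, replicated back to 3 channels.
    gray = img.astype(np.float32).mean(axis=2)
    return np.repeat(gray[..., None], 3, axis=2).astype(np.uint8)

MODES = [perturb_color, add_gaussian_noise, to_grayscale3]

def random_color_transform(img, rng=None):
    """Randomly pick one color-transformation mode and apply it to img."""
    if rng is None:
        rng = random.Random()
    return rng.choice(MODES)(img, rng)
```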
In an alternative embodiment, when the target processing mode is the color perturbation mode, the image processing system may randomly enhance color channels of the original image to obtain at least one first image, where the color channels include at least a luminance channel and a saturation channel.
Optionally, after obtaining the original image, the image processing system may randomly select at least one color channel from the HSV (Hue, Saturation, Value) channels, RGB channels, luminance channel, saturation channel, and contrast channel corresponding to the original image, and enhance the selected channel(s) to obtain at least one first image. For example, the RGB channels can be split into an R channel, a G channel, and a B channel; the image processing system may randomly select at least one of them for enhancement, or enhance all of them.
It should be noted that randomly enhancing the color channels of the original image can bring significant gains to some image features of a color image, which, on the one hand, improves the diversity of the target images and, on the other hand, makes it easier for the image processing system to process the first image.
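For example, random enhancement of a single RGB channel could look like the following sketch (the gain range is an assumption for illustration):

```python
import numpy as np

def enhance_random_channel(img, rng=None):
    """Scale one randomly chosen RGB channel of an HxWx3 uint8 image."""
    if rng is None:
        rng = np.random.default_rng()
    c = int(rng.integers(0, 3))            # pick R, G, or B at random
    gain = float(rng.uniform(1.1, 1.5))    # illustrative gain range
    out = img.astype(np.float32)
    out[..., c] *= gain
    return np.clip(out, 0, 255).astype(np.uint8)
```

Enhancing luminance or saturation would work the same way after an RGB-to-HSV conversion.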
In an alternative embodiment, when the target processing mode is the noise addition mode, the image processing system may randomly determine a target noise signal from a plurality of noise signals and superimpose that noise on the original image to obtain at least one first image.
Optionally, the image processing system may randomly choose the target noise signal from noise signals such as Gaussian noise, Poisson noise, multiplicative noise, Rayleigh noise, gamma noise, and salt-and-pepper noise, and then superimpose it on the original image. After determining the target noise signal, the image processing system may further randomly determine its relevant parameter values. For example, when the target noise signal is salt-and-pepper noise, the relevant parameters include at least a signal-to-noise ratio and a pixel value; the image processing system may randomly pick a signal-to-noise ratio in the interval [0, 1] and randomly pick a target pixel value between 0 and 255, and then superimpose the noise on the original image based on the target noise signal and these randomly determined parameter values.
Superimposing noise on the original image, on the one hand, further improves the diversity of the target images and, on the other hand, makes it easier for the image processing system to process the first image.
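The salt-and-pepper case can be sketched as follows; here the signal-to-noise ratio is the fraction of pixels left untouched, and the default range is a narrower assumption than the [0, 1] interval in the text so that the text stays legible:

```python
import numpy as np

def add_salt_pepper(img, snr=None, rng=None):
    """Flip a random fraction of pixels to 0 (pepper) or 255 (salt).

    img: HxWx3 uint8 array. snr: fraction of pixels left untouched; drawn
    at random if not given. Flipped pixels split evenly between 0 and 255.
    """
    if rng is None:
        rng = np.random.default_rng()
    if snr is None:
        snr = float(rng.uniform(0.7, 0.95))
    out = img.copy()
    r = rng.random(img.shape[:2])
    out[(r > snr) & (r <= snr + (1.0 - snr) / 2.0)] = 0    # pepper
    out[r > snr + (1.0 - snr) / 2.0] = 255                 # salt
    return out
```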
In an alternative embodiment, when the target processing mode is the grayscale processing mode, the image processing system may convert the original image into a grayscale image to obtain at least one first image.
Optionally, the image processing system may randomly select a grayscale conversion algorithm from algorithms such as the component method, the maximum-value method, the average method, and the weighted-average method, or convert the original image to grayscale with a preset algorithm.
Converting the original image into a grayscale image, on the one hand, further improves the diversity of the target images and, on the other hand, reduces the dimensionality of the original image, which speeds up the image processing system.
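The four grayscale methods named above can be sketched in a few lines (the BT.601 luma weights for the weighted-average method are a standard choice, not specified by the disclosure):

```python
import numpy as np

def to_gray(img, method="weighted"):
    """Grayscale conversion of an HxWx3 RGB image by one of four methods."""
    f = img.astype(np.float32)
    if method == "weighted":        # weighted-average method (BT.601 luma)
        g = 0.299 * f[..., 0] + 0.587 * f[..., 1] + 0.114 * f[..., 2]
    elif method == "average":       # average method
        g = f.mean(axis=2)
    elif method == "max":           # maximum-value method
        g = f.max(axis=2)
    else:                           # component method: keep a single channel
        g = f[..., 0]
    return np.clip(g, 0, 255).astype(np.uint8)
```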
In an alternative embodiment, after obtaining the at least one first image, the image processing system determines a cropping number corresponding to each first image in the at least one first image, so as to crop each first image into a plurality of images based on the cropping number, thereby obtaining a plurality of first sub-images. The cropping number represents the number of images obtained by cropping each first image.
The cropping number corresponding to each first image may be the same or different. Specifically, the image processing system may randomly select a value within a preset value range and use it as the cropping number, or may use a fixed value preset in the system as the cropping number. For example, before the image processing system obtains the original images, an operator may determine the type to which each original image belongs (such as a traffic invoice, a hotel invoice, and the like) and label each original image with its type through the image processing system or a third-party server, so that when the image processing system determines the cropping number for a first image, the corresponding value range or fixed value can be chosen according to the type label.
Optionally, the operator may also divide the size of the text region in the first image into multiple types in the image processing system, and set a corresponding cropping number for each type, so that the image processing system may determine the cropping number based on the size of the text region in the first image.
It should be noted that determining the cropping number corresponding to each first image thereby determines the number of first sub-images that will be obtained.
In an alternative embodiment, in obtaining the plurality of first sub-images, the image processing system may first detect a target text region in the at least one first image and then crop each first image based on the cropping number and the target text region to obtain the plurality of first sub-images, wherein the target text region is located in any one of the plurality of first sub-images.
Optionally, the image processing system randomly crops the first image into a number of first sub-images equal to the cropping number; each first sub-image carries a portion of the text content of the target text region, and the region sizes of the first sub-images may differ. Specifically, during cropping the image processing system maintains the integrity of the fields within the target text region, that is, it does not cut through a field or through content associated with a field. For example, a bill may show "drawer: Zhang San". When "drawer: Zhang San" is treated as a single field, the system does not cut through it, so as not to destroy the field structure; when "drawer:" and "Zhang San" are each treated as separate fields, they are associated contents, and the system does not cut between "drawer:" and "Zhang San", so as not to destroy the meaning of the field.
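A much-simplified sketch of this kind of cropping might look as follows (the names and behaviour are assumptions; in particular, the field-integrity check is deliberately omitted, and each crop is merely forced to overlap the target text region):

```python
import numpy as np

def random_crops_covering(img, text_box, n, rng=None):
    """Crop `img` into `n` randomly sized sub-images that each overlap the
    target text region `text_box` = (x0, y0, x1, y1).

    Each crop is forced to contain a random point inside the text region,
    so every first sub-image carries part of its content.  The field
    integrity check (never splitting a field such as "drawer: Zhang San"
    from its value) is left out of this sketch.
    """
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    x0, y0, x1, y1 = text_box
    crops = []
    for _ in range(n):
        px = rng.integers(x0, x1)        # a point of the text region that
        py = rng.integers(y0, y1)        # this crop must contain
        cx0, cy0 = rng.integers(0, px + 1), rng.integers(0, py + 1)
        cx1, cy1 = rng.integers(px + 1, w + 1), rng.integers(py + 1, h + 1)
        crops.append(img[cy0:cy1, cx0:cx1])
    return crops
```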
It should be noted that, by distributing the target text region across the plurality of first sub-images, each first sub-image is guaranteed to contain valid information on the one hand, that is, each target image is guaranteed to be valid data; on the other hand, the target text region is prevented from appearing in only one first sub-image, which guarantees the diversity of the target images.
In an alternative embodiment, after obtaining the plurality of first sub-images, the image processing system determines at least one second image forming each target image from the plurality of first sub-images, and randomly determines an image position of the at least one second image on the corresponding target image, so that the at least one second image is fitted based on the image position to obtain the plurality of target images.
Optionally, in this embodiment, the image processing system may determine any number of second images from a plurality of first sub-images corresponding to the same first image, or may determine any number of second images from pluralities of first sub-images corresponding to different first images. After the second images are determined, the image processing system randomly determines the image position of each second image on the canvas and may attach each second image to the canvas based on the CutMix method (a method in which a portion of one image is cut out and pasted onto another image) to obtain the plurality of target images.
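A minimal sketch of this fitting step might look as follows (the white canvas, the grayscale patches, and the rule that later patches overwrite earlier ones are assumptions made for illustration; CutMix itself pastes a patch cut from one image onto another):

```python
import numpy as np

def fit_onto_canvas(sub_images, canvas_shape, rng=None):
    """Attach each second image to a blank canvas at a random position, in
    the spirit of CutMix.  Because the positions are drawn at random, the
    same set of second images can yield many different target images."""
    rng = rng or np.random.default_rng()
    h, w = canvas_shape
    canvas = np.full((h, w), 255, dtype=np.uint8)  # white canvas (assumption)
    for sub in sub_images:
        sh, sw = sub.shape[:2]
        # random top-left corner, chosen so the whole patch fits
        y = rng.integers(0, h - sh + 1)
        x = rng.integers(0, w - sw + 1)
        canvas[y:y + sh, x:x + sw] = sub
    return canvas
```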
It should be noted that, because the image positions of the second images on the target image are determined at random, even if the same set of second images is selected, the target images they generate differ, so the sample size of the target images can be further increased.
In an alternative embodiment, as shown in fig. 2, after obtaining the plurality of target images, the image processing system may perform geometric transformation processing on the plurality of target images to obtain a plurality of third images, so as to construct a sample data set based on the plurality of third images.
Optionally, the geometric transformation processing performed on the multiple target images by the image processing system may be stretching, rotating, mirroring, perspective transformation, or the like performed on the target images.
It should be noted that, by performing geometric transformation processing on the target image, the point coordinate information of the target image can be effectively enriched, thereby achieving the effect of enriching the sample data set.
In an alternative embodiment, the image processing system may perform the geometric transformation processing on the plurality of target images in any one or more of the following ways: performing a rotation operation on the plurality of target images; performing an affine transformation operation on the plurality of target images; and performing a perspective transformation operation on the plurality of target images.
Specifically, in the present embodiment, as shown in fig. 2, the image processing system may randomly select at least one operation from rotation, affine transformation, and perspective transformation to geometrically transform the target image. The rotation operation rotates the target image by a random angle; in this embodiment the angle takes the value 10° or -10°. The affine transformation stretches opposite corners of the target image, with the degree of stretching determined by a displacement value or displacement range preset in the system. The perspective transformation mimics the appearance of the target image as seen from different viewing perspectives.
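As an illustration, the rotation operation might be sketched as follows (a NumPy-only sketch using inverse mapping with nearest-neighbour sampling; a real system would typically call a library routine such as an OpenCV warp, and the function name here is assumed):

```python
import numpy as np

def rotate_about_center(img, angle_deg):
    """Rotate a single-channel image about its centre by `angle_deg`
    degrees, using inverse mapping with nearest-neighbour sampling.
    In the embodiment above the angle is drawn from {+10, -10} degrees."""
    h, w = img.shape
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # for each output pixel, locate its source pixel under the inverse rotation
    xsrc = np.round(c * (xs - cx) + s * (ys - cy) + cx).astype(int)
    ysrc = np.round(-s * (xs - cx) + c * (ys - cy) + cy).astype(int)
    valid = (xsrc >= 0) & (xsrc < w) & (ysrc >= 0) & (ysrc < h)
    out = np.zeros_like(img)  # pixels rotated in from outside stay black
    out[ys[valid], xs[valid]] = img[ysrc[valid], xsrc[valid]]
    return out
```

Affine and perspective transforms follow the same inverse-mapping pattern, with the 2x2 rotation block replaced by a general affine matrix or a homography.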
It should be noted that performing geometric transformation on the target images through rotation, affine transformation, or perspective transformation enriches the sample data set on the one hand; on the other hand, the resulting third images better match the image patterns that arise in real scenes, so that a model trained on the sample data set constructed from the third images achieves a better detection effect.
It should be noted that the method and the device can significantly improve the text detection effect of the related equipment, and they are particularly suitable for cases with a small number of samples, effectively saving labeling cost.
Therefore, the scheme provided by the present disclosure achieves the purpose of generating a sample data set by obtaining a plurality of target images based on the original image, thereby realizing the technical effect of enlarging the training sample set and solving the technical problem that existing data augmentation methods cannot effectively enlarge the sample data set.
Example 2
According to an embodiment of the present disclosure, an embodiment of a device for generating a sample data set is provided, where fig. 3 is a schematic diagram of the device for generating a sample data set according to an embodiment of the present disclosure, as shown in fig. 3, the device includes:
an obtaining module 302, configured to obtain an original image, where the original image at least includes a text region;
a transformation module 304, configured to perform color transformation processing on an original image to obtain at least one first image;
the cropping module 306 is configured to crop at least one first image to obtain a plurality of first sub-images;
a fitting module 308, configured to perform fitting processing on the plurality of first sub-images to obtain a plurality of target images;
the generating module 310 is configured to generate a sample data set based on the multiple target images, where the sample data set is used to train a preset model, and the preset model is at least used to identify a text region in the image to be identified.
It should be noted that the obtaining module 302, the transforming module 304, the cropping module 306, the fitting module 308, and the generating module 310 correspond to steps S102 to S110 in the foregoing embodiment. The five modules implement the same examples and application scenarios as the corresponding steps, but they are not limited to the disclosure of embodiment 1.
Optionally, the transformation module further includes: a first determining module, configured to randomly determine a target processing manner from a plurality of color transformation processing manners; and the first sub-transformation module is used for carrying out color transformation on the original image based on the target processing mode to obtain at least one first image.
Optionally, the first sub-transform module further includes: the first processing module is configured to, when the target processing mode is a color perturbation processing mode, randomly enhance a color channel of an original image to obtain at least one first image, where the color channel at least includes: a luminance channel, a saturation channel.
Optionally, the first sub-transform module further includes: a second determining module, configured to randomly determine a target noise signal from the multiple noise signals when the target processing mode is a noise adding mode; and the second processing module is used for carrying out noise superposition processing on the original image based on the target noise signal to obtain at least one first image.
Optionally, the first sub-transform module further includes: and the third processing module is used for converting the original image into a gray image to obtain at least one first image when the target processing mode is a gray processing mode.
Optionally, the cropping module further includes: a third determining module, configured to determine a cropping number corresponding to each first image in the at least one first image, where the cropping number represents the number of images obtained by cropping each first image; and a first sub-cropping module, configured to crop each first image into a plurality of images based on the cropping number to obtain a plurality of first sub-images.
Optionally, the first sub-clipping module further includes: a detection module for detecting a target text region in at least one first image; and the second sub-cropping module is used for cropping each first image based on the cropping number and the target text area to obtain a plurality of first sub-images, wherein the target text area is positioned in any one of the plurality of first sub-images.
Optionally, the fitting module further includes: a fourth determining module, configured to determine at least one second image constituting each target image from the plurality of first sub-images; a fifth determining module, configured to randomly determine an image position of the at least one second image on the corresponding target image; and a first sub-fitting module, configured to fit the at least one second image based on the image position to obtain a plurality of target images.
Optionally, the apparatus for generating a sample data set further includes: the fourth processing module is used for carrying out geometric transformation processing on the plurality of target images to obtain a plurality of third images; and the construction module is used for constructing the sample data set based on the plurality of third images.
Optionally, the apparatus for generating a sample data set further includes: a fifth processing module, configured to perform geometric transformation processing on the multiple target images through any one or more of the following manners: rotating a plurality of target images; carrying out affine transformation operation on a plurality of target images; and performing perspective transformation operation on the plurality of target images.
Example 3
The present disclosure also provides an electronic device, a non-transitory computer readable storage medium storing computer instructions, and a computer program product according to embodiments of the present disclosure.
FIG. 4 shows a schematic block diagram of an example electronic device 400 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 4, the apparatus 400 includes a computing unit 401 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM)402 or a computer program loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the device 400 can also be stored. The calculation unit 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
A number of components in device 400 are connected to I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, or the like; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408 such as a magnetic disk, optical disk, or the like; and a communication unit 409 such as a network card, modem, wireless communication transceiver, etc. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
Computing unit 401 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 401 executes the respective methods and processes described above, such as the generation method of the sample data set. For example, in some embodiments, the method of generating the sample data set may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into the RAM 403 and executed by the computing unit 401, one or more steps of the method of generating a sample data set described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the method of generating the sample data set by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combining a blockchain.
It should be understood that various forms of the flows shown above, reordering, adding or deleting steps, may be used. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (20)

1. A method for generating a sample data set comprises the following steps:
acquiring an original image, wherein the original image at least comprises a text area;
carrying out color transformation processing on the original image to obtain at least one first image;
cutting the at least one first image to obtain a plurality of first sub-images;
performing fitting processing on the plurality of first sub-images to obtain a plurality of target images;
generating a sample data set based on the target images, wherein the sample data set is used for training a preset model, and the preset model is at least used for identifying a text region in the image to be identified.
2. The method of claim 1, wherein color transforming the original image to obtain at least one first image comprises:
randomly determining a target processing mode from a plurality of color transformation processing modes;
and performing color transformation on the original image based on the target processing mode to obtain the at least one first image.
3. The method of claim 2, wherein color transforming the original image based on the target processing mode to obtain the at least one first image comprises:
when the target processing mode is a color disturbance processing mode, randomly enhancing a color channel of the original image to obtain the at least one first image, wherein the color channel at least comprises: a luminance channel, a saturation channel.
4. The method of claim 2, wherein color transforming the original image based on the target processing mode to obtain the at least one first image comprises:
when the target processing mode is a noise adding mode, randomly determining a target noise signal from a plurality of noise signals;
and performing noise superposition processing on the original image based on the target noise signal to obtain the at least one first image.
5. The method of claim 2, wherein color transforming the original image based on the target processing mode to obtain the at least one first image comprises:
and when the target processing mode is a gray scale processing mode, converting the original image into a gray scale image to obtain the at least one first image.
6. The method of claim 1, wherein cropping the at least one first image to obtain a plurality of first sub-images comprises:
determining a cropping quantity corresponding to each first image in the at least one first image, wherein the cropping quantity represents the quantity of images obtained by cropping each first image;
and cutting each first image into a plurality of images based on the cutting quantity to obtain a plurality of first sub-images.
7. The method of claim 6, wherein cropping the each first image into a plurality of images based on the cropping number, resulting in the plurality of first sub-images, comprises:
detecting a target text region in the at least one first image;
and cropping each first image based on the cropping number and the target text area to obtain the plurality of first sub-images, wherein the target text area is located in any one of the plurality of first sub-images.
8. The method of claim 1, wherein the fitting the plurality of first sub-images to obtain a plurality of target images comprises:
determining at least one second image constituting each target image from the plurality of first sub-images;
randomly determining an image position of the at least one second image on the corresponding target image;
and fitting the at least one second image based on the image position to obtain the plurality of target images.
9. The method of claim 1, wherein after performing the fitting processing on the plurality of first sub-images to obtain a plurality of target images, the method further comprises:
performing geometric transformation processing on the plurality of target images to obtain a plurality of third images;
constructing the sample data set based on the plurality of third images.
10. The method of claim 9, wherein the method further comprises:
performing geometric transformation processing on the plurality of target images by any one or more of the following modes:
performing a rotation operation on the plurality of target images;
performing affine transformation operation on the plurality of target images;
and carrying out perspective transformation operation on the plurality of target images.
11. An apparatus for generating a sample data set, comprising:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring an original image, and the original image at least comprises a text area;
the transformation module is used for carrying out color transformation processing on the original image to obtain at least one first image;
the cutting module is used for cutting the at least one first image to obtain a plurality of first sub-images;
the fitting module is used for performing fitting processing on the plurality of first sub-images to obtain a plurality of target images;
and the generating module is used for generating a sample data set based on the target images, wherein the sample data set is used for training a preset model, and the preset model is at least used for identifying a text region in the image to be identified.
12. The apparatus of claim 11, wherein the transformation module further comprises:
a first determining module for randomly determining a target processing mode from a plurality of color transformation processing modes;
and the first sub-transformation module is used for carrying out color transformation on the original image based on the target processing mode to obtain the at least one first image.
13. The apparatus of claim 12, wherein the first sub-transform module further comprises:
a first processing module, configured to, when the target processing mode is a color perturbation processing mode, perform random enhancement on a color channel of the original image to obtain the at least one first image, where the color channel at least includes: a luminance channel, a saturation channel.
14. The apparatus of claim 12, wherein the first sub-transform module further comprises:
a second determining module, configured to randomly determine a target noise signal from the multiple noise signals when the target processing mode is a noise adding mode;
and the second processing module is used for carrying out noise superposition processing on the original image based on the target noise signal to obtain the at least one first image.
15. The apparatus of claim 12, wherein the first sub-transformation module further comprises:
and the third processing module is used for converting the original image into a gray image to obtain the at least one first image when the target processing mode is a gray processing mode.
16. The apparatus of claim 11, wherein the cropping module further comprises:
a third determining module, configured to determine a cropping number corresponding to each first image in the at least one first image, where the cropping number represents a number of images obtained by cropping each first image;
and the first sub-cropping module is used for cropping each first image into a plurality of images based on the cropping number to obtain a plurality of first sub-images.
17. The apparatus of claim 16, wherein the first sub-clipping module further comprises:
a detection module to detect a target text region in the at least one first image;
and the second sub-cropping module is used for cropping each first image based on the cropping number and the target text region to obtain the plurality of first sub-images, wherein the target text region is positioned in any one of the plurality of first sub-images.
18. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of generating a sample data set of any of claims 1 to 10.
19. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of generating a sample data set according to any one of claims 1 to 10.
20. A computer program product comprising a computer program which, when executed by a processor, implements a method of generating a sample dataset according to any one of claims 1 to 10.
CN202210148525.4A 2022-02-17 2022-02-17 Sample data set generation method and device and electronic equipment Pending CN114529909A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210148525.4A CN114529909A (en) 2022-02-17 2022-02-17 Sample data set generation method and device and electronic equipment


Publications (1)

Publication Number Publication Date
CN114529909A true CN114529909A (en) 2022-05-24

Family

ID=81623208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210148525.4A Pending CN114529909A (en) 2022-02-17 2022-02-17 Sample data set generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114529909A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082598A (en) * 2022-08-24 2022-09-20 北京百度网讯科技有限公司 Text image generation method, text image training method, text image processing method and electronic equipment
WO2024040870A1 (en) * 2022-08-24 2024-02-29 北京百度网讯科技有限公司 Text image generation, training, and processing methods, and electronic device
CN116091748A (en) * 2023-04-10 2023-05-09 环球数科集团有限公司 AIGC-based image recognition system and device

Similar Documents

Publication Publication Date Title
CN114529909A (en) Sample data set generation method and device and electronic equipment
CN103208004A (en) Automatic recognition and extraction method and device for bill information area
KR20210156228A (en) Optical character recognition method, device, electronic equipment and storage medium
CN108345888A (en) A kind of connected domain extracting method and device
CN110866900A (en) Water body color identification method and device
CN113822817A (en) Document image enhancement method and device and electronic equipment
CN113362420A (en) Road marking generation method, device, equipment and storage medium
CN110263301B (en) Method and device for determining color of text
WO2023071119A1 (en) Character detection and recognition method and apparatus, electronic device, and storage medium
CN112883926A (en) Identification method and device for table medical images
CN113408251A (en) Layout document processing method and device, electronic equipment and readable storage medium
CN111724396A (en) Image segmentation method and device, computer-readable storage medium and electronic device
CN111784703A (en) Image segmentation method and device, electronic equipment and storage medium
CN109543525B (en) Table extraction method for general table image
CN108877030B (en) Image processing method, device, terminal and computer readable storage medium
US10963690B2 (en) Method for identifying main picture in web page
CN113487473A (en) Method and device for adding image watermark, electronic equipment and storage medium
CN113326766A (en) Training method and device of text detection model and text detection method and device
CN116645678A (en) Image processing method and device based on vector graphics drawing
CN114511862B (en) Form identification method and device and electronic equipment
Salunkhe et al. Recognition of multilingual text from signage boards
CN115797661A (en) Image processing method and device, electronic device and storage medium
CN115937039A (en) Data expansion method and device, electronic equipment and readable storage medium
CN113361371A (en) Road extraction method, device, equipment and storage medium
CN109242750B (en) Picture signature method, picture matching method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination