CN114140811A - Certificate sample generation method and device, electronic equipment and storage medium


Info

Publication number
CN114140811A
Authority
CN
China
Prior art keywords
image
certificate
character
sample
gaussian
Legal status
Pending
Application number
CN202111300632.6A
Other languages
Chinese (zh)
Inventor
陈卓
孙智彬
夏曙东
杨晓明
胡道生
Current Assignee
Beijing Transwiseway Information Technology Co Ltd
Original Assignee
Beijing Transwiseway Information Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a certificate sample generation method and device, electronic equipment and a storage medium. The method comprises the following steps: acquiring a certificate character set and generating a background-free first character image; randomly generating a Gaussian kernel, and performing Gaussian filtering and labeling on the first character image according to the Gaussian kernel to obtain a second character image; acquiring a certificate background image; embedding the second character image into the certificate background image to generate a certificate sample image; and outputting all the certificate sample images and their labels. In this way, first character images are generated directly from a character set, multiple labeled second character images are obtained directly through Gaussian filtering, and further certificate sample images are generated by embedding them into certificate background images. Consequently, there is no need to obtain sample pictures involving personal privacy, to perform complex feature extraction that consumes large amounts of computing resources, or to label samples manually, which consumes large amounts of human resources.

Description

Certificate sample generation method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of image recognition, and in particular to a certificate sample generation method and device, electronic equipment and a storage medium.
Background
At present, image-based character recognition models require a large number of training samples, and the accuracy of such a model depends directly on the quantity and quality of its training samples. For certificate recognition, however, certificate images readily involve personal privacy, so large batches of training samples are difficult to obtain.
In existing solutions, a batch of sample pictures is generally provided, complex feature extraction is performed on it to obtain new picture samples, and the new samples are then labeled manually and used as training samples. With this approach, however, a large number of sample pictures must still be provided to obtain enough training samples, and the feature extraction process is complex and consumes a large amount of computing resources; in addition, manual labeling consumes a large amount of human resources.
Disclosure of Invention
The invention addresses the problem that conventional ways of generating sample pictures consume large amounts of computing and human resources.
To solve the above problem, a first aspect of the present invention provides a certificate sample generation method, including:
acquiring a certificate character set and generating a first character image without a background;
generating a Gaussian kernel randomly, and carrying out Gaussian filtering and labeling on the first character image according to the Gaussian kernel to obtain a second character image;
acquiring a certificate background image;
embedding the second character image into the certificate background image to generate a certificate sample image;
and outputting all the certificate sample images and labels thereof.
Preferably, acquiring the certificate character set and generating the background-free first character image includes:
acquiring a certificate character set, and randomly extracting characters from the certificate character set;
and generating a background-free first character image from the extracted characters according to the font style of the certificate.
Preferably, randomly generating a Gaussian kernel and performing Gaussian filtering and labeling on the first character image according to the Gaussian kernel to obtain a second character image includes:
randomly generating a plurality of Gaussian kernels, and performing Gaussian filtering on the first character image based on each Gaussian kernel to obtain second character images;
acquiring a preset first threshold, and comparing the size of the Gaussian kernel with the first threshold;
if the Gaussian kernel is larger than the first threshold, marking the second character image obtained by filtering with that Gaussian kernel as unavailable;
and if the Gaussian kernel is less than or equal to the first threshold, marking the second character image obtained by filtering with that Gaussian kernel as available.
Preferably, after randomly generating the Gaussian kernel and performing Gaussian filtering and labeling on the first character image according to the Gaussian kernel to obtain the second character image, the method further includes:
adjusting the brightness and the contrast of the second character image.
Preferably, adjusting the brightness and the contrast of the second character image includes:
acquiring a preset parameter value domain, and randomly generating a plurality of groups of brightness and contrast adjustment parameters within the parameter value domain;
and calculating the pixel values of the adjusted second character image according to each group of brightness and contrast adjustment parameters.
Preferably, embedding the second character image into the certificate background image to generate a certificate sample image includes:
acquiring a character distribution range on the certificate background image;
and embedding the second character image at a random position in the character distribution range to obtain a certificate sample image.
Preferably, before outputting all the certificate sample images and their labels, the method further includes:
performing data enhancement processing on the certificate sample images.
A second aspect of the present invention provides a certificate sample generation device, including:
the generating unit is used for acquiring a certificate character set and generating a background-free first character image;
the filtering unit is used for randomly generating a Gaussian kernel, and performing Gaussian filtering and labeling on the first character image according to the Gaussian kernel to obtain a second character image;
the acquisition unit is used for acquiring a certificate background image;
the embedding unit is used for embedding the second character image into the certificate background image to generate a certificate sample image;
and the output unit is used for outputting all the certificate sample images and the labels thereof.
A third aspect of the present invention provides an electronic device including a processor and a computer-readable storage medium storing a computer program; when the computer program is read and executed by the processor, it implements the certificate sample generation method described above.
A fourth aspect of the present invention provides a computer-readable storage medium storing a computer program which, when read and executed by a processor, implements the certificate sample generation method described above.
According to the invention, first character images are generated directly from a character set, multiple labeled second character images are obtained directly through Gaussian filtering, and further certificate sample images are generated by embedding them into certificate background images. Samples are thus generated automatically for a specific certificate without obtaining background feature data and character feature data through complex algorithms, which simplifies sample generation; moreover, the generated samples carry labels and can be used directly for model training, which reduces manual labeling.
Therefore, there is no need to obtain sample pictures involving personal privacy, to perform complex feature extraction that consumes large amounts of computing resources, or to label samples manually, which consumes large amounts of human resources.
Drawings
FIG. 1 is a flowchart of a certificate sample generation method according to an embodiment of the present invention;
FIG. 2 is a flowchart of step S10 of the certificate sample generation method according to an embodiment of the present invention;
FIG. 3 is a flowchart of step S20 of the certificate sample generation method according to an embodiment of the present invention;
FIG. 4 is a flowchart of a certificate sample generation method according to another embodiment of the present invention;
FIG. 5 is a flowchart of step S30 of the certificate sample generation method according to an embodiment of the present invention;
FIG. 6 is a flowchart of a certificate sample generation method according to a further embodiment of the present invention;
FIG. 7 is a flowchart of step S60 of the certificate sample generation method according to an embodiment of the present invention;
FIG. 8 is a flowchart of a certificate sample generation method according to yet another embodiment of the present invention;
FIG. 9 is a block diagram of a certificate sample generation device according to an embodiment of the present invention;
FIG. 10 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
At present, image-based character recognition models require a large number of training samples, and the accuracy of such a model depends directly on the quantity and quality of its training samples. For certificate recognition (for example, of a vehicle driving license), however, certificates record a great deal of private information and certificate images readily involve privacy, so it is difficult to acquire a large number of training samples, or indeed any training samples at all.
Existing solutions to the shortage of sample images generally first provide a batch of sample images, perform complex feature extraction on them to obtain new image samples, and then manually label each new sample for use as training data. With this approach, however, a large number of sample pictures must still be provided to obtain enough training samples, and the feature extraction process is complex and consumes a large amount of computing resources; in addition, manual labeling consumes a large amount of human resources.
The embodiments of the present application provide a certificate sample generation method, which can be executed by a certificate sample generation device; the device can be integrated in electronic equipment such as a computer or a server. FIG. 1 is a flowchart of a certificate sample generation method according to an embodiment of the present invention; the method comprises the following steps:
S10, acquiring a certificate character set and generating a background-free first character image;
The certificate character set may be generated according to the character features on a certificate. For example, a character set representing names may be generated from a list of family names and a Chinese dictionary, and other character sets, such as one representing nationality, may be generated from a list of Chinese nationalities (ethnic groups). These character sets may be merged into one certificate character set, or the certificate character set may contain several subsets from which different character combinations, such as names, genders and nationalities, are generated. Alternatively, the certificate character set may be defined according to actual conditions or read directly from a set published by a specific platform.
Preferably, the first character images are generated mainly from fields with many possible combinations, such as names and addresses. Fields such as gender and nationality yield too few distinct character images, so favoring the richer fields produces more first character images and maximizes the number of generated certificate samples.
S20, generating a Gaussian kernel randomly, and carrying out Gaussian filtering and labeling on the first character image according to the Gaussian kernel to obtain a second character image;
Gaussian filtering is a linear smoothing filter that is well suited to removing Gaussian noise and is widely used for noise reduction in image processing. It computes a weighted average over the whole image: the value of each pixel is replaced by a weighted average of that pixel and the other pixels in its neighborhood. Concretely, a template (also called a convolution kernel or mask) is scanned over every pixel of the image, and the weighted average of the pixels in the neighborhood defined by the template replaces the value of the pixel at the center of the template.
Through Gaussian filtering, second character images of different sharpness can be obtained depending on the size of the Gaussian kernel: the larger the kernel, the lower the sharpness of the resulting second character image. Different Gaussian kernels can therefore be used to generate second character images of different sharpness, and these images can be labeled directly according to their Gaussian kernels.
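The weighted-average operation described above can be sketched in Python. This is a minimal illustration, not the patent's implementation, and it rests on several assumptions: images are single-channel grayscale lists of lists (a three-channel image would be filtered per channel), the filter is applied separably as a horizontal pass followed by a vertical pass, borders replicate the edge pixel, and when no sigma is given it is derived from the kernel size with the rule OpenCV documents for its Gaussian kernel (sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8).

```python
import math
from typing import List, Optional

Image = List[List[float]]  # single-channel grayscale image, rows of pixel values


def gaussian_kernel_1d(ksize: int, sigma: Optional[float] = None) -> List[float]:
    """Return normalized 1-D Gaussian weights for a positive odd kernel size."""
    if ksize <= 0 or ksize % 2 == 0:
        raise ValueError("kernel size must be a positive odd integer")
    if sigma is None:
        # Assumption: derive sigma from the size with OpenCV's documented rule.
        sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8
    half = ksize // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]  # weights sum to 1, so flat areas keep their value


def gaussian_blur(img: Image, ksize: int, sigma: Optional[float] = None) -> Image:
    """Replace each pixel by the Gaussian-weighted average of its neighborhood."""
    k = gaussian_kernel_1d(ksize, sigma)
    half = ksize // 2
    h, w = len(img), len(img[0])

    def clamp(v: int, hi: int) -> int:
        return min(max(v, 0), hi)  # replicate edge pixels at the border

    # Horizontal pass: weighted average along each row.
    tmp = [[sum(k[j] * img[y][clamp(x + j - half, w - 1)] for j in range(ksize))
            for x in range(w)] for y in range(h)]
    # Vertical pass on the row-filtered result; two 1-D passes equal one 2-D Gaussian.
    return [[sum(k[j] * tmp[clamp(y + j - half, h - 1)][x] for j in range(ksize))
             for x in range(w)] for y in range(h)]
```

Blurring the same first character image with increasingly large odd kernel sizes yields progressively less sharp second character images, which is the property that the kernel-size labeling relies on.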
S40, acquiring a certificate background image;
For a certificate such as a vehicle driving license, the background that remains after the portrait and the text are removed is the same across certificates. Certificates issued in the same period share the same background; certificates from different periods may differ, but backgrounds are replaced uniformly according to the specification, so unless a certificate falls within a transition period its background can be regarded as identical. Differences in how certificates are used or stored may, however, give them overall differences in color or in how new or worn they look.
Therefore, the certificate background image can be obtained simply and conveniently by directly processing the certificate itself, or other corresponding certificates acquired without infringing anyone's rights.
At least one certificate background image is acquired, preferably one for each background style, which keeps acquisition simple and the images convenient to use.
S60, embedding the second character image into the certificate background image to generate a certificate sample image;
The specific embedding mode can be selected according to actual requirements. After the second character image is embedded into the certificate background image, the label of the second character image becomes the label of the generated certificate sample image.
S80, outputting all the certificate sample images and their labels.
Thus, first character images are generated directly from a character set, multiple labeled second character images are obtained directly through Gaussian filtering, and further certificate sample images are generated by embedding them into certificate background images. Samples are thus generated automatically for a specific certificate without obtaining background feature data and character feature data through complex algorithms, which simplifies sample generation; moreover, the generated samples carry labels and can be used directly for model training, which reduces manual labeling.
Therefore, there is no need to obtain sample pictures involving personal privacy, to perform complex feature extraction that consumes large amounts of computing resources, or to label samples manually, which consumes large amounts of human resources.
Preferably, as shown in FIG. 2, S10 (acquiring the certificate character set and generating a background-free first character image) includes:
S11, acquiring a certificate character set, and randomly extracting characters from the certificate character set;
The certificate character set may be generated according to the character features on a certificate. For example, a character set representing names may be generated from a list of family names and a Chinese dictionary, and other character sets, such as one representing nationality, may be generated from a list of Chinese nationalities (ethnic groups). These character sets may be merged into one certificate character set, or the certificate character set may contain several subsets from which different character combinations, such as names, genders and nationalities, are generated. Alternatively, the certificate character set may be defined according to actual conditions or read directly from a set published by a specific platform.
The randomly extracted characters are preferably drawn from fields with many possible combinations, such as names and addresses. Fields such as gender and nationality yield too few distinct character images, so favoring the richer fields produces more first character images and maximizes the number of generated certificate samples.
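One way to picture S11 is to keep the certificate character set as named subsets and draw N1 random field combinations from them. In the sketch below, the subset contents, the field names and the one-to-two-character given name are hypothetical placeholders; a real character set would be built from a family-name list, a Chinese dictionary, a nationality list, or a published set, as described above.

```python
import random
from typing import Dict, List

# Hypothetical subsets of a certificate character set. A real set would be built
# from a family-name list, a Chinese dictionary, a nationality list, etc.
CHARACTER_SET: Dict[str, List[str]] = {
    "surname": ["王", "李", "张", "刘", "陈"],
    "given": ["伟", "芳", "娜", "敏", "静", "强", "磊", "洋"],
    "gender": ["男", "女"],
    "nationality": ["汉", "回", "满", "蒙古"],
}


def random_name(rng: random.Random, max_given: int = 2) -> str:
    """Draw one surname followed by 1..max_given given-name characters."""
    n_given = rng.randint(1, max_given)
    given = "".join(rng.choice(CHARACTER_SET["given"]) for _ in range(n_given))
    return rng.choice(CHARACTER_SET["surname"]) + given


def draw_combinations(n1: int, seed: int = 0) -> List[Dict[str, str]]:
    """Randomly draw N1 field combinations, one per first character image."""
    rng = random.Random(seed)  # seeded so a generated batch can be reproduced
    return [
        {
            "name": random_name(rng),
            "gender": rng.choice(CHARACTER_SET["gender"]),
            "nationality": rng.choice(CHARACTER_SET["nationality"]),
        }
        for _ in range(n1)
    ]


if __name__ == "__main__":
    for combo in draw_combinations(3, seed=42):
        print(combo)
```

Each drawn combination would then be rendered in the certificate's font to give one background-free first character image (S12); the name field contributes most of the variety, in line with the preference for fields with many combinations.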
S12, generating a background-free first character image from the extracted characters according to the font style of the certificate;
Here, a background-free image is one in which every pixel outside the character strokes of the first character image has the value 0.
Preferably, the first character image is stored in PNG format, with the value of each pixel represented by three channels (red, green, blue). The three channel values of the character pixels are all 255, and those of the remaining pixels are all 0.
If N1 combinations are randomly drawn, N1 first character images can be generated.
Generating the first character images from randomly extracted characters therefore has two benefits: the randomness can improve the accuracy of subsequent training, and because as many character combinations as possible can be drawn at random, the number of first character images increases greatly. Random extraction is also simple, so enough first character images can be obtained with very few computing resources.
Preferably, as shown in FIG. 3, S20 (randomly generating a Gaussian kernel, and performing Gaussian filtering and labeling on the first character image according to the Gaussian kernel to obtain a second character image) includes:
S21, randomly generating a plurality of Gaussian kernels, and performing Gaussian filtering on the first character image based on each Gaussian kernel to obtain second character images;
It should be noted that N2 Gaussian kernels can be randomly generated and each first character image filtered with each of them, so that N2 second character images are obtained per first character image; N1 first character images therefore yield N1 × N2 second character images.
The size of each Gaussian kernel is a randomly generated positive odd number.
The size of the Gaussian kernel controls the sharpness of the characters in the generated image: the larger the Gaussian kernel, the lower the sharpness.
S22, acquiring a preset first threshold, and comparing the size of the Gaussian kernel with the first threshold;
The value of the first threshold may be determined according to actual conditions or selected based on experimental results.
S23, if the Gaussian kernel is larger than the first threshold, marking the second character image obtained by filtering with that Gaussian kernel as unavailable;
The larger the Gaussian kernel, the lower the sharpness; a kernel larger than the first threshold means the sharpness is too low for the characters to be recognized, so the second character image is marked as unavailable.
S24, if the Gaussian kernel is less than or equal to the first threshold, marking the second character image obtained by filtering with that Gaussian kernel as available.
Whether a result is usable can thus be determined from the size of the Gaussian kernel alone. Each Gaussian kernel can therefore be marked as usable or unusable as soon as it is generated, and every second character image produced with that kernel simply inherits the mark.
Because each second character image is labeled directly through its Gaussian kernel, the judgment is fully automatic, requires no manual labor, and is simple and convenient.
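Steps S21 to S24 reduce to drawing random positive odd kernel sizes and comparing each one with the first threshold. The following sketch assumes a hypothetical upper bound on the kernel size and a hypothetical threshold value, and reuses the same N2 kernels for every first character image, so N1 first character images produce N1 × N2 labeled entries. The filtering itself is left out; any Gaussian filter could be applied with each kernel size.

```python
import random
from typing import List, Tuple


def random_odd_kernel_sizes(n2: int, max_size: int, rng: random.Random) -> List[int]:
    """Draw N2 random positive odd kernel sizes between 1 and max_size."""
    return [2 * rng.randint(0, (max_size - 1) // 2) + 1 for _ in range(n2)]


def label_for_kernel(ksize: int, first_threshold: int) -> str:
    """S23/S24: kernels above the threshold blur too much to be readable."""
    return "available" if ksize <= first_threshold else "unavailable"


def plan_second_images(n1: int, n2: int, max_size: int, first_threshold: int,
                       seed: int = 0) -> List[Tuple[int, int, str]]:
    """Return (first image index, kernel size, label) for all N1 x N2 second images."""
    rng = random.Random(seed)
    # Each kernel is labeled once, as soon as it is generated (S22 to S24) ...
    kernels = [(k, label_for_kernel(k, first_threshold))
               for k in random_odd_kernel_sizes(n2, max_size, rng)]
    # ... and every second character image filtered with it inherits that label.
    return [(i, k, label) for i in range(n1) for k, label in kernels]


if __name__ == "__main__":
    for entry in plan_second_images(n1=2, n2=3, max_size=15, first_threshold=7):
        print(entry)
```

Labeling the kernel rather than the image is what makes the step free of manual work: no image ever needs to be inspected to decide its label.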
Preferably, as shown in FIG. 4, the method further comprises: S30, adjusting the brightness and contrast of the second character image.
Because certificates such as vehicle driving licenses are generally printed on paper, the print darkness of the characters can vary considerably. To match this characteristic of certificate pictures, the contrast and sharpness of the characters are processed separately, producing samples that more closely resemble real pictures.
Preferably, as shown in FIG. 5, S30 (adjusting the brightness and contrast of the second character image) includes:
S31, acquiring a preset parameter value domain, and randomly generating a plurality of groups of brightness and contrast adjustment parameters within the parameter value domain;
The parameter value domain can be determined according to actual conditions. It constrains the brightness parameter and the contrast parameter, either independently of each other or through a relationship between the two. The purpose of this constraint is to prevent large numbers of white dots (all three channel values 255) or black dots (all three channel values 0) from appearing in the adjusted second character image.
With N3 groups of brightness and contrast adjustment parameters, each second character image can be adjusted N3 times to obtain N3 adjusted second character images.
The N1 × N2 second character images therefore yield N1 × N2 × N3 adjusted second character images.
S32, calculating the pixel values of the adjusted second character image according to each group of brightness and contrast adjustment parameters.
The calculation may be performed separately for each pixel value in the second character image; if each pixel has three channel values, the same calculation is applied to each channel. Any resulting fractional part can be removed by rounding.
In this way, the contrast and sharpness of the characters are processed separately, producing samples that more closely resemble real pictures; the contrast and brightness adjustment also expands the number of generated samples once more.
Preferably, the pixel values of the adjusted second character image are calculated as:
G(x,y)=a×F(x,y)+b
where x and y are the row and column indices of the second character image, F(x, y) is the pixel value at row x, column y of the second character image before adjustment, G(x, y) is the pixel value at row x, column y after adjustment, a is the contrast adjustment parameter, and b is the brightness adjustment parameter.
The values of a and b are set dynamically according to the distribution of values within the parameter value domain, so that the computed value never exceeds 255; a and b can be set at the same time, or a can be generated randomly first and b then chosen randomly subject to the constraints of the parameter value domain.
The brightness adjustment parameter b shifts the brightness of the image; the contrast adjustment parameter a controls the contrast, widening or narrowing the range of gray levels between the brightest white and the darkest black in the image.
Completing the brightness and contrast adjustment by calculation therefore yields the adjusted second character image directly, is simple and convenient, and produces a gentle adjustment.
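A minimal sketch of S31 and S32 follows, under stated assumptions: images are single-channel lists of lists with values from 0 to 255, the contrast parameter a is drawn from a hypothetical range of 0.5 to 1.0, and b is then drawn from [0, 255 × (1 - a)] so that a × 255 + b never exceeds 255. This is one possible reading of generating a first and then choosing b within the constraint of the parameter value domain, not the patent's prescribed domain.

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # single-channel image with values in 0..255


def random_adjust_params(n3: int, rng: random.Random,
                         a_range: Tuple[float, float] = (0.5, 1.0)) -> List[Tuple[float, float]]:
    """Draw N3 (a, b) pairs: a first, then b constrained so a*255 + b <= 255."""
    params = []
    for _ in range(n3):
        a = rng.uniform(*a_range)                 # contrast adjustment parameter
        b = rng.uniform(0.0, 255.0 * (1.0 - a))   # brightness adjustment parameter
        params.append((a, b))
    return params


def adjust(img: Image, a: float, b: float) -> Image:
    """Apply G(x, y) = a * F(x, y) + b to every pixel and round to an integer."""
    # round() rounds halves to even; the clamp only guards against float error.
    return [[min(255, max(0, round(a * v + b))) for v in row] for row in img]


if __name__ == "__main__":
    rng = random.Random(0)
    second_image = [[0, 128, 255], [255, 64, 0]]
    for a, b in random_adjust_params(3, rng):
        print(round(a, 2), round(b, 1), adjust(second_image, a, b))
```

Because b is chosen after a, the bound on b shrinks as contrast grows, so no pixel is pushed past 255; the same two functions could be reused unchanged for the background adjustment of S50.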
Preferably, as shown in FIG. 6, the method further comprises: S50, adjusting the brightness and contrast of the certificate background image.
Because certificates such as vehicle driving licenses are generally printed on paper, their printing can vary considerably in darkness; to match this characteristic of certificate pictures, the contrast and sharpness of the background image are processed separately, producing samples that more closely resemble real pictures.
The adjustment of the certificate background image follows the same procedure as S30, S31 and S32 and is not repeated here.
With N4 groups of brightness and contrast adjustment parameters, each certificate background image is adjusted N4 times to obtain N4 adjusted certificate background images.
Preferably, as shown in FIG. 7, S60 (embedding the second character image into the certificate background image to generate a certificate sample image) includes:
S61, acquiring the character distribution range on the certificate background image;
Inspection of actual certificates such as driving licenses shows that, although most certificates have printed horizontal lines for the filled-in text, the characters are not necessarily printed exactly on those lines; they are often shifted some distance up, down, left or right. The region within which a given field's characters may be offset is regarded as that field's character distribution range.
The character distribution range can be obtained statistically from actual certificates, or generated from a center position and an offset distance (every position whose distance from the center is less than the offset distance belongs to the range); it may also be defined in other ways as circumstances require.
S62, embedding the second character image at a random position in the character distribution range to obtain a certificate sample image;
The random position is a position selected at random within the character distribution range. Random placement better matches the real distribution of positions, yielding sample images closer to actual conditions.
The embedding may be direct covering, i.e., placing the second character image at the randomly obtained position on the background image and directly replacing the corresponding pixel values; other embedding methods may also be used, such as a weighted combination of the pixel values of the second character image and the background pixel values at the random position.
It should be noted that after the second character image is embedded at the random position, its colors may deviate somewhat from the color distribution of the surrounding background image. This deviation does not affect the sharpness of the characters and therefore does not affect training for character recognition.
Of course, other ways to reduce the deviation may be adopted, such as selecting a background image and a second text image with similar colors for embedding.
By embedding in different modes, the N1 × N2 × N3 adjusted second character images are embedded into the N4 adjusted certificate background images, and then N1 × N2 × N3 × N4 certificate sample images are obtained.
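The embedding of S61 and S62 can be sketched as follows. The sketch assumes single-channel images stored as lists of lists, a character distribution range given by a center position and an offset distance (the second option in S61) that applies to the top-left corner of the second character image, and a blend weight `alpha` chosen only for illustration; the function names are hypothetical. Both embedding modes mentioned above are shown, and the sample keeps the label of the embedded second character image, as stated in S60.

```python
import math
import random
from typing import List, Tuple

Image = List[List[int]]  # single-channel image, rows of pixel values


def random_position_in_range(center: Tuple[int, int], offset: float,
                             rng: random.Random) -> Tuple[int, int]:
    """S62: pick a (row, col) whose distance from center is below offset."""
    if offset <= 0:
        raise ValueError("offset distance must be positive")
    cy, cx = center
    while True:  # rejection sampling inside the disc of radius `offset`
        dy, dx = rng.uniform(-offset, offset), rng.uniform(-offset, offset)
        if math.hypot(dy, dx) < offset:
            return round(cy + dy), round(cx + dx)


def embed(background: Image, text: Image, top_left: Tuple[int, int],
          mode: str = "cover", alpha: float = 0.7) -> Image:
    """Embed `text` into a copy of `background` with its top-left at `top_left`."""
    out = [row[:] for row in background]
    y0, x0 = top_left
    for dy, row in enumerate(text):
        for dx, value in enumerate(row):
            y, x = y0 + dy, x0 + dx
            if not (0 <= y < len(out) and 0 <= x < len(out[0])):
                continue  # parts falling outside the background are clipped
            if mode == "cover":
                out[y][x] = value  # directly replace the background pixel
            else:
                # weighted combination of character and background pixels
                out[y][x] = round(alpha * value + (1 - alpha) * out[y][x])
    return out


def make_sample(background: Image, text: Image, label: str, center: Tuple[int, int],
                offset: float, rng: random.Random, mode: str = "cover") -> Tuple[Image, str]:
    """The certificate sample keeps the label of the embedded second image."""
    position = random_position_in_range(center, offset, rng)
    return embed(background, text, position, mode), label
```

Running `make_sample` for every adjusted second character image against every adjusted background image gives the N1 × N2 × N3 × N4 count stated above.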
Preferably, as shown in fig. 8, the method further comprises: and S70, performing data enhancement processing on the certificate sample image.
Through data enhancement processing, more sample numbers can be obtained, and the accuracy of model training can be improved.
Preferably, the data enhancement processing uses at least one of the following: randomly rotating the sample image by a preset angle, randomly translating it horizontally, randomly translating it vertically, randomly scaling it by a preset ratio, flipping it horizontally or vertically, and applying a horizontal or vertical affine transformation to it.
Data enhancement is the most effective way to increase training data: it can increase the number of samples by an order of magnitude.
The data enhancement processing is not limited to the above; random cropping, padding and the like may also be used, as determined by actual conditions. The detailed processing is not described here.
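Three of the listed operations (horizontal flip, vertical flip and translation) are simple enough to sketch without an image library; rotation, scaling and affine transformations would normally rely on one. The sketch assumes single-channel list-of-lists images, zero fill for pixels shifted in from outside the frame, a hypothetical maximum shift of 2 pixels and a 50% chance for each flip.

```python
import random
from typing import List

Image = List[List[int]]  # single-channel image, rows of pixel values


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror the image top to bottom (rows are copied, not shared)."""
    return [row[:] for row in img[::-1]]


def translate(img: Image, dy: int, dx: int, fill: int = 0) -> Image:
    """Shift the image by dy rows and dx columns; uncovered pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def random_augment(img: Image, rng: random.Random, max_shift: int = 2) -> Image:
    """Apply a random combination of flips and a random translation."""
    out = img
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    if rng.random() < 0.5:
        out = flip_vertical(out)
    dy = rng.randint(-max_shift, max_shift)  # vertical translation
    dx = rng.randint(-max_shift, max_shift)  # horizontal translation
    return translate(out, dy, dx)
```

Each call returns a new image and leaves the input untouched, so several augmented copies can be drawn from one certificate sample image while its label is carried over.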
An embodiment of the present application provides a certificate sample generation device for executing the certificate sample generation method described above. The device is described in detail below.
As shown in Fig. 9, the certificate sample generation device comprises:
a generating unit 101, configured to acquire a certificate character set and generate a background-free first text image;
a filtering unit 102, configured to randomly generate a Gaussian kernel, perform Gaussian filtering on the first text image according to the Gaussian kernel, and label it to obtain a second text image;
an acquisition unit 103, configured to acquire a certificate background image;
an embedding unit 104, configured to embed the second text image into the certificate background image to generate a certificate sample image;
an output unit 105, configured to output all the certificate sample images and their annotations.
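The division of the device into units 101 to 105 can be mirrored in code by treating each unit as an interchangeable function. In this sketch the unit implementations are string stand-ins. The label format (the generated text plus the available/unavailable flag) is an assumption about what the "annotation" contains.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator, List, Tuple


@dataclass
class CertificateSampleGenerator:
    """Wires units 101 to 105 together; each unit is a pluggable callable."""
    generate: Callable[[], List[Tuple[Any, str]]]            # unit 101: (first text image, its text)
    blur_and_label: Callable[[Any], List[Tuple[Any, bool]]]  # unit 102: (second image, available?)
    acquire: Callable[[], List[Any]]                         # unit 103: certificate backgrounds
    embed: Callable[[Any, Any], Any]                         # unit 104: (background, text image) -> sample

    def run(self) -> Iterator[Tuple[Any, dict]]:
        """Unit 105: yield every certificate sample image together with its label."""
        backgrounds = self.acquire()
        for first_img, text in self.generate():
            for second_img, available in self.blur_and_label(first_img):
                for background in backgrounds:
                    sample = self.embed(background, second_img)
                    yield sample, {"text": text, "available": available}


generator = CertificateSampleGenerator(
    generate=lambda: [("img1", "AB"), ("img2", "CD")],
    blur_and_label=lambda im: [(im + "_k1", True), (im + "_k2", False)],
    acquire=lambda: ["bg1", "bg2", "bg3"],
    embed=lambda bg, im: bg + "+" + im,
)
samples = list(generator.run())
```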
Preferably, the generating unit 101 is further configured to: acquire the certificate character set and randomly extract characters from it; and render the extracted characters into a background-free first text image in the font style of the certificate.
Preferably, the filtering unit 102 is further configured to: randomly generate a plurality of Gaussian kernels and perform Gaussian filtering on the first text image with each Gaussian kernel to obtain second text images; acquire a preset first threshold and compare each Gaussian kernel with the first threshold; if the Gaussian kernel is greater than the first threshold, label the second text image obtained by filtering with that kernel as unavailable; and if the Gaussian kernel is less than or equal to the first threshold, label that second text image as available.
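Random extraction from the character set can be sketched in a few lines. The digit-plus-X character set and the fixed length of 18 are hypothetical examples. Rendering each string in the certificate's font is left out, because it depends on font files and a rendering library that the source does not specify.

```python
import random


def sample_texts(charset, n, min_len, max_len, seed=None):
    """Draw n random strings whose characters come from the certificate character set."""
    rng = random.Random(seed)
    return ["".join(rng.choice(charset) for _ in range(rng.randint(min_len, max_len)))
            for _ in range(n)]


# Hypothetical character set for an 18-character ID-number field.
texts = sample_texts("0123456789X", n=5, min_len=18, max_len=18, seed=1)
```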
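The filtering-and-labelling logic can be sketched as below. The source does not say which property of a Gaussian kernel is compared with the first threshold. This sketch assumes it is the standard deviation sigma, i.e. the blur strength. The sigma range and threshold used in the usage line are hypothetical. Gaussian filtering is implemented separably (rows, then columns) with edge replication.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Normalised 1-D Gaussian with radius ceil(3 * sigma) (at least 1)."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img, sigma):
    """Separable Gaussian filter on a 2-D uint8 image with edge replication."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img.astype(np.float64), r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    both = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)
    return np.clip(np.rint(both), 0, 255).astype(np.uint8)


def blur_and_label(first_img, n_kernels, sigma_range, threshold, rng):
    """Blur with n random kernels; label each result available iff sigma <= threshold."""
    results = []
    for sigma in rng.uniform(sigma_range[0], sigma_range[1], size=n_kernels):
        second_img = gaussian_blur(first_img, sigma)
        results.append((second_img, float(sigma), bool(sigma <= threshold)))
    return results
```

For example, `blur_and_label(first_img, 5, (0.5, 3.0), 1.5, np.random.default_rng(0))` returns five blurred images, each with its sigma and an available flag.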
Preferably, the device further comprises an adjusting unit configured to adjust the brightness and contrast of the second text image.
Preferably, the adjusting unit is further configured to: acquire a preset parameter value range and randomly generate a plurality of groups of brightness and contrast adjustment parameters within that range; and calculate the pixel values of the adjusted second text image according to each group of parameters.
Preferably, the embedding unit 104 is further configured to: acquire the character distribution range on the certificate background image; and embed the second text image at a random position within the character distribution range to obtain a certificate sample image.
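The source does not give a formula for the adjustment, so this sketch assumes a common linear model: out = alpha * pixel + beta. Here alpha scales contrast and beta shifts brightness. The preset value ranges below are hypothetical.

```python
import numpy as np


def random_adjust_params(n, rng, alpha_range=(0.7, 1.3), beta_range=(-40.0, 40.0)):
    """n random (contrast gain alpha, brightness offset beta) pairs from the preset ranges."""
    alphas = rng.uniform(alpha_range[0], alpha_range[1], size=n)
    betas = rng.uniform(beta_range[0], beta_range[1], size=n)
    return list(zip(alphas.tolist(), betas.tolist()))


def adjust_brightness_contrast(img, alpha, beta):
    """Linear adjustment out = alpha * img + beta, rounded and clipped to [0, 255]."""
    out = alpha * img.astype(np.float64) + beta
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```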
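Restricting the random position to the character distribution range can be sketched as follows. Representing that range as a single axis-aligned rectangle is an assumption, since a real certificate template may have one region per field. The returned bounding box is an optional extra that the source does not mention.

```python
import numpy as np


def random_position_in_range(text_range, text_shape, rng):
    """Random top-left corner keeping the text image inside text_range.

    text_range is (top, left, bottom, right) in pixels, bottom/right exclusive.
    Also returns the bounding box of the embedded text, usable as a position label.
    """
    r_top, r_left, r_bottom, r_right = text_range
    h, w = text_shape[:2]
    if r_bottom - r_top < h or r_right - r_left < w:
        raise ValueError("text image does not fit inside the character distribution range")
    top = int(rng.integers(r_top, r_bottom - h + 1))
    left = int(rng.integers(r_left, r_right - w + 1))
    return (top, left), (top, left, top + h, left + w)
```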
Preferably, the adjusting unit is further configured to perform data enhancement processing on the certificate sample image.
Preferably, the device further comprises an enhancement unit configured to perform data enhancement processing on the certificate sample image.
Preferably, the data enhancement processing is performed in at least one of the following ways: randomly rotating the sample image within a preset angle, randomly translating the sample image horizontally, randomly translating the sample image vertically, randomly scaling the sample image by a preset ratio, flipping the sample image horizontally or vertically, and applying a horizontal or vertical affine transformation to the sample image.
An embodiment of the present application provides an electronic device. As shown in Fig. 10, it comprises a computer-readable storage medium 301 storing a computer program and a processor 302. When the computer program is read and executed by the processor, the certificate sample generation method described above is implemented.
An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when read and executed by a processor, implements the certificate sample generation method described above.
The technical solutions of the embodiments of the present invention may be embodied in the form of a software product, whether in their essential part, in the part that contributes to the prior art, or in whole or in part. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be an air conditioner, a refrigeration device, a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
Thus, the first text image is generated directly from the character set, a plurality of labeled second text images are obtained directly through Gaussian filtering, and a larger number of certificate sample images are generated by embedding them into certificate background images. Samples are therefore generated automatically for a specific certificate type without extracting background feature data and character feature data through complex algorithms, which simplifies sample generation. At the same time, the generated samples carry labels and can be used directly to train a model, which reduces manual annotation.
As a result, there is no need to obtain sample pictures involving personal privacy, no need for complex feature extraction that consumes large amounts of computing resources, and no need for manual labeling that consumes large amounts of human effort.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in this application are described in a related manner. Identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. For the relevant parts, reference may be made to the preceding description of the embodiments.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A certificate sample generation method, comprising:
acquiring a certificate character set and generating a background-free first text image;
randomly generating a Gaussian kernel, and performing Gaussian filtering and labeling on the first text image according to the Gaussian kernel to obtain a second text image;
acquiring a certificate background image;
embedding the second text image into the certificate background image to generate a certificate sample image;
and outputting all the certificate sample images and labels thereof.
2. The certificate sample generation method of claim 1, wherein acquiring a certificate character set and generating a background-free first text image comprises:
acquiring a certificate character set, and randomly extracting characters from the certificate character set;
and rendering the extracted characters into a background-free first text image in the font style of the certificate.
3. The certificate sample generation method of claim 1, wherein randomly generating a Gaussian kernel and performing Gaussian filtering and labeling on the first text image according to the Gaussian kernel to obtain a second text image comprises:
randomly generating a plurality of Gaussian kernels, and performing Gaussian filtering on the first text image with each Gaussian kernel to obtain second text images;
acquiring a preset first threshold, and comparing each Gaussian kernel with the first threshold;
if the Gaussian kernel is greater than the first threshold, labeling the second text image obtained by filtering with that Gaussian kernel as unavailable;
and if the Gaussian kernel is less than or equal to the first threshold, labeling the second text image obtained by filtering with that Gaussian kernel as available.
4. The certificate sample generation method of any one of claims 1-3, further comprising, after randomly generating the Gaussian kernel and performing Gaussian filtering and labeling on the first text image according to the Gaussian kernel to obtain the second text image:
adjusting the brightness and contrast of the second text image.
5. The certificate sample generation method of claim 4, wherein adjusting the brightness and contrast of the second text image comprises:
acquiring a preset parameter value range, and randomly generating a plurality of groups of brightness and contrast adjustment parameters within the parameter value range;
and calculating the pixel values of the adjusted second text image according to each group of brightness and contrast adjustment parameters.
6. The certificate sample generation method of any one of claims 1-3, wherein embedding the second text image into the certificate background image to generate a certificate sample image comprises:
acquiring a character distribution range on the certificate background image;
and embedding the second text image at a random position within the character distribution range to obtain the certificate sample image.
7. The certificate sample generation method of any one of claims 1-3, further comprising, before outputting all the certificate sample images and their labels:
performing data enhancement processing on the certificate sample images.
8. A certificate sample generation device, comprising:
a generating unit, configured to acquire a certificate character set and generate a background-free first text image;
a filtering unit, configured to randomly generate a Gaussian kernel, and perform Gaussian filtering and labeling on the first text image according to the Gaussian kernel to obtain a second text image;
an acquisition unit, configured to acquire a certificate background image;
an embedding unit, configured to embed the second text image into the certificate background image to generate a certificate sample image;
and an output unit, configured to output all the certificate sample images and their labels.
9. An electronic device, comprising a computer-readable storage medium storing a computer program and a processor, wherein the computer program, when read and executed by the processor, implements the certificate sample generation method of any one of claims 1-7.
10. A computer-readable storage medium storing a computer program which, when read and executed by a processor, implements the certificate sample generation method of any one of claims 1-7.
CN202111300632.6A 2021-11-04 2021-11-04 Certificate sample generation method and device, electronic equipment and storage medium Pending CN114140811A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111300632.6A CN114140811A (en) 2021-11-04 2021-11-04 Certificate sample generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114140811A true CN114140811A (en) 2022-03-04

Family

ID=80392352

Country Status (1)

Country Link
CN (1) CN114140811A (en)

Similar Documents

Publication Publication Date Title
JP3881439B2 (en) Image processing device
CN110414519A (en) A kind of recognition methods of picture character and its identification device
US7525694B2 (en) Image processing device, image processing method, image processing program, and recording medium
US20150332607A1 (en) System for Producing Tactile Images
RU2010133657A (en) DEVICE, METHOD AND PROCESS FOR IMAGE PROCESSING
CN107172418A (en) A kind of tone scale map image quality evaluating method analyzed based on exposure status
CN110298353B (en) Character recognition method and system
JP2007166622A (en) Method for generating half-tone digital image, apparatus and computer program
JP2012199901A (en) Document modification detecting method by character comparison using character shape feature
JP2002185800A (en) Adaptive image enhancement filter and method for generating enhanced image data
CN102867180A (en) Gray character image normalization device and gray character image normalization method
CN108288064B (en) Method and device for generating pictures
CN113592776A (en) Image processing method and device, electronic device and storage medium
CN103530625A (en) Optical character recognition method based on digital image processing
JP4093413B2 (en) Image processing apparatus, image processing program, and recording medium recording the program
CN109741273A (en) A kind of mobile phone photograph low-quality images automatically process and methods of marking
US9338310B2 (en) Image processing apparatus and computer-readable medium for determining pixel value of a target area and converting the pixel value to a specified value of a target image data
CN111191716B (en) Method and device for classifying printed pictures
CN114140811A (en) Certificate sample generation method and device, electronic equipment and storage medium
Pangestu et al. Histogram equalization implementation in the preprocessing phase on optical character recognition
CN103716506A (en) Image processing device and computer-readable medium
JP2007079586A (en) Image processor
Montrucchio et al. Toner savings based on quasi-random sequences and a perceptual study for green printing
CN110298236A (en) A kind of braille automatic distinguishing method for image and system based on deep learning
JP2007079587A (en) Image processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination