CN114202762A - Handwritten sample generation method and device and application - Google Patents

Publication number
CN114202762A
Authority
CN
China
Prior art keywords
handwritten
character
characters
target
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210148735.3A
Other languages
Chinese (zh)
Inventor
沈瑶
王国梁
毛云青
陈思瑶
葛俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCI China Co Ltd
Original Assignee
CCI China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CCI China Co Ltd filed Critical CCI China Co Ltd
Priority to CN202210148735.3A priority Critical patent/CN114202762A/en
Publication of CN114202762A publication Critical patent/CN114202762A/en
Pending legal-status Critical Current

Abstract

The application provides a method, a device and an application for generating a handwritten sample. The method comprises the following steps: acquiring a handwritten character to be processed, and acquiring key points on the writing track boundary of the handwritten character to be processed; connecting the key points to convert the handwritten character to be processed into Bezier curves, and recording the control points of the Bezier curves; applying a random transformation conforming to a Gaussian distribution to the key points and the control points to generate a target handwritten character; acquiring a background picture, wherein the background picture is generated based on background characters written in printed fonts; and filling the target handwritten character into the background picture to generate a handwritten sample. To address the problem that existing handwritten samples cannot be generated automatically and quickly, the method adds random variation to character strokes to generate character images that simulate manual handwriting, achieving the technical effect of automatically generating a handwriting data set.

Description

Handwritten sample generation method and device and application
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to a method, an apparatus, and an application for generating a handwritten sample.
Background
In the deep learning field, training a character detection or character recognition algorithm generally requires collecting a large number of samples as a data set. Here a sample is an image containing handwritten characters, and labeling a sample means recording the position coordinates of the characters that appear in the image; the character content generally includes both handwritten and printed characters. By training on such samples, the algorithm is expected to distinguish the position coordinates of the handwritten content in an image and thereby learn the pattern of target detection or recognition.
Conventional methods for acquiring handwritten character samples include the following:
1. Manual collection: the HCL2000 data set released by the Pattern Recognition Laboratory of Beijing University of Posts and Telecommunications, currently the largest offline handwritten Chinese character library, was collected by having people write Chinese characters by hand. It contains 64x64 dot-matrix images of 3755 Chinese characters written by 1000 people.
2. Handwriting font simulation: handwriting-style fonts, such as "relaxing hands -biological fonts", are used to imitate the text patterns of real handwriting.
3. Generation with deep learning: a generative adversarial network (GAN) learns the features of handwritten text and then generates handwritten text.
The current methods for collecting handwritten text samples have the following defects:
Manual writing consumes a large amount of manpower and material resources for collection, and sample production is inefficient.
With handwriting font simulation, each handwriting font can imitate only one writing style; once the font is chosen, the style of every output character is fixed. In reality, the same person cannot write exactly the same character twice, so samples produced this way lack the diversity needed for algorithm learning.
In addition, training a GAN to simulate handwritten characters itself requires a large handwritten data set, so a large amount of resources must still be spent on data collection, and efficiency remains low.
In summary, no effective solution has yet been proposed for generating a large number of handwritten samples that addresses the low efficiency, lack of sample diversity, and high resource consumption of current sample collection.
Disclosure of Invention
The embodiments of the application provide a method, a device and an application for generating a handwritten sample. To address the problem that existing handwritten samples cannot be generated automatically and quickly, random variation is added to character strokes to generate character images that simulate manual handwriting, achieving the technical effect of automatically generating a handwriting data set.
In a first aspect, an embodiment of the present application provides a method for generating a handwritten sample, where the method includes: acquiring a handwritten character to be processed, and acquiring key points on a writing track boundary of the handwritten character to be processed; connecting the key points to convert the handwritten characters to be processed into a Bezier curve, and recording control points of the Bezier curve; applying random transformation conforming to Gaussian distribution to the key points and the control points to generate target handwritten characters; acquiring a background picture, wherein the background picture is generated based on background characters written by printing fonts; and filling the target handwritten character into the background picture to generate a handwritten sample.
In some embodiments, applying a random transformation conforming to a Gaussian distribution to the key points and the control points to generate the target handwritten character includes: obtaining coordinate values of the key points and coordinate values of the control points, and randomly applying interference to the coordinate values along the coordinate axis directions to generate the target handwritten character, wherein the interference conforms to a Gaussian distribution.
In some of these embodiments, prior to "generating a handwritten sample", the method further comprises: applying a random transformation conforming to a Gaussian distribution to the word spacing and the line spacing of a plurality of the target handwritten characters.
In some of these embodiments, the method further comprises: recording coordinate values and state values of the key points and the control points, wherein the state values indicate the states, relative to the Bezier curve, of the points corresponding to the coordinate values, and distinguishing the key points from the control points based on the state values.
In some embodiments, the state value is used to distinguish whether the point corresponding to the coordinate value is on or off the Bezier curve, wherein a point on the Bezier curve is a key point and a point off the Bezier curve is a control point.
In some of these embodiments, "filling the target handwritten character onto a background picture" includes: outputting the target handwritten character to a preset canvas to obtain a handwritten image bitmap, making the handwritten image bitmap transparent, and then filling it onto the background picture.
In some embodiments, the target handwritten character is output to a preset canvas to obtain a handwritten image bitmap, and the handwritten image bitmap is enlarged or reduced to the size of the handwritten character to be processed and then filled onto the background picture.
In a second aspect, an embodiment of the present application provides a handwriting sample generation apparatus, including: the key point acquisition module is used for acquiring a handwritten character to be processed and acquiring key points on the boundary of a writing track of the handwritten character to be processed; the control point acquisition module is used for connecting the key points, converting the handwritten characters to be processed into a Bezier curve and recording control points of the Bezier curve; the font processing module is used for applying random transformation conforming to Gaussian distribution to the key points and the control points to generate target handwritten characters; the background generation module is used for acquiring a background picture, wherein the background picture is generated based on background characters written by printing fonts; and the sample generation module is used for filling the target handwritten characters into the background picture to generate a handwritten sample.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the method for generating handwritten samples according to any one of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer program product comprising software code portions for performing the method for generating handwritten samples according to any one of the first aspect, when the computer program product is run on a computer.
In a fifth aspect, the present application provides a readable storage medium, in which a computer program is stored, where the computer program includes program code for controlling a process to execute a process, where the process includes the handwriting sample generation method according to any one of the first aspect.
The main contributions and innovation points of the embodiment of the application are as follows:
According to the method and the device, existing handwritten characters are converted into outline data, the glyphs of the existing handwritten characters are changed by applying disturbance to the outline data, target handwritten characters are generated automatically, and handwritten samples are made from the target handwritten characters. This achieves the beneficial effect that handwritten samples can be generated automatically, in batches and efficiently from existing handwritten characters.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1 is a flowchart of the main steps of a handwritten sample generation method according to a first embodiment of the present application.
Fig. 2 is a schematic diagram of a character depicted using a combination of Bezier curves and straight lines according to the first embodiment of the present application.
Fig. 3 is a display diagram of the handwritten sample effect according to the first embodiment of the present application.
Fig. 4 is a flowchart of a handwritten sample generation method according to a second embodiment of the application.
Fig. 5 is a block diagram of a handwriting sample generation apparatus according to a third embodiment of the present application.
Fig. 6 is a schematic hardware configuration diagram of an electronic device according to a fourth embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
The method aims to generate handwritten samples; that is, by processing existing handwritten characters, it generates in batches a plurality of target handwritten characters that imitate the random characteristics of human writing. During model training, too few samples lead to insufficient training and to unreasonable behavior in how the model uses features, so training samples need to be supplemented quickly to make the model's behavior more reasonable. Using the generated target handwritten samples as input therefore allows the model to be trained so that it learns the pattern of target detection or recognition.
Fig. 1 is a flowchart of main steps of a handwriting sample generation method according to a first embodiment of the present application.
To achieve this, as shown in Fig. 1, the handwritten sample generation method mainly includes steps S101 to S105 as follows.
Step S101, obtaining a handwritten character to be processed, and obtaining key points on the writing track boundary of the handwritten character to be processed.
In step S101, the handwritten character to be processed is an existing handwritten character, that is, a handwritten character obtained from an existing font library. For example, handwritten characters may be extracted from the data released by the Pattern Recognition Laboratory of Beijing University of Posts and Telecommunications. The handwritten characters to be processed may include handwritten Chinese characters, handwritten letters, handwritten symbols, and so on. There may be a single handwritten character to be processed, for example "C", or several, for example "wangming signature".
After the handwritten character to be processed is obtained, points on the boundary of its writing track are extracted. Note that a point on the writing track boundary is a point taken along the outer edge of the writing track; that is, a key point lies on the boundary of the writing track, not inside it. As shown in Fig. 2, there are a plurality of solid points, such as solid point 1 and solid point 5, on the inner and outer boundaries of the "C"; these are called key points.
In step S101, the present solution extracts the writing track from an existing handwritten character and obtains the key points along the boundary of that track. Unlike the prior art, in which generating samples with a GAN requires a large number of input samples, the present solution can work from a small number of existing handwritten characters, and the number and realism of the handwritten samples generated in batches are not limited by the number of existing handwritten characters, so samples are generated more efficiently.
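The notion of "points on the boundary of the writing track" can be illustrated with a minimal sketch. It assumes the character is available as a binary grid (1 = ink, 0 = background), which the application does not specify; the function name `boundary_points` and the 4-neighbour rule are illustrative choices, not the applicant's method.

```python
def boundary_points(grid):
    """Return (row, col) of ink pixels that touch background or the image edge."""
    rows, cols = len(grid), len(grid[0])
    points = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue  # only ink pixels can lie on the stroke boundary
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or grid[nr][nc] == 0:
                    points.append((r, c))
                    break
    return points


# Hypothetical stroke: a 5x5 block of ink inside a 7x7 image.
grid = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(7)]
        for r in range(7)]
pts = boundary_points(grid)
```

Interior ink pixels such as (3, 3) are skipped, which matches the statement that key points lie on the boundary rather than inside the writing track. A real pipeline would still need to thin this pixel set down to a small number of key points.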
And S102, connecting the key points, converting the handwritten characters to be processed into a Bezier curve, and recording control points of the Bezier curve.
In step S102, control points on the writing track boundary are obtained by the Bezier curve algorithm. Specifically, to describe the "C" in Fig. 2, it is not enough to know the coordinates of each point; it is also necessary to know how those points are joined into a contour, and Bezier curves can be used to describe the boundary of the "C". The "C" is represented by a plurality of Bezier curves and straight lines. Both ends of a straight line are solid points; for example, 0-1 is a straight line, and solid point 0 and solid point 1 lie on it. Although not labeled in Fig. 2, there is a solid point between points 2 and 3 and another between points 3 and 4, denoted 2.5 and 3.5. Points 1-5 in Fig. 2 therefore contain three second-order Bezier curves, namely 1-2.5, 2.5-3.5 and 3.5-5, and the control points of these curves are the hollow points, i.e., points 2, 3 and 4. That is, in this step the character is converted into Bezier curves, and the control points of those curves are then obtained, yielding points that can control changes in the font style.
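The arc through points 1-5 can be sketched as three quadratic Bezier segments. The coordinates below are hypothetical, and placing the unlabeled solid points 2.5 and 3.5 at the midpoints of their neighbouring control points is an assumption borrowed from the TrueType convention for consecutive off-curve points; the application only says that such points exist.

```python
def quad_bezier(p0, p1, p2, t):
    """Point at parameter t on a quadratic Bezier curve.

    p0 and p2 are the end points (key points); p1 is the control point.
    """
    u = 1.0 - t
    x = u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0]
    y = u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1]
    return (x, y)


def midpoint(a, b):
    """Midpoint of two points, used here for the implied key points."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


# Hypothetical coordinates for points 1-5 of Fig. 2.
p1, p2, p3, p4, p5 = (0, 0), (0, 4), (4, 8), (8, 4), (8, 0)
p2_5 = midpoint(p2, p3)  # assumed position of solid point 2.5
p3_5 = midpoint(p3, p4)  # assumed position of solid point 3.5

# Three second-order curves: 1-2.5, 2.5-3.5 and 3.5-5.
segments = [(p1, p2, p2_5), (p2_5, p3, p3_5), (p3_5, p4, p5)]
outline = [quad_bezier(a, b, c, i / 8.0)
           for a, b, c in segments for i in range(9)]
```

Each segment starts and ends on a key point and is pulled toward its single control point, so moving a control point changes the shape of the stroke without breaking the contour.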
In this step, the control points of the Bezier curve may also be obtained from a font library storing TrueType fonts. Specifically, a TrueType font describes glyph outlines with quadratic Bezier curves and straight lines, and the glyf table in the TrueType font file format records the outline of each glyph in the font, including the coordinates of the key points and control points. Any TrueType font can be selected from the font library and the outline data of the handwritten Chinese characters read from its glyf table, which reduces the time spent converting them. In the present solution, the font library storing TrueType fonts already stores characters, so when a character needs to be converted, its outline data can be obtained directly from the font library. When a handwritten symbol needs to be converted, the means of steps S101-S102 may be used to convert it into a representation made up of Bezier curves and straight lines.
In step S102, the handwritten character to be processed is converted into a representation made up of Bezier curves and straight lines, where each straight line has a key point at either end, and each Bezier curve has a key point at either end and a control point in between. The control points of the Bezier curves are obtained in this step so that interference can be applied to them in subsequent steps to control the style of the handwritten glyph. Unlike the prior-art approach of writing Chinese characters by hand, the present solution can convert the handwritten characters to be processed automatically and in batches, and the converted handwritten characters are vector glyphs; that is, the sharpness of a character represented by the resulting Bezier curves does not depend on the resolution of the image, which avoids the model training problems caused by low-resolution handwritten samples.
And S103, applying random transformation conforming to Gaussian distribution to the key points and the control points to generate target handwritten characters.
In step S103, interference is applied to the key points and the control points so that the handwritten character to be processed is converted into a target handwritten character. Specifically, for each character, after the stroke sequence of the character is extracted from the font, each stroke corresponds to outline data, and randomly disturbing the key points and control points in that outline data simulates the random characteristics of a person writing by hand.
What distinguishes this step is that, across repeated generations, even the same character in the same font yields a different result each time, which matches the fact that a person never writes a character exactly the same way twice. The interference applied in this solution is a random disturbance conforming to a Gaussian distribution; that is, when target handwritten characters are generated repeatedly over a large number of random draws, the distribution of the generated characters is approximately normal. Through this step, a large number of target handwritten characters that simulate human handwriting can be generated in batches from existing handwritten characters, which solves the prior-art inefficiency of having to write the characters in a handwritten sample by hand.
In step S103, applying disturbance to the key points and the control points simulates the way the same character, written by hand several times, differs in form each time; and generating a plurality of target handwritten characters in batches by disturbing existing handwritten characters solves the low efficiency of the prior-art means of generating handwritten characters.
And step S104, obtaining a background picture, wherein the background picture is generated based on background characters written by printing fonts.
Specifically, printed characters act as interference when the model is trained, so that the model learns to extract only the target handwritten characters from the picture. Therefore, when a handwritten sample is made, printed characters are used as the background to generate the background picture.
And step S105, filling the target handwritten characters into a background picture to generate a handwritten sample.
As shown in Fig. 3, the generated handwritten sample includes several passages of printed text with handwritten "wangming signatures" embedded among the printed characters. Each "wangming signature" has a different glyph style and character spacing, and its characters sit at different heights.
In the present scheme, the purpose of filling the target handwritten character into the background picture is to obtain a handwritten sample as shown in fig. 3, and a means of overlapping a layer where the target handwritten character is located and a layer where the background picture is located may be adopted.
In another embodiment, the step presets placeholders in the background picture and records the positions of the placeholders, wherein the placeholders are used for determining the positions to be filled by the target handwritten character. The handwritten word is filled into the placeholder. Specifically, non-handwritten characters, such as print characters, may be included in the background picture, and other parts except for the positions occupied by the print characters may be referred to as placeholders. If the shielding of the print characters on the bottom layer is required to be avoided as much as possible when the target handwritten characters are filled in the background picture, the shielding of the characters is avoided by filling the target handwritten characters into the placeholders.
In another embodiment, the placeholder may also be placed at a random position in the background picture, for example on top of a printed character so that it overlaps the printed character, or around the printed characters. After the target handwritten character is filled into the placeholder, the printed and handwritten characters may therefore overlap, or they may be scattered at different positions in the background picture, which adds diversity to the samples.
It should be noted that, in an embodiment, the position in the background picture at which the target handwritten character is to be inserted may itself be referred to as a placeholder. That is, the placeholder simply denotes the position to be filled by the target handwritten character; in actual operation no separate placeholder needs to be set up, and the placeholder of the background picture is the target insertion position in the background picture.
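The two placeholder strategies, avoiding printed characters or allowing overlap, can be sketched as follows. The rectangle format `(x, y, w, h)`, the rejection-sampling loop and the names `overlaps` and `pick_placeholder` are assumptions made for illustration; the application does not prescribe how placeholder positions are chosen.

```python
import random


def overlaps(a, b):
    """True if two axis-aligned rectangles (x, y, w, h) intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def pick_placeholder(page_w, page_h, size, print_boxes, rng,
                     avoid_print=True, tries=1000):
    """Pick a random placeholder rectangle on the background picture.

    With avoid_print=True, candidates that cover printed text are rejected;
    with avoid_print=False, overlap with printed text is allowed.
    Returns None if no acceptable position is found within `tries` attempts.
    """
    w, h = size
    for _ in range(tries):
        x = rng.randint(0, page_w - w)
        y = rng.randint(0, page_h - h)
        box = (x, y, w, h)
        if not avoid_print or not any(overlaps(box, p) for p in print_boxes):
            return box
    return None


# Hypothetical layout: one printed line across the top of a 200x100 page.
print_boxes = [(0, 0, 200, 40)]
rng = random.Random(0)
box = pick_placeholder(200, 100, (50, 20), print_boxes, rng)
free_box = pick_placeholder(200, 100, (50, 20), print_boxes, rng,
                            avoid_print=False)
```

The recorded rectangle doubles as the annotation for the sample, since it is the position at which the target handwritten character is inserted.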
In step S105, the target handwritten characters generated in batches are filled into the background picture to obtain handwritten samples. A model can then be trained on these samples so that it is able to recognize the target handwritten characters in the background picture and thereby learns the pattern of target detection or recognition.
In summary, in the above steps S101 to S105, the present solution converts the existing handwritten character into the outline data, changes the font of the existing handwritten character by applying disturbance to the outline data, automatically generates a plurality of target handwritten characters, and creates a handwritten sample from the target handwritten character, thereby achieving the beneficial effect that the handwritten sample can be automatically, batch-wise, and efficiently generated for the existing handwritten character.
In one possible embodiment, the step of applying a random transformation conforming to a Gaussian distribution to the key points and the control points to generate the target handwritten character comprises: obtaining coordinate values of the key points and coordinate values of the control points, and randomly applying interference to the coordinate values along the coordinate axis directions to generate the target handwritten character, wherein the interference conforms to a Gaussian distribution.
In this embodiment, coordinate values represent the specific positions of the key points and the control points. The position of a point could be expressed relatively, for example "key point 10 is directly below key point 9" in Fig. 2. This embodiment instead records the absolute position of each point, expressed as the coordinate value of the point once a coordinate system has been established. A coordinate value is written (x, y) and consists of a value on the x-axis and a value on the y-axis. Applying interference to a coordinate value means changing its x value or its y value, subject to the constraint that the change follows a Gaussian distribution, to produce a new coordinate value (x', y').
In this embodiment, the positions of the key points and the control points are expressed as coordinate values, and interference is applied along the coordinate axes to disturb the x value or the y value of each point. This produces new coordinate values for the key points and the control points, from which the positions of the Bezier curves and straight lines are determined again to construct a new target handwritten character.
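A minimal sketch of the coordinate disturbance follows. It assumes zero-mean noise with the same standard deviation `sigma` on both axes and disturbs x and y independently; the application states only that the interference follows a Gaussian distribution, so `sigma`, the function name and the sample coordinates are all illustrative.

```python
import random


def perturb_points(points, sigma, rng):
    """Add independent zero-mean Gaussian noise to the x and y of every point."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
            for x, y in points]


# Hypothetical outline: key points and control points as (x, y) pairs.
outline = [(0.0, 0.0), (0.0, 4.0), (4.0, 8.0), (8.0, 4.0), (8.0, 0.0)]

rng = random.Random(42)
variant_a = perturb_points(outline, sigma=0.3, rng=rng)
variant_b = perturb_points(outline, sigma=0.3, rng=rng)
```

Two calls on the same outline give two different glyphs, which is the property the embodiment relies on. The choice of `sigma` relative to the glyph size decides whether the result still reads as the same character.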
This embodiment may further include the following: recording coordinate values and state values of the key points and the control points, wherein the state values indicate the states, relative to the Bezier curve, of the points corresponding to the coordinate values, and distinguishing the key points from the control points based on the state values.
Furthermore, the state value is used for distinguishing whether the point corresponding to the coordinate value is on the Bezier curve or outside the Bezier curve; wherein a point on the bezier curve is the key point and a point outside the bezier curve is the control point.
Specifically, in Fig. 2, an arc is defined by point 1 (on the curve), points 2, 3 and 4 (off the curve), and point 5 (on the curve); and from point 0 (on the curve), point 1 (on the curve) and point 2 (off the curve) it can be determined that point 1 is the starting point of a Bezier curve. It follows that if only the key points and control points are stored to represent the Bezier curves, the state of each point relative to the curve must also be recorded in order to identify the starting point of each Bezier curve. This embodiment therefore distinguishes the key points from the control points by recording state values.
In this embodiment, the present solution needs to store only the coordinate values and state values of the points, and no further data for each point, which saves storage resources. The coordinate values determine the specific position of each point, the state values indicate whether each point is on the curve, and together they represent the outline data of each character.
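The role of the state values can be sketched with two parallel lists. Encoding the state as 1 for "on the curve" and 0 for "off the curve" is an assumption, as are the coordinates and the helper names; the application only requires that the state distinguish the two cases.

```python
def split_by_state(coords, on_curve):
    """Separate key points (state 1) from control points (state 0)."""
    key_points = [p for p, s in zip(coords, on_curve) if s == 1]
    control_points = [p for p, s in zip(coords, on_curve) if s == 0]
    return key_points, control_points


def curve_starts(on_curve):
    """Indices of on-curve points that are followed by an off-curve point.

    Such a point is the starting point of a Bezier curve; an on-curve point
    followed by another on-curve point starts a straight line instead.
    """
    return [i for i in range(len(on_curve) - 1)
            if on_curve[i] == 1 and on_curve[i + 1] == 0]


# Points 0-5 of Fig. 2 with hypothetical coordinates.
coords = [(10, 0), (0, 0), (0, 4), (4, 8), (8, 4), (8, 0)]
on_curve = [1, 1, 0, 0, 0, 1]

key_points, control_points = split_by_state(coords, on_curve)
starts = curve_starts(on_curve)
```

Here `starts` contains only index 1, matching the observation that points 0, 1 and 2 together identify point 1 as the starting point of the Bezier curve, while 0-1 is a straight line.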
In one possible embodiment, before "generating a handwritten sample", the method further comprises: applying a random transformation conforming to a Gaussian distribution to the word spacing and the line spacing of a plurality of the target handwritten characters.
Specifically, in addition to disturbing the glyph itself, this embodiment may also disturb the spacing between two characters and the spacing between two lines of text to construct new target handwritten characters. The embodiment simulates the fact that, when a person writes several lines of text, the line spacing and word spacing vary randomly each time. The word spacing and line spacing of the plurality of target handwritten characters are obtained by applying the transformation, so that the generated handwritten samples exhibit randomness and diversity.
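The spacing disturbance can be sketched as a layout routine that adds Gaussian jitter to a nominal character advance and line height. The nominal sizes, the standard deviations and the clamping at zero (so that the pen position never moves backwards) are illustrative assumptions.

```python
import random


def layout_positions(n_lines, chars_per_line, char_w, line_h,
                     sigma_word, sigma_line, rng):
    """Top-left position of each character with jittered word and line spacing."""
    positions = []
    y = 0.0
    for _ in range(n_lines):
        x = 0.0
        line = []
        for _ in range(chars_per_line):
            line.append((x, y))
            # Nominal advance plus Gaussian jitter, clamped at zero.
            x += max(0.0, char_w + rng.gauss(0.0, sigma_word))
        positions.append(line)
        # Nominal line height plus Gaussian jitter, clamped at zero.
        y += max(0.0, line_h + rng.gauss(0.0, sigma_line))
    return positions


rng = random.Random(7)
jittered = layout_positions(2, 3, 20.0, 30.0, 2.0, 3.0, rng)
regular = layout_positions(1, 3, 20.0, 30.0, 0.0, 0.0, random.Random(0))
```

With both standard deviations set to zero the layout collapses to an evenly spaced grid, which makes the effect of the jitter easy to compare.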
In one possible embodiment, "filling the target handwritten character onto a background picture" includes: outputting the target handwritten character to a preset canvas to obtain a handwritten image bitmap, making the handwritten image bitmap transparent, and then filling it onto the background picture.
Specifically, the generated target handwritten character may be output directly onto the bitmap of the first background image, or the content to be output may first be written to a "canvas" in a buffer and then output to the bitmap of the first background image. Drawing directly on the background bitmap can cause the image to flicker, so this embodiment buffers the output on the canvas first, which avoids the flicker and is more efficient.
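The transparent fill can be sketched on plain grayscale grids. Treating every pixel equal to the canvas colour `paper` as fully transparent is an assumption, as are the function name and the tiny example bitmaps; an image library with an alpha channel would serve the same purpose.

```python
def composite(background, handwriting, top, left, paper=255):
    """Overlay a handwriting bitmap on a copy of the background.

    Pixels equal to `paper` (the blank canvas colour) are treated as
    transparent, so the background shows through around the strokes.
    """
    result = [row[:] for row in background]  # leave the background untouched
    for r, row in enumerate(handwriting):
        for c, value in enumerate(row):
            rr, cc = top + r, left + c
            inside = 0 <= rr < len(result) and 0 <= cc < len(result[0])
            if value != paper and inside:
                result[rr][cc] = value
    return result


# Hypothetical 3x4 grey background and 2x2 canvas holding two ink pixels.
background = [[200] * 4 for _ in range(3)]
handwriting = [[255, 0],
               [0, 255]]
sample = composite(background, handwriting, top=1, left=1)
```

Because the handwriting is rendered on its own canvas first and merged in a single pass, the background bitmap is written only once per sample.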
In addition, the second background image obtained by combining the handwritten image bitmap and the background image can also be added with background image colors randomly, so that the diversity of the sample and the identification difficulty of the model are increased.
In this embodiment, the following method may be further included: and outputting the target handwritten character to a preset canvas to obtain a handwritten image bitmap, and filling the handwritten image bitmap on a background picture after the handwritten image bitmap is amplified or reduced to the size of the handwritten character to be processed.
Specifically, the size of the handwritten character to be processed is enlarged from the original size to a preset size, the handwritten character to be processed with the preset size is disturbed, the disturbance comprises disturbance on the character itself or the character line spacing or the character spacing, the size is reduced from the preset size to the original size after the conversion is completed, the target handwritten character is obtained, the target handwritten character is filled on a canvas, a handwriting bitmap is generated, the handwriting bitmap is combined with a first background image after being transparentized, a second background image is obtained, the purpose of transparentizing the handwriting bitmap in the embodiment is to display the first background image on the bottom layer, and the first background image refers to an image with different colors and different printed word patterns.
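The combination of a transparent handwriting bitmap with the first background image can be sketched in plain Python. The representation below (lists of rows of grayscale values, with the paper value 255 treated as fully transparent) and the function name `overlay_transparent` are assumptions made for illustration; a real implementation would more likely use an imaging library with an alpha channel.

```python
def overlay_transparent(background, handwriting, top, left, paper=255):
    """Copy the ink pixels of `handwriting` onto a copy of `background`.

    Both images are lists of rows of grayscale values (0 = black ink).
    Pixels equal to `paper` are treated as transparent, so the background
    shows through wherever nothing was written.
    """
    result = [row[:] for row in background]  # leave the first background untouched
    for r, row in enumerate(handwriting):
        for c, value in enumerate(row):
            rr, cc = top + r, left + c
            inside = 0 <= rr < len(result) and 0 <= cc < len(result[rr])
            if inside and value != paper:
                result[rr][cc] = value
    return result


first_background = [[200] * 4 for _ in range(3)]   # uniform grey background
handwriting_bitmap = [[255, 0], [0, 255]]          # two ink pixels on white paper
second_background = overlay_transparent(first_background, handwriting_bitmap, top=1, left=1)
```

Because the overlay works on a copy, the buffered canvas and the first background image both stay available for generating further samples.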
In summary, an embodiment of the present application provides a handwriting sample generation method that aims to reduce the workload of acquiring handwriting samples. To this end, the method creates a background map from printed characters, converts existing characters into outline data, disturbs the outline data, draws a vector bitmap, and combines the background map with the vector bitmap to obtain a handwriting sample with background interference. Compared with the prior art, the number and realism of the handwriting samples generated by this scheme are not limited by the number of existing handwritten characters, and the scheme can simulate the random variation a person shows when writing the same character in the same font, so the generated handwriting samples are more realistic and diverse. The scheme also generates handwriting samples automatically and in batches, which is more efficient than producing samples manually. The implementation of the present solution is described in detail below with a specific example.
Taking the generation of Chinese character samples as an example, the present embodiment mainly includes steps S20-S80, as shown in Fig. 4, specifically:
Step S20, collecting characters:
All the characters (about 3000 Chinese characters) in the primary Chinese character library are collected and recorded into a list, List.
Step S30, writing a random function:
A random function is called to obtain 1-10 characters from the List, one font is randomly selected from fonts such as Hei (black body), Song and Kai (regular script), and a font size of 15-35 is randomly selected;
The line spacing is set to 20, and the printed characters are written line by line on a 1280 × 720 canvas with a black brush; the canvas is used as the background image and is named the first background map.
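The random selection in step S30 can be sketched as below. The font names in `FONTS` are assumed stand-ins for the Hei, Song and Kai typefaces, the short character list replaces the roughly 3000-character List, and `pick_text_style` is a hypothetical helper; drawing onto the 1280 × 720 canvas is left out.

```python
import random

# Assumed font names standing in for the Hei, Song and Kai typefaces.
FONTS = ["SimHei", "SimSun", "KaiTi"]


def pick_text_style(char_list, rng):
    """Pick 1-10 characters, one font and a font size of 15-35, as in step S30."""
    count = rng.randint(1, 10)                                  # both ends inclusive
    text = "".join(rng.choice(char_list) for _ in range(count))
    return {"text": text, "font": rng.choice(FONTS), "size": rng.randint(15, 35)}


rng = random.Random(42)           # fixed seed only to make the sketch repeatable
chars = list("永和九年岁在癸丑")    # tiny stand-in for the collected List
style = pick_text_style(chars, rng)
```

Calling `pick_text_style` once per line of background text gives each line its own characters, font and size.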
Step S40, drawing handwritten characters:
S41, a random function is called to select the characters, font size and font to be output, where the font is set to a loose handwriting font, and the width w and the height h of the output text segment are calculated from the font size;
S42, a canvas larger than h and w is generated (h and w are respectively enlarged to 100 pixels) for drawing the characters obtained from the List; the purpose of the enlargement is to allow slight shifts and changes to the glyph shape;
S43, the outline data of each character is read from the glyf table of a TrueType (.ttf) font file, where the outline data can be represented by a set of consecutive (x, y) coordinate values and a state value; a Gaussian distribution is applied to each (x, y) to generate new coordinate values (x', y') = gauss(random, (x, y), σ), where random is a random number generator, σ is the input offset (standard deviation), and gauss makes the randomly generated values conform to a Gaussian distribution; after the change, the outline values of each character are disturbed, simulating the random variation of human writing, and the changed characters are obtained;
S44, in addition to the change of individual characters, interference with the font size is randomly added when the Chinese characters are drawn, interference with the word spacing is randomly added within a single line, and interference with the line spacing is randomly added between different lines;
S45, the handwritten characters conforming to the above rules are output one by one onto the canvas and stored as a handwritten image bitmap.
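The coordinate disturbance described in S43 can be sketched with Python's standard `random.gauss`, which draws from a Gaussian with a given mean and standard deviation. The outline below is a hand-written stand-in rather than data read from a font file, and the names `gauss_point` and `perturb_outline` are hypothetical.

```python
import random


def gauss_point(rng, point, sigma):
    """Return (x', y'), each drawn from a Gaussian centred on the original coordinate."""
    x, y = point
    return (rng.gauss(x, sigma), rng.gauss(y, sigma))


def perturb_outline(outline, sigma, rng):
    """Disturb every (x, y) of an outline; the state values are passed through unchanged."""
    return [(gauss_point(rng, xy, sigma), state) for xy, state in outline]


rng = random.Random(7)  # fixed seed only to make the sketch repeatable
# ((x, y), state): state 1 = point on the curve, state 0 = control point (assumed encoding)
outline = [((0.0, 0.0), 1), ((50.0, 80.0), 0), ((100.0, 0.0), 1)]
changed = perturb_outline(outline, sigma=2.0, rng=rng)
```

A larger `sigma` gives stronger deformation; with `sigma=0.0` the outline is returned unchanged, which is a convenient sanity check.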
Step S50, synthesizing a second background map:
The bitmap is reduced from 100 pixels to h × w, overlaid on the first background image and displayed to obtain the second background image.
Step S60, adding a background color:
A background color is randomly added to the second background image.
Step S70, generating a handwriting sample:
The image coordinates are saved as an annotation file and the bitmap of the second background image is saved, which completes one sample.
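The application does not specify the format of the annotation file. As one possible sketch, the image coordinates could be written as JSON next to the saved bitmap; the field names, file names and the `save_annotation` helper below are all assumptions.

```python
import json
import os
import tempfile


def save_annotation(path, image_name, boxes):
    """Write one annotation record; `boxes` holds (text, x, y, w, h) tuples (assumed schema)."""
    record = {
        "image": image_name,
        "boxes": [{"text": t, "x": x, "y": y, "w": w, "h": h} for t, x, y, w, h in boxes],
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False, indent=2)
    return record


out_dir = tempfile.mkdtemp()  # temporary folder used only for this sketch
annotation_path = os.path.join(out_dir, "sample_0001.json")
saved = save_annotation(annotation_path, "sample_0001.png", [("永", 120, 64, 28, 30)])
```

Keeping `ensure_ascii=False` leaves the Chinese characters readable in the file instead of escaping them.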
Step S80, creating a handwritten character data set:
The above steps are repeated to generate a certain number of samples and obtain a handwritten character data set, which is used as the input of a deep learning network.
Fig. 5 is a block diagram of a structure of a handwriting sample generation apparatus according to a third embodiment of the present application.
As shown in fig. 5, an embodiment of the present application proposes a handwriting sample generation apparatus, including:
A key point obtaining module 501, configured to obtain a handwritten character to be processed, and obtain key points on a writing trajectory boundary of the handwritten character to be processed.
A control point obtaining module 502, configured to connect the key points to convert the handwritten character to be processed into a Bezier curve, and record control points of the Bezier curve.
A font processing module 503, configured to apply a random transformation conforming to a Gaussian distribution to the key points and the control points to generate target handwritten characters.
A background generating module 504, configured to obtain a background picture, where the background picture is generated based on background characters written in a printing font.
A sample generating module 505, configured to fill the target handwritten character onto the background picture to generate a handwritten sample.
Fig. 6 is a schematic hardware configuration diagram of an electronic device according to a fourth embodiment of the present application.
As shown in fig. 6, the electronic device according to an embodiment of the present application includes a memory 604 and a processor 602, where the memory 604 stores a computer program, and the processor 602 is configured to execute the computer program to perform the steps in any of the method embodiments described above.
Specifically, the processor 602 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
Memory 604 may include, among other things, mass storage for data or instructions. By way of example, and not limitation, memory 604 may include a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 604 may include removable or non-removable (or fixed) media, where appropriate. The memory 604 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 604 is a non-volatile memory. In particular embodiments, memory 604 includes read-only memory (ROM) and random access memory (RAM). The ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate. The RAM may be static random-access memory (SRAM) or dynamic random-access memory (DRAM), where the DRAM may be fast page mode DRAM (FPM DRAM), extended data output DRAM (EDO DRAM), synchronous DRAM (SDRAM), or the like.
The memory 604 may be used to store or cache various data files for processing and/or communication purposes, as well as possibly computer program instructions for execution by the processor 602.
The processor 602 may implement any one of the above-described handwriting sample generation methods in the embodiments by reading and executing computer program instructions stored in the memory 604.
Optionally, the electronic apparatus may further include a transmission device 606 and an input/output device 608, where the transmission device 606 is connected to the processor 602, and the input/output device 608 is connected to the processor 602.
The transmission device 606 may be used to receive or transmit data via a network. Specific examples of the network may include wired or wireless networks provided by the communication provider of the electronic device. In one example, the transmission device 606 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station to communicate with the Internet. In another example, the transmission device 606 may be a Radio Frequency (RF) module, which is used to communicate with the Internet wirelessly.
The input/output device 608 is used for inputting or outputting information. In the present embodiment, the input information may be printed characters, handwritten characters, or the like, and the output information may be a handwritten sample or the like.
Optionally, in this embodiment, the processor 602 may be configured to execute the following steps by a computer program:
S101, acquiring a handwritten character to be processed, and acquiring key points on the writing track boundary of the handwritten character to be processed.
S102, connecting the key points to convert the handwritten character to be processed into a Bezier curve, and recording control points of the Bezier curve.
S103, applying a random transformation conforming to a Gaussian distribution to the key points and the control points to generate a target handwritten character.
S104, acquiring a background picture, wherein the background picture is generated based on background characters written in a printing font.
S105, filling the target handwritten character into the background picture to generate a handwritten sample.
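Steps S102 and S103 can be illustrated with a single curve segment. The sketch below assumes a quadratic Bezier curve (the kind used in TrueType outlines), defined by two key points and one control point; the application itself does not fix the degree of the curve, and the function names are hypothetical.

```python
import random


def quadratic_bezier(p0, p1, p2, t):
    """Evaluate B(t) = (1-t)^2 * P0 + 2(1-t)t * P1 + t^2 * P2 for t in [0, 1]."""
    u = 1.0 - t
    x = u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0]
    y = u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1]
    return (x, y)


def sample_curve(p0, p1, p2, steps=8):
    """Return steps + 1 points along the curve, from the first key point to the second."""
    return [quadratic_bezier(p0, p1, p2, i / steps) for i in range(steps + 1)]


def perturbed_stroke(p0, p1, p2, sigma, rng, steps=8):
    """Apply Gaussian jitter to the key points and the control point, then resample."""
    def jitter(p):
        return (rng.gauss(p[0], sigma), rng.gauss(p[1], sigma))
    return sample_curve(jitter(p0), jitter(p1), jitter(p2), steps)


# Key points P0 and P2 lie on the curve; P1 is the control point off the curve.
P0, P1, P2 = (0.0, 0.0), (50.0, 100.0), (100.0, 0.0)
original = sample_curve(P0, P1, P2)
variant = perturbed_stroke(P0, P1, P2, sigma=2.0, rng=random.Random(3))
```

Because only three points are disturbed, the variant stays a smooth curve; jittering the sampled points directly would instead produce a jagged stroke.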
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects of the invention may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
Embodiments of the invention may be implemented by computer software executable by a data processor of the mobile device, such as in a processor entity, or by hardware, or by a combination of software and hardware. Computer software or programs (also referred to as program products) including software routines, applets and/or macros can be stored in any device-readable data storage medium and they include program instructions for performing particular tasks. The computer program product may comprise one or more computer-executable components configured to perform embodiments when the program is run. The one or more computer-executable components may be at least one software code or a portion thereof. Further in this regard it should be noted that any block of the logic flow as in the figures may represent a program step, or an interconnected logic circuit, block and function, or a combination of a program step and a logic circuit, block and function. The software may be stored on physical media such as memory chips or memory blocks implemented within the processor, magnetic media such as hard or floppy disks, and optical media such as, for example, DVDs and data variants thereof, CDs. The physical medium is a non-transitory medium.
It should be understood by those skilled in the art that various features of the above embodiments can be combined arbitrarily, and for the sake of brevity, all possible combinations of the features in the above embodiments are not described, but should be considered as within the scope of the present disclosure as long as there is no contradiction between the combinations of the features.
The above examples are merely illustrative of several embodiments of the present application, and the description is more specific and detailed, but not to be construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (11)

1. A method for generating a handwritten sample, comprising the steps of:
acquiring a handwritten character to be processed, and acquiring key points on a writing track boundary of the handwritten character to be processed;
connecting the key points to convert the handwritten characters to be processed into a Bezier curve, and recording control points of the Bezier curve;
applying random transformation conforming to Gaussian distribution to the key points and the control points to generate target handwritten characters;
acquiring a background picture, wherein the background picture is generated based on background characters written by printing fonts;
and filling the target handwritten character into the background picture to generate a handwritten sample.
2. The handwriting sample generation method according to claim 1, wherein said generating a target handwritten character by applying a random transformation conforming to a gaussian distribution to said key points and said control points comprises:
obtaining coordinate values of the key points and coordinate values of the control points, and randomly applying interference to the coordinate values in the coordinate axis direction to generate a target handwritten character, wherein the interference conforms to a Gaussian distribution.
3. The method of generating handwritten samples according to claim 1, characterized in that before "generating handwritten samples", the method further comprises:
applying a random transformation conforming to the Gaussian distribution to the word spacing and the line spacing of a plurality of the target handwritten characters.
4. The method of generating handwritten samples according to claim 2, characterized in that said method further comprises:
recording coordinate values and state values of the key points and the control points, wherein the state values comprise states of points corresponding to the coordinate values on the Bezier curve, and distinguishing the key points and the control points based on the state values.
5. The handwriting sample generation method according to claim 4, wherein said state value is used to distinguish whether the point corresponding to said coordinate value is on or outside the Bezier curve; wherein
a point on the Bezier curve is the key point and a point outside the Bezier curve is the control point.
6. The handwriting sample generation method of claim 1, wherein "filling the target handwritten character onto a background picture" comprises:
outputting the target handwritten character to a preset canvas to obtain a handwritten image bitmap, and filling the handwritten image bitmap onto the background picture after making the handwritten image bitmap transparent.
7. The handwriting sample generation method according to claim 1, wherein the target handwritten character is output to a preset canvas to obtain a handwritten image bitmap, and the handwritten image bitmap is enlarged or reduced to the size of the handwritten character to be processed and then filled in a background image.
8. A handwritten sample generation apparatus, comprising:
the key point acquisition module is used for acquiring a handwritten character to be processed and acquiring key points on the boundary of a writing track of the handwritten character to be processed;
the control point acquisition module is used for connecting the key points, converting the handwritten characters to be processed into a Bezier curve and recording control points of the Bezier curve;
the font processing module is used for applying random transformation conforming to Gaussian distribution to the key points and the control points to generate target handwritten characters;
the background generation module is used for acquiring a background picture, wherein the background picture is generated based on background characters written by printing fonts;
and the sample generation module is used for filling the target handwritten characters into the background picture to generate a handwritten sample.
9. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to run the computer program to perform the method of generating handwritten samples according to any of claims 1 to 7.
10. A computer program product, characterized in that it comprises software code portions for performing the method of generating handwritten samples according to any one of claims 1 to 7, when the computer program product is run on a computer.
11. A readable storage medium, characterized in that a computer program is stored in the readable storage medium, the computer program comprising program code for controlling a process to execute a process, the process comprising the method of generating handwritten samples according to any of claims 1 to 7.
CN202210148735.3A 2022-02-18 2022-02-18 Handwritten sample generation method and device and application Pending CN114202762A (en)

Publications (1)

Publication Number Publication Date
CN114202762A true CN114202762A (en) 2022-03-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220318