CN115331236A - Method and device for generating handwriting whole-line sample - Google Patents

Method and device for generating handwriting whole-line sample

Info

Publication number
CN115331236A
CN115331236A (application CN202210688488.6A)
Authority
CN
China
Prior art keywords
handwritten
character
line
generating
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210688488.6A
Other languages
Chinese (zh)
Inventor
朱军民
王勇
沈达伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yidao Boshi Technology Co ltd
Original Assignee
Beijing Yidao Boshi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2022-06-17
Filing date: 2022-06-17
Publication date: 2022-11-11
Application filed by Beijing Yidao Boshi Technology Co ltd filed Critical Beijing Yidao Boshi Technology Co ltd
Priority to CN202210688488.6A
Publication of CN115331236A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a method and a device for generating whole-line handwriting samples, and belongs to the field of automatic image sample generation. First, handwriting styles are learned: the handwriting style of a user is learned through a neural network, and a handwriting style migration model library is established. Next, handwritten character images are randomly generated: a print text and a handwriting style migration model are randomly selected, and handwritten character images of each character of the selected print text are generated. Finally, the whole-line handwritten character sample image is synthesized by combining the generated handwritten character images of each character of the selected print text. The technical scheme of the invention can simulate manually written whole-line samples to the maximum extent. Moreover, the program can learn a writing style without an individual writing out the complete character set: a small subset (1-5%) is enough, after which single characters for the unwritten remainder of the character set are generated automatically.

Description

Method and device for generating handwriting whole-line sample
Technical Field
The invention relates to the field of automatic image sample generation methods, and in particular to a method and a device for generating offline whole-line handwriting OCR recognition training samples.
Background
In the field of OCR recognition, printed-character recognition has, with the development of the technology in recent years, reached a high level of accuracy, with single-character recognition rates above 99.5%. Recognition of handwritten characters, however, is still far from a practical level. The main bottleneck is that handwritten samples are difficult to obtain: training sets of the same order of magnitude as for printed text are unavailable, because the collection of handwriting samples relies mainly on manual writing, which greatly limits both the collection speed and the number of samples. Printed samples, by contrast, can be generated from font files, a corpus, and background images, so new samples can be obtained continuously.
Therefore, it is desirable to provide a sample generation method for full-line handwriting.
Disclosure of Invention
In order to solve the above problems, the present invention provides a method and an apparatus for generating whole-line handwriting samples that can simulate manually written whole-line samples to the maximum extent. The program can learn a writing style without an individual writing out the complete character set: a small subset (1-5%) is enough, after which single characters for the unwritten remainder of the character set are generated automatically.
According to a first aspect of the present invention, there is provided a method for generating a handwritten whole-line sample, the method including the steps of:
step 1: learning handwriting styles: learning the handwriting style of a user through a neural network and establishing a handwriting style migration model library;
step 2: randomly generating handwritten character images: randomly selecting a print text and a handwriting style migration model, inputting the selected print text into the handwriting style migration model, and generating handwritten character images of each character of the selected print text;
step 3: synthesizing the whole-line handwritten character sample image: combining the generated handwritten character images of each character of the selected print text to obtain the whole-line handwritten character sample image.
Further, the step 1 specifically includes:
step 11: collecting and marking handwriting of a user, cutting out all characters and storing;
step 12: and learning the handwriting style of the user through a neural network according to all the stored characters to obtain a handwriting style migration model of the user, collecting the handwriting style migration models of all the users, and establishing a handwriting style migration model library.
Further, in step 11, labeling the handwriting of the user specifically includes locating coordinates and codes of each handwritten character.
Further, in the step 11, all the characters cut out are stored according to the codes.
Further, the step 2 specifically includes:
step 21: randomly selecting a line of print text from a corpus, and generating print images of all characters in sequence;
step 22: establishing a background image library, and randomly selecting one background image from it;
step 23: randomly selecting a handwriting style migration model from the handwriting style migration model library;
step 24: and inputting each character image of the print form text line into a randomly selected handwriting style migration model to generate a handwriting character image of each character corresponding to the print form text line.
Further, in the step 24,
if the randomly selected handwriting style migration model has parameter control, randomizing the parameters and generating the handwritten character images of each character of the print text line in real time, so that the handwritten character images of the characters differ from one another;
if the randomly selected handwriting style migration model has no parameter control, the handwritten character images of all characters in the character set can be pre-generated, and this step can directly perform an index query by character code.
Further, the step 3 specifically includes:
step 31: generating random parameters related to character rendering;
step 32: rendering the handwritten character images in sequence onto the randomly selected background image according to the random parameters to generate a fused image;
step 33: cropping a handwritten text line sample image from the fused image;
step 34: outputting the generated whole-line handwritten character sample image and a truth value, wherein the truth value is the text-line encoding.
Further, the step 31 specifically includes:
step 311: generating a character position center curve;
step 312: randomly generating intervals among the characters;
step 313: the character size is randomly generated.
Further, the step 311 specifically includes:
on a circle with the radius of R, rotating N times at a random angular speed omega, randomly sampling at random intervals, and obtaining N continuous points on the curve as offset position sampling points, wherein N is a positive integer and is equal to the number of handwritten characters;
and shifting the center points of the N handwritten characters according to the coordinates of the N shifting position sampling points.
Further, the step 312 specifically includes:
the space between the characters is randomly generated, and satisfies Gaussian distribution:
μ=CharHeight/7;
σ=CharHeight/3,
wherein CharHeight is the character height.
Further, step 313 specifically includes:
randomly generating a character height NewHeight, and randomly generating an aspect ratio AspectRatio, wherein the character width NewWidth is: NewWidth = NewHeight/AspectRatio;
and calculating each character to obtain the width and height parameters of each character.
Further, in the step 32, the step of,
when a pixel value of the handwritten character image is greater than a threshold T, it is a background pixel and is filtered out directly during rendering;
when a pixel value of the handwritten character image is less than or equal to the threshold T, it is fused with the background image so that the handwritten character pixels stand out.
Further, the threshold T has a value range of [1, 255]; a value greater than 128 is preferred, and a value of 220 is more preferred.
Further, the step 33 specifically includes:
synthesizing all the handwritten character rectangles in the fused image into a circumscribed rectangle of the text line and cropping it to obtain the handwritten text line sample image.
Further, after step 33, a step of performing image transformation processing on the handwritten text line sample image is also included, where the image transformation processing includes, but is not limited to, image grayscale remapping, image enhancement, noise addition, and image deformation (cv2.remap).
Further, the step 34 specifically includes:
and outputting the handwritten text line sample image and the codes of the corresponding handwritten text line character strings to obtain samples required in the later period, wherein the samples comprise the whole line handwritten character sample image and truth values, namely text line codes.
According to a second aspect of the present invention there is provided a handwritten whole line sample generating device, said device operating in accordance with a method as provided in any of the preceding aspects, said device comprising:
the handwriting style learning unit is used for learning the handwriting style of the user through a neural network and establishing a handwriting style migration model library;
a unit for randomly generating handwritten character images, which is used for randomly selecting a print text, a background and a handwritten style migration model and generating handwritten character images of each character of the selected print text;
and the whole line handwritten character sample image synthesizing unit is used for combining the handwritten character images for generating each character of the selected print text to obtain a whole line handwritten character sample image.
According to a third aspect of the present invention, there is provided a handwriting whole line sample generation system, the system comprising: a processor and a memory for storing executable instructions; wherein the processor is configured to execute the executable instructions to perform the handwriting whole line sample generation method of any of the above aspects.
According to a fourth aspect of the present invention, there is provided a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements a method for generating handwritten whole-line samples as described in any of the above aspects.
The invention has the beneficial effects that:
the invention provides a method for generating a handwritten whole-line sample by using a handwritten font style migration scheme and adding a dynamic parameter generation and image processing mode.
1. The verisimilitude is as follows: the method comprises the steps of generating samples, wherein the samples needing to be generated are as vivid as possible, and the whole line of hand-written images generated by the method can reach the degree of falseness and falseness;
2. high efficiency: the writer with the handwriting style only needs to write a certain number of samples according to requirements without writing all characters of the character set, so that the workload of manual writing is greatly reduced;
3. richness: because a series of random parameters exist in the generation process, the samples generated each time are different, so that the generated samples are very rich, and the method can be directly used by a training program.
Drawings
In order to illustrate the embodiments or technical solutions of the present invention more clearly, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a method for generating handwritten whole-line samples according to an embodiment of the present invention.
FIG. 2 illustrates a font style migration diagram according to an embodiment of the present invention.
Fig. 3a and Fig. 3b are schematic diagrams of handwritten character samples from writer A and writer B, respectively, according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a center curve of a character position according to an embodiment of the present invention.
Fig. 5a and 5b show schematic diagrams of text lines cropped from rendered images according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terms "first," "second," and the like in the description and in the claims of the present disclosure are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.
Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
"A plurality" means two or more.
The term "and/or" used in this disclosure merely describes an association between associated objects, meaning that three relationships may exist. For example, "A and/or B" may mean: A alone, both A and B, or B alone.
Examples
1. Learning style of handwritten characters
Generating handwriting samples is difficult mainly because the number of Chinese characters (or English words) is huge: about 7,000 Chinese characters are in common use, GB18030 contains some 27,000, and with the extension characters added the total is about 110,000, so writing out a full-character-set Chinese character library (or English word library) by hand would take several months. Researchers at home and abroad have already shown that a person need handwrite only a small number of characters for a neural network to learn that person's handwriting style; print character images can then be input to generate the corresponding handwritten character images, as shown in FIG. 2.
1.1 offline handwriting style Single character acquisition
Each person writes a certain number of handwritten single characters, from which that person's font style is later learned. A sampling form can be designed as needed and distributed to each writer; after the writers have finished, the forms are collected and annotated, which mainly means locating the coordinates of each handwritten character and recording its code. All characters written by each person are then cut out and stored according to their codes. Written and cropped samples are shown in Fig. 3a and Fig. 3b.
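A minimal sketch of this acquisition step, assuming the annotations for a scanned form are already available as (x, y, w, h, char) boxes; the directory layout and the naming of one PNG per character by its Unicode code point are illustrative assumptions, not the patent's prescription:

```python
import os
import cv2

def crop_and_store(form_image_path, annotations, out_dir, writer_id):
    """annotations: list of (x, y, w, h, char) boxes located on the form."""
    image = cv2.imread(form_image_path)
    char_dir = os.path.join(out_dir, writer_id)
    os.makedirs(char_dir, exist_ok=True)
    for x, y, w, h, char in annotations:
        glyph = image[y:y + h, x:x + w]
        # Store by code so later steps can look a glyph up from its encoding.
        cv2.imwrite(os.path.join(char_dir, f"U+{ord(char):04X}.png"), glyph)
```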
1.2 learning style of handwritten single character
Each person's handwriting style is learned by a program to obtain that person's handwriting style model, so that when a print character image (e.g., in a regular typeface) is input, a handwritten character image resembling that person's handwriting can be obtained from the handwriting style model (see FIG. 2).
Style migration is currently a fairly active research topic, with many groups working on it for various languages and scripts. The technical solution of the present application adopts the following method:
the user writes some Chinese characters specified by the system, photographs and uploads the Chinese characters to the system, after receiving the text images, the system divides the characters on the sample according to the grids to obtain the character handwriting images of the specified Chinese characters, and the system automatically generates a user handwriting font library within about five hours. Firstly, based on a non-rigid point set registration method and a plurality of heuristic rules (secondary heuristic rules), a writing track of each stroke is extracted from each handwritten character image segmented from an input text image. And then, learning and reconstructing the overall handwriting style of the user by utilizing a neural network, and decomposing the handwriting style of the user into a stroke shape style and a stroke layout style. Meanwhile, handwriting details such as stroke connectivity, outline shape and the like are correctly described and restored. Finally, a complete personal font library can be generated by vectorizing both the manual writing samples and the machine-generated images of all other characters of the handwriting.
With this method, a user only needs to spend 10-30 minutes writing a small number of Chinese characters on paper (no fewer than 200 are recommended); after the written characters are photographed and uploaded, the system automatically generates a handwritten Chinese character library in the user's writing style that conforms to the GB18030-2000 national standard.
The font style migration models of all writers are collected together to establish a handwriting font style migration model library for later use.
2. Randomly selecting generated data
2.1 selecting a line of text from a corpus
A line of text is selected from the prepared corpus. Different types of corpora can be prepared as needed, such as science and technology, news, education, agriculture, and sports; they can be further divided into addresses, names, capital-form amounts, ordinary digit strings, and so on, and can be in Chinese, English, Japanese, Korean, etc.
Certain rules can be imposed on the selected text lines as needed so that they match real scenarios: for example, a text line may not contain more than a certain number of characters, may not consist entirely of punctuation, and may not contain characters outside the configured character set, as in the sketch below.
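A sketch of this rule-constrained selection, assuming the corpus is held as a list of plain-text lines; the length threshold, retry bound, and punctuation set are illustrative assumptions:

```python
import random
import string

PUNCT = set(string.punctuation) | set("，。！？、；：（）《》")

def pick_text_line(corpus_lines, charset, max_chars=30, max_tries=1000):
    for _ in range(max_tries):
        line = random.choice(corpus_lines).strip()
        if not line or len(line) > max_chars:
            continue  # empty, or more characters than allowed
        if all(c in PUNCT for c in line):
            continue  # a line of pure punctuation is rejected
        if any(c not in charset and c not in PUNCT for c in line):
            continue  # contains characters outside the set character set
        return line
    raise ValueError("no corpus line satisfies the selection rules")
```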
2.2 randomly selecting a background image
In order to produce realistic samples, real writing paper must be simulated, so paper background images for a variety of settings need to be prepared. They fall into the following categories:
ordinary A4 paper backgrounds, mainly white, with appropriate amounts of red, yellow, and other colors also prepared;
document images with the printed characters removed, yielding clean background images;
various certificate backgrounds, such as passports, identity cards, property certificates, and the like;
various invoice backgrounds, such as value-added tax invoices, airline itinerary receipts, fixed-amount invoices, and the like;
various bank document backgrounds: bank receipts, transfer cheques, draft application forms, acceptance bills, and the like;
various insurance document backgrounds: insurance policies, claim forms, medical invoices, medical records, and so on.
Collecting document backgrounds in the many formats found in daily life restores, to the greatest extent, the writing backgrounds of documents in real scenarios.
These background images are collected and stored in a background image library; in subsequent processing, the program randomly selects one from the library as the writing background of the line sample.
2.3 selecting a handwriting font style model
A style migration model is randomly selected from the handwriting font style model library.
2.4 generating handwritten images of individual characters in corresponding character strings according to models
Using the selected handwriting style model, the corresponding print character images are input, and the corresponding handwritten character images are obtained through inference.
A style migration model generally has parameters that control how much the generated output varies, so this step generates in real time and uses randomized parameters, making the handwritten image generated for a character of the same code differ each time.
If the style migration model has no parameter control, the handwritten character images of all characters in the character set can be pre-generated, and this step simply performs an index query by character code. Both modes are sketched below.
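A sketch of both modes. The model interface (has_parameter_control, generate, writer_id), the "variation" parameter, and the font path used to rasterize the print glyph are hypothetical stand-ins for whatever the model library actually holds:

```python
import random
from PIL import Image, ImageDraw, ImageFont

def render_printed(char, font_path="simsun.ttf", size=96):
    # Rasterize the print form of a character; the font path is an assumption.
    img = Image.new("L", (size, size), 255)
    ImageDraw.Draw(img).text((0, 0), char, fill=0,
                             font=ImageFont.truetype(font_path, size))
    return img

def glyphs_for_line(text_line, model_library, pregen_cache):
    model = random.choice(model_library)  # step 2.3
    glyphs = []
    for char in text_line:
        if model.has_parameter_control:
            # Real-time generation with randomized parameters, so repeated
            # occurrences of the same character come out differently.
            glyphs.append(model.generate(render_printed(char),
                                         variation=random.uniform(0.0, 1.0)))
        else:
            # No parameter control: glyphs were pre-generated once for the
            # whole character set, so just index the cache by character code.
            glyphs.append(pregen_cache[model.writer_id][ord(char)])
    return glyphs
```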
3. Synthesizing a full line of handwritten samples
3.1 generating random parameters related to character rendering
Printed text lines come out of a printer; apart from changes in size, font, bolding, underlining, italics, and the like, there is little other variation. Printed or generated text sits on one horizontal line and generally shows no great randomness unless the paper is bent and deformed or the photograph is distorted.
Generating handwritten text lines, by contrast, requires simulating human-written text as closely as possible, so many sources of randomness must be considered, such as variation in character size, glyph distortion, stroke weight, writing angle, baseline, inter-character gaps, and stroke color; all of these are controlled through parameter variation.
In practice, the following random parameters are generated:
a) Generating a character position center curve:
When a whole handwritten line is simulated, the center line of the characters is neither strictly horizontal nor vertical; it fluctuates while remaining continuous.
Various periodic functions can be used for generation; a sin-based method for obtaining the center-line offsets is described here. As shown in FIG. 4, a point rotates on a circle of radius R at a random angular velocity ω, with a random starting angle (generally in [0, 2π]); rotating N times yields N consecutive points on the curve, where N is the number of characters. This corresponds to the sin curve on the right of FIG. 4: the starting position is random, the sampling interval is random, and the offset values for the center points of a line of text are read off the curve. The black dots in the figure are the offset-position sampling points for N (here 13) characters; the center point of each written character is offset by the corresponding value.
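A minimal sketch of this baseline-offset sampling; the concrete ranges for the angular velocity, the phase, and the sampling step are illustrative assumptions:

```python
import math
import random

def baseline_offsets(n_chars, radius):
    omega = random.uniform(0.5, 2.0)           # random angular velocity
    phase = random.uniform(0.0, 2 * math.pi)   # random starting angle
    step = random.uniform(0.05, 0.3)           # random sampling interval
    # N consecutive points on the sine curve; each value offsets one
    # character's center point vertically.
    return [radius * math.sin(phase + omega * step * i) for i in range(n_chars)]
```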
b) Randomly generating gaps among characters:
The gaps between characters differ when written by hand. Taking the character height CharHeight of the handwritten text line as a reference, the character spacing is randomly chosen from [0, CharHeight], with the random output satisfying a Gaussian distribution (a sampling sketch follows the formulas):
μ=CharHeight/7
σ=CharHeight/3
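A sketch of this gap sampling: draw from a Gaussian with μ = CharHeight/7 and σ = CharHeight/3, then clamp the result to the [0, CharHeight] range mentioned above (clamping is an assumption about how the range constraint is enforced):

```python
import random

def char_gap(char_height):
    gap = random.gauss(char_height / 7, char_height / 3)
    return min(max(gap, 0.0), float(char_height))  # keep within [0, CharHeight]
```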
c) Character size scaling random generation:
Every character output by style migration has the same size, preset by the generation network (for example, width 64 and height 96). Each generated character can therefore be given an aspect-ratio conversion and a size change to obtain a new height and width, provided the character is not deformed too much: a new character height NewHeight is randomly generated within a preset range, then a reasonable aspect ratio AspectRatio is randomly generated, and the new width is obtained as NewWidth = NewHeight/AspectRatio.
This computation is performed for every character, yielding the width and height parameters of each character.
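A sketch of this size randomization; the ±20% height range and the 1.2-1.8 aspect-ratio band (bracketing the 96/64 = 1.5 of the example above) are illustrative assumptions about what counts as acceptable deformation:

```python
import random

def char_size(base_height=96):
    new_height = random.randint(int(0.8 * base_height), int(1.2 * base_height))
    aspect_ratio = random.uniform(1.2, 1.8)    # height / width
    new_width = int(new_height / aspect_ratio)
    return new_width, new_height
```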
3.2 rendering the character images sequentially onto the background image according to the parameters
The character images are rendered onto the background image randomly selected in step 2.2 according to the parameters set in the previous step. During font generation, it can be agreed that character-image pixel values greater than a threshold T (T = 220 in this implementation of the invention) are background pixels and are filtered out directly during rendering, while pixel values less than or equal to T are fused with the background image so that the character pixels stand out.
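A sketch of this threshold rule for grayscale images with dark strokes on light paper; taking the darker of stroke and background (np.minimum) is one simple fusion choice, an assumption rather than an operator specified by the patent:

```python
import numpy as np

def render_glyph(background, glyph, x, y, threshold=220):
    # background and glyph are uint8 grayscale arrays; strokes are dark.
    h, w = glyph.shape
    region = background[y:y + h, x:x + w]
    mask = glyph <= threshold  # pixels above T are paper and are skipped
    region[mask] = np.minimum(region[mask], glyph[mask])
    return background
```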
3.3 cropping handwritten text line images from fused images
All the rendered character rectangles are merged into one large circumscribed rectangle for the text line, and the required whole-line text image is cropped from the rendered image, as shown in Fig. 5a and Fig. 5b. The character center lines vary, the character spacing varies, and the character sizes vary randomly; in other words, the state of each character is random, which makes the result more realistic.
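A sketch of the circumscribed-rectangle crop, assuming each rendered character's box is known as (x, y, w, h) in fused-image coordinates:

```python
def crop_text_line(fused_image, char_boxes):
    """char_boxes: list of (x, y, w, h) for each rendered character."""
    x0 = min(x for x, y, w, h in char_boxes)
    y0 = min(y for x, y, w, h in char_boxes)
    x1 = max(x + w for x, y, w, h in char_boxes)
    y1 = max(y + h for x, y, w, h in char_boxes)
    return fused_image[y0:y1, x0:x1]
```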
3.4 image transformation of handwritten text line images
Because the images are generated, they can already be used to simulate training data, but some image-processing transformations may additionally be applied to the text-line images, such as image grayscale remapping, image enhancement, noise addition, and image deformation (cv2.remap). For deep-learning training, this step can instead be moved into the image preprocessing of the training pipeline rather than being performed here.
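A sketch of two such transformations, noise addition and a cv2.remap deformation; the noise level and the sinusoidal warp field are illustrative choices:

```python
import cv2
import numpy as np

def augment(line_image):
    h, w = line_image.shape[:2]
    # Noise addition.
    noise = np.random.normal(0.0, 8.0, line_image.shape)
    noisy = np.clip(line_image.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    # Image deformation via cv2.remap: shift each column along a gentle sine.
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    ys += 2.0 * np.sin(xs / 40.0)
    return cv2.remap(noisy, xs, ys, interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_REPLICATE)
```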
3.5 output Generation of handwritten text line images and truth
The processed handwritten text line image and the encoding of its corresponding text-line character string are output, giving the samples (image and truth value) needed later:
one option is to save them to disk for various learning and training programs;
the other is to connect directly to the training program, with the generation program acting as a sample generator for the training program, so that the samples never touch the disk.
Here, the truth value is the text-line encoding.
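A sketch of the second, disk-free mode; synthesize_line is a hypothetical wrapper composing steps 2.2-3.4 above, and pick_text_line is the corpus-selection sketch from section 2.1:

```python
def sample_stream(corpus_lines, model_library, backgrounds, charset):
    # Endless generator plugged straight into a training loop; samples are
    # yielded as (image, label) pairs and never written to disk.
    while True:
        line = pick_text_line(corpus_lines, charset)               # step 2.1
        image = synthesize_line(line, model_library, backgrounds)  # steps 2.2-3.4
        yield image, line  # the truth value is the text-line encoding
```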
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are only for description, and do not represent the advantages and disadvantages of the embodiments.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the foregoing implementation method can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (16)

1. A method for generating handwritten samples of a whole line, the method comprising the steps of:
step 1: learning the handwriting style, learning the handwriting style of a user through a neural network, and establishing a handwriting style migration model library;
step 2: randomly generating handwritten character images: randomly selecting a print text and a handwriting style migration model, inputting the selected print text into the handwriting style migration model, and generating handwritten character images of each character of the selected print text;
step 3: synthesizing a whole line of handwritten character sample images: combining the generated handwritten character images of each character of the selected print text to obtain a whole-line handwritten character sample image.
2. The method for generating handwritten whole-line samples according to claim 1, wherein the step 1 specifically comprises:
step 11: collecting and marking handwriting of a user, cutting out all characters and storing;
step 12: and learning the handwriting style of the user through a neural network according to all the stored characters to obtain a handwriting style migration model of the user, collecting the handwriting style migration models of all the users, and establishing a handwriting style migration model library.
3. The method for generating handwritten whole-line samples of claim 1, wherein the step 2 specifically includes:
step 21: randomly selecting a line of print text from a corpus, and generating print images of all characters in sequence;
step 22: establishing a background image library, and randomly selecting one background image;
step 23: randomly selecting a handwriting style migration model from the handwriting style migration model library;
step 24: and inputting each character image of the print form text line into a randomly selected handwriting style migration model to generate a handwriting character image of each character corresponding to the print form text line.
4. The method of claim 1, wherein in step 24,
if the randomly selected handwriting style migration model has parameter control, randomizing the parameters, generating a handwriting character image of each character of the print text line in real time, and making the handwriting character images of each character different;
if the randomly selected handwriting style migration model does not have parameter control, the handwritten character images of all characters in the character set are pre-generated, and this step directly performs an index query by character code.
5. The method for generating handwritten whole-line samples according to claim 1, wherein said step 3 specifically includes:
step 31: generating random parameters related to character rendering;
step 32: rendering each handwritten character image into the background image according to random parameters in sequence to generate a fusion image;
step 33: clipping a handwritten text line sample image from the fused image;
step 34: and outputting the generated whole line handwritten character sample image and a true value, wherein the true value is text line coding.
6. The method for generating handwritten whole-line samples according to claim 5, wherein said step 31 specifically includes:
step 311: generating a character position center curve;
step 312: randomly generating intervals among the characters;
step 313: the character size is randomly generated.
7. The method for generating a handwritten whole-line sample as claimed in claim 6, wherein said step 311 is specifically:
on a circle with the radius of R, rotating at a random angular speed omega for N times, wherein the initial position of the rotation angle is random, and the sampling interval is random, so that N continuous points on the curve are obtained and serve as offset position sampling points, N is a positive integer and is equal to the number of handwritten characters;
the center points of the N handwritten characters are shifted according to the coordinates of the N shifted position sampling points.
8. The method according to claim 6, wherein the step 312 is specifically:
the space between the characters is randomly generated, and satisfies Gaussian distribution:
μ=CharHeight/7;
σ=CharHeight/3,
wherein CharHeight is the character height.
9. The method according to claim 6, wherein the step 313 specifically comprises:
randomly generating a character height NewHeight and then randomly generating an aspect ratio AspectRatio, wherein the character width NewWidth is: NewWidth = NewHeight/AspectRatio;
and calculating each character to obtain the width and height parameters of each character.
10. The method of claim 5, wherein in step 32,
when the pixel value of the handwritten character image is larger than a threshold value T, the handwritten character image is a background pixel and is directly filtered out during rendering;
and when the pixel value of the handwritten character image is less than or equal to the threshold value T, fusing the pixel value with the background image to highlight the pixel value of the handwritten character image.
11. The method for generating handwritten whole-line samples according to claim 10, characterized in that the threshold T has a value range of [1, 255].
12. The method for generating handwritten whole-line samples according to claim 5, wherein said step 33 is specifically:
and synthesizing all the handwritten character rectangles in the fused image into a circumscribed rectangle of the text line and cropping it to obtain a handwritten text line sample image.
13. The method for generating handwritten whole-line samples according to claim 5, wherein said step 34 is specifically:
and outputting the handwritten text line sample image and the codes of the corresponding handwritten text line character strings to obtain samples required in the later period, wherein the samples comprise the whole line handwritten character sample image and truth values, namely text line codes.
14. A handwritten whole line sample generation device, which operates based on the handwritten whole line sample generation method according to any one of claims 1 to 13, the device comprising:
the handwriting style learning unit is used for learning the handwriting style of the user through a neural network and establishing a handwriting style migration model library;
a unit for randomly generating handwritten character images, which is used for randomly selecting a print text, a background and a handwritten style migration model and generating handwritten character images of each character of the selected print text;
and the whole line handwritten character sample image synthesizing unit is used for combining the handwritten character images of all the characters of the selected print text to obtain a whole line handwritten character sample image.
15. A handwriting whole line sample generation system, said system comprising: a processor and a memory for storing executable instructions; wherein the processor is configured to execute the executable instructions to perform the handwriting whole line sample generation method of any of claims 1 to 13.
16. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method for generating handwritten whole-line samples according to any of claims 1 to 13.
CN202210688488.6A 2022-06-17 2022-06-17 Method and device for generating handwriting whole-line sample Pending CN115331236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210688488.6A CN115331236A (en) 2022-06-17 2022-06-17 Method and device for generating handwriting whole-line sample


Publications (1)

Publication Number Publication Date
CN115331236A 2022-11-11

Family

ID=83915548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210688488.6A Pending CN115331236A (en) 2022-06-17 2022-06-17 Method and device for generating handwriting whole-line sample

Country Status (1)

Country Link
CN (1) CN115331236A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117058693A (en) * 2023-10-13 2023-11-14 深圳市上融科技有限公司 Intelligent handwriting recognition method of electromagnetic touch screen
CN117058693B (en) * 2023-10-13 2024-01-26 深圳市上融科技有限公司 Intelligent handwriting recognition method of electromagnetic touch screen


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination