CN109522975A - Handwriting samples generation method, device, computer equipment and storage medium - Google Patents

Publication number
CN109522975A
Authority
CN
China
Prior art keywords
text
corpus
handwriting samples
preset
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811084484.7A
Other languages
Chinese (zh)
Inventor
金晨
刘克亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201811084484.7A
Publication of CN109522975A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/24: Character recognition characterised by the processing or recognition method
    • G06V30/242: Division of the character sequences into groups prior to recognition; Selection of dictionaries
    • G06V30/244: Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font
    • G06V30/2455: Discrimination between machine-print, hand-print and cursive writing

Abstract

The invention discloses a handwriting sample generation method and device, computer equipment and a storage medium. The method comprises: obtaining font files from a handwriting input method; obtaining text corpus files; drawing a canvas according to a preset picture size and setting the background color of the canvas; extracting corpus text from a preset corpus database and selecting a target font file from a preset font library; using the target font file to convert the corpus text into the handwritten text corresponding to the target font file; determining the text size of the handwritten text according to the canvas size and the handwritten text; drawing the handwritten text on the canvas according to its text size to obtain a handwriting sample picture; and saving the handwriting sample picture and the corpus text as a handwriting sample in a handwriting sample data set. The technical solution of the invention improves the collection efficiency of handwriting samples and increases the number of available handwriting samples, thereby effectively improving the recognition accuracy of a handwritten text recognition model.

Description

Handwriting samples generation method, device, computer equipment and storage medium
Technical field
The present invention relates to the field of computer technology, and in particular to a handwriting sample generation method and device, computer equipment and a storage medium.
Background
Research on handwritten text recognition requires a large number of handwriting samples to support the training of handwritten text recognition models.
At present, however, handwriting samples are mostly collected manually. Because handwriting samples are of many kinds, manual collection involves a heavy workload, and a great deal of time must also be spent checking the collected samples and cleaning out errors. As a result, collection efficiency is low and the number of manually collected samples is limited, so they cannot adequately support the training of a handwritten text recognition model. Model training becomes difficult, which in turn affects the recognition accuracy of the model.
Summary of the invention
The embodiments of the present invention provide a handwriting sample generation method and device, computer equipment and a storage medium, to solve the problem that handwriting samples are currently costly and inefficient to collect, which affects the recognition accuracy of handwritten text recognition models.
A handwriting sample generation method, comprising:
obtaining font files from a preset handwriting input method, and saving the font files in a preset font library;
obtaining text corpus files, and saving the text corpus files in a preset corpus database;
drawing a canvas according to a preset picture size, and setting the background color of the canvas;
according to a preset selection mode, extracting corpus text from the preset corpus database, and selecting a target font file from the preset font library;
using the target font file to convert the corpus text into the handwritten text corresponding to the target font file;
determining the text size of the handwritten text according to the canvas size and the handwritten text;
drawing the handwritten text on the canvas according to the text size of the handwritten text, to obtain a handwriting sample picture;
saving the handwriting sample picture and the corpus text as a handwriting sample in a preset handwriting sample data set.
A handwriting sample generation device, comprising:
a font obtaining module, configured to obtain font files from a preset handwriting input method and save the font files in a preset font library;
a corpus obtaining module, configured to obtain text corpus files and save the text corpus files in a preset corpus database;
a canvas drawing module, configured to draw a canvas according to a preset picture size and set the background color of the canvas;
a selection module, configured to extract corpus text from the preset corpus database according to a preset selection mode, and to select a target font file from the preset font library;
a conversion module, configured to use the target font file to convert the corpus text into the handwritten text corresponding to the target font file;
a size calculation module, configured to determine the text size of the handwritten text according to the canvas size and the handwritten text;
a synthesis module, configured to draw the handwritten text on the canvas according to the text size of the handwritten text, to obtain a handwriting sample picture;
a saving module, configured to save the handwriting sample picture and the corpus text as a handwriting sample in a preset handwriting sample data set.
Computer equipment, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above handwriting sample generation method when executing the computer program.
A computer-readable storage medium storing a computer program, wherein the computer program implements the steps of the above handwriting sample generation method when executed by a processor.
In the above handwriting sample generation method and device, computer equipment and storage medium, on the one hand, font files in a preset handwriting input method and text corpus files are obtained, and the corpus text extracted from the text corpus files is drawn on a canvas in the handwriting font corresponding to a font file from the handwriting input method, yielding a handwriting sample picture. Handwriting samples are thus collected automatically, without manual collection, which reduces the collection cost of handwriting samples and improves collection efficiency. On the other hand, target font files selected from the preset font library are combined with corpus texts extracted from the preset corpus database, and each corpus text is converted into the handwritten text corresponding to its target font file. Handwriting sample pictures containing handwritten text in a variety of handwriting fonts can therefore be generated automatically, which makes the collection of handwriting samples more flexible and greatly increases the number of handwriting samples. This lays a foundation for the subsequent training and tuning of a handwritten text recognition model and thus effectively improves its recognition accuracy.
Brief description of the drawings
In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the handwriting sample generation method in one embodiment of the invention;
Fig. 2 is a flow chart of the handwriting sample expansion processing in the handwriting sample generation method in one embodiment of the invention;
Fig. 3 is a flow chart of step S9 in the handwriting sample generation method in one embodiment of the invention;
Fig. 4 is a flow chart of step S93 in the handwriting sample generation method in one embodiment of the invention;
Fig. 5 is a schematic diagram of the handwriting sample generation device in one embodiment of the invention;
Fig. 6 is a schematic diagram of the computer equipment in one embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The handwriting sample generation method provided by the present application can be applied to a server, which may be implemented as a standalone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 1, a handwriting sample generation method is provided, detailed as follows:
S1: Obtain the font files in a preset handwriting input method, and save the font files in a preset font library.
The preset handwriting input method is specifically a handwriting input method provided by a third-party platform on the Internet, and it contains font files for various handwriting fonts. The server presets the handwriting input method from which font files are to be obtained; for example, a configuration channel may be provided so that the user configures the required handwriting input method.
Specifically, the server downloads the font files of the handwriting fonts in the preset handwriting input method from the Internet. The file formats of the font files include, but are not limited to, ttf, otf and ttc. It should be understood that there may be multiple font files, each defining the format of one handwriting font. The server saves the downloaded font files in the preset font library, which is used to store the font files of various handwriting fonts.
S2: Obtain text corpus files, and save the text corpus files in a preset corpus database.
Specifically, the server obtains text corpus files through a preset acquisition mode. The content of a text corpus file includes, but is not limited to, news, novels and stories, and the file formats of text corpus files include, but are not limited to, txt and doc.
It should be noted that the preset acquisition mode may be searching for and downloading files from the Internet with crawler software, obtaining them from a preset document database, or any other mode capable of obtaining written text. It can be configured according to the needs of the practical application and is not limited here.
It should be understood that there may be multiple text corpus files. The server saves the obtained text corpus files in the preset corpus database.
Further, the server classifies the text corpus files according to a preset classification mode and, based on the classification result, stores each text corpus file in the corpus database under the category to which it belongs.
The preset classification mode may classify by field, for example chemistry or literature, or by text type, for example news or novels. The specific classification mode can be configured according to the needs of the practical application.
S3: Draw a canvas according to a preset picture size, and set the background color of the canvas.
Specifically, the picture size may include a length and a width, measured in pixels. The preset picture size is the picture size of the handwriting sample picture; its specific value can be preset according to the needs of the practical application, and the server draws the canvas according to this picture size.
The server can directly draw a canvas of the specified size using a preset canvas-creation function, and can further set the background color of the canvas using a preset canvas-color function.
For example, the imagecreate() function in the GD2 function library can be used to create the canvas, and the imagecolorallocate() function can be used to set the background color of the canvas.
Assume the preset picture size is 500*400, that is, the canvas is 500 pixels long and 400 pixels wide, and the RGB value of the canvas background color is (211,126,29). A specific implementation is as follows:
$im = imagecreate(500, 400); // create a canvas $im with a picture size of 500*400
$bg = imagecolorallocate($im, 211, 126, 29); // set the background color of canvas $im to the color with RGB value (211,126,29); the first color allocated on a canvas from imagecreate() becomes its background
It should be noted that steps S1, S2 and S3 need not be executed in any particular order; they may also be executed in parallel, which is not limited here.
S4: According to a preset selection mode, extract corpus text from the preset corpus database, and select a target font file from the preset font library.
Specifically, the preset selection mode covers how the text corpus file is selected, how the corpus text is selected, and how the font is selected. The text corpus file may be selected at random, or a file of a particular category may be selected according to the needs of model training. The corpus text may be, for example, a randomly selected character, a randomly selected word or phrase, or a randomly combined line of characters or words. The font may be selected by randomly choosing one or more fonts, by cycling through every font, or by choosing specific fonts according to the needs of model training.
It should be noted that the selection mode of the text corpus file, the selection mode of the corpus text and the selection mode of the font can all be configured according to the needs of the practical application, which is not limited here.
For example, if the text corpus files in the preset corpus database are stored by the category of scientific field to which they belong, then when model training targets the chemistry field, the corpus files whose category is chemistry can be extracted from the preset corpus database.
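The selection in step S4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the corpus database is modeled as a dictionary from category name to a list of file contents, the font library as a list of file names, and the function names `select_corpus_text` and `select_fonts` are hypothetical.

```python
import random

def select_corpus_text(corpus_db, category=None, length=4, rng=random):
    """Pick a text corpus file (optionally from one category) and cut a random
    run of `length` characters out of it to serve as the corpus text."""
    categories = [category] if category is not None else list(corpus_db)
    chosen_category = rng.choice(categories)
    content = rng.choice(corpus_db[chosen_category])
    # Clamp the start so the slice stays inside the file content.
    start = rng.randrange(0, max(1, len(content) - length + 1))
    return content[start:start + length]

def select_fonts(font_library, count=1, rng=random):
    """Randomly choose `count` distinct font files from the font library."""
    return rng.sample(font_library, k=min(count, len(font_library)))

# Hypothetical data: category names and file names are placeholders.
corpus_db = {
    "chemistry": ["sodium chloride dissolves in water"],
    "literature": ["once upon a time"],
}
font_library = ["font_a.ttf", "font_b.ttf", "font_c.ttf"]

text = select_corpus_text(corpus_db, category="chemistry", length=6)
fonts = select_fonts(font_library, count=2)
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which can be useful when a generated data set needs to be rebuilt.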
S5: Use the target font file to convert the corpus text into the handwritten text corresponding to the target font file.
Specifically, step S4 may yield one or more target font files and one or more corpus texts. The target font files and corpus texts are therefore combined, each combination containing one target font file and one corpus text, and the corpus text in each combination is converted into the handwritten text corresponding to the target font file in that combination.
It should be noted that corpus text refers to text in a standard typeface, while handwritten text refers to text written with the target font file.
For example, if step S4 yields 3 target font files and 10 corpus texts, there are 30 combinations in total; that is, each corpus text is converted using 3 handwriting fonts, giving 30 different handwritten texts.
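The pairing of font files and corpus texts described above is a Cartesian product. A minimal sketch, assuming placeholder file names and texts (the function name `build_combinations` is hypothetical):

```python
from itertools import product

def build_combinations(font_files, corpus_texts):
    """Pair every corpus text with every target font file."""
    return [(font, text) for text, font in product(corpus_texts, font_files)]

fonts = ["font_a.ttf", "font_b.ttf", "font_c.ttf"]   # placeholder names
texts = [f"corpus text {i}" for i in range(10)]      # placeholder texts
pairs = build_combinations(fonts, texts)
print(len(pairs))  # 3 fonts x 10 texts = 30 combinations
```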
S6: Determine the text size of the handwritten text according to the canvas size and the handwritten text.
In this embodiment, the canvas size is the picture size preset in step S3, and the text size of the handwritten text is the size of the rectangular image region that the handwritten text occupies. It can be understood that, for each character in the handwritten text, there is a font size correspondence between the handwriting font and font size of the character and the size of the rectangular image region the character occupies. This correspondence can be expressed by the formula P=f(a,c), where a is the handwriting font of the character, c is the font size of the character, P is the size of the rectangular image region the character occupies, and f is the mapping function that gives that size from the handwriting font and font size of the character.
Specifically, corpus text of the same font size occupies rectangular image regions of different sizes when rendered in different handwriting fonts, so the text size of the handwritten text differs between fonts. The server therefore needs to determine the text size of the handwritten text from the canvas size and the handwritten text obtained in step S5, which can be done in either of the following two ways:
(1) Count the number of characters in the handwritten text obtained in step S5 and, using the canvas size and the font size correspondence, calculate the font size of each character in the handwritten text such that the text size of the handwritten text is less than or equal to the canvas size.
(2) Count the number of characters in the handwritten text obtained in step S5 and, starting from a preset initial font size, calculate the text size of the handwritten text from the character count and the font size correspondence. If the text size is larger than the canvas size, reduce the font size step by step from the initial font size by a preset unit until the text size of the handwritten text is less than or equal to the canvas size.
Taking mode (2) as an example, assume the corpus text obtained in step S4 is "AI scientific and technological innovation exploration process", which contains 10 characters in the original text. Starting from an initial font size of 64, the image size of the rectangular image region occupied by the text is calculated. If that image size is larger than the canvas size of 36*280, the font size is reduced by 1 and the calculation is repeated, that is, the image size is recalculated at font size 63. If the result is still larger than 36*280, the font size is reduced by 1 again, and so on, until the image size of the rectangular image region occupied by the text is less than or equal to 36*280.
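Mode (2) can be sketched as a simple loop. The sketch below rests on several assumptions that are not in the source: the mapping P=f(a,c) is replaced by a stand-in `text_size` function in which each character occupies a box proportional to the font size, the canvas size 36*280 is read as 36 pixels high and 280 pixels wide, and the text is laid out on a single line. Real code would ask the font renderer for the bounding box instead.

```python
def text_size(char_count, font_size, width_ratio=1.0, height_ratio=1.0):
    """Stand-in for P = f(a, c): bounding box (width, height) of one line of text.
    The ratios model how a particular handwriting font scales (assumed values)."""
    width = round(char_count * font_size * width_ratio)
    height = round(font_size * height_ratio)
    return width, height

def fit_font_size(char_count, canvas_size, initial_size=64, step=1, measure=text_size):
    """Shrink the font size from `initial_size` in units of `step` until the
    text's bounding box fits inside the canvas."""
    canvas_w, canvas_h = canvas_size
    size = initial_size
    while size > 0:
        w, h = measure(char_count, size)
        if w <= canvas_w and h <= canvas_h:
            return size
        size -= step
    raise ValueError("text does not fit on the canvas at any positive font size")

# 10 characters on a canvas 280 wide and 36 high, starting from font size 64.
print(fit_font_size(10, (280, 36)))  # prints 28 under the stand-in size model
```

Mode (1) would instead solve for the font size directly from the canvas size, which avoids the loop but requires the size mapping to be invertible.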
S7: Draw the handwritten text on the canvas according to the text size of the handwritten text, to obtain a handwriting sample picture.
Specifically, according to the text size of the handwritten text determined in step S6, the handwritten text is drawn on the canvas created in step S3, yielding the handwriting sample picture.
For example, the gdImageStringFT() function in the GD2 function library can be used to draw the handwritten text.
Further, when the handwritten text is drawn on the canvas, its position on the canvas and its font color can also be set. For example, by passing the position coordinates and font color of the handwritten text as parameters to the gdImageStringFT() function, handwritten text of the specified color can be drawn at the specified position on the canvas.
Preferably, the handwritten text is drawn at the center of the canvas.
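Centering the text amounts to splitting the leftover space evenly on each axis. A minimal sketch, assuming the drawing position is the top-left corner of the text's bounding box (some renderers anchor text on the baseline instead, in which case the vertical coordinate needs adjusting); the function name `centered_origin` is hypothetical:

```python
def centered_origin(canvas_size, text_size):
    """Return the top-left (x, y) at which a text box of `text_size`
    must be drawn so that it is centered on a canvas of `canvas_size`."""
    canvas_w, canvas_h = canvas_size
    text_w, text_h = text_size
    if text_w > canvas_w or text_h > canvas_h:
        raise ValueError("text box is larger than the canvas")
    return (canvas_w - text_w) // 2, (canvas_h - text_h) // 2

# A 280x28 text box on the 500*400 canvas from step S3.
print(centered_origin((500, 400), (280, 28)))  # prints (110, 186)
```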
S8: Save the handwriting sample picture and the corpus text as a handwriting sample in a preset handwriting sample data set.
Specifically, the handwriting sample picture obtained in step S7 and the corpus text corresponding to the handwritten text it contains are treated as one handwriting sample; that is, the handwriting sample picture and the corpus text are saved in the handwriting sample data set in correspondence with each other.
In this embodiment, on the one hand, font files in a preset handwriting input method and text corpus files are obtained, and the corpus text extracted from the text corpus files is drawn on a canvas in the handwriting font corresponding to a font file from the handwriting input method, yielding a handwriting sample picture. Handwriting samples are thus collected automatically, without manual collection, which reduces the collection cost of handwriting samples and improves collection efficiency. On the other hand, target font files selected from the preset font library are combined with corpus texts extracted from the preset corpus database, and each corpus text is converted into the handwritten text corresponding to its target font file. Handwriting sample pictures containing handwritten text in a variety of handwriting fonts can therefore be generated automatically, which makes the collection of handwriting samples more flexible and greatly increases the number of handwriting samples. This lays a foundation for the subsequent training and tuning of a handwritten text recognition model and thus effectively improves its recognition accuracy.
In one embodiment, after step S2 and before step S3, the handwriting sample generation method further includes the following step:
According to a preset text dictionary, screen the content of the text corpus file, and delete from the text corpus file any text content that does not belong to the text dictionary.
In this embodiment, the preset text dictionary is a set of base text containing the base text required for training the handwritten text recognition model. Further, the base text can be classified according to a preset classification mode that corresponds to the classification mode of the text corpus files in the preset corpus database. For example, if the text corpus files in the preset corpus database are stored by the category of scientific field to which they belong, the base text in the text dictionary is also classified by scientific field.
Specifically, for each text corpus file, the server checks whether each character in the file exists in the text dictionary. If it does, the character is retained in the text corpus file; if it does not, the character is deleted from the text corpus file.
For example, assume the base text in the text dictionary is: 0123456789abcdefghijklmn. If the content of a text corpus file is "one two three 123 four five six abc" (where the number words stand for numeral characters of the original-language text), then after the content not present in the text dictionary, "one two three four five six", is deleted, the updated content of the text corpus file is "123abc".
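The character-level screening can be sketched as a set membership filter. The sketch assumes that the number words in the example above stand for Chinese numeral characters in the original text, so the sample input below uses those characters; the function name `filter_by_dictionary` is hypothetical.

```python
def filter_by_dictionary(content, dictionary):
    """Keep only the characters of `content` that appear in `dictionary`."""
    allowed = set(dictionary)  # set lookup keeps the scan linear in len(content)
    return "".join(ch for ch in content if ch in allowed)

dictionary = "0123456789abcdefghijklmn"
# Assumed original-language form of the example content.
print(filter_by_dictionary("一二三123四五六abc", dictionary))  # prints 123abc
```

Whitespace and punctuation are removed as well unless they are included in the dictionary.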
In this embodiment, the server screens the content of the text corpus file according to the preset text dictionary and deletes from the file any text content that does not belong to the dictionary. Corpus text that is unsuitable for training the handwritten text recognition model is thereby removed from the text corpus file in advance, so that no handwriting samples are later built from it. This improves the collection efficiency of handwriting samples and, at the same time, makes the training and tuning of the handwritten text recognition model with the generated handwriting samples more targeted, which helps improve the recognition accuracy of the model.
In one embodiment, as shown in Fig. 2, after step S8, the handwriting sample generation method further includes expansion processing of the handwriting samples, detailed as follows:
S9: Process the handwriting sample picture according to a preset picture effect processing mode, to obtain an updated picture.
Specifically, the server processes the handwriting sample picture according to the preset picture effect processing mode and takes the resulting new picture as the updated picture corresponding to that handwriting sample picture. The preset picture effect processing mode applies a graphical effect transformation to the handwriting sample picture, or transforms the glyph shapes of the handwritten text in the handwriting sample picture.
For example, the picture effect processing mode may blur the handwritten text in the handwriting sample picture using a random Gaussian blur algorithm, adjust the angle of the handwritten text using a random tilt angle, or add a watermark, add a border, add a background pattern or change the canvas background color of the handwriting sample picture.
It should be noted that the picture effect processing mode can be configured according to the needs of the practical application, which is not limited here.
S10: Save the updated picture and the corpus text as a new handwriting sample in the handwriting sample data set.
Specifically, the server treats the updated picture obtained in step S9 and the corpus text corresponding to the original handwriting sample picture as a new handwriting sample, and saves the updated picture and the corpus text in the handwriting sample data set in correspondence with each other.
In this embodiment, the handwriting sample picture is processed by the preset picture effect processing mode to obtain a new picture, and the new picture together with the corpus text corresponding to the handwriting sample picture is saved in the handwriting sample data set as a new handwriting sample. This effectively expands the handwriting samples and further improves their collection efficiency.
In one embodiment, as shown in Fig. 3, step S9, processing the handwriting sample picture according to the preset picture effect processing mode to obtain an updated picture, specifically includes the following steps:
S91: Obtain the pixel value of each pixel in the handwriting sample picture.
Specifically, according to the size of the handwriting sample picture, that is, the preset picture size, every pixel in the handwriting sample picture is traversed and its pixel value is obtained.
S92: Randomly select N pixels from the image region where the handwritten text is located in the handwriting sample picture, obtaining N target pixels, where N is a positive integer.
Specifically, the position of the handwritten text on the canvas, determined when the handwritten text was drawn in step S7, is the image region where the handwritten text is located. The server randomly selects N pixels from this image region, obtaining N target pixels.
S93: Apply Gaussian blur processing to the pixel value of each target pixel, to obtain the target pixel value of each target pixel.
Specifically, from the pixel values obtained in step S91, the pixel value of each target pixel obtained in step S92 is determined. Gaussian blur processing is applied to the pixel value of each target pixel, and the blurred pixel value is taken as the target pixel value of that target pixel.
Gaussian blur, also called Gaussian smoothing, is a commonly used image effect in image processing, typically applied to reduce image noise and the level of detail. Blurring an image with a Gaussian is the process of convolving the image with a normal distribution, which is equivalent to applying a low-pass filter.
It can be understood that, when Gaussian blur processing is applied to the pixel value of a target pixel, random Gaussian blur processing can be applied separately to the pixel values of the R, G and B channels of the target pixel.
S94: Replace the pixel value of each target pixel with its target pixel value, to obtain the updated picture.
Specifically, after the server changes the pixel value of each target pixel in the handwriting sample picture to the target pixel value obtained in step S93, the resulting new handwriting sample picture is the updated picture.
In this embodiment, N target pixels are randomly selected from the image region where the handwritten text is located in the handwriting sample picture, Gaussian blur processing is applied to the pixel value of each target pixel, and the resulting target pixel values replace the original pixel values to give the updated picture. This effectively expands the handwriting sample pictures. Moreover, because Gaussian blur changes the display effect of the handwritten text, it helps strengthen the training of the handwritten text recognition model and thus improves the recognition accuracy of the model.
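Steps S91 to S94 can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the picture is a grayscale grid stored as a list of rows, the text region is a rectangle `(left, top, right, bottom)` with exclusive right and bottom edges, and the per-pixel blur is passed in as a function. To keep the sketch short, the demonstration uses a plain 3x3 mean (`mean_3x3`) as a stand-in rather than a true Gaussian blur. All function names are hypothetical.

```python
import random

def blur_random_pixels(image, region, n, blur_at, rng=random):
    """Blur n randomly chosen pixels inside `region` and return
    (updated image, list of chosen target pixels)."""
    left, top, right, bottom = region
    candidates = [(x, y) for y in range(top, bottom) for x in range(left, right)]
    targets = rng.sample(candidates, k=min(n, len(candidates)))        # S92
    # S93: compute every new value from the original picture, so that an
    # already blurred pixel never feeds into a later one.
    new_values = {(x, y): blur_at(image, x, y) for x, y in targets}
    updated = [list(row) for row in image]                             # copy, S91
    for (x, y), value in new_values.items():                           # S94
        updated[y][x] = value
    return updated, targets

def mean_3x3(image, x, y):
    """Stand-in blur: mean of the in-bounds 3x3 neighbourhood (not Gaussian)."""
    h, w = len(image), len(image[0])
    values = [image[j][i]
              for j in range(max(0, y - 1), min(h, y + 2))
              for i in range(max(0, x - 1), min(w, x + 2))]
    return round(sum(values) / len(values))

# A 5x5 black picture with one white pixel; the "text region" is the inner 3x3.
picture = [[0] * 5 for _ in range(5)]
picture[2][2] = 255
updated, targets = blur_random_pixels(picture, (1, 1, 4, 4), 3, mean_3x3)
```

For color pictures, `blur_at` would return an (R, G, B) tuple computed channel by channel.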
In one embodiment, as shown in Fig. 4, applying Gaussian blur to the pixel value of each target pixel in step S93 to obtain the target pixel value of each target pixel specifically includes the following steps:
S931: With the target pixel as the center, form a weight region from the target pixel and the K pixels around it, where K is a positive integer.
Specifically, taking the position of the target pixel as the center, K pixels around the target pixel are selected, and the pixel region formed by the target pixel and these K pixels is used as the weight region.
For example, when K equals 8, the 8 pixels adjacent to the target pixel may be selected; these 8 pixels and the target pixel form a 3*3 pixel matrix region, and this 3*3 pixel matrix region is the weight region.
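As a small illustration of the K = 8 example, the weight region can be written as a list of coordinate offsets relative to the target pixel. The function name `weight_region_offsets` and the `radius` parameter are hypothetical, and a square region is an assumption that matches the 3*3 case only.

```python
def weight_region_offsets(radius=1):
    """Return the (x, y) offsets of a square weight region around the target pixel.

    radius=1 yields the 3*3 region of the example: the target pixel at (0, 0)
    plus its K = 8 adjacent pixels.
    """
    return [
        (x, y)
        for y in range(-radius, radius + 1)
        for x in range(-radius, radius + 1)
    ]


offsets = weight_region_offsets(1)
K = len(offsets) - 1  # number of surrounding pixels, excluding the target pixel
```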
S932: Calculate the weight of each pixel in the weight region according to the following formulas:
$$P_0(x_i, y_i) = \frac{1}{2\pi\sigma^2} e^{-\frac{x_i^2 + y_i^2}{2\sigma^2}}, \qquad P(x_i, y_i) = \frac{P_0(x_i, y_i)}{\sum_{j=1}^{K+1} P_0(x_j, y_j)}$$
where $(x_i, y_i)$ is the position coordinate of the i-th pixel in the weight region, $P_0(x_i, y_i)$ is the probability density of the i-th pixel in the weight region, $P(x_i, y_i)$ is the weight of the i-th pixel in the weight region, σ is a preset Gaussian parameter, and i ∈ [1, K+1].
Specifically, taking the position of the target pixel as the coordinate origin, the position coordinate $(x_i, y_i)$ of each pixel in the weight region is determined from the relative position between that pixel and the target pixel. For example, the pixel immediately to the right of the target pixel in the weight region has the position coordinate (1, 0).
The first formula gives the probability density $P_0(x_i, y_i)$ of the pixel $(x_i, y_i)$ under the normal distribution. The second formula then normalizes this density to give the weight $P(x_i, y_i)$ of the pixel, so that the weights of the K+1 pixels sum to 1.
Here, the preset Gaussian parameter may specifically be the standard deviation of the normal distribution.
S933: Multiply the pixel value of each pixel in the weight region by the weight of that pixel, and add up the resulting K+1 products to obtain the target pixel value of the target pixel.
Specifically, using the weight of each pixel in the weight region determined in step S932, the pixel value of each pixel is multiplied by the weight of that pixel, giving K+1 products; these K+1 products are then summed, and the sum is taken as the target pixel value of the target pixel.
In this embodiment, a weight region is formed from the target pixel, taken as the center, and the K pixels around it; the weight of each pixel in the weight region is calculated; the pixel value of each pixel in the weight region is multiplied by the weight of that pixel; and the resulting K+1 products are summed, with the sum taken as the target pixel value of the target pixel. The pixel value of the target pixel is thereby adjusted by Gaussian blur, so that the handwritten text region of the updated picture appears blurred. This helps strengthen the training of a handwritten text recognition model and, in turn, improves the recognition accuracy of that model.
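The weight calculation of step S932 can be sketched as follows, assuming the standard two-dimensional normal density with standard deviation σ centered on the target pixel, followed by normalization so that the K+1 weights sum to 1. The function name `gaussian_weights` and the choice σ = 1.0 are illustrative assumptions.

```python
import math


def gaussian_weights(offsets, sigma):
    """Return one normalized weight per (x, y) offset in the weight region."""
    # Probability density of each pixel under a 2-D normal distribution
    # centered on the target pixel (assumed form of the density).
    densities = [
        math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)
        for x, y in offsets
    ]
    total = sum(densities)
    # Normalize so that the weights of the K+1 pixels sum to 1.
    return [d / total for d in densities]


# 3*3 weight region (K = 8) in row-major order; the target pixel is at index 4.
offsets = [(x, y) for y in (-1, 0, 1) for x in (-1, 0, 1)]
weights = gaussian_weights(offsets, sigma=1.0)
```

A larger σ flattens the weights toward a uniform average, while a smaller σ concentrates the weight on the target pixel itself.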
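Steps S931 to S933 can be combined into one small function for a single color channel. This is a sketch under stated assumptions: the channel is a list of rows of numbers, the weight region is a square of the given radius, and coordinates that fall outside the picture are clamped to the nearest border pixel (the text does not say how borders are handled). The name `gaussian_blur_at` is hypothetical.

```python
import math


def gaussian_blur_at(channel, x, y, sigma=1.0, radius=1):
    """Return the Gaussian-weighted sum of the weight region centered on (x, y)."""
    height, width = len(channel), len(channel[0])
    # S931: the weight region is the target pixel plus its surrounding pixels.
    offsets = [
        (dx, dy)
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
    ]
    # S932: unnormalized Gaussian densities; the constant factor 1/(2*pi*sigma^2)
    # cancels when the densities are divided by their sum.
    densities = [
        math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma)) for dx, dy in offsets
    ]
    total = sum(densities)
    # S933: multiply each pixel value by its weight and add up the K+1 products.
    result = 0.0
    for (dx, dy), density in zip(offsets, densities):
        px = min(max(x + dx, 0), width - 1)   # clamp at the border (assumption)
        py = min(max(y + dy, 0), height - 1)
        result += channel[py][px] * (density / total)
    return result


spot = [
    [0, 0, 0],
    [0, 255, 0],
    [0, 0, 0],
]
blurred_center = gaussian_blur_at(spot, 1, 1)
```

For an RGB picture, the same function would be applied to each of the three channels separately.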
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
In one embodiment, a handwriting sample generation apparatus is provided, which corresponds one-to-one to the handwriting sample generation method in the above embodiments. As shown in Fig. 5, the handwriting sample generation apparatus includes a font acquisition module 51, a corpus acquisition module 52, a canvas drawing module 53, a selection module 54, a conversion module 55, a size calculation module 56, a synthesis module 57 and a saving module 58. The functional modules are described in detail as follows:
Font acquisition module 51, configured to obtain the font files of a preset handwriting input method and store the font files in a preset font library;
Corpus acquisition module 52, configured to obtain a text corpus file and store the text corpus file in a preset corpus database;
Canvas drawing module 53, configured to draw a canvas according to a preset picture size and set the background color of the canvas;
Selection module 54, configured to extract corpus text from the preset corpus database and select a target font file from the preset font library, according to a preset selection mode;
Conversion module 55, configured to use the target font file to convert the corpus text into the handwritten text corresponding to the target font file;
Size calculation module 56, configured to determine the text size of the handwritten text according to the canvas size and the handwritten text;
Synthesis module 57, configured to draw the handwritten text on the canvas according to the text size of the handwritten text to obtain a handwriting sample picture;
Saving module 58, configured to save the handwriting sample picture and the corpus text, as a handwriting sample, into a preset handwriting sample data set.
Further, the handwriting sample generation apparatus also includes:
Corpus screening module, configured to screen the content of the text corpus file according to a preset text dictionary and delete from the text corpus file any text content that does not belong to the text dictionary.
Further, the handwriting sample generation apparatus also includes:
Effect processing module, configured to process the handwriting sample picture according to a preset picture effect processing mode to obtain an updated picture;
Update module, configured to save the updated picture and the corpus text, as a new handwriting sample, into the handwriting sample data set.
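The screening performed by the corpus screening module can be illustrated with a short sketch. It assumes that screening works character by character and that the text dictionary is simply a collection of allowed characters; the text here does not specify the granularity, and the name `screen_corpus` is hypothetical.

```python
def screen_corpus(corpus_text, dictionary):
    """Keep only the characters of corpus_text that belong to the dictionary."""
    allowed = set(dictionary)
    return "".join(ch for ch in corpus_text if ch in allowed)


screened = screen_corpus("你好, world!", "你好wrld")
```

Characters outside the dictionary (here the comma, the space, the letter "o" and the exclamation mark) are dropped, so every character left in the corpus is covered by the dictionary.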
Further, the effect processing module includes:
Pixel value acquisition submodule, configured to obtain the pixel value of each pixel in the handwriting sample picture;
Pixel selection submodule, configured to randomly select N pixels from the image region containing the handwritten text in the handwriting sample picture to obtain N target pixels, where N is a positive integer;
Blur processing submodule, configured to apply Gaussian blur to the pixel value of each target pixel to obtain the target pixel value of each target pixel;
Pixel value update submodule, configured to replace the pixel value of each target pixel with the target pixel value of that target pixel to obtain the updated picture.
Further, the blur processing submodule includes:
Region determination unit, configured to form a weight region from the target pixel, taken as the center, and the K pixels around the target pixel, where K is a positive integer;
Weight calculation unit, configured to calculate the weight of each pixel in the weight region according to the following formulas:
$$P_0(x_i, y_i) = \frac{1}{2\pi\sigma^2} e^{-\frac{x_i^2 + y_i^2}{2\sigma^2}}, \qquad P(x_i, y_i) = \frac{P_0(x_i, y_i)}{\sum_{j=1}^{K+1} P_0(x_j, y_j)}$$
where $(x_i, y_i)$ is the position coordinate of the i-th pixel in the weight region, $P_0(x_i, y_i)$ is the probability density of the i-th pixel in the weight region, $P(x_i, y_i)$ is the weight of the i-th pixel in the weight region, σ is a preset Gaussian parameter, and i ∈ [1, K+1];
Target pixel value calculation unit, configured to multiply the pixel value of each pixel in the weight region by the weight of that pixel and add up the resulting K+1 products to obtain the target pixel value of the target pixel.
For specific limitations on the handwriting sample generation apparatus, refer to the limitations on the handwriting sample generation method above; they are not repeated here. Each module of the handwriting sample generation apparatus may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 6. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store handwriting samples. The network interface of the computer device is used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements a handwriting sample generation method.
In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the steps of the handwriting sample generation method of the above embodiments, for example steps S1 to S8 shown in Fig. 1. Alternatively, when executing the computer program, the processor implements the functions of the modules/units of the handwriting sample generation apparatus of the above embodiments, for example the functions of modules 51 to 58 shown in Fig. 5. To avoid repetition, details are not repeated here.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the handwriting sample generation method of the above method embodiments; alternatively, when executed by a processor, the computer program implements the functions of the modules/units of the handwriting sample generation apparatus of the above apparatus embodiments. To avoid repetition, details are not repeated here.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It will be clear to those skilled in the art that, for convenience and brevity of description, only the division into the functional units and modules described above is used as an example. In practical applications, the above functions may be assigned to different functional units or modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims (10)

1. A handwriting sample generation method, characterized in that the handwriting sample generation method includes:
obtaining the font files of a preset handwriting input method, and storing the font files in a preset font library;
obtaining a text corpus file, and storing the text corpus file in a preset corpus database;
drawing a canvas according to a preset picture size, and setting the background color of the canvas;
extracting corpus text from the preset corpus database, and selecting a target font file from the preset font library, according to a preset selection mode;
using the target font file to convert the corpus text into the handwritten text corresponding to the target font file;
determining the text size of the handwritten text according to the canvas size and the handwritten text;
drawing the handwritten text on the canvas according to the text size of the handwritten text to obtain a handwriting sample picture;
saving the handwriting sample picture and the corpus text, as a handwriting sample, into a preset handwriting sample data set.
2. The handwriting sample generation method according to claim 1, characterized in that, after the step of obtaining a text corpus file and storing the text corpus file in a preset corpus database, and before the step of drawing a canvas according to a preset picture size and setting the background color of the canvas, the handwriting sample generation method further includes:
screening the content of the text corpus file according to a preset text dictionary, and deleting from the text corpus file any text content that does not belong to the text dictionary.
3. The handwriting sample generation method according to claim 1 or 2, characterized in that, after the step of saving the handwriting sample picture and the corpus text, as a handwriting sample, into a preset handwriting sample data set, the handwriting sample generation method further includes:
processing the handwriting sample picture according to a preset picture effect processing mode to obtain an updated picture;
saving the updated picture and the corpus text, as a new handwriting sample, into the handwriting sample data set.
4. The handwriting sample generation method according to claim 3, characterized in that processing the handwriting sample picture according to a preset picture effect processing mode to obtain an updated picture includes:
obtaining the pixel value of each pixel in the handwriting sample picture;
randomly selecting N pixels from the image region containing the handwritten text in the handwriting sample picture to obtain N target pixels, where N is a positive integer;
applying Gaussian blur to the pixel value of each target pixel to obtain the target pixel value of each target pixel;
replacing the pixel value of each target pixel with the target pixel value of that target pixel to obtain the updated picture.
5. The handwriting sample generation method according to claim 4, characterized in that applying Gaussian blur to the pixel value of each target pixel to obtain the target pixel value of each target pixel includes:
forming a weight region from the target pixel, taken as the center, and the K pixels around the target pixel, where K is a positive integer;
calculating the weight of each pixel in the weight region according to the following formulas:
$$P_0(x_i, y_i) = \frac{1}{2\pi\sigma^2} e^{-\frac{x_i^2 + y_i^2}{2\sigma^2}}, \qquad P(x_i, y_i) = \frac{P_0(x_i, y_i)}{\sum_{j=1}^{K+1} P_0(x_j, y_j)}$$
where $(x_i, y_i)$ is the position coordinate of the i-th pixel in the weight region, $P_0(x_i, y_i)$ is the probability density of the i-th pixel in the weight region, $P(x_i, y_i)$ is the weight of the i-th pixel in the weight region, σ is a preset Gaussian parameter, and i ∈ [1, K+1];
multiplying the pixel value of each pixel in the weight region by the weight of that pixel, and adding up the resulting K+1 products to obtain the target pixel value of the target pixel.
6. A handwriting sample generation apparatus, characterized in that the handwriting sample generation apparatus includes:
a font acquisition module, configured to obtain the font files of a preset handwriting input method and store the font files in a preset font library;
a corpus acquisition module, configured to obtain a text corpus file and store the text corpus file in a preset corpus database;
a canvas drawing module, configured to draw a canvas according to a preset picture size and set the background color of the canvas;
a selection module, configured to extract corpus text from the preset corpus database and select a target font file from the preset font library, according to a preset selection mode;
a conversion module, configured to use the target font file to convert the corpus text into the handwritten text corresponding to the target font file;
a size calculation module, configured to determine the text size of the handwritten text according to the canvas size and the handwritten text;
a synthesis module, configured to draw the handwritten text on the canvas according to the text size of the handwritten text to obtain a handwriting sample picture;
a saving module, configured to save the handwriting sample picture and the corpus text, as a handwriting sample, into a preset handwriting sample data set.
7. The handwriting sample generation apparatus according to claim 6, characterized in that the handwriting sample generation apparatus further includes:
a corpus screening module, configured to screen the content of the text corpus file according to a preset text dictionary and delete from the text corpus file any text content that does not belong to the text dictionary.
8. The handwriting sample generation apparatus according to claim 6 or 7, characterized in that the handwriting sample generation apparatus further includes:
an effect processing module, configured to process the handwriting sample picture according to a preset picture effect processing mode to obtain an updated picture;
an update module, configured to save the updated picture and the corpus text, as a new handwriting sample, into the handwriting sample data set.
9. A computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the handwriting sample generation method according to any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the handwriting sample generation method according to any one of claims 1 to 5.
CN201811084484.7A 2018-09-18 2018-09-18 Handwriting samples generation method, device, computer equipment and storage medium Pending CN109522975A (en)

Publications (1)

Publication Number Publication Date
CN109522975A true CN109522975A (en) 2019-03-26

Family

ID=65771262

Country Status (1)

Country Link
CN (1) CN109522975A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination